Microsoft researcher Lester Mackey and his teammates along with grad students, Jessica Hwang, and Paulo Orenstein have come out with a new machine learning based forecasting model along with a comprehensive dataset, called SubseasonalRodeo, for training the subseasonal forecasting models.
Subseasonal forecasting models are systems that are capable of predicting the temperature or precipitation 2-6 weeks in advance in the western contiguous United States.
The SubseasonalRhodeo dataset can be found at the Harvard Dataverse. Researchers have presented the details about their work in the paper titled “Improving Subseasonal Forecasting in the Western U.S. with Machine Learning”.
“What has perhaps prevented computer scientists and statisticians from aggressively pursuing this problem is that there hasn’t been a nice, neat, tidy dataset for someone to just download ..and use, so we hope that by releasing this dataset, other machine learning researchers.. will just run with it,” says Hwang.
Microsoft team states that a large amount of high-quality historical weather data along with the existing computational power makes the process of statistical forecast modeling worthwhile. Also, clubbing together the physics-based and statistics-based approaches lead to better predictions. The team’s machine learning based forecasting system combines the two regression models that are trained on its SubseasonalRodeo dataset.
The dataset consists of different weather measurements dating as far back as 1948. These weather measurements include temperature, precipitation, sea surface temperature, sea ice concentration, and relative humidity and pressure. This data is consolidated from sources like the National Center for Atmospheric Research, the National Oceanic and Atmospheric Administration’s Climate Prediction Center and the National Centers for Environmental Prediction.
First of the two models created by the team is a local linear regression with multitask model selection, or MultiLLR. Data used by the team was limited to an eight-week span in any year around the day for which the prediction was being made. There was also a selection process which made use of a customized backward stepwise procedure where two to 13 of the most relevant predictors were consolidated to make a forecast.
The second model created by the team was a multitask k-nearest neighbor autoregression, or AutoKNN. This model incorporates the historical data of only the measurement being predicted such as either the temperature or the precipitation.
Researchers state that although each model performed better on its own as compared to the competition’s baseline models, namely, a debiased version of the operational U.S. Climate Forecasting System (CFSv2) and a damped persistence model, they deal with different parts of the challenges associated with the subseasonal forecasting.
For instance, the first model created by the researchers makes use of only the recent history to make its predictions while the second model doesn’t account for other factors. So the team’s final forecasting model was a combination of the two models.
The team will be further expanding its work to the Western United States and will continue its collaboration with the Bureau of Reclamation and other agencies. “I think that subseasonal forecasting is fertile ground for machine learning development, and we’ve just scratched the surface,” mentions Mackey.
For more information, check out the official Microsoft blog.