https://en.wikipedia.org/wiki/Numerical_weather_prediction
NWP model example:
The GFS (Global Forecast System)
1) The model is run four times per day (every 6 hours) and produces forecasts out to 16 days.
2) Base horizontal resolution of 18 miles (28 kilometers) between grid points.
3) ~300 parameters for each grid point: http://www.nco.ncep.noaa.gov/pmb/products/gfs/gfs_upgrade/gfs.t06z.pgrb2b.0p25.f000.shtml
(e.g., temperature at the surface, U-component of wind at 875 mb, cloud mixing ratio at 500 mb, ...)
Problem:
Output taken directly from the NWP model's lowest layer(s) is generally not used by forecasters because
1) the actual physical processes that occur within the Earth's boundary layer are only crudely approximated in the model
2) the horizontal resolution is relatively coarse (e.g., 28 kilometers for GFS)
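To make the resolution point concrete, here is a small back-of-the-envelope calculation. It assumes square grid cells with the 28 km spacing quoted above (a simplification, since real grid cells vary with latitude) and computes how far a location can be from its nearest grid point, which comes out at roughly 20 km.

```python
import math

grid_spacing_km = 28.0  # GFS base horizontal resolution quoted above

# Assumption: square cells. The point farthest from every grid point is the
# cell centre, which lies half a diagonal away from each corner.
max_distance_km = grid_spacing_km * math.sqrt(2) / 2

print(round(max_distance_km, 1))
```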
Solution: Model output statistics (since the 1970s)
Statistical models were created based upon the three-dimensional fields produced by numerical weather models (NWP output), surface observations and the climatological conditions for specific locations. These statistical models are collectively referred to as model output statistics (MOS).
NWP + observations + climatological conditions -> more accurate forecast
Predictands, often near-surface quantities such as 2-meter air temperature, horizontal visibility, and wind direction, speed and gusts, are related statistically to one or more predictors. The predictors are typically forecasts from a numerical weather prediction (NWP) model, climatic data, and, if applicable, recent surface observations.
f(predictor_1, predictor_2, ..., predictor_n) = predictand
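As an illustration of such a function f, the sketch below fits a straight line by ordinary least squares, with a single predictor (the NWP 2-meter temperature forecast) and a single predictand (the observed temperature). The data values and the names fit_line and mos_temperature are made up for this example; operational MOS equations use many predictors.

```python
# Hypothetical toy data: NWP 2-m temperature forecasts (deg C) and the
# temperatures later observed at one station. All values are made up.
nwp_forecast = [10.0, 12.0, 14.0, 16.0, 18.0]
observed = [11.0, 12.5, 14.0, 15.5, 17.0]


def fit_line(x, y):
    """Ordinary least squares fit of y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(nwp_forecast, observed)


def mos_temperature(forecast):
    """f(predictor) = predictand: statistically corrected temperature."""
    return a + b * forecast


print(a, b, mos_temperature(20.0))
```

With this toy data the fit says the model runs too cold at low temperatures and too warm at high ones, and the fitted line compensates for both.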
Development of robust MOS equations for a particular NWP model required at least two years' worth of archived model output and observations, during which time the NWP model should remain unchanged, or nearly so. This requirement is necessary in order to fully capture the model's error characteristics under a wide variety of meteorological flow regimes for any particular location or region.
Machine learning is useful when an analytical solution to a problem is not known or is extremely complex.
Supervised learning is the machine learning task of inferring a function from labeled training data.
The training data consist of a set of training examples.
Each example is a pair consisting of an input object (typically a vector) and a desired output value:
(input_1, output_1), (input_2, output_2), ..., (input_n, output_n)
A supervised learning algorithm analyzes the training data and produces an inferred function, which can
be used for mapping new examples.
1) Training (or learning) phase: training examples -> inferred function
2) Usage phase: use the inferred function on new examples.
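The two phases can be sketched with one of the simplest supervised learners, a 1-nearest-neighbor classifier. This algorithm is chosen here only because it is short; the function name train_nearest_neighbor and the toy examples are made up for illustration.

```python
def train_nearest_neighbor(examples):
    """Training phase: build an inferred function from (input, output) pairs."""
    stored = list(examples)

    def predict(x):
        # Return the output of the stored example whose input is closest to x
        # (squared Euclidean distance).
        def sq_dist(pair):
            return sum((a - b) ** 2 for a, b in zip(pair[0], x))

        return min(stored, key=sq_dist)[1]

    return predict


# Training examples: pairs of (input vector, desired output).
examples = [((0.0, 0.0), "cold"), ((1.0, 1.0), "warm"), ((2.0, 2.0), "hot")]

predict = train_nearest_neighbor(examples)  # 1) training phase
label = predict((0.9, 1.2))                 # 2) usage phase on a new example
print(label)
```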
There are many established machine learning algorithms: decision trees, neural networks, support vector machines.
Main question: how to create training examples for MOS (pairs of vectors)?
(Once we have training examples, we can use any supervised machine learning algorithm.)
Each training example corresponds to a moment in time in the past.
Input: values from the NWP model valid at that time, plus recent weather observations (e.g. from 1 day earlier)
Output: the observed value we are interested in at that time
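A minimal sketch of how such pairs could be assembled from archives. The dictionaries nwp_archive and obs_archive, their contents, and the function build_examples are hypothetical; a real archive would hold many more dates and parameters.

```python
from datetime import date, timedelta

# Hypothetical archives keyed by valid date; all values are made up.
# NWP forecast valid on that date: (2-m temperature in deg C, wind speed in m/s)
nwp_archive = {
    date(2020, 1, 2): (1.0, 5.0),
    date(2020, 1, 3): (3.0, 4.0),
    date(2020, 1, 4): (2.0, 6.0),
}
# Observed 2-m temperature at the station (deg C)
obs_archive = {
    date(2020, 1, 1): 0.5,
    date(2020, 1, 2): 1.8,
    date(2020, 1, 3): 3.5,
    date(2020, 1, 4): 2.2,
}


def build_examples(nwp, obs, lag=timedelta(days=1)):
    """Pair each past time t with (NWP values at t + observation at t - lag, observation at t)."""
    examples = []
    for t in sorted(nwp):
        # Skip times where either the target or the lagged observation is missing.
        if t in obs and (t - lag) in obs:
            input_vector = nwp[t] + (obs[t - lag],)
            examples.append((input_vector, obs[t]))
    return examples


examples = build_examples(nwp_archive, obs_archive)
print(examples[0])
```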
1) Training (learning) phase:
NWP output for 2 years + observations for 2 years -> pairs of vectors -> machine learning algorithm -> function
2) Usage phase:
2.1) New NWP output (forecast for time t) + new observations (at time t - 1) -> input vector
2.2) function(input vector) -> output vector = corrected forecast
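Both phases can be sketched end to end. The learner used below is multiple linear regression solved through the normal equations, picked only because it fits in a few lines of plain Python; any other supervised algorithm could take its place. The training data are made up (five examples instead of two years), and the names solve and train are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def train(examples):
    """Training phase: least-squares fit of output ~ w0 + w1*x1 + ...; returns a function."""
    X = [[1.0] + list(inp) for inp, _ in examples]
    y = [out for _, out in examples]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    w = solve(XtX, Xty)

    def corrected_forecast(input_vector):
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], input_vector))

    return corrected_forecast


# Made-up training examples:
# ((NWP temperature at t, observed temperature at t - 1), observed temperature at t)
examples = [
    ((10.0, 8.0), 9.3),
    ((12.0, 11.0), 11.2),
    ((15.0, 12.0), 13.7),
    ((9.0, 10.0), 8.7),
    ((14.0, 15.0), 13.2),
]

corrected_forecast = train(examples)      # 1) training phase
new_input = (13.0, 12.0)                  # 2.1) new NWP output + latest observation
result = corrected_forecast(new_input)    # 2.2) corrected forecast
print(round(result, 2))
```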