Weather Prediction Using Sequence-to-Sequence Recurrent Neural Networks

Mohammad Alhusseini
Runxiong Dong



Project Goal

The aim of this project is to apply deep learning to time series weather prediction. We train multi-layered Recurrent Neural Networks on weather data for three cities and use them to make predictions.

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a class of neural networks designed to process sequential data. An RNN can be thought of as the addition of loops to the architecture of a standard feed-forward neural network. For a fixed-length input sequence, the RNN can be unfolded from a recursive computation into a repetitive structure. This unfolding can be seen in the computational graph in the figure below.
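To make the recurrence concrete, the following is a minimal sketch (ours, not part of the project code) of a vanilla RNN unfolded over a fixed-length sequence; the key point is that the same weights W, U, and b are reused at every time step.

import numpy as np

def rnn_forward(x_seq, W, U, b, h0):
    """Unfold a vanilla RNN over a fixed-length input sequence.

    x_seq: array of shape (T, input_dim)
    W:     recurrent weights, shape (hidden_dim, hidden_dim)
    U:     input weights, shape (hidden_dim, input_dim)
    b:     bias, shape (hidden_dim,)
    h0:    initial hidden state, shape (hidden_dim,)
    """
    h = h0
    states = []
    for x_t in x_seq:                      # one loop iteration per time step
        h = np.tanh(W @ h + U @ x_t + b)   # same W, U, b reused at each step
        states.append(h)
    return np.stack(states)                # shape (T, hidden_dim)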



Figure 1: A recurrent neural network. The network incorporates information from the input x into a hidden state h that is passed forward through time. (L) Circuit diagram. (R) The same network unfolded, where each node is now associated with a single time step.

The most effective sequence models are called gated RNNs, which are capable of learning long-term dependencies. These include the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Gated RNNs are based on the idea of creating paths through time that have derivatives that do not vanish. The main difference between LSTMs and GRUs is that in the latter a single gating unit simultaneously controls the forgetting factor and the decision to update the state unit. The resulting model is simpler than standard LSTM models, and has been growing increasingly popular.
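As a small illustration of the GRU's relative simplicity (our example, not from the report; layer sizes are arbitrary), the following Keras snippet counts trainable parameters for an LSTM and a GRU of the same hidden size:

from tensorflow.keras import layers, Input, Model

inp = Input(shape=(None, 5))          # sequences of 5 weather variables
for cell in (layers.LSTM, layers.GRU):
    m = Model(inp, cell(64)(inp))
    print(cell.__name__, m.count_params())
# LSTM: 4 * (64*(64+5) + 64)  = 17920 parameters (four gated transforms)
# GRU (Keras default, reset_after=True): 3 * (64*(64+5) + 2*64) = 13632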



Figure 2:  (L) LSTM RNN Cell. (R) GRU RNN Cell.

Model

The sequence-to-sequence (seq2seq) model contains two RNNs, e.g. GRUs or LSTMs, that act as an encoder and a decoder. The encoder processes the input sequence and encodes it into a fixed-length vector: the hidden state of the last encoder RNN step. This vector is then passed to the decoder, which decodes it into the output sequence of predicted values. The diagram below shows the seq2seq model as used for machine translation.



Figure 3: In this example, a seq2seq model has an input sequence of length 3 and an output sequence of length 4. It reads the input sentence "ABC" and produces the output sentence "WXYZ".

For our model, we set the input size and the output size to the same number of days, N. For each day in the training set, the model receives the previous N days as input and outputs a prediction for the following N days.
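The windowing described above can be sketched as follows (our illustration; names and shapes are assumptions, not the project's code):

import numpy as np

def make_windows(series, n_days):
    """series: array of shape (num_days, num_features) -> (X, Y) pairs."""
    X, Y = [], []
    for t in range(n_days, len(series) - n_days + 1):
        X.append(series[t - n_days:t])     # previous N days (encoder input)
        Y.append(series[t:t + n_days])     # next N days (decoder target)
    return np.array(X), np.array(Y)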

Dataset Details

The weather dataset comes from the National Oceanic and Atmospheric Administration (NOAA). The data we used for this project was recorded by weather stations in Los Angeles, Las Vegas, and Phoenix. The choice of cities was based on geographical proximity and the reliability of the weather station records. The weather stations are located at Los Angeles International Airport, Phoenix Sky Harbor International Airport, and McCarran International Airport in Las Vegas. The downloaded data includes daily features such as AWND (average wind speed), PRCP (precipitation), SNOW (snowfall), TAVG (average temperature), TMAX (maximum temperature), TMIN (minimum temperature), WDF2 (direction of fastest two-minute wind), and weather-type indicators.

Data Selection and Processing

The data we downloaded for the 3 cities spans 1/1/1995 through 4/20/2018. We chose to use only data from 1/1/2000 onward for our project, since earlier dates were often missing values for one of the cities. The daily variables we chose to train on and predict are AWND, TMAX, TMIN, PRCP, and WDF2. TAVG was dropped since it was not recorded for 47% of the days across the 3 cities, which would severely limit our usable date range. The SNOW variable was dropped because it snowed in Las Vegas on only 3 days in the whole dataset and never snowed in Phoenix or Los Angeles; given the climate of these cities, SNOW should thus always be predicted as 0. The weather-type data was dropped for the same reason. Next, we normalize the variables for each city according to the formula:



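As a hedged sketch of this normalization step (the code below assumes per-variable z-score standardization computed separately for each city; min-max scaling to [0, 1] would be the other common choice, and the report's actual formula may differ):

import pandas as pd

def normalize(df: pd.DataFrame, train_end: int):
    """df: one city's daily variables; rows before train_end are training data."""
    mean = df.iloc[:train_end].mean()      # per-variable statistics computed
    std = df.iloc[:train_end].std()        # from the training portion only
    return (df - mean) / std, mean, std    # keep mean/std to invert later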
The following is a plot of the variable TMAX for Los Angeles across 28 days:


Experiments

We trained the network using both GRU and LSTM cells and observed no benefit from GRUs over LSTMs in training or validation loss. Our final implementation in Keras therefore uses LSTMs for both the encoder and decoder RNNs. We tried different combinations of network parameters and used the Adam optimizer to train our model. The following plot shows how the parameters affect our training loss over time. The legend format is hN_bM, where N is the hidden layer size and M is the training batch size:
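The following is a minimal Keras sketch of such an LSTM encoder-decoder (our reconstruction, not the project's exact code; the hidden size and batch size correspond to the N and M of the hN_bM legend entries, and the values shown are examples):

from tensorflow.keras import layers, Input, Model

n_days, n_features, hidden = 21, 12, 64          # example hN value: 64

enc_in = Input(shape=(n_days, n_features))
_, state_h, state_c = layers.LSTM(hidden, return_state=True)(enc_in)

# Feed the encoder's final state into the decoder; RepeatVector supplies
# one copy of the encoded summary vector per output time step.
dec = layers.RepeatVector(n_days)(state_h)
dec = layers.LSTM(hidden, return_sequences=True)(
    dec, initial_state=[state_h, state_c])
out = layers.TimeDistributed(layers.Dense(n_features))(dec)

model = Model(enc_in, out)
model.compile(optimizer="adam", loss="mse")      # Adam, as in the report
# model.fit(X_train, Y_train, batch_size=128, epochs=50)  # bM: batch size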



Predictions

The following are our model's predictions for different features in different cities; the model predicts all 12 features simultaneously. These predictions are based on an input of 21 days, from 3/24/2018 to 4/13/2018. The model then predicts the conditions for 4/14/2018 through 4/20/2018.
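Producing such a forecast can be sketched as follows (hypothetical names, continuing the snippets above): feed the most recent input window to the trained model and invert the normalization on its output.

import numpy as np

window = normalized.values[-21:][np.newaxis]   # shape (1, 21, n_features)
pred = model.predict(window)[0]                # normalized model output
forecast = pred * std.values + mean.values     # back to physical units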





Conclusion

The results of this project show that Recurrent Neural Networks can be used to forecast general weather variables with good accuracy. The model we use could be improved with more computing power, which would make it possible to use national or global weather data on a longitude-latitude grid to capture the complex nature of weather movement. One could further include Doppler radar maps to predict how weather patterns move geographically. Deep learning offers a promising approach to weather forecasting that does not rely on complicated weather models and computationally expensive simulations.

Code and Data

Click here to download the code.
The datasets we used can be downloaded as an archive here.
We made equal contributions to the code and the webpage.
