Predicting Irish electricity consumption with neural networks

Summary of Study

This analysis is divided into two parts:

The neuralnet library in R is used to predict electricity consumption from a set of explanatory variables
An LSTM network is built with Keras to predict electricity consumption from the time series alone, without any explanatory variables

The relevant data was sourced from data.gov.ie and met.ie. Electricity consumption data was provided on an hourly basis, but converted to daily data for the purpose of this analysis.
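The hourly-to-daily conversion can be sketched with pandas. This is a minimal example that assumes each day's value is the sum of its hourly readings. The source does not say how the hours were aggregated, and the timestamps and values below are placeholders.

```python
import pandas as pd

# Placeholder hourly readings: 48 hours starting at midnight, 100 kWh each
hourly = pd.Series(
    100.0,
    index=pd.date_range("2017-01-01", periods=48, freq=pd.Timedelta(hours=1)),
)

# Assumption: the daily value is the sum of that day's hourly values
daily = hourly.resample("D").sum()
print(daily.tolist())   # [2400.0, 2400.0]
```

If the daily figure were instead an average of the hourly readings, .mean() would replace .sum().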

The variables are as follows:

eurgbp: EUR/GBP currency rate
rain: Rainfall
maxt: Maximum temperature
mint: Minimum temperature
wdsp: Wind speed
sun: Sunlight hours
kwh: KWH (consumption)

Ireland obtains about 45% of its electricity from natural gas, 96% of which is imported from Scotland, so EUR/GBP currency fluctuations can be expected to affect the cost of electricity in Ireland. The EUR/GBP rate was therefore included as an explanatory variable.

Weather conditions also influence electricity usage, so weather data for the Dublin region over the same dates was included as well.

Key Findings

Of the two models, the LSTM predicted electricity consumption more accurately, with its training and test predictions closely mirroring actual consumption.

The LSTM achieved a root mean squared error (RMSE) of 353.25 on the training set and 255.13 on the test set, where daily consumption is in the thousands of kWh.

Part 1: neuralnet

A neural network consists of:

Input layer: takes the explanatory variables from the existing data as inputs
Hidden layers: intermediate layers that transform the inputs; their weights, along with those of the output layer, are optimised through backpropagation to improve the predictive power of the model
Output layer: produces the predictions from the outputs of the hidden layers
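Here is a minimal sketch of a forward pass through a network shaped like the one fitted below, to make the layer description concrete. The network has six inputs, hidden layers of five and two neurons, and one output. The sketch assumes logistic hidden activations and a linear output, which corresponds to neuralnet's default act.fct = "logistic" combined with linear.output = TRUE. The weights are random placeholders, not the fitted values, and forward is a hypothetical helper. Training by backpropagation would adjust these weights to reduce the prediction error; only the forward computation is shown here.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Forward pass through a fully connected network (hypothetical helper).

    x: array of shape (n_samples, n_inputs).
    weights: list of (W, b) pairs, one per layer; W has shape (n_in, n_out)
    and b has shape (n_out,). Hidden layers use the logistic function and
    the final layer is linear, as with neuralnet's linear.output = TRUE.
    """
    a = x
    for W, b in weights[:-1]:
        a = sigmoid(a @ W + b)        # hidden layer: weighted sum, then activation
    W_out, b_out = weights[-1]
    return a @ W_out + b_out          # output layer: weighted sum only

rng = np.random.default_rng(0)
layer_sizes = [6, 5, 2, 1]            # 6 covariates -> 5 -> 2 -> kwh
weights = [(rng.normal(size=(n_in, n_out)), rng.normal(size=n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.uniform(size=(3, 6))          # three placeholder rows of normalized inputs
print(forward(x, weights).shape)      # (3, 1)
```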

1.1. Data Normalization

The data is normalized and split into training and test data:

# MAX-MIN NORMALIZATION
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(fullData, normalize))

# TRAINING AND TEST DATA
trainset <- maxmindf[1:378, ]
testset <- maxmindf[379:472, ]
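The same max-min scaling can be sketched in Python, together with its inverse (used later to convert predictions back to kWh) and the chronological split. The values below are illustrative only, and normalize, denormalize and chronological_split are hypothetical helpers. Using 378 of the 472 rows for training gives roughly an 80/20 split, the same proportion used in Part 2.

```python
import numpy as np

def normalize(x):
    """Max-min normalization: rescale values to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def denormalize(x_scaled, original):
    """Invert max-min normalization using the range of the original data."""
    original = np.asarray(original, dtype=float)
    return x_scaled * (original.max() - original.min()) + original.min()

def chronological_split(data, train_rows):
    """Split rows in time order, without shuffling, like trainset/testset above."""
    return data[:train_rows], data[train_rows:]

kwh = np.array([4200.0, 4600.0, 5000.0, 4400.0])   # illustrative values only
scaled = normalize(kwh)
print(scaled.tolist())                 # [0.0, 0.5, 1.0, 0.25]
train, test = chronological_split(scaled, 3)
print(denormalize(test, kwh).tolist()) # [4400.0]
```

As in the R code, the minimum and maximum here are taken over the whole dataset. Computing them from the training rows alone would prevent the test period from influencing the scaling.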

1.2. Neural Network Output

The neural network is then trained and its parameters are estimated:

# NEURAL NETWORK
> library(neuralnet)
> nn <- neuralnet(kwh ~ eurgbp + rain + maxt + mint + wdsp + sun, data = trainset, hidden = c(5, 2), linear.output = TRUE, threshold = 0.01)
> nn$result.matrix
                                     1
error                   2.168927756297
reached.threshold       0.008657878909
steps                 994.000000000000
Intercept.to.1layhid1  -0.943475389102
eurgbp.to.1layhid1      1.221792852624
rain.to.1layhid1        0.222508044224
maxt.to.1layhid1        1.356892947349
mint.to.1layhid1       -0.377284881968
wdsp.to.1layhid1        0.749993672528
sun.to.1layhid1        -0.250669884677
Intercept.to.1layhid2   3.424295572041
eurgbp.to.1layhid2     -4.921292790902
rain.to.1layhid2        3.380551856044
maxt.to.1layhid2       -2.353604121342
mint.to.1layhid2        0.877423599705
wdsp.to.1layhid2       -0.581900515451
sun.to.1layhid2        -7.083263552687
Intercept.to.1layhid3   0.352457802915
eurgbp.to.1layhid3      3.715376984054
rain.to.1layhid3       -1.030450129246
maxt.to.1layhid3       -0.672907974572
mint.to.1layhid3        0.898040603876
wdsp.to.1layhid3       -1.474470972212
sun.to.1layhid3        -1.793900522508
Intercept.to.1layhid4   0.819225033685
eurgbp.to.1layhid4    -16.770362105816
rain.to.1layhid4       -2.483557437596
maxt.to.1layhid4       -0.059472312293
mint.to.1layhid4        2.650852686615
wdsp.to.1layhid4        3.863732942893
sun.to.1layhid4         0.224801123127
Intercept.to.1layhid5 -13.987427433833
eurgbp.to.1layhid5     -1.661519269508
rain.to.1layhid5      -52.279711798215
maxt.to.1layhid5       22.717540151979
mint.to.1layhid5       11.670399514036
wdsp.to.1layhid5        9.713301368020
sun.to.1layhid5        10.804887927196
Intercept.to.2layhid1  -0.834412474581
1layhid.1.to.2layhid1   1.629948945316
1layhid.2.to.2layhid1  -3.064448233097
1layhid.3.to.2layhid1   0.197497636177
1layhid.4.to.2layhid1  -0.370098281335
1layhid.5.to.2layhid1  -0.402324278545
Intercept.to.2layhid2  -1.176093680811
1layhid.1.to.2layhid2   1.312897190062
1layhid.2.to.2layhid2   0.593640022150
1layhid.3.to.2layhid2   1.906008701982
1layhid.4.to.2layhid2   1.811035017074
1layhid.5.to.2layhid2  -0.725078284924
Intercept.to.kwh       -0.093973916107
2layhid.1.to.kwh        0.700847362516
2layhid.2.to.kwh        0.922218125575
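After error, reached.threshold and steps, the result matrix lists the weights. Each neuron has an intercept plus one weight per input (or per neuron in the previous layer). A 6-5-2-1 network therefore has (6 + 1) × 5 + (5 + 1) × 2 + (2 + 1) × 1 = 35 + 12 + 3 = 50 weights. The hypothetical helper below is written for illustration and is based only on the labels visible in the output above. It regenerates those labels and counts them.

```python
def neuralnet_weight_names(inputs, hidden, output):
    """Rebuild the weight labels shown in nn$result.matrix (hypothetical helper).

    inputs: covariate names; hidden: neurons per hidden layer;
    output: name of the response variable.
    """
    names = []
    previous_layer = list(inputs)
    for layer, n_neurons in enumerate(hidden, start=1):
        for j in range(1, n_neurons + 1):
            names.append(f"Intercept.to.{layer}layhid{j}")
            names += [f"{p}.to.{layer}layhid{j}" for p in previous_layer]
        previous_layer = [f"{layer}layhid.{j}" for j in range(1, n_neurons + 1)]
    names.append(f"Intercept.to.{output}")
    names += [f"{p}.to.{output}" for p in previous_layer]
    return names

names = neuralnet_weight_names(
    ["eurgbp", "rain", "maxt", "mint", "wdsp", "sun"], hidden=[5, 2], output="kwh"
)
print(len(names))               # 50
print(names[:2], names[-1])     # ['Intercept.to.1layhid1', 'eurgbp.to.1layhid1'] 2layhid.2.to.kwh
```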

The fitted network can also be visualised, for example with plot(nn), which draws the layers together with the fitted weights.

1.3. Model Validation

Next, we validate the model (i.e. test its accuracy) by comparing the consumption estimated by the neural network with the actual consumption in the test set:

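> temp_test <- subset(testset, select = c("eurgbp", "rain", "maxt", "mint", "wdsp", "sun"))
> nn.results <- compute(nn, temp_test)
> results <- data.frame(actual = testset$kwh, prediction = nn.results$net.result)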
> results
          actual     prediction
379 0.8394856269  0.72836479401
380 0.7976933676  0.72836479401
381 0.8125463657  0.72836479401
382 0.8377382154  0.72836479401
383 0.8394856269  0.72836479401
384 0.8415242737  0.72836479401
..........
467 0.7464359625  0.80778769677
468 0.7018769682  0.82063018370
469 0.7004207919  0.78094824279
470 0.6726078249  0.77185373598
471 0.7176036721  0.91671846789
472 0.7199335541  0.80974222504

1.4. Accuracy

In the code below, the data is first converted back to its original scale, since it was previously rescaled with max-min normalization. The accuracy is then computed as one minus the absolute value of the mean relative deviation between predicted and actual consumption, which yields 98%. Because positive and negative deviations offset each other in this measure, it is not the same as a mean absolute deviation:

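> # Undo the max-min scaling using the range of the original (unscaled) kwh column
> kwh <- fullData$kwh
> predicted <- results$prediction * abs(diff(range(kwh))) + min(kwh)
> actual <- results$actual * abs(diff(range(kwh))) + min(kwh)
> # Relative deviation of each prediction from the actual value
> deviation <- (actual - predicted) / actual
> comparison <- data.frame(predicted, actual, deviation)
> # 1 minus the absolute value of the mean relative deviation
> accuracy <- 1 - abs(mean(deviation))
> accuracy
[1] 0.9828191884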
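The accuracy figure above averages the signed deviations before taking the absolute value, so over-predictions and under-predictions offset each other. The minimal sketch below uses made-up numbers, not the study's data, to contrast that measure with one based on the mean absolute percentage error (MAPE), in which deviations cannot cancel out. Both function names are hypothetical.

```python
import numpy as np

def signed_mean_accuracy(actual, predicted):
    """1 - |mean relative deviation|, the measure computed in the R code above."""
    deviation = (actual - predicted) / actual
    return 1 - abs(deviation.mean())

def mape_accuracy(actual, predicted):
    """1 - mean absolute percentage error: positive and negative deviations cannot cancel."""
    deviation = (actual - predicted) / actual
    return 1 - np.abs(deviation).mean()

actual = np.array([100.0, 100.0, 100.0, 100.0])       # made-up values
predicted = np.array([90.0, 110.0, 80.0, 120.0])      # off by 10%, 10%, 20%, 20%
print(signed_mean_accuracy(actual, predicted))         # 1.0
print(mape_accuracy(actual, predicted))                # approximately 0.85
```

Here four predictions that are each 10% to 20% off still score a signed-mean accuracy of 1.0, while the MAPE-based accuracy is about 0.85.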

A mean accuracy of 98% is obtained using a (5,2) hidden configuration. However, because this is a mean, it does not imply that every prediction generated by the model is this accurate; accuracy is lower for some individual predictions.

A histogram of the deviation (with 100 breaks) shows that the majority of forecasts fall within 10% of the actual consumption.
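The code below computes the share of forecasts within a given tolerance, along with the histogram counts. The deviations are synthetic placeholders for the 94 test-set deviations (rows 379 to 472), and share_within is a hypothetical helper. Note that R's hist() treats breaks = 100 as a suggestion, so its bin edges may differ from the exactly 100 equal-width bins that np.histogram produces.

```python
import numpy as np

def share_within(deviation, tolerance=0.10):
    """Fraction of forecasts whose relative deviation lies within +/- tolerance."""
    deviation = np.asarray(deviation, dtype=float)
    return float(np.mean(np.abs(deviation) <= tolerance))

rng = np.random.default_rng(42)
deviation = rng.normal(loc=0.0, scale=0.06, size=94)   # synthetic stand-in for the 94 test deviations
counts, edges = np.histogram(deviation, bins=100)      # exactly 100 equal-width bins
print(counts.sum(), len(edges))                         # 94 101
print(share_within(deviation))
```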

When predicted and actual consumption are plotted together, the neural network's predictions stay within the general range of the actual series (roughly 4200 to 5000 kWh). However, the model does not capture the peaks and valleys in the series, i.e. periods of abnormally high or low usage.

Part 2: LSTM (Long Short-Term Memory Network)

A shortcoming of traditional (feedforward) neural network models is that they do not account for dependencies between successive observations in time series data.

When the neural network was fitted with neuralnet, all observations were assumed to be independent of each other. However, this is not necessarily the case.
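The sample autocorrelation is one way to see the dependence that the neuralnet model ignores. It measures how strongly each value is related to the value a given number of steps earlier. The sketch below compares independent noise with a simulated AR(1) series, in which each value depends on the previous one. Both series are synthetic, not the study's data, and autocorrelation is a hypothetical helper.

```python
import numpy as np

def autocorrelation(x, lag=1):
    """Sample autocorrelation of a series at a given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(1)
noise = rng.normal(size=500)                   # independent observations
ar = np.zeros(500)
for t in range(1, 500):                        # each value depends on the previous one
    ar[t] = 0.8 * ar[t - 1] + rng.normal()

print(round(autocorrelation(noise), 2))        # close to 0
print(round(autocorrelation(ar), 2))           # close to 0.8
```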

2.1. Issue of Stationarity

Line charts of KWH (consumption) and EUR/GBP show that the KWH time series has a stationary pattern, stationary meaning that its mean, variance, and autocorrelation are constant over time.

However, when EUR/GBP is plotted over the same period, the series is clearly non-stationary, i.e. its mean, variance, and autocorrelation change over time.
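The visual comparison can be made slightly more concrete with an informal check: compare the mean and variance of the first and second halves of a series. This is a rough heuristic, not a formal test; the augmented Dickey-Fuller test (for example adfuller in statsmodels) is a standard formal alternative. The series below are synthetic, and compare_halves is a hypothetical helper.

```python
import numpy as np

def compare_halves(x):
    """Informal check: mean and variance of the first and second half of a series."""
    x = np.asarray(x, dtype=float)
    first, second = np.array_split(x, 2)
    return (first.mean(), second.mean()), (first.var(), second.var())

rng = np.random.default_rng(3)
level = rng.normal(size=472)                   # synthetic series with a fixed mean and variance
walk = np.cumsum(rng.normal(size=472))         # synthetic random walk: the level drifts
for name, series in [("constant level", level), ("random walk", walk)]:
    means, variances = compare_halves(series)
    print(name, "means:", np.round(means, 2), "variances:", np.round(variances, 2))
```

For a stationary series, the two halves should have similar means and variances. For a drifting series like the random walk, they typically differ noticeably.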

Because non-stationarity is present in some of the explanatory variables, the LSTM model is now used to predict future values of KWH over the test set, independently of any explanatory variables.

In other words, the LSTM uses only past values of KWH to predict KWH. The analysis is carried out using the Keras library in Python. A separate guide also provides a detailed overview of predictions with LSTM using a different example.

2.2. Data Processing

First, the relevant libraries are imported and the data is processed:

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import os
path = "filepath"  # directory containing data.csv
os.chdir(path)
os.getcwd()

# Form dataset matrix: each row of dataX holds `previous` consecutive values,
# and dataY holds the value that immediately follows them
def create_dataset(dataset, previous=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-previous-1):
        a = dataset[i:(i+previous), 0]
        dataX.append(a)
        dataY.append(dataset[i + previous, 0])
    return np.array(dataX), np.array(dataY)

# fix random seed for reproducibility
np.random.seed(7)

# load dataset
dataframe = read_csv('data.csv', usecols=[0], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize dataset with MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# Training and Test data partition
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

# reshape into X=t and Y=t+1
previous = 1
X_train, Y_train = create_dataset(train, previous)
X_test, Y_test = create_dataset(test, previous)

# reshape input to be [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
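To see what create_dataset and the reshape produce, the sketch below runs the same windowing logic on a toy series of the numbers 0 to 9 with previous = 3. The function is copied from above so the block is self-contained; the toy series is purely illustrative.

```python
import numpy as np

def create_dataset(dataset, previous=1):
    """Same windowing as above: each X row holds `previous` values, Y the next value."""
    dataX, dataY = [], []
    for i in range(len(dataset) - previous - 1):
        dataX.append(dataset[i:(i + previous), 0])
        dataY.append(dataset[i + previous, 0])
    return np.array(dataX), np.array(dataY)

series = np.arange(10, dtype="float32").reshape(-1, 1)   # toy series 0..9, one column
X, Y = create_dataset(series, previous=3)
print(X[:2].tolist(), Y[:2].tolist())   # [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0]] [3.0, 4.0]
print(X.shape, Y.shape)                 # (6, 3) (6,)

X = np.reshape(X, (X.shape[0], 1, X.shape[1]))
print(X.shape)                          # (6, 1, 3): samples, time steps, features
```

Because the loop stops at len(dataset) - previous - 1, the last possible pair ([6, 7, 8] predicting 9) is not produced. The reshape also feeds the previous values to the LSTM as features of a single time step. An alternative layout, (samples, previous, 1), would present them as a sequence of time steps instead.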

2.3. LSTM Generation and Predictions

Next, the LSTM model is built and trained, and predictions are generated:

# Generate LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, previous)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, Y_train, epochs=100, batch_size=1, verbose=2)

# Generate predictions
trainpred = model.predict(X_train)
testpred = model.predict(X_test)

# Convert predictions back to normal values
trainpred = scaler.inverse_transform(trainpred)
Y_train = scaler.inverse_transform([Y_train])
testpred = scaler.inverse_transform(testpred)
Y_test = scaler.inverse_transform([Y_test])

# calculate RMSE
trainScore = math.sqrt(mean_squared_error(Y_train[0], trainpred[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(Y_test[0], testpred[:,0]))
print('Test Score: %.2f RMSE' % (testScore))

# Train predictions: shift by `previous` so they line up with the original series
trainpredPlot = np.empty_like(dataset)
trainpredPlot[:, :] = np.nan
trainpredPlot[previous:len(trainpred)+previous, :] = trainpred

# Test predictions: placed after the training predictions
testpredPlot = np.empty_like(dataset)
testpredPlot[:, :] = np.nan
testpredPlot[len(trainpred)+(previous*2)+1:len(dataset)-1, :] = testpred

# Plot all predictions without overwriting the trainpred/testpred arrays
plt.plot(scaler.inverse_transform(dataset), label="Actual")
plt.plot(trainpredPlot, label="Train predictions")
plt.plot(testpredPlot, label="Test predictions")
plt.title("Predicted vs. Actual Consumption")
plt.legend()
plt.show()

The model is trained over 100 epochs, and the predictions are generated.
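The parameter count gives a sense of the model's size and can be worked out by hand. A Keras LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias. With 4 units and previous = 1 input feature, that gives 96 parameters, and the Dense output layer adds 5 more. The helpers below are hypothetical, and model.summary() should report the same total of 101.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of a Keras LSTM layer with the default use_bias=True.

    Each of the four gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias vector (units).
    """
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    """Trainable parameters of a Dense layer: weights plus one bias per unit."""
    return input_dim * units + units

previous = 1
total = lstm_params(units=4, input_dim=previous) + dense_params(input_dim=4, units=1)
print(total)   # 101
```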

2.4. Accuracy

When the actual consumption (blue line) is plotted with the training and test predictions (orange and green lines), the series follow each other quite closely, except for certain downward spikes (periods of abnormally low usage).

Here is the output for the final epochs of the 100-epoch training run:

Epoch 94/100
 - 1s - loss: 0.0108
Epoch 95/100
 - 1s - loss: 0.0108
Epoch 96/100
 - 1s - loss: 0.0107
Epoch 97/100
 - 1s - loss: 0.0108
Epoch 98/100
 - 1s - loss: 0.0108
Epoch 99/100
 - 1s - loss: 0.0108
Epoch 100/100
 - 1s - loss: 0.0109

>>> # calculate RMSE
... trainScore = math.sqrt(mean_squared_error(Y_train[0], trainpred[:,0]))
>>> print('Train Score: %.2f RMSE' % (trainScore))
Train Score: 353.25 RMSE
>>> testScore = math.sqrt(mean_squared_error(Y_test[0], testpred[:,0]))
>>> print('Test Score: %.2f RMSE' % (testScore))
Test Score: 255.13 RMSE

The model has an RMSE of 353.25 on the training set and 255.13 on the test set, where daily consumption is in the thousands of kWh.

However, this model made its predictions over a one-day period, i.e. t+1. How would it perform over longer periods, e.g. 10 or 50 days? Let’s find out.
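RMSE squares each error before averaging, so large misses (such as the downward spikes noted above) weigh more heavily than small ones. A minimal numpy equivalent of math.sqrt(mean_squared_error(...)) is sketched below with made-up values; rmse is a hypothetical helper.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, equivalent to math.sqrt(mean_squared_error(...))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up daily values: errors of 100, -100 and 0 kWh
print(rmse([4500, 4700, 4600], [4400, 4800, 4600]))   # about 81.65
```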

10 days

Training error: 345.31 RMSE
Test error: 283.77 RMSE

50 days

Training error: 288.94 RMSE
Test error: 396.36 RMSE

The test error rose over the 10- and 50-day periods (283.77 and 396.36 RMSE, compared with 255.13 for one day). However, the errors remain low relative to the series' average of 4609 kWh per day.
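Each RMSE can be expressed as a share of the 4609 kWh daily average to put the errors in context. The figures below are the ones reported above; relative_error is a hypothetical helper.

```python
MEAN_DAILY_KWH = 4609  # average daily consumption in the series, as quoted above

# (train RMSE, test RMSE) for each period, as reported above
rmse_by_period = {
    "1 day": (353.25, 255.13),
    "10 days": (345.31, 283.77),
    "50 days": (288.94, 396.36),
}

def relative_error(rmse, mean_level=MEAN_DAILY_KWH):
    """Express an RMSE as a fraction of the series' average level."""
    return rmse / mean_level

for period, (train_rmse, test_rmse) in rmse_by_period.items():
    print(f"{period}: train {relative_error(train_rmse):.1%}, test {relative_error(test_rmse):.1%}")
```

With these figures, every error falls between about 5.5% and 8.6% of average daily consumption.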

Conclusion

Of the two neural networks, the LSTM proved more accurate at predicting fluctuations in electricity consumption.

The neuralnet model did not fully handle the non-stationary data present in some of the explanatory variables.

Moreover, factors such as temperature generally follow set historical patterns (apart from abnormal weather, which might affect consumption), so much of their influence is likely already reflected in the consumption series itself.

In this instance, then, a traditional neural network with explanatory variables proved less effective than the LSTM, which modelled fluctuations in consumption without the need for explanatory data.

