Step-by-step understanding LSTM Autoencoder layers
source link: https://www.tuicool.com/articles/JzY7RjY
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Here we will break down an LSTM autoencoder network to understand them layer-by-layer. We will go over the input and output flow between the layers, and also, compare the LSTM Autoencoder with a regular LSTM network.
In my previous post, LSTM Autoencoder for Extreme Rare Event Classification [ 1 ], we learned how to build an LSTM autoencoder for a multivariate time-series data.
However, LSTMs in Deep Learning is a bit more involved. Understanding the LSTM intermediate layers and its settings is not straightforward. For example, usage of return_sequences
argument, and RepeatVector
and TimeDistributed
layers can be confusing.
LSTM tutorials have well explained the structure and input/output of LSTM cells, e.g. [ 2 , 3 ]. But despite its peculiarities, little is found that explains the mechanism of LSTM layers working together in a network.
Here we will break down an LSTM autoencoder network to understand them layer-by-layer. Additionally, the popularly used seq2seq networks are similar to LSTM Autoencoders. Hence, most of these explanations are applicable for seq2seq as well.
In this article, we will use a simple toy example to learn,
- Meaning of
return_sequences=True
,RepeatVector()
, andTimeDistributed()
. - Understanding the input and output of each LSTM Network layer.
- Differences between a regular LSTM network and an LSTM Autoencoder.
Understanding Model Architecture
Importing our necessities first.
# lstm autoencoder to recreate a timeseries
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
'''
A UDF to convert input data into 3-D
array as required for LSTM network.
'''
def temporalize(X, y, lookback):
output_X = []
output_y = []
for i in range(len(X)-lookback-1):
t = []
for j in range(1,lookback+1):
# Gather past records upto the lookback period
t.append(X[[(i+j+1)], :])
output_X.append(t)
output_y.append(y[i+lookback+1])
return output_X, output_y
Creating an example data
We will create a toy example of a multivariate time-series data.
# define input timeseries
timeseries = np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
[0.1**3, 0.2**3, 0.3**3, 0.4**3, 0.5**3, 0.6**3, 0.7**3, 0.8**3, 0.9**3]]).transpose()
timesteps = timeseries.shape[0]
n_features = timeseries.shape[1]
timeseries
As required for LSTM networks, we require to reshape an input data into n_samples x timesteps x n_features . In this example, the n_features
is 2. We will make timesteps = 3
. With this, the resultant n_samples
is 5 (as the input data has 9 rows).
timesteps = 3 X, y = temporalize(X = timeseries, y = np.zeros(len(timeseries)), lookback = timesteps) n_features = 2 X = np.array(X) X = X.reshape(X.shape[0], timesteps, n_features) X
Understanding an LSTM Autoencoder Structure
In this section, we will build an LSTM Autoencoder network, and visualize its architecture and data flow. We will also look at a regular LSTM Network to compare and contrast its differences with an Autoencoder.
Defining an LSTM Autoencoder.
# define model
model = Sequential()
model.add(LSTM(128, activation='relu', input_shape=(timesteps,n_features), return_sequences=True))
model.add(LSTM(64, activation='relu', return_sequences=False))
model.add(RepeatVector(timesteps))
model.add(LSTM(64, activation='relu', return_sequences=True))
model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
model.summary()
# fit model
model.fit(X, X, epochs=300, batch_size=5, verbose=0)
# demonstrate reconstruction
yhat = model.predict(X, verbose=0)
print('---Predicted---')
print(np.round(yhat,3))
print('---Actual---')
print(np.round(X, 3))
The model.summary()
provides a summary of the model architecture. For a better understanding, let’s visualize it in Figure 2.3 below.
The diagram illustrates the flow of data through the layers of an LSTM Autoencoder network for one sample of data. A sample of data is one instance from a dataset. In our example, one sample is a sub-array of size 3×2 in Figure 1.2.
From this diagram, we learn
- The LSTM network takes a 2D array as input.
- One layer of LSTM has as many cells as the timesteps.
- Setting the
return_sequences=True
makes each cell per timestep emit a signal. - This becomes clearer in Figure 2.4 which shows the difference between
return_sequences
asTrue
(Fig. 2.4a) vsFalse
(Fig. 2.4b).
- In Fig. 2.4a, signal from a timestep cell in one layer is received by the cell of the same timestep in the subsequent layer.
- In the encoder and decoder modules in an LSTM autoencoder, it is important to have direct connections between respective timestep cells in consecutive LSTM layers as in Fig 2.4a.
- In Fig. 2.4b, only the last timestep cell emits signals. The output is, therefore, a vector .
- As shown in Fig. 2.4b, if the subsequent layer is LSTM, we duplicate this vector using
RepeatVector(timesteps)
to get a 2D array for the next layer. - No transformation is required if the subsequent layer is
Dense
(because aDense
layer expects a vector as input).
Coming back to the LSTM Autoencoder in Fig 2.3.
- The input data has 3 timesteps and 2 features.
- Layer 1, LSTM(128), reads the input data and outputs 128 features with 3 timesteps for each because
return_sequences=True
. - Layer 2, LSTM(64), takes the 3×128 input from Layer 1 and reduces the feature size to 64. Since
return_sequences=False
, it outputs a feature vector of size 1×64. - The output of this layer is the encoded feature vector of the input data.
- This encoded feature vector can be extracted and used as a data compression, or features for any other supervised or unsupervised learning (in the next post we will see how to extract this).
- Layer 3, RepeatVector(3), replicates the feature vector 3 times.
- The RepeatVector layer acts as a bridge between the encoder and decoder modules.
- It prepares the 2D array input for the first LSTM layer in Decoder.
- The Decoder layer is designed to unfold the encoding.
- Therefore, the Decoder layers are stacked in the reverse order of the Encoder.
- Layer 4, LSTM (64), and Layer 5, LSTM (128), are the mirror images of Layer 2 and Layer 1, respectively.
- Layer 6, TimeDistributed(Dense(2)), is added in the end to get the output, where “2” is the number of features in the input data.
- The TimeDistributed layer creates a vector of length equal to the number of features outputted from the previous layer. In this network, Layer 5 outputs 128 features. Therefore, the TimeDistributed layer creates a 128 long vector and duplicates it 2 (= n_features) times.
- The output of Layer 5 is a 3×128 array that we denote as U and that of TimeDistributed in Layer 6 is 128×2 array denoted as V. A matrix multiplication between U and V yields a 3×2 output.
- The objective of fitting the network is to make this output close to the input. Note that this network itself ensured that the input and output dimensions match.
Comparing LSTM Autoencoder with a regular LSTM Network
The above understanding gets clearer when we compare it with a regular LSTM network built for reconstructing the inputs.
# define model
model = Sequential()
model.add(LSTM(128, activation='relu', input_shape=(timesteps,n_features), return_sequences=True))
model.add(LSTM(64, activation='relu', return_sequences=True))
model.add(LSTM(64, activation='relu', return_sequences=True))
model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
model.summary()
# fit model
model.fit(X, X, epochs=300, batch_size=5, verbose=0)
# demonstrate reconstruction
yhat = model.predict(X, verbose=0)
print('---Predicted---')
print(np.round(yhat,3))
print('---Actual---')
print(np.round(X, 3))
Differences between Regular LSTM network and LSTM Autoencoder
- We are using
return_sequences=True
in all the LSTM layers. - That means, each layer is outputting a 2D array containing each timesteps.
- Thus, there is no one-dimensional encoded feature vector as output of any intermediate layer. Therefore, encoding a sample into a feature vector is not happening.
- Absence of this encoding vector differentiates the regular LSTM network for reconstruction from an LSTM Autoencoder.
- However, note that the number of parameters is the same in both, the Autoencoder (Fig. 2.1) and the Regular network (Fig. 3.1).
- This is because, the extra
RepeatVector
layer in the Autoencoder does not have any additional parameter. - Most importantly, the reconstruction accuracies of both Networks are similar .
Food for thought
The rare-event classification using anomaly detection approach discussed in LSTM Autoencoder for rare-event classification [ 1 ] is training an LSTM Autoencoder to detect the rare events. The objective of the Autoencoder network in [ 1 ] is to reconstruct the input and classify the poorly reconstructed samples as a rare event.
Since, we can also build a regular LSTM network to reconstruct a time-series data as shown in Figure 3.3, will that improve the results?
The hypothesis behind this is,
due to the absence of an encoding layer the accuracy of reconstruction can be better in some cases (because the dimension time-dimension is not reduced). Unless the encoded vector is required for any other analysis, trying a regular LSTM network is worth a try for a rare-event classification.
Github Repository
The complete code can be found here .
Conclusion
In this article, we
- worked with a toy example to understand an LSTM network layer-by-layer.
- understood the input and output flow from and between each layer.
- understood the meaning of
return_sequences
,RepeatVector()
, andTimeDistributed()
. - compared and contrasted an LSTM Autoencoder with a regular LSTM network.
In the next article, we will learn about optimizing a Network: how to decide on adding a new layer and its size ?
References
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK