
Predicting the Movement of Stocks in the Australian Securities Exchange (ASX)

source link: https://mc.ai/predicting-the-movement-of-stocks-in-the-australian-securities-exchange-asx/

I recently built a prediction model that forecasts the next-day movement of stock prices on the Australian Securities Exchange (ASX). I am writing this piece to share how I arrived at a more reliable strategy based on a long short-term memory (LSTM) model built upon approximately 2.5 years of end-of-day (EOD) financial market data from the ASX. The Jupyter notebook can be downloaded from my repository.

1. Data Exporting

The first task is to convert the EOD data into five separate time series data frames, one each for open, high, low, close, and volume (OHLCV). In each data frame, rows are indexed by date and columns by ticker. There are 2773 tickers with observations over 883 days.

Next, three additional data frames were created for: return, next-day return, and high/low (these were requested in the task given to me). These metrics are potential features in addition to the OHLCV values.
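The reshaping described above can be sketched with pandas. This is a minimal illustration on made-up records: the long-format column names, the toy tickers, and the reading of "high/low" as a ratio are all assumptions, not details taken from the notebook.

```python
import pandas as pd

# Hypothetical long-format EOD records; column names and values are assumptions.
eod = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-02", "2015-01-02", "2015-01-05",
                            "2015-01-05", "2015-01-06", "2015-01-06"]),
    "ticker": ["AAA", "BBB", "AAA", "BBB", "AAA", "BBB"],
    "open":   [1.00, 5.00, 1.10, 5.10, 1.05, 5.00],
    "high":   [1.20, 5.20, 1.15, 5.30, 1.10, 5.10],
    "low":    [0.95, 4.90, 1.00, 5.00, 1.00, 4.80],
    "close":  [1.10, 5.10, 1.05, 5.00, 1.08, 4.90],
    "volume": [1000, 2000, 1500, 1800, 1200, 2500],
})

# One wide frame per field: rows indexed by date, columns by ticker.
frames = {
    field: eod.pivot(index="date", columns="ticker", values=field)
    for field in ["open", "high", "low", "close", "volume"]
}

close = frames["close"]
ret = close / close.shift(1) - 1           # close-to-close return
next_ret = ret.shift(-1)                   # next-day return, aligned to today's row
high_low = frames["high"] / frames["low"]  # one possible reading of "high/low"
```

Shifting the return frame by one row keeps each day's features and its next-day target on the same index, which simplifies labelling later on.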

2. Exploratory Data Analysis

2.1 Hypothesis, motivation, approach and strategy

As of 22 November 2019, according to ASX’s website, 2195 unique tickers are currently listed. The data set, however, has 2773 unique tickers, which means that it includes tickers of companies that have since been delisted.

Before analyzing the coverage and quality, the hypothesis, scope, motivation, and strategy of this task should be discussed.

Hypothesis:

  1. Based on research [1–3], the patterns in the technical indicators derived from the OHLCV data of a particular stock (or company) have predictive power for its future movements.
  2. Additionally, the technical indicators of companies contributing to composite indexes, such as ASX All Ordinaries, S&P/ASX 50, etc. can be used to predict the movement of the composite indexes that they belong to.

Scope: This work limits its scope to hypothesis 1 with an extension.

Motivation: This work extends hypothesis 1 to be more generic: unlike the hypotheses made in [1–3], the future movement of any stock price can be predicted through the movement and pattern of technical indicators, and the same prediction from movement and pattern is applicable to the stock price of any company. In other words, a machine learning model can be trained agnostic to tickers, based solely on the intrinsic patterns in the technical indicators.

Based on studies [1–4], the time series nature of the data suggests the use of LSTM neural networks, which have been shown to be superior to ensemble methods such as random forests and to other classification techniques such as support vector machines.

Strategy: The strategy is to build a generic prediction model using the OHLCV data and technical indicators of the past 60 (trading) days so as to be able to predict the movement of the stock price of any company. Note that this excludes all composite indexes listed on the ASX (as these fall under hypothesis 2).

For this, an LSTM model will be trained using train, validation, and test data to predict a single target: the upward or downward movement of the next-day stock price.

2.2. Coverage and quality

Let us first visualize the distribution of the number of non-missing values per ticker using the histogram below.

The histogram above shows that there are about 196 tickers with no missing values in close price.

All missing values come from the price not being registered on a given day. Crosschecking with data from the ASX’s official website shows that this is mainly because no trade took place on that day, the company was listed after 2015-01-02, or the company was delisted before 2018-06-29.

As clean data is of paramount importance, tickers with a lot of missing values are disregarded. In this case, only tickers with more than 800 non-missing values are considered.
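The coverage filter can be sketched as a count of non-missing values per column. The toy frame and the small threshold below are assumptions chosen so the example fits on screen; the article's threshold is 800.

```python
import numpy as np
import pandas as pd

# Toy close-price frame (dates x tickers); NaN marks a day with no recorded price.
close = pd.DataFrame({
    "AAA": [1.0, 1.1, 1.2, 1.3],
    "BBB": [2.0, np.nan, np.nan, 2.1],
    "CCC": [np.nan, 3.0, 3.1, 3.2],
})

min_obs = 2  # hypothetical threshold for the toy data; the article uses 800

counts = close.notna().sum()                 # non-missing observations per ticker
kept = counts[counts > min_obs].index.tolist()
filtered = close[kept]                       # frame restricted to well-covered tickers
```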

Secondly, for good predictability, it is important to consider only companies with good correlation to the label, which in this case is the future return movement. Also, as composite indexes do not fall within the scope of the hypothesis, they are removed. After checking with ASX’s website, they were identified as the tickers that do not have volume information.

Based on the histogram above, it can be observed that there are very few tickers with relatively good correlation to future returns. For good predictability, only tickers that possess predictive power are used. Therefore, only the tickers with an average correlation value above the 75th percentile are considered here; there are 150 of them.
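A percentile cut of this kind can be sketched as follows. The article does not spell out how the average correlation is computed, so the score used here (the correlation between a ticker's return and its own next-day return) is purely an assumption for illustration, as are the synthetic data and ticker names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tickers = [f"T{i}" for i in range(8)]        # hypothetical tickers
ret = pd.DataFrame(rng.normal(0, 0.01, size=(200, 8)), columns=tickers)
next_ret = ret.shift(-1)                     # next-day return aligned to today

# Assumed score: per-ticker correlation between today's and tomorrow's return.
# The last row is dropped because it has no next-day value.
score = ret.iloc[:-1].corrwith(next_ret.iloc[:-1])

threshold = score.quantile(0.75)             # 75th percentile across tickers
selected = score[score > threshold].index.tolist()
```

With real data the score would more plausibly average the correlations of several features with the label, but the thresholding step stays the same.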

Now, the missing values are handled. There are a couple of ways to handle the missing values on the day where the trade did not take place:

  1. Set volume to 0 and set the open, high, low, and close values to the previous day’s close, on the assumption that no trade occurred on that day for that particular ticker. This approach may introduce outliers in the volume and therefore noise in the data.
  2. Interpolate the OHLCV data using the previous and next available values for a smooth transition. An alternative is to fill in the missing values with either the mean or the median of the column. Both options, on the other hand, introduce false information.

Here, the first approach is taken as it is theoretically more correct.
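A minimal sketch of the first approach for a single ticker is shown below. The toy values and the column layout are assumptions; the fill logic follows the description above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-02", periods=4, freq="D")
ohlcv = pd.DataFrame({
    "open":   [1.0, np.nan, np.nan, 1.30],
    "high":   [1.2, np.nan, np.nan, 1.40],
    "low":    [0.9, np.nan, np.nan, 1.20],
    "close":  [1.1, np.nan, np.nan, 1.35],
    "volume": [500.0, np.nan, np.nan, 800.0],
}, index=dates)

filled = ohlcv.copy()
filled["volume"] = filled["volume"].fillna(0)   # no trade means zero volume
filled["close"] = filled["close"].ffill()       # carry the last close forward
prev_close = filled["close"].shift(1)           # previous day's (filled) close
for col in ["open", "high", "low"]:
    filled[col] = filled[col].fillna(prev_close)
```

Forward-filling the close first means that a run of several non-trading days is filled with the same last traded close.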

2.3. Additional transformations

Before moving on to visualization and analysis, some technical indicators are examined. There are numerous such transformations in the literature [1–3]. The work by Borovkova et al. [1] consolidates some of the key technical indicators that can be used for additional transformations. Typically, they can be categorized into four groups: (1) momentum; (2) trend; (3) volume; and (4) volatility. Some of the commonly used indicators are:

  1. Money flow index
  2. Relative strength index
  3. Stochastic oscillator (%K)
  4. Williams %R
  5. Exponential moving average
  6. Moving average convergence-divergence
  7. Commodity channel index
  8. Ichimoku indicator
  9. Accumulation/distribution index
  10. Bollinger bands

All transformations were done using an open-source Python package [5].
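To show what a few of these indicators compute, here is a hand-rolled pandas sketch of the exponential moving average, the moving average convergence-divergence, and the Bollinger bands. It does not use the package from [5]; the window lengths (12, 26, 9, and 20 days with 2 standard deviations) are commonly used defaults assumed for illustration, and the price series is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
close = pd.Series(10 + rng.normal(0, 0.1, 100).cumsum())  # synthetic close prices

# Exponential moving averages and MACD (fast EMA minus slow EMA).
ema_12 = close.ewm(span=12, adjust=False).mean()
ema_26 = close.ewm(span=26, adjust=False).mean()
macd = ema_12 - ema_26
macd_signal = macd.ewm(span=9, adjust=False).mean()

# Bollinger bands: rolling mean plus/minus two rolling standard deviations.
window = 20
mid = close.rolling(window).mean()
std = close.rolling(window).std(ddof=0)
bb_upper = mid + 2 * std
bb_lower = mid - 2 * std
```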

2.4. Data analysis

Before performing the data analysis, the data is split into training, validation, and testing sets as follows:

  1. Train: 2015-02-01 to 2017-06-30
  2. Validation: 2017-07-01 to 2017-12-31
  3. Test: 2018-01-01 to 2018-06-29
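With date-indexed frames, the split above reduces to label-based slicing. The sketch below uses a business-day calendar and a single made-up ticker as stand-ins for the real ASX trading days, so the row counts are illustrative only.

```python
import pandas as pd

# Stand-in calendar: business days, not the actual ASX trading calendar.
dates = pd.date_range("2015-01-02", "2018-06-29", freq="B")
feature = pd.DataFrame({"AAA": range(len(dates))}, index=dates)

# .loc slicing on a DatetimeIndex includes both end points.
train = feature.loc["2015-02-01":"2017-06-30"]
val = feature.loc["2017-07-01":"2017-12-31"]
test = feature.loc["2018-01-01":"2018-06-29"]
```

Splitting by date rather than at random keeps every validation and test observation strictly later than the training data.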

First, the correlation between all the data frames, or in other words, between features, is visualized. Because there are numerous tickers in a data frame, it is difficult to visualize the correlation matrix for every ticker. Therefore, the average correlation value across tickers is taken and the correlations are visualized using a heat map.

In addition to the correlation matrix heat map, histograms of the data are plotted to look for outliers. Before doing so, the data was normalized to between 0 and 1 for every ticker so that the features can be visualized and compared on the same scale.
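One way to obtain such an averaged matrix is to compute a feature-by-feature correlation matrix for each ticker and then take the element-wise mean. The sketch below assumes the features are stored as a dictionary of date-by-ticker frames; the feature names, tickers, and random values are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2015-02-02", periods=50, freq="B")
tickers = ["AAA", "BBB", "CCC"]

# Hypothetical feature frames, each indexed by date with one column per ticker.
features = {
    name: pd.DataFrame(rng.normal(size=(50, 3)), index=dates, columns=tickers)
    for name in ["close", "return", "rsi"]
}

# Per-ticker correlation matrix between features, then the mean over tickers.
per_ticker = [
    pd.DataFrame({name: frame[t] for name, frame in features.items()}).corr()
    for t in tickers
]
avg_corr = sum(per_ticker) / len(per_ticker)
```

The resulting `avg_corr` frame has one row and one column per feature and can be passed directly to a heat map plotting function.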

The observations from the correlation heat map and histograms above can be summarized as:

  1. Open, high, low, and close values are highly correlated with each other. This is expected because the values lie close to each other. As the technical indicators are calculated from these values and intrinsically retain their information, the open, high, and low values can be removed from the feature list.
  2. Most of the distributions above are either normal or uniform in shape, except for h/l and volume. This suggests that there will be a large number of outliers in those two features. Although the other features with normal distributions may have outliers in the tails, these can be considered negligible. In addition, according to the correlation matrix, h/l and volume also appear to have low correlation to the future return, which is the metric we will be predicting. This justifies leaving h/l and volume out of the feature list.
  3. Similarly, based on the correlation matrix heat map, some features appear to have little correlation to future returns. They are the money flow index and the accumulation/distribution index. To keep the prediction model simple, these features are also regarded as not useful.
  4. Finally, the stochastic oscillator appears to be highly correlated with Williams %R. This is expected because the mathematical expressions of the two indicators are similar. Here, Williams %R is removed from the feature list.
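The last observation can be checked directly from the standard definitions. When both indicators use the same lookback window, the stochastic oscillator %K and Williams %R differ only by a constant offset of 100, so they are perfectly correlated. The sketch below uses synthetic prices and an assumed 14-day window.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
close = pd.Series(10 + rng.normal(0, 0.1, 100).cumsum())  # synthetic prices
high = close + rng.uniform(0.01, 0.2, 100)
low = close - rng.uniform(0.01, 0.2, 100)

window = 14  # assumed lookback, a common default
highest_high = high.rolling(window).max()
lowest_low = low.rolling(window).min()
price_range = highest_high - lowest_low

# %K measures the close relative to the low of the range (0 to 100).
stoch_k = 100 * (close - lowest_low) / price_range
# %R measures the close relative to the high of the range (-100 to 0).
williams_r = -100 * (highest_high - close) / price_range

offset = (stoch_k - williams_r).dropna()  # constant 100 wherever defined
```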

Based on the discussion above, the final feature list would be: close, return, exponential moving average, relative strength index, stochastic oscillator, moving average convergence-divergence, commodity channel index, Ichimoku indicator, and the upper and lower Bollinger bands.

As outlined earlier, the strategy is to predict the upward or downward movement of a stock. Based on experience, this is a better strategy than predicting the (positive or negative) percentage movement of the future return, because an error-based performance metric can be misleading. For example, a model built to minimize the mean squared error may still get the direction of the movement wrong. If the actual movement is 0.1%, then investing based on a 0.5% prediction is better than investing based on a -0.1% prediction, even though the former has the larger error.

Another data frame registering the label is created with 1 and 0: 1 for UP and 0 for DOWN; and the distribution of the values is plotted out.
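A minimal sketch of the labelling step is shown below, using a toy next-day return frame. A return of exactly zero is labelled 0 here, which matches how the current version treats such cases according to the Future Improvements section.

```python
import pandas as pd

# Toy next-day returns (dates x tickers); values are made up.
next_ret = pd.DataFrame({
    "AAA": [0.010, -0.020, 0.000],
    "BBB": [-0.005, 0.030, 0.010],
})

label = (next_ret > 0).astype(int)   # 1 for UP, 0 for DOWN (and for no change)
up_share = label.to_numpy().mean()   # fraction of UP labels across all cells
```

Comparing `up_share` against 0.5 is a quick way to see whether the classes are imbalanced.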

Based on the analysis above, it appears that there are more downward movements than upward ones. Therefore, some balancing is required before sending the data for machine learning.

3. Forecasting

3.1. Long short-term memory (LSTM) model

Preprocessing: Based on the analysis done in the previous section, this section builds an LSTM neural network model for predicting the next-day stock price movement, that is, whether it is going upward or downward.

The training and validation sets will be used during the LSTM network training, while the test set will be used for trading strategy implementation and additional testing of the final model.

Next, the data is pre-processed. The data is normalized to be between 0 and 1 using MinMaxScaler. Note that the maximum and minimum are taken with respect to each ticker in each training data frame.

Then, the data is scaled to have mean 0 and unit variance. This scaling is done by fitting the StandardScaler on the entire normalized training data frame. The fitted scaler is then used to transform the training, validation, and test sets.
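The two scaling steps can be written out by hand to make the arithmetic explicit. The sketch below reproduces what MinMaxScaler and StandardScaler compute with their default settings (column-wise statistics, population standard deviation) using plain pandas; the toy frames are assumptions, and applying the training minimum and maximum to the validation set is one reading of the description above.

```python
import numpy as np
import pandas as pd

# Toy feature frames (dates x tickers); values are made up.
train = pd.DataFrame({"AAA": [1.0, 2.0, 3.0, 5.0], "BBB": [10.0, 30.0, 20.0, 40.0]})
val = pd.DataFrame({"AAA": [4.0, 6.0], "BBB": [25.0, 35.0]})

# Step 1: min-max normalization per ticker, using training statistics only.
lo, hi = train.min(), train.max()

def minmax(frame):
    return (frame - lo) / (hi - lo)

train_mm = minmax(train)

# Step 2: standardization, with mean and std fitted on the normalized training data.
mu, sigma = train_mm.mean(), train_mm.std(ddof=0)

def standardize(frame):
    return (frame - mu) / sigma

train_scaled = standardize(train_mm)
val_scaled = standardize(minmax(val))
```

Fitting both steps on the training set alone avoids leaking information from later periods; as a consequence, held-out values can fall outside the 0 to 1 range after the first step.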

Next, the data was sequenced. The past 60 days of data is used in sequence for the next day prediction for all training, validation, and test data sets.
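Sequencing can be sketched as a sliding window over one ticker's feature matrix. The helper name `make_sequences` is hypothetical, and the sketch assumes that `labels[t]` already holds the next-day movement for day `t`, so each window is paired with the label of its last row.

```python
import numpy as np

def make_sequences(features, labels, lookback=60):
    """Build (samples, lookback, n_features) windows and matching labels.

    features: array of shape (n_days, n_features) for one ticker.
    labels:   array of shape (n_days,), where labels[t] is the next-day
              movement observed after day t (assumed alignment).
    """
    X, y = [], []
    for end in range(lookback, len(features) + 1):
        X.append(features[end - lookback:end])  # the last `lookback` days
        y.append(labels[end - 1])               # label of the window's final day
    return np.array(X), np.array(y)

# Usage on synthetic data: 100 days, 12 features.
features = np.arange(100 * 12, dtype=float).reshape(100, 12)
labels = np.arange(100) % 2
X, y = make_sequences(features, labels)
```

A series of `n` days yields `n - lookback + 1` windows, which is why the first 59 days of each set produce no prediction of their own.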

Once sequencing is complete, the data was balanced. This was to make sure the LSTM model is not biased towards the downward trends, as observed earlier during data analysis.
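The article does not say which balancing method was used. One simple option, sketched here as an assumption, is to randomly undersample the majority class so that both classes have the same number of sequences. The helper name `undersample` is hypothetical.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Randomly drop majority-class samples until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    idx_down = np.flatnonzero(y == 0)
    idx_up = np.flatnonzero(y == 1)
    n = min(len(idx_down), len(idx_up))
    keep = np.concatenate([
        rng.choice(idx_down, n, replace=False),
        rng.choice(idx_up, n, replace=False),
    ])
    rng.shuffle(keep)  # mix the two classes before training
    return X[keep], y[keep]

# Usage on a toy imbalanced set: 7 DOWN samples and 3 UP samples.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 7 + [1] * 3)
X_bal, y_bal = undersample(X, y)
```

Balancing would normally be applied to the training data only, leaving the validation and test distributions untouched.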

Building and training the LSTM model: An LSTM model was compiled (using Keras in TensorFlow) and trained with a batch size of 64 for 40 epochs. The summary of the model is as follows:

Layer (type)                    Output Shape          Param #
=================================================================
lstm_3 (LSTM)                   (None, 60, 128)       72192
_________________________________________________________________
dropout_4 (Dropout)             (None, 60, 128)       0
_________________________________________________________________
batch_normalization_3 (Batch    (None, 60, 128)       512
_________________________________________________________________
lstm_4 (LSTM)                   (None, 60, 128)       131584
_________________________________________________________________
dropout_5 (Dropout)             (None, 60, 128)       0
_________________________________________________________________
batch_normalization_4 (Batch    (None, 60, 128)       512
_________________________________________________________________
lstm_5 (LSTM)                   (None, 128)           131584
_________________________________________________________________
dropout_6 (Dropout)             (None, 128)           0
_________________________________________________________________
batch_normalization_5 (Batch    (None, 128)           512
_________________________________________________________________
dense_2 (Dense)                 (None, 32)            4128
_________________________________________________________________
dropout_7 (Dropout)             (None, 32)            0
_________________________________________________________________
dense_3 (Dense)                 (None, 2)             66
=================================================================
Total params: 341,090
Trainable params: 340,322
Non-trainable params: 768
_________________________________________________________________

Essentially, the LSTM model has three LSTM layers, followed by a dense layer with an activation function, and finally a dense output layer with a softmax activation function, as this is a classification problem. The softmax activation function outputs the probability of each class.
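The parameter counts in the summary can be reproduced with a few lines of arithmetic, which also reveals the input width. The value of 12 input features is not stated in the text; it is inferred here from the 72,192 parameters of the first LSTM layer.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # One weight per input plus a bias, for each unit.
    return units * (input_dim + 1)

def batchnorm_params(n_features):
    # Scale and shift (trainable) plus moving mean and variance (non-trainable).
    return 4 * n_features

n_features = 12  # inferred from the summary, not stated in the article

counts = [
    lstm_params(n_features, 128),  # lstm_3:  72192
    batchnorm_params(128),         # 512
    lstm_params(128, 128),         # lstm_4:  131584
    batchnorm_params(128),         # 512
    lstm_params(128, 128),         # lstm_5:  131584
    batchnorm_params(128),         # 512
    dense_params(128, 32),         # dense_2: 4128
    dense_params(32, 2),           # dense_3: 66
]
total = sum(counts)                # 341090
non_trainable = 3 * 2 * 128        # 768: moving statistics of three batch-norm layers
trainable = total - non_trainable  # 340322
```

Dropout layers contribute no parameters, which is why they appear with a count of 0 in the summary.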

To avoid over-fitting, dropout layers are added. As an additional measure, regularization could also be added if deemed necessary so that the weights optimized during training do not get too large. However, in this case, it is not necessary.

An adaptive first-order gradient-based optimization method known as Adam is used for optimizing the model. Another well-known optimizer is stochastic gradient descent (SGD). As this is a classification problem, sparse categorical cross-entropy is used as the loss function and accuracy as the performance metric.

All settings specified above are hyper-parameters for this model. A hyper-parameter search over a grid of batch sizes, epochs, unit values, and optimization methods could be performed, but due to hardware limitations, only these initial hyper-parameters are used.
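An architecture consistent with the model summary can be sketched in Keras as follows. The layer sizes and the lookback of 60 come from the summary; the 12 input features are inferred from its parameter counts, while the dropout rate and the ReLU activation of the hidden dense layer are assumptions, since the article does not state them.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(lookback=60, n_features=12, dropout=0.2):
    """Three stacked LSTM layers followed by a small dense classifier head."""
    model = keras.Sequential([
        keras.Input(shape=(lookback, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(dropout),                 # rate is an assumption
        layers.BatchNormalization(),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(dropout),
        layers.BatchNormalization(),
        layers.LSTM(128),                        # last LSTM returns only the final state
        layers.Dropout(dropout),
        layers.BatchNormalization(),
        layers.Dense(32, activation="relu"),     # activation is an assumption
        layers.Dropout(dropout),
        layers.Dense(2, activation="softmax"),   # probabilities for DOWN and UP
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels 0 and 1
        metrics=["accuracy"],
    )
    return model

model = build_model()
```

Training would then be a call such as `model.fit(X_train, y_train, batch_size=64, epochs=40, validation_data=(X_val, y_val))`, where the array names are placeholders for the sequenced and balanced data.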

Analysing model performance: The history of the loss function being optimized is shown below. From the graph, we can see that the losses on the training and validation data are both being minimized and remain close to each other. This shows that the model is neither over-fit nor under-fit. However, one improvement that could be made is adjusting the learning rate so that the jumps in the loss are smaller.

The accuracy graph below also shows the improvement of the LSTM model based on both the training and validation set.

The accuracy and the classification report on the test set, which consists of 2018 data that the model has never seen during training, were also generated, as shown below.

Test loss: 0.6752472768227259
Test accuracy: 0.5605208333333334

Classification report:
             precision    recall  f1-score   support

        0.0       0.72      0.50      0.59      6038
        1.0       0.44      0.66      0.53      3562

avg / total       0.61      0.56      0.57      9600

The accuracy score on the test set is in line with research [4], whereby the accuracy on the test set is lower than that on the training and validation sets. The prediction of upward movement has lower precision than that of downward movement, which means that a larger share of the upward predictions are false positives. Precision and recall often trade off against each other, so the f1-score, which combines the two, is the better single measure; here it is considered acceptable for developing a trading strategy.
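The f1-scores in the report follow directly from the precision and recall columns, since the f1-score is their harmonic mean. The short check below uses the rounded values from the classification report.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

f1_down = f1_score(0.72, 0.50)  # class 0.0, about 0.59
f1_up = f1_score(0.44, 0.66)    # class 1.0, about 0.53
```

Because the harmonic mean is pulled towards the smaller of the two inputs, a class cannot score well on f1 by excelling at only one of precision or recall.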

Trading strategy: A simple trading strategy is defined here. For each of the 150 tickers used, a position of AUD 1 is opened based on the movement predicted by the LSTM network, and the position is closed on the next day. This means that every day, an investment of AUD 150 is made and the positions are closed the next day.

Note that, as the LSTM network requires 60 days of past data, the trading is simulated from 2018-03-27. The cumulative profit and loss (PnL) is then plotted.

Following the simple strategy, based on 64 days of trading and AUD 150 investment per day, the total profit made is about AUD 25. On average, this is a 0.26% daily return on investment (ROI). The figure below shows the distribution of ROI values based on the 64-day back-tested data. Based on the histogram, with 80% confidence, the LSTM model will provide a positive daily ROI.
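The back-test bookkeeping can be sketched as follows. The returns and predictions here are random placeholders, so the resulting PnL is meaningless; the sketch also assumes that a predicted upward move means a long position and a predicted downward move means a short position, which the article does not state explicitly. The last lines check the reported average daily ROI from the article's own figures.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_tickers = 64, 150

# Placeholder inputs: synthetic next-day returns and random predictions.
next_ret = rng.normal(0.0, 0.01, size=(n_days, n_tickers))
pred_up = rng.integers(0, 2, size=(n_days, n_tickers))

stake = 1.0                                    # AUD 1 per ticker per day
direction = np.where(pred_up == 1, 1.0, -1.0)  # assumption: long if UP, short if DOWN

daily_pnl = (stake * direction * next_ret).sum(axis=1)
cum_pnl = daily_pnl.cumsum()                   # cumulative PnL curve
daily_roi = daily_pnl / (stake * n_tickers)    # return on the AUD 150 invested each day
share_positive = (daily_roi > 0).mean()        # fraction of days with a positive ROI

# Reported figures: about AUD 25 profit over 64 days at AUD 150 per day.
reported_avg_roi = 25 / (64 * 150)             # about 0.0026, i.e. 0.26% per day
```

The total of 64 x 150 = 9600 positions matches the support of 9600 samples in the classification report.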

Considering the confidence interval and the linear trend of the cumulative PnL, this can be considered to be a successful strategy.

Conclusion

Historical data for all companies on the Australian Securities Exchange (ASX) was given for the task of predicting the next-day movement of the securities and subsequently coming up with a trading strategy, using just 883 days of end-of-day information for 2773 stocks.

The data was first extracted, analysed, and cleaned, and features were engineered. Following that, a long short-term memory (LSTM) network model was built for securities with good predictability.

The results show that the LSTM model provides reasonable accuracy for a successful trading strategy on the stock prices of 150 Australian companies that have good predictability. Furthermore, the results show that, with 80% confidence, the LSTM model provides a positive daily return on investment based on the testing data.

Future Improvements

Based on this attempt, several changes could further improve the prediction or trading performance. They are:

  1. More refined categorisation. The label for the price movement can be separated into three categories. For example, in the current version, samples with a future return of 0.00 are categorised as 0, which is a downward movement. Instead, the returns could be classified into -1, 0, and 1, and a trading strategy built on these categories.
  2. Attempt to tackle hypothesis 2. The second hypothesis is that the composite indexes are dependent on their constituent securities. Therefore, these securities can be used to predict the movement of the composite indexes.
  3. Using probabilities from the LSTM model. The softmax activation function of the LSTM model actually outputs a probability for each category. Further investigation is required to incorporate these probabilities into the trading strategy.
  4. Using a convolutional neural network (CNN). Research [6–7] and [4], respectively, show that using a CNN and a hybrid CNN-LSTM method can improve the prediction accuracy. This could be an interesting improvement.
  5. Hyperparameter search. With more computing power available, a hyperparameter search can be deployed to find the best parameters for the LSTM model.
  6. News data. Other types of data, such as news related to the securities, can be added as features, as shown by studies [2, 8]. For example, [2] shows that adding news information (e.g., sentiment) could boost the prediction accuracy by 10% (~58% to ~70%).

References

[1] Borovkova S, Tsiamas I. An ensemble of LSTM neural networks for high-frequency stock market classification. Journal of Forecasting. 2019;1–20. https://doi.org/10.1002/for.2585

[2] Zhai Y, Hsu A, Halgamuge SK. Combining News and Technical Indicators in Daily Stock Price Trends Prediction. Lecture Notes in Computer Science. 2007; https://link.springer.com/chapter/10.1007/978-3-540-72395-0_132

[3] Hegazy O, Soliman OS, Salam MA. A Machine Learning Model for Stock Market Prediction. International Journal of Computer Science and Telecommunications. 2013; 17–23. https://www.ijcst.org/Volume4/Issue12/p4_4_12.pdf

[4] Vargas MR, dos-Anjos CEM, Bichara GLG, Evsukoff AG. Deep Learning for Stock Market Prediction Using Technical Indicators and Financial News Articles. 2018; https://doi.org/10.1109/IJCNN.2018.8489208

[5] Padial DL. Technical Analysis Library in Python. 2019; https://github.com/bukosabino/ta

[6] Selvin S, Vinayakumar R, Gopalakrishnan EA, Menon VK, Soman KP. Stock price prediction using LSTM, RNN and CNN-sliding window model. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). 2017; https://doi.org/10.1109/ICACCI.2017.8126078

[7] Chen S, He H. Stock Prediction Using Convolutional Neural Network. IOP Conference Series: Materials Science and Engineering. 2018; https://doi.org/10.1088/1757-899X/435/1/012026

[8] Weng B, Lu L, Wang X, Megahed FM, Martinez W. Predicting Short-Term Stock Prices Using Ensemble Methods and Online Data Sources. Expert Systems with Applications. 2018; https://doi.org/10.1016/j.eswa.2018.06.016

