The beginning of a deep learning trading bot — Part 1: 95% accuracy is not enough

Follow me on my research journey where I develop a deep learning-based trading system.



Designed by vectorpouch / Freepik

Ok, my friends, the objective is to develop an elaborate trading system that is capable of consistently outperforming the market. In the end, we will have a production-ready application, I’ll invest around $20,000 of my own savings, and I’ll share the results with you. Hence, in this series of articles, I’ll not just be sharing proofs of concept but also what it actually takes to develop AI systems that can handle the uncertainty of the real market.

Most AI and Deep Learning sources have a tendency to only present final research results, which can be frustrating when trying to comprehend and reproduce the provided solutions. Instead, I want to make this series as educational as possible and thus will be sharing my train of thought and all the experiments that go into the final solutions.

Before we can start, we have to remind ourselves that forecasting the price movements of a particular stock within a market is a highly complex task. Stock movements are driven by millions of impressions and pre-conditions that each market participant is exposed to, so we need to capture as many of these impressions and pre-conditions as possible. In addition, we have to make a couple of assumptions about how the market operates.

Assumptions

  1. The market is not fully perfect, meaning that information is not immediately available to all market participants but takes time to spread.
  2. Historical market events and stock movements influence future stock movements.
  3. The market mostly follows participants’ rational behaviour.

And, please, do read the disclaimer at the bottom.

Upcoming structure

As we cannot cover the development of an entire trading bot within one article, I envision the structure of this series as follows:

  1. In this article, I’m sharing my first experiments using stock prices/returns and volumes to make predictions of future stock movements.
  2. In the next article, I’ll share different experiments utilizing financial and market news to complement the historical stock prices to make more robust predictions of future stock prices and returns.
  3. Combining all experiment results and development of a production-grade model that incorporates stock prices, volumes, news and other data points to forecast stock returns.
  4. Building a deep reinforcement bot for trade executions. We will train a bot that learns when to sell and buy different stocks based on historical prices and our stock movement predictions.
  5. Hosting and deploying the trading bot on a cloud service.
  6. Hooking up the trading bot to a Paper Trading Account, as a final rehearsal.
  7. Investing $20,000 and letting the bot interact freely with the market.

Start experimenting — Finding the right data

Before training the production-grade models, we first have to find out how much explanatory power stock prices and financial news have when forecasting stock returns.

To get a first impression of how well stock prices and news indicate future stock price changes, we initially train multiple models on a smaller dataset. The dataset we will use to start testing our assumptions is the historical price and volume data of the IBM stock.

IBM has a fairly long price history on Yahoo; prices reach back as far as 1962. The easiest way to get the historical IBM prices is to simply download the dataset from Yahoo’s IBM page. For each trading day, Yahoo provides Open, High, Low, Close prices and the Volume (OHLCV). Once downloaded and loaded into a notebook, the IBM OHLCV data looks as follows.

IBM’s price and volume data
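
For reference, here is a minimal sketch of loading the downloaded data with pandas; the file name IBM.csv is an assumption, as Yahoo’s export can be saved under any name.

```python
# Minimal sketch: load the OHLCV CSV exported from Yahoo's IBM page.
# The file name "IBM.csv" is an assumption; adjust it to wherever you saved the export.
import pandas as pd

df = pd.read_csv("IBM.csv", parse_dates=["Date"], index_col="Date")
print(df.head())  # columns: Open, High, Low, Close, Adj Close, Volume
```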

To get an idea of how the prices have changed over time, we plot the daily closing prices. The IBM price series starts on January 2nd, 1962, ends on February 3rd, 2020, and covers a price range between $7.5 and $225.


IBM daily close price 1962–2020

Let’s also have a look at the volume data, which we will use as an additional feature alongside our price data points. The volume for each day is the total number of shares that changed hands during that trading day, summed over all trades.


IBM’s daily trading volume 1962–2020

Preprocessing our data — I know it’s boring but necessary

Feeding raw price and volume data into a deep learning model is usually a bad idea. Looking at IBM’s price graph, you can see that the prices from 1962 to 1991 ($7–$48) are on a totally different level than the prices between 2000 and 2020 ($140–$220). In essence, these two price ranges have little to do with each other: the range from 1962–1991 (average price $25) has little explanatory value for the price range of 2000–2020 (average price $130). To bring past price points onto the same level as recent ones, and thus make them more useful for training our neural networks, we have to perform a couple of preprocessing steps. Let’s start the preprocessing; I promise it’s going to be quick.

Firstly, we are going to convert prices and volumes into returns/percentage changes. The easiest way to do this is the pandas function pct_change().
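
As a small sketch (assuming the OHLCV frame df from above), the conversion is a one-liner:

```python
# Convert the close price and volume into daily percentage changes (returns).
returns = df[["Close", "Volume"]].pct_change().dropna()
```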

The advantage of price returns is that they are more stationary than raw price data. A simple intuition for stationarity: stationary = good, because past data is statistically more similar to future data, making forecasts easier.

IBM’s daily stock returns and volume changes

The graph above illustrates nicely that converting stock prices to stock returns removes the trend of increasing prices.

Secondly, a min-max normalization is applied to all price and volume data, scaling it to the range 0–1. Compared with raw price returns and volume changes, normalized data lets a deep learning model train more quickly and stably.
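
A minimal sketch of this step using scikit-learn’s MinMaxScaler; one assumption worth flagging is that, to be strictly leak-free, the scaler should be fitted on the training portion only.

```python
# Scale each column to the [0, 1] range.
# (To avoid leaking future statistics, fit the scaler on the training split only.)
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(returns)  # shape: (n_days, 2)
```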

Thirdly, we split the time series into training, validation and test datasets. In most cases a training/validation split is sufficient; however, for time series data it is crucial that the final evaluation is performed on a test set. The test dataset has not been seen by the model at all, so we avoid any look-ahead or other temporal biases in the evaluation.
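
A sketch of a chronological split; the 80/10/10 ratios are my assumption, as the article does not state the exact proportions.

```python
# Chronological split -- never shuffle time series data.
n = len(scaled)
train = scaled[: int(n * 0.8)]
val = scaled[int(n * 0.8) : int(n * 0.9)]
test = scaled[int(n * 0.9) :]
```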

Having calculated stock returns, normalized the data and split it into three sections, the datasets now have the shape illustrated below.

Training, validation and test splits of the preprocessed IBM data

Training and testing different model architectures

Having prepared the dataset, we can now start doing the fun stuff: training different deep learning models. I’ve experimented with Bidirectional LSTMs, Transformers, and a CNN + Bi-LSTM; the code for all models can be found on GitHub.

Bidirectional-LSTM (Bi-LSTM)

A Bidirectional LSTM is an extension of a traditional LSTM (Long Short-Term Memory) cell that can improve model performance on sequential datasets. A Bidirectional LSTM combines two individual LSTM layers: the first LSTM layer receives the sequential data (e.g. IBM prices) in chronological order, while the second receives the same data in reversed order. After the input has been processed by the bidirectional LSTM layer (two LSTM layers), both outputs are concatenated to produce the final prediction. To make this easier to grasp, I’ve added an illustration and code below.

Illustration of a bidirectional LSTM processing a sequence in both directions

Now let’s have a look at the bidirectional LSTM model in code. Instead of a single Bi-LSTM layer, this model is constructed with three Bi-LSTM layers to increase its capacity.
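
The original code snippet did not survive the page extraction, so here is a hedged Keras sketch of a three-layer Bi-LSTM; the layer sizes, sequence length and feature count are my assumptions, not the original values.

```python
# Sketch of the stacked Bi-LSTM described above; sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, N_FEATURES = 30, 2  # e.g. 30 days of (return, volume change) pairs

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1),  # next-day return prediction
])
```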

Now let’s start the training process.
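
Again as a sketch: the windowing helper, optimizer and batch size are assumptions, while the 200 epochs and the MAPE metric come from the article.

```python
import numpy as np

def make_windows(data, seq_len=SEQ_LEN):
    # Slide a window over the series; the target is the next day's return
    # (column 0 of the scaled data).
    X = np.array([data[i : i + seq_len] for i in range(len(data) - seq_len)])
    y = data[seq_len:, 0]
    return X, y

X_train, y_train = make_windows(train)
X_val, y_val = make_windows(val)

model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.MeanAbsolutePercentageError()])
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, batch_size=64)
```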

Evaluating the model

Bidirectional LSTM results

After training for 200 epochs, we obtained the following results. On the validation set, we have a Mean Absolute Percentage Error (MAPE) of 3.5828, which can be interpreted, for the sake of simplicity, as an accuracy of 96.42%. The test dataset has a MAPE of 4.2656, equating to an accuracy of 95.73%. An accuracy greater than 95% seems amazing at first, but the next graph tells a different story.
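
For completeness, a sketch of how such numbers are read off the held-out test set (assuming the model and windowing helper from above; the “100 minus MAPE” accuracy reading is the article’s simplification, not a standard metric).

```python
# Evaluate on the unseen test windows.
X_test, y_test = make_windows(test)
test_loss, test_mape = model.evaluate(X_test, y_test)
print(f"Test MAPE: {test_mape:.4f} (~{100 - test_mape:.2f}% 'accuracy')")
```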

The blue line represents the daily price changes of the IBM stock (daily stock returns) and the yellow line corresponds to our model’s predictions. Despite an average accuracy of over 95%, the model merely managed to find the centre of the stock return distribution. The yellow prediction line does not deviate at all from the centre of the stock returns around 0.6.


Bi-LSTM — IBM daily stock returns (blue) and next day stock return predictions (yellow)

The interpretation of the flat prediction line is that our model is able to identify the general trend of the IBM stock. For now, our experiments show that only the trend, and not the outliers, can be derived from the price data of a single stock alone.

CNN + Bidirectional LSTM

Now we will perform the exact same steps as with the vanilla Bi-LSTM model, but instead of using a vanilla Bi-LSTM we’ll combine the Bi-LSTM with a convolutional neural network (CNN).

CNN + Bi-LSTM model architecture

Usually, CNNs are used for image classification, where each convolutional layer extracts different features from the image. However, in recent years it has been shown that CNNs also provide value when analyzing time series and sequential data (e.g. sound and text). Convolutional layers are good at detecting patterns between data points that are spatially close to each other, which should be the case for day-by-day price and volume data.

The architecture of the convolutional layers has been inspired by Google’s Inception blocks. I changed the 2D convolutions of the Inception model to 1D convolutions, making the layers compatible with our time series.

Ok, let’s construct the CNN + Bi-LSTM model.
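
Since the original snippet was lost in extraction, here is a hedged sketch of a 1D Inception-style block feeding a Bi-LSTM; the kernel sizes and filter counts are assumptions, and the exact architecture is in the GitHub repo.

```python
def inception_1d(x, filters=32):
    # Parallel 1D convolutions with different kernel sizes, concatenated,
    # mirroring the Inception idea in one dimension.
    branches = [
        layers.Conv1D(filters, k, padding="same", activation="relu")(x)
        for k in (1, 3, 5)
    ]
    pooled = layers.MaxPooling1D(3, strides=1, padding="same")(x)
    branches.append(layers.Conv1D(filters, 1, padding="same", activation="relu")(pooled))
    return layers.Concatenate()(branches)

inputs = layers.Input(shape=(SEQ_LEN, N_FEATURES))
x = inception_1d(inputs)
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(1)(x)
cnn_bilstm = models.Model(inputs, outputs)
```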

And finally, the training of the CNN + Bi-LSTM model.
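
As before, the original gist is missing, so this sketch mirrors the earlier training setup (optimizer and batch size assumed; 200 epochs per the article).

```python
cnn_bilstm.compile(optimizer="adam", loss="mse",
                   metrics=[tf.keras.metrics.MeanAbsolutePercentageError()])
cnn_bilstm.fit(X_train, y_train,
               validation_data=(X_val, y_val),
               epochs=200, batch_size=64)
```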

Evaluating the model

CNN + Bi-LSTM results

After also training this model for 200 epochs, the results look as follows. The MAPE of the validation set is 3.8454, only slightly higher than the 3.5828 of the previous model. The test dataset has a MAPE of 4.5289, compared with 4.2656 for the Bi-LSTM model. On both the validation and the test set, the CNN + Bi-LSTM network performs worse than the vanilla Bi-LSTM. Although the results are worse, I’m satisfied to see that this time we don’t just have a straight prediction line, so we can exclude any severe model or data-structure flaws.

Again, the model has learned to predict only the trend of price movements; forecasting outliers is still not possible with the IBM price data alone.


CNN+Bi-LSTM — IBM daily stock returns (blue) and next day stock return predictions (yellow)

If you are interested in the results of all the different model architectures and the full code of my experiments, have a look at the GitHub repository.

Next steps — final thoughts

In conclusion, our models are able to identify the general trend of price changes, but outliers cannot be predicted. The Mean Absolute Percentage Error (MAPE) is always below 5, equating to an accuracy interpretation of over 95%. For first experiments, these results are already promising. Nevertheless, to address our models’ shortcomings and master the outlier forecast, we have a couple of options.

First, we can add additional data sources that help with the prediction of outliers, such as financial and market news.

Second, we can train larger models with more price data from different stocks. For this, I’ve curated a list of 7200 publicly traded stocks which we’ll use in a future article to increase model performance.

There are many more details to explore, and I’m sure there are many open questions and unanswered parts. So please share any comments and suggestions; I’d be happy to incorporate them as the project progresses.

In the next post, I’m going to show how financial news can be used in combination with price data to improve the price return predictions of our networks.

Thank you very much for reading to the end.
Jan

Disclaimer

None of the content presented in this article constitutes a recommendation that any particular security, portfolio of securities, transaction or investment strategy is suitable for any specific person. Futures, stocks, and options trading involves substantial risk of loss and is not suitable for every investor. The valuation of futures, stocks and options may fluctuate, and, as a result, clients may lose more than their original investment.

