

Real-Time Anomaly Detection in Time Series Data Streams
source link: https://sj14.gitlab.io/post/2018/02-21-anomaly-detection/


Description
The research goal of my master's thesis (written in cooperation with trivago) was to find real-time-capable solutions for automatically detecting anomalies in time series data streams, which is especially useful for monitoring servers. I evaluated several algorithms and finally put together my own algorithm, which meets almost all of the previously gathered requirements.
In the figures, the red area marks an anomalous region. A detection outside this area is a false positive (to be minimized as far as possible); a detection inside the red area is a true positive (an anomaly we wanted to catch). The darker blue line shows the measured values, and the lighter blue lines show the confidence interval (the upper and lower bounds of the allowed deviation of the measured values).
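The true-positive and false-positive rule described above can be sketched in a few lines. This is only an illustration, not code from the thesis: the function name `classify_detections` and the representation of a red area as an inclusive `(start, end)` index pair are assumptions made for the example.

```python
from typing import List, Tuple


def classify_detections(
    detections: List[int],
    anomalous_regions: List[Tuple[int, int]],
) -> Tuple[List[int], List[int]]:
    """Split detection indices into true positives and false positives.

    Assumption: each labeled region is an inclusive (start, end) pair of
    time indices, corresponding to one red area in the figures.
    """
    true_positives: List[int] = []
    false_positives: List[int] = []
    for idx in detections:
        # A detection is a true positive if it falls inside any red area.
        if any(start <= idx <= end for start, end in anomalous_regions):
            true_positives.append(idx)
        else:
            false_positives.append(idx)
    return true_positives, false_positives


# Hypothetical example: one labeled region and four detections.
regions = [(100, 120)]
tp, fp = classify_detections([50, 105, 119, 300], regions)
print(tp)  # detections inside the labeled region
print(fp)  # detections outside the labeled region
```

Counting per data point is only one possible convention. Counting each red area once, no matter how many detections fall inside it, would be an equally reasonable choice.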
To compare the different algorithms and calculate an evaluation score for each of them, the datasets were auto-generated, including the anomalous data points and their labels.
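As a rough idea of what an auto-generated, labeled dataset can look like, here is a minimal sketch. Everything in it is an assumption for illustration: the sine-plus-noise signal, the injected level shift, the parameter names, and the F1 score as the evaluation metric. The generator and scoring used in the thesis may differ.

```python
import math
import random
from typing import List, Tuple


def generate_labeled_series(
    n: int = 500,
    anomaly_start: int = 300,
    anomaly_length: int = 20,
    spike: float = 5.0,
    seed: int = 0,
) -> Tuple[List[float], List[bool]]:
    """Return (values, labels); labels[t] is True inside the injected anomaly."""
    rng = random.Random(seed)  # seeded so every algorithm sees the same data
    values: List[float] = []
    labels: List[bool] = []
    for t in range(n):
        seasonal = math.sin(2 * math.pi * t / 50)  # period of 50 samples
        noise = rng.gauss(0.0, 0.1)
        is_anomaly = anomaly_start <= t < anomaly_start + anomaly_length
        values.append(seasonal + noise + (spike if is_anomaly else 0.0))
        labels.append(is_anomaly)
    return values, labels


def f1_score(predicted: List[bool], labels: List[bool]) -> float:
    """Point-wise F1 score of predicted anomaly flags against the labels."""
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum(l and not p for p, l in zip(predicted, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


values, labels = generate_labeled_series()
# A deliberately naive fixed-threshold "detector", just to exercise the score.
predicted = [v > 2.5 for v in values]
print(f1_score(predicted, labels))
```

Because the labels come out of the generator itself, any detector can be scored against exactly the same ground truth, which is what makes the comparison between algorithms possible.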
Very strong trends in the dataset are still tricky to handle, especially at the beginning of the measurements, because it is difficult to distinguish between a normal and an anomalous change. Welcome to the topic of anomaly detection! ;-)

The good
I took enough time to dive deep into the topic (though it is still a huge topic!) and came up with a good algorithm that is very resource-friendly (no loops over the whole dataset, just incremental updates).
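To show what "just incremental updates" can mean in practice, here is a minimal sketch that keeps a running mean and variance with Welford's online algorithm and flags points outside a band of `k` standard deviations. This is a generic stand-in technique, not the algorithm from the thesis; the class name `StreamingDetector` and the parameters `k` and `warmup` are assumptions.

```python
import math


class StreamingDetector:
    """Flag points that leave a mean +/- k * std band, in O(1) per point."""

    def __init__(self, k: float = 3.0, warmup: int = 10) -> None:
        self.k = k
        self.warmup = max(warmup, 2)  # at least 2 points for a sample variance
        self.n = 0       # number of points absorbed into the statistics
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Process one new value and return True if it looks anomalous."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = abs(x - self.mean) > self.k * std
        if not is_anomaly:
            # Welford update: no pass over previously seen data is needed.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return is_anomaly


detector = StreamingDetector()
for value in [10.0, 10.2] * 25 + [100.0]:
    if detector.update(value):
        print("anomaly:", value)
```

Skipping the update for flagged points keeps a single outlier from widening the band, at the cost of adapting slowly to a genuine level shift. That trade-off is a design choice of this sketch, and it is closely related to the difficulty with strong trends mentioned above.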
The bad
During my studies, I messed up my Python installation, and only the macOS built-in Python 2 worked ¯\_(ツ)_/¯
As the topic was bigger than expected, the chapter about the production use case (e.g. using influxDB and kapacitor) was neglected.
Technologies used
python, tensorflow, keras, docker, influxDB, kapacitor
