Abstract

Anomalies in time series, also called “discords,” are abnormal subsequences. The occurrence of anomalies in a time series may indicate that a fault or disease will occur soon. Therefore, the development of novel computational approaches for anomaly detection (discord search) in time series is of great significance for state monitoring and early warning in real-time systems. Previous studies show that many algorithms have been successfully developed and used for anomaly classification, e.g., in health monitoring, traffic detection, and intrusion detection. However, the detection of abnormal subsequences in time series has not been well studied. In this paper, we propose a long short-term memory- (LSTM-) based anomaly detection method (LSTMAD) for discord search in univariate time series data. LSTMAD learns the structural features of normal (nonanomalous) training data and then performs anomaly detection via a statistical strategy based on the prediction error for observed data. In our experimental evaluation using public ECG datasets and real-world datasets, LSTMAD detects anomalies more accurately than the existing approaches it was compared with.

1. Introduction

Time series analysis is an active research topic, which mainly includes two aspects: (1) identifying the nature of the phenomenon represented by the time series of observations [1] and (2) predicting future values of the time series variable based on historic data [2,3]. It is widely used in many real-world areas, e.g., signal processing, pattern recognition [4], mathematical finance [5], weather forecasting [6], and control engineering [7]. In particular, anomaly detection in time series is an important direction, which promotes the development of outlier recognition techniques for real-time big data [8].

In the past years, many computational approaches were developed and used for anomaly detection in many applications, e.g., traffic detection or network intrusion detection. They can be categorized into three classes: (1) statistical modeling [9–14], (2) data mining-based techniques [15–21], and (3) machine learning-based approaches [22–29]. Many previous studies revealed that the above models have been successfully used for anomaly classification [10,17,18]; however, computational frameworks focusing on abnormal subsequence detection in time series are still not well developed.

Recent studies show that some time series analysis approaches [30,31] work well, particularly on some well-known public time series such as EEG and ECG datasets. However, they face challenges in generalization, robustness, and efficiency [32], and they often fail when applied to real-world problems [31]. Because real-world time series are usually complicated by missing values, high noise, and normalization issues, new computational strategies are urgently needed to address these problems.

As a branch of machine learning, deep learning (DL) methods show more promise than traditional machine learning, including higher accuracy, greater flexibility, stronger generalization, and less dependency on domain knowledge [33–35]. Thus, DL provides a new way to improve anomaly classification of time series [36]. In contrast to the many popular computational tools for anomaly classification, DL-based discord search in time series has not been well studied. As a DL model, long short-term memory (LSTM) provides great power in time series forecasting [37,38], which raises the question of whether LSTM can be used for discord search. In this study, we propose an LSTM-based anomaly detection approach (LSTMAD) for identifying abnormal subsequences in univariate time series. LSTMAD learns the temporal structure of the normal signal from historic values so that it can identify the discords in the testing series. In the simulation experiments, we applied our LSTMAD model to various time series datasets and found that it offers high accuracy. Moreover, LSTMAD outperformed three other typical discord search algorithms. In summary, LSTMAD provides a new pipeline to accurately capture abnormal subsequences in real-time systems.

The rest of the paper is structured as follows: in Sections 2 and 3, the related work and the proposed computational approach LSTMAD are presented. In Section 4, the datasets for validation, the experiment design, and the steps and parameter settings of the method are described in detail. In Section 5, the simulation results are shown and discussed, while in Section 6 conclusions are drawn and suggestions for future work are presented.

2. Related Work

According to previous works reported in the literature, the computational approaches for anomaly detection fall into three categories: statistical approaches, data mining-based techniques, and machine learning. We summarize these methods as follows.

2.1. Statistical Approaches

Yamanishi et al. proposed a Gaussian mixture model that scores each data point and identifies points with high scores as outliers [9]. Zhang and coworkers proposed a mathematical criterion to distinguish between normal and abnormal data using statistical algorithms [10]. Kosek et al. developed a regression model-based method for anomaly detection [11]. Goldstein et al. proposed the histogram-based outlier score (HBOS) algorithm, which assumes independence of the features, making it much faster than multivariate anomaly detection approaches; they point out that a histogram-based approach is suitable when outlier detection results must be available immediately [12]. The limitation of these approaches is that anomaly detection depends on the assumption that the data are generated from a particular statistical distribution [13].

2.2. Data Mining-Based Techniques

Data mining techniques, including clustering and classification, can make anomaly detection more effective. Researchers have mostly used K-means clustering to group similar data points [15, 16], so that data points located outside these clusters are considered anomalies. These approaches operate in an unsupervised mode; however, they may not offer accurate insights at the required level of detail on smaller datasets. Classification-based anomaly detection has also been widely studied for real-world applications, e.g., traffic, intrusion, or network detection [17–20]. The goal of classification is to learn from labeled training data in order to identify the classes of new or unknown instances [39]. However, good performance requires that the training set have well-defined labels.

2.3. Machine Learning

In recent years, machine learning techniques have been widely used for anomaly detection, including fuzzy logic [22–24], Bayesian approaches [25,26], genetic algorithms [23,27], and neural networks [28,29]. Nakano et al. proposed a fuzzy logic-based method for network anomaly detection [22]. Hamamoto and coworkers developed a hybrid approach for network anomaly detection by using a genetic algorithm and fuzzy logic [23]. Mascaro et al. explored the use of Bayesian networks for analyzing vessel behavior and detecting anomalies [26]. By combining dynamic and static networks, they showed that their approach improved the detection accuracy on vessel tracks. With the rapid progress of artificial intelligence, various neural network models, e.g., the recurrent neural network (RNN) [29] and the back propagation neural network (BPNN) [28], were developed to monitor the anomalies of a complicated system. These approaches work well in some special application areas; however, generalization is still a big challenge.

Compared with traditional machine learning methods, deep learning (DL) has stronger learning ability and can achieve higher accuracy [40]. The most frequently used deep learning methods are the generative adversarial network (GAN) [41], the autoencoder [42], the convolutional neural network (CNN) [43], and Long Short-Term Memory (LSTM) [44]. Previous studies show that almost all of the above models have been applied to anomaly classification [45–47]; however, work focusing on DL-based abnormal subsequence detection in time series is rarely reported.

Despite this, there have still been many attempts to perform anomaly detection in time series using various statistical or SVM-based methods, including MFAD [31] and LRRDS [32]. However, few attempts have been made to accurately predict abnormal subsequences in time series using LSTM. Therefore, a proper LSTM-based deep learning method for anomaly detection is required.

3. Method

The flowchart of the proposed LSTMAD approach is shown in Figure 1(a). The whole framework consists of four modules: noise reduction, normalization, the LSTM model, and anomaly detection. The details of each module are described in the following sections.

3.1. Noise Reduction

Noise may be introduced during data collection and can affect the accuracy of the computational results. It is therefore necessary to reduce the noise in the raw sequence before constructing the anomaly detection model. In this study, we removed the noise from the time series by using the S-G filter, which was proposed by Savitzky and Golay in 1964 [48]. It can be applied to a set of digital data points to smooth the data, increasing their precision without greatly distorting their original properties. The S-G algorithm is capable of not only removing the noise from raw data but also preserving the shape and width of the original signal [49, 50].
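The smoothing step can be illustrated with a minimal pure-Python sketch. It uses the tabulated S-G smoothing weights for an 11-point window and a cubic polynomial; the function name `sg_smooth` and the edge handling (the first and last five points are left unchanged) are assumptions made for illustration and are not taken from the LSTMAD implementation.

```python
# Tabulated Savitzky-Golay smoothing weights for an 11-point window and a
# quadratic/cubic polynomial; the weights sum to the normalization constant 429.
SG11_WEIGHTS = [-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36]
SG11_NORM = 429


def sg_smooth(series):
    """Smooth `series` with an 11-point cubic Savitzky-Golay filter.

    Assumption: the first and last 5 points are copied unchanged, because
    the edge handling is not described in the text.
    """
    half = len(SG11_WEIGHTS) // 2
    values = [float(v) for v in series]
    smoothed = list(values)
    for centre in range(half, len(values) - half):
        window = values[centre - half:centre + half + 1]
        weighted = sum(w * v for w, v in zip(SG11_WEIGHTS, window))
        smoothed[centre] = weighted / SG11_NORM
    return smoothed
```

Because the weights come from a local cubic least-squares fit, a signal that is itself a cubic polynomial passes through unchanged, while an isolated spike is strongly attenuated. This is the sense in which the filter removes noise while preserving the shape of the signal.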

3.2. Normalization

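Given a univariate time series $X = \{x_1, x_2, \ldots, x_n\}$ with length $n$, the normalization was implemented as follows:

$$x_i' = \frac{x_i - \mu}{\sigma}, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $\mu$ and $\sigma$ are the mean value and standard deviation of the original series $X$. The vector $X' = \{x_1', x_2', \ldots, x_n'\}$ is the normalized sequence. After normalization, the series has zero mean and unit standard deviation.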
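As a small illustration of this normalization, the sketch below standardizes a list of values with the Python standard library. The function name `z_normalize` and the use of the population standard deviation are assumptions; the text does not say whether the population or the sample estimate was used.

```python
from statistics import fmean, pstdev


def z_normalize(series):
    """Return the series shifted to zero mean and scaled to unit deviation.

    Assumption: the population standard deviation is used.
    """
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(x - mu) / sigma for x in series]
```

For example, the series 2, 4, 4, 4, 5, 5, 7, 9 has mean 5 and population standard deviation 2, so its first element maps to -1.5 and its last element to 2.0.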

3.3. LSTM Model

The Long Short-Term Memory (LSTM) model was first developed by Hochreiter and Schmidhuber in 1997 [51]. Unlike a plain RNN, which mainly handles short-term sequential data, LSTM can represent the long-term dependencies in time series data [52]. A common LSTM unit is composed of a memory cell, an input gate, an output gate, and a forget gate (Figure 2). The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of data into and out of the cell. The state transition in the memory cell is implemented via formulas (2)–(6). The input vector $x_t$ at time point $t$ and the hidden state vector $h_{t-1}$ at time point $t-1$ are fed into the LSTM unit, and the hidden state $h_t$ is finally obtained. Equation (2) decides what information is going to be thrown away from the cell state via the forget gate $f_t$. The input gate $i_t$ decides which values are to be updated, and (3) and (4) are used to update the old cell state $C_{t-1}$ into the new cell state $C_t$. Equation (5) indicates that the output gate $o_t$ decides what parts of the cell state are going to be produced as output. Finally, the cell state goes through a tanh layer and is multiplied by $o_t$, which gives the hidden value $h_t$ as the output of the LSTM unit (in (6)).
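For reference, the standard LSTM formulation that matches this description can be written as below. The weight matrices $W_f, W_i, W_C, W_o$, the bias vectors $b_f, b_i, b_C, b_o$, the candidate state $\tilde{C}_t$, and the assignment of equation numbers are assumptions based on the common textbook form rather than a reproduction of the original typeset formulas; $\sigma$ here denotes the logistic sigmoid and $\odot$ the element-wise product.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && (2)\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && (3)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && (4)\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && (5)\\
h_t &= o_t \odot \tanh\left(C_t\right) && (6)
\end{aligned}
$$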

According to Figure 1(b), the LSTM model in our LSTMAD framework includes five layers. The input layer has $L-1$ nodes, where $L$ is the length of the sliding window, indicating that a subseries with $L-1$ elements is used as input to a fully connected hidden layer. There are three hidden layers to process the information from the input layer. Each blue node shown in the hidden layers is an LSTM unit. The output layer has only one node $y$, which can be obtained from

$$y = \sum_{i} w_i h_i, \tag{7}$$

where $w_i$ is the weight between the $i$-th hidden node in layer 3 (the last hidden layer) and the output node $y$, and $h_i$ is the $i$-th hidden node connecting to node $y$.

For a certain subseries $S_j = \{x_j, x_{j+1}, \ldots, x_{j+L-1}\}$, where $j = 1, \ldots, n-L+1$ and $n$ is the length of the series, the first $L-1$ elements are sent to the input layer of the LSTM model simultaneously, and the last element $x_{j+L-1}$ is considered as the expected result to be optimized. This can also be represented as $x_{j+L-1} = f(x_j, \ldots, x_{j+L-2})$, where the function $f$ is a trained LSTM model. Before training the LSTM model, the original time series is segmented into multiple subseries via a sliding window with length L (Figure 3). All these segmented subsequences are randomly ordered. To obtain enough training samples, each subseries is replicated into multiple copies. Finally, each row in the augmented matrix is input into the LSTM network for model training.
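The segmentation into training samples can be sketched as follows. The function name `make_samples`, the stride of one time step, and the fixed random seed are assumptions for illustration; the replication of subseries mentioned in the text is not reproduced because the number of copies is not stated.

```python
import random


def make_samples(series, window, shuffle=True, seed=0):
    """Cut `series` into (inputs, target) pairs with a sliding window.

    Each window of length `window` yields its first `window - 1` values as
    inputs and its last value as the regression target.
    Assumption: the window advances by one time step.
    """
    samples = []
    for start in range(len(series) - window + 1):
        segment = list(series[start:start + window])
        samples.append((segment[:-1], segment[-1]))
    if shuffle:
        # Random ordering of the training samples, as described in the text.
        random.Random(seed).shuffle(samples)
    return samples
```

With `window=3`, the series 0, 1, 2, 3, 4, 5 produces four samples, the first of which has inputs `[0, 1]` and target `2`.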

According to the above description, our LSTM model can be considered as a supervised regression machine for predicting upcoming values based on historic data. Based on this rationale, the LSTM module is first trained with samples converted from a series without anomalies; hence, the model prediction reflects the tendency of the normal signal.

3.4. Discord Search

As described above, a normal subsequence is first extracted for LSTM training. A testing subsequence that includes discords (anomalies) is then selected. Our rationale is that a trained LSTM model has “memorized” the characteristics of a dynamic system in its normal state; hence, it can predict the future state of the system as long as the system keeps working normally. Given a testing sequence that contains abnormal signals, the discord values can be identified by comparing the values predicted by the LSTM with the observed values. The calculation for discord search includes the following steps.

3.4.1. Segmentation of the Testing Sequence

Similar to the training sequence, the testing series also needs to be converted into multiple segments via a sliding window (Figure 3). Here, we set the length of the sliding window as L. In our experiments, we set the testing sequence equal to the whole series X so that the fitting error and the prediction error are presented simultaneously.

3.4.2. Prediction of LSTM Model

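For each segmented subsequence of length $L$, the vector of its first $L-1$ elements is used as input of the trained LSTM model. We thus obtain the model outcome $\hat{y}_j$, which is the theoretical value of the observation $y_j$, the last element of the $j$-th subsequence. For $J$ testing samples (subsequences), we obtain the prediction error vector $\mathrm{PEV} = (e_1, e_2, \ldots, e_J)$, where $e_j$ is the error between $\hat{y}_j$ and $y_j$.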
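The construction of the prediction error vector can be sketched as follows. The absolute difference is assumed as the error measure, since the exact definition is not recoverable from the text, and `linear_extrapolation` is a hypothetical stand-in for the trained LSTM so that the example runs without a deep learning library.

```python
def prediction_errors(series, window, predict):
    """Return one prediction error per sliding window of the test series.

    `predict` maps the first `window - 1` values of a segment to a forecast
    of the last value. Assumption: the error is the absolute difference.
    """
    errors = []
    for start in range(len(series) - window + 1):
        segment = list(series[start:start + window])
        forecast = predict(segment[:-1])
        errors.append(abs(forecast - segment[-1]))
    return errors


def linear_extrapolation(history):
    """Hypothetical placeholder for the trained model: repeat the last step."""
    return 2 * history[-1] - history[-2]
```

On the series 0, 1, 2, 3, 10, 5, 6 with `window=3`, the errors are 0, 0, 6, 12, 6: the placeholder predictor follows the linear trend until the value 10 breaks it, and the error peaks right after the break.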

3.4.3. Discord Search

The vector PEV reflects the difference between prediction and observation. If an element of PEV is significantly higher than the rest, the corresponding time point can be considered as the peak of a discord. We then use a Gaussian model to fit each candidate point with a significantly higher value, and the abnormal subsequence is finally selected from a region determined by $\mu$ and $\sigma$, the mean value and standard deviation, respectively.
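A simplified version of this search is sketched below. It is not the procedure used in LSTMAD: the threshold of mean plus `k` standard deviations, the default `k = 3`, and the merging of consecutive flagged points into one region are all assumptions, and the Gaussian fit around each peak is omitted because its parameters are not recoverable from the text.

```python
from statistics import fmean, pstdev


def find_discords(errors, k=3.0, offset=0):
    """Group unusually large prediction errors into candidate regions.

    Assumptions: a point is flagged when its error exceeds mean + k * std
    of all errors, and consecutive flagged points form one region.
    `offset` shifts error indices to time indices (e.g. window - 1).
    """
    threshold = fmean(errors) + k * pstdev(errors)
    regions = []
    start = None
    for i, err in enumerate(errors):
        if err > threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start + offset, i - 1 + offset))
            start = None
    if start is not None:
        regions.append((start + offset, len(errors) - 1 + offset))
    return regions
```

Each returned pair is an inclusive (start, end) index range, which can be compared directly with a reference anomaly range.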

4. Simulation Experiments

4.1. Data Collection and Preprocessing

To examine the performance, we applied the LSTMAD approach to six datasets, including four well-known public datasets and two industrial time series from the real world. The details of these datasets are described as follows.

(1) Chf01 [30] and Chf13 [53], ECG (electrocardiogram) related data, are collected from the BIDMC Congestive Heart Failure Database [53,54]. The lengths of the two datasets are 3751 and 3750, respectively. Each of them includes two series. In our experiments, we selected the 1st series from Chf01 and the 2nd series from Chf13 to test our algorithm.

(2) Ltstdb_20221 [30], an ECG dataset, is selected from the Long Term ST Database. Its length is also 3750. We used the 1st series in our experiment.

(3) Xmitdb_x108 [30,55], an ECG dataset with length 5400, is selected from the MIT-BIH Arrhythmia Database. The first sequence was used in our simulation.

(4) SLD1 and SLD2, two sequences related to “soil pressure” in a shield tunneling machine [56], were collected from a real-world shield tunnel construction project. The real-time construction state was recorded every 10 seconds by local sensors. In total, over 400 features were observed during the whole process of construction. In our experiments, we focused on the time series related to “soil pressure” because abnormal pressure is a typical fault in tunneling construction. The lengths of SLD1 and SLD2 are 18,087 and 210,907, respectively.

Before anomaly detection was carried out, the performance of the S-G filter was evaluated by comparing the original and processed signals of each data sample in terms of PSNR (peak signal-to-noise ratio), SNR (signal-to-noise ratio), MSE (mean square error), and PRD (percent root mean square difference) values [57].
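The four quality measures can be computed as in the sketch below. Several variants of these definitions exist in the literature; the ones used here (signal energy of the original series as reference, peak taken as the largest absolute value of the original series) are assumptions and may differ from those in [57].

```python
import math


def filter_quality(original, filtered):
    """Compare a filtered signal with the original one.

    Returns MSE, SNR (dB), PSNR (dB) and PRD (%) under common definitions;
    these are assumptions, not necessarily the definitions used in [57].
    """
    sq_err = sum((o - f) ** 2 for o, f in zip(original, filtered))
    energy = sum(o ** 2 for o in original)
    mse = sq_err / len(original)
    peak = max(abs(o) for o in original)
    identical = sq_err == 0  # avoid division by zero for a perfect match
    return {
        "MSE": mse,
        "SNR": math.inf if identical else 10 * math.log10(energy / sq_err),
        "PSNR": math.inf if identical else 10 * math.log10(peak ** 2 / mse),
        "PRD": 100 * math.sqrt(sq_err / energy),
    }
```

Higher SNR and PSNR and lower MSE and PRD indicate that the filtered signal stays closer to the original.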

4.2. Experiment Design

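First, we applied the proposed LSTMAD approach to the above six datasets to evaluate its performance. Second, we compared LSTMAD with three well-known algorithms: HOT SAX [30], Robust Random Cut Forest (RRCF) [58], and Telemanom [59]. To evaluate the accuracy of anomaly detection, two statistical metrics are used: the Matthews correlation coefficient (MCC) and the global overlapping degree between predicted and observed anomalies. MCC is defined as follows:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}. \tag{8}$$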

As reported in previous studies, MCC produces a more informative and truthful score in evaluating binary classifications, particularly for imbalanced data [60, 61]. In (8), TP, FP, TN, and FN denote the number of normal subsequences correctly detected as normal (true positive), the number of abnormal subsequences that are detected as normal (false positive), the number of abnormal subsequences that are predicted as abnormal (true negative), and the number of normal states that are recognized as abnormal (false negative). The above four variables were counted if a predicted anomaly overlapped with the observed anomaly. In addition, the second metric is the global overlapping degree between predicted and observed anomalies, and its per-anomaly term denotes the overlapping degree of the i-th abnormal subsequence between prediction and observation.
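Both kinds of score can be illustrated with a short sketch. `mcc` follows the standard definition of the Matthews correlation coefficient. `interval_overlap` is only a hypothetical stand-in for the overlapping degree: it computes the ratio of intersection to union of two index ranges, which is an assumption and not necessarily the definition used in this paper.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


def interval_overlap(reference, predicted):
    """Intersection over union of two (start, end) index ranges.

    Assumption: this ratio stands in for the overlapping degree, whose
    exact formula is not recoverable from the text.
    """
    (r0, r1), (p0, p1) = reference, predicted
    inter = max(0, min(r1, p1) - max(r0, p0))
    union = (r1 - r0) + (p1 - p0) - inter
    return inter / union if union > 0 else 0.0
```

As a usage example, a reference range of [2182, 2392] and a predicted range of [2252, 2392] share 140 of the 210 points they jointly cover, giving a ratio of about 0.67 under this assumed definition.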

4.3. Experiment Parameters

All the simulations were performed with Keras 2.2.4 [62] and Python 3.5.4 on an Intel Core i7-7700HQ processor (2.8 GHz) with 8 GB of RAM. For the S-G filter, the size of the sliding window is 11, and the order is 3-4. The LSTM network was constructed with five layers. The input layer includes 49 neurons, and the output layer has only one neuron. The sizes of the three hidden layers are 64, 256, and 100 neurons, respectively. The default parameters were batch size = 500 and dropout = 0.2. The loss function is MSE (mean square error), and the optimizer was set as “rmsprop” [63].
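These settings could be translated into a Keras model along the following lines. This is a sketch under stated assumptions rather than the authors' code: the input is assumed to be shaped as 49 time steps with one feature each, a dropout layer is assumed after every LSTM layer, and the helper names `layer_specs` and `build_model` are hypothetical. Keras is imported inside `build_model` so that the layer plan can be inspected without the library installed.

```python
HIDDEN_UNITS = (64, 256, 100)  # sizes of the three hidden layers


def layer_specs(hidden=HIDDEN_UNITS):
    """Return (units, return_sequences) for each stacked LSTM layer.

    Only the last LSTM layer collapses the sequence to a single vector.
    """
    last = len(hidden) - 1
    return [(units, i < last) for i, units in enumerate(hidden)]


def build_model(window=50, hidden=HIDDEN_UNITS, dropout=0.2):
    """Build a stacked LSTM regressor with one output node.

    Assumptions: input shape (window - 1, 1), i.e. 49 steps of one value,
    and a Dropout layer after each LSTM layer.
    """
    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Dropout

    model = Sequential()
    for i, (units, return_seq) in enumerate(layer_specs(hidden)):
        extra = {"input_shape": (window - 1, 1)} if i == 0 else {}
        model.add(LSTM(units, return_sequences=return_seq, **extra))
        model.add(Dropout(dropout))
    model.add(Dense(1))  # single output node: the predicted next value
    model.compile(loss="mse", optimizer="rmsprop")
    return model
```

Training would then call `model.fit` with `batch_size=500` on inputs of shape (samples, 49, 1); the number of epochs is not stated in the text and is left to the reader.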

5. Results

5.1. Evaluation of Noise Reduction

First, we evaluated the quality of each time series processed by the S-G filter. The performance of the S-G filter on each data sample was compared in terms of PSNR, SNR, MSE, and PRD values. Table 1 shows that the S-G filter works well on the six datasets. Therefore, anomaly detection on the processed datasets can be considered reliable.

5.2. Validation on Univariate Time Series

According to the description in Section 4.2, the proposed LSTMAD approach was tested on six time series datasets shown in the above section. The simulation results were presented as follows. The reference (observed) and predicted anomalies were highlighted with red color.

Figure 4 shows the simulation results of LSTMAD on chfdb_chf01. Figure 4(a) presents a reference anomaly, which is located in the range [2182, 2392]. Figure 4(b) shows an abnormal subsequence from 2252 to 2392 identified by LSTMAD.

Compared with Figure 4(a), we found that the predicted result is very close to the reference.

Similarly, Figure 5 shows the simulation results of LSTMAD on the dataset chfdb_chf13. In Figure 5(a), the normal signal is a periodic sequence that is repeated many times, and there is a discord located in the range [2758, 2967]. The outcomes of LSTMAD revealed that the predicted anomaly occurred in the range from 2758 to 2874 (Figure 5(b)). This indicates that the prediction of LSTMAD fits the observation well.

Different from the above two sequences (Figures 4 and 5), the anomaly in the series ltstdb_20221 is not easily identified because the abnormal subsequence is very similar to the normal signal. In Figure 6(a), the discord is located in the range [583, 783]. After calculation with LSTMAD, we predicted the subsequence located at [583, 857] as a discord (Figure 6(b)).

In addition, we examined the performance of LSTMAD on the last ECG dataset xmitdb_x108 (Figure 7). The reference and predicted anomalies are located at [3995, 4207] and [3899, 4207], respectively. Taken together, these results show that the proposed algorithm works well on four well-known ECG datasets.

Furthermore, we applied the LSTMAD framework to two real datasets, which were generated from a shield tunnel construction project. For SLD1, the log file recorded a fault (“soil pressure continues to decrease”) that occurred in the region from 11,940 to 12,160. The reference discord can also be clearly identified in Figure 8(a).

The prediction of LSTMAD shows that our method is capable of capturing the abnormal subsequence (Figure 8(b)). However, the predicted discord is located at the region [11,255, 12,219], which starts 685 points earlier than the reference, so some bias exists.

Finally, we tested the performance of LSTMAD on the time series SLD2. There appear to be two peaks in the reference sequence (Figure 9(a)); however, only one fault was reported in the log file. The reference anomaly, from 173,982 to 174,002, is shown in red in Figure 9(a). Our algorithm successfully identified the anomaly in the range [173,982, 174,002] (Figure 9(b)). In summary, the developed LSTMAD approach works well not only on public time series but also on real-world sequences.

5.3. Comparison with Other Algorithms

To further examine the effectiveness of the proposed algorithm, we tested three classic anomaly detection methods on all the above datasets: HOT SAX [30], Robust Random Cut Forest (RRCF) [58], and Telemanom [59]. Table 2 shows that LSTMAD outperformed the three other methods for anomaly detection in univariate time series. The values of MCC on the six datasets show that LSTMAD can capture the abnormal subsequences in most of the time series, whereas the performance of RRCF and Telemanom is clearly lower than that of the others. Moreover, the measurements of the overlapping degree on the six datasets indicate that the predicted anomalies obtained from LSTMAD match the references very well. In summary, our approach is more accurate than the existing methods on these datasets.

6. Discussion and Conclusion

In this study, we proposed a novel LSTM-based approach (LSTMAD) for anomaly detection in time series data. LSTMAD was developed by combining an LSTM network with a statistical strategy. Without depending on prior knowledge, our method is capable of learning the context of sequence data from the normal signal and then identifying the abnormal regions based on the prediction error for observed data. To verify the performance, we applied LSTMAD to several time series datasets, including well-known public data and real-world data. The simulation results revealed that LSTMAD can identify the discords in a whole sequence with high accuracy. Moreover, LSTMAD outperformed the other gold-standard approaches on all the testing datasets.

In previous studies, LSTM was widely used for time series classification or forecasting [2,3,17,44]. However, it was rarely reported for discord search in time series. To our knowledge, this is among the first works to build a predictive model from nonanomalous training data and then perform discord search based on the prediction error for observed data. Moreover, the experimental evaluations also indicate that both the performance and generalization of LSTMAD are strong.

Our method is suitable for real-time anomaly prediction, especially when the underlying physical process is less fully understood and characterized. It does not rely on prior knowledge and is not sensitive to the length of sliding window; it thus will be a scalable algorithm for future application.

Limitations exist in the proposed LSTMAD approach. First, the current version is mainly developed for univariate time series, so it cannot directly address multivariate sequences. Second, bias also exists in the selection of public data because anomalies in periodic sequences are often more easily detected [30]. Third, there is not yet enough evidence to mathematically prove that the structure of the current LSTM network is optimal. To refine the LSTMAD approach, four aspects need to be considered in future work: (1) a new component will be included in the current framework to transform multivariate sequences into univariate ones; (2) the rationality of the LSTM network needs further argumentation; (3) we will design a reasonable strategy for parameter search to improve the performance of our model; (4) various gold-standard time sequences need to be tested.

Data Availability

All the data used in this study are available at GitHub: https://github.com/Lostinparadise1981/LSTMAD.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this manuscript.

Authors’ Contributions

Z. J. conceived and designed the algorithms. J. G. performed the simulations and processed and analyzed the data. Z. J. and J. G. wrote the paper and provided ideas to improve the computational approach. J. F. advised on the description of some analyses.

Acknowledgments

This work was supported by the Natural Science Foundation of Zhejiang Province (no. LY20F020003). This work was partially supported by the Startup Award of New Professor at Nanjing Agricultural University (no. 106/804001).