## How To Identify Patterns in Time Series Data: Part I – Discrete Fourier Transform

In many real-world applications,  "Signals" are typically recorded and represented as time dependent functions. For example, we can record the daily trading price in the stock market, or measuring the hourly temperature in Los Angeles. In order to extract meaningful characteristics of the data, many different transforms have developed to decompose the data into simpler pieces.

The question is: How to design a good transform on the signal we want to analyze? It becomes a classical problem in signal representation- We want to define a transform which provides a "sparse" representation of the signal which capture most or all information of a signal. "Sparseness" is one of the reasons for the extensive use of popular transforms, because they discover the structure of the singal and provide a "compact" representation. In this post, we provide an example that how to analyze the web traffic by Discrete Fourier Transform (DFT).

Website traffic

In every hour, we record the total number of user actions in our website and the data is shown in figure 1. This signal seems to be periodic, but the question is: How could we explore the structure? DFT provides a nice tool to represent the discrete time signal into periodic Fourier series. The original signal $x(t_n)$ can be represented by

\label{eq:1}
x(t_n) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k \cdot e^{i 2 \pi k n / N},

where the coefficients of Fourier series $X_k$ are defined as

\label{eq:2}
X_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(t_n) \cdot e^{-i 2 \pi k n / N}, \quad k\in\mathbb{Z}\

After DFT, we can analyze the energy in each frequency component. The frequency component is given by $f_k = \frac{k}{N~T}$ , where $T$ is the sampling frequency. We plot the power spectrum of the data $|X(f_k)|$ in figure 2. The result shows the web traffic is mostly on low-frequency components.

Energy Spectrum

Let's look at different frequency components of the data: We compare different frequency components with the original data: the original signal is shown in figure 3(a), the components with frequency $|f| < 0.46$ is in figure 3(b), and components with frequency $0.46 \leq |f| < 1.40$ in figure 3(c). Note that the traffic in weekdays is much higher than weekend, and the first components successfully capture the traffic change between weekdays and weekend. Also, the traffic is much higher for working hours than the rest periods, which can be clearly identified for the daily change in the second components.

Signal Components

Based on sparse representation, we can select significant'' components for many different applications, i.e, signal de-noising and or pattern recognition. We will introduce the wavelet transform in the next blog.

1. Spot on with this write-up, I honestly believe that this website needs much more
attention. I'll probably be back again to read more,
thanks for the information!

my weblog :: acebook login (http://youtu.be/WZ3zr0kqVHk)

2. This blog is really good. But I still have doubts regarding performing DFT. I have used the formula as shown in eqn 2. But in some books, the (1/Sqrt(N)) is missing. Could you please give justifications. Could you please share on FFT similarly but with an example.
Example:
Lets take you have a column of measured responses at every 0.025 secs for 1800 secs.
No of responses = 1800/0.025= 72000
Sampling frequency = 1/.025=80 Hz
Could you please do this example using DFT and FFT.

I wrote a small code in python for this, as i'm not interested to use inbuilt functions in matlab or anywhere. I wish to ensure how DFT and FFT works and discuss more on this. Anyone interested and willing to share knowledge, kindly please drop an email for discussion. My email id: vijaykgnitk@gmail.com