A Comprehensive Guide to Time Series Analysis
https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-to-time-series-analysis/
Overview of Time Series Analysis Process
Synopsis of Time Series Analysis
A time series is a sequence of observations ordered in time. The interval may be years, months, weeks, days, hours, minutes, or seconds.
A time series consists of observations recorded at successive, discrete points in time.
A time series is a running chart.
The time variable is the independent variable and is used to predict the target variable.
Time Series Analysis (TSA) is used for time-based predictions in many fields, such as weather forecasting, finance, signal processing, and engineering domains like control systems and communications systems.
Because TSA depends on observations arriving in a particular sequence, it is distinct from spatial and other analyses.
Using AR, MA, ARMA, and ARIMA models, we can forecast future values.
Introduction to Time Series Analysis
Time Series Analysis is the study of how a response variable behaves with respect to time as the independent variable. To predict or forecast the target variable, the time variable is used as the point of reference. In this article we discuss TSA objectives, assumptions, components, and data types (stationary and non-stationary), along with TSA algorithms and a specific use case in Python.
What is Time Series Analysis
How to analyze Time Series
Significance of Time Series and its types
Components of Time Series Analysis
What are the limitations of Time Series Analysis
Data Type of Time Series
Methods to check Stationarity
Convert Non-Stationary into Stationary
Moving Average Methodology
Time Series Analysis In Data Science and Machine Learning
What is Time Series Analysis
Definition: A time series is a sequence of data points recorded in successive order over a given period of time.
Objective:
To understand how a time series works and which factors affect a variable at different points in time.
To provide insights into how the features of the given dataset change over time.
To support predicting the future values of the time series variable.
Assumptions: There is one key assumption, stationarity, which means that the origin of time does not affect the statistical properties of the process.
How to analyze Time Series
Here are the quick steps for reference. We will look at them in detail later in this article.
Collecting the data and cleaning it.
Visualizing the key feature against time.
Checking the stationarity of the series.
Developing charts to understand its nature.
Model building - AR, MA, ARMA and ARIMA
Extracting insights from prediction.
Significance of Time Series and its types
TSA is the backbone for prediction and forecasting analysis, specific to the time-based problem statements.
Analyzing the historical dataset and its patterns
Understanding and matching the current situation with patterns derived from the previous stage.
Understanding the factor or factors influencing certain variable(s) in different periods.
With the help of time series, we can prepare numerous time-based analyses and results.
Forecasting
Segmentation
Classification
Descriptive analysis
Intervention analysis
Components of Time Series Analysis
Trend
Seasonality
Cyclical
Irregularity
Trend: A long-term upward or downward movement of the series along the continuous timeline, with no fixed interval. A trend can be positive, negative, or null.
Seasonality: Shifts that repeat at a regular, fixed interval along the continuous timeline. The pattern may look like a bell curve or a sawtooth.
Cyclical: Movements with no fixed interval, whose timing and pattern are uncertain.
Irregularity: Unexpected situations, events, or scenarios that cause spikes over a short time span.
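To make the components concrete, here is a minimal sketch that builds a synthetic series from a trend plus a seasonal pattern, then averages over one full season to recover the trend. The additive form, the numbers, and the helper names make_series and window_means are assumptions chosen for illustration and do not come from the article's dataset.

```python
# Illustrative only: an additive series y_t = trend_t + seasonal_t.
PERIOD = 4
SEASONAL_PATTERN = [2.0, -1.0, -2.0, 1.0]  # sums to zero over one season


def make_series(n):
    """Linear upward trend plus a repeating seasonal pattern."""
    return [10.0 + 0.5 * t + SEASONAL_PATTERN[t % PERIOD] for t in range(n)]


def window_means(values, window):
    """Mean of every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


series = make_series(12)
# Averaging over one full season cancels the seasonal part,
# leaving a smooth estimate of the trend.
trend_estimate = window_means(series, PERIOD)
print(trend_estimate)
```

Because the seasonal pattern sums to zero over one period, each window mean depends only on the trend, so the estimates rise by the trend slope at every step.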
What are the limitations of Time Series Analysis
Time series analysis has the limitations listed below, which we must take care of during our analysis:
As with other models, missing values are not supported by TSA.
The data points must be linear in their relationship.
Data transformations are mandatory, which makes the analysis a little expensive.
Models mostly work on univariate data.
Data Type of Time Series
Let’s discuss the data types of time series and their influence. There are two major types.
Stationary
Non-Stationary
Stationary:
A stationary dataset follows the rules of thumb below and shows no trend or seasonality.
The MEAN value should be constant over the data during the analysis.
The VARIANCE should be constant with respect to the time frame.
The COVARIANCE between two observations should depend only on the time lag between them, not on the time at which it is measured.
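As a rough illustration of the mean and variance rules, the sketch below splits a series into two halves and compares their statistics. This is an informal check and not a statistical test. The helper name half_stats and both example series are made up for this purpose.

```python
import statistics


def half_stats(values):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return ((statistics.mean(first), statistics.pvariance(first)),
            (statistics.mean(second), statistics.pvariance(second)))


flat = [5, 6, 5, 4, 5, 6, 5, 4]      # fluctuates around a fixed level
trending = [1, 2, 3, 4, 5, 6, 7, 8]  # level keeps rising

print("flat:", half_stats(flat))
print("trending:", half_stats(trending))
```

For the flat series both halves have the same mean and variance, while the trending series shows a clearly higher mean in the second half, which is a sign of non-stationarity.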
Non-Stationary:
This is just the opposite of stationary: the mean, variance, or covariance changes over time.
Methods to check Stationarity
During the TSA model preparation workflow, we must assess whether the given dataset is stationary or not, using statistical tests and plots.
Statistical Test:
There are two tests available to test if the dataset is Stationary or NOT.
Augmented Dickey-Fuller (ADF) Test
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
Augmented Dickey-Fuller (ADF) Test or Unit Root Test: The ADF test is the most popular statistical test, with the following hypotheses and decision rules.
Null Hypothesis (H0): Series is non-stationary
Alternate Hypothesis (HA): Series is stationary
p-value > 0.05: Fail to reject H0 (the series is treated as non-stationary)
p-value <= 0.05: Reject H0 in favor of HA (the series is treated as stationary)
Kwiatkowski–Phillips–Schmidt–Shin (KPSS):
This test checks the null hypothesis (H0) that the time series is stationary around a deterministic trend, against the alternative of a unit root. Note that this null hypothesis is the reverse of the one used by the ADF test. Since TSA requires stationary data for further analysis, we have to make sure that the dataset is stationary.
Convert Non-Stationary into Stationary
Let’s quickly discuss how to convert a non-stationary series into a stationary one for effective time series modeling. There are three major methods available for this conversion.
Detrending
Differencing
Transformation
Detrending:
It involves removing the trend effects from the given dataset and showing only the differences in values from the trend. This makes cyclical patterns easier to identify.
Differencing:
This is a simple transformation of the series into a new time series, which removes the series' dependence on time and stabilizes its mean, so trend and seasonality are reduced by the transformation.
Y'_t = Y_t - Y_{t-1}
Y_t = value of the series at time t, Y'_t = differenced value at time t
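First-order differencing can be sketched in a few lines of plain Python. The helper name difference and the example series are assumptions for illustration. The optional lag argument is an addition that also allows seasonal differencing, for example lag=12 for monthly data.

```python
def difference(values, lag=1):
    """Return the series of changes values[t] - values[t - lag]."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


linear_trend = [3, 5, 7, 9, 11]
print(difference(linear_trend))  # [2, 2, 2, 2]
```

A series with a linear trend becomes constant after one round of differencing, which is why differencing stabilizes the mean. The differenced series is shorter than the original by lag values.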
Detrending and Differencing extractions
Transformation:
This includes three different methods: power transform, square root, and log transform. The most commonly used one is the log transform.
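The sketch below illustrates why the log transform is popular, using an assumed series that grows by 10% at each step. The raw differences keep growing, while the differences of the logged series are constant.

```python
import math

# Assumed example: a series growing by 10% each step.
growth = [100 * 1.1 ** t for t in range(6)]

raw_diffs = [b - a for a, b in zip(growth, growth[1:])]

log_values = [math.log(v) for v in growth]
log_diffs = [b - a for a, b in zip(log_values, log_values[1:])]

print(raw_diffs)  # the step size keeps increasing
print(log_diffs)  # every step is close to log(1.1)
```

Taking logs turns multiplicative growth into additive growth, so a later differencing step can remove it.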
Moving Average Methodology
The moving average is a commonly used time series method. It smooths out random short-term variations and is closely related to the components of a time series.
The Moving Average (MA), or rolling mean, is calculated by averaging the data of the time series within k periods.
Let’s see the types of moving averages:
Simple Moving Average (SMA)
Cumulative Moving Average (CMA)
Exponential Moving Average (EMA)
Simple Moving Average (SMA)
The SMA is the unweighted mean of the previous n data points. The size of the sliding window depends on the amount of smoothing required, since a larger window improves smoothing at the expense of accuracy.
Objective: SMA is used to reduce the noise of time series.
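A minimal sketch of the SMA in plain Python follows. The function name and the temperature values are made up for illustration.

```python
def simple_moving_average(values, window):
    """Unweighted mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


temps = [20, 22, 21, 25, 24, 23]
smoothed = simple_moving_average(temps, 3)
print(smoothed)
```

The output has window - 1 fewer points than the input, because the first full window only becomes available at the third observation. With pandas, the equivalent call would be Series.rolling(window).mean().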
Cumulative Moving Average (CMA)
The CMA is the unweighted mean of past values, till the current time.
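The CMA can be sketched as a running mean that is updated as each value arrives. The function name and the sample values are assumptions for illustration.

```python
def cumulative_moving_average(values):
    """Running mean of all values seen up to and including each point."""
    result, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        result.append(total / count)
    return result


print(cumulative_moving_average([2, 4, 6, 8]))  # [2.0, 3.0, 4.0, 5.0]
```

Unlike the SMA, the window here grows with every new observation, so older values never drop out of the average.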
Exponential Moving Average (EMA)
EMA is mainly used to identify trends and to filter out noise. The weight of each data point decreases gradually as it gets older, so the EMA gives more weight to recent data points than to historical ones. Compared with SMA, the EMA responds faster to changes and is more sensitive.
α = smoothing factor.
It has a value between 0 and 1.
It represents the weighting applied to the most recent period.
Let’s apply exponential moving averages with smoothing factors of 0.1 and 0.3 to the given dataset.
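The article's dataset is not reproduced here, so the sketch below uses a made-up series with a step change. It assumes the common recursive definition EMA_t = α * x_t + (1 - α) * EMA_{t-1}, seeded with the first observation. The function name is hypothetical.

```python
def exponential_moving_average(values, alpha):
    """EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}, seeded with values[0]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ema = [float(values[0])]
    for x in values[1:]:
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema


data = [10, 10, 10, 20, 20, 20]  # a step change, made up for illustration
slow = exponential_moving_average(data, 0.1)
fast = exponential_moving_average(data, 0.3)
print(slow[-1], fast[-1])
```

After the jump from 10 to 20, the EMA with α = 0.3 moves toward the new level faster than the one with α = 0.1, which shows how a larger smoothing factor makes the average more sensitive to recent data. In pandas, Series.ewm(alpha=..., adjust=False).mean() follows the same recursion.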
Time Series Analysis In Data Science and Machine Learning
When dealing with TSA in Data Science and Machine Learning, multiple model options are available. Among them is the Autoregressive Integrated Moving Average (ARIMA) model with the parameters [p, d, q].
p ==> autoregressive lags
q ==> moving average lags
d ==> order of differencing
Before we get to know ARIMA, you should first understand the terms below.
Auto-Correlation Function (ACF)
Partial Auto-Correlation Function (PACF)
Auto-Correlation Function (ACF)
ACF indicates how similar a value in a given time series is to the previous value. In other words, it measures the degree of similarity between a given time series and a lagged version of itself at the different intervals we observed.
Python Statsmodels library calculates autocorrelation. This is used to identify a set of trends in the given dataset and the influence of former observed values on the currently observed values.
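To show what is being computed, here is a minimal plain-Python sketch of the sample autocorrelation at a given lag. The function name and the example series are assumptions. In practice the Statsmodels functions acf and plot_acf would normally be used instead.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` with itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum((values[t] - mean) * (values[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom


wave = [1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2]  # repeats every 4 steps
print([round(autocorrelation(wave, k), 3) for k in range(5)])
```

For this series the autocorrelation is negative at lag 2, where peaks line up with troughs, and positive at lag 4, where the pattern lines up with itself again.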
Partial Auto-Correlation (PACF)
PACF is similar to the Auto-Correlation Function but a little more challenging to understand. It shows the correlation of the sequence with itself at a given lag, with only the direct effect shown and all intermediary effects removed.
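One way to see how intermediary effects are removed is the Durbin-Levinson recursion, which derives partial autocorrelations from ordinary autocorrelations. The sketch below is an illustration under that assumption, with a hypothetical function name. The Statsmodels function pacf is the usual practical choice.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..len(rho)-1 via Durbin-Levinson.

    `rho` holds autocorrelations with rho[0] == 1.
    """
    pacf = []
    prev = []  # prev[j] is the lag-(j+1) coefficient of the previous order's fit
    for k in range(1, len(rho)):
        numer = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        denom = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = numer / denom
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical autocorrelations of an AR(1) process with coefficient 0.5
ar1_acf = [1.0, 0.5, 0.25, 0.125]
print(pacf_from_acf(ar1_acf))
```

In an AR(1) process the lag-2 and lag-3 autocorrelations exist only because of the chain through lag 1, so the partial autocorrelations beyond lag 1 come out as zero.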
Observation: The previous temperature influences the current temperature, but in the above visualization the significance of that influence decreases and then slightly increases at regular time intervals.
Types of Auto-correlation
Interpret ACF and PACF plot
Remember that both ACF and PACF require stationary time series for analysis.
Auto-Regressive model
This is a simple model that predicts future performance based on past performance. It is mainly used for forecasting when there is some correlation between values in a given time series and the values that precede and succeed them.
An AR model is a linear regression model that uses lagged variables as input. A linear regression model can be built easily with the scikit-learn library by indicating the input to use. The Statsmodels library provides autoregression-specific functions, where you specify an appropriate lag value and train the model. These are provided in the AutoReg class, which gives results in a few simple steps:
Create the model with AutoReg().
Call fit() to train it on our dataset.
This returns an AutoRegResults object.
Once fit, make a prediction by calling the predict() function.
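The equation for the AR model (compare it with Y = mX + c):
Y_t = C + b_1 Y_{t-1} + b_2 Y_{t-2} + ... + b_p Y_{t-p} + Er_t
Key Parameters
p = number of past values (lags)
Y_t = function of different past values
Er_t = error at time t
C = intercept
b_1, ..., b_p = coefficients of the lagged values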
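As a worked instance of the equation, the sketch below computes a one-step prediction from known coefficients, leaving out the error term because it is unknown at prediction time. The function name, the coefficients, and the history values are made up for illustration.

```python
def ar_predict(history, intercept, coefs):
    """One-step AR(p) prediction: C + b1*Y[t-1] + ... + bp*Y[t-p]."""
    p = len(coefs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[::-1][:p]  # Y[t-1], Y[t-2], ..., Y[t-p]
    return intercept + sum(b * y for b, y in zip(coefs, recent))


# AR(2) with C = 1.0, b1 = 0.5, b2 = 0.2 and the last two values 6.0 and 5.0
prediction = ar_predict([4.0, 6.0, 5.0], intercept=1.0, coefs=[0.5, 0.2])
print(prediction)  # 1.0 + 0.5 * 5.0 + 0.2 * 6.0, about 4.7
```

The most recent value is paired with b1 and the one before it with b2, which mirrors the order of the terms in the equation.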
Implementation of Auto-Regressive model
Moving Average (WEIGHTS – SIMPLE MOVING AVERAGE)
Implementation of Moving Average (WEIGHTS – SIMPLE MOVING AVERAGE)
Understanding ARMA and ARIMA
ARMA: This is a combination of the Auto-Regressive and Moving Average models for forecasting. It describes a weakly stationary stochastic process in terms of two polynomials, one for the Auto-Regressive part and one for the Moving Average part.
ARMA is best for predicting stationary series. ARIMA was introduced because it supports non-stationary series as well as stationary ones.
AR+I+MA= ARIMA
AR ==> Uses the past values to predict the future
MA ==> Uses the past error terms in the given series to predict the future
I ==> Uses differencing of observations to make the data stationary
Understand the Signature of ARIMA
p ==> lag order => number of lag observations.
d ==> degree of differencing => number of times that the raw observations are differenced.
q ==> order of moving average => the size of the moving average window
Implementation steps for ARIMA
Step 1: Plot the data in time series format
Step 2: Difference to make the series stationary in mean by removing the trend
Step 3: Apply a log transform to make the series stationary in variance
Step 4: Difference the log-transformed series to make it stationary in both mean and variance
Step 5: Plot ACF & PACF, and identify the potential AR and MA model
Step 6: Identify the best-fit ARIMA model
Step 7: Forecast/Predict the value, using the best-fit ARIMA model
Step 8: Plot ACF & PACF for residuals of the ARIMA model, and ensure no more information is left.