Exploring Log Returns Distributions

Hi all! In our previous tutorial, we learnt how to account for the inflation rate in the return series of a stock and obtain the inflation-adjusted return series. In this tutorial, we will start exploring the stylized facts of asset returns from a technical and analytical point of view, beginning with the distribution of log returns in Python. If you’re new to this series, please go to part 1 of Financial Analytics to learn the basics.

Exploring Log Returns Distributions – What are Stylized facts?

Stylized facts are very important to account for when we’re building financial models. They are statistical properties that are observed empirically in many asset return series.

We will look at 5 stylized facts:

Distribution of returns – is it non-Gaussian?
Volatility clustering – do volatility clusters form in the returns chart?
Absence of autocorrelation in returns
Small and slowly decreasing autocorrelation in squared/absolute returns
Leverage effect

Importing MSFT stock data and obtaining log returns

# Importing libraries
import pandas as pd
import yfinance as yf
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as scs
import statsmodels.api as sm
import statsmodels.tsa.api as smt

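# Downloading MSFT data from yfinance from 1st January 2010 up to 31st March 2020
# (yfinance treats the end date as exclusive)
# auto_adjust=False keeps the 'Adj Close' column used below; recent yfinance versions adjust prices by default
msftStockData = yf.download('MSFT',
                            start='2010-01-01',
                            end='2020-03-31',
                            auto_adjust=False,
                            progress=False)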

# Inspecting the first 5 rows of the dataframe
msftStockData.head()

# Inspecting the last 5 rows of the dataframe
msftStockData.tail()

# Calculating log returns and storing them in a new column
msftStockData['Log Returns'] = np.log(msftStockData['Adj Close']/msftStockData['Adj Close'].shift(1))

# Inspecting the first 5 rows; the first log return is NaN because there is no previous price
msftStockData.head()

# Using the back fill method to replace the NaN value
# (Series.bfill() replaces fillna(method='bfill'), which is deprecated in recent pandas versions)
msftStockData['Log Returns'] = msftStockData['Log Returns'].bfill()
msftStockData.head()
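
To see what the log return calculation does, here is a minimal sketch on a small made-up price series (the prices are hypothetical, not MSFT data). It also illustrates a convenient property of log returns: they add up over time, so their sum equals the log of the last price divided by the first.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

# Log return for each day: ln(P_t / P_{t-1}); the first value is NaN
log_returns = np.log(prices / prices.shift(1))

# Log returns are additive: their sum equals ln(P_last / P_first)
total_log_return = log_returns.sum()  # sum() skips the NaN
direct_log_return = np.log(prices.iloc[-1] / prices.iloc[0])

print(log_returns.round(4).tolist())
print(round(total_log_return, 6), round(direct_log_return, 6))
```

The two printed totals should agree, which is one reason log returns are often preferred over simple returns when aggregating over time.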

Stylized Fact 1: Distribution of returns – Is it non-Gaussian?

Calculating mu, sigma and pdf series

Let’s obtain the histogram and Q-Q plot of log returns to see whether this fact holds for MSFT log returns.

# Obtaining the range of x-values for the plot
plot_range = np.linspace(msftStockData['Log Returns'].min(), msftStockData['Log Returns'].max(), num=5000)

# Obtaining the mean
mu = msftStockData['Log Returns'].mean()

# Obtaining the standard deviation
sigma = msftStockData['Log Returns'].std()

# Obtaining the probability density function of a normal distribution with the same mean and standard deviation as the log returns series
pdf_series = scs.norm.pdf(plot_range, loc=mu, scale=sigma)

# Printing mu and sigma
print((mu, sigma))

(0.0007435881298640539, 0.015673460428275297)

Exploring Log Returns Distributions – Obtaining the histogram and Q-Q plot

# Obtaining 2 subplots
fig, ax = plt.subplots(1, 2, figsize=(16, 8))

# Subplot 1
# Obtaining histogram
# Calling histplot of seaborn to obtain the histogram (distplot is deprecated in recent seaborn versions)
# stat='density' : normalizes the histogram so that its area is 1, making it comparable with the pdf curve
sns.histplot(msftStockData['Log Returns'].values, stat='density', ax=ax[0])

# Setting the title, overlaying the fitted normal pdf in blue, and placing the legend in the upper left of the first plot
ax[0].set_title('Distribution of MSFT returns', fontsize=16)
ax[0].plot(plot_range, pdf_series, 'b', lw=2,
           label=f'N({mu:.2f}, {sigma**2:.4f})')
ax[0].legend(loc='upper left')

# Subplot 2
# Obtaining Q-Q plot using qqplot function of statsmodels.api library
# line='s' draws a standardized reference line based on the sample mean and standard deviation
qq = sm.qqplot(msftStockData['Log Returns'].values, line='s', ax=ax[1])
# Setting title and fontsize of the second plot
ax[1].set_title('Q-Q plot', fontsize=16)


Calculating skewness and kurtosis of the log returns series

s = msftStockData['Log Returns'].skew()
print("Skewness of the log return series is", round(s, 2))

Skewness of the log return series is -0.27

# pandas returns excess kurtosis (Fisher's definition), which is 0 for a normal distribution
k = msftStockData['Log Returns'].kurtosis()
print("Kurtosis of the log return series is", round(k, 2))

Kurtosis of the log return series is 11.88
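
To get a feel for these two numbers, the following sketch compares a simulated normal sample with a simulated heavy-tailed sample (Student's t with 5 degrees of freedom). The simulated data, the seed and the variable names are assumptions made for illustration; they are not part of the MSFT analysis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
n = 200_000

# Simulated samples, for illustration only
normal_sample = pd.Series(rng.standard_normal(n))
heavy_sample = pd.Series(rng.standard_t(df=5, size=n))

# pandas reports excess kurtosis, so a normal sample should be close to 0
normal_skew = normal_sample.skew()
normal_kurt = normal_sample.kurtosis()

# A heavy-tailed sample should show clearly positive excess kurtosis
heavy_kurt = heavy_sample.kurtosis()

print("Normal sample: skewness", round(normal_skew, 2), "excess kurtosis", round(normal_kurt, 2))
print("Heavy-tailed sample: excess kurtosis", round(heavy_kurt, 2))
```

Seen this way, an excess kurtosis of 11.88 for MSFT log returns is far from what a normal sample would produce.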

Performing Jarque-Bera test

Null Hypothesis, $H_0$ : Log returns of MSFT stock are normally distributed
Alternate Hypothesis, $H_1$ : Log returns of MSFT stock are not normally distributed
We test at the 1% significance level (99% confidence level).

The Jarque-Bera test is a statistical method of checking whether the skewness and kurtosis of a sample match those of a normal distribution. The test statistic is always non-negative. The farther the value is from zero, the more the sample deviates from a normal distribution.
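
The statistic combines the sample skewness $S$ and the excess kurtosis $K$ as $JB = \frac{n}{6}\left(S^2 + \frac{K^2}{4}\right)$, where $n$ is the number of observations. As a rough sketch on simulated data (not MSFT returns), the snippet below computes it by hand and compares it with scs.jarque_bera. It uses scs.skew and scs.kurtosis with their default settings; the pandas skew() and kurtosis() methods apply a bias correction, so plugging those in would give a slightly different number.

```python
import numpy as np
import scipy.stats as scs

rng = np.random.default_rng(0)  # arbitrary seed
sample = rng.standard_t(df=5, size=2_000)  # simulated heavy-tailed data

n = sample.size
S = scs.skew(sample)      # sample skewness
K = scs.kurtosis(sample)  # excess kurtosis (0 for a normal distribution)

# Jarque-Bera statistic computed by hand
jb_manual = n / 6 * (S**2 + K**2 / 4)

# Same statistic as returned by scipy
jb_scipy = scs.jarque_bera(sample)[0]

print(round(jb_manual, 2), round(jb_scipy, 2))
```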

Now, let’s run the Jarque-Bera test by calling the scs.jarque_bera function on the Log Returns series.

# Running the test once and unpacking the test statistic and the p-value
jb_stat, jb_p_value = scs.jarque_bera(msftStockData['Log Returns'])
value = round(jb_stat, 2)
p_value = round(jb_p_value, 3)
print("The Jarque-Bera test statistic value is", value, "with probability of", p_value)

The Jarque-Bera test statistic value is 15116.82 with probability of 0.0

Since the p-value is below 0.01, we reject the null hypothesis that the distribution is normal at the 1% significance level.

Inference – histogram plot

To draw the normal curve in this plot, we set the number of points in plot_range to 5000. Rule of thumb – the more points, the smoother the curve.
By default, scs.norm.pdf uses loc = 0 and scale = 1, i.e. a standard normal distribution with mean zero and standard deviation (and variance) one. That is why, when calling the function, we set loc to the mean of the log returns series and scale to its standard deviation.
From the histogram, we can see that the bars rise above the peak of the normal curve and that there is more mass in the tails as well. Though the normal curve roughly approximates the histogram, the returns are certainly not normally distributed, as the histogram diverges from the curve at the peak and in the tails.
Negative skewness signifies that the left tail of the distribution is longer and that most of the observations are concentrated on the right side of the distribution.
An excess kurtosis greater than 0 signifies that the distribution is leptokurtic: compared with a normal distribution, its tails are fatter and its peak is higher.
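
The role of loc and scale can be checked directly. This small sketch uses illustrative values for mu and sigma, rounded from the values printed earlier in this tutorial. It shows that scs.norm.pdf without extra arguments is the standard normal density, and that passing loc and scale matches the textbook formula for the normal density.

```python
import numpy as np
import scipy.stats as scs

# Defaults: loc=0 and scale=1, i.e. the standard normal density
default_at_zero = scs.norm.pdf(0.0)

# Illustrative values, rounded from the mu and sigma printed earlier
mu, sigma = 0.0007, 0.0157
x = 0.01  # an arbitrary log return at which to evaluate the density

# Density from scipy with explicit loc and scale
pdf_scipy = scs.norm.pdf(x, loc=mu, scale=sigma)

# Same density from the closed-form formula
pdf_formula = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(default_at_zero, pdf_scipy, pdf_formula)
```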

Inference – Q-Q plot

A Q-Q plot is generally used to understand how the observed quantiles compare with the expected (theoretical) quantiles.
In our case, the expected distribution is the Gaussian distribution, so the expected quantiles are those of a Gaussian distribution.
The observed distribution is that of the Log Returns series, so the observed quantiles are those of the log returns.
The observed distribution is close to Gaussian if the majority of the points lie on the red line and don’t deviate from it.
From our Q-Q plot, we can see that although the points at the center lie on the red line, this is not the case at the left and right ends.
The left-most tail has points that are more negative (smaller) than expected under a Gaussian distribution. Thus, the left tail is heavier than that of a Gaussian distribution.
The right-most tail has points that are more positive than expected under a Gaussian distribution, so the right tail is heavier as well.
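
The points of a Q-Q plot can also be computed by hand, which makes the tail behaviour easier to see in numbers. The sketch below is an illustration on simulated heavy-tailed data (Student's t with 3 degrees of freedom), not on the MSFT series. The plotting positions (i - 0.5)/n are one common convention and not necessarily the one statsmodels uses.

```python
import numpy as np
import scipy.stats as scs

rng = np.random.default_rng(1)  # arbitrary seed
sample = rng.standard_t(df=3, size=20_000)  # simulated heavy-tailed data

# Standardize so the sample is comparable with a standard normal
z = (sample - sample.mean()) / sample.std()

# Observed quantiles: the sorted sample
observed = np.sort(z)

# Theoretical quantiles: standard normal quantiles at positions (i - 0.5) / n
n = observed.size
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = scs.norm.ppf(probs)

# Heavy tails: the extremes lie further out than a Gaussian would suggest
print("Left tail :", round(observed[0], 2), "vs expected", round(theoretical[0], 2))
print("Right tail:", round(observed[-1], 2), "vs expected", round(theoretical[-1], 2))
```

Plotting observed against theoretical would reproduce the familiar S-shaped departure from the reference line at both ends.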

Inference – Jarque-Bera test

The high test statistic of 15116.82, with a p-value that rounds to 0, means that we reject normality of the log returns at the 1% significance level.
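
Under the null hypothesis, the Jarque-Bera statistic approximately follows a chi-square distribution with 2 degrees of freedom, whose survival function has the simple closed form exp(-x/2). The sketch below uses only that formula (no statistics library) to show why a statistic of 15116.82 is far beyond the 1% critical value and why its p-value prints as 0.0.

```python
import math

alpha = 0.01        # 1% significance level
jb_stat = 15116.82  # test statistic reported above

# For a chi-square distribution with 2 degrees of freedom: P(X > x) = exp(-x / 2)
# Solving exp(-x / 2) = alpha gives the critical value
critical_value = -2 * math.log(alpha)

# p-value of the reported statistic; it is too small to be represented as a float
p_value = math.exp(-jb_stat / 2)

print("Critical value at 1%:", round(critical_value, 2))
print("Reject normality:", jb_stat > critical_value)
print("p-value:", p_value)
```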