
Exploring Log Returns Distributions: FA8

 


Hi all! In our previous tutorial, we learnt how to account for the inflation rate in a stock's return series and obtain the inflation-adjusted returns. In this tutorial, we start exploring the stylized facts of asset returns from a technical and analytical point of view, beginning with the distribution of log returns in Python. If you’re new to this series, please go to part 1 of Financial Analytics to learn the basics.

Exploring Log Returns Distributions – What are Stylized facts?

Stylized facts are important to account for when we’re building financial models. They are statistical properties that show up consistently across many asset return series.

We will look at 5 stylized facts:

  • Distribution of returns – Is it non-Gaussian?
  • Are volatility clusters formed in the returns chart?
  • Is autocorrelation absent in returns?
  • Decreasing autocorrelation trend in squared/absolute returns
  • Leverage effect

Importing the MSFT stock data and obtaining log returns

In [1]:
# Importing libraries
import pandas as pd
import yfinance as yf
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as scs
import statsmodels.api as sm
import statsmodels.tsa.api as smt
In [2]:
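# Downloading MSFT data from yfinance, starting 1st January 2010
# (the last row returned is 30th March 2020, as the output below shows)
# auto_adjust=False keeps the 'Adj Close' column, which recent yfinance versions drop by default
msftStockData = yf.download( 'MSFT',
                        start = '2010-01-01',
                        end = '2020-03-31',
                        auto_adjust = False,
                        progress = False)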
In [3]:
# Checking what's in the dataframe by loading the first 5 rows
msftStockData.head()
Out[3]:
                 Open       High        Low      Close  Adj Close    Volume
Date
2009-12-31  30.980000  30.990000  30.480000  30.480000  23.925440  31929700
2010-01-04  30.620001  31.100000  30.590000  30.950001  24.294369  38409100
2010-01-05  30.850000  31.100000  30.639999  30.959999  24.302216  49749600
2010-01-06  30.879999  31.080000  30.520000  30.770000  24.153070  58182400
2010-01-07  30.629999  30.700001  30.190001  30.450001  23.901886  50559700
In [4]:
# Checking what's in the dataframe by loading the last 5 rows
msftStockData.tail()
Out[4]:
                  Open        High         Low       Close   Adj Close    Volume
Date
2020-03-24  143.750000  149.600006  141.270004  148.339996  148.339996  82516700
2020-03-25  148.910004  154.330002  144.440002  146.919998  146.919998  75638200
2020-03-26  148.399994  156.660004  148.369995  156.110001  156.110001  64568100
2020-03-27  151.750000  154.889999  149.199997  149.699997  149.699997  57042300
2020-03-30  152.440002  160.600006  150.009995  160.229996  160.229996  63420300
In [5]:
# Calculating log returns and storing them in a new column
msftStockData['Log Returns'] = np.log(msftStockData['Adj Close']/msftStockData['Adj Close'].shift(1)) 
In [6]:
# Checking what's in the dataframe by loading the first 5 rows
msftStockData.head()
Out[6]:
                 Open       High        Low      Close  Adj Close    Volume  Log Returns
Date
2009-12-31  30.980000  30.990000  30.480000  30.480000  23.925440  31929700          NaN
2010-01-04  30.620001  31.100000  30.590000  30.950001  24.294369  38409100     0.015302
2010-01-05  30.850000  31.100000  30.639999  30.959999  24.302216  49749600     0.000323
2010-01-06  30.879999  31.080000  30.520000  30.770000  24.153070  58182400    -0.006156
2010-01-07  30.629999  30.700001  30.190001  30.450001  23.901886  50559700    -0.010454
In [7]:
# Using back fill method to replace the leading NaN value
# (note: this copies the first valid return into the first row)
msftStockData['Log Returns'] = msftStockData['Log Returns'].bfill()
msftStockData.head()
Out[7]:
                 Open       High        Low      Close  Adj Close    Volume  Log Returns
Date
2009-12-31  30.980000  30.990000  30.480000  30.480000  23.925440  31929700     0.015302
2010-01-04  30.620001  31.100000  30.590000  30.950001  24.294369  38409100     0.015302
2010-01-05  30.850000  31.100000  30.639999  30.959999  24.302216  49749600     0.000323
2010-01-06  30.879999  31.080000  30.520000  30.770000  24.153070  58182400    -0.006156
2010-01-07  30.629999  30.700001  30.190001  30.450001  23.901886  50559700    -0.010454
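
To make the log-return step easier to follow, here is a minimal sketch on a tiny, made-up price series (the prices are hypothetical and are not MSFT data). It applies the same ln(P_t / P_{t-1}) formula used above and checks one handy property of log returns: they add up over time. Dropping the first NaN with dropna() is shown only as one possible alternative to back-filling it.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, used only for illustration (not MSFT data)
prices = pd.Series([100.0, 102.0, 101.0, 105.0], name="Adj Close")

# Log return for day t: ln(P_t / P_{t-1}); the first value has no predecessor, so it is NaN
log_returns = np.log(prices / prices.shift(1))

# One alternative to back-filling: simply drop the leading NaN
log_returns_clean = log_returns.dropna()

# Log returns are additive over time: their sum equals ln(P_last / P_first)
total_log_return = log_returns_clean.sum()
direct_log_return = np.log(prices.iloc[-1] / prices.iloc[0])

print(log_returns_clean.round(6).tolist())
print(np.isclose(total_log_return, direct_log_return))
```

Because of this additivity, a multi-day log return can be obtained by summing the daily values, which is one reason log returns are convenient to work with.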

Stylized Fact 1: Distribution of returns – Is it non-Gaussian?

Calculating mu, sigma and pdf series

Let’s obtain the histogram and Q-Q plot of the log returns to see whether this fact holds for MSFT.

In [8]:
# Obtaining range of the plot: 5000 evenly spaced points between the smallest and largest log return
plot_range = np.linspace(msftStockData['Log Returns'].min(), msftStockData['Log Returns'].max(), num=5000)

# Obtaining the mean
mu = msftStockData['Log Returns'].mean()

# Obtaining the standard deviation
sigma = msftStockData['Log Returns'].std()

# Obtaining the probability density function of a normal distribution with the same mean and standard deviation
pdf_series = scs.norm.pdf(plot_range, loc=mu, scale=sigma)

# Printing mu and sigma
print((mu, sigma))
(0.0007435881298640539, 0.015673460428275297)

Exploring Log Returns Distributions – Obtaining the histogram and Q-Q plot

In [9]:
# Obtaining 2 subplots
fig, ax = plt.subplots(1, 2, figsize=(16, 8))

# Subplot 1
# Obtaining histogram
# Calling histplot of seaborn (distplot is deprecated since seaborn 0.11)
# stat='density' : normalizes the histogram so that it is comparable with the PDF curve
sns.histplot(msftStockData['Log Returns'].values, stat='density', ax=ax[0])

# Setting the title, overlaying the normal PDF curve with its label, and placing the legend at the upper left of the first plot
ax[0].set_title('Distribution of MSFT returns', fontsize=16)
ax[0].plot(plot_range, pdf_series, 'b', lw=2,
 label=f'N({mu:.4f}, {sigma**2:.4f})')
ax[0].legend(loc='upper left');

# Subplot 2
# Obtaining Q-Q plot using qqplot function of statsmodels.api library
qq = sm.qqplot(msftStockData['Log Returns'].values, line='s', ax=ax[1])
# Setting title and fontsize of the second plot
ax[1].set_title('Q-Q plot', fontsize = 16)
Out[9]:
Text(0.5, 1.0, 'Q-Q plot')
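[Figure: histogram of MSFT log returns with the fitted normal PDF, titled "Distribution of MSFT returns" (left), and the Q-Q plot (right)]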

Calculating skewness and kurtosis of the log returns series

In [10]:
s = msftStockData['Log Returns'].skew()
print("Skewness of the log return series is", round(s, 2))
Skewness of the log return series is -0.27
In [11]:
# pandas reports excess kurtosis, i.e. a normal distribution gives 0
k = msftStockData['Log Returns'].kurtosis()
print("Kurtosis of the log return series is", round(k, 2))
Kurtosis of the log return series is 11.88

Performing Jarque-Bera test

Null Hypothesis, $H_0$ : The log returns of MSFT stock are normally distributed
Alternate Hypothesis, $H_1$ : The log returns of MSFT stock are not normally distributed

We test at the 1% significance level (99% confidence level).

The Jarque-Bera test is a statistical method of checking whether a sample has skewness and kurtosis values matching those of a normal distribution. The test statistic is always non-negative. The farther the value is from zero, the more the sample deviates from a normal distribution.

Now, let’s run the Jarque-Bera test by calling the scs.jarque_bera function on the Log Returns series.

In [12]:
# Running the test once and unpacking the test statistic and the p-value
jb_stat, jb_p = scs.jarque_bera(msftStockData['Log Returns'])
value = round(jb_stat, 2)
p_value = round(jb_p, 3)
print("The Jarque-Bera test statistic value is", value, "with probability of", p_value)
The Jarque-Bera test statistic value is 15116.82 with probability of 0.0

Since the p-value is below 0.01, we reject the null hypothesis that the distribution is normal at the 1% significance level.

Inference – histogram plot

  • To draw the normal curve, we evaluated the PDF at 5000 evenly spaced points between the minimum and maximum log return. Rule of thumb – the more points, the smoother the curve.
  • By default, scs.norm.pdf uses a mean (loc) of zero and a standard deviation (scale) of one. That is why, when calling the function, we set loc to the mean of the log returns series and scale to its standard deviation.
  • From the histogram, we can see that the observed density rises above the normal curve at the peak and also in the tails. So although the histogram looks roughly bell-shaped, it is certainly not a normal distribution, as it diverges from the normal curve at the peak and in the tails.
  • Negative skewness signifies that the left tail of the distribution is longer, with the bulk of the observations typically sitting to the right of the mean.
  • pandas reports excess kurtosis, which is 0 for a normal distribution. A value greater than 0 (here 11.88) signifies that the distribution is leptokurtic: its tails are fatter and its peak is higher than those of a normal distribution.
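
To see what these skewness and kurtosis numbers mean in practice, the sketch below compares a simulated normal sample with a simulated fat-tailed one. The samples, the seed, and the choice of a Student's t distribution with 5 degrees of freedom are all assumptions made for illustration; they are not the MSFT data. It also shows that pandas' kurtosis() and scipy's kurtosis(fisher=True) both report excess kurtosis, while fisher=False gives the raw value, which is about 3 for a normal sample.

```python
import numpy as np
import pandas as pd
import scipy.stats as scs

# Simulated samples for illustration only (seed and distributions are assumptions)
rng = np.random.default_rng(0)
normal_sample = pd.Series(rng.normal(size=100_000))
fat_tailed_sample = pd.Series(rng.standard_t(df=5, size=100_000))

# pandas reports excess kurtosis: close to 0 for the normal sample,
# clearly positive for the fat-tailed (leptokurtic) sample
normal_kurt = normal_sample.kurtosis()
fat_kurt = fat_tailed_sample.kurtosis()
print("Excess kurtosis (normal, fat-tailed):", round(normal_kurt, 2), round(fat_kurt, 2))

# scipy supports both conventions: fisher=True is excess kurtosis, fisher=False is raw kurtosis
excess_kurt = scs.kurtosis(normal_sample.to_numpy(), fisher=True)
raw_kurt = scs.kurtosis(normal_sample.to_numpy(), fisher=False)
print("Raw minus excess kurtosis:", round(raw_kurt - excess_kurt, 2))

# Skewness of a symmetric sample stays near 0
normal_skew = normal_sample.skew()
print("Skewness (normal):", round(normal_skew, 2))
```

When comparing kurtosis values across libraries, it helps to check which convention is in use before deciding whether a value of, say, 3 is "normal" or "fat-tailed".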

Inference – Q-Q plot

  • A Q-Q plot is generally obtained to help us understand how the observed quantiles compare with the expected or theoretical quantiles.
  • In our case, the expected distribution is the Gaussian distribution, so the expected quantiles are those of a Gaussian distribution.
  • The observed distribution is the distribution of the Log Returns series, so the observed quantiles are those of the log returns.
  • The observed distribution can be treated as Gaussian if the majority of the points lie on the red line and don’t deviate from it.
  • From our Q-Q plot, we can see that although the points at the center lie on the red line, this is not the case at the left and right ends.
  • The left-most tail has points which are more negative (smaller) than expected under a Gaussian distribution. Thus, the left tail is heavier than that of a Gaussian distribution.
  • The right-most tail has points which are more positive than expected under a Gaussian distribution. Thus, the right tail is heavier as well.
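
The tail behaviour described above can also be checked numerically, without drawing a plot. The sketch below is a rough illustration on a simulated fat-tailed sample (a Student's t distribution with 3 degrees of freedom, chosen as an assumption; it is not the MSFT data). It uses scipy's probplot, which returns the theoretical normal quantiles, the ordered sample values, and a least-squares line through them. Note that this fitted line is not the same as the standardized line (line='s') drawn by sm.qqplot, so the numbers are only indicative.

```python
import numpy as np
import scipy.stats as scs

# Simulated fat-tailed sample for illustration only (seed and distribution are assumptions)
rng = np.random.default_rng(1)
fat_tailed_sample = rng.standard_t(df=3, size=5000)

# probplot returns the theoretical normal quantiles and the ordered sample values,
# plus a least-squares fit (slope, intercept, r) through those points
(theoretical_q, observed_q), (slope, intercept, r) = scs.probplot(fat_tailed_sample, dist="norm")

# Reference line, playing the role of the straight line on a Q-Q plot
expected_q = slope * theoretical_q + intercept

# Heavy tails: the smallest observation falls below the line, the largest above it
left_gap = observed_q[0] - expected_q[0]
right_gap = observed_q[-1] - expected_q[-1]
print("Left tail below the line: ", left_gap < 0)
print("Right tail above the line:", right_gap > 0)
```

A negative gap on the left and a positive gap on the right is the numerical counterpart of the S-shaped pattern seen when the tails are heavier than Gaussian.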

Inference – Jarque-Bera test

A high positive test statistic of 15116.82, with a p-value that rounds to 0, signifies that we can reject normality of the log returns distribution at the 1% significance level.
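
For readers who want to see where such a statistic comes from, here is a minimal sketch of the usual Jarque-Bera formula, JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and K the excess kurtosis. The helper name jarque_bera_manual and the simulated samples are assumptions made for illustration; the result is compared with scs.jarque_bera as a sanity check.

```python
import numpy as np
import scipy.stats as scs

def jarque_bera_manual(x):
    """Hypothetical helper: JB = n/6 * (S^2 + K^2/4), S = skewness, K = excess kurtosis."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    # Central moments (biased estimators, dividing by n)
    m2 = np.mean(centered**2)
    m3 = np.mean(centered**3)
    m4 = np.mean(centered**4)
    skewness = m3 / m2**1.5
    excess_kurtosis = m4 / m2**2 - 3.0
    return n / 6.0 * (skewness**2 + excess_kurtosis**2 / 4.0)

# Simulated samples for illustration only (not MSFT data)
rng = np.random.default_rng(42)
normal_sample = rng.normal(size=2500)
fat_tailed_sample = rng.standard_t(df=3, size=2500)

jb_normal = jarque_bera_manual(normal_sample)
jb_fat = jarque_bera_manual(fat_tailed_sample)
scipy_stat, scipy_p = scs.jarque_bera(normal_sample)

print("Manual vs scipy (normal sample):", round(jb_normal, 4), round(scipy_stat, 4))
print("Fat-tailed statistic is larger: ", jb_fat > jb_normal)
```

Because the statistic scales with the sample size n and with the squared excess kurtosis, a long daily series with fat tails can easily produce a value in the thousands.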

Exploring Log Returns Distributions – Verdict

We can clearly state that the Log Returns series of MSFT stock does not follow a Gaussian distribution. The negative skewness, the high positive kurtosis value and the high Jarque-Bera test statistic all support that. So, our Stylized fact 1 gets checked here.

So guys, in this tutorial we learnt about Stylized fact 1 and different ways of checking it. In the next tutorial, we will learn about Stylized fact 2: Are volatility clusters formed in the returns chart? Stay tuned! Also, don’t forget to subscribe to our YouTube channel.

 

 
