# Markowitz Theory using Python: FA5

Hi ML Enthusiasts! In the previous tutorial, we learnt about Markowitz portfolio theory. In this part 5 of our Financial Analytics series, we will build the efficient frontier using Markowitz theory and implement it in Python. In case you’re new to this series, we suggest you start from part 1 of the Financial Analytics series.

### Markowitz Theory using Python – Real-life scenario

We will look at what share and stock profiles look like in real life. In real life, shares with higher expected returns carry higher risk, i.e., a higher standard deviation, while shares with lower expected returns carry lower risk.

For this post, we will use the following values corresponding to shares X and Y.

| Parameter | Share X | Share Y |
| --- | --- | --- |
| Expected return | 9% | 11% |
| Standard deviation | 8% | 10% |
| Correlation (X, Y) | 20% | |

After applying the same methodologies and formulae that we used in our previous post, we get the following table, the only difference being that we now use weight steps of 10%:

| Portfolio No. | Weight X | Weight Y | Expected Return | Standard Deviation |
| --- | --- | --- | --- | --- |
| Portfolio 1 | 100% | 0% | 9.00% | 8.0% |
| Portfolio 2 | 90% | 10% | 9.20% | 7.5% |
| Portfolio 3 | 80% | 20% | 9.40% | 7.1% |
| Portfolio 4 | 70% | 30% | 9.60% | 6.9% |
| Portfolio 5 | 60% | 40% | 9.80% | 6.8% |
| Portfolio 6 | 50% | 50% | 10.00% | 7.0% |
| Portfolio 7 | 40% | 60% | 10.20% | 7.3% |
| Portfolio 8 | 30% | 70% | 10.40% | 7.8% |
| Portfolio 9 | 20% | 80% | 10.60% | 8.5% |
| Portfolio 10 | 10% | 90% | 10.80% | 9.2% |
| Portfolio 11 | 0% | 100% | 11.00% | 10.0% |
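
These figures follow from the two-asset formulas E(R_p) = w_X·E(R_X) + w_Y·E(R_Y) and σ_p² = w_X²σ_X² + w_Y²σ_Y² + 2·w_X·w_Y·ρ·σ_X·σ_Y. A minimal sketch that reproduces the table from the parameters above:

```python
import numpy as np

# Inputs from the parameter table above (in %)
mu_x, mu_y = 9.0, 11.0   # expected returns of X and Y
sd_x, sd_y = 8.0, 10.0   # standard deviations of X and Y
rho = 0.20               # correlation between X and Y

def portfolio_stats(w_x):
    """Expected return and standard deviation for a weight w_x in share X."""
    w_y = 1 - w_x
    exp_ret = w_x * mu_x + w_y * mu_y
    variance = (w_x**2 * sd_x**2 + w_y**2 * sd_y**2
                + 2 * w_x * w_y * rho * sd_x * sd_y)
    return exp_ret, np.sqrt(variance)

# Recompute the 11 portfolios in 10% weight steps
for w in np.arange(1.0, -0.05, -0.1):
    r, s = portfolio_stats(round(w, 1))
    print(f"Weight X: {w:.0%}  E(R): {r:.2f}%  SD: {s:.1f}%")
```

For example, the 50/50 portfolio gives an expected return of 10.00% and a standard deviation of exactly 7.0%, matching Portfolio 6 in the table.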

Let’s start analyzing this with Python code.

Let’s first import all the libraries we will need for this analysis.

In :
```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

Next, we will convert the values in the Expected Return column into a list and assign this list to a variable, ExpectedReturn. The same will be done for StandardDeviation.

In :
```
ExpectedReturn = [9.00, 9.20, 9.40, 9.60, 9.80, 10.00, 10.20, 10.40, 10.60, 10.80, 11.00]
StandardDeviation = [8.0, 7.5, 7.1, 6.9, 6.8, 7.0, 7.3, 7.8, 8.5, 9.2, 10.0]
```

We will now use the matplotlib library of Python to create a scatter plot, with standard deviation on the x-axis and expected return on the y-axis.

In :
```
plt.scatter(StandardDeviation, ExpectedReturn)
plt.xlabel("Standard deviation (in %)")
plt.ylabel("Expected return (in %)")
plt.title("Markowitz Portfolio Analysis")
plt.show()
```

### Markowitz Theory using Python – Efficient and inefficient frontiers

From the above chart, we can see that for the same values of standard deviation, we get both higher and lower values of returns – these points correspond to standard deviations of 7, 7.5, and 8, for which the curve forms a parabola. The portfolios on the lower branch, with the same standard deviation but lower returns, form the inefficient frontier, and the points below an expected return of roughly 9.75% should be avoided. All points above this value form the efficient frontier.
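
The boundary between the two branches is the minimum-variance portfolio, which for two assets has the closed form w_X* = (σ_Y² − ρσ_Xσ_Y) / (σ_X² + σ_Y² − 2ρσ_Xσ_Y). A quick sketch with the parameters above confirms the turning point sits just below the 9.75% mark:

```python
import numpy as np

# Parameters of shares X and Y (in %)
mu_x, mu_y = 9.0, 11.0
sd_x, sd_y, rho = 8.0, 10.0, 0.20

cov_xy = rho * sd_x * sd_y  # 16.0

# Closed-form weight of X at the minimum-variance point
w_x = (sd_y**2 - cov_xy) / (sd_x**2 + sd_y**2 - 2 * cov_xy)
w_y = 1 - w_x

min_var_return = w_x * mu_x + w_y * mu_y
min_var_sd = np.sqrt(w_x**2 * sd_x**2 + w_y**2 * sd_y**2
                     + 2 * w_x * w_y * cov_xy)

print(f"Minimum-variance weights: X = {w_x:.1%}, Y = {w_y:.1%}")
print(f"Turning point: E(R) = {min_var_return:.2f}%, SD = {min_var_sd:.2f}%")
```

This gives weights of roughly 63.6% in X and 36.4% in Y, with an expected return of about 9.73% and a standard deviation of about 6.82%, consistent with the parabola's turning point in the chart.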

Let’s now import the pandas_datareader library and start implementing everything we have learnt so far.

In :
```
from pandas_datareader import data as dr
```

We will use the historical data of The Procter & Gamble Company and Microsoft Corporation in our analysis.

In :
```
stocks = ['PG', 'MSFT']
```

Fetching the last 10 years of data – Adj Close figures for each of these companies:

In :
```
stock_data = pd.DataFrame()
for s in stocks:
    stock_data[s] = dr.DataReader(s, data_source='yahoo', start='2009-12-31')['Adj Close']
```
In :
```
stock_data.head()
```
Out:
```
                   PG       MSFT
Date
2009-12-31  43.997097  23.925440
2010-01-04  44.352673  24.294369
2010-01-05  44.367176  24.302216
2010-01-06  44.156742  24.153070
2010-01-07  43.917282  23.901886
```
In :
```
stock_data.tail()
```
Out:
```
                    PG        MSFT
Date
2020-04-03  115.080002  153.830002
2020-04-06  117.809998  165.270004
2020-04-07  112.769997  163.490005
2020-04-08  115.099998  165.130005
2020-04-09  114.660004  165.139999
```

Let’s now normalize this data and look at the line charts to see their trends and compare them with each other.

In :
```
(stock_data/stock_data.iloc[0] * 100).plot(figsize = (10, 8))
```
Out:

From the above chart, we can see that Microsoft had exponential growth, while PG had approximately linear growth with a low slope. Till 2015, PG had the upper hand; after 2015, Microsoft, owing to its exponential rate, did wonders. In 2020, both saw a downward trend (COVID-19 impact?).
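
The normalization divides every price by the first observation and scales to a base of 100, so both series start from the same point and can be compared. A sketch on a small hypothetical price frame:

```python
import pandas as pd

# Hypothetical prices for two tickers over three days
prices = pd.DataFrame({"PG": [40.0, 44.0, 42.0],
                       "MSFT": [20.0, 25.0, 30.0]})

# Divide by the first row and scale to a base of 100
normalized = prices / prices.iloc[0] * 100
print(normalized)
```

Both columns start at 100, after which the relative growth of each series is directly visible (e.g. MSFT ends at 150, a 50% gain).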

Let’s obtain the logarithmic returns figures for both of them.

In :

```
logReturns = np.log(stock_data/stock_data.shift(1))
logReturns
```
Out:
```
                  PG      MSFT
Date
2009-12-31       NaN       NaN
2010-01-04  0.008049  0.015302
2010-01-05  0.000327  0.000323
2010-01-06 -0.004754 -0.006156
2010-01-07 -0.005438 -0.010454
...              ...       ...
2020-04-03  0.005926 -0.009253
2020-04-06  0.023446  0.071732
2020-04-07 -0.043723 -0.010829
2020-04-08  0.020451  0.009981
2020-04-09 -0.003830  0.000061

2586 rows × 2 columns
```
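
As a sanity check, a one-period log return is ln(p_t / p_{t−1}); on a tiny hypothetical price series:

```python
import numpy as np
import pandas as pd

# Hypothetical prices: +10% day 1, -10% day 2
prices = pd.Series([100.0, 110.0, 99.0])
log_returns = np.log(prices / prices.shift(1))

print(log_returns)
```

The first entry is NaN (there is no previous price to compare against), matching the NaN row for 2009-12-31 above, and the second entry is ln(110/100) = ln(1.1) ≈ 0.0953.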

In :
```
# To obtain annual average returns
logReturns.mean() * 250
```
Out:
```
PG      0.092635
MSFT    0.186833
dtype: float64
```
In :
```
# To obtain the annual covariance between PG and Microsoft
logReturns.cov() * 250
```
Out:
```
            PG      MSFT
PG    0.029078  0.019192
MSFT  0.019192  0.061942
```

From the above, we can see that the annual average return of Microsoft, looking at the past 10 years of data, comes out to be 18.7%, while that of PG comes out to be 9.26%, about half that of Microsoft.

The covariance of PG and MSFT returns is 0.0192. The variance of MSFT is 0.062 and that of PG is 0.029.

Let’s now compute the correlation matrix for both of them.

In :
```
stock_data.corr()
```
Out:
```
            PG      MSFT
PG    1.000000  0.925072
MSFT  0.925072  1.000000
```

From the above, we can see that there is a strong relationship of 92.5% (well above 30%) between PG and MSFT prices.
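
Note that `stock_data.corr()` measures the correlation of price levels, which both trend upward over the decade. The correlation of the daily returns themselves can be recovered from the annualized covariance matrix above via ρ = cov / (σ_PG · σ_MSFT); a sketch using those reported figures:

```python
import numpy as np

# Annualized figures from the covariance matrix reported above
cov_pg_msft = 0.019192
var_pg, var_msft = 0.029078, 0.061942

# Correlation = covariance divided by the product of standard deviations
corr_returns = cov_pg_msft / np.sqrt(var_pg * var_msft)
print(f"Correlation of log returns: {corr_returns:.2%}")
```

The returns correlation comes out to roughly 45%, noticeably lower than the 92.5% price correlation, which is expected since two upward-trending price series correlate strongly even when their day-to-day moves do not.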

Let’s now start creating the efficient frontier.

In :
```
# Number of stocks, used to generate weights dynamically
numberOfStocks = len(stocks)
numberOfStocks
```
Out:
`2`
In :
```
# Creating random weights
# np.random.random generates two floats in [0, 1)
weights = np.random.random(numberOfStocks)
weights
```
Out:
`array([0.19018562, 0.93358835])`
In :
```
weights.sum()
```
Out:
`1.1237739704342409`
In :
```
weights = weights/np.sum(weights)
weights
```
Out:
`array([0.16923832, 0.83076168])`
In :
```
weights.sum()
```
Out:
`1.0`

We see that the weights array now sums to 1, or 100%. The weight of the first stock, i.e., PG, is 16.92% and that of MSFT is 83.08%.
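
The same normalization works for any number of stocks: dividing by the sum guarantees the weights total 100%. A sketch with four stocks and a fixed seed (the seed value is an arbitrary choice here, only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
raw = rng.random(4)              # four stocks instead of two
weights = raw / raw.sum()        # normalize so the weights sum to 1

print(weights)
print(weights.sum())
```

Whatever the raw draws are, the normalized weights are non-negative and sum to exactly 100%.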

### Calculating expected return of portfolio

In :
```
(weights * logReturns.mean()).sum() * 250
```
Out:
`0.17089096007095564`

Thus, the expected return of the portfolio with these weights comes out to be 17.09%.
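
The figure is just the weight-weighted sum of the annualized mean returns. A check using the annual means printed earlier (9.26% for PG, 18.68% for MSFT) with an illustrative 25/75 split:

```python
import numpy as np

annual_means = np.array([0.092635, 0.186833])  # PG, MSFT (from logReturns.mean() * 250 above)
weights = np.array([0.25, 0.75])               # illustrative 25/75 split

# Dot product of weights and annualized mean returns
portfolio_return = weights @ annual_means
print(f"Expected portfolio return: {portfolio_return:.2%}")
```

With this split the expected return lands at about 16.33%, between the two single-stock extremes of 9.26% and 18.68%, as it must for any convex combination of weights.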

### Expected standard deviation or volatility

In :
```
np.sqrt(np.dot(weights.T, np.dot(logReturns.cov() * 250, weights)))
```
Out:
`0.22131358215073907`

We can see that the standard deviation, or volatility, of the portfolio comes out to be 22.13%, which is quite high.
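
The variance behind this number is the quadratic form wᵀΣw, where Σ is the annualized covariance matrix. Plugging in the covariance figures and the normalized weights from above reproduces the 22.13% result:

```python
import numpy as np

# Annualized covariance matrix, from logReturns.cov() * 250 above
cov_annual = np.array([[0.029078, 0.019192],
                       [0.019192, 0.061942]])
weights = np.array([0.16923832, 0.83076168])  # normalized weights from above

# Portfolio variance is the quadratic form w^T . Sigma . w
variance = weights @ cov_annual @ weights
volatility = np.sqrt(variance)
print(f"Portfolio volatility: {volatility:.2%}")
```

The cross term 2·w₁·w₂·cov(PG, MSFT) is what diversification acts on: the lower the covariance, the more the combined volatility drops below the weighted average of the individual volatilities.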

### Simulation with same stocks but different weights

We run this simulation to find the optimal sets of weights, for which the standard deviation, or volatility, is minimized and the expected return is maximized. Let’s do this for 100 different sets of weights first.

In :
```
# Creating blank lists
expectedReturn = []
standardDeviation = []
weightList0 = []
weightList1 = []

# Running simulations to find optimal weights
for i in range(100):
    weights = np.random.random(numberOfStocks)
    weights = weights / weights.sum()
    weightList0.append(weights[0])
    weightList1.append(weights[1])
    expectedReturn.append((weights * logReturns.mean()).sum() * 250)
    standardDeviation.append(np.sqrt(np.dot(weights.T, np.dot(logReturns.cov() * 250, weights))))

# Converting lists into arrays
weightList0 = np.array(weightList0)  # Weights for PG
weightList1 = np.array(weightList1)  # Weights for MSFT
expectedReturn = np.array(expectedReturn)
standardDeviation = np.array(standardDeviation)

# Creating a dataframe
df = pd.DataFrame({"Weight of PG": weightList0, "Weight of MSFT": weightList1, "Expected Return": expectedReturn, "Standard deviation": standardDeviation})
df.head()
```
Out:
```
   Weight of PG  Weight of MSFT  Expected Return  Standard deviation
0      0.278827        0.721173         0.160568            0.205413
1      0.525801        0.474199         0.137304            0.177590
2      0.291572        0.708428         0.159367            0.203684
3      0.517516        0.482484         0.138084            0.178302
4      0.369174        0.630826         0.152058            0.193782
```

Let’s now plot this on a scatter chart.

In :
```
plt.figure(figsize=(14, 10), dpi=80)
plt.scatter(df["Standard deviation"], df["Expected Return"])
plt.xlabel("Standard deviation")
plt.ylabel("Expected return (in %)")
plt.show()
```

From the above chart, we see that values above an expected return of 0.11, or 11%, correspond to the efficient frontier and those below 11% correspond to the inefficient frontier.

The chart also states the familiar trade-off – if you want a greater return, you will have to take greater risk! If you’re a risk-averse person, take the weights corresponding to an expected return of around 11%. Let’s see the values corresponding to them.

In :
```
df[(df["Expected Return"]>0.11) & (df["Expected Return"]< 0.12)].sort_values(by=['Expected Return'])
```
Out:
```
    Weight of PG  Weight of MSFT  Expected Return  Standard deviation
57      0.775649        0.224351         0.113769            0.165201
48      0.773329        0.226671         0.113987            0.165229
64      0.756733        0.243267         0.115550            0.165478
88      0.740753        0.259247         0.117056            0.165800
12      0.737312        0.262688         0.117380            0.165880
8       0.721501        0.278499         0.118869            0.166295
51      0.713798        0.286202         0.119595            0.166525
```
In :
```
df[(df["Expected Return"]>0.11)].sort_values(by=['Expected Return']).head(10)
```
Out:
```
    Weight of PG  Weight of MSFT  Expected Return  Standard deviation
57      0.775649        0.224351         0.113769            0.165201
48      0.773329        0.226671         0.113987            0.165229
64      0.756733        0.243267         0.115550            0.165478
88      0.740753        0.259247         0.117056            0.165800
12      0.737312        0.262688         0.117380            0.165880
8       0.721501        0.278499         0.118869            0.166295
51      0.713798        0.286202         0.119595            0.166525
87      0.700007        0.299993         0.120894            0.166983
96      0.698075        0.301925         0.121076            0.167052
56      0.691362        0.308638         0.121708            0.167301
```

Thus, we can see that on the efficient frontier, for an expected return of 11.38%, the standard deviation is 16.52%, which corresponds to weights of 77.56% in PG and 22.44% in MSFT shares.
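
Once the simulated portfolios sit in a dataframe, the lowest-risk row can be pulled out directly with `idxmin` instead of scanning a sorted table by eye. A sketch on a small hypothetical frame with the same columns:

```python
import pandas as pd

# Hypothetical simulation results with the same columns as df above
df = pd.DataFrame({
    "Weight of PG":       [0.78, 0.50, 0.20],
    "Weight of MSFT":     [0.22, 0.50, 0.80],
    "Expected Return":    [0.114, 0.140, 0.169],
    "Standard deviation": [0.165, 0.178, 0.219],
})

# Row with the smallest volatility
safest = df.loc[df["Standard deviation"].idxmin()]
print(safest)
```

The same pattern works on the real simulation output, and the filter can be combined with a return band (as in the query above) before taking the minimum.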

### Finding the optimal portfolio

In :
```
df["Expected Return"].mean()
```
Out:
`0.1387349843777325`
In :
```
df["Expected Return"].median()
```
Out:
`0.14025710287549686`
In :
```
df[(df["Expected Return"]>0.135)].sort_values(by=['Expected Return'])
```
Out:
```
    Weight of PG  Weight of MSFT  Expected Return  Standard deviation
25      0.543636        0.456364         0.135624            0.176117
5       0.542631        0.457369         0.135718            0.176198
35      0.538627        0.461373         0.136095            0.176522
1       0.525801        0.474199         0.137304            0.177590
75      0.525679        0.474321         0.137315            0.177600
47      0.523680        0.476320         0.137503            0.177770
22      0.520882        0.479118         0.137767            0.178010
3       0.517516        0.482484         0.138084            0.178302
63      0.515741        0.484259         0.138251            0.178456
49      0.495760        0.504240         0.140133            0.180253
82      0.493133        0.506867         0.140381            0.180497
81      0.483274        0.516726         0.141310            0.181426
80      0.464489        0.535511         0.143079            0.183260
98      0.463382        0.536618         0.143183            0.183371
16      0.461050        0.538950         0.143403            0.183605
60      0.451849        0.548151         0.144270            0.184541
58      0.450599        0.549401         0.144387            0.184670
70      0.447576        0.552424         0.144672            0.184982
79      0.445431        0.554569         0.144874            0.185205
69      0.445005        0.554995         0.144914            0.185250
90      0.433947        0.566053         0.145956            0.186417
26      0.430959        0.569041         0.146238            0.186737
20      0.430628        0.569372         0.146269            0.186773
85      0.429980        0.570020         0.146330            0.186843
59      0.427723        0.572277         0.146542            0.187086
27      0.427393        0.572607         0.146573            0.187122
15      0.412033        0.587967         0.148020            0.188810
71      0.401164        0.598836         0.149044            0.190035
62      0.395195        0.604805         0.149606            0.190718
28      0.376870        0.623130         0.151333            0.192861
66      0.375135        0.624865         0.151496            0.193068
41      0.373676        0.626324         0.151633            0.193242
4       0.369174        0.630826         0.152058            0.193782
38      0.363711        0.636289         0.152572            0.194442
86      0.363361        0.636639         0.152605            0.194484
74      0.362230        0.637770         0.152712            0.194622
33      0.359815        0.640185         0.152939            0.194916
46      0.357472        0.642528         0.153160            0.195203
40      0.348384        0.651616         0.154016            0.196325
68      0.345886        0.654114         0.154251            0.196637
31      0.339572        0.660428         0.154846            0.197428
34      0.339189        0.660811         0.154882            0.197477
17      0.322243        0.677757         0.156478            0.199640
2       0.291572        0.708428         0.159367            0.203684
0       0.278827        0.721173         0.160568            0.205413
6       0.270825        0.729175         0.161322            0.206512
95      0.256047        0.743953         0.162714            0.208568
7       0.249317        0.750683         0.163348            0.209517
92      0.240060        0.759940         0.164220            0.210832
89      0.228268        0.771732         0.165330            0.212527
44      0.184636        0.815364         0.169441            0.218975
32      0.175740        0.824260         0.170279            0.220322
42      0.135386        0.864614         0.174080            0.226564
55      0.126267        0.873733         0.174939            0.228003
76      0.125248        0.874752         0.175035            0.228165
61      0.116925        0.883075         0.175819            0.229488
37      0.077115        0.922885         0.179569            0.235927
97      0.074217        0.925783         0.179842            0.236403
84      0.053124        0.946876         0.181829            0.239893
18      0.048360        0.951640         0.182277            0.240687
```
In :
```
df.loc[63]
```
Out:
```
Weight of PG          0.515741
Weight of MSFT        0.484259
Expected Return       0.138251
Standard deviation    0.178456
Name: 63, dtype: float64
```
In :
```
df.loc[49]
```
Out:
```
Weight of PG          0.495760
Weight of MSFT        0.504240
Expected Return       0.140133
Standard deviation    0.180253
Name: 49, dtype: float64
```

Going by both the mean and the median of the efficient frontier, we can see that the weight of PG varies from 51.57% to 49.58% and that of MSFT from 48.43% to 50.42%. This variation leads to an expected return from 13.83% to 14.01%, with volatility from 17.85% to 18.03%.

A point worth mentioning here is that the volatility of MSFT is higher than that of PG; PG is more stable than MSFT. So, as we reduce PG’s weight and increase MSFT’s weight, we are not only increasing returns, we are also increasing the volatility of the portfolio.

So guys, with this we conclude this tutorial on Financial Analytics. In the next part, we will be diving deeper into the world of Financial Analytics. Stay tuned!