Site icon Machine Learning For Analytics

Different Financial Data Sources: FA6

Different Financial Data Sources: Fetching data

Hi All! Till now, we have mainly worked on fetching financial data from yahoo finance using pandas_datareader library. In this post, we will learn about other well-known different financial data sources from where we can get financial data and how to use them to fetch data in Python. In case you are new to this series, please go to part 1 of Financial Analytics series to learn the basics.

Different Financial Data Sources – The sources

The sources we will identify in this post are:

  • Yahoo finance
  • Quandl
  • Intrinio

Different Financial Data Sources – Yahoo finance

Besides pandas_datareader library of Python, we can use yfinance and yahoofinancials library of Python to fetch the historical stocks data. Let’s learn how to use yfinance to fetch the historical stocks data of Microsoft Corporation.

1. Download and install the libraries

For doing this, go to Anaconda prompt and type pip install package_name
For installing yfinance, type pip install yfinance in Anaconda prompt.

2. Import the libraries

In [1]:
import pandas as pd
import yfinance as yf

3. Download the data

Here, we are going to use download function of yfinance library to fetch the data.

In [4]:
msftStockData = yf.download( 'MSFT',
                        start = '2010-01-01',
                        end = '2020-04-14',
                        progress = False)

4. Viewing and summarizing the data

In [5]:
msftStockData.head()
Out[5]:
Open High Low Close Adj Close Volume
Date
2009-12-31 30.980000 30.990000 30.480000 30.480000 23.925440 31929700
2010-01-04 30.620001 31.100000 30.590000 30.950001 24.294369 38409100
2010-01-05 30.850000 31.100000 30.639999 30.959999 24.302216 49749600
2010-01-06 30.879999 31.080000 30.520000 30.770000 24.153070 58182400
2010-01-07 30.629999 30.700001 30.190001 30.450001 23.901886 50559700
In [6]:
msftStockData.tail()
Out[6]:
Open High Low Close Adj Close Volume
Date
2020-04-06 160.320007 166.500000 157.580002 165.270004 165.270004 67111700
2020-04-07 169.589996 170.000000 163.259995 163.490005 163.490005 62769000
2020-04-08 165.669998 166.669998 163.500000 165.130005 165.130005 48318200
2020-04-09 166.360001 167.369995 163.330002 165.139999 165.139999 51431800
2020-04-13 164.350006 165.570007 162.300003 165.509995 165.509995 41814200
In [7]:
msftStockData.describe()
Out[7]:
Open High Low Close Adj Close Volume
count 2587.000000 2587.000000 2587.000000 2587.000000 2587.000000 2.587000e+03
mean 59.209161 59.739065 58.651179 59.226374 55.346432 4.018791e+07
std 37.752028 38.141107 37.321394 37.757021 39.124660 2.304514e+07
min 23.090000 23.320000 22.730000 23.010000 18.228165 7.425600e+06
25% 29.685000 29.900000 29.440000 29.714999 24.443306 2.434560e+07
50% 45.450001 45.950001 45.110001 45.540001 40.846195 3.440140e+07
75% 74.244999 74.660000 73.824997 74.299999 71.416775 5.026130e+07
max 190.649994 190.699997 186.470001 188.699997 188.185989 3.193179e+08

More points to know

  • In the download function, we have to define the ticker (‘MSFT’ for Microsoft,’PG’ for The Procter and Gamble Company, ‘AAPL’ for Apple etc.) as the first argument.
  • If we don’t define the start parameter (starting date), it tries to fetch all the data since 1950.
  • List of multiple tickers can be passed as the first argument.
  • If you want to download only the Adjusted price (Adj Close), use auto_adjust = True.
  • progress parameter is for providing us with the progress bar.
  • end denotes the end date.

Quandl

Quandl is another provider of historical financial data. In order to use Quandl via Python, first of all, we need to go to https://www.quandl.com/ and create our profile there. After that, we need to get the api key from our personal profile at https://www.quandl.com/account/profile in order to fetch data from Python.

Drawback of using Quandl – It contains all historical data till April 2018. You will not be able to find any data after April 2018.

1. Importing the libraries

In [9]:
import quandl

2. Key Authentication

In [11]:
quandl.ApiConfig.api_key = key

3. Download the data

In [13]:
msftQuandlData = quandl.get(dataset='WIKI/MSFT',
                            start_date='2010-01-01',
                            end_date='2020-04-14')

4. Viewing and summarizing the data

In [14]:
msftQuandlData.head()
Out[14]:
Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
Date
2010-01-04 30.62 31.10 30.59 30.950 38409100.0 0.0 1.0 24.885276 25.275379 24.860895 25.153472 38409100.0
2010-01-05 30.85 31.10 30.64 30.960 49749600.0 0.0 1.0 25.072201 25.275379 24.901531 25.161599 49749600.0
2010-01-06 30.88 31.08 30.52 30.770 58182400.0 0.0 1.0 25.096582 25.259125 24.804005 25.007183 58182400.0
2010-01-07 30.63 30.70 30.19 30.452 50559700.0 0.0 1.0 24.893404 24.950294 24.535810 24.748741 50559700.0
2010-01-08 30.28 30.88 30.24 30.660 51197400.0 0.0 1.0 24.608954 25.096582 24.576445 24.917785 51197400.0
In [15]:
msftQuandlData.tail()
Out[15]:
Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
Date
2018-03-21 92.930 94.050 92.21 92.48 23753263.0 0.0 1.0 92.930 94.050 92.21 92.48 23753263.0
2018-03-22 91.265 91.750 89.66 89.79 37578166.0 0.0 1.0 91.265 91.750 89.66 89.79 37578166.0
2018-03-23 89.500 90.460 87.08 87.18 42159397.0 0.0 1.0 89.500 90.460 87.08 87.18 42159397.0
2018-03-26 90.610 94.000 90.40 93.78 55031149.0 0.0 1.0 90.610 94.000 90.40 93.78 55031149.0
2018-03-27 94.940 95.139 88.51 89.47 53704562.0 0.0 1.0 94.940 95.139 88.51 89.47 53704562.0
In [16]:
msftQuandlData.describe()
Out[16]:
Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
count 2071.000000 2071.000000 2071.000000 2071.000000 2.071000e+03 2071.000000 2071.0 2071.000000 2071.000000 2071.000000 2071.000000 2.071000e+03
mean 42.862722 43.218251 42.502944 42.879908 4.252287e+07 0.004119 1.0 39.748998 40.077317 39.417026 39.765121 4.252287e+07
std 17.405500 17.524218 17.278283 17.405875 2.400108e+07 0.034688 0.0 18.706579 18.837182 18.567043 18.707100 2.400108e+07
min 23.090000 23.320000 22.730000 23.010000 7.425503e+06 0.000000 1.0 18.937293 19.125928 18.642039 18.871681 7.425503e+06
25% 28.325000 28.575000 28.060000 28.340000 2.627534e+07 0.000000 1.0 24.116187 24.274529 23.833306 24.058958 2.627534e+07
50% 37.870000 38.135000 37.540000 37.860000 3.713556e+07 0.000000 1.0 34.230102 34.552500 34.010999 34.290807 3.713556e+07
75% 52.620000 53.085000 52.125000 52.715000 5.250545e+07 0.000000 1.0 50.343515 50.877178 49.989786 50.434308 5.250545e+07
max 97.000000 97.240000 96.040000 96.770000 3.193179e+08 0.420000 1.0 97.000000 97.240000 96.040000 96.770000 3.193179e+08

More points to note

  • You can define tickers in the form ‘WIKI/ticker_symbol’.
  • You can pass the list of tickets in the form [‘WIKI/ticker1’, ‘WIKI/ticker2’]
  • You can use collapse parameter to define if you want data on daily, weekly, monthly or yearly basis.

Intrinio

Intrinio is a paid database with some features coming under free access. From Intrinio, one can also get some pre-calculated KPIs such as Moving Averages etc. Like Quandl, one will have to make a profile on Intrinio to access the data using Python via API. Go to https://intrinio.com/ and sign up and copy the API key.

1. Import libraries

In [18]:
import intrinio_sdk

2. Key authentication

In [20]:
intrinio_sdk.ApiClient().configuration.api_key['api_key'] = 'key'
In [21]:
security_api = intrinio_sdk.SecurityApi()

3. Fetching the data

In [24]:
fetch = security_api.get_security_stock_prices(identifier='MSFT',
                                               start_date='2010-01-01',
                                               end_date='2020-04-14',
                                               frequency='weekly',
                                               page_size=10)
In [25]:
fetch
Out[25]:
{'next_page': 'MjAyMC0wMi0xNHwxMzQ4MDY0ODgxOQ==',
 'security': {'code': 'EQS',
              'company_id': 'com_qzEByJ',
              'composite_figi': 'BBG000BPH459',
              'composite_ticker': 'MSFT:US',
              'currency': 'USD',
              'figi': 'BBG000BPHFS9',
              'id': 'sec_XaL6mg',
              'name': 'Microsoft Corporation',
              'share_class_figi': 'BBG001S5TD05',
              'ticker': 'MSFT'},
 'stock_prices': [{'adj_close': 165.51,
                   'adj_high': 165.57,
                   'adj_low': 162.3,
                   'adj_open': 164.35,
                   'adj_volume': 41905264.0,
                   'close': 165.51,
                   'date': datetime.date(2020, 4, 13),
                   'frequency': 'weekly',
                   'high': 165.57,
                   'intraperiod': True,
                   'low': 162.3,
                   'open': 164.35,
                   'volume': 41905264.0},
                  {'adj_close': 165.14,
                   'adj_high': 170.0,
                   'adj_low': 157.58,
                   'adj_open': 160.32,
                   'adj_volume': 229630744.0,
                   'close': 165.14,
                   'date': datetime.date(2020, 4, 10),
                   'frequency': 'weekly',
                   'high': 170.0,
                   'intraperiod': False,
                   'low': 157.58,
                   'open': 160.32,
                   'volume': 229630744.0},
                  {'adj_close': 153.83,
                   'adj_high': 164.78,
                   'adj_low': 150.01,
                   'adj_open': 152.44,
                   'adj_volume': 290191457.0,
                   'close': 153.83,
                   'date': datetime.date(2020, 4, 3),
                   'frequency': 'weekly',
                   'high': 164.78,
                   'intraperiod': False,
                   'low': 150.01,
                   'open': 152.44,
                   'volume': 290191457.0},
                  {'adj_close': 149.7,
                   'adj_high': 156.66,
                   'adj_low': 132.52,
                   'adj_open': 137.01,
                   'adj_volume': 358866845.0,
                   'close': 149.7,
                   'date': datetime.date(2020, 3, 27),
                   'frequency': 'weekly',
                   'high': 156.66,
                   'intraperiod': False,
                   'low': 132.52,
                   'open': 137.01,
                   'volume': 358866845.0},
                  {'adj_close': 137.35,
                   'adj_high': 150.15,
                   'adj_low': 135.0,
                   'adj_open': 140.0,
                   'adj_volume': 421347734.0,
                   'close': 137.35,
                   'date': datetime.date(2020, 3, 20),
                   'frequency': 'weekly',
                   'high': 150.15,
                   'intraperiod': False,
                   'low': 135.0,
                   'open': 140.0,
                   'volume': 421347734.0},
                  {'adj_close': 158.83,
                   'adj_high': 161.91,
                   'adj_low': 138.58,
                   'adj_open': 151.0,
                   'adj_volume': 378231775.0,
                   'close': 158.83,
                   'date': datetime.date(2020, 3, 13),
                   'frequency': 'weekly',
                   'high': 161.91,
                   'intraperiod': False,
                   'low': 138.58,
                   'open': 151.0,
                   'volume': 378231775.0},
                  {'adj_close': 161.57,
                   'adj_high': 175.0,
                   'adj_low': 156.0,
                   'adj_open': 165.31,
                   'adj_volume': 313160520.0,
                   'close': 161.57,
                   'date': datetime.date(2020, 3, 6),
                   'frequency': 'weekly',
                   'high': 175.0,
                   'intraperiod': False,
                   'low': 156.0,
                   'open': 165.31,
                   'volume': 313160520.0},
                  {'adj_close': 162.01,
                   'adj_high': 174.84,
                   'adj_low': 152.0,
                   'adj_open': 167.77,
                   'adj_volume': 383019972.0,
                   'close': 162.01,
                   'date': datetime.date(2020, 2, 28),
                   'frequency': 'weekly',
                   'high': 174.84,
                   'intraperiod': False,
                   'low': 152.0,
                   'open': 167.77,
                   'volume': 383019972.0},
                  {'adj_close': 178.59,
                   'adj_high': 188.18,
                   'adj_low': 177.25,
                   'adj_open': 185.099426373978,
                   'adj_volume': 143313345.0,
                   'close': 178.59,
                   'date': datetime.date(2020, 2, 21),
                   'frequency': 'weekly',
                   'high': 188.18,
                   'intraperiod': False,
                   'low': 177.25,
                   'open': 185.605,
                   'volume': 143313345.0},
                  {'adj_close': 184.845120974203,
                   'adj_high': 190.180547989104,
                   'adj_low': 181.354654702772,
                   'adj_open': 183.079942316936,
                   'adj_volume': 194512444.0,
                   'close': 185.35,
                   'date': datetime.date(2020, 2, 14),
                   'frequency': 'weekly',
                   'high': 190.7,
                   'intraperiod': False,
                   'low': 181.85,
                   'open': 183.58,
                   'volume': 194512444.0}]}

As can be seen above, this is a JSON object (dictionary) having 10 key: value pairs. Let’s convert this into a dataframe.

4. Converting fetched dictionary to dataframe

In [27]:
fetch_list = [f.to_dict() for f in fetch.stock_prices]
In [28]:
#Provides list of dictionaries for each week
fetch_list
Out[28]:
[{'date': datetime.date(2020, 4, 13),
  'intraperiod': True,
  'frequency': 'weekly',
  'open': 164.35,
  'high': 165.57,
  'low': 162.3,
  'close': 165.51,
  'volume': 41905264.0,
  'adj_open': 164.35,
  'adj_high': 165.57,
  'adj_low': 162.3,
  'adj_close': 165.51,
  'adj_volume': 41905264.0},
 {'date': datetime.date(2020, 4, 10),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 160.32,
  'high': 170.0,
  'low': 157.58,
  'close': 165.14,
  'volume': 229630744.0,
  'adj_open': 160.32,
  'adj_high': 170.0,
  'adj_low': 157.58,
  'adj_close': 165.14,
  'adj_volume': 229630744.0},
 {'date': datetime.date(2020, 4, 3),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 152.44,
  'high': 164.78,
  'low': 150.01,
  'close': 153.83,
  'volume': 290191457.0,
  'adj_open': 152.44,
  'adj_high': 164.78,
  'adj_low': 150.01,
  'adj_close': 153.83,
  'adj_volume': 290191457.0},
 {'date': datetime.date(2020, 3, 27),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 137.01,
  'high': 156.66,
  'low': 132.52,
  'close': 149.7,
  'volume': 358866845.0,
  'adj_open': 137.01,
  'adj_high': 156.66,
  'adj_low': 132.52,
  'adj_close': 149.7,
  'adj_volume': 358866845.0},
 {'date': datetime.date(2020, 3, 20),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 140.0,
  'high': 150.15,
  'low': 135.0,
  'close': 137.35,
  'volume': 421347734.0,
  'adj_open': 140.0,
  'adj_high': 150.15,
  'adj_low': 135.0,
  'adj_close': 137.35,
  'adj_volume': 421347734.0},
 {'date': datetime.date(2020, 3, 13),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 151.0,
  'high': 161.91,
  'low': 138.58,
  'close': 158.83,
  'volume': 378231775.0,
  'adj_open': 151.0,
  'adj_high': 161.91,
  'adj_low': 138.58,
  'adj_close': 158.83,
  'adj_volume': 378231775.0},
 {'date': datetime.date(2020, 3, 6),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 165.31,
  'high': 175.0,
  'low': 156.0,
  'close': 161.57,
  'volume': 313160520.0,
  'adj_open': 165.31,
  'adj_high': 175.0,
  'adj_low': 156.0,
  'adj_close': 161.57,
  'adj_volume': 313160520.0},
 {'date': datetime.date(2020, 2, 28),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 167.77,
  'high': 174.84,
  'low': 152.0,
  'close': 162.01,
  'volume': 383019972.0,
  'adj_open': 167.77,
  'adj_high': 174.84,
  'adj_low': 152.0,
  'adj_close': 162.01,
  'adj_volume': 383019972.0},
 {'date': datetime.date(2020, 2, 21),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 185.605,
  'high': 188.18,
  'low': 177.25,
  'close': 178.59,
  'volume': 143313345.0,
  'adj_open': 185.099426373978,
  'adj_high': 188.18,
  'adj_low': 177.25,
  'adj_close': 178.59,
  'adj_volume': 143313345.0},
 {'date': datetime.date(2020, 2, 14),
  'intraperiod': False,
  'frequency': 'weekly',
  'open': 183.58,
  'high': 190.7,
  'low': 181.85,
  'close': 185.35,
  'volume': 194512444.0,
  'adj_open': 183.079942316936,
  'adj_high': 190.180547989104,
  'adj_low': 181.354654702772,
  'adj_close': 184.845120974203,
  'adj_volume': 194512444.0}]
In [29]:
#Converting each dictionary to a row of dataframe with each key as column and then sorting by date
msftIntrinio = pd.DataFrame(fetch_list).sort_values('date')
In [30]:
msftIntrinio.head()
Out[30]:
date intraperiod frequency open high low close volume adj_open adj_high adj_low adj_close adj_volume
9 2020-02-14 False weekly 183.580 190.70 181.85 185.35 194512444.0 183.079942 190.180548 181.354655 184.845121 194512444.0
8 2020-02-21 False weekly 185.605 188.18 177.25 178.59 143313345.0 185.099426 188.180000 177.250000 178.590000 143313345.0
7 2020-02-28 False weekly 167.770 174.84 152.00 162.01 383019972.0 167.770000 174.840000 152.000000 162.010000 383019972.0
6 2020-03-06 False weekly 165.310 175.00 156.00 161.57 313160520.0 165.310000 175.000000 156.000000 161.570000 313160520.0
5 2020-03-13 False weekly 151.000 161.91 138.58 158.83 378231775.0 151.000000 161.910000 138.580000 158.830000 378231775.0
In [31]:
#Renaming row indices to date in place of row numbers
msftIntrinio.set_index('date', inplace=True)
In [32]:
msftIntrinio.head()
Out[32]:
intraperiod frequency open high low close volume adj_open adj_high adj_low adj_close adj_volume
date
2020-02-14 False weekly 183.580 190.70 181.85 185.35 194512444.0 183.079942 190.180548 181.354655 184.845121 194512444.0
2020-02-21 False weekly 185.605 188.18 177.25 178.59 143313345.0 185.099426 188.180000 177.250000 178.590000 143313345.0
2020-02-28 False weekly 167.770 174.84 152.00 162.01 383019972.0 167.770000 174.840000 152.000000 162.010000 383019972.0
2020-03-06 False weekly 165.310 175.00 156.00 161.57 313160520.0 165.310000 175.000000 156.000000 161.570000 313160520.0
2020-03-13 False weekly 151.000 161.91 138.58 158.83 378231775.0 151.000000 161.910000 138.580000 158.830000 378231775.0

So guys, In this tutorial, we learnt about different sources from where we can get historical financial data and the ways to access them via API. With this, I conclude this tutorial. In our next tutorial we will learn about Adjusted Returns including Inflation. Stay tuned for more such informative tutorials! Please don’t forget to subscribe our channel here to learn Power BI, DAX, Python and develop foundation for Business intelligence and data science.

Exit mobile version