MorningStar.com scraper that consolidates tens of thousands of financial records into a SQLite relational database. Class 'dataframes' easily converts the SQLite data into pandas DataFrames (see Jupyter notebook for examples)
msTables was a MorningStar.com scraper written in python that fetches, parses and stores financial and market data for over 70k securities into a relational SQLite database. The scraper provided a Command Line Interface (CLI) that allows the user greater flexibility for creating and managing multiple .sqlite files. Once data has been downloaded into the database files, dataframes.py module can be used to easily create DataFrame objects from the database tables for further analysis.
The scraper had to change starting in 2016 and finally in 2017 Morningstar paywalled access to MsRatio data seeMorningstar API. We are expanding upon the work that caiobran did for us to include multiple API's, pandas datareader, and local financial analysis for multiple functions. See input/api.json for the complete list of URL's.
IMPORTANT: The Morningstar.com data is protected under "Copyright (c) 2018 Morningstar. All rights reserved." This tool should be for personal purposes only. See the following links for more information regarding Morningstar.com terms & conditions:
As a fan of Benjamin Graham's value investing, I have always searched for sources of consolidated financial data that would allow me to identify 'undervalued' companies from a large pool of global public stocks. However, most (if not all) financial services that provide such data consolidation are not free and, as a small retail investor, I was not willing to pay for their fees. In fact, most of the data I needed was already available for free on various financial website, just not in a consolidated format. Therefore, I decided to create a web scraper for MorningStar.com, which is the website that I found to have the most available data in a more standardized and structured format. MS was also one of the only website services that published free financial performance data for the past 10 yrs, while most sites only provided free data for last 5 yrs.
The scraper should run on any Linux distribution that has Python3 and the following modules installed:
To view the notebook with data visualization examples mentioned in the instructions below, you must also have Jupyter and matplotlib installed.
Open a Linux terminal in the desired installation directory and execute git clone https://github.com/caiobran/msTables.git
to download the project files.
Execute python main.py
from the project root directory to start the scraper CLI. If the program has started correctly, you should see the following interface:
1
to create the initial SQLite database tables.2
to download the latest data from the MorningStar URL's.
1000000
if you would like the scraper to update all records. You may also enter smaller quantities if you do not want the scraper to run for a long period of time.(documentation in progress, to be updated with instructions on remaining actions)
The scraper will automatically create a directory db/ in the root folder to store the .sqlite database files generated. The current file name in use will be displayed on the scraper CLI under action 0
(see CLI figure above). Database files will contain a relational database with the following main tables:
Database Tables
How to slice and dice the data using dataframes.py
Module dataframes contains a class that can be used to generate pandas DataFrames for the data in the SQLite database file that is generated by the web crawler.
See Jupyter notebook data_overview.ipynb for examples on how to create DataFrame objects to manipulate and visualize the data. Below is a list of all content found in the notebook:
Juypter Notebook Content
quoteheader
- MorningStar (MS) Quote Header
valuation
- MS Valuation table with Price Ratios (P/E, P/S, P/B, P/C) for the past 10 yrs
keyratios
- MS Ratio - Key Financial Ratios & Values
finhealth
- MS Ratio - Financial Health
profitability
- MS Ratio - Profitability
growth
- MS Ratio - Growth
cfhealth
- MS Ratio - Cash Flow Health
efficiency
- MS Ratio - Efficiency
annualIS
- MS Annual Income Statements
quarterlyIS
- MS Quarterly Income Statements
annualBS
- MS Annual Balance Sheets
quarterlyBS
- MS Quarterly Balance Sheets
annualCF
- MS Annual Cash Flow Statements
quarterlyCF
- MS Quarterly Cash Flow Statements
insider_trades
- Insider transactions for the past year
Below are sample snip-its of code from data_overview.ipynb:
Count of records downloaded from Morningstar.com by security type:
Plot of average US stocks P/E by sector for the past 10 years:
Applying fundamental rules to screen the list of stocks (see sample output):
Copyright (c) 2019 Caio Brandao
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.