Scrape, analyze, and visualize stock market data for the S&P 500 using Python. Build a basic trading strategy that uses machine learning to assess company performance and decide whether to buy, sell, or hold. The README and instructions are also available in Spanish. This is a working repo, with plans to expand the project from technical analysis to fundamental analysis.
Use Python to scrape data and join it with financial data from the Yahoo Finance API (or another finance API). Use data manipulation and visualization for financial and investment analysis (e.g., comparing rates of return, calculating risk, building trading algorithms, and making investment decisions).
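As a rough illustration of "compare rates of return" and "calculate risk", here is a minimal sketch using pandas and numpy. The tickers `AAA` and `BBB`, the prices, and the helper names are all made up for the example, and the 252 trading days per year used for annualizing is a common convention assumed here, not something taken from the project code.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for two made-up tickers (not real market data).
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0, 105.0], "BBB": [50.0, 49.0, 51.0, 52.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="B"),
)


def simple_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Period-over-period return P_t / P_(t-1) - 1, without the undefined first row."""
    return (prices / prices.shift(1) - 1).dropna()


def total_return(prices: pd.DataFrame) -> pd.Series:
    """Return over the whole window: last price / first price - 1."""
    return prices.iloc[-1] / prices.iloc[0] - 1


def annualized_volatility(returns: pd.DataFrame, periods_per_year: int = 252) -> pd.Series:
    """Sample standard deviation of returns, scaled to a yearly horizon."""
    return returns.std() * np.sqrt(periods_per_year)


returns = simple_returns(prices)
summary = pd.DataFrame(
    {
        "total_return": total_return(prices),
        "annualized_vol": annualized_volatility(returns),
    }
)
print(summary)
```

Putting return and volatility side by side in one table makes it easy to compare tickers on both dimensions at once.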
Use the Stock_Market_Data_Analysis.ipynb file to run the program in Jupyter Notebook. Use the ".py" file (Stock_Market_DataAnalysis_DataVisualization.py) to run the program with Python alone, without Jupyter.
For a walkthrough of the project (how to install Python, the necessary packages, and Jupyter, and how to run the notebook file), please see my Demo Screen Recording.
pip install pandas
pip install pandas-datareader
pip install beautifulsoup4
pip install scikit-learn
pip install numpy
pip install matplotlib
pip install mplfinance
pip install mpl-finance
pip install yfinance
pip install jupyter
Once everything is installed, change directory (cd) to navigate to where the project has been downloaded.
Locate the ".ipynb" file Stock_Market_Data_Analysis.ipynb and run Jupyter with the command jupyter notebook in your terminal or cmd; this opens the project in Jupyter Notebook in a browser.
Once Jupyter opens in the browser, you should see the Stock_Market_Data_Analysis.ipynb notebook file. Double click to open the file.
To run the program, select "Cell," then "Run All."
For additional charts, please run the Stock_Market_Data_Analysis_DataVisualization.ipynb file in Jupyter Notebook. Here I focus mainly on data visualizations for large cap tech stocks (e.g., Apple, Google, Facebook) and model various chart types. I also begin the next part of the project: Fundamental Analysis. In this part, we will be using financial statements from the SEC EDGAR website.
The script scrapes the S&P 500 tickers, pulls financial data from the Yahoo Finance API, and saves it to a CSV file. It also cleans and manipulates the data and merges multiple data frames into one large CSV file. The script uses for loops, dictionaries, and error handling. Additional data visualization lives in the Stock_Market_Data_Analysis_DataVisualization.ipynb Jupyter Notebook file, which uses matplotlib to build various stock charts (e.g., line charts, bar charts, moving average bar charts, candlestick charts). Additional features are highlighted below:
pandas
pandas-datareader
beautifulsoup4
scikit-learn
numpy
yfinance (or another finance API)
jupyter
matplotlib
mplfinance, mpl_finance
pip install --upgrade mplfinance
import requests
import datetime as dt
import os
import io
import pandas as pd
import pandas_datareader.data as web
import pandas.plotting
from pandas.plotting import register_matplotlib_converters
import pandas.testing #pandas.testing.assert_frame_equal
from pandas.testing import assert_frame_equal #assert_frame_equal
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
import matplotlib.colors as mcolors
from matplotlib import style
import mplfinance as mpf
import mpl_finance as mplf
from mpl_finance import candlestick_ohlc
import collections
from collections import Counter
import sklearn
from sklearn import svm, neighbors
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
style.use('ggplot')
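The `Counter`, `LinearSVC`, `neighbors`, `RandomForestClassifier`, `VotingClassifier`, and `train_test_split` imports above suggest a buy/sell/hold classifier built from several voting models. The following is only a sketch of how such pieces might fit together, not the project's actual strategy: the 2% threshold, the label encoding (1 = buy, -1 = sell, 0 = hold), and all function names are assumptions made for illustration. The scikit-learn imports sit inside the functions so the labeling part runs even where scikit-learn is not installed.

```python
from collections import Counter


def label_buy_sell_hold(future_returns, threshold=0.02):
    """Map each forward-looking return to 1 (buy), -1 (sell), or 0 (hold)."""
    labels = []
    for r in future_returns:
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def build_voting_classifier():
    """Combine three scikit-learn models; each casts one vote per prediction."""
    from sklearn import neighbors
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.svm import LinearSVC

    return VotingClassifier(
        [
            ("lsvc", LinearSVC()),
            ("knn", neighbors.KNeighborsClassifier()),
            ("rfor", RandomForestClassifier()),
        ]
    )


def train_and_score(X, y, seed=0):
    """Fit on 75% of the rows; return accuracy and predicted label counts on the rest."""
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    clf = build_voting_classifier()
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test), Counter(clf.predict(X_test))


# Hypothetical forward returns (made-up numbers, not market data).
future_returns = [0.05, -0.03, 0.01, 0.0, -0.025, 0.021]
labels = label_buy_sell_hold(future_returns)
print("Label spread:", Counter(labels))
```

Checking the label spread with `Counter` before training is a cheap way to see whether one class dominates, which would make a high accuracy score misleading.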
BeautifulSoup
Matplotlib
yfinance for financial data.
Pandas to join stock tickers with financial data.
NOTE: If you are new to Python, check out the Python Programming Fundamentals website for tutorials. Work through them up to and including installing Python packages and modules with pip.
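The scrape-and-join steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script itself: `SAMPLE_HTML` is a made-up stand-in for a downloaded page, the tickers are fictional, and `parse_tickers` and `merge_close_prices` are hypothetical helper names. It does show the same ingredients the script is described as using: a for loop, a dictionary of data frames, and error handling.

```python
from typing import Dict, List

import pandas as pd

# Hypothetical stand-in for a downloaded page that lists index constituents.
SAMPLE_HTML = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>AAA</td><td>Example Corp A</td></tr>
  <tr><td>BBB</td><td>Example Corp B</td></tr>
</table>
"""


def parse_tickers(html: str) -> List[str]:
    """Read the first cell of every data row in the first table on the page."""
    from bs4 import BeautifulSoup  # beautifulsoup4, from the install list

    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    tickers = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if cells:
            tickers.append(cells[0].get_text(strip=True))
    return tickers


def merge_close_prices(frames: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Join each ticker's 'Close' column into one wide frame, skipping bad inputs."""
    columns = []
    for ticker, df in frames.items():
        try:
            columns.append(df["Close"].rename(ticker))
        except KeyError:
            print(f"Skipping {ticker}: no 'Close' column")
    # concat with axis=1 aligns on the date index and keeps every date (outer join).
    return pd.concat(columns, axis=1) if columns else pd.DataFrame()


# Made-up per-ticker frames; in practice these would hold downloaded price data.
dates = pd.date_range("2020-01-01", periods=3, freq="D")
frames = {
    "AAA": pd.DataFrame({"Close": [10.0, 11.0]}, index=dates[:2]),
    "BBB": pd.DataFrame({"Close": [20.0, 21.0]}, index=dates[1:]),
    "CCC": pd.DataFrame({"Price": [1.0]}, index=dates[:1]),  # malformed on purpose
}
merged = merge_close_prices(frames)
print(merged)
```

Calling `parse_tickers(SAMPLE_HTML)` gives the ticker symbols from the sample table, and `merged.to_csv(...)` with a file name of your choice would write the joined frame to disk.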
There is a new matplotlib finance (mplfinance) API that makes creating financial plots easier. Updates include more automatic behavior for the user and improved interfacing with Pandas DataFrames.
According to the matplotlib/mplfinance repo, the conventional way to import the new API is as follows:
import mplfinance as mpf
The most common usage is then to call
mpf.plot(data)
where data is a Pandas DataFrame object that contains Open, High, Low and Close pricing data and uses a Pandas DatetimeIndex.
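To make the expected input shape concrete, here is a small sketch that builds a DataFrame in that layout and passes it to `mpf.plot`. The prices are synthetic random numbers, and `make_sample_ohlc`, `plot_candles`, and the output file name are hypothetical names chosen for the example. The `type`, `mav`, and `savefig` keyword arguments belong to mplfinance's `plot` function; the import is placed inside `plot_candles` so the data-building part runs without mplfinance installed.

```python
import numpy as np
import pandas as pd


def make_sample_ohlc(days: int = 30, seed: int = 0) -> pd.DataFrame:
    """Build synthetic OHLC rows on a DatetimeIndex (made-up numbers, not market data)."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2020-01-01", periods=days, freq="B")  # business days
    close = 100 + rng.normal(0, 1, days).cumsum()
    open_ = np.concatenate(([100.0], close[:-1]))  # each open is the previous close
    high = np.maximum(open_, close) + rng.uniform(0, 1, days)
    low = np.minimum(open_, close) - rng.uniform(0, 1, days)
    return pd.DataFrame(
        {"Open": open_, "High": high, "Low": low, "Close": close}, index=index
    )


def plot_candles(data: pd.DataFrame, out_path: str = "sample_candles.png") -> None:
    """Draw a candlestick chart with 5- and 10-period moving averages and save it."""
    import mplfinance as mpf

    mpf.plot(data, type="candle", mav=(5, 10), savefig=out_path)


data = make_sample_ohlc()
print(data.head())
```

Running `plot_candles(data)` afterwards writes the chart to a PNG file instead of opening a window, which is convenient when working outside a notebook.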
Further details on how to call the new API can be found under Basic Usage in the matplotlib/mplfinance repo, as well as in the Jupyter notebooks in its examples folder.
365 Careers (2020). Python for Finance: Investment Fundamentals & Data Analytics
B., V. (2019). Stock Market Data and Analysis in Python
Boller, K. (2018). Python for Finance: Stock Portfolio Analyses
Danielsson, J. (2011). Financial Risk Forecasting
Dhiman, A. (2019). Stock Market Data Analysis with Python
Efstathopoulos, G. (2019). Python for Finance, Part I: Yahoo & Google Finance API, Pandas, and Matplotlib
Huang, S. (2019). Best 5 Free Stock Market APIs in 2020
IEX Cloud (2020). IEX Cloud API
Kharkar, R. (2020). How to Get Stock Data Using Python
Lewinson, E. (2020). A Comprehensive Guide to Downloading Stock Prices in Python
Matplotlib (2020). Matplotlib Documentation
Miller, C. (2018). Introduction to Stock Market Data Analysis with Python
Miller, C. (2019). Packt Publishing: Training Your Systems with Python Statistical Modeling
O'Keefe, C. (2020). Practical Introduction to Web Scraping in Python
Pandas (2020). Pandas Documentation
Python Programming (2020). Pythonprogramming.net
Python Fundamentals > Basics Complete through how to install python packages and programs using pip (about 10-11 tutorials)
Quandl (2020). Quandl API: Core Financial Data
Quandl (2020). Get Financial Data Directly into Python
Vaidyanathan, V. (2020). Coursera: Investment Management with Python and Machine Learning Specialization
Vaidyanathan, V. (2020b). Coursera: Course 1 - Introduction to Portfolio Construction and Analysis with Python
Vaidyanathan, V. (2020c). Coursera: Course 2 - Advanced Portfolio Construction and Analysis with Python
Vaidyanathan, V. (2020d). Coursera: Course 3 - Python and Machine Learning for Asset Management
Vaidyanathan, V. (2020e). Coursera: Course 4 - Python and Machine Learning for Asset Management with Alternative Data Sets