Library to download, analyze, and visualize events in Major League Baseball games
This package fetches and parses event data for Major League Baseball games. Game objects created by the functions whose names end in _from_url (such as get_game_from_url) pull data from MLB endpoints, where events are published within about 30 seconds of occurring. A zip file of XML/JSON source data contains event data for MLB games from 1974 to 2020.
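Because the _from_url functions read from endpoints that are updated within about 30 seconds of each event, a live game can be followed by re-fetching it on a timer. The sketch below is a generic helper written for illustration only: poll_for_updates is hypothetical and not part of the library, and the 30-second default interval simply mirrors the publication delay quoted above.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")
_NOTHING = object()  # sentinel so that a first result of None is still yielded


def poll_for_updates(fetch: Callable[[], T], interval_s: float = 30.0,
                     max_polls: int = 10) -> Iterator[T]:
    """Call fetch() up to max_polls times, sleeping interval_s seconds
    between calls, and yield each result that differs from the previous one.

    Hypothetical helper for illustration; not part of the baseball package.
    """
    last = _NOTHING
    for i in range(max_polls):
        current = fetch()
        if last is _NOTHING or current != last:
            yield current
            last = current
        if i < max_polls - 1:  # no need to sleep after the final poll
            time.sleep(interval_s)
```

For example, passing `lambda: baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)[1].json()` as fetch would yield a new JSON string each time the published game data changes. Comparing the json() output rather than the Game objects avoids relying on how Game implements equality.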
Install with pip:
pip3 install baseball
Or install from source:
git clone git@github.com:benjamincrom/baseball.git
cd baseball/
python3 setup.py install
Fetch an object which contains metadata and events for a single MLB game.
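import baseball
# Arguments: date, away team code, home team code, game number
game_id, game = baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)
game_dict = game._asdict()  # the game as a Python dict
game_json_str = game.json()  # the game serialized as a JSON string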
Write scorecard as SVG image:
with open(game_id + '.svg', 'w') as fh:
    fh.write(game.get_svg_str())
Output file: 2017-11-01-HOU-LAD-1.svg (scorecard image)
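The example output filename suggests that a game_id has the form YYYY-MM-DD-AWAY-HOME-N: the zero-padded date, the away and home team codes, and the game number (presumably distinguishing the games of a doubleheader). The helpers below are a hypothetical sketch based only on that observed format; GameKey, make_game_id and parse_game_id are not part of the library. They can be handy for naming output files or grouping games without re-fetching them.

```python
from datetime import date
from typing import NamedTuple


class GameKey(NamedTuple):
    """Hypothetical container for the parts of a game_id."""
    game_date: date
    away_code: str
    home_code: str
    game_number: int


def make_game_id(key: GameKey) -> str:
    # Zero-padded ISO date, then away code, home code and game number.
    return "{}-{}-{}-{}".format(key.game_date.isoformat(), key.away_code,
                                key.home_code, key.game_number)


def parse_game_id(game_id: str) -> GameKey:
    # Assumes team codes never contain '-', so splitting yields six fields.
    year, month, day, away, home, number = game_id.split("-")
    return GameKey(date(int(year), int(month), int(day)), away, home, int(number))
```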
Fetch a list of (game_id, Game) tuples from downloaded files, where each Game contains metadata and events for a single MLB game.
First, download and unzip the source data zip file:
wget https://spaces-host.nyc3.digitaloceanspaces.com/livebaseballscorecards-artifacts/baseball_files_2008-2017.zip
unzip baseball_files_2008-2017.zip -d ./baseball_files_2008-2017
Then load the files in Python with this library:
import baseball
game_tuple_list = baseball.get_game_list_from_file_range('1-1-2017', '12-31-2017', 'baseball_files_2008-2017')
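Since each element of game_tuple_list is a (game_id, Game) tuple, the SVG-writing step shown earlier extends to a whole range of games. A minimal sketch, assuming only the get_svg_str() method used above; write_scorecards is a hypothetical helper name. Because it accepts any iterable, it should work equally with the generator returned by get_game_generator_from_file_range, which presumably avoids holding every game in memory at once.

```python
import os
from typing import Iterable, List, Protocol, Tuple


class HasSvg(Protocol):
    """Anything with a get_svg_str() method, such as a Game object."""
    def get_svg_str(self) -> str: ...


def write_scorecards(game_tuples: Iterable[Tuple[str, HasSvg]],
                     output_dir: str) -> List[str]:
    """Write one <game_id>.svg file per (game_id, game) tuple; return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for game_id, game in game_tuples:
        path = os.path.join(output_dir, game_id + ".svg")
        with open(path, "w") as fh:
            fh.write(game.get_svg_str())
        paths.append(path)
    return paths
```

Usage: `write_scorecards(game_tuple_list, 'scorecards_2017')`.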
Additional functions:
get_game_generator_from_file_range(start_date_str, end_date_str, input_dir)
Returns a generator that yields (game_id, Game) tuples.
get_game_xml_from_url(date_str, away_code, home_code, game_number)
Returns game_id and three strings containing XML documents: (game_id, boxscore_raw_xml, players_raw_xml, inning_raw_xml)
get_game_from_xml_strings(boxscore_raw_xml, players_raw_xml, inning_raw_xml)
Returns a Game object if the XML provides enough information to build one; otherwise returns None.
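Splitting fetching (get_game_xml_from_url) from parsing (get_game_from_xml_strings) means the raw XML can be saved once and re-parsed later without contacting the MLB endpoints again. A sketch of such a cache using only the standard library; the directory layout, the file names, and the helpers save_raw_xml and load_raw_xml are all hypothetical.

```python
import os
from typing import Tuple

# Hypothetical on-disk layout: one directory per game_id holding the three
# documents returned by get_game_xml_from_url, in the same order.
XML_NAMES = ("boxscore.xml", "players.xml", "inning.xml")


def save_raw_xml(cache_dir: str, game_id: str, boxscore_raw_xml: str,
                 players_raw_xml: str, inning_raw_xml: str) -> None:
    game_dir = os.path.join(cache_dir, game_id)
    os.makedirs(game_dir, exist_ok=True)
    docs = (boxscore_raw_xml, players_raw_xml, inning_raw_xml)
    for name, text in zip(XML_NAMES, docs):
        with open(os.path.join(game_dir, name), "w", encoding="utf-8") as fh:
            fh.write(text)


def load_raw_xml(cache_dir: str, game_id: str) -> Tuple[str, str, str]:
    """Return (boxscore_raw_xml, players_raw_xml, inning_raw_xml) from the cache."""
    game_dir = os.path.join(cache_dir, game_id)
    texts = []
    for name in XML_NAMES:
        with open(os.path.join(game_dir, name), encoding="utf-8") as fh:
            texts.append(fh.read())
    return texts[0], texts[1], texts[2]
```

Usage: `game_id, *xml = baseball.get_game_xml_from_url('2017-11-1', 'HOU', 'LAD', 1)` followed by `save_raw_xml('xml_cache', game_id, *xml)`; later, `game = baseball.get_game_from_xml_strings(*load_raw_xml('xml_cache', game_id))`, checking for None before using the result.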
write_games_for_date(this_datetime, output_dir)
Writes an SVG scorecard to output_dir for each game played on the given date.
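write_games_for_date handles a single day, so covering a stretch of the schedule means calling it once per date. A minimal date-iteration helper; daterange is hypothetical and not part of the library.

```python
from datetime import datetime, timedelta
from typing import Iterator


def daterange(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield each day from start to end, inclusive, one day apart."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

For example, `for day in daterange(datetime(2017, 10, 24), datetime(2017, 11, 1)): baseball.write_games_for_date(day, 'scorecards')` would write scorecards for every game in that span, assuming write_games_for_date accepts a datetime.datetime, as its parameter name suggests.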
Example: load the pitches from a game into a pandas DataFrame (in a Jupyter notebook):
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import baseball
# IPython/Jupyter magic to render plots inline; remove it when running a plain script
%matplotlib inline
game_id, game = baseball.get_game_from_url('11-1-2017', 'HOU', 'LAD', 1)
# Collect every pitch thrown in the top half of each inning (away team batting)
pitch_tuple_list = []
for inning in game.inning_list:
    for appearance in inning.top_half_appearance_list:
        for event in appearance.event_list:
            if isinstance(event, baseball.Pitch):
                pitch_tuple_list.append(
                    (str(appearance.pitcher),
                     event.pitch_description,
                     event.pitch_position,
                     event.pitch_speed,
                     event.pitch_type)
                )
data = pd.DataFrame(
    data=pitch_tuple_list,
    columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'],
)
data.head()
| | Pitcher | Pitch Description | Pitch Coordinate | Pitch Speed | Pitch Type |
|---|---|---|---|---|---|
| 0 | 21 Yu Darvish | Ball | (155.47, 160.83) | 96.0 | FF |
| 1 | 21 Yu Darvish | Called Strike | (107.0, 171.09) | 83.9 | FC |
| 2 | 21 Yu Darvish | In play, no out | (115.36, 183.1) | 83.9 | SL |
| 3 | 21 Yu Darvish | In play, run(s) | (80.06, 168.03) | 96.6 | FF |
| 4 | 21 Yu Darvish | Ball | (54.1, 216.52) | 84.6 | SL |
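The matplotlib imports in the example are not used by the snippet itself. One way to put them to work is sketched below: it rebuilds the five rows shown in the table by hand (so it runs without network access), averages Pitch Speed per Pitch Type, and scatters the Pitch Coordinate values grouped by pitch type. With the full DataFrame from the example, the groupby and plotting calls apply unchanged. The output file name is arbitrary, and the sketch assumes nothing about the coordinate system behind Pitch Coordinate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# The five rows from the table above, typed in so the sketch runs offline.
data = pd.DataFrame(
    data=[
        ("21 Yu Darvish", "Ball", (155.47, 160.83), 96.0, "FF"),
        ("21 Yu Darvish", "Called Strike", (107.0, 171.09), 83.9, "FC"),
        ("21 Yu Darvish", "In play, no out", (115.36, 183.1), 83.9, "SL"),
        ("21 Yu Darvish", "In play, run(s)", (80.06, 168.03), 96.6, "FF"),
        ("21 Yu Darvish", "Ball", (54.1, 216.52), 84.6, "SL"),
    ],
    columns=["Pitcher", "Pitch Description", "Pitch Coordinate", "Pitch Speed", "Pitch Type"],
)

# Mean speed and number of pitches for each pitch type.
speed_by_type = data.groupby("Pitch Type")["Pitch Speed"].agg(["mean", "count"])
print(speed_by_type)

# Split the coordinate tuples into separate x and y columns for plotting.
data["x"] = data["Pitch Coordinate"].map(lambda xy: xy[0])
data["y"] = data["Pitch Coordinate"].map(lambda xy: xy[1])

fig, ax = plt.subplots()
for pitch_type, group in data.groupby("Pitch Type"):
    ax.scatter(group["x"], group["y"], label=pitch_type)
ax.set_xlabel("Pitch Coordinate (x)")
ax.set_ylabel("Pitch Coordinate (y)")
ax.legend(title="Pitch Type")
fig.savefig("pitch_locations.png")
plt.close(fig)
```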