Baseball

Library to download, analyze, and visualize events in Major League Baseball games

This package fetches and parses event data for Major League Baseball games. Game objects created by the functions whose names end in _from_url pull data from MLB endpoints, where events are published within about 30 seconds of occurring. A zip file of the XML/JSON source data, containing event data from MLB games 1974 - 2020, is also available.

Installing from PyPI

pip3 install baseball

Installing from source

git clone git@github.com:benjamincrom/baseball.git
cd baseball/
python3 setup.py install

Fetch individual MLB game

  • get_game_from_url(date_str, away_code, home_code, game_number)

Fetch an object which contains metadata and events for a single MLB game.

import baseball
game_id, game = baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)
game_dict = game._asdict()
game_json_str = game.json()

Write scorecard as SVG image:

with open(game_id + '.svg', 'w') as fh:
    fh.write(game.get_svg_str())

Example output: 2017-11-01-HOU-LAD-1.svg

Fetch list of MLB games

  • get_game_list_from_file_range(start_date_str, end_date_str, input_dir)

Fetch a list of game objects which each contain metadata and events for a single MLB game.

First, download and unzip the source data zip file:

wget https://spaces-host.nyc3.digitaloceanspaces.com/livebaseballscorecards-artifacts/baseball_files_2008-2017.zip
unzip baseball_files_2008-2017.zip -d ./baseball_files_2008-2017

Then import the files in Python using this library:

import baseball
game_tuple_list = baseball.get_game_list_from_file_range('1-1-2017', '12-31-2017', 'baseball_files_2008-2017')
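
Once the list is loaded, it is often handy to narrow it to one team. The sketch below assumes each list element is a (game_id, Game) tuple, as the variable name and the generator description suggest, and uses the abbreviation attribute listed under Team. games_involving and _stub_game are hypothetical names, and the second sample ID is a placeholder rather than a real game.

```python
from types import SimpleNamespace


def games_involving(game_tuple_list, team_code):
    """Return {game_id: game} for games where team_code is home or away."""
    return {
        game_id: game
        for game_id, game in game_tuple_list
        if team_code in (game.away_team.abbreviation,
                         game.home_team.abbreviation)
    }


def _stub_game(away_code, home_code):
    # Stand-in exposing only the attributes used above, so the sketch
    # runs without the downloaded source files.
    return SimpleNamespace(
        away_team=SimpleNamespace(abbreviation=away_code),
        home_team=SimpleNamespace(abbreviation=home_code),
    )


sample_games = [
    ("2017-11-01-HOU-LAD-1", _stub_game("HOU", "LAD")),
    ("placeholder-NYY-BOS-1", _stub_game("NYY", "BOS")),  # placeholder ID
]

dodgers_games = games_involving(sample_games, "LAD")
```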

Get Game generator given target directory and date range

  • get_game_generator_from_file_range(start_date_str, end_date_str, input_dir)

    Returns generator which yields (game_id, Game) tuples
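
Because the generator yields one game at a time, a whole season can be summarized without holding every Game in memory. A rough sketch under that assumption follows. count_home_games and _fake_generator are hypothetical names; the fake generator stands in for the one returned by get_game_generator_from_file_range.

```python
from collections import Counter
from types import SimpleNamespace


def count_home_games(game_generator):
    """Tally games per home team while consuming the generator lazily."""
    counts = Counter()
    for _game_id, game in game_generator:
        counts[game.home_team.abbreviation] += 1
    return counts


def _fake_generator():
    # Stand-in yielding (game_id, Game-like) tuples with placeholder IDs.
    for game_id, home_code in [("g1", "LAD"), ("g2", "LAD"), ("g3", "HOU")]:
        home_team = SimpleNamespace(abbreviation=home_code)
        yield game_id, SimpleNamespace(home_team=home_team)


home_counts = count_home_games(_fake_generator())
```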

Get raw XML files for an individual MLB game

  • get_game_xml_from_url(date_str, away_code, home_code, game_number)

    Returns game_id and three strings containing XML documents: (game_id, boxscore_raw_xml, players_raw_xml, inning_raw_xml)

Convert XML documents into Game object

  • get_game_from_xml_strings(boxscore_raw_xml, players_raw_xml, inning_raw_xml)

    Returns a Game object if the XML documents provide enough information to create one; otherwise returns None.

Write scorecard SVGs for all MLB games on a given date

  • write_games_for_date(this_datetime, output_dir)

    Writes SVG files for all games played on the given date
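
To cover several days, the function can be called once per date. This is a sketch only: it assumes this_datetime is a datetime.datetime object, which the parameter name suggests but this README does not state. write_games_for_range is a hypothetical wrapper, and _record_call stands in for baseball.write_games_for_date so nothing is fetched or written here.

```python
from datetime import datetime, timedelta


def write_games_for_range(start, end, output_dir, write_func):
    """Call write_func(day, output_dir) for each day from start to end inclusive."""
    days_done = []
    day = start
    while day <= end:
        write_func(day, output_dir)
        days_done.append(day)
        day += timedelta(days=1)
    return days_done


calls = []


def _record_call(this_datetime, output_dir):
    # Stand-in writer that only records its arguments.
    calls.append((this_datetime, output_dir))


days = write_games_for_range(
    datetime(2017, 10, 30), datetime(2017, 11, 1), "scorecards", _record_call
)
```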

Game Class Structure

Game

  • away_batter_box_score_dict
  • away_pitcher_box_score_dict
  • away_team (Team)
  • away_team_stats
  • start_datetime
  • expected_start_datetime
  • game_date_str
  • home_batter_box_score_dict
  • home_pitcher_box_score_dict
  • home_team (Team)
  • home_team_stats
  • inning_list (Inning list)
  • end_datetime
  • location
  • attendance
  • weather
  • temp
  • timezone_str
  • is_postponed
  • is_suspended
  • is_doubleheader
  • is_today
  • get_svg_str()
  • json()
  • _asdict()

Team

  • abbreviation
  • batting_order_list_list (list of nine PlayerAppearance lists)
  • name
  • pitcher_list (PlayerAppearance list)
  • player_id_dict
  • player_last_name_dict
  • player_name_dict
  • _asdict()

Inning

  • bottom_half_appearance_list (PlateAppearance list)
  • bottom_half_inning_stats
  • top_half_appearance_list (PlateAppearance list)
  • top_half_inning_stats
  • _asdict()

PlateAppearance

  • start_datetime
  • end_datetime
  • batter (Player)
  • batting_team (Team)
  • error_str
  • event_list (list of Pitch, Pickoff, RunnerAdvance, Substitution, Switch objects)
  • got_on_base
  • hit_location
  • inning_outs
  • out_runners_list (Player list)
  • pitcher (Player)
  • plate_appearance_description
  • plate_appearance_summary
  • runners_batted_in_list (Player list)
  • scorecard_summary
  • scoring_runners_list (Player list)
  • _asdict()

Player

  • era
  • first_name
  • last_name
  • mlb_id
  • number
  • obp
  • slg
  • _asdict()

PlayerAppearance

  • start_inning_batter_num
  • start_inning_half
  • start_inning_num
  • end_inning_batter_num
  • end_inning_half
  • end_inning_num
  • pitcher_credit_code
  • player_obj (Player)
  • position
  • _asdict()

Pitch

  • pitch_datetime
  • pitch_description
  • pitch_position
  • pitch_speed
  • pitch_type
  • _asdict()

Pickoff

  • pickoff_description
  • pickoff_base
  • pickoff_was_successful
  • _asdict()

RunnerAdvance

  • runner_advance_datetime
  • run_description
  • runner (Player)
  • start_base
  • end_base
  • runner_scored
  • run_earned
  • is_rbi
  • _asdict()

Substitution

  • substitution_datetime
  • incoming_player (Player)
  • outgoing_player (Player)
  • batting_order
  • position
  • _asdict()

Switch

  • switch_datetime
  • player (Player)
  • old_position_num
  • new_position_num
  • new_batting_order
  • _asdict()
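
The structure above nests as Game, Inning, PlateAppearance, then events. As a sketch of how to walk it, the hypothetical count_pitch_types below tallies pitch types across both halves of every inning. It identifies pitches by the presence of a pitch_type attribute rather than by class, and it treats a missing half-inning list as empty; both are assumptions. The stub objects only mimic the attribute names listed above.

```python
from collections import Counter
from types import SimpleNamespace


def count_pitch_types(game):
    """Count pitch_type values over every plate appearance in the game."""
    counts = Counter()
    for inning in game.inning_list:
        halves = (inning.top_half_appearance_list,
                  inning.bottom_half_appearance_list)
        for appearance_list in halves:
            # Assumption: an unplayed half inning may be None or empty.
            for appearance in appearance_list or []:
                for event in appearance.event_list:
                    # Only Pitch lists a pitch_type attribute; events without
                    # it (or with a None value) are skipped.
                    pitch_type = getattr(event, "pitch_type", None)
                    if pitch_type is not None:
                        counts[pitch_type] += 1
    return counts


def _appearance(*events):
    return SimpleNamespace(event_list=list(events))


_stub_game = SimpleNamespace(inning_list=[
    SimpleNamespace(
        top_half_appearance_list=[
            _appearance(SimpleNamespace(pitch_type="FF"),
                        SimpleNamespace(pickoff_base="1B")),
        ],
        bottom_half_appearance_list=[
            _appearance(SimpleNamespace(pitch_type="SL"),
                        SimpleNamespace(pitch_type="FF")),
        ],
    ),
    SimpleNamespace(top_half_appearance_list=[_appearance()],
                    bottom_half_appearance_list=None),
])

pitch_counts = count_pitch_types(_stub_game)
```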

Analyze a game: 2017 World Series - Game 7

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

import baseball

%matplotlib inline

game_id, game = baseball.get_game_from_url('11-1-2017', 'HOU', 'LAD', 1)

pitch_tuple_list = []
for inning in game.inning_list:
    for appearance in inning.top_half_appearance_list:
        for event in appearance.event_list:
            if isinstance(event, baseball.Pitch):
                pitch_tuple_list.append(
                    (str(appearance.pitcher), 
                     event.pitch_description,
                     event.pitch_position,
                     event.pitch_speed,
                     event.pitch_type)
                )

data = pd.DataFrame(data=pitch_tuple_list, columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'])
data.head()
   Pitcher        Pitch Description  Pitch Coordinate  Pitch Speed  Pitch Type
0  21 Yu Darvish  Ball               (155.47, 160.83)  96.0         FF
1  21 Yu Darvish  Called Strike      (107.0, 171.09)   83.9         FC
2  21 Yu Darvish  In play, no out    (115.36, 183.1)   83.9         SL
3  21 Yu Darvish  In play, run(s)    (80.06, 168.03)   96.6         FF
4  21 Yu Darvish  Ball               (54.1, 216.52)    84.6         SL
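
A natural next step is to summarize the pitches, for example the average speed per pitch type. The sketch below does this with the standard library only, using the five rows shown above as sample input. mean_speed_by_type is a hypothetical helper, and skipping rows with a missing speed is an assumption about how incomplete data might look.

```python
from collections import defaultdict
from statistics import mean

# The five rows from the data.head() output above, in the same tuple
# layout as pitch_tuple_list.
sample_rows = [
    ("21 Yu Darvish", "Ball", (155.47, 160.83), 96.0, "FF"),
    ("21 Yu Darvish", "Called Strike", (107.0, 171.09), 83.9, "FC"),
    ("21 Yu Darvish", "In play, no out", (115.36, 183.1), 83.9, "SL"),
    ("21 Yu Darvish", "In play, run(s)", (80.06, 168.03), 96.6, "FF"),
    ("21 Yu Darvish", "Ball", (54.1, 216.52), 84.6, "SL"),
]


def mean_speed_by_type(pitch_rows):
    """Return {pitch_type: mean speed rounded to 2 decimals}."""
    speeds = defaultdict(list)
    for _pitcher, _description, _coordinate, speed, pitch_type in pitch_rows:
        if speed is not None:  # assumption: speed may be missing
            speeds[pitch_type].append(speed)
    return {ptype: round(mean(values), 2) for ptype, values in speeds.items()}


speed_summary = mean_speed_by_type(sample_rows)
```

With the DataFrame built above, data.groupby('Pitch Type')['Pitch Speed'].mean() should give the equivalent summary in pandas.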