nflfastR Versions

A Set of Functions to Efficiently Scrape NFL Play by Play Data

v4.6.1

4 months ago
  • The function calculate_series_conversion_rates() now correctly aggregates season level conversion rates. Performance has also been improved. (#440)
  • Adjusted test behavior at CRAN's request.

Thank you to @andrewtek, @gregalvi86, @Ic4ru5Wing, @JoeMarino2021, @jreddy1990, @marvin3FF, @mrcaseb, @RicShern, @SPNE, and @trivialfis for their questions, feedback, and contributions towards this release.

v4.6.0

6 months ago

New Features

  • nflfastR now fully supports loading raw pbp data from the local file system. The best way to use this feature is to set options("nflfastR.raw_directory" = {"your/local/directory"}). Alternatively, both build_nflfastR_pbp() and fast_scraper() support the argument dir which defaults to the above option. (#423)
  • Added the new function save_raw_pbp() which efficiently downloads raw play-by-play data and saves it to the local file system. This serves as a helper to set up the system for faster play-by-play parsing via the above functionality. (#423)
  • Added the new function missing_raw_pbp() that computes a vector of game IDs missing in the local raw play-by-play directory. (#423)
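
A minimal sketch of the local raw-data workflow described above, assuming nflfastR 4.6.0 or later is installed and the machine is online. The temporary directory and the game ID are illustrative choices, not recommendations; a real setup would point the option at a persistent path.

```r
# Use a throwaway directory for illustration only.
raw_dir <- file.path(tempdir(), "nflfastR_raw")
dir.create(raw_dir, showWarnings = FALSE, recursive = TRUE)

# Register the directory globally so the dir argument defaults to it.
options("nflfastR.raw_directory" = raw_dir)

# The calls below need the package and internet access, so they are
# guarded and wrapped in try().
if (requireNamespace("nflfastR", quietly = TRUE)) {
  try({
    # Download one game's raw play-by-play into the local directory.
    nflfastR::save_raw_pbp("2021_20_BUF_KC", dir = raw_dir)
    # Compute the game IDs that are not yet stored locally.
    missing_ids <- nflfastR::missing_raw_pbp(dir = raw_dir)
    # Parse play-by-play with the local directory passed explicitly.
    pbp <- nflfastR::build_nflfastR_pbp("2021_20_BUF_KC", dir = raw_dir)
  })
}
```

Passing dir explicitly is redundant once the option is set; it is shown here only to make the data flow visible.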

Minor Improvements and Bugfixes

  • The internal function get_pbp_nfl() now uses ifelse() instead of dplyr::if_else() to handle some null-checking, which fixes a bug found in the 2022_21_CIN_KC game.
  • The function calculate_player_stats() now summarises target share and air yards share correctly when called with argument weekly = FALSE (#413)
  • The function calculate_player_stats() now returns the opponent team when called with argument weekly = TRUE (#414)
  • The function calculate_player_stats_def() no longer errors when small subsets of pbp data are missing stats. (#415)
  • The function calculate_series_conversion_rates() no longer returns NA values if a small subset of pbp data is missing series on offense or defense. (#417)
  • fixed_drive now correctly increments on plays where posteam lost a fumble but remains posteam because defteam also lost a fumble during the same play. (#419)
  • nflfastR now fixes missing drive number counts in raw pbp data in order to provide accurate drive information. (#420)
  • nflfastR now returns correct kick_distance on all punts and kickoffs. (#422)
  • Decode player IDs in 2023 pbp. (#425)
  • Drop the pseudo plays TV Timeout and Two-Minute Warning. (#426)
  • Fix posteam on kickoffs and PATs following a defensive TD in 2023+ pbp. (#427)
  • calculate_player_stats() no longer counts lost fumbles on plays where a player fumbles, a teammate recovers and then loses a fumble to the defense. (#431)
  • The variables passer, receiver, and rusher no longer return NA on "abnormal" plays - like direct snaps, aborted snaps, laterals etc. - that resulted in a penalty. (#435)

Thank you to @903124, @ak47twq, @andrewtek, @darkhark, @dennisbrookner, @marvin3FF, @mistakia, @mrcaseb, @nicholasmendoza22, @rickstarblazer, @RileyJohnson22, and @tanho63 for their questions, feedback, and contributions towards this release.

v4.5.1

1 year ago
  • New implementation of tests to be able to identify breaking changes in reverse dependencies (#396, #406)
  • calculate_standings() no longer freezes when computing standings from schedules where some games are missing results, i.e. upcoming games.
  • Fixed a bug that caused problems with upcoming dplyr and tidyselect updates that weren't backward compatible.
  • Significant performance improvements of internal functions. (#402)
  • Wrap examples in try() to avoid CRAN problems. (#404)
  • Fixed a bug where calculate_standings() wasn't able to handle nflverse pbp data. (#404)

v4.5.0

1 year ago

New (experimental) functions

  • Added new function calculate_player_stats_def() that aggregates defensive player stats either at game level or overall. (#288)
  • Added the situation report nflverse_sitrep, which is an alias of the already available report()
  • Added new function calculate_player_stats_kicking() that aggregates player stats for field goals and extra points at game level or overall. (#381)
  • Added new function calculate_series_conversion_rates() that computes series conversion and series result rates at a game level or season level. (#393)

Bugfixes and Minor Improvements

  • Internal change to calculate_player_stats() that reflects new nflverse data infrastructure.
  • calculate_player_stats() now unifies player names and joins the following player information via nflreadr::load_players():
    • player_display_name - Full name of the player
    • position - Position of the player
    • position_group - Position group of the player
    • headshot_url - URL to a player headshot image
  • Make data work in 2022 (hopefully)
  • Fix Amon-Ra St. Brown breaking the name parser
  • Add gsis_id patch to clean_pbp().
  • Fixed a bug where calculate_player_stats_def() failed in situations where play-by-play data is missing certain stats. (#382)
  • Spot-fixing calculate_player_stats() for NA names.

v4.4.0

1 year ago

New Functions, Options, Data

  • Added new function calculate_standings() that computes regular season division standings and playoff seeds from nflverse data.
  • The database function update_db() now supports the option "nflfastR.dbdirectory" which can be used to set the directory of the nflfastR pbp database globally and independent of any project structure or working directories.
  • The embedded data frame ?teams_colors_logos has been updated to reflect the most recent team color themes and gained additional variables for conference and division as well as logo urls to the conference and league logos. (#290)
  • The embedded data frame ?teams_colors_logos has been updated with the Washington Commanders. (#312)
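
As a configuration sketch for the "nflfastR.dbdirectory" option mentioned above: the path is a hypothetical placeholder, and the update_db() call is left commented out because it needs a database driver and internet access.

```r
# Hypothetical path; replace with a directory that exists on your machine.
options("nflfastR.dbdirectory" = "path/to/pbp_database")

# With the option set, update_db() can be called without pointing it at a
# directory, regardless of the current working directory.
# nflfastR::update_db()
```

Placing the options() call in a user-level .Rprofile is one way to make the setting persist across sessions.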

Deprecation

  • The argument qs in the functions load_pbp() and load_player_stats() has been deprecated as of nflfastR 4.3.0. This release removes the argument entirely.

Bugfixes and Minor Improvements

  • Fixed bug where a player could be duplicated in calculate_player_stats() in very rare cases caused by plays with laterals. (#289)
  • Fixed a bug where the function add_xpass() failed when called with an empty data frame. (#296)
  • Fixed a bug where play_type showed no_play on plays with penalties that don't result in a replay of the down. (#277, #281)
  • Fixed a bug in the variable descriptions of total_home_score and total_away_score. (#300)
  • fast_scraper_rosters() and fast_scraper_schedules() now call nflreadr::load_rosters() and nflreadr::load_schedules() under the hood (#304)
  • Fixed a bug causing missing EPA on game-ending turnovers in overtime
  • Bump minimum nflreadr version to 1.2.0 for data repository change
  • Fix a bug affecting yardline for a very small number of plays in the 2000 season (#323)
  • update_db() now uses a default play to predefine column types for all db drivers. (#324)
  • Fix a bug that resulted in incorrect xyac_mean_yardage on 4th downs (#327)
  • Fix a bug that resulted in missing xyac information for plays involving J.O'Shaughnessy (#329)
  • Fix a bug that resulted in missing epa on the last play of some games involving NE and BUF (#331)
  • fast_scraper() and build_nflfastR_pbp() now return data frames of class nflverse_data to be consistent with nflreadr.
  • Fix behavior of EP model in neutral site games (treats both teams as away teams)

v4.3.0

2 years ago

Minor Changes

  • Add nflreadr to dependencies and drop the lubridate and magrittr dependencies
  • The functions load_pbp() and load_player_stats() now call nflreadr::load_pbp() and nflreadr::load_player_stats() respectively. Therefore the argument qs has been deprecated in both functions. It will be removed in a future release. Running load_player_stats() without any argument will now return player stats of the current season only (the default in nflreadr).
  • The deprecated arguments source and pp in the functions fast_scraper_*() and build_nflfastR_pbp() have been removed
  • Added the variables racr ("Receiver Air Conversion Ratio"), target_share, air_yards_share, wopr ("Weighted Opportunity Rating") and pacr ("Passing Air Conversion Ratio") to the output of calculate_player_stats()
  • Added the function report() which will be used by the maintainers to help users debug their problems (#274).
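
The ratio stats added to calculate_player_stats() in this release can be illustrated with a small worked example. The changelog does not spell out the formulas, so the definitions below (RACR and PACR as yards divided by air yards, WOPR as 1.5 x target share + 0.7 x air yards share) are the commonly cited ones and should be checked against the package's field descriptions. All numbers and the team_* column names are hypothetical.

```r
# Hypothetical totals for one receiver and his team.
receiver <- data.frame(
  targets = 10,
  receiving_yards = 90,
  receiving_air_yards = 120,
  team_targets = 40,
  team_air_yards = 400
)

# Share of team targets and team air yards going to this receiver.
receiver$target_share <- receiver$targets / receiver$team_targets
receiver$air_yards_share <- receiver$receiving_air_yards / receiver$team_air_yards

# Receiver Air Conversion Ratio: receiving yards per air yard.
receiver$racr <- receiver$receiving_yards / receiver$receiving_air_yards

# Weighted Opportunity Rating: weighted blend of the two shares.
receiver$wopr <- 1.5 * receiver$target_share + 0.7 * receiver$air_yards_share

# Passing Air Conversion Ratio for a hypothetical passer.
passing_yards <- 250
passing_air_yards <- 400
pacr <- passing_yards / passing_air_yards
```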

Bug Fixes

  • Fixed a minor bug in the console output of update_db()
  • Fix for a handful of missing receiver names (#270)
  • Fixed bug with missing return_team on interception return touchdowns (#275)
  • Fixed a rare bug where an internal object wasn't predefined (#272)

v4.2.0

2 years ago
  • All wpa variables are NA on end game line
  • All wp variables are 0, 0.5, 1, or NA on end game line
  • Fix bug where win prob on PATs assumed a PAT placed at 15 yard line, even in older seasons
  • The function decode_player_ids() now really decodes the new variable fantasy_id (#229)
  • Fixed a bug that caused slightly differing wp values depending on the first game in the data set (#183)
  • Edited GitHub references to point to nflverse
  • Added the variables sack_yards, sack_fumbles, rushing_fumbles and receiving_fumbles to the output of the function calculate_player_stats(), thanks to Mike Filicicchia (@TheMathNinja). (#239)
  • Fixed a bug where calculate_player_stats() falsely counted lost fumbles on aborted snaps (#238)
  • Added the variable season_type to the output of calculate_player_stats() and load_player_stats() in preparation of the extended Regular Season starting in 2021 (#240)
  • Updated season_type definitions in preparation of the extended Regular Season starting in 2021 (#242)
  • Fix for fixed_drive where it wasn't incrementing when there was a muffed punt followed by timeout (#244)
  • Fix for fixed_drive where it wasn't incrementing following an interception with the intercepting player then losing a fumble (#247)
  • Fix for more issues with missing play info in 2018_01_ATL_PHI (#246)
  • Added the variables safety_player_name and safety_player_id to the play-by-play data (#252)
  • Dropped the dependency usethis

v4.1.0

3 years ago

Breaking changes

Functions

  • Added the function calculate_player_stats() that aggregates official passing, rushing, and receiving stats either at game level or overall
  • Added the function load_player_stats() that loads weekly player stats from 1999 to the most recent season
  • The performance of the functions add_xyac() and clean_pbp() has been significantly improved

New Variables

  • Added the new columns td_player_name and td_player_id to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
  • The function calculate_player_stats() now adds the variable dakota, the epa + cpoe composite, for players with a minimum of 5 pass attempts.
  • Added column home_opening_kickoff to clean_pbp()
  • Added the variables sack_player_id, sack_player_name, half_sack_1_player_id, half_sack_1_player_name, half_sack_2_player_id and half_sack_2_player_name which identify players that recorded sacks (or half sacks). Also updated the description of the variables qb_hit_1_player_id, qb_hit_1_player_name, qb_hit_2_player_id and qb_hit_2_player_name to make it clearer that they did not record a sack. (#180)

Minor improvements and fixes

  • The variable qb_scramble was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3/4 down with 1-2 to go). Plays nullified by penalty are not included.
  • Change name, id, rusher, and rusher_id to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
  • The function clean_pbp() now standardizes the team name columns tackle_with_assist_*_team
  • Fix bug in drive that was causing incorrect overtime win probabilities (#194)
  • Fixed a bug where posteam was not NA on end of quarter 2 (or end of quarter 4 in overtime games) causing wrong values for fixed_drive, fixed_drive_result, series and series_result
  • Fixed a bug where fixed_drive and series were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
  • Fixed a bug where fixed_drive and series were falsely incrementing on muffed punts recovered by the punting team for a touchdown
  • Fixed a bug where add_xpass() crashed when run with data already including xpass variables.
  • Fixed a bug in epa when a safety is scored by the team beginning the play in possession of the ball (#186)
  • Fix some bugs related to David and Duke Johnson on the Texans in 2020 (#163)
  • Fix yet another bug related to correctly identifying possession team on kickoffs nullified by penalty (#199)
  • Fixed a bug where calculate_player_stats() forgot to clean player names by using their IDs
  • Fixed a bug where special teams touchdowns were missing in the output of calculate_player_stats() (#203)
  • Fix for some old Jaguars games where the wrong team was awarded points for safeties and kickoff return TDs (#209)
  • The function update_db() no longer falsely closes a database connection provided by the argument db_connection (#210)
  • Fixed a bug where yards_gained was missing yardage on plays with laterals. (#216)
  • Fixed a bug where there were stats wrongly given on a play with penalty (#218)
  • fixed_drive now increments properly on onside kick recoveries (#215)
  • fixed_drive no longer counts a muffed kickoff as a one-play drive on its own (#217)
  • fixed_drive now properly increments after a safety (#219)
  • Improved parser for penalty_type and updated the description of the variable to make it clearer that it's the first penalty that happened on a play. (#223)
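
The guessing rule for 2005 qb_scramble values in weeks 17-21, described in the first item of this list, can be sketched as a small predicate. This is an illustration of the stated rule only, not the package's actual implementation; guess_scramble() and the sample data are hypothetical, and the input is assumed to contain QB runs that were not nullified by penalty.

```r
# Hypothetical helper: a QB run counts as a guessed scramble unless it was
# a loss, no gain, or a 3rd/4th down run with 1-2 yards to go.
guess_scramble <- function(yards_gained, down, ydstogo) {
  short_yardage <- down %in% c(3, 4) & ydstogo %in% c(1, 2)
  yards_gained > 0 & !short_yardage
}

# Made-up QB runs covering each branch of the rule.
qb_runs <- data.frame(
  yards_gained = c(7, -2, 0, 3, 5),
  down = c(1, 2, 3, 3, 4),
  ydstogo = c(10, 8, 5, 1, 6)
)

qb_runs$scramble_guess <- guess_scramble(
  qb_runs$yards_gained, qb_runs$down, qb_runs$ydstogo
)
```

Only the first and last rows are flagged: the second is a loss, the third is no gain, and the fourth is a short-yardage run on 3rd down.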

v4.0.0

3 years ago

Breaking changes

Changed Functions

  • Deprecated the arguments source and pp all across the package. Using them will cause a warning. Parallel processing has to be activated by choosing an appropriate future::plan() before calling the relevant functions. For more information please see the package documentation.
  • The function build_nflfastR_pbp() will now run decode_player_ids() by default (can be deactivated with the argument decode = FALSE).
  • The function build_nflfastR_pbp() will now run add_xpass() by default and add the new variables xpass and pass_oe.
  • The functions fast_scraper() and build_nflfastR_pbp() now allow the output of fast_scraper_schedules() directly as input so it's no longer necessary to pull the game_id first.
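
A hedged sketch of the new calling pattern, assuming nflfastR 4.0.0 or later and the future package are installed and the machine is online. The "multisession" backend and the two game IDs are illustrative choices; decode = FALSE is passed only to demonstrate the opt-out.

```r
# Illustrative game IDs.
game_ids <- c("2019_01_GB_CHI", "2019_01_ATL_MIN")

if (requireNamespace("nflfastR", quietly = TRUE) &&
    requireNamespace("future", quietly = TRUE)) {
  # Parallel processing is now chosen by the caller via future::plan()
  # instead of the deprecated pp argument. plan() returns the previous plan.
  old_plan <- future::plan("multisession")
  try({
    # add_xpass() runs by default; decode = FALSE skips decode_player_ids().
    pbp <- nflfastR::build_nflfastR_pbp(game_ids, decode = FALSE)
  })
  # Restore whatever plan was active before.
  future::plan(old_plan)
}
```

Restoring the previous plan keeps the example from changing how later code in the same session is executed.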

New Functions and Variables

  • Added the new function load_pbp() that loads complete seasons into memory for fast access of the play-by-play data.
  • Added the new variables rushing_yards, lateral_rushing_yards, passing_yards, receiving_yards, lateral_receiving_yards to fix an old bug where yards_gained gets overwritten on plays with laterals (#115).
  • Added columns vegas_wpa and vegas_home_wpa which contain Win Probability Added from the spread-adjusted WP model
  • Added column out_of_bounds
  • Added columns fantasy, fantasy_id, fantasy_player_name, and fantasy_player_id that indicate the rusher or receiver on the play
  • Added columns tackle_with_assist, tackle_with_assist_1_player_id, tackle_with_assist_1_player_name, tackle_with_assist_1_team, tackle_with_assist_2_player_id, tackle_with_assist_2_player_name, tackle_with_assist_2_team

Models and Miscellaneous

  • Tuned spread-adjusted win probability model one final (?) time. Expected points is now no longer required for calculate_win_probability()
  • Added field descriptions vignette("field_descriptions") with a searchable list of all nflfastR variables
  • Switched data source for 2001-2010 to what is used for 2011 and on
  • All models have been moved to the fastrmodels package
  • Added the data frames ?field_descriptions and ?stat_ids to the package

Minor improvements and fixes

  • Fix bug where fixed_drive and series weren't updating after muffed punt (#144)
  • Fix bug induced by fixing the above (#149)
  • Fix bug where some special teams plays were incorrectly being labeled as pass plays (#125)
  • Fix bug where points for safeties were given to the defteam instead of the posteam (#152)
  • Fix bug where a muffed punt TD was given to the wrong team in a 2011 Jaguars game (#154)
  • Win probability is now calculated prior to PAT attempts rather than using WP on the ensuing kickoff
  • Improved performance of internal functions that speed up the rebuilding process in update_db() (added qs and curl to dependencies)
  • Fixed a bug where calculate_expected_points() and calculate_win_probability() duplicated some existing variables instead of replacing them (#170)
  • Fixed a bug where penalty_type wasn't "no_play" although it should have been (#172)
  • Fixed a bug where penalty_team could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
  • Fixed a bug related to the calculation of epa on plays before a failed pass interference challenge in a few 2019 games (#175)
  • Fixed a bug related to lots of fields with NA on offsetting penalties (#44)
  • Fixed a bug in epa when possession team changes at end of 1st or 3rd quarter (#182)
  • Fixed a bug where various functions left connections open
  • vegas_wp is now NA on final line since there is no possession team

v3.2.0

3 years ago

Models

  • Performance update for win probability model with point spread (vegas_wp)
  • Added yardline_100 as an input to both win probability models (not having it included was an oversight)

Minor improvements and fixes

  • Fixed a bug where series was increased on PATs
  • Fixed a bug affecting the week 10 Raiders-Broncos game
  • Added the column team_wordmark - which contains URLs to the team's wordmarks - to the included data frame ?teams_colors_logos