A Set of Functions to Efficiently Scrape NFL Play by Play Data
calculate_series_conversion_rates() now correctly aggregates season level conversion rates. Performance has also been improved. (#440)
Thank you to @andrewtek, @gregalvi86, @Ic4ru5Wing, @JoeMarino2021, @jreddy1990, @marvin3FF, @mrcaseb, @RicShern, @SPNE, and @trivialfis for their questions, feedback, and contributions towards this release.
The local directory for raw play-by-play data can be set globally with options("nflfastR.raw_directory" = {"your/local/directory"}). Alternatively, both build_nflfastR_pbp() and fast_scraper() support the argument dir, which defaults to the above option. (#423)
Added save_raw_pbp(), which efficiently downloads raw play-by-play data and saves it to the local file system. This serves as a helper to set up the system for faster play-by-play parsing via the above functionality. (#423)
Added missing_raw_pbp(), which computes a vector of game IDs missing in the local raw play-by-play directory. (#423)
get_pbp_nfl() now uses ifelse() instead of dplyr::if_else() to handle some null-checking, which fixes a bug found in the 2022_21_CIN_KC match.
calculate_player_stats() now summarises target share and air yards share correctly when called with argument weekly = FALSE. (#413)
calculate_player_stats() now returns the opponent team when called with argument weekly = TRUE. (#414)
calculate_player_stats_def() no longer errors when small subsets of pbp data are missing stats. (#415)
calculate_series_conversion_rates() no longer returns NA values if a small subset of pbp data is missing series on offense or defense. (#417)
fixed_drive now correctly increments on plays where posteam lost a fumble but remains posteam because defteam also lost a fumble during the same play. (#419)
kick_distance on all punts and kickoffs. (#422)
calculate_player_stats() no longer counts lost fumbles on plays where a player fumbles, a teammate recovers and then loses a fumble to the defense. (#431)
passer, receiver, and rusher no longer return NA on "abnormal" plays - like direct snaps, aborted snaps, laterals etc. - that resulted in a penalty. (#435)
Thank you to @903124, @ak47twq, @andrewtek, @darkhark, @dennisbrookner, @marvin3FF, @mistakia, @mrcaseb, @nicholasmendoza22, @rickstarblazer, @RileyJohnson22, and @tanho63 for their questions, feedback, and contributions towards this release.
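The raw play-by-play directory workflow described in the notes above could look roughly like the following. This is a minimal sketch, not the package's documented recipe: the temporary folder is a hypothetical location, the download step is switched off by default because it needs nflfastR and internet access, and passing a vector of game IDs as the first argument of save_raw_pbp() is an assumption.

```r
# Hypothetical local folder for raw play-by-play files; any writable path works.
raw_dir <- file.path(tempdir(), "nflfastR_raw")
dir.create(raw_dir, showWarnings = FALSE, recursive = TRUE)

# Register the folder globally through the option named in the notes above.
options("nflfastR.raw_directory" = raw_dir)

# Functions with a `dir` argument are described as defaulting to this option.
getOption("nflfastR.raw_directory")

# Flip to TRUE to actually download (needs nflfastR and internet access).
run_download <- FALSE
if (run_download && requireNamespace("nflfastR", quietly = TRUE)) {
  game_ids <- c("2022_21_CIN_KC")
  # Assumption: the first argument is a vector of game IDs.
  nflfastR::save_raw_pbp(game_ids)
  # Parse from the local raw files instead of downloading them again.
  pbp <- nflfastR::build_nflfastR_pbp(game_ids, dir = raw_dir)
}
```

Setting the option once, for example in a project startup file, avoids passing dir to every call.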
calculate_standings() no longer freezes when computing standings from schedules where some games are missing results, i.e. upcoming games.
try() to avoid CRAN problems. (#404)
Fixed a bug where calculate_standings() wasn't able to handle nflverse pbp data. (#404)
Added calculate_player_stats_def(), which aggregates defensive player stats either at game level or overall. (#288)
Added nflverse_sitrep, which is an alias of the already available report()
Added calculate_player_stats_kicking(), which aggregates player stats for field goals and extra points at game level or overall. (#381)
Added calculate_series_conversion_rates(), which computes series conversion and series result rates at a game level or season level. (#393)
calculate_player_stats() that reflects new nflverse data infrastructure.
calculate_player_stats() now unifies player names and joins the following player information via nflreadr::load_players():
player_display_name - Full name of the player
position - Position of the player
position_group - Position group of the player
headshot_url - URL to a player headshot image
clean_pbp().
Fixed a bug where calculate_player_stats_def() failed in situations where play-by-play data is missing certain stats. (#382)
calculate_player_stats() for NA names.
Added calculate_standings(), which computes regular season division standings and playoff seeds from nflverse data.
update_db() now supports the option "nflfastR.dbdirectory", which can be used to set the directory of the nflfastR pbp database globally and independent of any project structure or working directories.
?teams_colors_logos has been updated to reflect the most recent team color themes and gained additional variables for conference and division as well as logo URLs to the conference and league logos. (#290)
?teams_colors_logos has been updated with the Washington Commanders. (#312)
The argument qs in the functions load_pbp() and load_player_stats() has been deprecated as of nflfastR 4.3.0. This release removes the argument entirely.
calculate_player_stats() in very rare cases caused by plays with laterals. (#289)
Fixed a bug where add_xpass() failed when called with an empty data frame. (#296)
Fixed a bug where play_type showed no_play on plays with penalties that don't result in a replay of the down. (#277, #281)
total_home_score and total_away_score. (#300)
fast_scraper_rosters() and fast_scraper_schedules() now call nflreadr::load_rosters() and nflreadr::load_schedules() under the hood. (#304)
update_db() now uses a default play to predefine column types for all db drivers. (#324)
xyac_mean_yardage on 4th downs (#327)
xyac information for plays involving J.O'Shaughnessy (#329)
epa on the last play of some games involving NE and BUF (#331)
fast_scraper() and build_nflfastR_pbp() now return data frames of class nflverse_data to be consistent with nflreadr.
load_pbp() and load_player_stats() now call nflreadr::load_pbp() and nflreadr::load_player_stats() respectively. Therefore the argument qs has been deprecated in both functions. It will be removed in a future release. Running load_player_stats() without any argument will now return player stats of the current season only (the default in nflreadr).
The arguments source and pp in the functions fast_scraper_*() and build_nflfastR_pbp() have been removed.
Added the variables racr ("Receiver Air Conversion Ratio"), target_share, air_yards_share, wopr ("Weighted Opportunity Rating") and pacr ("Passing Air Conversion Ratio") to the output of calculate_player_stats()
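The share and ratio variables named in the last entry can be illustrated with toy numbers. The sketch below uses the commonly cited definitions (RACR = receiving yards / receiving air yards, PACR = passing yards / passing air yards, WOPR = 1.5 x target share + 0.7 x air yards share). These formulas are an assumption here, since the notes above only name the variables, and the data frame is made up.

```r
# Toy receiving totals for one team; every number is invented for illustration.
stats <- data.frame(
  player              = c("WR1", "WR2", "TE1"),
  targets             = c(10, 6, 4),
  receiving_yards     = c(120, 45, 30),
  receiving_air_yards = c(150, 60, 40)
)

team_targets   <- sum(stats$targets)
team_air_yards <- sum(stats$receiving_air_yards)

# Share of the team's targets and air yards that went to each player.
stats$target_share    <- stats$targets / team_targets
stats$air_yards_share <- stats$receiving_air_yards / team_air_yards

# Receiver Air Conversion Ratio: yards gained per air yard thrown at the player.
stats$racr <- stats$receiving_yards / stats$receiving_air_yards

# Weighted Opportunity Rating (assumed weights: 1.5 and 0.7).
stats$wopr <- 1.5 * stats$target_share + 0.7 * stats$air_yards_share

# Passing Air Conversion Ratio for a hypothetical passer.
passing_yards     <- 195
passing_air_yards <- 250
pacr <- passing_yards / passing_air_yards

stats
```

In real use these columns come straight from calculate_player_stats(); the manual computation is only meant to show how the variables relate to each other.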
Added report(), which will be used by the maintainers to help users debug their problems (#274).
update_db()
receiver names (#270)
return_team on interception return touchdowns (#275)
wpa variables are NA on end game line
wp variables are 0, 0.5, 1, or NA on end game line
decode_player_ids() now really decodes the new variable fantasy_id (#229)
wp values depending on the first game in the data set (#183)
Added sack_yards, sack_fumbles, rushing_fumbles and receiving_fumbles to the output of the function calculate_player_stats(), thanks to Mike Filicicchia (@TheMathNinja). (#239)
Fixed a bug where calculate_player_stats() falsely counted lost fumbles on aborted snaps (#238)
Added season_type to the output of calculate_player_stats() and load_player_stats() in preparation for the extended Regular Season starting in 2021 (#240)
season_type definitions in preparation for the extended Regular Season starting in 2021 (#242)
Fixed a bug in fixed_drive where it wasn't incrementing when there was a muffed punt followed by a timeout (#244)
Fixed a bug in fixed_drive where it wasn't incrementing following an interception with the intercepting player then losing a fumble (#247)
Added safety_player_name and safety_player_id to the play-by-play data (#252)
usethis
Added calculate_player_stats(), which aggregates official passing, rushing, and receiving stats either at game level or overall
Added load_player_stats(), which loads weekly player stats from 1999 to the most recent season
add_xyac() and clean_pbp() has been significantly improved
Added td_player_name and td_player_id to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
calculate_player_stats() now adds the variable dakota, the epa + cpoe composite, for players with a minimum of 5 pass attempts.
Added home_opening_kickoff to clean_pbp()
Added sack_player_id, sack_player_name, half_sack_1_player_id, half_sack_1_player_name, half_sack_2_player_id and half_sack_2_player_name, which identify players that recorded sacks (or half sacks). Also updated the description of the variables qb_hit_1_player_id, qb_hit_1_player_name, qb_hit_2_player_id and qb_hit_2_player_name to make it clearer that they did not record a sack. (#180)
qb_scramble was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3rd/4th down with 1-2 to go). Plays nullified by penalty are not included.
name, id, rusher, and rusher_id to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
clean_pbp() now standardizes the team name columns tackle_with_assist_*_team
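The guessing rule described for qb_scramble in weeks 17-21 of 2005 can be written down as a small predicate. This is a sketch of the rule as stated in the note, not the code used by the package: guess_scramble() and the is_qb_run flag are hypothetical names, and the toy plays are invented.

```r
# Hypothetical helper: TRUE when a QB run would be guessed to be a scramble,
# i.e. it gained yards and was not a 3rd/4th down run with 1-2 yards to go.
guess_scramble <- function(is_qb_run, yards_gained, down, ydstogo) {
  short_yardage <- down %in% c(3, 4) & ydstogo <= 2
  is_qb_run & yards_gained > 0 & !short_yardage
}

# Invented plays covering each exclusion in the rule.
plays <- data.frame(
  is_qb_run    = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  yards_gained = c(7, -2, 0, 3, 12),
  down         = c(2, 1, 2, 3, 1),
  ydstogo      = c(8, 10, 5, 1, 10)
)

plays$qb_scramble_guess <- with(
  plays,
  guess_scramble(is_qb_run, yards_gained, down, ydstogo)
)

plays
```

Only the first play is flagged: the second lost yards, the third gained nothing, the fourth was a short-yardage run on 3rd down, and the last was not a QB run.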
Fixed a bug in drive that was causing incorrect overtime win probabilities (#194)
Fixed a bug where posteam was not NA on end of quarter 2 (or end of quarter 4 in overtime games), causing wrong values for fixed_drive, fixed_drive_result, series and series_result
Fixed a bug where fixed_drive and series were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
Fixed a bug where fixed_drive and series were falsely incrementing on muffed punts recovered by the punting team for a touchdown
Fixed a bug where add_xpass() crashed when run with data already including xpass variables.
epa when a safety is scored by the team beginning the play in possession of the ball (#186)
Fixed a bug where calculate_player_stats() forgot to clean player names by using their IDs
calculate_player_stats() (#203)
update_db() no longer falsely closes a database connection provided by the argument db_connection (#210)
Fixed a bug where yards_gained was missing yardage on plays with laterals. (#216)
fixed_drive now increments properly on onside kick recoveries (#215)
fixed_drive no longer counts a muffed kickoff as a one-play drive on its own (#217)
fixed_drive now properly increments after a safety (#219)
penalty_type and updated the description of the variable to make it clearer that it's the first penalty that happened on a play. (#223)
Deprecated the arguments source and pp all across the package. Using them will cause a warning. Parallel processing has to be activated by choosing an appropriate future::plan() before calling the relevant functions. For more information please see the package documentation.
build_nflfastR_pbp() will now run decode_player_ids() by default (can be deactivated with the argument decode = FALSE).
build_nflfastR_pbp() will now run add_xpass() by default and add the new variables xpass and pass_oe.
fast_scraper() and build_nflfastR_pbp() now allow the output of fast_scraper_schedules() directly as input, so it's no longer necessary to pull the game_id first.
Added load_pbp(), which loads complete seasons into memory for fast access of the play-by-play data.
Added rushing_yards, lateral_rushing_yards, passing_yards, receiving_yards, lateral_receiving_yards to fix an old bug where yards_gained gets overwritten on plays with laterals (#115).
Added vegas_wpa and vegas_home_wpa, which contain Win Probability Added from the spread-adjusted WP model
out_of_bounds
Added fantasy, fantasy_id, fantasy_player_name, and fantasy_player_id, which indicate the rusher or receiver on the play
tackle_with_assist, tackle_with_assist_1_player_id, tackle_with_assist_1_player_name, tackle_with_assist_1_team, tackle_with_assist_2_player_id, tackle_with_assist_2_player_name, tackle_with_assist_2_team
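Two of the notes above, choosing a future::plan() before scraping and passing the output of fast_scraper_schedules() directly to the scraper, can be combined in one short sketch. It is switched off by default because it needs the future and nflfastR packages plus internet access. The season, the two-game subset and the "multisession" strategy are arbitrary choices for illustration, not recommendations from the package.

```r
# Flip to TRUE to run (needs future, nflfastR and internet access).
run_scrape <- FALSE

if (run_scrape &&
    requireNamespace("future", quietly = TRUE) &&
    requireNamespace("nflfastR", quietly = TRUE)) {
  # Activate parallel processing BEFORE calling the scraping functions.
  # plan() returns the previous plan, which is restored afterwards.
  old_plan <- future::plan("multisession")

  # Load a season schedule and keep a couple of games to limit the download.
  games <- nflfastR::fast_scraper_schedules(2020)
  games <- utils::head(games, 2)

  # The schedule data frame is passed directly; no need to pull game_id first.
  pbp <- nflfastR::build_nflfastR_pbp(games)

  # Return to the plan that was active before.
  future::plan(old_plan)
}
```

Restoring the previous plan keeps the sketch from changing how later code in the same session is evaluated.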
calculate_win_probability()
Added vignette("field_descriptions") with a searchable list of all nflfastR variables
Added ?field_descriptions and ?stat_ids to the package
Fixed a bug where fixed_drive and series weren't updating after a muffed punt (#144)
defteam instead of the posteam (#152)
update_db() (added qs and curl to dependencies)
Fixed a bug where calculate_expected_points() and calculate_win_probability() duplicated some existing variables instead of replacing them (#170)
Fixed a bug where penalty_type wasn't "no_play" although it should have been (#172)
Fixed a bug where penalty_team could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
epa on plays before a failed pass interference challenge in a few 2019 games (#175)
NA on offsetting penalties (#44)
epa when possession team changes at end of 1st or 3rd quarter (#182)
vegas_wp is now NA on final line since there is no possession team
vegas_wp)
Added yardline_100 as an input to both win probability models (not having it included was an oversight)
Fixed a bug where series was increased on PATs
Added team_wordmark - which contains URLs to the team's wordmarks - to the included data frame ?teams_colors_logos