Recreating DailyBaseballData.com Part 2
If you haven’t read my last post about recreating dailybaseballdata.com, check it out here.
In this second part, I will be tackling the Pitcher vs. Batter Matchups. The MLB season is already two days in, so I want to get this done as quickly as possible and collect historical data for the whole 2022 season. I'll fill in the missed games manually.
Approach
My first thought for retrieving pitcher data was to scrape baseball-reference for every one of each pitcher's games, build a big database, and query it when needed. I would do the same for the batters. Obviously, this would be very expensive, but I like the idea of building my own database to pull historical data from.
Thankfully, I found pybaseball, “a Python package for baseball data analysis.” Instead of scraping every MLB database myself, I can lean on pybaseball, which will save me some hours (or weeks).
Data Retrieval?
As I said before, pybaseball is going to save me countless hours.
The goal: Grab all pitcher vs batter matchups for each day.
So, I need to get each team’s lineups each day. I would use the script that I use for matchups, but that only gets teams and location (as stated in my previous post). Therefore, I need to get each starting pitcher and starting lineup for each game.
I will be pulling this data from the MLB website.
Data Retrieval!
Once I have the starting lineups, I call pybaseball for each batter's data and filter it down to the pitches thrown by the opposing pitcher. If a batter and pitcher have never faced each other, there is no data to show for that matchup.
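Here is a rough sketch of what that lookup and filter could look like. It assumes pybaseball's `statcast_batter(start_dt, end_dt, player_id)` function and the Statcast `pitcher` and `events` columns. The helper names and the idea of passing MLBAM player IDs around are my own placeholders for illustration, not necessarily how the real script is laid out.

```python
import pandas as pd


def fetch_batter_pitches(batter_id, start_dt, end_dt):
    """Download every pitch a batter saw in a date range (needs network access)."""
    # Imported here so the helpers below still work without pybaseball installed.
    from pybaseball import statcast_batter

    return statcast_batter(start_dt, end_dt, batter_id)


def pitches_vs_pitcher(batter_pitches, pitcher_id):
    """Keep only the pitches thrown by one pitcher (hypothetical helper)."""
    if batter_pitches.empty or "pitcher" not in batter_pitches.columns:
        return batter_pitches.iloc[0:0]  # no history: empty frame, same columns
    return batter_pitches[batter_pitches["pitcher"] == pitcher_id]


def plate_appearance_outcomes(matchup_pitches):
    """Statcast fills `events` only on the pitch that ends a plate appearance."""
    if "events" not in matchup_pitches.columns:
        return []
    return matchup_pitches["events"].dropna().tolist()
```

An empty result from `pitches_vs_pitcher` is the "never faced each other" case, so the front end can simply skip that matchup.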
After filtering the data, I do some simple calculations to get the following attributes of a pitcher vs. batter (PvB) matchup:
- no. pitches
- plate appearances
- at bats
- walks
- 1B, 2B, 3B, HR
- strikeouts
- hit by pitch
- sacrifice flies
- RBIs
- batting average
- slugging
- on-base percentage (OBP)
- on-base plus slugging (OPS)
- isolated power (ISO)
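To make the "simple calculations" concrete, here is a minimal sketch that turns a list of plate-appearance outcomes into the rate stats above using the standard formulas. The outcome strings are modeled on Statcast-style event names, and the function name and output keys are assumptions for illustration. It also simplifies at-bats to plate appearances minus walks, hit by pitches, and sacrifice flies, ignoring rarer cases such as sacrifice bunts.

```python
from collections import Counter

# Bases earned per hit type (event names are assumed, Statcast-style labels).
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when there is no denominator."""
    return numerator / denominator if denominator else 0.0


def matchup_line(outcomes, pitches=0, rbi=0):
    """Summarize one batter vs. one pitcher from a list of PA outcomes."""
    counts = Counter(outcomes)
    pa = len(outcomes)
    bb, hbp, sf = counts["walk"], counts["hit_by_pitch"], counts["sac_fly"]
    ab = pa - bb - hbp - sf  # simplified at-bat definition
    hits = sum(counts[event] for event in HIT_BASES)
    total_bases = sum(counts[event] * bases for event, bases in HIT_BASES.items())

    avg = _ratio(hits, ab)
    obp = _ratio(hits + bb + hbp, ab + bb + hbp + sf)
    slg = _ratio(total_bases, ab)
    return {
        "pitches": pitches, "PA": pa, "AB": ab, "BB": bb,
        "1B": counts["single"], "2B": counts["double"],
        "3B": counts["triple"], "HR": counts["home_run"],
        "SO": counts["strikeout"], "HBP": hbp, "SF": sf, "RBI": rbi,
        "AVG": round(avg, 3), "OBP": round(obp, 3), "SLG": round(slg, 3),
        "OPS": round(obp + slg, 3),  # on-base plus slugging
        "ISO": round(slg - avg, 3),  # isolated power
    }
```

For example, `matchup_line(["single", "home_run", "strikeout", "walk", "field_out"])` describes a batter who went 2-for-4 with a walk in five plate appearances.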
Once all the data is retrieved, it is exported to a JSON file, which is then parsed on the front end. I felt this was the easiest way to keep the data and the webpage updated throughout the day.
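The export step can be as small as the sketch below, which uses only Python's standard `json` module. The top-level `"matchups"` key, the file name, and the function names are placeholders I picked for illustration; the real file layout may differ.

```python
import json
from pathlib import Path


def to_json(matchups):
    """Serialize a list of matchup dicts into a JSON string."""
    return json.dumps({"matchups": matchups}, indent=2, sort_keys=True)


def write_json(matchups, path="matchups.json"):
    """Overwrite the file the front end reads (path is a placeholder)."""
    Path(path).write_text(to_json(matchups), encoding="utf-8")
```

Rerunning `write_json` during the day overwrites the same file, so the page picks up fresh numbers the next time it loads the JSON.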
I also retrieve data for a batter’s last 5 games, but we’ll come back to that later.
Results
As always, there is still more work to do to improve this, but here is the result as of now: