If you haven’t read my last post about recreating dailybaseballdata.com, check it out here.

In this second part, I will be tackling the Pitcher vs. Batter Matchups. The MLB season is already two days in, so I want to get this done as quickly as possible and start collecting historical data for the whole 2022 season. I’ll fill in the missed games manually.

Approach

My first thought for retrieving pitcher data was to scrape baseball-reference for every one of a pitcher’s games, build a big database, and query it when needed. I would do the same with the batters. Obviously, this would be very expensive, but I like the idea of building my own database to pull historical data from. Thankfully, I found pybaseball, “a Python package for baseball data analysis.” Instead of scraping every MLB database myself, I can let pybaseball save me some hours (or weeks).

Data Retrieval?

As I said before, pybaseball is going to save me countless hours. The goal: Grab all pitcher vs batter matchups for each day.

So, I need to get each team’s lineups each day. I would use the script that I use for matchups, but that only gets teams and location (as stated in my previous post). Therefore, I need to get each starting pitcher and starting lineup for each game.

I will be pulling this data from the MLB website.

Data Retrieval!

Once I have the starting lineups, I call pybaseball for a batter’s pitch data and filter it down to the pitches thrown by the opposing pitcher. If a batter and pitcher have never faced each other, there is no data to show for that matchup.
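
Here is a rough sketch of that fetch-and-filter step. It assumes the pitch data comes from pybaseball's `statcast_batter` function, whose rows carry the pitcher's MLBAM id in a `pitcher` column. The helper names `fetch_batter_pitches` and `pitches_vs_pitcher` are made up for illustration and are not necessarily how the real script is organized.

```python
from typing import Any, Dict, List


def fetch_batter_pitches(batter_id: int, start: str, end: str) -> List[Dict[str, Any]]:
    """Fetch every Statcast pitch thrown to one batter as a list of row dicts.

    batter_id is the player's MLBAM id; start and end are 'YYYY-MM-DD' strings.
    """
    # Imported lazily so the filter below works without pybaseball installed.
    from pybaseball import statcast_batter

    frame = statcast_batter(start, end, player_id=batter_id)
    return frame.to_dict("records")


def pitches_vs_pitcher(pitches: List[Dict[str, Any]], pitcher_id: int) -> List[Dict[str, Any]]:
    """Keep only the pitches thrown by the opposing pitcher (matched by MLBAM id)."""
    return [row for row in pitches if row.get("pitcher") == pitcher_id]
```

An empty list back from `pitches_vs_pitcher` means the two players have never faced each other, so that matchup can simply be skipped.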

After filtering the data, I do some simple calculations to retrieve the following attributes of a pitcher vs. batter (PvB) matchup:

  • no. pitches
  • plate appearances
  • at bats
  • walks
  • 1B, 2B, 3B, HR
  • strikeouts
  • hit by pitch
  • sac flies
  • RBIs
  • batting average
  • slugging
  • on base %
  • on-base plus slugging (OPS)
  • ISO (isolated power)
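
Most of the numbers in this list can be derived from the filtered pitch rows. The sketch below assumes Statcast-style rows, where the `events` field is a string only on the pitch that ends a plate appearance, and uses event names such as `single` and `sac_fly`. It is a simplification: it skips RBIs and does not handle rarer events (a pickoff that ends the inning, for example). The function name `matchup_stats` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable

HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}
WALKS = {"walk", "intent_walk"}
STRIKEOUTS = {"strikeout", "strikeout_double_play"}
SAC_FLIES = {"sac_fly", "sac_fly_double_play"}
# Plate appearances that do not count as at bats.
NON_AT_BATS = WALKS | SAC_FLIES | {"hit_by_pitch", "sac_bunt", "catcher_interf"}


def ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def matchup_stats(rows: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    rows = list(rows)
    # Only string events end a plate appearance; None/NaN rows are mid-PA pitches.
    events = Counter(r["events"] for r in rows if isinstance(r.get("events"), str))

    pa = sum(events.values())
    ab = pa - sum(events[e] for e in NON_AT_BATS)
    hits = sum(events[e] for e in HIT_BASES)
    total_bases = sum(events[e] * bases for e, bases in HIT_BASES.items())
    bb = sum(events[e] for e in WALKS)
    hbp = events["hit_by_pitch"]
    sf = sum(events[e] for e in SAC_FLIES)

    avg = ratio(hits, ab)
    slg = ratio(total_bases, ab)
    obp = ratio(hits + bb + hbp, ab + bb + hbp + sf)

    return {
        "pitches": len(rows),
        "pa": pa,
        "ab": ab,
        "bb": bb,
        "1b": events["single"],
        "2b": events["double"],
        "3b": events["triple"],
        "hr": events["home_run"],
        "so": sum(events[e] for e in STRIKEOUTS),
        "hbp": hbp,
        "sf": sf,
        "avg": round(avg, 3),
        "slg": round(slg, 3),
        "obp": round(obp, 3),
        "ops": round(obp + slg, 3),  # on-base plus slugging
        "iso": round(slg - avg, 3),  # isolated power
    }
```

The rate stats follow the standard definitions: AVG is H/AB, SLG is total bases/AB, OBP is (H + BB + HBP)/(AB + BB + HBP + SF), OPS is OBP + SLG, and ISO is SLG - AVG.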

Once all the data is retrieved, it is exported to a JSON file, which is then parsed on the front end. This felt like the easiest way to keep the data and webpage updated throughout the day.
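
A minimal sketch of the export step, using only Python's standard `json` module. The `{"matchups": [...]}` layout, the function names, and the file path are assumptions for illustration, not the actual format the site uses.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def matchups_to_json(matchups: List[Dict[str, Any]]) -> str:
    """Serialize the day's matchups to a JSON string.

    allow_nan=False makes json.dumps raise ValueError on NaN values instead of
    writing a bare NaN, which a browser's JSON.parse would reject.
    """
    return json.dumps({"matchups": matchups}, indent=2, allow_nan=False)


def export_matchups(matchups: List[Dict[str, Any]], path: str) -> None:
    """Write the JSON to disk so the front end can fetch and parse it."""
    Path(path).write_text(matchups_to_json(matchups), encoding="utf-8")
```

Re-running the export during the day overwrites the same file, so the page picks up new lineups the next time it loads the JSON.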

I also retrieve data for a batter’s last 5 games, but we’ll come back to that later.

Results

As always, there is more work to do to improve this, but here is the progress so far:

Check it out