Editor’s note: Major League Baseball™ (MLB™) is working closely with Google Cloud to score more use cases for its massive amounts of data. Here’s how the organization is using Google Cloud services to change the game for players, fans, and broadcasters. MLB Senior Director of Software Engineering, Rob Engel, contributed to this blog.
Baseball and statistics go way back. Since the first pitch flew across home plate at a professional ballgame more than 150 years ago, practically every action on the field has been tallied, added, averaged, and saved to chronicle America’s favorite pastime. With explosive growth in the amount and variety of baseball data being collected, when cloud computing came around it was, well, a game changer.
These days, cloud technology enables MLB™ to collect and analyze 25 million unique data points from each of its 2,430 regular-season games. From helping players perfect their craft to bringing fans closer to the action, we’ll walk through a few ways MLB is hitting a grand slam with data.
Computing in stadium faster than a fastball
From the moment batting practice begins until a walk-off hit ends the game, MLB is collecting data on the field. Statcast player- and ball-tracking technology makes it possible to collect and analyze a massive amount of baseball data in ways that were never feasible in the past. Since 2020, Statcast has been powered by a Hawk-Eye system that uses 12 high-resolution cameras at all 30 MLB stadiums to track every movement of the ball and every player at 30 frames per second, with 18 unique points on each player’s body. As soon as the ball leaves the pitcher’s hand, Hawk-Eye captures roughly 60 data points, including speed and break angle as the pitch reaches the batter.
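To make those figures concrete, here is a minimal Python sketch of how one frame of pose data might be represented. Only the 18 body points and 30 frames per second come from the text; the `Keypoint` and `PoseFrame` types, their fields, and the coordinate system are hypothetical and are not Hawk-Eye’s or Statcast’s actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

# Figures taken from the text above; everything else here is hypothetical.
KEYPOINTS_PER_PLAYER = 18
FRAMES_PER_SECOND = 30


@dataclass(frozen=True)
class Keypoint:
    """One tracked body point in a hypothetical field coordinate system (feet)."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class PoseFrame:
    """All tracked body points for one player in one camera frame."""
    player_id: int
    frame_index: int
    keypoints: Tuple[Keypoint, ...]

    def __post_init__(self) -> None:
        # Reject frames that do not carry exactly one entry per tracked body point.
        if len(self.keypoints) != KEYPOINTS_PER_PLAYER:
            raise ValueError(
                f"expected {KEYPOINTS_PER_PLAYER} keypoints, got {len(self.keypoints)}"
            )

    @property
    def timestamp_s(self) -> float:
        """Seconds since the first frame, derived from the frame rate."""
        return self.frame_index / FRAMES_PER_SECOND


def keypoints_per_second(players_tracked: int) -> int:
    """3D keypoint observations produced per second for a given number of players."""
    return players_tracked * KEYPOINTS_PER_PLAYER * FRAMES_PER_SECOND
```

For a single player this works out to 18 × 30 = 540 keypoint observations every second, before any ball-tracking data is counted.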
Through Anthos and a Google Kubernetes Engine cluster, the camera feeds are processed on-site and turned into structured data that’s instantly transmitted to the scoreboard and broadcasters. The result is stats that display faster than a 95-mile-an-hour pitch. And for fans watching at home, the data from Hawk-Eye enables a live strike-zone visualization centered over home plate.
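The strike-zone overlay depends on exactly this kind of structured output: where the pitch crossed the plate, relative to the batter’s zone. The following is an illustrative sketch, not MLB’s implementation, that checks whether a ball-sized circle at the tracked crossing point overlaps a two-dimensional zone rectangle. The coordinate conventions, the `touches_zone` name, and the batter-specific `sz_bot`/`sz_top` inputs are assumptions; only the plate width and approximate ball size are standard baseball dimensions.

```python
import math

# Assumptions (not from the article): coordinates in feet where the pitch crosses
# the plate; plate_x is the horizontal offset from the plate's center and plate_z
# is the height above the ground. sz_bot and sz_top are the batter-specific
# bottom and top of the zone.
PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 2.9 / 2 / 12      # a baseball is roughly 2.9 inches in diameter


def touches_zone(plate_x: float, plate_z: float, sz_bot: float, sz_top: float) -> bool:
    """Return True if a circle the size of the ball overlaps the 2D zone rectangle."""
    # Clamp the ball's center to the rectangle to find the nearest zone point.
    nearest_x = min(max(plate_x, -PLATE_HALF_WIDTH_FT), PLATE_HALF_WIDTH_FT)
    nearest_z = min(max(plate_z, sz_bot), sz_top)
    # The ball touches the zone if that nearest point lies within one ball radius.
    return math.hypot(plate_x - nearest_x, plate_z - nearest_z) <= BALL_RADIUS_FT
```

Clamping the ball’s center to the rectangle gives the exact nearest point, so corners are handled correctly rather than approximated by a padded box. The real zone is a three-dimensional volume over the plate, and the official ruling still belongs to the umpire.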
“Using Anthos, we’re able to do that all on-premises and replicate the entire software infrastructure that we run in Google Cloud,” said Rob Engel, senior director of software engineering at MLB. “It’s deployed on-premises and we don’t have to do anything too different.” This uniformity across deployment environments is key for MLB developers, whose code may run in the cloud, in a data center, or in a stadium.
Anthos also provides a backup, a pinch-hitter of sorts, if the in-stadium system fails. For example, if the system at Yankee Stadium™ went down, MLB could run its code across town at Citi Field™, where the Mets play, or even in the cloud, and continue broadcasting without interruption. “If we had any issue in any stadium, we can shoot that data up to Google Cloud and process it there,” Engel said.
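The failover idea can be sketched as a priority list: try the in-stadium system first, then a nearby venue, then the cloud. This is a hypothetical illustration; the target names and the `is_healthy` probe are invented for the example, and it leaves out how Anthos actually schedules or moves workloads. What makes the pattern practical is the point Engel makes above: the same software runs unchanged wherever it lands.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ProcessingTarget:
    """A place the tracking workload could run (names are hypothetical)."""
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe


def choose_target(targets: Sequence[ProcessingTarget]) -> ProcessingTarget:
    """Return the first healthy target, trying them in priority order."""
    for target in targets:
        if target.is_healthy():
            return target
    raise RuntimeError("no healthy processing target available")


# Usage: the in-stadium system is down, so processing falls back to the next venue.
targets = [
    ProcessingTarget("yankee-stadium-onsite", lambda: False),
    ProcessingTarget("citi-field-onsite", lambda: True),
    ProcessingTarget("google-cloud", lambda: True),
]
print(choose_target(targets).name)  # citi-field-onsite
```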
Adding context to amazing feats
But what about marrying all that in-stadium data with the years of historical Statcast data? Josh Frost, VP of product management at MLB, explains, “The exit velocity of a ball that was hit was 110 miles an hour—is that good? Is that bad? How does that compare across the league? That’s where we’re really focused as an organization—not just giving data to people but giving it context to make it information that can help them enjoy the game better.”
While Hawk-Eye can clock a pitch at 95 miles an hour and pinpoint its location, it’s up to umpires to call the shots: whether the pitch was a ball or a strike, or whether a runner is safe at first base. That’s where manual operators come in. Before each pitch is thrown, MLB staff manually tag metadata such as the current pitcher, batter, and inning.
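One way to picture the split between measured, tagged, and ruled data is a pitch record with three separate parts. This is a hypothetical sketch: the class names, fields, and the set of allowed calls are assumptions for illustration, not MLB’s data model.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical set of calls; a real pitch-result vocabulary would be richer.
VALID_CALLS = {"ball", "called_strike", "swinging_strike", "foul", "in_play"}


@dataclass(frozen=True)
class ManualTags:
    """Context an operator enters before the pitch (field names hypothetical)."""
    pitcher_id: int
    batter_id: int
    inning: int


@dataclass(frozen=True)
class TrackedPitch:
    """Measurements produced by the tracking system (field names hypothetical)."""
    release_speed_mph: float
    plate_x_ft: float
    plate_z_ft: float


@dataclass(frozen=True)
class PitchEvent:
    tags: ManualTags
    tracking: TrackedPitch
    call: Optional[str] = None  # the umpire's ruling, attached once it is known


def record_call(event: PitchEvent, call: str) -> PitchEvent:
    """Return a copy of the event with the umpire's call attached."""
    if call not in VALID_CALLS:
        raise ValueError(f"unknown call: {call!r}")
    return replace(event, call=call)
```

Keeping the records immutable and attaching the call with `dataclasses.replace` means the tracked measurements are never overwritten when the ruling arrives.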
Throughout a game, MLB continually uploads game data to Google Cloud, and each season adds up to more than 25 terabytes of information. The player positional pose-tracking data is stored in Bigtable, and all other game data is stored in Cloud SQL for PostgreSQL. Every night, MLB runs a batch job using Dataflow to move game data from Bigtable and Cloud SQL to Cloud Storage buckets and BigQuery.
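The nightly job itself runs on Dataflow, but the per-record logic of such a job is easy to sketch in plain Python. The example below flattens Bigtable-style rows (a row key plus its cells) and serializes records as newline-delimited JSON, a format BigQuery load jobs accept. The row-key layout, field names, and function names are hypothetical, and in an actual Apache Beam pipeline functions like these would typically run inside `beam.Map` or `beam.FlatMap` steps rather than being called directly.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def pose_rows_to_records(
    rows: Iterable[Tuple[str, Dict[str, float]]]
) -> Iterator[Dict[str, Any]]:
    """Flatten Bigtable-style (row_key, cells) pairs into flat records.

    Assumes a hypothetical row key of the form "<game_id>#<player_id>#<frame>".
    """
    for row_key, cells in rows:
        game_id, player_id, frame = row_key.split("#")
        yield {"game_id": int(game_id), "player_id": int(player_id), "frame": int(frame), **cells}


def to_ndjson(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)
```

If rows from Cloud SQL are fetched as dictionaries, they can go straight through `to_ndjson` without the key-splitting step.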
In the MLB Gameday Engine, which has the 150-year-old rules of baseball codified into logic, the organization’s live tracking statistics combine with traditional statistics such as batting average, strikeouts, and at-bats. So when a player decides to steal third and sprints at 30 feet per second, MLB can rank that speed and instantly show whether the player is among the fastest runners in the league.
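Ranking a sprint against the league comes down to a percentile calculation. The sketch below assumes a list of league sprint speeds in feet per second; the sample values in the usage are made up, and the `percentile_rank` helper is hypothetical rather than part of the Gameday Engine.

```python
from bisect import bisect_left
from typing import List


def percentile_rank(value: float, league_values: List[float]) -> float:
    """Percentage of league values strictly below `value`, from 0 to 100."""
    if not league_values:
        raise ValueError("league_values must not be empty")
    ordered = sorted(league_values)
    # bisect_left counts how many sorted values are strictly less than `value`.
    below = bisect_left(ordered, value)
    return 100.0 * below / len(ordered)


# Usage with hypothetical sprint speeds in feet per second.
league = [26.0, 27.1, 27.5, 28.3, 29.0, 30.2]
print(round(percentile_rank(30.0, league), 1))  # 83.3
```

Counting only strictly slower values means ties don’t inflate a runner’s rank; other percentile definitions treat ties differently.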
Pitching endless data possibilities
Everything (live, historical, and in between) is fed into the MLB Stats API that populates consumer-facing tools like Baseball Savant, where fans can search for things like hit distance and launch angle. It also powers real-time use cases for broadcasters, as well as the MLB app and Film Room. “We’re pulling in data from the API for everything from reviewing major league on-field performance, to player acquisitions, to running our models, to how player performance is going. It’s endless,” said John Krazit, director of baseball systems at the Arizona Diamondbacks™.
With endless data possibilities, MLB is putting together some amazing new experiences. On deck this year: taking FieldVision to the next level. This technology gives fans a 3D look at the field using the Hawk-Eye pose-tracking data on players’ movements stored in Bigtable. With the ability to generate replays from any position on the field, FieldVision delivers a view beyond what MLB has offered in the past, bringing fans closer to the field right from their desktop or mobile apps.
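FieldVision’s rendering pipeline isn’t described in this post. As a rough illustration of the geometry behind showing stored pose data from any position, the sketch below projects a 3D point (for example, one of a player’s 18 tracked body points) into a virtual pinhole camera that can be placed anywhere on the field. The level-camera model (no tilt or roll), the axis convention, and the `project_point` name are assumptions made for this example.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def project_point(
    point: Vec3, camera_pos: Vec3, yaw_rad: float, focal: float = 1.0
) -> Optional[Tuple[float, float]]:
    """Project a 3D point into a level pinhole camera; None if it is behind the camera.

    Axes (assumed): x and y horizontal, z up. yaw_rad is the camera's heading in
    the horizontal plane, measured from the +x axis toward +y.
    """
    dx, dy, dz = (p - c for p, c in zip(point, camera_pos))
    # Camera basis: forward along the heading, right perpendicular to it, up = +z.
    fwd_x, fwd_y = math.cos(yaw_rad), math.sin(yaw_rad)
    right_x, right_y = math.sin(yaw_rad), -math.cos(yaw_rad)
    depth = dx * fwd_x + dy * fwd_y
    if depth <= 0:
        return None  # behind (or level with) the camera plane
    u = focal * (dx * right_x + dy * right_y) / depth  # positive = right of center
    v = focal * dz / depth  # positive = above center
    return (u, v)
```

Moving `camera_pos` and `yaw_rad` and re-projecting every stored keypoint for each frame is what lets a replay be viewed from a position no physical camera occupied.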
Now that’s a home run for everyone.
Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.