Is Buster Posey Destined for the Baseball Hall of Fame? Data Science Says Yes


Andrew Yeh has always been fascinated with data science—whether he realized it or not.

As a kid growing up in Houston, Yeh was obsessed with baseball, specifically the hometown Astros. He pored over the batting averages of Glenn Davis, Craig Biggio, and Jeff Bagwell on the backs of their baseball cards and closely followed box scores in the morning newspaper. When he went off to study chemistry and biology in college (Cornell) and grad school (California Institute of Technology), Yeh kept an eye on player stats that determined the nightly success of his fantasy teams. Even when he moved to the Bay Area to work in the pharmaceutical industry, he followed his ’Stros from afar.

So earlier this year, when Yeh decided that his career conducting drug experiments and doing lab work for big pharmaceutical companies wasn’t fulfilling, it came as no great shock that he started looking into the field of data science.

Andrew Yeh at the Baseball Hall of Fame in Cooperstown, New York.


“I think I just wanted to try something different,” Yeh said. “And nowadays so much data is gathered in all industries. It’s a big opportunity.”

Yeh’s next step down his new path was to formally acquire the necessary skills. Enter Galvanize. He attended several meet-up events in San Francisco, and at one particular gathering, Yeh saw a panel discussion featuring professionals in the field who had gone through Galvanize Data Science. “They had a lot of great things to say about it,” he said.

In April, Yeh started the fast-paced 13-week bootcamp. He was drilled in everything from the fundamentals to different modes of programming to more advanced machine learning models. Yeh says that in addition to the knowledgeable and helpful instructors and staff at Galvanize, he also learned a lot from his classmates, who had come from a wide array of industries, including computer programming, insurance, statistics, and technology.

But the most important thing Yeh learned during his Galvanize immersion was that he had made the right career decision. “Going through the program reinforced the notion that this was something I would enjoy,” he said.

Bootcamp was that much more enjoyable for Yeh because, for his capstone project, he decided to combine his newfound field with the game he’s always loved. Today, baseball is synonymous with the word “data.” Ever since journalist Michael Lewis penned the 2003 book “Moneyball” about the Oakland Athletics’ successful use of advanced metrics rather than eyeball scouting to build their team, America’s pastime has been a game of numbers. But while teams use statistics to anticipate an athlete’s on-field performance, Yeh’s capstone idea was to use data to predict a player’s legacy.

[Slide: top 15 active players]

Yeh’s family now lives in upstate New York, and when Yeh goes to visit, he often drives over to the National Baseball Hall of Fame in Cooperstown, New York. His idea was to compare current players’ career statistics to those of Hall of Famers and forecast whether the active players are on a pace that might someday lead to enshrinement.

Yeh broke down his analysis into five key counting statistics: runs scored, runs batted in, home runs, hits, and stolen bases. He would later factor in awards, such as MVPs and Gold Gloves. A key step was converting these counting statistics to rate statistics, which measure the pace at which a current player is accumulating statistics compared with a Hall of Famer who played the same position, at the same point in his career.

For instance, he looked at Buster Posey, the current San Francisco Giants catcher who’s been at his position for eight years. Then Yeh compared Posey’s numbers to those of New York Yankees Hall of Famer Yogi Berra over his first eight seasons. If Posey were producing home runs at a slower pace than Berra did at that same point in his career, the home run rate would be less than “1”; if he were on the same hits pace, the hits rate would be “1”; and if he were accumulating stolen bases at twice the rate, the stolen base rate would be “2”.
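The rate idea described above can be sketched in a few lines of Python. This is only an illustration of the ratio as the article describes it, not Yeh's actual code: the function name `pace_rates`, the dictionary layout, and every number below are assumptions made up for the example, and none of them are real career totals for any player.

```python
# A minimal sketch of the "rate statistic" idea: an active player's career
# totals divided by a Hall of Famer's totals at the same career stage.
# All names and numbers here are hypothetical, not real player data.

COUNTING_STATS = ("runs", "rbi", "home_runs", "hits", "stolen_bases")


def pace_rates(active, hall_of_famer):
    """Return one ratio per counting stat: active total / Hall of Famer total."""
    rates = {}
    for stat in COUNTING_STATS:
        baseline = hall_of_famer[stat]
        # A zero baseline makes the ratio undefined, so report None instead.
        rates[stat] = active[stat] / baseline if baseline else None
    return rates


# Made-up totals through the same number of seasons.
active_player = {"runs": 400, "rbi": 450, "home_runs": 90,
                 "hits": 1000, "stolen_bases": 20}
hof_player = {"runs": 500, "rbi": 450, "home_runs": 120,
              "hits": 1000, "stolen_bases": 10}

rates = pace_rates(active_player, hof_player)
for stat, rate in rates.items():
    label = "n/a" if rate is None else f"{rate:.2f}"
    print(f"{stat}: {label}")
```

With these invented totals, the home run rate comes out below 1 (a slower pace), the hits rate is exactly 1 (the same pace), and the stolen base rate is 2 (twice the pace), matching the three cases in the example above.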

[Slide: top younger players vs. similar Hall of Famers]

Posey compared favorably with Berra. Young superstar Bryce Harper is on par with a 23-year-old Willie Mays. According to Yeh, veterans Albert Pujols and Ichiro Suzuki can go ahead and buy their tickets to Cooperstown. Unfortunately, none of Yeh’s beloved Astros made the cut.

Yeh’s disappointment was softened by the fact that his capstone project was a hit. When he presented it to prospective employers at a hiring event, Yeh fielded a number of excited questions from other data scientists who turned out to be fellow baseball fans.

“It was a good jump start to the job search,” he said. “And it was nice to talk baseball.”

Meanwhile, Yeh is not just waiting to see if that employer enthusiasm is a sound indicator of his job prospects; he’s following up on interviews and leads and doing a little consulting work for pharmaceutical companies. And of course, as the pennant races heat up, he’s keeping an eye on his team back in Houston.
