I have been dabbling in the KBO (Korea Baseball Organization) since I was asked about providing some plays for that league. Let’s start by saying that I agreed to provide selections, and that I am only wagering pizza money on these games. My initial concerns were data availability and the sabermetric quality of that data, unfamiliar players and managers, whether the talent is good enough for the modeling to resemble what I do for MLB, and whether algorithms built for MLB would apply to the KBO. Although I have those concerns, one element holds true: human nature. Humans are not robots, and they do not deliver consistency from game to game or start to start.
Regression and progression toward the mean do apply, and they are a quality element in predicting the performances we should see. For example, Game Score is a metric that measures the quality of a pitcher’s start, and the average is 50. I found that Game Score data is available for these games and can be listed going back about 50 games. The purpose is to establish a standard for a pitcher’s quality. Once that is determined, we can easily view the ups and downs above and below his mean, and thus predict (to a degree) when his better starts and his worse ones will come. It looks something like this. Pitcher A has a mean of, say, 52 over his last 10 games. He then throws a 60 his 1st time out and a 57 his 2nd time out. His mean over those 2 games is 58.5, which is much higher than his last-10 mean, so he is pitching above his norm. It is just a matter of time before a setback brings his average back to, or near, his normal mean. Knowing this pitcher is in a regression scenario, we can then match other data to support or rebut the scenario and make a logical prediction of his performance.
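The Pitcher A comparison can be sketched roughly in Python. This is only an illustration of the idea, not a tested model: the ten baseline scores are made up so that they average 52, and the function name and the 3-point threshold are placeholders chosen for the example.

```python
from statistics import mean

def regression_signal(baseline_scores, recent_scores, threshold=3.0):
    """Compare a pitcher's recent Game Score average with his longer baseline.

    threshold is a hypothetical cutoff (in Game Score points) for calling
    the recent stretch meaningfully above or below the baseline.
    """
    baseline = mean(baseline_scores)
    recent = mean(recent_scores)
    gap = recent - baseline
    if gap > threshold:
        label = "above baseline: setback more likely"
    elif gap < -threshold:
        label = "below baseline: rebound more likely"
    else:
        label = "near baseline"
    return baseline, recent, gap, label

# Hypothetical last-10 Game Scores, invented so they average 52.
last_10 = [48, 55, 50, 53, 47, 58, 51, 54, 49, 55]
# The two most recent starts from the Pitcher A example.
last_2 = [60, 57]

baseline, recent, gap, label = regression_signal(last_10, last_2)
print(f"baseline={baseline}, recent={recent}, gap={gap:+.1f} -> {label}")
```

A flag like this is only a starting point. As described above, it still has to be matched against other data before it supports a prediction.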
I was extremely surprised to find preseason win projections for the KBO on FanGraphs, a data-driven MLB stats site that I use daily. This was a great starting point for getting to know the teams; from there I could dig into developing scenarios by applying some algorithms to predict the near- and long-term futures. FanGraphs is not for the KBO what it is to me for MLB, though, so I had to look further. I have found a couple of fantasy sites that are helpful for things like park effects, team strikeout rates, and tidbits of detail on players. Still, this was not enough to get a game flow going, so I had to find more. Then I found Statiz.com. I must use Google Translate because the entire website is in Korean, but it has the kind of data I was searching for. I get wRC+, wOBA, and WAR from this site. There is more data there, like team performance vs left-handed pitchers, defense, and other metrics, but I can only use what I can understand. Because of the translation, there is much of it where I am not sure what the numbers represent. Ha-ha. However, I can generate a game flow that I will use to read into the outcomes of these games. One of the toughest areas to get numbers for is team stats. The bullpens in this league are not very good, but I could only find individual player data, and without knowing who is who, it was difficult to gauge each team’s results. I then found a guy on Twitter who has been logging every game in a spreadsheet and is specifically charting the bullpens, so now I have that data. After searching, I believe there is not much else readily available that I have not looked at. However, I would welcome more data!
As for whether the data is reliable, it is too early to say, but interesting detail is developing. There are a few players who are head and shoulders above the rest. They just tear up this league. They perform nearly 80% better than the average player, so identifying these guys quickly was important and turned out to be fairly simple. Many of them are former MLB guys who were supposed to succeed at the big-league level but could not. These players rarely miss a beat; in fact, 5 of the top 6 hitters and 4 of the top 5 pitchers are from MLB. This does not make the league reliable, but those players surely are.
Not that I have not lost MLB wagers in many ways, but in a week’s time and not even 10 wagers in, I have lost 2 games on very strange plays. One was a walk-off wild pitch and the other was a walk-off balk! This brings me to the quality of play in this league. Regardless of the data, modeling, and algorithms, if the players do not perform at a level high enough to stabilize the projections, then there is no point in having them. If the quality of play is so low that common results become uncommon, then the data becomes too flawed to use. Lower levels of play are entertaining to watch, but I am trying to make some pizza here! I cannot have these guys making ridiculous and obvious baserunning mistakes, throwing to the wrong base, etc. So far there have been 82 errors in 124 games, or roughly 0.66 errors per game, and that does not count the mental mistakes that never show up in the box score. Counting those mistakes along with the errors would likely double that figure. There have been 55 wild pitches and an average of 3.4 BB per game! Pitchers have already plunked 65 guys! There have been 1196 hits allowed in 1102 innings pitched. As you can see, there is no shortage of scoring opportunities. All of that data is reliable only if the quality of play does not fall below the level at which outcomes become unpredictable. Guys running to bases they should not, or making mistakes through a loss of concentration, cannot be properly accounted for in the data.
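For anyone who wants to turn those raw counts into rates, here is a quick sketch in Python. It uses only the totals quoted above; the variable and key names are made up for the example. Note that 82 / 124 is an errors-per-game rate, and the share of games with at least one error would require game-by-game data that is not shown here.

```python
# League totals quoted in the text above.
games = 124
errors = 82
wild_pitches = 55
hit_batters = 65
innings_pitched = 1102
hits_allowed = 1196

# Simple per-game and per-inning rates (key names are illustrative).
rates = {
    "errors_per_game": errors / games,
    "wild_pitches_per_game": wild_pitches / games,
    "hit_batters_per_game": hit_batters / games,
    "hits_per_inning": hits_allowed / innings_pitched,
}

for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```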
MLB is an elite standard that we have taken for granted. It has the most elite players, the best quality stadiums, and the premier data and coaching tools available. MLB players make their errors and mistakes too, but those are spread much further apart by quality play, so the data flaws are minimized. It remains to be seen whether the KBO can live up to the quality necessary for the data to be predictive with some certainty, but over the 1st week the games have been fun to watch, and my instinct says the KBO will be just fine for eating well.