I have scraped data from the SC2 API and generated some graphs.
Balance is a constant discussion point in SC2, and it mostly revolves around win-rates in the different matchups. I think this is the wrong way to look at it, and I will try to explain why.
First of all, let's look at MMR - this is basically the same thing as the Elo rating commonly used in chess. Given enough time, and enough consistency in player skill, MMR will stabilize and become a good indicator of a player's skill relative to the rest of the player pool.
You can think of MMR as a function of skill and balance.
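To make the comparison concrete, here is a minimal sketch of the textbook Elo formulas. Blizzard's actual MMR algorithm is not described here, so treat this as an illustration of the general idea only. The function names and the K-factor of 32 are my own assumptions.

```python
def expected_score(rating_a, rating_b):
    """Expected score (roughly, win probability) of A against B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 if A won and 0.0 if A lost.
    k is an assumed K-factor, not a value taken from SC2.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated players: the winner gains what the loser drops.
new_a, new_b = update_ratings(1500, 1500, score_a=1.0)
```

The important property is that a win against an equally rated opponent moves the rating more than a win against a much weaker one, so ratings drift until results match expectations.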
mmr = f(skill, race_bonus)
Now, we don't know what the function actually looks like, but we don't really have to.
If the metagame is stable and there are no balance changes, then players will move towards their real MMR (for this particular meta and balance) and the win-rate will move towards an equilibrium as well, because matchmaking pairs players of similar MMR. Win-rate only changes as a result of external changes (balance patches), after which players' current MMR no longer reflects their potential MMR.
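The equilibrium argument can be illustrated with a small deterministic sketch. It makes several assumptions that are mine and not part of the data: the race bonus is simply added to skill, win probability follows the Elo curve, the matchmaker always finds an opponent whose true strength equals the player's current MMR, and each game applies the average (expected) rating change instead of a random win or loss.

```python
def win_probability(strength_a, strength_b):
    """Elo-style win probability of A against B (an assumed model)."""
    return 1.0 / (1.0 + 10 ** ((strength_b - strength_a) / 400.0))


def simulate(skill, race_bonus, start_mmr, games=500, k=32.0):
    """Track (mmr, win probability) before each game for one player.

    Assumption: true strength is skill + race_bonus, which is just one
    possible shape of the unknown function f(skill, race_bonus).
    """
    true_strength = skill + race_bonus
    mmr = start_mmr
    history = []
    for _ in range(games):
        # The opponent is assumed to be exactly as strong as our current MMR.
        p_win = win_probability(true_strength, mmr)
        history.append((mmr, p_win))
        # Expected rating change: positive while we win more than half.
        mmr += k * (p_win - 0.5)
    return history


# A player whose race just got a hypothetical 200 point buff.
history = simulate(skill=3000, race_bonus=200, start_mmr=3000)
first_mmr, first_p = history[0]
last_mmr, last_p = history[-1]
```

Right after the hypothetical patch the player wins well over half of their games. As MMR climbs towards the new true strength, the win-rate falls back to 50%, while the MMR keeps the trace of the buff.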
So win-rate is not very useful on its own. Instead we should look at the distribution of player skill.
If we assume that skill is distributed the same way among the players of each race, then with good balance we should see the same MMR distribution for each race as well.
This is perhaps not a totally fair assumption, because we also need to consider other factors, such as game enjoyment per race. Even if the races were perfectly balanced, some races might be more frustrating to play and lead to more race switches or to players quitting the game entirely.
So for fun, I've started to scrape data from the SC2 ladder using the official APIs. For each 1v1 ladder, I've gathered player-id, mmr and race. There are currently about 20,000 players in Korea, 60,000 in US and 70,000 in Europe. The current data was gathered on 2019-02-09.
Once the data is gathered, it is trivial to generate graphs that slice and group it in various ways.
Let's start with the most obvious graph - What is the distribution of MMR per race? I did this as a cumulative graph, since that makes it easier to identify break-even points.
This means that each point in the graph represents the number of players that have reached that MMR (or higher). One effect of this is that the left-most point represents the total number of players and the right-most point is no players at all. (I am using data from Europe here.)
As we can see here, the distribution of players is slightly uneven. There are more terran than protoss, and more protoss than zerg. Random players are lagging behind a lot, but let's ignore them since the focus is on race balance.
Even though zerg is the least represented race overall, they are the most represented in the range 2500-5000 MMR - by a large margin up until 3500.
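For reference, the cumulative counts behind a graph like this can be computed roughly as follows. This is a sketch and not the code from the project: the function name, the (race, mmr) tuple layout and the sample values are made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def cumulative_counts(players, thresholds):
    """For each race, count players with MMR >= each threshold.

    players: iterable of (race, mmr) tuples (an assumed layout).
    thresholds: MMR values for the x-axis, in ascending order.
    """
    by_race = defaultdict(list)
    for race, mmr in players:
        by_race[race].append(mmr)

    result = {}
    for race, mmrs in by_race.items():
        mmrs.sort()
        # bisect_left gives the number of players strictly below t.
        result[race] = [len(mmrs) - bisect_left(mmrs, t) for t in thresholds]
    return result


# Made-up sample data, not real ladder values.
sample = [
    ("zerg", 2800), ("zerg", 3600),
    ("terran", 2400), ("terran", 3100), ("terran", 5200),
    ("protoss", 4000),
]
counts = cumulative_counts(sample, thresholds=[2000, 3000, 4000, 5000])
```

Each list starts at the total number of players for that race and can only stay flat or drop as the threshold increases.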
Let's normalize the graph to account for different population sizes:
This doesn't change the zerg dominance, but interestingly enough it affects the random players who now have a similar graph to zerg. The right side of the graph is a bit hard to read, so let's show relative sizes between the races instead.
Now it looks like zerg has a clear over-representation all the way up to 5300 MMR, where protoss catches up. Terran however seems to be trailing behind.
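The two transformations above can be sketched like this. Normalizing divides each cumulative count by the total population of that race. For the relative sizes I am showing one plausible definition (each race's normalized value divided by the sum over all races at that MMR), which may differ from how the actual graph was produced. All names and numbers are made up for illustration.

```python
def normalize(cumulative, totals):
    """Turn cumulative counts into the fraction of each race's population."""
    return {
        race: [count / totals[race] for count in counts]
        for race, counts in cumulative.items()
    }


def relative_shares(normalized):
    """Share of each race at each threshold; shares sum to 1 per threshold.

    Assumption: this is one possible meaning of "relative sizes".
    """
    races = list(normalized)
    n_points = min((len(v) for v in normalized.values()), default=0)
    shares = {race: [] for race in races}
    for i in range(n_points):
        column_sum = sum(normalized[race][i] for race in races)
        for race in races:
            value = normalized[race][i] / column_sum if column_sum else 0.0
            shares[race].append(value)
    return shares


# Made-up cumulative counts at three MMR thresholds.
cumulative = {"zerg": [100, 60, 10], "terran": [200, 80, 10]}
totals = {"zerg": 100, "terran": 200}

normalized = normalize(cumulative, totals)
shares = relative_shares(normalized)
```

In this toy example terran has twice the population, so equal raw counts at the highest threshold turn into a two-to-one share in favor of zerg after normalization.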
I tried slicing the data in other ways, but didn't find anything interesting enough to show. For instance, perhaps the data would look different if you filtered away all off-races, or only looked at new or old players.
If you want to dig deeper, go here for all the graphs. They are organized by region: US, Europe, Korea
Source code can be found at github.com/krka/sc2stats. It contains a potentially useful Blizzard API data collector as well as code for generating the graphs shown here.
If you have ideas for other ways to explore the data, let me know by making a pull request or creating an issue in the project!