Calculating WE for Ace Factor

Here’s a demonstration of how a WE (win expectancy) score was calculated for a particular pitcher’s outing.

The Scene: August 25, 2010, Fenway Park, Felix Hernandez against the Boston Red Sox

Exit Scenario: 7.1 innings, runners on second and third, one out

In this outing, Felix Hernandez was removed with one out in the eighth. Anything that happened beyond that point is discarded for this analysis.

Scoring environment is determined by the league’s average runs per game (RPG) allowed by all pitching staffs for the entire season. In the AL, RPG was 4.42 in 2010. The scoring environment is already integrated into the WE lookup spreadsheet. It’s also actively used by the algorithm so that a relative score can be generated for the WE lookup, per the equation below.

A relative score is a pitcher’s team score minus his opponent’s score. The opponent’s score is easy: in this case, it’s the number of actual earned runs Hernandez allowed (2) at the time he was lifted. For Hernandez’s team, instead of using the actual runs the Mariners may have scored, the scoring environment is used. To determine the effective score of the Mariners at the time Hernandez was lifted, the RPG number (which can also be expressed as “runs per 27 outs”) is scaled to the number of outs Hernandez recorded. This translates to a score of 3.60 runs (= 4.42 / 27 × 22) corresponding to the 7.1 innings, or 22 outs. Subtracting the actual earned runs allowed by Hernandez gives a relative score of +2.60.
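The scaling step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`innings_to_outs`, `effective_team_score`, `relative_score`) are hypothetical and are not taken from the Ace Factor software.

```python
def innings_to_outs(innings: str) -> int:
    """Convert innings notation ("7.1" = 7 innings plus 1 out) to total outs."""
    whole, _, partial = innings.partition(".")
    return int(whole) * 3 + int(partial or 0)


def effective_team_score(rpg: float, outs: int) -> float:
    """Scale league runs per game (runs per 27 outs) to the outs pitched."""
    return rpg / 27 * outs


def relative_score(rpg: float, outs: int, earned_runs: float) -> float:
    """Effective team score minus the earned runs charged to the pitcher."""
    return effective_team_score(rpg, outs) - earned_runs


# Hernandez's outing: 7.1 innings in the 2010 AL (RPG = 4.42).
outs = innings_to_outs("7.1")            # 22 outs
team = effective_team_score(4.42, outs)  # about 3.60 runs
print(outs, round(team, 2))
```

Passing the innings as a string sidesteps the ambiguity of a float like 7.1, which in box-score notation means seven innings plus one out rather than seven and one-tenth innings.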

With all of the parameters associated with the moment Hernandez left the game in place (inning, outs, baserunners, RPG, and relative score), the WE retrieved from the spreadsheet was found to be 0.797.

(Note that the spreadsheet data was converted to a data file which was subsequently accessed by the software while running the Ace Factor algorithm.)

The table below shows Hernandez’s seasonal game logs for 2010.

A few more notes (also known as “the fine print”):

1. Separate run environments were specified for each league (National and American) over the time period. For interleague play, the home site dictated the scoring environment.

2. The top-half-of-the-inning section of the WE spreadsheet was used for all WE lookups, regardless of whether the pitcher being graded was home or away. For identical pitching lines, the home pitcher’s WE will always be better than a visiting pitcher’s WE, simply because the home team has an advantage over the visiting team in terms of win expectancies. Thus, using a home/visitor distinction in the WE lookup wouldn’t be fair in grading pitchers. If anything, the visiting pitcher should get a better grade than the home pitcher, everything else being equal, because it’s tougher to win on the road. The solution, then, was to eliminate the home/visitor aspect altogether by using only the top-half-of-the-inning section of the WE spreadsheet.

3. Nine innings was the highest number of innings considered in a single game. Any innings pitched beyond nine by any starter had no impact on a pitcher’s WE score for the game.

4. A “manager’s curve” was applied to ERs in the late innings. The theory is that it would be unfair for some pitchers to endure an onslaught of runs charged against them in the 8th and 9th innings when other pitchers, under the same conditions, might have been lifted. The ERs charged against pitchers in the 8th inning, the 9th inning, and beyond were derated as follows. Any ERs beyond the 9th were ignored (#3 above asserts that any innings beyond nine were ignored as well). In the 9th, a pitcher is charged 100% for the first ER, 50% for the second, 25% for the third, and 0% for any ERs beyond three. In the 8th, a pitcher is charged 100% for the first ER, 75% for the second, 50% for the third, 25% for the fourth, and 0% for any ERs beyond four. Your first thought might be that this seems like a generous discount. You might argue that those ERs should be included in the pitcher’s body of work. Again, the thought is that some pitchers would have been lifted and not incurred those runs, which would inject an unfair bias against the pitcher charged with them. Also, this curve is an attempt to shift part of the blame for those runs to the manager, either for not going to the bullpen early enough or for sticking with the pitcher because of a depleted or incompetent bullpen. Note that if the pitcher is taken out mid-inning in the 8th or 9th, there is no adjustment to the WE lookup. So if a pitcher leaves with the bases loaded and nobody out, he’ll incur the full brunt of the WE reading. Originally, I intended to factor the average number of innings pitched per start into the manager’s curve, because it differs across eras. After I found that this number hovered around 6.5 from the ’50s through the ’70s and dropped to just below 6.0 for much of the ’90s and ’00s, I decided it wasn’t worth the complexity for a situation that simply does not occur often enough, so I kept the manager’s curve constant for all eras.

5. The WE spreadsheet lists WE values according to relative scores presented as integers, i.e. -2, -1, 0, +1, +2, etc. In most cases, the relative score calculated for the moment the pitcher was lifted was not an integer. Therefore, two WE values (an upper bound and a lower bound) were looked up for each relative score. For instance, if the relative score was -2.3, the upper bound WE was looked up using a relative score of -2, and the lower bound WE was looked up using a relative score of -3. From there, the precise WE was determined by using the fractional part of the relative score to interpolate between the two. In this case, the “-0.3” represents 30% of the way from -2 down to -3, so the precise WE is the upper bound WE minus 30% of the gap between the upper and lower bound WEs. This technique is a tradeoff, since the relationship between WE values among relative score integers is not a linear one. The alternative, I believe, would be to integrate the entire process used for generating the WE spreadsheet numbers into this algorithm. Would it be worth the effort to regain the accuracy lost in the interpolation? For now, I leave that to the WE/WP experts.

6. The theoretical WE grade for all complete-game (CG) wins is 1.000. Two issues arise because of this. First, there isn’t a way to distinguish between CGs in which pitchers allowed different numbers of ERs. A CG with zero ERs shouldn’t be equivalent to a CG with 3 ERs. Second, CG shutouts (CGSOs) in different scoring environments warrant different WE grades: a CGSO in 1998 is much harder to accomplish than a CGSO in 1968. The solution was to convert all CG scenarios from 9 innings to 8.2 innings with no baserunners (in other words, the closest scenario to a CG) and adjust the WE score corresponding to that scenario according to a predetermined extrapolation factor that was derived from the chart readings between relative score boundaries.
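For readers who think in code, here is a rough Python sketch of two of the notes above: the manager’s curve from note 4 and the bound-to-bound lookup from note 5, which amounts to linear interpolation between the two neighboring integer relative scores. The function names, the per-inning dictionary input, and the WE values in `hypothetical_row` are all assumptions made up for illustration. None of them come from the actual Ace Factor software or the WE spreadsheet.

```python
import math

# Share of each successive earned run charged in the 8th and 9th (note 4).
MANAGERS_CURVE = {
    8: (1.0, 0.75, 0.5, 0.25),
    9: (1.0, 0.5, 0.25),
}


def adjusted_earned_runs(er_by_inning: dict[int, int]) -> float:
    """Apply the manager's curve to earned runs keyed by inning number."""
    total = 0.0
    for inning, runs in er_by_inning.items():
        if inning > 9:
            continue  # anything beyond the 9th is ignored (note 3)
        weights = MANAGERS_CURVE.get(inning)
        if weights is None:
            total += runs  # innings 1-7 are charged in full
        else:
            total += sum(weights[:runs])  # runs past the curve count as 0%
    return total


def interpolated_we(score: float, we_by_score: dict[int, float]) -> float:
    """Interpolate WE linearly between the two neighboring integer scores."""
    lower = math.floor(score)
    fraction = score - lower
    if fraction == 0:
        return we_by_score[lower]  # integer score: direct lookup
    low_we = we_by_score[lower]
    high_we = we_by_score[lower + 1]
    return low_we + fraction * (high_we - low_we)


# Made-up WE values for one game state, purely for illustration.
hypothetical_row = {-3: 0.10, -2: 0.20}
print(adjusted_earned_runs({3: 1, 8: 5}))
print(round(interpolated_we(-2.3, hypothetical_row), 3))
```

With a relative score of -2.3, the sketch starts from the -3 value and moves 70% of the way up to the -2 value, which is the same as starting from the upper bound and dropping 30% of the gap.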
