There’s an old saying in the building trade (a trade I’m utterly unfamiliar with, whose work often approximates black magic to this Grumpy Metal Guy) - Measure twice, cut once. The further I get into my professional career, the more I’m convinced that this applies not just to physical tasks, but to software-related tasks as well.

In software terms, this saying translates (roughly) to Quantify Everything. Statistics are the best tool in our arsenal for determining how well something is performing, how efficient our people and processes are, and how two things compare in a non-subjective sense. This last point is particularly important: without cold, hard, emotionless statistics to fall back on, any other description or comparison can be dismissed as “just your opinion. And a Grumpy one at that!”

Statistics can be a double-edged sword, however. Once people know that statistics are being gathered, many will have a natural tendency to game them, bending their behaviour to look as good as possible. Most statistics you pull together are therefore best looked at holistically, in conjunction with each other. Stats on the stats can also prove useful in spotting patterns that suggest gaming is taking place.
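As a concrete sketch of what “stats on the stats” might look like, here’s a minimal Python example. The commit data, the thresholds, and the flag_possible_gaming helper are all hypothetical - one way to spot commit-count gaming, not a definitive implementation:

```python
from statistics import median

# Hypothetical commit log of (author, lines_changed) pairs. In practice this
# would come from your VCS, e.g. by parsing `git log --numstat`.
commits = [
    ("alice", 120), ("alice", 45), ("alice", 80),
    ("bob", 1), ("bob", 2), ("bob", 1), ("bob", 1), ("bob", 3), ("bob", 1),
]

def flag_possible_gaming(commits, min_commits=5, tiny_commit_threshold=5):
    """Flag authors with many commits whose median size is suspiciously small.

    A high commit count made up almost entirely of one-line changes is the
    classic way to game a commits-per-day statistic.
    """
    by_author = {}
    for author, lines in commits:
        by_author.setdefault(author, []).append(lines)

    flagged = []
    for author, sizes in by_author.items():
        if len(sizes) >= min_commits and median(sizes) <= tiny_commit_threshold:
            flagged.append((author, len(sizes), median(sizes)))
    return flagged

print(flag_possible_gaming(commits))  # [('bob', 6, 1.0)]
```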

Useful Stats

Some examples of statistics that have proven useful in the past:

| Statistic | Why use it | How to avoid misuse |
| --- | --- | --- |
| Code quality statistics | Indicate the quality and maintainability of code. Examples include cyclomatic complexity, Halstead metrics, and the number of lines of code removed (driven by the need to reduce complexity and cruft by removing code rather than adding it). Anything that gives you a clearer idea of how your development process is operating is useful! | These tend to be fairly difficult to game, as they show up bad code - gaming them would actually be good! |
| Commits per day | Shows regular work effort. | Simply committing one line at a time would boost this stat. Combine it with some measure of how much work each commit contains. |
| Production job failures | A regular number of production job failures shows your code is not handling certain issues. This needs fixing! | Ideally, differentiate between intermittent and regular failures - they need different solutions. |
| Release frequency | You should be releasing code regularly; if not, any errors in your production code will persist until fixed and released. | Frequent releases with no real content could make this look good. Ensure releases are meaningful by measuring the amount of change in each release. |
| Code committed but not merged | If we’ve worked on something, we should merge it so that others can use it. The work was clearly valuable enough to complete, and we risk it being lost if someone moves on and the change is forgotten about, or worse, reimplemented by someone else. | |
| Work estimate | Allows tracking of how long we think work should take. | Validate estimates during sprint planning or similar, otherwise simple tasks may be scoped to take days or weeks longer than they should. |
| Actual work effort | Tracks how long work actually took against the estimate. | This needs careful tracking, but if estimates have already been validated during planning, any pattern of significant variance should stand out (see the sketch after this table). |
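To make that last row concrete, here’s a minimal sketch of checking estimate-vs-actual variance. The task records and the 1.5x overrun threshold are made-up assumptions for illustration:

```python
# Hypothetical task records; in practice these would come from your tracker.
tasks = [
    {"id": "TASK-101", "estimate_hours": 8, "actual_hours": 9},
    {"id": "TASK-102", "estimate_hours": 4, "actual_hours": 11},
    {"id": "TASK-103", "estimate_hours": 16, "actual_hours": 15},
]

for task in tasks:
    # Ratio > 1.0 means the work took longer than estimated.
    ratio = task["actual_hours"] / task["estimate_hours"]
    if ratio > 1.5:
        print(f"{task['id']}: actual effort was {ratio:.1f}x the estimate - worth a look")
```

A single overrun means little; it’s a consistent pattern of ratios well above 1.0 that’s worth a conversation.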

A Picture is Worth 2^10 Words

The best way to deal with statistics in all their messy glory is to display them. The human mind is well adapted to spotting patterns, so plotting different measurements on top of each other or next to one another can give quick and meaningful insights into the underlying structure of our systems. I tend to break data down into system-specific measures (failures, releases) and people-specific measures (work estimates vs actual time). This lets senior management get quick, high-level insight into system behaviour, while giving managers a view of how their teams are performing at the individual level.
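As a minimal sketch of this, assuming matplotlib and some made-up weekly numbers, plotting a couple of system-specific measures together might look like:

```python
import matplotlib.pyplot as plt

# Hypothetical weekly counts; in practice these would come from your CI
# system and release records.
weeks = ["W1", "W2", "W3", "W4"]
job_failures = [7, 5, 9, 2]
releases = [1, 2, 1, 3]

# Plot failures against release frequency so patterns (e.g. failures
# dropping as release cadence rises) jump out.
fig, ax = plt.subplots()
ax.plot(weeks, job_failures, marker="o", label="Production job failures")
ax.plot(weeks, releases, marker="s", label="Releases")
ax.set_xlabel("Week")
ax.set_ylabel("Count")
ax.legend()
fig.savefig("weekly_stats.png")
```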

It’s Not Personal

It definitely isn’t. Statistics aren’t personal; ultimately, that objectivity is the whole point of generating them. If someone is unhappy with what a graph is showing them, it usually means it has revealed something they’re not doing properly. Admittedly, there may be other statistics, not yet being captured, that would offset the negative picture painted by the ones being looked at. This is a good thing! Generate the new statistics, graph them, and tell a better story! Everyone wins!

But otherwise, and it’s hard to believe that Grumpy Metal Guy is typing this, excessive grumping will not help here; hopefully it can instead be a lesson in self-reflection. Use the stats to identify the things that need improving, then make the change.