- Posted on June 26, 2008 9:25 AM by Earl Beede to Practicing Earl
- Technique, sigma, measurement, metrics, Management
I have a love/hate relationship with software metrics. While I acknowledge that well designed, well collected, and well utilized measures on software projects can be of huge benefit to the project team, most of us suffer under the burden of measures that are non-designed, ad hoc collected, and not utilized by the team.
Let's take a quick look at those three areas of software measurement: design, collection, utilization.
The starting point for any useful software measurement is in the design. Most software measurement gurus I know point to Vic Basili's Goal/Question/Metric (GQM) approach for designing useful software measures. In this approach, you start by articulating a clear goal you want your task/project/organization to achieve.
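A GQM breakdown is just a top-down decomposition: one goal, the questions that tell you whether you are reaching it, and the concrete metrics that answer each question. As a rough sketch (the goal, questions, and metric names here are invented for illustration, not from any real program):

```python
# Hypothetical Goal/Question/Metric breakdown, top down:
# the goal drives the questions, and each question drives concrete metrics.
gqm = {
    "goal": "Reduce post-release defects reported by customers by 50% within a year",
    "questions": {
        "Where are defects injected?": ["defects by phase of origin"],
        "Where are defects found?": ["defects by detection activity"],
        "Are fixes effective?": ["reopened-defect rate"],
    },
}

# Walk the tree: every metric should trace back up to the goal.
for question, metrics in gqm["questions"].items():
    print(question, "->", ", ".join(metrics))
```

The point of the structure is traceability: if a metric cannot be hung under a question, and a question under the goal, you probably should not be collecting it.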
Well, right there is where the measurement woes begin. It is not common to see well-articulated goals in software development in general, let alone associated with the measurement program. What I see at clients' sites is often more like a general wish than a goal: "improve productivity," "reduce defects."
Where I have seen at least a measurable goal, "reduce defects by 50%," there was no mention of what kind of defects we cared about. Before the "goal," the team had implemented small enhancements requested by the customer by adding them to the defect database. With general defect reduction now the goal, the team started arguing with the clients about what counted as a "defect" and forced all small changes into the bureaucratic change control process. Not only did the released quality not change, but the customer was even more upset than before.
Truly sick sigma in action.
There is a lot of information out there on how to write good goals. The more interesting question is why, with all that good information, we still do not have well stated goals in general and in software measurement in particular.
My take is that good goals require both resources to achieve and accountability. Both are in short supply when the focus of the organization is almost entirely on getting the product out the door. You need slack to do good measurement, and our getting lean and mean leaves no room for any goals other than "ship it."
After a reasonable design is in place (and that may take a few iterations) it is time to move to collection. Here, sick sigma metrics fail in two areas: practice and accuracy.
"Practice makes pretty useful" should be a mantra for software measurement. Given a well designed metric, the people collecting the data still require a lot of practice in identifying instances of the metric. Without the practice the data collected will end up comparing apples and applets. Close but no banana.
Take the common data collection item Lines of Code. What is a line of code? Does it include comments? Do we count only executable statements, or physical lines? What about the actual function delivered? Would we count it differently in C++ than in COBOL?
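To see how much the definition matters, here is a toy counter applied to one small C-style fragment (the fragment and the three counting rules are illustrative, not a standard):

```python
# Hypothetical illustration: three "lines of code" counts for the same
# C-style source fragment -- each rule is defensible, each gives a
# different number.
source = """\
// compute absolute value
int abs_val(int x) {

    if (x < 0) { return -x; }  // negative case
    return x;
}
"""

lines = source.splitlines()
physical = len(lines)                             # every line, blank or not
non_blank = len([l for l in lines if l.strip()])  # drop blank lines
logical = len([l for l in lines                   # drop comment-only lines too
               if l.strip() and not l.strip().startswith("//")])

print(physical, non_blank, logical)  # → 6 5 4
```

Three honest people measuring the same file report 6, 5, or 4 lines of code, and notice that the trailing comment on the `if` line still slips through the "logical" rule. Without agreed, practiced counting rules, you are comparing apples and applets.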
Or how about Defect? Do I include defects committed in requirements but discovered in design? How about defects in areas that the specifications simply did not address? How long after initial customer acceptance do we still count the defect for our metric?
While the design should address some of these questions, only practice in the world of messy reality can mature a software measurement. To paraphrase Fred Brooks: plan to measure several times; you are going to throw the first few away anyhow.
The second area where sick sigma falls down is accuracy. Sick sigma substitutes precision for accuracy. The metrics produced by sick sigma are very nice down to the fifth or sixth decimal place. Unfortunately, they are wrong.
The data collection in almost all of our human systems can only achieve a rough level of precision. An overly precise measure is harder to collect and easier to dispute. Instead we want measures that are easy to collect and hard to dispute.
For example, say you are collecting effort data. A sick sigma implementation would have each technical professional record, twice a day, time on task down to 10 or 15 minute increments. Not only does this level of precision irritate the data collectors (the technical staff), but the wide variation in knowledge work means people end up recording the time they spent in the office rather than the time spent on task. That makes the data ripe for dismissal as useless.
Instead, if we had them round their effort to the nearest hour at the end of the week, we could present the data as imprecise but accurate; it tells the right story. The question for effort data is not "did we spend 17 hours or 18 hours" on a task but "did we spend a small amount or a large amount" of time. Here, data that is accurate but not precise tells us what we need to know. Attempts to dismiss the data as inaccurate are far easier to refute, since it is not trying to be precise in the first place.
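The arithmetic of the trade-off is small enough to show in a few lines (the daily minutes below are made-up numbers for one task in one week):

```python
# Hypothetical effort log: minutes recorded per day for one task, one week.
daily_minutes = [95, 130, 0, 240, 110]

total_minutes = sum(daily_minutes)         # 575 minutes of raw data
precise_hours = total_minutes / 60         # ~9.58 -- invites nitpicking
rounded_hours = round(total_minutes / 60)  # 10 -- imprecise but accurate

print(precise_hours, rounded_hours)
```

Nobody can seriously argue that "about 10 hours" is wrong, while "9.583333 hours" practically begs for a fight over whether a hallway conversation counted as time on task.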
The last area where sick sigma hinders software teams is in the utilization of the measurements. In sick sigma organizations the data collected disappears into a metrics black hole. I call this "measures for the merely curious" since it never seems to change anything. This usually covers almost all metrics that are requested by upper management.
Data collectors should only collect data so decisions can be made and actions taken.
What we need to do is feed the measures back to the data generators and collectors as quickly as possible. That is where the main action is, so that is where the information can aid decision making.
The best case is that the data is immediately visible on a common wall where the data generators see it on a regular basis. I recall one shop where code growth, earned value, and defect counts were all kept on a wall in the midst of the development team. Once a week this information was updated and the team made decisions based on the data. This is not so different from what many agile teams are doing today.
Healing Sick Sigma
Using GQM, getting practice, telling the right story, and using the information locally are all ways to heal the wounds caused by poor metrics.
By the way, any resemblance between "sick sigma" and "Six Sigma" is . . . uh . . . purely coincidental, yeah! that's the ticket. If you want to know my views on the latter, drop me a line and I will be sure to trend the results, Pareto the feedback, and analyze it to seven significant digits. Then I will stick it somewhere, I am not sure where (and not there either), but it will be colorful!