Measuring Productivity of Individual Programmers

Posted on April 9, 2008 5:22:PM by Steve McConnell to 10x Software Development
Methods & Processes, Testing & QA, productivity, Management, Design, Individual Productivity, Construction

My last couple of posts on productivity variations among programmers and the Chief Programmer Team model gave rise to some discussion about hazards of measuring software productivity at the individual programmer level. Software engineering studies normally measure productivity in terms of time to complete a specific task, or sometimes in terms of lines of code per effort-hour, staff-month, or some other measure of effort. Regardless of how you choose to measure productivity, there will be issues.

Productivity in Lines of Code Per Staff Month

Software design is a non-deterministic activity, and researchers have found 10x variations in the code volume that different designer/developers will generate in response to a particular problem specification. If productivity is measured as lines of code per staff month (or equivalent), that implicitly suggests that the programmer who writes 10 times as much code to solve a particular problem is more productive than the programmer who solves the same problem in one-tenth the code. That clearly is not right. Some commenters on my previous blog entry asserted that great programmers always write less code. My observation is that there's a correlation there, but I wouldn't make that statement that strongly. I would say that great programmers always write clear code, and that often translates to less code. Sometimes the clearest, simplest, and most obvious design takes a little more code than a design that's more "clever"--in those cases I think the great programmer will write more code to avoid an overly clever design solution. Regardless, the idea that productivity can be measured cleanly as "lines of code per staff month" is subject to problems either way.

The problem with measuring productivity in terms of lines of code per staff month is the old Dilbert joke about Wally coding himself a minivan. If you measure productivity in terms of volume of code generated, some people will optimize for that measure, i.e., they will find ways to write more lines of code, even if more lines of code aren't needed. This isn't really a problem that's specific to this way of measuring productivity. It really just speaks to the management chestnut that "what gets measured gets done," so you need to be careful what you measure.

Productivity in Function Points

Some of the problems of "lines of code per staff month" can be avoided by measuring program size in function points rather than lines of code. Function points are a "synthetic" measure of program size in which inputs, outputs, queries, and files are counted to determine program size. An inefficient design/coding style won't generate more function points, so function points aren't subject to the same issues as lines of code. They are, however, subject to practical issues of their own, namely that getting an accurate count of function points requires the services of a certified function point counter (which most organizations don't have available), and the mapping between how function points are counted and individual work packages is rough enough that it becomes impractical to use them to ascertain the productivity of individual programmers.
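
To make the mechanics concrete, here is a rough sketch (in Python) of how an unadjusted function point total is assembled from those element counts. The weights are the commonly published IFPUG low/average/high values, but classifying each element's complexity is exactly the part that needs a trained counter, so treat this as an illustration only.

    # Illustrative unadjusted function point count. The weights are the
    # commonly published IFPUG values; judging each element's complexity
    # correctly is what requires a certified counter.
    WEIGHTS = {
        "external_input":     {"low": 3, "average": 4,  "high": 6},
        "external_output":    {"low": 4, "average": 5,  "high": 7},
        "external_inquiry":   {"low": 3, "average": 4,  "high": 6},
        "internal_file":      {"low": 7, "average": 10, "high": 15},
        "external_interface": {"low": 5, "average": 7,  "high": 10},
    }

    def unadjusted_function_points(elements):
        """elements: iterable of (element_type, complexity) pairs."""
        return sum(WEIGHTS[etype][complexity] for etype, complexity in elements)

    # Hypothetical mini-application: two input screens, one report,
    # one query, and one internal data store.
    example = [
        ("external_input",   "average"),
        ("external_input",   "low"),
        ("external_output",  "high"),
        ("external_inquiry", "low"),
        ("internal_file",    "average"),
    ]
    print(unadjusted_function_points(example))   # 27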

What about Complexity?

Managers frequently mention this issue: "I always give my best programmer the most difficult/most complex sections of code to work on. His productivity on any measured basis might very well be low compared to programmers who get easier assignments, but my other programmers would take twice as long." Yep. That’s a legitimate issue too.

Is There Any Way to Measure Individual Productivity?

Difficulties like these have led many people to conclude that measuring individual productivity is so fraught with problems that no one should even try. I think it is possible to measure individual productivity meaningfully, as long as you keep several key factors in mind.

1. Don't expect any single-dimensional measure of productivity to give you a very good picture of individual productivity. Think about all the statistics that are collected in sports. We can't even use a single measure to determine how good a hitter in baseball is. We consider batting average, home runs, runs batted in, on-base percentage, and other factors--and then we still argue about what the numbers mean. If we can't measure a "good hitter" using a simple measure, why would we expect to be able to measure something as complex as individual productivity using a simple measure? What we need to do instead is use a combination of measures, which collectively will give us insight into individual productivity. (Measures could include on-time task completion percentage, manager evaluation on a scale of 1-10, peer evaluation on a scale of 1-10, lines of code per staff month, defects reported per line of code, defects fixed per line of code, bad fix injection rate, etc.; a rough sketch of combining such measures follows this list.)

2. Don't expect any measures--whether single measures or a combination of measures--to support fine-grained discriminations in productivity among individuals. A good guideline is that measures of individual productivity give you questions to ask, but they don't give you the answers. Using measures of performance for, say, individual performance reviews is both bad management and bad statistics.

3. Remember that trends are usually more important than single-point measures. Measures of individual productivity tend to be far less useful in comparing one individual to another than they are in seeing how one individual is progressing over time.

4. Ask why you need to measure individual productivity at all. In a research setting, researchers need to measure productivity to assess the relative effectiveness of different techniques, and their use of these measures is subject to far fewer problems than measuring individual productivity on real projects is. In a real project environment, what do you want to use the measure(s) for? Performance reviews? Not a good idea for the reasons mentioned above. Task assignments? Most managers I talk with say they *know* who their star contributors are without measuring, and I believe them. Estimation? No, the variations caused by different design approaches, different task difficulty, and related factors make that an ineffective way to build up project estimates.
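
Coming back to points 1 and 3, here is a minimal sketch of what a multi-measure, trend-oriented record might look like. The measure names come from the list above; the numbers are hypothetical, and the point is the shape of the data (several measures per person, watched over time), not any single score.

    from dataclasses import dataclass

    @dataclass
    class ProductivitySnapshot:
        period: str                    # e.g. "2008-Q1"
        on_time_completion_pct: float  # share of tasks finished on schedule
        manager_rating: int            # 1-10
        peer_rating: int               # 1-10
        loc_per_staff_month: int
        defects_per_kloc: float

    def trend(snapshots, field):
        """Period-over-period changes for one measure: useful for spotting
        direction over time, not for ranking one person against another."""
        values = [getattr(s, field) for s in snapshots]
        return [round(later - earlier, 2) for earlier, later in zip(values, values[1:])]

    history = [
        ProductivitySnapshot("2008-Q1", 0.70, 6, 7, 1800, 4.2),
        ProductivitySnapshot("2008-Q2", 0.78, 7, 7, 1650, 3.1),
        ProductivitySnapshot("2008-Q3", 0.83, 7, 8, 1700, 2.6),
    ]
    print(trend(history, "defects_per_kloc"))   # [-1.1, -0.5], i.e., improving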

On real projects it’s hard to find a use for individual productivity measures that is both useful and statistically valid. In my experience, aside from research settings the attempt to measure individual performance arises most often from a desire to do something with the measurements that isn’t statistically valid. So while I see the value of measuring individual performance in research settings, I think it’s difficult to find cases in which the effort is justified on real projects.

(Measuring team productivity and organizational productivity is a different matter -- I'll blog about that soon).

Daniel Yokomizo said:

April 9, 2008 7:25:PM

A quick note on individual productivity measurement and function points. I was a certified function point specialist with IFPUG for three years and worked a lot with this method, so I know something about it. Anyway, conceptually, measuring productivity by function points is a wonderful idea; after all, function points are an objective measure (when done right). But it faces a few problems that are quite unsolvable:

1. IFPUG's function points (FPs) (there are other methods, like COSMIC-FFP) have a complexity limit: after a certain threshold a function is simply considered complex, with no distinction in how complex it is, and two "complex" functions have the same measure in FPs. So you can't hand N FPs' worth of features to one developer, do the same with another, and expect the productivity measure to mean much. This is specific to IFPUG's FPs, but it's the most widely used method today AFAICS. The limit greatly impairs the correct measurement of complex systems, which is a reason I started studying COSMIC FFP (which doesn't have this flaw). (A sketch of this ceiling effect follows the list.)

2. FPs are a measure of complexity based on some objective criteria of a feature, but we don't have a comprehensive measure of complexity. Most methods ignore algorithmic complexity, which can make a feature much harder to implement, test, and understand. Eventually we may figure out a way to assess feature complexity objectively and without problems, but today's methods fail at this.

3. FPs are external to the software and define the complexity of a feature as a whole. For example, if we have two features, create foo and change foo, both will share code (e.g. foo's class, persistence layer, UI), so after doing one of them the other will be much easier to implement (compared to the effort of doing it first). As an application grows, (hopefully) more code is shared, so productivity "improves" (i.e. the measure becomes more skewed) for features that are similar to the ones already implemented. Code sharing, frameworks, libraries, and tools all affect measurement of the productivity of individual features, to the point where the correlation is insignificant. In the studies I did, the data suggested no pattern in the measurement of individual features.
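
Going back to the complexity ceiling in point 1, here is a tiny sketch of the effect. The thresholds and weights below are invented for illustration; they are not the official IFPUG complexity matrix.

    # Once a function crosses the "high" threshold, every additional bit of
    # complexity is invisible to the measure: a hairy function and a truly
    # monstrous one earn the same weight.
    def external_input_weight(data_elements, files_referenced):
        if files_referenced <= 1 and data_elements <= 4:
            return 3   # "low"
        if files_referenced >= 3 or data_elements >= 16:
            return 6   # "high" -- the ceiling
        return 4       # "average"

    print(external_input_weight(data_elements=16,  files_referenced=3))   # 6
    print(external_input_weight(data_elements=200, files_referenced=12))  # 6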

Now, when we use FPs to measure team productivity on bigger sets of features (e.g. 30+ use cases), the correlation becomes significant.

If we want to measure something on an individual basis, I would suggest measuring the variation between the estimate an individual makes and the actual time required, so we can train people to give better estimates and learn how to avoid a few biases in estimation and planning. Other than that, there's too much noise.
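
A minimal sketch of that kind of tracking follows; the task names and hours are hypothetical. The interesting number is the relative error per task, watched over time so estimators can calibrate themselves.

    # (task, estimated hours, actual hours) -- hypothetical data
    tasks = [
        ("invoice export", 16.0, 22.0),
        ("login rework",    8.0,  7.0),
        ("report screen",  12.0, 19.0),
    ]

    for name, estimated, actual in tasks:
        relative_error = (actual - estimated) / actual
        print(f"{name:15s} {relative_error:+.0%}")
    # invoice export  +27%
    # login rework    -14%
    # report screen   +37%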

I did individual productivity measurement (based on a variation of Mark II function points, to avoid the complexity limit of IFPUG's FPs) in the study I mentioned in the previous blog post's comments. It was a small team doing four iterations of the entire development process (i.e. from analysis to deployment on each iteration) using an incremental approach (different features on each iteration). Each developer was responsible for a full module on each iteration, so we minimized the effects of code sharing, and the iterations decreased in business and technical complexity (i.e. the most complex and risky work in the initial iteration, easier work later), so we minimized the effects of algorithmic complexity within the iterations (roughly every developer got a module of equivalent complexity).

The interesting result we got was that the differences in productivity between the programmers didn't vary: on each iteration they kept the same relative productivities. But on each iteration, as the team got more knowledgeable about the domain, the application's infrastructure became richer and more robust, and the features became easier, their productivity increased by about 2.16x (IIRC the values were 1x, 2x, 4.5x, 10.1x). So the entire team roughly doubled its productivity on each iteration, but kept the same relative productivities. In the end every programmer implemented the same amount of FPs (within 10% variation) on each iteration. On later projects we found that the absolute productivity values changed too much, but the relative productivities stayed stable.

As we had management approval, we could divide the features and iterations almost perfectly, and we could resist the urge to drop the constraints to keep an absurd schedule or to introduce features in the middle of an iteration. In the end the study was considered useful for identifying the strengths and weaknesses of the team, but it failed to provide a way to make better estimates (the original goal of the study), and we realized the importance of good product and risk management (i.e. developing the riskier features first) and of keeping features stable within iterations.

My conclusion from this study was that we can focus on individual programmers' differences, but they don't predict much unless they are supported by a good set of development practices and a sane development process. Also, a follow-up study, tracking this project's bugs to individual developers (using Subversion) for a period of one year after final deployment, showed that the incidence, severity, and effort to correct bugs are unrelated to productivity for most programmers; yet all of these are things that matter (even more than productivity, IME) in a project and would be important to measure. In the end I gave up on measurement as a career focus and concentrated on development practices, product management, and tools (languages, libraries, IDEs, and frameworks, in that order) to make productivity (and other measures) more predictable.

Mark Roddy said:

April 10, 2008 12:37:AM

One of the issues that is generally disregarded (though not always) is that productivity is also a poor measure if the desired level of quality is not taken into account.

A real world example: there was a developer who worked for our department who was known for being SUPER productive. He'd churn out massive volumes of code far faster than everyone else, but for some reason he was constantly being shifted to different teams. When he was shifted to my team I found out why. Though he did turn out code faster than anyone else I've ever worked with, the quality was so low that the savings from his high productivity were lost to maintenance several times over.

If we had a position in which most of the work was one-off and the code would be thrown out, he would have been a godsend. But given our desired level of quality, based on the fact that we produce applications expected to live for many years, his high productivity actually resulted in losses in the long run.

I think that any discussion of productivity needs to take into account the level of quality of the code produced versus the desired level of quality based on the needs of the project at hand. But then, compared to productivity, quality is probably a much harder metric to measure.

Hamish said:

April 10, 2008 12:44:AM

Another great post Steve, many thanks.

Paul Johnson said:

April 10, 2008 1:54:AM

Thank you Steve, and also thank you Daniel Yokomizo for a great follow-up.

Big Maybe said:

April 10, 2008 10:07:AM

Actually, in my experience, LOC is a pretty good rough measure of productivity.

The mediocre programmer will need to patch together line after line of spaghetti to accomplish the same task that supercoder does in far fewer lines. That much is true.

However, the supercoder factors his code properly and will design tight class hierarchies to eliminate code duplication, raise cohesion, and reduce coupling. He will put in exception handling at every point where an exception can conceivably occur. Then he will code up logging, monitoring, and alerting modules so that program bugs can be easily tracked and fixed.

The super coder checks his calendar and--hey--I've got a few more days, so let's refactor this table structure in a way that will enable useful reporting; let's attribute these structures properly so that they can be serialized to file; let's go ahead and internationalize all I/O; I can continue, but you get the idea. Now hundreds more lines of code pour in, each one adding quality to the product.

Meantime, mediocre coder is still struggling to get his code to work and adding lines, commenting out other lines, until he can get it to work "on his machine".

Steve McConnell said:

April 10, 2008 5:01:PM

@Mark's comment about high "productivity" but low quality -- one truism is that work will move from unmeasured categories to measured categories. So if you're measuring quantity but not quality, you'll get quantity but not quality. That's the tricky part of measuring almost anything. You have to define a set of measures that collectively don't leave any room for unwanted behavior to creep in. Arriving at a set of measures that accomplishes that is often an iterative process, and reasonable guesses about what responses a measure will produce are often wrong. The result is that early iterations can miss the mark by a lot.

Daniel Holmes said:

April 11, 2008 1:17:PM

I think that Dilbert strip was the one where the boss was going to improve quality by paying a bonus for every bug the developers fixed, so Wally was going to write some bugs to fix.

I agree with your basic observations still.

Simon Parmenter said:

April 13, 2008 3:56:AM

@Big Maybe: So the super coder does analysis and design where the mediocre coder does not? A better planner as well?

Charles Seybold said:

April 13, 2008 9:22:PM

Great post. I strongly believe that measurement is tricky and can easily influence productivity in the wrong direction. The point Daniel makes is key: "measure variation between the estimate an individual makes and the actual time required". I'd go one step farther and make that idea the basis for almost every task and project: try a continuous cycle of 1) ranged estimation, 2) execution, 3) looking at the variance (together), and 4) repeating weekly until done. This will keep expectations aligned and uncertainty in clear focus. If you do this, I think you'll be near the high end of productivity for any given team with a minimum amount of overhead. If we're going to measure, we should reserve the effort for quality metrics, not productivity metrics. If people are hitting their honest estimates with good quality, they are doing their best, or at least as much as they get paid for.

As an aside, on our press tour for LP, the analysts always asked us if we'd use the estimation history to rate estimators. We told them the point was to help people learn rather than to give managers a stick to beat people up with. This is the danger of individual metrics: the temptation to misuse them is great. Metrics at the team level, project level, or category level -- well, that's a completely different story; looking forward to the next post.

Tom Chizek said:

April 14, 2008 9:03:AM

Great post! Not enough attention is paid to the downside of bad measurement. I have worked at several places where great value was placed on cranking out code but little attention was paid to the quality of the code that was cranked out. The result was predictable (now that I have lived through it): lots and lots of very buggy code. The turnaround came when we started looking at the number of bugs reported along with the amount of code produced. With that, we (management--the team already knew but could not get the problem addressed) could see who was just writing junk code and who was writing less code but also fewer bugs. It was really a case of people reacting to measurement.

Ben Northrop said:

April 14, 2008 3:09:PM

Excellent post! I think an interesting angle to consider is how developing for maintainability and reuse can affect productivity over time. A developer who solves a problem rather myopically, not considering future requirements/changes, might look like a super-star initially in terms of productivity--he finished his feature on time, wrote a lot of function points, had no bugs, etc. Three months later, however, when a new round of requirements comes in, his productivity may plummet (or at least stagnate), because his prior work didn't make his future work any easier. A good developer knows when to sacrifice productivity in the short run to realize gains in the long run; however, a manager who uses single, one-point-in-time metrics of productivity might actually discourage developers from writing clean/reusable code--which will of course hurt productivity in the long run.

software_engineer said:

April 18, 2008 2:35:AM

Good post.

All of us know that measurement is a tricky thing and that it influences productivity very much. Every article like this is very useful to read.

Thank you.

Gabriel Belingueres said:

April 18, 2008 8:54:AM

I would measure the productivity of an employee based on common sense: there are no unproductive people on a project if there are no stressful situations among the people in the office.

So I would measure how many stressful situations are present in the office because of this employee's work/decisions/behavior.

This of course only makes sense if you can trace the problem back to the right time and person. For software developers you can see who committed a file to the source repository. Other tools may help, but traceability is king here.

In every project there are little failures and successes, so these would be the rules for assigning a "productivity index" to person P:

a) If P's decisions/work (which are OUT of the scope of P) lead to a past/present/future non-stressful situation for at least one other person in the project, then he gets +1 point.

b) If P's decisions/work (which are IN the scope of P) lead to a past/present/future stressful situation for at least one other person in the project, then he gets -1 point.

c) When the manager is in doubt, run an ANONYMOUS survey asking the team WHOSE work/decision led to the current stressful situation, and WHY. (Include yourself in the survey.) If 80% of the answers point to the same person AND the same reason, then you assign a -1 to that person.

At the end of the period (month/year/etc.), sum all the points for each employee and there you have it.
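
A minimal sketch of how such a tally might be kept follows. The people and events are hypothetical, and deciding what counts as a stressful situation (and running rule c's survey) remains entirely human judgment.

    from collections import defaultdict

    # Each event records who caused it and which rule above it falls under:
    # "a" = out-of-scope work that kept things non-stressful for someone else (+1)
    # "b" = in-scope work that created a stressful situation for someone else (-1)
    events = [
        ("Alice", "a"),
        ("Bob",   "b"),
        ("Alice", "b"),
    ]

    scores = defaultdict(int)
    for person, rule in events:
        scores[person] += 1 if rule == "a" else -1

    # Sum at the end of the period (month/year/etc.):
    print(dict(scores))   # {'Alice': 0, 'Bob': -1}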

Gabriel

chrishmorris said:

April 18, 2008 1:22:PM

A measure of program size that I like is the number of assertions in an adequate set of test cases.
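
One crude way to get at that number, assuming Python-style unittest assertions and a conventional tests directory (both just examples), is to count assertion calls in the test files:

    import re
    from pathlib import Path

    # Counts unittest-style calls (assertEqual, assertTrue, ...) and bare
    # assert statements. Crude: it will also count the word inside comments
    # and strings, so treat the result as an approximation.
    ASSERTION = re.compile(r"\bassert\w*\s*\(|\bassert\s")

    def count_assertions(test_dir):
        return sum(len(ASSERTION.findall(p.read_text()))
                   for p in Path(test_dir).rglob("test_*.py"))

    print(count_assertions("tests"))   # e.g. 412 for some hypothetical project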

Chris Mountford said:

April 20, 2008 6:08:AM

Why do you hire a programmer? So that your LOC graphs go up?

A good productivity indicator correlates strongly with real business value.

If you cannot measure productivity--and for simplicity's sake I'll assume that degrades into an inability to measure value--then maybe there is no value there! What makes you think there is value? You know there is value and you know you should be able to measure it, but why is it so hard?

Perhaps because you're trying to measure some abstract intermediate indicator of progress (such as lines of code). Failing to find a satisfying (reliable, consistent) abstract intermediate indicator of progress does not imply that developer productivity cannot be measured.

My preferred measure of programmer productivity is based on passing tests on shippable software. What kind of tests? All the kinds you need to show your software does what it is supposed to. Tests not enough for you? Where is the problem now?

Mitch Barnett said:

April 21, 2008 4:38:PM

All good stuff, but here is a real-world problem that I and others face all of the time. I work for a professional services company and we build large-scale, custom ecommerce sites. Customers want us to estimate the effort, and therefore the cost and schedule, to deliver the project with nothing more than a feature list.

As odd as it may sound, each ecommerce site is quite different in functionality (not necessarily content), and in most cases we build from the ground up, though on top of an ecommerce framework. Still, it may take 60K lines of custom code on top of the framework to implement a project.

So what are the best ways to estimate the effort? We tried FPA, but that requires skills beyond what we have. We have tried other levels of abstraction, like use case estimating and the like, but not with a great deal of success. It seems like the only way is back to counting lines of code.

We can equate lines of code to features, which is how the ecommerce project is sold. For example, a PayPal payment feature may be 50 SLOC. Are there any tools that can help us build a features library and then store, historically, both the lines of code and the effort it took to write those lines of code, so that over time we can just assemble the estimate from this library?
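
Even a very simple library along those lines might be a starting point. Everything below is a hypothetical placeholder (the 50-SLOC PayPal figure comes from the paragraph above; the rest is made up) rather than calibrated data:

    # feature: (typical SLOC, typical effort in hours), drawn from past projects
    feature_library = {
        "paypal payment": (50,   16),
        "product search": (900, 120),
        "shopping cart":  (1200, 170),
        "order tracking": (600,  80),
    }

    def estimate(feature_list):
        sloc  = sum(feature_library[f][0] for f in feature_list)
        hours = sum(feature_library[f][1] for f in feature_list)
        return sloc, hours

    print(estimate(["paypal payment", "shopping cart", "order tracking"]))
    # (1850, 266) -- a first cut, to be corrected as actuals accumulate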

I would be very curious to hear other options or opinions… Many thanks in advance!

PS. I wish we were at a level where we could worry about programmer productivity; we are still at the stone-age level of just trying to grasp the size and effort of our projects…

Mal said:

April 23, 2008 3:52:PM

Don't take this the wrong way Steve, but can I have your baby?

Every time I read one of your books or blogs, I walk away incredibly impressed.

Solidpoint said:

May 8, 2008 3:16:AM

I like the comments of Chris Mountford the best. The only measure of productivity is how much it cost to deliver the functionality the user wanted--or should have wanted, as is too often the case. A real-life example illustrates this well. When I worked at Citicorp, another R&D group spent several million dollars rewriting an ALCO-related program to give it a GUI, because the users were inputting data in a big, unstructured file and would make input errors like double decimal points, resulting in the failure of a 12-hour job. The GUI, predictably, was very tedious to use and output the very same flat file users had been creating in Notepad for years. Users hated the GUI. They wanted to copy a large file, modify a few things, and know it wouldn't blow up somewhere in the night due to data entry errors. I wrote a recursive routine to check common data entry errors and support embedded comments, and sent it to a friend who was doing damage control on the GUI project. My 3-4 hours of coding replaced, and surpassed, several million dollars' worth of development, as the users quickly abandoned the GUI entirely once a stand-alone validation utility was made available. I could also point to a small CASE system I wrote and used to generate 12 billion lines of prototype code at FIB bank in the 80's... and on and on.

Software is congealed intellect. If you don't know the domain you're working in, don't be surprised if MOST of what you produce turns out to be redundant. Remember this when some guy is sitting staring at his screen for hours, scribbling on a white board, or taking long walks with the other super-nerd in your group. It may well be that he's found a way to meet your goal 10,000 times faster. Are you smart enough, and do you know the domain well enough, to know when he's bringing you a moon-rock? That is the challenge. Great software isn't a process of saving money; it's a process of implementing inspired genius--in exactly the same way that the CFO can never save his company to greatness. Engineering can. Marketing can. The leasing department can. The bean counters can't. Their role is defensive. The best they can do is contribute at the margins by making the process of implementing genius cheaper--right up to the point where they kill it entirely.


Steve McConnell

Steve McConnell is CEO and Chief Software Engineer at Construx Software where he consults to a broad range of industries, teaches seminars, and oversees Construx’s software development practices. In 1998, readers of Software Development magazine named Steve one of the three most influential people in the software industry along with Bill Gates and Linus Torvalds.

Steve is the author of Software Estimation: Demystifying the Black Art (2006), Code Complete (1993, 2004), Rapid Development (1996), Software Project Survival Guide (1998), and Professional Software Development (2004). His books twice won Software Development magazine's Jolt Excellence award for outstanding software development book of the year.

Steve has served as Editor in Chief of IEEE Software magazine, on the Panel of Experts of the SWEBOK project, and as Chair of the IEEE Computer Society’s Professional Practices Committee.

Steve received a Bachelor's degree from Whitman College, graduating magna cum laude, Phi Beta Kappa, and earned a Master's degree in software engineering from Seattle University.