I've mentioned this before but most of my work these days revolves around helping organizations and software development teams adopt agile methods or otherwise restructure their operations. Specifically, a lot of our clients (like most teams at Reaktor) are using or looking at Scrum so I get to see a lot of teams take their first steps in that direction. And I get asked similar questions over and over again. While there's really no single "correct" answer to most of those questions, some of them I tend to respond to with an almost identical answer or suggestion.
Earlier today I found an old index card with a note sketched vertically along the side saying, "[empty tick box] story points for product backlog", with the word "product" underlined. I don't really remember what I was trying to say to myself with that note (that happens a lot with notes I made more than a few weeks ago) but I interpret it to mean that the team I had been consulting at that time had decided to use story points for estimating tasks for their sprint backlogs - and I wanted to bring forth some thoughts about that at the next suitable moment.
Now, the use of story points has been the topic of many of those recurring questions I get asked so I decided to (gasp) blog about it. I haven't done this in a while so it's kind of exciting. Anyway, here's a short introduction to story points and, related to that, what I often suggest to teams I coach or consult with.
STORY POINTS - THE DIRECTORLASSE'S CUT
Scene 1: Relative estimation
Story points are an abstract unit of effort (or complexity but I'll stick with effort for now). They're not hours. They're not days. They're not quarters-of-a-day. They have no defined relationship with any time unit you can think of. What they do have in common with time units is the scale - it's linear and proportional. In other words, the distance between "1 story point" and "2 story points" is the same as the distance between "2" and "3", between "3" and "4", and so forth. The measures are relative and can be compared with each other. For example, 2 story points is twice the size of 1 story point.
So we know that story points are an abstract unit of effort. What do we measure in story points, then? We use story points for estimating the effort of implementing user stories (or whatever format you prefer for expressing requirements or desired features for your product). In other words, we assign story points (or just "points") to features based on how much effort we estimate them to require. And, remember, story points are not a measure of time but effort - not even a measure of effort in time. A measure of effort. Period.
The advantage of estimating in story points (compared to, say, effective engineering hours or calendar days) is speed. With story points we're estimating relatively, comparing items of work to each other and placing them into virtual buckets representing our chosen scale. Items of size "1" go with other items of size "1". Items of size "3" go with other items of size "3", and so forth. When we encounter a story that's almost twice as much work as an item in the 3-point bucket, we throw that in the 5-point bucket. It's all relative. And that's what makes it fast - we (humans) are good at comparing things. We're not that good at estimating how long it will take to make that point-of-sale system support an additional type of campaign pricing scheme. With relative estimation, we can plow through many more backlog items with equal accuracy (and I mean accuracy, not precision) compared to breaking them all up to smaller tasks, estimating them, adding them up, and slapping some buffer on top.
So where does this take us? We have a technique for quickly generating estimates by comparing the items relatively to each other but those estimates are presented in abstract units of effort that have no relationship to the Gregorian calendar or time - let alone the project's deadline. How do we know how long it will take to build the product or finish the project?
Well, only the almighty god would know the answer - on a good day - but we do have the means for producing a useful prediction. It's called an experiment. We simply start working on a couple of those items and see how much we could accomplish in a given period of time - how many story points worth of features did we implement?
If we completed 18 story points in the first week and the product backlog has a total of 440 story points remaining, we can quite confidently say that the project will finish within 4-18 months. After the second week, with 14 more story points completed, we can quite confidently say that the project will finish within 8-16 months. After the third week, with 15 more story points completed, we can quite confidently say that the project will finish within 7-12 months. After the fourth week, we're again that much more confident that our velocity - the pace at which we complete features - is representative of the rest of the project and yields a reliable prediction for the overall completion date.
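The arithmetic behind such a prediction can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and I'm assuming the 440 points were what remained after the first week. The spread between the best and worst observed week is a crude stand-in for uncertainty; with only three samples it is much narrower than the ranges quoted above, which are judgment calls and not the output of any formula.

```python
import math

def forecast_weeks(remaining_points, weekly_velocities):
    """Return (optimistic, expected, pessimistic) weeks left.

    Hypothetical helper: divides the remaining backlog by the best,
    average, and worst velocity observed so far, rounding up to whole weeks.
    """
    best = max(weekly_velocities)
    worst = min(weekly_velocities)
    mean = sum(weekly_velocities) / len(weekly_velocities)
    return (
        math.ceil(remaining_points / best),
        math.ceil(remaining_points / mean),
        math.ceil(remaining_points / worst),
    )

# Velocities from the example: 18, 14, and 15 points in weeks one to three.
velocities = [18, 14, 15]
# Assumption: 440 points remained after week one, so 411 remain after week three.
remaining = 440 - 14 - 15

optimistic, expected, pessimistic = forecast_weeks(remaining, velocities)
print(f"{optimistic}-{pessimistic} weeks, most likely around {expected}")
```

Each new week of data adds one more velocity to the list, which is why the prediction gets steadier over time.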
We won't be 100% certain before all features have been delivered. And that goes for any other estimation techniques, too.
Scene 2: Calibrating the scale
Now this is all fine and dandy but what if we need to make a prediction about the completion date without the luxury of working on the project first for a few weeks? That's where calibration comes into the picture.
Calibration, referring to calibrating the story point scale, is an activity where a small number of items are analyzed more closely - the traditional work breakdown structure approach - in an effort to produce a time-based estimate, for example, in effective engineering hours (also referred to as ideal engineering hours) or calendar days.
For example, you could take items that are roughly 10 hours, 20 hours, and 30 hours of effort in effective engineering hours. Then you would assign these items a story point-based estimate, effectively anchoring your story point scale to time. For instance, those three items you picked could be assigned 3, 5, and 8 story points, respectively. It's not a mathematically solid translation but it's close enough.
Now, if you need to make that prediction about the completion date, you estimate the product backlog relatively in story points, add it all up, and translate back to calendar time with whatever formula you used for calibration, not forgetting to translate from effective engineering hours to calendar time. What you should forget at this point is the formula – it has served its purpose and is now just mental baggage, seducing our feeble minds to continue translating back and forth between hours and points.
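For the record, before forgetting it, the translation could look something like the sketch below. The anchor pairs come from the example above; the team size and the number of effective hours per person per week are purely hypothetical inputs that you'd have to supply yourself, and averaging the anchors into a single hours-per-point ratio is just one rough way to do it.

```python
# (effective engineering hours, story points) pairs from the calibration example.
ANCHORS = [(10, 3), (20, 5), (30, 8)]

def hours_per_point(anchors):
    """Average effective hours represented by one story point."""
    total_hours = sum(hours for hours, _ in anchors)
    total_points = sum(points for _, points in anchors)
    return total_hours / total_points

def calendar_weeks(backlog_points, anchors, team_size, effective_hours_per_person_week):
    """Translate a backlog total in points into calendar weeks.

    team_size and effective_hours_per_person_week are assumptions made
    for this illustration; they do not come from the calibration itself.
    """
    effective_hours = backlog_points * hours_per_point(anchors)
    weekly_capacity = team_size * effective_hours_per_person_week
    return effective_hours / weekly_capacity

# Hypothetical team: 4 people, each with 25 effective hours per week.
weeks = calendar_weeks(440, ANCHORS, team_size=4, effective_hours_per_person_week=25)
print(f"{hours_per_point(ANCHORS)} hours per point, about {weeks} weeks")
```

Once real velocity data starts coming in, the measured pace should replace this calculation entirely.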
But doesn't this mean that the story points do relate to time after all? Where's the catch?
Yes, there is a relation between story points and time. I apologize for claiming otherwise. I would like to point out, however, that I did it with good intentions. You see, whenever we deal with story points we should think in terms of story points and relative estimates, not making the mental translation back and forth between hours and points. When we start doing that, we start losing the advantage of relative estimation. The neurolinguistic programming we do with "points" versus "hours" is needed in order to help us get to (and stay in) the relative estimation mindset.
Scene 3: Scales come in different shapes
In the calibration example above, I suggested that the scale was set with 3, 5 and 8 story points for items of an estimated 10, 20, and 30 effective engineering hours, respectively. Why that mapping? Wouldn't it be easier to just keep the numbers and rename the unit to "story points"?
First of all, that whole NLP stuff isn't just fluff. "Just" calling them story points doesn't do the trick. It's not enough to turn that mental switch from hours to points. Some kind of a (preferably non-trivial) translation is necessary. Second, the larger the numbers get the more difficult it becomes for our relative comparison to work effectively. We're really good at telling whether item A is twice or three times the size of item B. We suck at telling whether item C is 9 or 42 times the size of item D. We need a small scale to make this work. As a rule of thumb, anything with two digits is already too big.
The mapping from 10 to 3, 20 to 5, and 30 to 8 was not purely coincidental either. I could've just divided everything by four and gotten 2.5, 5 and 7.5 - nice and small numbers between 1 and 10. Floating-point numbers are too complicated, though, so that wouldn't work. I also could've divided by four and rounded to, say, 2, 5, and 8. But I didn't. Why?
First of all, that would've been quite all right. I just happen to like the practice of making all story point estimates fit into the Fibonacci sequence-based scale of 1, 2, 3, 5 and 8. It might seem like a small difference but I've found that a more limited scale like this (compared to the full scale from 1-8) helps me think in relative terms when I can't assign a "4" or a "6" on a story. Besides, the increasing gap between the available values quite nicely represents the increase in uncertainty when estimating bigger items. Many other teams and consultants have found this scale useful, too. It's not something I'd fight for but I like it and recommend it.
My second favorite would be something like a scale limited to 1, 2, 4 and 8. This would be simpler but for some reason I'm not sure I'm fully aware of, I prefer the Fibonacci sequence. It probably has to do with what my colleague Jukka pointed out - the Fibonacci sequence has slightly smaller gaps and, given a backlog item to estimate, it's that much less likely to fall in between the available values compared to the 1-2-4-8 scale. In other words, the gaps of the Fibonacci sequence just feel right for me.
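To make the difference between the two scales concrete, here's a small sketch that lists the gaps between neighboring values and drops a "raw" size into the nearest bucket. The function names are made up for illustration, and real estimation is of course a conversation, not a rounding function.

```python
FIBONACCI = [1, 2, 3, 5, 8]
POWERS_OF_TWO = [1, 2, 4, 8]

def gaps(scale):
    """Distances between neighboring values on a scale."""
    return [high - low for low, high in zip(scale, scale[1:])]

def nearest_bucket(relative_size, scale):
    """Pick the scale value closest to a raw relative size.

    On a tie, min() keeps the first candidate, i.e. the smaller bucket.
    """
    return min(scale, key=lambda bucket: abs(bucket - relative_size))

print(gaps(FIBONACCI))       # largest gap is 3
print(gaps(POWERS_OF_TWO))   # largest gap is 4

# "Almost twice as much work as a 3-pointer" lands in the 5-point bucket.
print(nearest_bucket(5.5, FIBONACCI))
# On the 1-2-4-8 scale, a size of 6 sits exactly between 4 and 8.
print(nearest_bucket(6, POWERS_OF_TWO))
```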
Scene 4: The note
Remember that old note on an index card I mentioned in the beginning of this rather lengthy blog post? It related to a team using story points for estimating their sprint backlog and I had something to say about that practice. Here's that something.
Most consultants and practitioners recommend using story points for estimating the product backlog and effective engineering hours for estimating the sprint backlog. Myself included. Why is that? As I mentioned earlier, one of the major advantages of estimating relatively in story points is that we can do it quickly for even a large list of backlog items because we're not entrenching ourselves in the nitty-gritty details of what a given backlog item entails in terms of implementation - the rough estimates in the product backlog are quite sufficient because all of those variances cancel each other out in the long run.
For a sprint backlog of, say, two weeks of work, that canceling out doesn't happen quite as effectively as it does for several months of work. In other words, when we're estimating the sprint backlog - trying to figure out whether we can deliver a given set of backlog items in those two weeks or not - we benefit from more detail, more analysis, more effort put into the estimation. While the story point estimates for our product backlog items indicate how much we tend to deliver in two weeks, time-based estimates increase the certainty of that tendency. For many teams, this increase feels significant enough to justify the higher effort.
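The "canceling out" effect can be illustrated with a toy simulation. This is a sketch under strong assumptions that are mine, not measured from any team: every item is estimated at the same size, and the actual effort of each item is off by an independent, unbiased random factor between 0.5 and 1.5. Real estimation errors are often biased or correlated, in which case they cancel out less.

```python
import random

def mean_relative_error(item_count, trials=2000, seed=42):
    """Average relative error of the total estimate over many simulated backlogs.

    Assumption: each item's actual effort is its estimate (1.0) multiplied
    by an independent factor drawn uniformly from 0.5 to 1.5.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_error = 0.0
    for _ in range(trials):
        estimated = float(item_count)
        actual = sum(rng.uniform(0.5, 1.5) for _ in range(item_count))
        total_error += abs(actual - estimated) / estimated
    return total_error / trials

sprint_error = mean_relative_error(item_count=5)      # a handful of items
project_error = mean_relative_error(item_count=100)   # a long backlog

print(f"sprint-sized backlog:  {sprint_error:.1%} off on average")
print(f"project-sized backlog: {project_error:.1%} off on average")
```

Under these assumptions the total for the short list is off by a noticeably larger percentage than the total for the long list, which is the intuition behind putting more effort into sprint-level estimates.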
Some teams, however, are doing great by using story points for estimating their sprint backlog, too. Having broken down backlog items into tasks and established (anchored) a scale for technical tasks, they routinely run through the sprint backlog, throwing tasks into buckets according to their "task point" scale. That scale is typically different from the one used for the product backlog, mind you. Using the same scale would practically push the story point values beyond the comfort zone of single-digit estimates and could possibly bias the estimates given for technical tasks. We (humans) are good at making the world fit a pattern. That's also true for making task estimates add up to story point estimates so I recommend sticking to a different scale.
While I do prefer and recommend keeping story points and relative estimation in the realm of the product backlog, I have seen relative, point-based estimation work quite well for sprint backlogs, too. Just ensure that the points are small enough – most tasks should be doable in less than a day. I know it can work and if it does, it's quite a breeze to do sprint planning (assuming that there's no bottleneck in, say, access to information about the problem domain or in our understanding of the code base). Still, I acknowledge that it's more intuitive and probably - on average - more accurate to estimate the technical tasks of a sprint backlog in terms of engineering hours.
There. I think I've said everything I wanted to. Except the things I forgot already. Well, that's what the blog comments are for...







