How Accurate are Software Estimates? Traditional vs Scientific Estimates


One of the first questions asked when undertaking any type of project is usually: how much will it cost? It’s a reasonable question. Before making a decision, it’s only natural to want all the facts at hand. In some industries, such as painting and construction, an upfront quote has become the norm. The software industry is vastly different.

This ar­ti­cle is de­signed to un­pack the soft­ware es­ti­ma­tion process and the dif­fer­ent ap­proaches used in the in­dus­try.

The in­dus­try

Generally, the follow-up question is: why can’t you tell me exactly how long it will take? Software differs from other industries in that it deals predominantly with unknowns. Every piece of software is designed to be different in one way, shape or form; otherwise there is no unique value proposition. Couple that with the fact that technology continues to improve rapidly, with a new ‘industry standard’ emerging every couple of years, and it becomes incredibly unlikely that a software developer will build the same application, the same way, twice.

The prob­lem with soft­ware es­ti­mates

There is a societal problem when it comes to estimating work: in many instances, the cheapest estimate wins the work. This wouldn’t be an issue if the estimate were accurate and did not compromise the quality of the project. But if the project blows out or the quality is not up to scratch, does the blame rest with the development company for low-balling the project, or with the customer who chose the cheapest option? With greater knowledge and understanding of the estimation process, we can move away from this cycle.

When to es­ti­mate

There are many competing lines of thought here, with different strategies recommended depending on which phase the project is in. For the purpose of this article we’ll break a project into three common stages: ideating, scoping and developing.


Before the pro­ject is prop­erly scoped it is still in an ideation phase. This means all the re­quire­ments are not yet known.


Giving a broad bracket is one of the more popular approaches when the software is at such an early stage. Drawing on prior experience and gut feeling, the software developer estimates a range for the length of time from a few central requirements.

Historical com­par­i­son

If the complexity and size of the project sound similar to a past project, then the estimate given may simply be the time it took to build that past project.

The prob­lem with giv­ing an es­ti­mate this early on in the process is that there are so many un­knowns. Generally, the client will bud­get and set their ex­pec­ta­tions based on a very early stage es­ti­mate (if it has been given). What hap­pens when the scope is un­packed and the ini­tial es­ti­mate is in­ac­cu­rate? It cre­ates ten­sion and im­pacts the trust be­tween the soft­ware de­vel­op­ment com­pany and the cus­tomer.

No de­vel­op­ment es­ti­mate

WorkingMouse’s pre­ferred ap­proach to es­ti­mat­ing a pro­ject be­fore it is scoped is sim­ple. We don’t. We can es­ti­mate the time it will take to prop­erly scope the pro­ject based on its com­plex­ity but we do not have enough in­for­ma­tion to es­ti­mate de­vel­op­ment time.


T-Shirt sizes

This ex­er­cise is one of rel­a­tiv­ity. By mark­ing func­tion­al­ity as XS, S, M, L, XL (and so on) we can group to­gether sim­i­lar pieces of func­tion­al­ity. It’s gen­er­ally much faster than tra­di­tional time based es­ti­mates.

Once the tick­ets are grouped, es­ti­ma­tion val­ues can be put to each group. For ex­am­ple, an XS ticket may on av­er­age take 2 hours to com­plete. In a short space of time, the de­vel­op­ment team and prod­uct owner can start gaug­ing the rel­a­tive size of the ap­pli­ca­tion. Keep in mind, the es­ti­mate can still be in­ac­cu­rate at this stage.
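The mapping from sizes to hours can be sketched in a few lines of Python. Only the XS value of 2 hours comes from the example above; the other hour values, the ticket names and the function name are assumptions made purely for illustration.

```python
# Hypothetical hours per T-shirt size. Only XS = 2 hours comes from the
# article; every other value is an illustrative assumption.
HOURS_PER_SIZE = {"XS": 2, "S": 4, "M": 8, "L": 16, "XL": 32}


def estimate_backlog(tickets):
    """Return the total hours for a list of (ticket_name, size) pairs."""
    return sum(HOURS_PER_SIZE[size] for _, size in tickets)


# Made-up tickets, grouped by relative size rather than by hours.
backlog = [
    ("Login form", "S"),
    ("Password reset", "XS"),
    ("Reporting dashboard", "L"),
]
total_hours = estimate_backlog(backlog)
print(total_hours)  # 4 + 2 + 16 = 22
```

Because the team only argues about which group a ticket belongs to, changing the hours behind a size later re-prices the whole backlog without another estimation session.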

Fibonacci es­ti­ma­tions

As mentioned above, software estimations are inherently quite difficult. The Fibonacci-type approach to estimations tries to simplify things. It is based on the theory that the bigger something is, the less precise we can be. It is reasonable to assume that a small piece of functionality can be estimated down to the hour. However, when we start looking at bigger pieces of functionality, such as API integrations with a high degree of complexity and difficulty, we cannot be that precise. Something may be estimated to take a week or 3 days, but it would be bold and ultimately unwise to estimate that it will take 1 week, 1 day and 2 hours.
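As a rough illustration of that idea, the sketch below snaps a raw guess onto a Fibonacci-style scale. Treating the scale values as hours, rounding up rather than to the nearest value, and the function name are all assumptions of this example, not a description of any particular team’s process.

```python
# Values commonly used in Fibonacci-style estimation. Treating them as
# hours is an assumption made for this sketch.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21, 34]


def snap_to_scale(raw_hours):
    """Round a raw guess up to the next value on the scale.

    Guesses above the largest value are capped at it; in practice such a
    ticket would probably be split instead.
    """
    for value in FIBONACCI_SCALE:
        if raw_hours <= value:
            return value
    return FIBONACCI_SCALE[-1]


print(snap_to_scale(2.5))  # 3
print(snap_to_scale(6))    # 8
print(snap_to_scale(20))   # 21
```

The gaps between allowed values widen as the numbers grow, so small tickets keep hour-level precision while large ones are deliberately coarse.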

It is recommended to wait until the end of scoping, once all the immediate functionality is known, to hold an estimation session. Unless stakeholders are aligned on the acceptance criteria of each piece of functionality and its impact across the wider application, it is difficult to ensure the accuracy of the estimates.

Planning poker

This is a slightly different way of running the Fibonacci approach to estimating. Everyone is given a set of cards showing lengths of time, and all team members reveal their chosen card at once. It’s designed to ensure that one team member’s estimate (the first to speak) doesn’t influence the other team members’ estimates.
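A single round of planning poker could be modelled roughly as follows. The deck values, the team member names and the consensus rule (all votes on the same card or on two neighbouring cards, taking the higher one) are assumptions of this sketch rather than fixed rules of the technique.

```python
# Hypothetical deck, in hours.
CARDS = [1, 2, 3, 5, 8, 13, 21]


def reveal(votes):
    """Evaluate one round, given {member: card} chosen privately.

    Consensus here means every vote sits on the same card or on two
    neighbouring cards; the higher card is then taken as the estimate.
    Both choices are assumptions of this sketch.
    """
    positions = sorted(CARDS.index(card) for card in votes.values())
    spread = positions[-1] - positions[0]
    if spread <= 1:
        return {"consensus": True, "estimate": CARDS[positions[-1]]}
    # Votes are too far apart: discuss the outliers, then vote again.
    return {"consensus": False, "estimate": None}


close_round = reveal({"Ana": 5, "Ben": 8, "Cy": 5})
split_round = reveal({"Ana": 2, "Ben": 13, "Cy": 5})
print(close_round)  # {'consensus': True, 'estimate': 8}
print(split_round)  # {'consensus': False, 'estimate': None}
```

The function only ever sees the complete set of votes, which mirrors the simultaneous reveal: no vote can be adjusted after hearing someone else’s.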


During a plan­ning ses­sion

During de­vel­op­ment, plan­ning ses­sions oc­cur at the be­gin­ning of each it­er­a­tion (or sprint). It gives the de­vel­op­ment team an op­por­tu­nity to learn from ear­lier it­er­a­tions. It also gives them an op­por­tu­nity to feed learn­ings back into the es­ti­ma­tions.

Let’s say, for example, that after two iterations a developer was able to leverage a React library more than initially anticipated. That might bring the estimates for some functionality down, allowing more to be completed during the iteration. On the other hand, there may be a complication that means certain functionality takes longer than initially expected. By elaborating on estimations during the planning session, the development team has the most recent information available to make an estimate.

Types of es­ti­ma­tions

While all the tech­niques listed above are help­ful, it’s more im­por­tant to dis­tin­guish be­tween tra­di­tional es­ti­ma­tions and how they’ve been re­fined to em­brace a more sci­en­tific ap­proach.

Traditional soft­ware es­ti­ma­tions

These are quite simply estimates against functionality. Ask how long “X” will take for each piece of functionality, do some simple addition, and you have the traditional project estimate.

The issue with this approach is that it fails to capture a number of other factors that have a major impact on time. For example, it is unrealistic to believe that 100% of a developer’s day will be spent working on a project. There are planning meetings, morning huddles and company-wide meetings that impact productivity.
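The gap between the two views can be shown with some simple arithmetic. In the sketch below, the 8-hour day, the 75% share of the day spent on project work and the feature hours are all hypothetical numbers chosen for illustration.

```python
def traditional_days(feature_hours, hours_per_day=8):
    """Traditional estimate: add up the hours and assume full working days."""
    return sum(feature_hours) / hours_per_day


def calendar_days(feature_hours, hours_per_day=8, focus=0.75):
    """Same total, but only a share of each day goes to project work.

    The default focus of 0.75 is an assumption, standing in for time lost
    to planning meetings, huddles and company-wide meetings.
    """
    return sum(feature_hours) / (hours_per_day * focus)


features = [6, 12, 6]  # hypothetical per-feature estimates, in hours
print(traditional_days(features))  # 3.0
print(calendar_days(features))     # 4.0
```

With these made-up numbers the same 24 hours of work takes a third longer on the calendar, even though no individual feature estimate was wrong.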

Scientific soft­ware es­ti­ma­tions

By un­der­stand­ing and mea­sur­ing the im­pact that other fac­tors have on de­vel­op­ment time, we can be more sci­en­tific in our es­ti­ma­tions. After run­ning an ex­per­i­ment over a num­ber of pro­jects we found a few fac­tors im­pacted de­vel­op­ment length. These fac­tors will be ad­dressed in more de­tail be­low.

Each of these factors has a multiplier based on the impact it has on estimates. As mentioned above, a rushed project can be detrimental to the quality of the application. It is absolutely necessary to take these factors into consideration in order to accurately set expectations.

Risk is the most complex factor to measure. As a general rule, some projects are riskier than others. So then, how do we capture risk? Our approach is to separate risk into two contributing factors: complexity and unfamiliarity. These are measured on a 1-5 scale for each piece of functionality. The higher the risk, the greater the estimate for that functionality.
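One way such a risk adjustment might look in code is sketched below. The two 1-5 scores come from the description above, but the article does not state how they translate into a multiplier, so the 10% uplift per point above 1 is purely an assumption for illustration.

```python
def risk_adjusted_hours(base_hours, complexity, unfamiliarity):
    """Scale a base estimate by two 1-5 risk scores.

    Assumption: each point above 1 on either scale adds 10% to the
    estimate. The real multipliers are not given in the article.
    """
    for score in (complexity, unfamiliarity):
        if not 1 <= score <= 5:
            raise ValueError("risk scores must be between 1 and 5")
    uplift_percent = 10 * ((complexity - 1) + (unfamiliarity - 1))
    return base_hours * (100 + uplift_percent) / 100


print(risk_adjusted_hours(10, complexity=1, unfamiliarity=1))  # 10.0
print(risk_adjusted_hours(10, complexity=3, unfamiliarity=4))  # 15.0
```

Scoring each piece of functionality separately keeps the uplift where the risk actually is, rather than padding the whole project by a flat amount.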

What is the av­er­age es­ti­mate for a fea­ture?

As men­tioned ear­lier, it is no­to­ri­ously dif­fi­cult to es­ti­mate a soft­ware pro­ject be­fore it is scoped. However, there are a range of learn­ings that can be made by look­ing at his­tor­i­cal data for trends. To give you as much in­sight as pos­si­ble, we’re shar­ing novel data, taken from past pro­jects at WorkingMouse.

Table 1: Average size of past tickets at WorkingMouse

[Table image not available: size of software estimations at WorkingMouse]

This table tells us a few things.

Firstly, there is risk in leaving tickets large (over one day in length). It is common for developers and development agencies to break these tickets down into smaller, more manageable pieces. Rather than a single 2-day ticket that might encompass a few components, there is less risk in three 5-hour tickets that are smaller in size and more focused in functionality.

Secondly, very few tickets can be completed in under 4 hours. Between feature development, writing tests and releasing, it’s difficult to complete a ticket to a high standard in that time.

Finally, there is safety in the pack. This can be a result of sequence bias or the golden mean fallacy. Sequence bias is the likelihood that the previous prediction will influence the next one. So, if the first few tickets are medium-sized, the likelihood that the next ticket is also estimated as medium is raised. The golden mean fallacy is the perception that the truth lies somewhere in the middle. Hence the idea that the right estimate is somewhere between 4 and 8 hours in length.

How to in­crease the ac­cu­racy of a soft­ware es­ti­mate

The million-dollar question: how do we improve as an industry and increase the accuracy of software estimates?

This comes down to refining the way that we estimate. We approach this problem with the view that no matter what the solution is, it will never be perfect. But we can keep trying to get closer and closer to perfection.

What goes into completing a ticket

The best sci­en­tific for­mula we’ve cre­ated to date takes into con­sid­er­a­tion:

  • Feature de­vel­op­ment,
  • Testing,
  • Allocation for other tasks,
  • Discovery,
  • Peer re­view,
  • Delivery/releasing.

This ar­ti­cle rep­re­sents our cur­rent ap­proach to soft­ware es­ti­ma­tions. In the past these fac­tors and the in­flu­ence they have on es­ti­mates dif­fered. It’s only through a mode of con­tin­u­ous learn­ing that we can in­crease the ac­cu­racy of soft­ware es­ti­mates.
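To make the factors listed above concrete, here is a minimal sketch of how they might combine into a single ticket estimate. The factor names come from the list, but every percentage is a hypothetical allowance expressed relative to feature-development hours; the article does not publish the actual weights.

```python
# Hypothetical allowances, each a percentage of feature-development hours.
# The factor names come from the article; the numbers do not.
ALLOWANCE_PERCENT = {
    "testing": 30,
    "other tasks": 10,
    "discovery": 10,
    "peer review": 10,
    "delivery/releasing": 5,
}


def ticket_estimate(feature_hours):
    """Break a ticket estimate down by factor and add up the total."""
    breakdown = {"feature development": float(feature_hours)}
    for factor, percent in ALLOWANCE_PERCENT.items():
        breakdown[factor] = feature_hours * percent / 100
    breakdown["total"] = sum(breakdown.values())
    return breakdown


estimate = ticket_estimate(20)
print(estimate["testing"])  # 6.0
print(estimate["total"])    # 33.0
```

Keeping the allowances in one table makes it straightforward to adjust them as historical data shows how much each factor really costs.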

To sum­marise

If you’ve gotten this far, well done. While the science behind software estimations may not excite many, there is no question that they have a huge impact on the success of a project. Start with an inaccurate estimate and you’ll find yourself on the back foot, often at the expense of quality.

My advice is to ask your development company how they approach the estimation process. If they’re willing to give a quote or estimate before the scope is fully fleshed out and agreed upon, that should be a red flag. Also ask about the allowances they make for necessary work that isn’t feature development. If you’d like to see how estimations are included in our process, please download the Way of Working.


Yianni Stergou
