Software Testing Metrics

Introduction

Quality is ideally quantified by clearly defined metrics. In the area of software quality, there are two major types of metrics. The first class of metrics measures software test execution: how much testing has been done to date, and how much remains to be done? The second class attempts to quantify the quality of the software itself.

What are the advantages and disadvantages of the different types of software testing metrics?

Software Test Execution Metrics

Software test execution metrics attempt to quantify the progress of software testing. These metrics are driven by the need to track progress on Gantt charts and project schedules.

The simplest software test execution metrics are captured by measuring the number of software test procedure steps completed, the number or percentage of test cases completed, and the percentage of software code tested.

Measuring Progress by the Number of Steps

Advantages of Tracking Testing Via the Number of Steps

  • The number of steps performed is easily tracked.
  • This metric can be used with automated test procedures as well as human testing. In automated testing, the percentage complete can be tracked as the number of steps completed out of the total in the test procedure.

Disadvantages of Counting the Number of Steps

  • Long software test procedures with a failure at the last step look artificially good. If the final step in a procedure is a failure, the test metric may still say 95% complete.
  • The metric itself changes as the test procedure changes. Adding missing steps or a few additional checks to a test procedure causes the percentage of steps executed to go down, even though the testers are at the same place in the procedure.
  • If the test procedure is simplified and unnecessary steps are dropped, the change makes the software quality look worse, unless the dropped steps were ones that had caught defects.
  • Less thorough test procedures can look artificially good when this metric is used.
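
The arithmetic behind two of these disadvantages can be sketched in a few lines of Python. This is only an illustration; the function name percent_steps_complete and the step counts are assumptions chosen for the example, not figures from a real project.

```python
def percent_steps_complete(steps_done: int, total_steps: int) -> float:
    """Return the percentage of test procedure steps executed so far."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return 100.0 * steps_done / total_steps


# A hypothetical 20-step procedure whose final step fails
# still reports 95% complete.
before = percent_steps_complete(19, 20)

# Adding 5 missing steps lowers the figure even though the tester
# has not moved: 19 of 25 steps instead of 19 of 20.
after = percent_steps_complete(19, 25)

print(before, after)
```

The metric depends as much on the denominator as on the work done, which is why improving the procedure can make progress appear to slip.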

Counting the Test Cases Completed to Measure Testing Progress

Advantages of this Software Testing Metric

  • This method of tracking software testing progress is not inflated by tests with many steps.
  • You can consolidate user acceptance testing (UAT), unit testing, integration testing and integrated NHA testing into this metric, each as one or more test cases. The number of steps, or even the procedure sub-contractors use, is irrelevant to the person gathering the metrics; all that matters is whether each test case is done or not yet complete.
  • If one test case is added due to a gap found by engineering, the overall metrics of the project are not appreciably changed, whether the new test passes or fails.


Disadvantages of This Metric

  • The metric does not reflect the time needed to complete each test case; someone who completes three small tests may be seen as more productive than someone slogging through a long test sequence.
  • When new defects are found and the number of test cases is increased, the percentage of test cases completed goes down though this effort increases overall software quality.
  • One bad variable or dependency can cause multiple test cases to fail.
  • A particularly problematic test case with multiple failures may be worked on by software engineers who are reducing the number and severity of errors in that one code module or function, but the metric itself remains stuck at the same level.
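
A minimal sketch of this metric, assuming a hypothetical CaseRecord structure with made-up test names and step counts, shows both the productivity distortion and the drop in percentage when a test case is added:

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    """One test case; all field names here are illustrative assumptions."""
    name: str
    steps: int
    done: bool


def percent_cases_complete(cases: list[CaseRecord]) -> float:
    """Return the percentage of test cases marked done, ignoring their length."""
    if not cases:
        raise ValueError("at least one test case is required")
    completed = sum(1 for case in cases if case.done)
    return 100.0 * completed / len(cases)


cases = [
    CaseRecord("login", steps=5, done=True),
    CaseRecord("logout", steps=5, done=True),
    CaseRecord("reset password", steps=5, done=True),
    CaseRecord("end-to-end order flow", steps=200, done=False),
]
# Three short cases count three times as much as the 200-step case.
pct_before = percent_cases_complete(cases)

# A new case written after a defect was found lowers the percentage.
cases.append(CaseRecord("regression for new defect", steps=10, done=False))
pct_after = percent_cases_complete(cases)

print(pct_before, pct_after)
```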

Defect density metrics and defects per lines of code attempt to standardize defect rates for software programs of all sizes.

Tracking the Percentage of Code Tested

Advantages of Using This Metric

  • When this software testing method is used, test procedures are written to test all of the code, including rarely used functions.
  • This method of tracking the progress of software testing forces testers to review every code change to ensure that it is all tested, and they may find code that is redundant and easily removed from the app. Code is reviewed by more people, and that is generally a good thing.

Disadvantages of Using This Metric

  • It may be easy to calculate the number of lines of code, but it is harder to determine how many test steps are necessary to verify it all.
  • This metric only tracks progress made in testing, not the quality of the code.
  • It may take a complex series of steps to test rarely used functionality, and the resulting test procedures may not reflect actual user behavior with the application. Thorough testing of a complex software module won’t move the percentage completed bar much, though it can and should be done.
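
As a rough sketch, the percentage of code tested can be computed from the set of executable line numbers and the set of lines that tests have exercised. The function name and the line counts below are assumptions for illustration; real coverage tools collect this data automatically.

```python
def percent_lines_tested(executable: set[int], executed: set[int]) -> float:
    """Return the percentage of executable lines exercised by at least one test."""
    if not executable:
        raise ValueError("no executable lines to measure")
    covered = executable & executed  # ignore lines that are not executable
    return 100.0 * len(covered) / len(executable)


# Hypothetical 200-line application; tests have exercised lines 1-180.
all_lines = set(range(1, 201))
coverage = percent_lines_tested(all_lines, set(range(1, 181)))

# Thoroughly testing a complex 10-line module (lines 181-190) takes many
# steps but moves the figure by only five points.
coverage_after = percent_lines_tested(all_lines, set(range(1, 191)))

print(coverage, coverage_after)
```

Note that the result says nothing about whether the covered lines behave correctly, only that a test ran them.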

Metrics for the Defects Found in Software Testing

The number of defects found during software testing reflects the quality of the program or the system on which it operates. Ideally, the number of defects found over time should go down as issues are resolved.

There are several ways of tracking and measuring the defects found during software testing. The simplest method is measuring software quality via a direct defect count. A second method is calculating the software defect density. A third, the defects per lines of code metric, is a close rival to the defect density metric.

Tracking Software Testing Defects via Defect Count

Advantages of the Defect Count Metric

  • The defect count is easy to measure. Did the testers come across a problem or error during testing? Then total up the number of software defects they reported.
  • It is easy to track over time. The metric will not change appreciably if the test procedure is lengthened.


Disadvantages of the Defect Count Metric

  • This metric does not take into account software complexity or the amount of testing to be performed.
  • The defect count metric may not rate the severity of defects.
  • A number of cosmetic defects and minor errors can inflate the metric total without affecting software quality.
  • Test escapes aren't counted.
  • You need to make sure the defect count isn't inflated by multiple people reporting the same issue.
  • The defect count metric may not take into account whether the defect comes from legacy code, new changes to the application, supporting software applications or the operating system itself. Teasing out the root cause of the defects found could allow the official defect count to come down, but then time is spent determining whether or not to count a defect in the defect count.
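
One way to soften the duplicate-report and severity problems is to count unique defects and keep a tally per severity. The sketch below assumes each report carries a defect_id that identifies the underlying issue; that field, the severity labels, and the sample reports are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class DefectReport(NamedTuple):
    defect_id: str  # hypothetical key identifying the underlying issue
    reporter: str
    severity: str   # e.g. "critical", "major", "cosmetic"


def count_defects(reports: list[DefectReport]) -> tuple[int, Counter]:
    """Return (unique defect count, count per severity), ignoring duplicates."""
    unique: dict[str, DefectReport] = {}
    for report in reports:
        # Keep only the first report filed for each underlying issue.
        unique.setdefault(report.defect_id, report)
    by_severity = Counter(r.severity for r in unique.values())
    return len(unique), by_severity


reports = [
    DefectReport("D-1", "alice", "critical"),
    DefectReport("D-1", "bob", "critical"),   # same issue reported twice
    DefectReport("D-2", "carol", "cosmetic"),
    DefectReport("D-3", "alice", "cosmetic"),
]
total, by_severity = count_defects(reports)
print(total, dict(by_severity))
```

Reporting the per-severity tally next to the total keeps a pile of cosmetic defects from reading the same as a pile of critical ones.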

Measuring Software Defects Using Defect Densities

The defect density measures the number of defects per unit of software size. This software testing metric is similar to defects per lines of code but does not necessarily count lines of code. It could also look at the executable file size or other variables.

Advantages of the Defect Density Metric

  • This software metric takes the size of a software application into account.
  • If the number of defects remains stable while additional code modules are added, this metric accurately reflects the relative improvement in overall software quality.
  • Whether the density is based on lines of code, counted as opportunities for defects, or on file size, this metric can be used to determine whether your software reaches six sigma quality levels.
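
Six sigma quality is commonly quoted as no more than 3.4 defects per million opportunities (DPMO). A minimal sketch of that check follows, under the assumption that each line of code counts as one opportunity for a defect; the function names and sample figures are illustrative.

```python
SIX_SIGMA_DPMO = 3.4  # commonly quoted six sigma threshold


def dpmo(defects: int, opportunities: int) -> float:
    """Return defects per million opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 1_000_000 * defects / opportunities


def meets_six_sigma(defects: int, opportunities: int) -> bool:
    """True if the defect rate is at or below the six sigma threshold."""
    return dpmo(defects, opportunities) <= SIX_SIGMA_DPMO


# Assumption: one line of code equals one opportunity for a defect.
print(dpmo(2, 1_000_000), meets_six_sigma(2, 1_000_000))
print(dpmo(17, 500_000), meets_six_sigma(17, 500_000))
```

The verdict depends heavily on how an opportunity is defined, so the same program can pass or fail depending on whether lines, functions, or bytes are counted.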


Disadvantages of the Defect Density Metric

  • Reusing existing code may lower the overall defect density while hiding a high ratio of defects in the new code.
  • The defect density metric does not always reflect the severity of the defects it counts.
  • If developers are able to eliminate software modules and streamline the code faster than the number of defects is reduced, this metric says quality dropped though the application is actually being improved.
  • In theory, a virus embedded in the software improves the program's measured quality by increasing its apparent size.
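
The reuse and streamlining effects are easy to see with numbers. The sketch below measures size in lines of code and reports defects per 1,000 units; the function name and every figure are assumptions made up for the example.

```python
def defect_density(defects: int, size: int, per: int = 1000) -> float:
    """Return defects per `per` units of size (lines, bytes, or another unit)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return per * defects / size


# Reuse: 40,000 reused lines with 2 defects plus 10,000 new lines with 8.
overall = defect_density(10, 50_000)   # looks healthy
new_code = defect_density(8, 10_000)   # four times worse than the overall figure

# Streamlining: 20,000 lines removed and 2 defects fixed, yet density rises.
streamlined = defect_density(8, 30_000)

print(overall, new_code, round(streamlined, 3))
```

Tracking density separately for new and reused code is one way to expose the first problem; the second is inherent in any ratio whose denominator shrinks faster than its numerator.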

The Defects per Lines of Code Metric

Advantages of the Defects per Lines of Code Metric

  • This is a simple metric to calculate. Compilers and line-counting tools can determine the lines of code present if the programmer cannot.
  • The metric accurately reflects the fact that two programs with an equal number of defects are not equivalent in quality if one contains more code than the other.


Disadvantages of the Defects per Lines of Code Metric

  • Not all code is created equal. Very short programs with serious errors and long programs with a number of minor errors get the same score.
  • There are jokes that programmers paid by the line of code write excessively long programs instead of shorter and more elegant code. And then there is the argument about whether or not documentation within the code can be used to make this metric look better, by interspersing a book among the functional code. The quality metric looks better, but the compilation time and run time go up.
  • Writing additional code to solve a problem looks better when this metric is used compared to finding a solution that does not increase the program’s size.
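
How much padding can distort this metric depends on what is counted as a line. The sketch below compares physical lines with lines that are neither blank nor comments, assuming "#" comment syntax; the function name and both sample programs are hypothetical.

```python
def count_lines(source: str) -> tuple[int, int]:
    """Return (physical lines, lines that are neither blank nor '#' comments)."""
    lines = source.splitlines()
    code = [ln for ln in lines if ln.strip() and not ln.strip().startswith("#")]
    return len(lines), len(code)


lean = "x = 1\ny = x + 1\n"
padded = "# a long story\n# continues here\n\nx = 1\n# more prose\ny = x + 1\n"

lean_physical, lean_code = count_lines(lean)
padded_physical, padded_code = count_lines(padded)

# With one defect in each program, counting physical lines rewards padding.
defects = 1
print(defects / lean_physical, defects / padded_physical)
# Counting only code lines gives both programs the same score.
print(defects / lean_code, defects / padded_code)
```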

Comments

ISTQBTester 2 years ago

Very informative read


alocsin 3 years ago from Orange County, CA

This is a great overview of a process that has mystified me. I imagine this is what software quality assurance professionals do? Voting this Up and Interesting.


ib radmasters 3 years ago from Southern California

Tamarawilhite

Another well done hub filled with useful information.

One of the things that I didn't see here was the importance of determining the severity of the defect. It could be I just skipped over it.

With SW or FW there are stages of defects, starting with compiler errors. Fixing compiler errors doesn't tell you a lot about the functionality of the code, just that you failed the language part.

Then during unit testing you get the logic tested, at least the basic logic. The trick then is to put things together to work as a bigger unit.

It would take large hubs to go through the whole sequence, but it is much like building a large structure, the foundation is very important.

Testing all the lines of code is interesting. It can be done primitively by inserting test code, or code traps, but I believe there is third-party automated sw that can help.

Unfortunately, most of the difficult defects are found by the users, and many times after the production release. Computer systems using multiple processors and threads can have subtle timing issues that cause random errors. These are very difficult to duplicate.

One of the major problems that I have seen is when the design process is cut short because management has a very aggressive product schedule. They think that because it is sw that things can be fixed later, but many times it is more like feature creep than a fix. And these features can exceed the current state of the design. Adding these features will corrupt the existing design, and make a kludge of a product that will be a maintenance nightmare.

Thanks

    Tamara Wilhite is a technical writer, engineer, mother of 2, and published scifi and horror author.


