Medical Device & Diagnostic Industry Magazine (MDDI)
Originally published December 1994
SOFTWARE
A Maturity Model for Automated Software Testing
Mitchel H. Krause
Aside from their mandate to provide a safe and reliable product,
manufacturers of computerized medical devices may have three very
practical reasons for automating their software testing program:
their product is too complicated to test manually, the time devoted
to manual testing is cutting into potential profits, and current FDA
requirements will be easier to satisfy with automated testing and
documentation. If any of these factors motivates your company, this
article will help you to sort out the issues to be considered and
options available. Then, when the automated test program is in
place, safer and more reliable products will follow.1 The
sorting instrument presented is a maturity model that plots four
levels of testing maturity in terms of the resources required to
move from one level to the next. The model can be used to determine
the level that best fits your company and its products.
THE SOFTWARE TESTING MATURITY MODEL
The software testing maturity model, shown in Figure 1, is
similar to a software process maturity model that is familiar to
many software engineers. It has been described by Watts S. Humphrey
in his book Managing the Software Process,[2] and has been
cited by Frank Houston, a former FDA staffer, and Steven Rakitin in
presentations to the Health Industry Manufacturers
Association.[3,4] The version shown here as Figure 2 is
adapted from Rakitin's presentation. The process model adapts well
to automated software testing because effective software
verification and validation programs grow out of development
programs that are well planned, executed, managed, and monitored. A
good software test program cannot stand alone; it must be an
integral part of the software development process.
Level 1: Accidental Automation. The first level of the software
testing model--like level 1 in the software process model--is
characterized by ad hoc, individualistic, chaotic attempts to get
the job done. Important information (for example, what to test) is
not documented and must be extracted from in-house experts. Test
plans are sketchy. Test results are not documented consistently.
Schedules slip. Either products are delayed or testing becomes a
cursory, poorly documented exercise. Management is uninvolved or
uninformed.
This level has been designated Accidental Automation because the
use of any automated tools or techniques comes about almost as if by
accident and is not supported by process, planning, or management
functions. Products released on the basis of such testing may well
be accidents waiting to happen. Testing at this level may be
appropriate only for a product that has no potential for harming the
patient or user; it is never appropriate for a computerized medical
device.
Level 2: Beginning Automation. The second testing level
corresponds directly to Level 2Repeatable in the software
process maturity model (see Figure 2). There are hundreds of
capture-and-replay test tools on the market today that simply repeat
the responses of a system under test.5 As in the process
model, however, these tools have limited capabilities and lose their
economic usefulness quickly as a product changes.
Level 2 testing is still dependent on information locked in the
minds of in-house experts, although documentation is beginning to
appear in the form of software requirements specifications (SRSs)
and test requirements specifications (TRSs). However, in most cases,
large portions of these documents are written after the fact and
used to meet regulatory requirements rather than to direct the
development and test processes. Writing them does, however, provide
good practice for moving to level 3.
Level 3: Intentional Automation. At the third level, automated
testing becomes both well defined and well managed. The TRSs and the
test scripts themselves proceed logically from the SRSs and design
documents. Furthermore, because the test team is now part of the
development process, these documents are written before the product
is delivered for testing. Consequently, schedules become more
reliable. Level 3 is appropriate for many medical device
manufacturers.
Level 4: Advanced Automation. The highest testing maturity level
is a practiced and perfected version of level 3 with one major
addition: postrelease defect tracking. Defects are trapped and sent
directly back through the fix, test creation, and regression test
processes. The software test team is now an integral part of product
development, and testers and developers work together to build a
product that will meet test requirements. Any software bugs that do
occur are caught early, when they are much less expensive to fix.
When testing is performed at this level, an FDA inspector can pick
up any piece of product documentation and trace the development
process all the way from the SRS that describes the feature to the
test results that validate it.
A Checklist of Issues. How can these software testing maturity
levels help a company to plan and implement an automated software
test program? The answer to that question comes from careful
consideration of four issues:
* What is the profile of your company and its products?
* What processes do you need to implement as part of an automated
testing program?
* What kind of people do you need in order to create and run a
testing program?
* Which automated software test products fit your profile and
process?
Significantly, price is not on the list. That is because the cost
of any one component, especially the test tool, becomes
insignificant when it is compared with the potential payback. A
well-planned and well-executed software test automation process will
pay for itself many times over by ensuring fewer bugs and field
fixes, shortening product development cycles, and providing labor
savings. And, if you keep your ultimate goal in mind when defining
processes, choosing staff, and buying test tools, your testing
program will continue to yield a good return as you advance from one
level of maturity to the next.
PROFILE: RANKING YOUR COMPANY'S PRODUCTS
Most computerized medical devices can benefit from some type of
automated testing. In fact, Boris Beizer, perhaps the best-known
expert in the field of software testing, has said, "As
far as I'm concerned, manual testing is ludicrous and
self-contradictory. It's based upon a fallacy. Anybody who thinks
they can test manually, doesn't take into account the error rate in
manual test execution."[5] However, knowing what level of
automation is appropriate requires a good understanding of your
company's products.
The exercise described below will help you to create a test-level
profile of your company and its products. The profile is a guide to
how your company may benefit from an investment in processes,
people, and automated software test products. The point scores at
the end of each section provide a rough estimate of the level of
software testing maturity you should strive to meet.
How Large Are Your Software Projects? As software projects
increase in size, the resulting products become harder to test, and
at some point manual testing can no longer cover enough of their
functionality to ensure safe and reliable products. There are many
ways to measure the scope of a software project, but a simple line
count is a start:
* Score 1 if your product has fewer than 10,000 lines of code.
* Score 2 if your product has between 10,000 and 30,000 lines of
code.
* Score 3 if your product has between 30,000 and 70,000 lines of
code.
* Score 4 if your product has more than 70,000 lines of code.
How Complex Is Your Product? Systems with multiple inputs and
outputs, graphics screens or printers, embedded processors, or
multiple microprocessors are all candidates for the controlled
sophistication of automated testing. If two or more interactive
processors are used, the product probably presents integration and
timing issues that cannot be tested manually. Similarly, if the
product has an embedded processor, it may have functionalities that
cannot be tested manually. In other cases, it may simply be
impractical to test the system manually. Printers are one example of
a common peripheral that is hard to test by hand. They not only
accept commands and data from a software system, they also send back
status and error signals to which the system must respond correctly.
It is slow, inconvenient, and sometimes impossible for a tester to
follow test plans that try to duplicate all the combinations of
acknowledgment, system-busy, paper-out, baud rate, error, select,
sensor, and other signals the printer might return. The input
simulation provided by a sophisticated automatic test system can
both speed up this process and make it traceable and reproducible.
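The combinatorial burden described above can be made concrete with a short sketch. The signal names below are illustrative assumptions, not the interface of any particular tool; the point is that even a handful of binary printer signals produces more combinations than a manual tester can reliably reproduce.

```python
from itertools import product

# Hypothetical printer status lines; a real test tool would expose its own
# signal names and levels.
SIGNALS = {
    "acknowledge": (0, 1),
    "busy": (0, 1),
    "paper_out": (0, 1),
    "select": (0, 1),
    "error": (0, 1),
}

def signal_combinations(signals):
    """Yield every combination of printer status signals as a dict,
    one per test case."""
    names = list(signals)
    for values in product(*(signals[n] for n in names)):
        yield dict(zip(names, values))

cases = list(signal_combinations(SIGNALS))
print(len(cases))  # → 32: five binary signals already demand 2^5 cases
```

Add baud rate, sensor, and timing variations and the case count grows multiplicatively, which is why the article argues that only automated input simulation keeps such a suite traceable and reproducible.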
Even testing of that seemingly ubiquitous input device, a
keyboard or keypad, can benefit from using an automated test tool
with simulation capabilities. Timing issues, especially, are nearly
impossible to test manually. The fatal accidents in the mid-1980s
that involved a radiation therapy machine are a good example of the
kinds of problems that can occur. This particular machine had both
therapy and diagnosis modes, and operators entered a series of
keystrokes to switch the system from a high-energy to a low-energy
mode. If the keystrokes were typed in too fast, however, the
high-energy mode would remain in effect even though the operator
would assume the change had been made. Later, when the system was
activated, it sent a damaging and sometimes lethal dose of radiation
into the patient.[6] An automated test tool with simulation
capabilities could have detected this problem early, before any harm
was done. Keyboard simulations could have been set up to test the
effect of varying keyboard input speeds. (The actual resolution of
the problem involved many factors in addition to keyboard input
speed; the report cited gives a full account of these accidents and
their outcome.)
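A keyboard-timing test of the kind suggested here can be sketched as follows. The device model and its timing threshold are hypothetical stand-ins, deliberately seeded with a drop-fast-keystrokes bug, so the sketch shows only the shape of such a test, not the actual behavior of the machine in the accident report.

```python
# Illustrative assumption: the (hypothetical) device needs this long to
# process one keystroke; faster input is silently dropped.
PROCESSING_TIME = 0.05  # seconds

class DeviceModel:
    """Deliberately buggy device model used as the system under test."""
    def __init__(self):
        self.mode = "high"

    def press(self, key, delay):
        # Bug under test: a keystroke arriving too soon is silently ignored.
        if delay < PROCESSING_TIME:
            return
        if key == "switch_low":
            self.mode = "low"

def run_timing_test(delays):
    """Replay the same mode-switch keystroke at each inter-key delay and
    return the delays at which the switch was silently lost."""
    failures = []
    for delay in delays:
        device = DeviceModel()
        device.press("switch_low", delay)
        if device.mode != "low":
            failures.append(delay)
    return failures

print(run_timing_test([0.01, 0.03, 0.08, 0.20]))  # → [0.01, 0.03]
```

Sweeping the delay parameter is exactly the kind of tedious, precisely timed repetition that a human tester cannot perform but an input simulator can run unattended.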
System outputs may also be tested more efficiently using
automated methods. After an 8-, 10-, or 12-hour day, even the most
conscientious human tester will fail to notice some errors or forget
to document them. Other outputs either cannot be monitored manually
or the testing may require nonintegrated measurement devices that
may be difficult to set up and monitor. Finally, some potentially
fatal software flaws may never show up during functional (black-box)
testing. Detecting these problems requires an automated system that
can use white-box test methods to look inside the
system.[1]
* Score 1 if your product has a single processor and simple
inputs and outputs.
* Score 2 if your product has a single processor and common
inputs and outputs.
* Score 3 if your product has uncommon inputs and outputs or if
it uses a graphics screen or printer.
* Score 4 if your product uses multiple or embedded processors
that cannot be fully tested using black-box methods.
What Financial Risk Does Your Product Pose for the Company? Both
loss of market share and exposure to liability claims can create
substantial financial risks for medical device companies. Because
all products have a life cycle, the more time a new product spends
in the test-and-fix-and-retest cycle, the less time it will spend on
the market. Also, when market entry is delayed, sales will be lost
even if the product is better than its competition. Even greater
losses can occur if a poorly tested product harms someone. The
manufacturer will face costly FDA actions and product liability
suits. In worst-case scenarios, the product may never return to the
market and the company itself will fail.
* Score 1 if a malfunction or failure of your product poses no
threat to the financial health of your company, from either
liability claims or loss of market share.
* Score 2 if a malfunction or failure of your product presents a
small but acceptable risk to the financial health of your company.
* Score 3 if a malfunction or failure of your product presents an
unacceptable risk to the financial health of your company.
* Score 4 if a malfunction or failure of your product would cause
irreparable harm to your company.
What Risk Does Your Product Pose for the Patient and Operator?
Although concerns about size, complexity, and financial risk are
important in all software projects, the bottom line for a medical
device company is risk to patients and health-care providers.
Medical products must be both safe and effective. That is, they must
do what they are designed to do and, when something does go wrong,
the malfunction or failure must cause no harm. The product's FDA
classification and hazard analysis results may determine if
automated testing should be implemented. If a computerized medical
device is categorized as Class II or Class III, an automated
software test program may be necessary to provide both the testing
and documentation required. Similarly, if the product presents
software-related hazards, an automated test program might help your
company to verify, validate, and document the measures taken to
mitigate those hazards.
* Score 1 if your product is FDA Class I and a hazard analysis
has shown there is no possibility of its software causing harm to a
patient or operator.
* Score 2 if your product is FDA Class I and a hazard analysis
has shown there is a remote possibility of its software causing harm
to a patient or operator.
* Score 3 if your product is FDA Class II.
* Score 4 if your product is FDA Class III.
Evaluating Your Scores. In its "Reviewer Guidance for
Computer-Controlled Medical Devices," FDA supplies an approach to
evaluating the scores assigned in this exercise: "When a level of
concern is assigned for each functioning component of the software,
the highest level of concern generated is that assigned to the
software aspect of the device."[7] Thus, if you want to
ensure the long-term success of your company, aim for the level of
automated software testing equal to your highest score in any
category.
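The highest-level-of-concern rule amounts to taking the maximum score across the four categories. A minimal sketch, with an illustrative product profile (the category names and scores are assumptions for the example, not data from the article):

```python
def target_test_level(scores):
    """Return the testing maturity level to aim for: per the FDA guidance
    quoted above, the highest score in any category governs."""
    for name, score in scores.items():
        if not 1 <= score <= 4:
            raise ValueError(f"{name}: scores range from 1 to 4, got {score}")
    return max(scores.values())

# Example: a mid-sized, moderately complex product that is FDA Class II.
profile = {"size": 2, "complexity": 3, "financial_risk": 2, "patient_risk": 3}
print(target_test_level(profile))  # → 3
```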
PROCESS: CONTROLLING TEST POLICIES AND PROCEDURES
If any one word sums up the regulatory demands being placed on
medical device manufacturers, it is process. No matter how much
effort goes into designing, testing, and manufacturing a product, an
auditor will not be satisfied if the process is not written down,
followed, and documented. Process-related expenses will be incurred
regardless of the testing level achieved or whether or not the
software test process is automated; however, they can vary
significantly across the testing levels.
Level 1 Process Costs. When software testing is at level 1,
process costs are hidden. They arise from not having a defined
process and can be very high, indeed. Such costs can include those
incurred by delayed product introductions, the need for frequent
field fixes, and a generally ineffective product development effort.
Level 2 Process Costs. Surprisingly, process costs can be highest
for a company that is testing at level 2, especially one that is
contemplating a move to level 3 in the foreseeable future. The costs
are high because at level 2 the company is probably just starting to
evaluate its software testing needs and to put standardized
procedures in place. It may have to experiment, hire consultants,
and establish or expand job areas, such as regulatory affairs.
Process Costs at Levels 3 and 4. Although the two major forces
behind process improvement--FDA regulation and the need for ISO 9000
certification--may affect any company, those testing at levels 3 or
4 almost certainly need to meet FDA software test requirements. Such
compliance is expensive and time-consuming, but the good news is
that creating and documenting procedures for an automated testing
program is no more expensive than doing so for a manual one. In
fact, use of an automated test tool with scripting, test
identification, and automatic documentation capabilities can reduce
costs by providing some of the framework and content required.
The FDA "Reviewer Guidance for Computer-Controlled Medical
Devices Undergoing 510(k) Review" states that "FDA is focusing
attention on the software development process to assure that
potential hazardous failures have been addressed, effective
performance has been defined, and means of verifying both safe and
effective performance have been planned, carried out, and properly
reviewed."[8] In order to get marketing approval for any
product, its manufacturer must prove to FDA that the product does
what it is supposed to do and that it is safe. The way to do that is
not only through clinical trials but also by documenting the process
that was followed to make the product eligible for such trials.
In contrast, ISO 9000 certification is based on process alone.
Because the products themselves are not certified, the certification
authority is concerned solely with whether the process that created
the product is traceable, repeatable, and documented. When the
process is proven, the site responsible for making the product is
certified. An ISO 9000 certification audit costs about $10,000 to
$20,000, but that is only the barest tip of the iceberg. The total
cost includes the resources required to evaluate the company's
needs, get the appropriate procedures in place, have them audited
and approved, and motivate personnel to use them.
If established procedures are being revised to accommodate
automation, existing regulatory affairs and quality assurance
personnel may need to devote two to four weeks each to the project.
In addition, it may take a technical writer about a month to rewrite
the policy and procedure manuals. Finally, occasional technical
support will be required from software developers and test
engineers.
PEOPLE: CHOOSING QUALIFIED TESTERS
No matter what type of testing a company does, manual or
automated, experienced people are needed to create the test plans
and write test scripts.
Level 1 People Costs. At test maturity level 1, testing is often
limited to debugging. A programmer writes and debugs the product's
software until everything seems to work correctly. Because only the
programmer is involved, testing costs are hidden in the cost of
development. Likewise, the potential benefits of better test
practices are hidden in field-support and product-upgrade costs.
Thus, level 1 people costs are essentially unknown.
Level 2 People Costs. In software testing programs at level 2,
testing is recognized as a separate function. Test plans and scripts
are generally written by an experienced product user or support
person who may or may not have programming experience. In any case,
the person performing this task must understand the SRSs and design
specifications well enough to write a comprehensive test plan and
test scripts. The scripts are then given to testers who run them and
record the results. One option is to hire a group of low-paid,
inexperienced users; another is to recruit testers in-house. Whoever
the testers are, they must understand that their job is to try to
break the system as well as to make sure it works right. Level 2
people costs may also include one or more high-level support people
to coordinate test writing, supervise the testers, and edit the
results. Also, since the labor that goes into setting up a
capture-and-replay tool is not reusable, the cost of one test cycle
must be multiplied by the number of test cycles expected.
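That multiplication is worth making concrete. The labor-hour figures below are illustrative assumptions, not data from the article, but they show how nonreusable capture-and-replay setup costs overtake a one-time scripting investment as test cycles accumulate.

```python
def level2_cost(setup_per_cycle, cycles):
    # Capture-and-replay setup labor is not reusable, so it recurs in full
    # for every test cycle.
    return setup_per_cycle * cycles

def level3_cost(initial_scripting, maintenance_per_cycle, cycles):
    # Scripted tests are written once and then maintained incrementally.
    return initial_scripting + maintenance_per_cycle * cycles

# Illustrative labor-hour assumptions:
cycles = 6
print(level2_cost(80, cycles))       # → 480 hours
print(level3_cost(200, 20, cycles))  # → 320 hours
```

Under these assumptions, level 3 becomes cheaper from the fourth cycle on, and the gap widens with every additional cycle.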
People Costs at Levels 3 and 4. Automated testing plans are most
often written by a software test engineer, who should also
participate in product development meetings with design engineers to
help build testability into the product. The test engineer's
programming background combined with a familiarity with the product
will ensure the creation of efficient tests that attack the weakest
parts of the product. If the test tool has white-box test
capabilities, the test engineer uses his or her knowledge of system
internals to specify tests for functions that cannot be tested
manually.
The test plan is then used to write the test script programs.
This work can be done by the test engineer or given to application
programmers. The level of programming experience required to write
test scripts depends on the test tool used. Generally, the most
versatile tools run on scripts written in some version of a common
programming language, such as C. Other tools use simplified
languages. In any case, at least one member of the test team must
have some familiarity with writing a structured set of instructions.
Because the automated testing tool runs the tests and creates the
documentation, no costs are added for hiring testers or diverting
in-house personnel to perform and document the tests.
PRODUCTS: CHOOSING THE RIGHT TESTING TOOL
The requirements of the product and process determine the
selection of an automated testing tool. However, medical device
manufacturers should beware of confusing development aids with
automated software test tools. Companies can spend large sums on
many kinds of debugging tools and in-circuit emulators and still not
have an automated test program. A software development aid has done
its job when the product, or product component, is debugged and
seems to work. Automated test tools, on the other hand, are designed
not only to verify the system, but also to stress it to the point
that it will break in the lab before it can fail in the field and
harm a patient or operator.
Level 1 Tool Costs. Although development aids such as debugging
programs and in-circuit emulators may be used in level 1 test
programs, no automated test tools are used. Therefore, there are no
tool costs at this level.
Level 2 Tool Costs. Level 2 testing is the domain of simple
capture-and-replay tools that employ rudimentary scripting
capabilities and are often used to verify operator interfaces.
Prices for such tools start at about $200 and can reach $5000 or
more for the more-sophisticated models. The less-expensive,
software-only versions are often intrusive; that is, they run on the
same computer as the software application being tested. Because the
tool and product occupy the same space, product timing and
performance can undergo unpredictable changes. Even if no problems
show up during testing, the product shipped is never exactly the
same as the product tested. Capture-and-replay tools with integral
capture hardware eliminate the problems associated with
intrusiveness but retain another problem characteristic of such
systems--inflexibility.
Because a capture-and-replay test suite for a graphic user
interface (GUI) can contain thousands of captured screen images and
consume megabytes of memory, the time it takes to gather these
images is significant. Timing variations and the fact that GUI
displays are seldom static can add even more time. Most significant,
however, is the amount of time needed to recapture, integrate, and
retest the inevitable changes caused by debugging and last-minute
product upgrades. Thus, capture-and-replay tools should be used only
for the simplest of products.
Tool Costs at Levels 3 and 4. High-level test tools can include
several advanced capabilities in addition to capture and replay. The
following are features to look for when purchasing tools:
* Scripting. The tool's test script language should be as
functional as a high-level computer language, permitting the
inclusion of files, libraries, loops, and conditional statements. It
also should include aids to help debug the scripts themselves.
* Monitoring. A choice of intrusive software monitoring, such as
that used in capture-and-replay tools, or nonintrusive hardware
monitoring of system outputs may be available. An added high-level
feature in the most sophisticated systems is direct-processor
monitoring. With direct-processor monitoring, a connector similar to
an in-circuit emulator pod is mounted on the processor and monitors
the activity of the product under test. The test tool is
nonintrusive because the connector never sends signals to the
application being tested. It is also quite fast and accurate because
it works at the processor level.
* Black-Box Simulation and Stimulation. A high-level tool should
be able to emulate the actions of a human tester. Hardware is
available that can simulate such product stimulations as keys being
pressed, printers responding, tones being generated, relays opening
and closing, and other analog or digital inputs. In short, advanced
simulation capabilities should enable tests to run unattended.
* White-Box Simulation and Stimulation. The test tool should also
be able to simulate and monitor the internal workings of the product
tested. Such white-box testing capabilities permit testing of
timing, integration, and resource issues that cannot be tested
manually.
* Documentation. Automated test tools can log both test
parameters and test results. If integrated into the software
development process, a sophisticated system should be able to
produce much of the documentation required by regulatory agencies.
Test tools suitable for testing at levels 3 and 4 cost from
$15,000 to $75,000.
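What a high-level script language buys in practice can be sketched in ordinary code. The tool object and its methods below are hypothetical stand-ins for a vendor's script primitives; the point is the loops, conditionals, and self-logging that simple capture-and-replay cannot express.

```python
def run_alarm_threshold_test(tool, thresholds):
    """Drive the device across a range of alarm thresholds, logging each
    result, and return the list of thresholds that failed."""
    failures = []
    for limit in thresholds:                 # loop: one pass per threshold
        tool.simulate_input("set_alarm", limit)
        reading = tool.read_output("alarm_state")
        expected = "on" if limit < 50 else "off"
        if reading != expected:              # conditional: branch on result
            failures.append(limit)
        tool.log(f"threshold={limit} expected={expected} got={reading}")
    return failures

class FakeTool:
    """Minimal stand-in so the script can be exercised without hardware."""
    def __init__(self):
        self._limit = None
        self.entries = []

    def simulate_input(self, name, value):
        self._limit = value

    def read_output(self, name):
        return "on" if self._limit < 50 else "off"

    def log(self, line):
        self.entries.append(line)

tool = FakeTool()
print(run_alarm_threshold_test(tool, [10, 49, 50, 90]))  # → [] (no failures)
```

Because the script both drives the stimulation and writes its own log, the same run produces the test execution and the documentation trail the article describes.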
CONCLUSION
As described above, once you determine your company profile,
perfect your processes, establish test specialists, and give the
team members appropriate testing tools, your company can realize the
benefits of automated software testing. When compared with manual
programs, automation properly applied will result in higher-quality
products, lower risks to your company and the patients you serve,
faster regulatory approvals, and decreased time to market. The
higher level you reach on the automated software testing maturity
model, the more benefits you will realize. Whatever level you
choose, however, keep in mind a major lesson of the last 30 years of
computing: No matter what tools you buy, your largest investment by
far will be in the processes and people you put in place to use
those tools. Purchase automated software testing tools based on how
they can maximize your investments in processes and people, not on
the price of the tools themselves.
REFERENCES
1. Weide P, "Improving Medical Device Safety with Automated
Software Testing," Med Dev Diag Indust, 16(8):66-79, 1994.
2. Humphrey WS, Managing the Software Process, Reading, MA,
Addison-Wesley, 1989.
3. Houston F, "Software Development and Quality Assurance: FDA
Expectations," in Proceedings of the 1992 HIMA Conference, HIMA
Publication 93-5, Washington, DC, Health Industry Manufacturers
Association, pp 43-51, 1993.
4. Rakitin SR, "The Economics of Software Process Improvement,"
presented to the 1994 Medical Device Software Conference sponsored
by the Health Industry Manufacturers Association, Washington, DC,
May 1994.
5. Johnson M, "Dr. Boris Beizer on Software Testing: An
Interview, Part I," The Software QA Quarterly, 1(2):7-13, 1994.
6. Leveson NG and Turner CS, "An Investigation of the Therac-25
Accidents," Computer, July, p 23, 1993.
7. "Reviewer Guidance for Computer-Controlled Medical Devices
Undergoing 510(k) Review," Section 2.0 Levels of Concern, Rockville,
MD, FDA, Office of Device Evaluation, Center for Devices and
Radiological Health, 1991.
8. "Reviewer Guidance for Computer-Controlled Medical Devices
Undergoing 510(k) Review," Section 1.0 Introduction, Rockville, MD,
FDA, Office of Device Evaluation, Center for Devices and
Radiological Health, 1991.
Mitchel H. Krause is director of testing and quality control for
B-Tree Verification Systems, Inc. (Minneapolis).