November 17, 2004
Participants in this eWorkshop included:
Lead discussants:
· Vic Basili, Fraunhofer Center Maryland and University of Maryland College Park
· Jürgen Münch, Fraunhofer Institute for Experimental Software Engineering (Germany)
Discussants:
· Sally Godfrey, NASA Goddard Space Flight Center
· Oliver Laitenberger, Droege and Company (Germany)
· Patricia Larsen, Fraunhofer Center Maryland
· Sandro Morasca, U. degli Studi dell'Insubria (Italy)
· Rose Pajerski, Fraunhofer Center Maryland
· Frank Sazama, Q-Labs (Germany)
· Kurt Schneider, Universität Hannover (Germany)
· Carolyn Seaman, Fraunhofer Center Maryland and University of Maryland Baltimore County
· Dave Weiss, Avaya
· Marv Zelkowitz, Fraunhofer Center Maryland and University of Maryland College Park
eWorkshop support:
· Forrest Shull, Fraunhofer Center Maryland
· Raimund Feldmann, Fraunhofer Center Maryland
· Patricia Costa, Fraunhofer Center Maryland
Topic I: Return on Investment (ROI)
The focus of the first part of the discussion was on a specific measurement goal – showing the ROI of introducing software development practices – which is highly relevant for industry yet immensely challenging. The overall discussion goal was to gather stories and experiences about past efforts at calculating ROI and understand whether those experiences were “success stories” or not, that is, whether they supported industrial change of the desired type. Based on these experiences, we also aimed to categorize the types of issues encountered and begin to understand the relevant factors and activities involved.
The discussion began with an overview of participants’ experience in the area of ROI measurement. It became clear that ROI measures had been undertaken for a number of reasons, which helped to clarify the range of discussion. Those reasons can be summarized as:
· Need to convince internal management to invest in some particular development practice
· Need to convince people within the organization to change their work habits
· Need to show improvement over time, that is, that the organization as a whole was improving. (A particularly well-known example seemed to be the measures done within NASA’s Software Engineering Laboratory (SEL).)
Participants also discussed which practices had actually been the subjects of ROI analyses. The list that was brainstormed included:
· Reviews/Inspections
· Experience gathering techniques
· Requirements engineering
· Testing
Yet, as Oliver Laitenberger pointed out, since the ROI analyses in most of those cases showed a positive return on investment, and some of those practices are still not used widely, perhaps these were not necessarily examples of ROI success stories. From here, the conversation quickly turned to the question of what types of factors make people more or less likely to believe the results of an ROI analysis.
Important factors for calculating ROI
There was some disagreement about the relevance of the context in which an ROI measurement is done, and how it relates to the targeted users of the ROI information. Some participants (especially Vic Basili, Jürgen Münch) felt that people are more likely to believe ROI data that were collected within their own organization. Other participants felt just the opposite, that people are more inclined to believe the seemingly more objective numbers from external organizations (Laitenberger).
Carolyn Seaman attempted to bridge the gap by noting that people seem to believe ROI numbers from outside their own organization only if there is some transparency in how the numbers were computed (i.e. exactly what costs and benefits were accounted for and how they were measured), and such factors are judged to be locally relevant. Seaman felt that more important than whether the source is internal or external to the organization is whether the decision makers know and trust the source.
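Seaman's point about transparency can be illustrated with a minimal sketch, which was not part of the eWorkshop discussion. It assumes the common definition ROI = (benefits - costs) / costs. The item labels, amounts, and measurement notes are hypothetical placeholders, not data from any study mentioned here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """One cost or benefit entry, recorded together with how it was measured."""
    label: str          # what was counted (hypothetical)
    amount: float       # value in a common currency unit (hypothetical)
    how_measured: str   # the source of the number, kept for transparency


def roi(costs: List[Item], benefits: List[Item]) -> float:
    """Return (total benefits - total costs) / total costs."""
    total_cost = sum(c.amount for c in costs)
    total_benefit = sum(b.amount for b in benefits)
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures for introducing inspections on one project.
costs = [
    Item("inspection training", 20_000.0, "invoices"),
    Item("inspection meetings", 30_000.0, "timesheets"),
]
benefits = [
    Item("rework avoided", 80_000.0, "estimate from defect logs"),
]

ratio = roi(costs, benefits)

# Listing every item next to the result lets a reader judge local relevance.
for item in costs + benefits:
    print(f"{item.label}: {item.amount:.0f} ({item.how_measured})")
print(f"ROI: {ratio:.2f}")
```

Keeping the measurement note beside each amount is the design choice that matters here: a reader outside the organization can see exactly which costs and benefits were counted and decide whether they apply locally.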
The time duration of the ROI measurement was also identified as a key concept affecting believability. Sandro Morasca noted that more believable effects are the ones measured over a long time period, perhaps because they give more confidence that the effect is not transitory or related to some artificial aspect of a given study. Seaman responded that long time periods may make studies more convincing, but may actually make them less rigorous, since over a long time period it is difficult to pinpoint the actual cause of any improvement.
Basili noted that one difficult thing about ROI measures is that there is usually no point of comparison: "It was always a problem to explain to people that we had no control group and therefore no basis to decide if we would have gotten that improvement if nothing happened."
A final factor was the actual set of measures used to indicate ROI. Laitenberger felt that convincing ROI studies are those which base conclusions on business numbers (e.g. project cost), rather than internal technical numbers (e.g. number of defects present at a given phase of software development). However, Basili felt that, similarly to the discussion about time duration, business numbers may be more convincing but less rigorous since it can be difficult to show the cause and effect.
So what IS convincing?
Given the above factors, the discussion quickly moved to trying to describe what kind of ROI study would have a high chance of being convincing.
Some participants focused on particular types of data that could help or hinder the case. Kurt Schneider felt that data from the outside may be perceived as being presented to support a predetermined conclusion (“unsolicited” data, “collected before they even asked a question”) and hence discounted. Frank Sazama agreed with Laitenberger’s earlier comment that data "must be related to the business... not technical."
Other participants focused on the need to build up convincing argumentation based on combining multiple types of data. That way, more end users are likely to find, somewhere in the set, a form of data that they personally find convincing. Münch pointed out that reference projects from similar contexts may also be convincing to managers, but can be risky if only one such project is used. More scientific or technical personnel may be more convinced by multiple studies from across a sampling of projects. Marvin Zelkowitz reinforced this impression by summarizing a study he conducted in which industrial managers were asked which kinds of data they found most compelling; they reported that they preferred “real world validation such as large case studies.” The same study found that researchers, in contrast, were more interested in rigorous, reproducible studies.
More than just convincing different types of people, having different types of data building on each other can be in total more convincing than any one type in isolation. As said by Münch: “…a combination of both controlled experiments (repeatable, statistically significant results) and case studies in realistic contexts are an appropriate means to come up with convincing results (if the results are positive in both cases).”
The discussion also questioned whether data alone were sufficient. Schneider reported the thesis of a talk he had heard, which argued that “you do not need so much data, but you need a convincing theory that is supported by the little data you have.” Seaman agreed that there was a need for a compelling, over-arching theory before data themselves would be convincing. The question was raised whether people only pretend to make decisions based on quantitative evidence. Seaman mentioned that she felt (although she could not prove) that this was certainly the case, and that managers make decisions based upon how much they respect and trust the source of the advice they get. (This was emphatically seconded by Zelkowitz.) However, managers still need the quantitative data for making the case to the rest of the organization.
Seaman summarized the above discussion by saying that convincing people "takes a convincing theory and a reputable expert." Once a decision-maker has been convinced and the desired practice needs to be implemented, Basili summarized the steps for building a successful ROI case that can be used to actually affect organizational process change:
1. You need some data to start with;
2. The second step is to say exactly which hypotheses you expect to be true in the organization and what data the organization should collect. It has to be shown that the hypotheses relate to a business issue.
3. Finally, you need to collect the data to show ROI in the company itself.
Based on the above discussion, a fair question (raised by Laitenberger) seemed to be whether there was still a need for additional ROI studies to be conducted by researchers. Basili and Morasca felt that the answer was yes: The difficulty in any case study or controlled experiment is showing a clear cause-effect relationship between the practice introduced and the effects. Running more studies in diverse environments can help produce a more compelling body of knowledge and give champions the ammunition they need to help get process change started.
Concrete approaches that might help
Some participants raised questions about specific practices/mechanisms that might be used to help produce convincing ROI cases.
One question was about the “balanced scorecard” approach and whether it could be used to target specific areas or metrics that should be addressed. Laitenberger and Rose Pajerski felt that it could be useful if only because many company managers are familiar with the business goals approach and it therefore has a certain comfort level associated with it. However, Münch was more ambivalent about whether the measures suggested by a balanced scorecard approach would translate well into useful software metrics: “In the context of monitoring systems (such as dashboards) I recognized very often that the areas Balanced Scorecard (Business Goals) and Software Goals (e.g., defined with GQM) are completely separated. The input for balanced scorecards, for instance, is typically not derived from a software measurement activity.”
There was also some discussion (and a bit more controversy) over whether simulation models of software development practices could be used to help convey a better idea of costs and benefits to decision makers. Basili and Münch felt that the answer was yes, although it is hard to get the data needed to build such models and companies may not be willing to invest in such activities. As an argument for their potential, though, Münch pointed out that simulation is very successfully applied in non-software domains. Other participants felt that the likely benefits were limited, both because in the software domain such models would be too simplistic and not capture a range of qualitative or people-related issues that are important (Schneider, Morasca), and because the use of simulation models is too complicated to give managers the type of understanding with which they are comfortable (Laitenberger).
General Conclusions:
· There seemed to be a general consensus about the role of ROI data in affecting actual organizational change. There was no dispute that different audiences find different types of evidence compelling and that producing only one type (e.g. an argument based only on quantitative data from rigorous controlled experiments) is not a successful strategy. As Schneider said, "…we cannot ignore data. But we need to embed it into a convincing longer story (with case studies to start with)."
· Soliciting information from the intended users about what types of data would be compelling can be useful, but should be taken with a grain of salt. For example, managers say they make decisions based on quantitative data, but the data is really used just to back up decisions already made.
· Decisions are based more on being convinced by known and trusted experts who have stories and/or data. (Experts need not be internal.)
· The effects of using a practice need to be related to the business goals that managers have. (The Goal-Question-Metric paradigm was suggested as a useful tool for this.)
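As a rough illustration of the Goal-Question-Metric idea mentioned in the last point, the sketch below ties a business goal to questions and each question to metrics. It is an example under stated assumptions, not something presented at the eWorkshop: the goal, the questions, and the metric names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """A question that refines the goal, with the metrics meant to answer it."""
    text: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Goal:
    """A business goal refined into questions (hypothetical structure)."""
    business_goal: str
    questions: List[Question] = field(default_factory=list)

    def metrics(self) -> List[str]:
        """Return every metric once, in first-seen order."""
        seen: List[str] = []
        for question in self.questions:
            for metric in question.metrics:
                if metric not in seen:
                    seen.append(metric)
        return seen

    def unlinked_questions(self) -> List[str]:
        """Return questions that no metric answers yet."""
        return [q.text for q in self.questions if not q.metrics]


# Hypothetical goal tree for an organization introducing inspections.
goal = Goal(
    business_goal="Reduce the cost of rework on delivered releases",
    questions=[
        Question(
            "How much effort goes into fixing defects after release?",
            ["rework hours per release", "defects reported after release"],
        ),
        Question(
            "Do inspections find defects before testing?",
            ["defects found per inspection", "defects reported after release"],
        ),
        Question("Are developers satisfied with the inspection process?"),
    ],
)

print(goal.metrics())
print(goal.unlinked_questions())
```

Every metric collected this way traces back to a question and, through it, to the business goal, which is the link to managers' goals that the conclusion above calls for. Questions without metrics show where the measurement plan is still incomplete.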
Topic II: Measurement in process maturity models
In the second part of the discussion, participants moved to focus on the implications that common process maturity models and continuous improvement approaches (like CMM, CMMI, Profes) have for measurement programs. Specifically, the discussion goals were to gather specific measurement goals that are implied by maturity models and, on that basis, to discuss which kinds of process/product measures the models require and which they miss.
The discussion began by trying to define which models were of interest. Participants broadened the discussion to focus not just on CMM models and Profes but also on SPICE, the Quality Improvement Paradigm (QIP), and Experience Factory (EF). Participants felt these could be grouped in two broad categories:
- Those that recommend specific practices, for various levels of maturity: CMM, CMMI, SPICE
o CMMI emphasizes linking business goals to project measurements (Patricia Larsen, Sazama). Both CMM & CMMI provide guidance about best practices that should be implemented and about assessing the current state of usage (Münch). CMMI includes some product measures (Sazama).
- Those that provide guidance for general improvement, along dimensions specified by a particular user: Profes, QIP, EF
o These focus more on "concrete problems and measuring the effects of solutions for these problems", i.e. "guidance on how to improve in these problem areas." (Münch, agreed by Schneider)
Because the models in each category address problems at different levels, it was agreed that there were no contradictions between the two sets. In a way, models in the “CMM-like” set can be viewed as measuring the organization, while those in the second set emphasize measurements of processes or products (Dave Weiss, agreed to by Laitenberger, Basili). Münch gave two practical examples:
- "CMM could give orientation [i.e. identify beneficial improvement areas] and QIP helps to identify key issues in these areas and helps to improve."
- SPICE has also been used "as a starting point for identifying improvement areas. It gives companies kind of a benchmark."
Although participants agreed that there is no incompatibility between the two levels, there was a spirited discussion as to whether CMM-like models were beneficial or whether they could in fact be “destructive” to organizations. Weiss related his experience that organizations become too focused on “achieving a certain number, rather than becoming a more effective development organization" (agreed to by Morasca and Larsen). "That's why a goal-oriented approach is needed. Goals should be to improve the organization. Start from there and not from achieving a certain number or level."
Other participants did not argue with the observation that organizations applying CMM-like assessment models can get too focused on the end number, rather than what it means. However, they felt there were some positive effects that can outweigh those dangers. Sazama agreed that assessment models can be destructive "in the beginning" but that learning starts "in the background," i.e. the act of trying to progress toward a higher number leads organizations to learn things about process improvement along the way. Laitenberger felt that "at least the number motivates for some action instead of maintaining the status quo." And Basili said: "I agree that CMM is a good consciousness raising approach, but if taken literally, it becomes destructive."
In the end, there was some argument whether the benefits of the CMM-style models were sufficiently valuable to outweigh the potential negatives. Basili felt that the benefits given are actually somewhat limited: Organizations add processes without understanding the problems or learning to select/evolve process over time. Yet this is exactly what is needed for building adaptable, continually improving organizations. Weiss felt that CMM(I) isn't really needed to identify the critical problems to tackle first, anyway: "Unless the company is very well-disciplined and everything is running smoothly, my experience is that it is easy to identify the big problems. Everyone knows them. It's just that they are often afraid to admit it" (agreed to by Schneider, Sazama, Laitenberger, Zelkowitz).
General Conclusions:
· Process models can be grouped into two categories: Those that assess the organization (CMM, CMMI, SPICE) versus those that suggest process or product measures for a given situation (Profes, QIP, EF).
· The two levels focus on different things and there is no reason they can’t be used together.
· CMMI or SPICE can be used to help organizations start on process improvement, at least identifying areas that should be considered for improvement. However, there may be "destructive" effects also, if the company's focus is on the end number and nothing else.
· "Destructive" effects can also result if the company tries to implement all process recommendations without first understanding what is/isn't working. Practices need to be implemented instead by analyzing what is related to the business goals (e.g. through analysis using QIP or EF).
· As with ROI numbers, there is an element of education needed so that people don't misinterpret the results.
© 2002 Fraunhofer Center MD © 1999-2002 Fraunhofer IESE For suggestions and comments, please contact webmaster@fc-md.umd.edu