Note: Not all links in this file work, as it has only recently been transferred to this platform.
Impact evaluation should always be considered in evaluation design, but it should not be assumed that impact evaluation (also known as high-level outcome/impact attribution evaluation) should always be attempted. Impact evaluation attempts to prove that changes in high-level outcomes can be attributed to a particular intervention. The appropriateness, feasibility and affordability of doing impact evaluation for any intervention should always be carefully assessed. It is often better to save precious evaluation resources for selected high-priority impact evaluations, or for non-impact evaluation (e.g. implementation/formative/developmental evaluation). Attempting impact evaluation where it is not appropriate, feasible or affordable can lead to pseudo-impact evaluations: studies which appear to be impact evaluations but which do not provide information robust enough to satisfy key stakeholders that changes in outcomes can actually be attributed to the particular program.
The problem with assuming that impact evaluation should always be done
It should not be presumed, before the fact, that impact evaluation should always be attempted on a program, organization or other intervention. Whenever impact evaluation is being considered, there needs to be a careful analysis of the appropriateness, feasibility and affordability of undertaking it in the specific instance. Impact evaluation may, or may not, be appropriate, feasible and/or affordable for a particular program, organization or other intervention.
Unfortunately, the expectation of many stakeholders at many levels is that impact evaluation will be routinely undertaken for all programs. Paradoxically, such insistence on attempting impact evaluation can lead to the undesired result of wasting evaluation resources. A naive insistence within evidence-based practice systems that only the results of impact evaluation will be used to determine which interventions should be used, without any consideration of the appropriateness, feasibility and affordability of impact evaluation, can seriously distort such systems in the direction of simply doing that which can be easily evaluated. Strategic decisions should be based not on doing the easily evaluated, but on doing that which has the greatest chance of success and is likely, on a broad analysis, to be the most efficient. (For more, see the article on the Implications of an exclusive focus on impact evaluation in ‘what works’ evidence-based practice systems).
Understanding impact evaluation as just one part of an overall ‘outcomes system’ for an intervention
Impact evaluation is only one of six types of evidential information that can be provided regarding whether a ‘program works’ or not within an outcomes system. These six types of evidential information are known as the six building blocks of any outcomes system. They are set out in Figure 1 below and in the article on Outcomes system building blocks.
The six building blocks which can be drawn on in any outcomes system are:
- An outcomes model / intervention logic model which sets out all of the outcomes being sought and the lower-level steps it is believed are necessary to achieve them.
- Not-necessarily demonstrably attributable indicators. Measures of steps and outcomes in the outcomes model which track improvement (but whose mere measurement does not necessarily demonstrate that the improvement was caused by a particular program, organization or intervention).
- Demonstrably attributable indicators. Measures of steps and outcomes for which it can be proved improvements have been caused by a particular program, organization or intervention.
- High-level outcome/impact attribution evaluation designs. Attempts to establish attribution of improvements in high-level outcomes to particular programs, organizations or interventions.
- Non-outcome (formative and process) evaluation. Non-outcome/impact evaluation to improve the implementation of, or content of, the outcomes model. These are: formative evaluation (for improvement of program implementation) and process evaluation (for describing the course and context of a program).
- Economic and comparative evaluation. Comparing one program with another.
Figure 1: Outcomes System Building Blocks
In contrast to the idea that impact evaluation is all there is to evaluation, outcomes theory puts impact evaluation into its rightful place: a very powerful technique, but one of six potentially complementary sources of evidence providing information on whether or not a program ‘works’. This approach does not diminish the importance and usefulness of impact evaluation, which can yield uniquely powerful information about program effectiveness. What it attempts to do is to assist in situations where impact evaluation is not appropriate, feasible or affordable, by helping evaluation planners identify a mix of other types of evidence which can be brought into play when considering the question: ‘does this program work?’
The problems which arise if it is assumed that impact evaluation should always be done
A number of problems arise if it is assumed that impact evaluation should always be done for any program, organization or other intervention. These are:
- The blind pursuit of impact evaluation even where it is not appropriate, feasible or affordable leads to pseudo-impact evaluations being undertaken. These are impact evaluations using a poor methodology which may convince some stakeholders that an impact study is being done, but which, when subjected to critical methodological review, will not be accepted as having robustly established attribution.
- Waste of evaluation resources. Evaluation resources need to be allocated strategically so that they are not wasted. It may be better to save scarce evaluation resources for undertaking well designed, well implemented, robust impact evaluations in a few instances rather than attempting them in many instances. It may also be a much more effective use of funding to spend evaluation resources on formative evaluation (i.e. to help ensure effective implementation) or process evaluation (i.e. to help describe an existing program so that best practice can be spread across many programs) rather than undertaking impact evaluation in a specific case.
Determining whether or not impact evaluation should be undertaken
The following process should be used to determine whether or not impact evaluation should be undertaken.
2. Determining whether an impact evaluation on full roll-out, or an impact evaluation on piloting only, approach should be taken. It is important to differentiate between two possible approaches. The first is where impact evaluation is undertaken on what is called the ‘full roll-out’ of a program. In this instance, the evaluation question being asked is: ‘Did this program improve high-level outcomes in the case of the full roll-out of the program?’ An alternative approach is to only do impact evaluation on a pilot of the program and then, if successful, not attempt an impact evaluation on the full roll-out. In such instances, all that is done on the full roll-out of the program is best-practice implementation monitoring. This is used to make sure that the best practice identified in the pilot is implemented in the full roll-out. The evaluation questions asked when using this second approach are: 1) ‘Did the pilot program improve high-level outcomes?’ (an impact evaluation question); 2) ‘What are the details of what the intervention consisted of in the pilot?’ (a process evaluation question); and 3) ‘Is best practice from the pilot being implemented in the full roll-out of the program?’ (a formative evaluation question). (Further information in the article: Full roll-out impact/outcome evaluation versus piloting impact/outcome evaluation plus best practice monitoring. Information on how to ensure best practice is implemented on full roll-out is available in the article on: Best practice representation and dissemination using outcomes models).
A good example of the widespread use of the second paradigm is in the area of normal clinical medical treatment. A pragmatic and effective approach to the use of evaluation resources is used in this sector. When a patient visits a physician and they are prescribed a drug treatment, in the normal course of medical practice, there is often no attempt to undertake an impact/outcome evaluation to establish whether any improvement takes place because of the treatment, placebo or some other factor. However, the concept is that the physician will be applying best practice in their decision to give the treatment based on impact/outcome evaluations which have been undertaken in a ‘pilot’ phase (i.e. in the course of drug trials).
There is currently an irony in the practice of evaluation and evidence-based practice. This is the fact that clinical medical treatment is viewed as a relatively evidence-focused endeavor – when compared, for instance, with some types of social programs. Such social programs are encouraged to adopt a more evidence-based approach and sometimes clinical medicine is held up as an example (for instance by reference to such comprehensive reviews of effectiveness in medicine as the Cochrane Collaboration). In response the attempt is made to evaluate the effectiveness of such social programs following the example of clinical medicine. However, sometimes the naive attempt is made to evaluate social programs using the first paradigm described in this article – impact/outcome evaluation on full program roll-out, even where this is not appropriate, feasible or affordable. If those designing the evaluation of such social programs clearly differentiated between the two paradigms outlined in this article, they could use the often more appropriate second paradigm – impact/outcome evaluation on piloting and best practice monitoring on full program roll-out. In doing so they would be emulating the normal approach being used in routine clinical medical treatment.
3. Assessing the appropriateness, feasibility and affordability of impact evaluation designs. If it has been decided, following consideration of 1 and 2 above, to investigate undertaking an impact evaluation of a program, then the three factors of appropriateness, feasibility and affordability of particular impact evaluation designs need to be considered. Appropriateness relates to issues of the ethical and cultural appropriateness of undertaking an impact evaluation. Feasibility relates to whether or not it is going to be possible to actually undertake the impact evaluation. Perhaps the most difficult aspect of determining feasibility is making sure that there has been consideration of the feasibility of seeing the impact evaluation right through to a conclusion. Impact evaluations are sometimes launched by those inexperienced in the reality of overseeing such evaluations, and many practical problems can arise in the course of an impact evaluation which may mean that it has to be abandoned before it provides any useful results. All of these possible risks should be thought through as part of the assessment of the feasibility of an impact evaluation. Lastly, even if doing an impact evaluation is appropriate and feasible, it may not be affordable. Affordability needs to be considered in terms of the alternative possible uses of the evaluation funds which would be spent on the impact evaluation. Undertaking an effective impact evaluation can be an expensive exercise.
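The three-factor screening described above can be sketched as a simple checklist. This is an illustrative sketch only: the function and the verdict wording are assumptions for illustration, not part of the original method.

```python
# Illustrative sketch (not from the original article): screening a proposed
# impact evaluation design on appropriateness, feasibility and affordability.

def screen_design(appropriate: bool, feasible: bool, affordable: bool) -> str:
    """Return a verdict for a candidate impact evaluation design.

    All three factors must hold before an impact evaluation proceeds;
    otherwise resources may be better spent on other evaluation types.
    """
    if not appropriate:
        return "Do not proceed: ethically or culturally inappropriate."
    if not feasible:
        return "Do not proceed: cannot be seen through to a robust conclusion."
    if not affordable:
        return "Do not proceed: funds better spent on other evaluation types."
    return "Proceed: design is appropriate, feasible and affordable."

# Example: a design that is feasible and affordable but not appropriate
# still fails the screen.
print(screen_design(appropriate=False, feasible=True, affordable=True))
```

The point of the sketch is simply that the screen is conjunctive: failing any one of the three factors is sufficient grounds not to proceed with that design.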
Examples of appropriateness, feasibility and affordability analysis of impact evaluation
Examples of the analysis of appropriateness, feasibility and affordability of impact evaluation designs are given below:
Example 1. Performance Based Research Fund (PBRF)
The Performance Based Research Fund (PBRF) is a nation-wide academic research output assessment system. An evaluation plan was developed for it using an applied method based on outcomes theory. This applied method is now known as Easy Outcomes (it was formerly known as the REMLogic Method and the OIIWA approach). The analysis of outcome evaluation designs is shown in Table 5 on pages 29-33 of the evaluation plan for the PBRF.
Example 2. Community Central
Community Central is a national internet-based networking platform for the community sector. A visual evaluation plan using the Easy Outcomes approach has been developed for Community Central (it is here). Within the evaluation plan, an analysis was undertaken of possible impact evaluation designs. This analysis is set out below:
Experimental design
NOT FEASIBLE. It is not feasible to randomly assign groups of sector users to using or not using Community Central on the full roll-out of the system, as one could not stop the control group using electronic networking in their work. The system will be piloted, but not using an experimental design, because of the expense of an experimental design in piloting.
Regression discontinuity design
NOT FEASIBLE. This design creates an intervention group by selecting those in most need of an intervention, giving them the intervention and comparing their outcomes to those less in need. It is not appropriate or feasible in this case.
Time-series analysis design
NOT FEASIBLE. There are no good regular collections of data on networking amongst the relevant sectors which would provide a sufficiently long and detailed data series to allow identification of the impact of the introduction of Community Central, at a particular point in time, on networking within the relevant sectors.
Constructed matched comparison group design
NOT FEASIBLE. There is no group sufficiently similar to those who will be using Community Central which will not be using electronic networking and which could therefore be used as a comparison group. This is because most people in most sectors are increasingly making use of electronic networking. An international comparison with another country would also not be feasible, because all similar countries are moving to use electronic networking in the relevant sectors.
Exhaustive causal identification and elimination design
NOT FEASIBLE. This design would rely on a robust measure of increased sector networking and would then try to identify all of the possible reasons, other than Community Central, why the increase may have occurred. The role of these other factors would be systematically examined and, if they were eliminated, the conclusion would be drawn that Community Central had caused the change. However, there is no robust external measure of networking in the sector apart from the usage results from Community Central itself, from which stakeholders will be able to draw their own conclusions about the level of networking occurring on Community Central.
Expert judgment design
FEASIBLE, HOWEVER NOT AFFORDABLE WITHIN EVALUATION BUDGET. This design, of asking an expert whether, in their judgment, there is improved networking in the relevant sectors, is unlikely to add much information over and above the usage measures which the system itself will provide, and from which stakeholders can draw their own conclusions about the level of networking occurring on Community Central.
Key informant design
APPROPRIATE, FEASIBLE AND AFFORDABLE. WILL BE DONE. This is the approach which will be used to answer this question. An electronic questionnaire will be circulated to groups of users within Community Central. This will include general users and administrative users, who will be in a better position to comment on the use of the system overall.
Example 3: A national new building regulatory regime
An evaluation plan for a national new building regulatory regime was developed (it is available here). The new building regulatory regime was introduced as a consequence of the failure (due to leaking) of a number of buildings under the previous national regulatory regime. The analysis of the possible impact evaluation designs is given below:
Experimental design
NOT FEASIBLE. This design would set up a comparison between a group which receives the intervention and a group (ideally randomly selected from the same pool) which does not. For ethical, political, legal and design compromise reasons it is not possible to implement the interventions in one or more localities while other localities (serving as a control group) do not have them. Apart from anything else, statutory regulation could not be imposed on only part of the country. In addition, there is a major design compromise problem: given the practical and political importance of having a high standard of new building work, it is likely that compensatory rivalry would reduce any difference in outcomes between the intervention and control groups. Compensatory rivalry is where the control locality also implements the interventions being evaluated because it too wants to achieve the outcomes, which are as important to it as to the locality receiving the intervention.
Regression discontinuity design
NOT FEASIBLE. This design would graph those localities which could potentially receive the intervention on a measurable continuum (e.g. the quality of buildings in the locality). The intervention would then only be applied to those localities below a certain cut-off level. Any effect should show as an upwards shift of the graph at the cut-off point. In theory it would be possible to rank local authorities in order of the quality of their new building work and, if resources for the intervention were limited, it would be ethical to intervene only in those with the worst new building work, and hence establish a regression discontinuity design. However, the political, legal and design compromise problems (as in the experimental design above) mean that a regression discontinuity design is not feasible.
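The logic of the regression discontinuity design described above can be illustrated with a minimal sketch. The localities, scores and cut-off below are entirely invented for illustration; the sketch simply shows how an effect would appear as a jump in later outcomes on either side of the cut-off.

```python
# Regression discontinuity sketch (invented data): each locality has a quality
# score; only those scoring below the cut-off receive the intervention.
# An intervention effect shows as a jump in later outcomes at the cut-off.

CUTOFF = 50  # hypothetical eligibility threshold

def jump_at_cutoff(localities, window=10):
    """Compare mean later outcomes just below vs just above the cut-off score."""
    below = [o for score, o in localities if CUTOFF - window <= score < CUTOFF]
    above = [o for score, o in localities if CUTOFF <= score < CUTOFF + window]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(below) - mean(above)

# (initial score, later outcome) pairs; localities scoring below the cut-off
# received the intervention and show improved later outcomes.
data = [(42, 58), (45, 60), (48, 59), (51, 50), (55, 52), (58, 51)]
print(f"Jump at cut-off: {jump_at_cutoff(data):.1f}")  # prints "Jump at cut-off: 8.0"
```

A clear positive jump at the cut-off would be evidence of an intervention effect; as the text notes, the design still fails here for political, legal and design compromise reasons rather than statistical ones.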
Time-series analysis design
NOT FEASIBLE. This design tracks a measure of an outcome a large number of times (say 30) and then looks to see if there is a clear change at the point in time when the intervention was introduced. This design would be possible if multiple measures of new building quality were available over a lengthy (say 20 year) time series which could then continue to be tracked over the course of the intervention. However, this design has a design compromise problem: another major factor, which can be termed the ‘crystallization of liability’, is occurring at the same time as the introduction of the new building regulatory regime. The crystallization of liability is a consequence of all the stakeholders now becoming aware of the liability they can be exposed to due to the failure of many buildings and the attendant liability claims which have arisen. It should be noted that this crystallization does not mean that available time-series data cannot be used to track the not-necessarily attributable indicator of quality of new building work over time. It is just that any such time-series analysis would be silent on the question of attribution of change to the new building regulatory regime.
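The core test in the time-series design described above can be sketched in a few lines. The quality scores and intervention point below are invented for illustration; a real analysis would use a much longer series and a proper statistical model rather than a simple comparison of means.

```python
# Minimal interrupted time-series sketch (invented data): compare the mean of
# an outcome measure before and after the point the intervention was introduced.

def level_shift(series, intervention_index):
    """Return the change in mean outcome after the intervention point."""
    before = series[:intervention_index]
    after = series[intervention_index:]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(after) - mean(before)

# Invented building quality scores for 10 periods; intervention at period 6.
scores = [50, 52, 51, 49, 50, 51, 60, 61, 59, 62]
print(f"Estimated level shift: {level_shift(scores, 6):.1f}")  # prints "Estimated level shift: 10.0"
```

Note that a shift alone does not establish attribution: as the text explains, a concurrent factor such as the crystallization of liability could produce the same shift, which is precisely why the design is judged not feasible in this case.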
Constructed matched comparison group design
NOT FEASIBLE. This design would attempt to locate a group which is matched to the intervention group on all important variables apart from not receiving the intervention. This would require the construction (identification) of a comparison group not subject to a change in its regulatory regime, ideally over the same time period as the intervention. Since the new building regulatory regime is a national intervention such a comparison group will not be able to be located within the country in question. It is theoretically possible that one or more comparison groups could be constructed from other countries or regions within other countries. However discussions so far with experts in the area have concluded that it is virtually impossible for a country or region to be identified which could be used in a way that meets the assumptions of this design. These assumptions are: that the initial regulatory regime in the other country was the same; that the conditions new buildings are exposed to in the other country are similar; that the authorities in the other country do not respond to new building quality issues by changing the regulatory regime themselves; and that there are sufficient valid and reliable ways of measuring new building quality in both countries before and after the intervention. It should be noted that while some of these assumptions may be met in regard to some overseas countries, all of them would need to be met for a particular country to provide an appropriate comparison group.
Causal identification and elimination design
LOW FEASIBILITY. This design works through first identifying that there has been a change in observed outcomes, then undertaking a detailed analysis of all of the possible causes of the change, and then eliminating all causes apart from the intervention. In some cases it is possible to develop a detailed list of possible causes of observed outcomes and then to use a ‘forensic’ type process (just as a detective does) to identify what is most likely to have created the observed effect. This goes far beyond just accumulating evidence as to why the observed outcome might be explained by the intervention; it requires that the alternative explanations be eliminated as causes of the outcome. This may not be possible in this case due to the concurrent crystallization of liability, discussed above, which occurred in the same timeframe as the intervention. It is likely that this cause is significantly intertwined with the intervention in being responsible for any change that occurs in new building practice, and that it will be impossible to disaggregate the effect of the intervention from the effect of crystallization of liability. A feasibility study should be undertaken before concluding that this design is not feasible.
Expert judgment design
HIGH FEASIBILITY. This design consists of asking a subject expert(s) to analyze a situation in a way that makes sense to them and to assess whether on balance they accept the hypothesis that the intervention may have caused the outcome. One or more well regarded and appropriate independent expert(s) in building regulation (presumably from overseas in order to ensure independence) could be asked to visit the country and to assess whether they believe that any change in the new building outcomes is a result of the new building regulatory regime. This would be based on their professional judgment and they would take into account what data they believe they require in order to make their judgment. Their report would spell out the basis on which they made their judgment. This approach is highly feasible but provides a significantly lower level of certainty than all of the other outcomes evaluation designs described above. If this design is used then the evaluation question being answered should always be clearly identified as: In the opinion of an independent expert(s) has the new building regulatory regime led to an improvement in building outcomes? There are obvious linkages between this design and the causal identification and elimination design above and the feasibility study for that design should also look in detail at the possibilities for the expert judgment design.
Key informant judgment design
HIGH FEASIBILITY. This design consists of asking key informants (people who, by virtue of their position, have knowledge of what has occurred regarding the intervention) to analyze a situation in a way that makes sense to them and to assess whether on balance they accept the hypothesis that the intervention may have caused the outcome. A selection of stakeholder key informants could be interviewed face to face, and their opinions regarding what outcomes can be attributed to the new building regime could be summarized and analyzed in order to draw general conclusions about the effect of the intervention. This could be linked with the expert judgment and causal identification and elimination designs described above.
This type of analysis of impact evaluation is an integral way of working when using the Easy Outcomes approach. The Easy Outcomes approach provides a way of setting out an evaluation plan around a visual outcomes model (intervention logic model).
Please comment on this article
This article is based on the developing area of outcomes theory which is still in a relatively early stage of development. Please critique any of the arguments laid out in this article so that they can be improved through critical examination and reflection.
Citing this article
Duignan, P. (2009). Impact evaluation – when it should, and should not, be used. Outcomes Theory Knowledge Base Article No. 242. (http://knol.google.com/k/paul-duignan-phd/impact-evaluation-when-it-should-and/2m7zd68aaz774/86 ) or (http://tinyurl.com/otheory242).
[If you are reading this in a PDF or printed copy, the web page version may have been updated].