What is impact evaluation?
In impact evaluation (IE), the term ‘impact’ is synonymous with attribution – ‘what difference’ was made by a development project, intervention, programme, or policy. An impact evaluation measures the difference in outcomes with the programme compared to outcomes without it, and so tells us whether the programme worked. One way to understand impact evaluation is to contrast it with ‘outcome monitoring’, a well-known example being the Millennium Development Goals (MDGs). As important as the MDGs have been in helping set a more common agenda in development, monitoring targets within this framework only tells us whether outcomes are being achieved, not the extent to which they can be attributed to the activities of development stakeholders. What if conditions – weather, conflict, recession, pandemics and so on – are making it harder to tackle poverty and disease, to reduce malnutrition, or to prevent mothers dying during labour and children dying before their first birthday, and without development programmes these indicators would be worse? We can’t tell this simply by looking at outcomes data.
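The distinction can be made concrete with a toy calculation. The numbers below are hypothetical: imagine a recession pushes incomes down everywhere, so participants' incomes still fall, but by less than they would have without the programme. Outcome monitoring sees only the change over time; impact evaluation compares against the counterfactual.

```python
# Hypothetical numbers, purely for illustration.
baseline = 100.0           # average income before the programme
with_programme = 95.0      # observed outcome for participants after a recession
without_programme = 80.0   # counterfactual outcome (e.g. from a comparison group)

change_over_time = with_programme - baseline   # what outcome monitoring sees
impact = with_programme - without_programme    # what impact evaluation estimates

print(change_over_time)  # -5.0: outcomes worsened over the period...
print(impact)            # 15.0: ...yet the programme still made a big positive difference
```

Judged by monitoring data alone, the programme looks like a failure; judged against the counterfactual, it averted a much larger decline.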
Impact evaluation – a brief history
Impact evaluation has become an increasingly ‘hot’ topic in international development over the last decade, due to the work of research organisations – notably the Poverty Action Lab (J-PAL) and sister organisation Innovations for Poverty Action (IPA), the Development Impact Evaluation Initiative (DIME), and 3ie – and the support of Southern governments, particularly in Latin America, and international development agencies. Indeed, donors are increasingly coming under pressure to ‘prove’ to their constituents that what they’re doing is working, and impact evaluation is central to the ‘evidence based’ movement in development policy-making worldwide. This movement has been inspired by pioneering IEs in the area of Conditional Cash Transfer programmes, most famously Mexico’s Progresa (e.g. Gertler, 2000), which showed big effects on child wellbeing and has been hugely influential in policy circles both in Mexico and globally.
However, IE itself is not new – the use of ‘social experiments’ and trials of economic policy interventions in Western countries dates back at least to the 1960s, while high-quality IEs have been conducted in Southern countries in the areas of nutrition and water and sanitation (WASH) since the 1970s. The big differences are, firstly, in the range of interventions now covered by IE – right across the social sectors, and into areas where many thought IE was not possible even five years ago, like infrastructure and SME development. Secondly, the ways in which IEs are carried out have changed, and there has been a huge amount of cross-disciplinary learning. Many of the techniques now routinely drawn on in economics and political science originate from medicine and psychology – the most obvious being the randomised controlled trial (RCT) evaluation design.
What research methods are used in impact evaluation?
Although the randomised controlled trial is probably most closely associated with IE and often held up as the ‘gold standard’, there are in fact many different ways of evaluating impact quantitatively (see Ravallion 1999; and here). For most development interventions and outcomes, a strong evaluation should collect data on outcomes from two groups – a treatment group which receives the intervention, and a comparison group which doesn’t – and calculate impact by comparing outcomes between the two. The simple reasoning behind this is that lots of other things affect development outcomes and might confound our attempts to attribute changes to our intervention; a comparison group helps filter these out. Secondly, the evaluation needs to be able to control for ‘selection bias’, which is a fancy way of saying that programme recipients are usually not typical of the general population. Imagine an anti-poverty programme which targets the extreme poor, or a micro-credit programme that is more likely to lend to less risk-averse people. Simply comparing beneficiaries and non-beneficiaries would likely give the wrong estimate of the intervention’s impact: it would probably overstate benefits in the case of micro-credit (less risk-averse people tend to do better than others on most outcomes anyway, with the possible exception of health, due to accidents), and understate them in the anti-poverty programme (poorer people are by definition doing worse than most). This is why RCTs randomly allocate people to the intervention or control group (to see why this solves the selection bias problem, see White, 2011), and why other methods use statistical techniques like propensity score matching, regression discontinuity design, and instrumental variables regression analysis (see Ravallion, 2008).
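A small simulation can show selection bias in action. This is a stylised sketch with made-up numbers, not an estimate from any real programme: in the micro-credit example above, suppose risk-tolerant people both borrow more often and earn more regardless of the loan. A naive beneficiary/non-beneficiary comparison then overstates the loan’s effect, while random assignment recovers something close to the true effect.

```python
import random
import statistics

random.seed(0)

N = 10_000
TRUE_EFFECT = 2.0  # assumed: the loan raises earnings by 2 units

people = [{"risk": random.gauss(0, 1)} for _ in range(N)]

# Self-selection: risk-tolerant people are more likely to take a loan,
# and risk tolerance independently raises earnings.
for p in people:
    p["takes_loan"] = random.random() < 0.2 + 0.3 * (p["risk"] > 0)
    p["earnings"] = 10 + 3 * p["risk"] + TRUE_EFFECT * p["takes_loan"] + random.gauss(0, 1)

naive = (statistics.mean(p["earnings"] for p in people if p["takes_loan"])
         - statistics.mean(p["earnings"] for p in people if not p["takes_loan"]))

# RCT: assignment is random, so risk tolerance is balanced across groups.
for p in people:
    p["assigned"] = random.random() < 0.5
    p["earnings_rct"] = 10 + 3 * p["risk"] + TRUE_EFFECT * p["assigned"] + random.gauss(0, 1)

rct = (statistics.mean(p["earnings_rct"] for p in people if p["assigned"])
       - statistics.mean(p["earnings_rct"] for p in people if not p["assigned"]))

print(f"naive estimate: {naive:.2f}")  # well above 2: biased upwards by self-selection
print(f"RCT estimate:   {rct:.2f}")    # close to the true effect of 2
```

The naive comparison picks up the earnings advantage of risk-tolerant borrowers and attributes it to the loan; randomisation breaks the link between who gets the loan and who would have done well anyway.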
So far, impact evaluation has largely been a quantitative beast. The emphasis in international development programme evaluation, especially with the push from 3ie and the impact evaluation network NONIE, is on theory-based impact evaluation, which aims to answer not just whether our programme works, but why, and how to improve outcomes. It is important to incorporate mixed methods of research in IE, since qualitative methods are frequently the best way to answer the ‘why’ and ‘how’ questions (see Bamberger, Rao and Woolcock, 2010).
What are the challenges?
IE is still fairly new and exciting, and there are concerns about the approach, such as on the ethics of RCTs or on external validity (generalising findings to other contexts). Some of these critiques are justified, but generic scare stories also abound. Withholding ‘treatment’ from the control populations may certainly be unethical in some circumstances, but as in medicine you can have a ‘standard treatment’ control group or employ a waiting list (pipeline) design (as was adopted in the Progresa evaluation). It is also sometimes argued that IE methods like RCTs are not appropriate to evaluate some of the most important policy interventions that affect the whole population (e.g. pension system reforms), but in many cases an ‘encouragement’ design can be used. A third critique is that IEs are not applicable to a lot of development interventions, either because they are too expensive, or because there are not enough beneficiaries to employ a control group or use statistical methods. Oxfam GB discuss their approach to dealing with this here, and 3ie will shortly be publishing a paper on ‘small n’ impact evaluation using qualitative methods.
The most obvious way to assess generalisability is to replicate interventions in many different contexts. Another is to ‘open the black box’ by collecting process data to help understand the factors driving success and failure. But in making generalisations for lesson-learning purposes, we also need to be careful not to cherry-pick evidence to support our prior beliefs, which requires systematic collection and analysis of the existing evidence. Systematic reviews, which synthesise all the available evidence in a rigorous and transparent way, are the ‘missing link’ between the production of evidence and its consumption by policy-makers and practitioners (for organisations supporting reviews, see here, here and here).
If you want to know more…
3ie has recently announced a call for proposals for systematic reviews. For this call, 3ie developed a list of proposed systematic review questions, addressing ten policy-relevant questions across a broad range of sectors, including microfinance, health insurance, water and sanitation, agriculture, governance and climate change. For more information on how to apply, or to see the existing reviews which 3ie is funding, see here.
In addition, in 2011 3ie joined forces with the Campbell Collaboration to establish a group to produce systematic reviews in development, called the International Development Coordinating Group (IDCG). The group is multidisciplinary and aims to support both the production of high quality reviews, and to build capacity to conduct reviews among authors, including those based in low and middle income countries. The group’s secretariat is based at 3ie’s offices at LIDC.
You can meet us and learn more about IE at the monthly 3ie-LIDC Seminar Series ‘What works in international development’. We will also be presenting at the upcoming LIDC conference ‘Measuring the impact of higher education for development’ on 19-20 March.
Contributed by Hugh Waddington, Senior Evaluation Officer with the International Initiative for Impact Evaluation (3ie), based at LIDC.