The Methodology Debate on Evidence
by Olaf Rieper and Hanne Foss Hansen, November 2007
Summary
The aim of this report is to provide an overview of central methodological positions and discussions in what is termed the evidence movement. The term »evidence« has become a positive concept in government: evidence-based policy, evidence-based practice, evidence-based management, evidence-based medicine, educational approaches, etc. The essence of the evidence movement is to summarise knowledge from several individual studies and evaluations. The objective is to produce and disseminate the best possible knowledge on the results of given interventions. The evidence movement has gained ground internationally and nationally over the last 10-15 years. We restrict ourselves to what we maintain is the novel aspect of the evidence concept, i.e. the global and international organisations and networks that specialise in producing, commissioning and communicating publicly available systematic reviews to decision-makers in policy and practice. Systematic reviews are summaries of the available research and studies on a given subject, such as the effects of a certain intervention or treatment, and they are carried out in a systematic, transparent way. Our primary focus is on the major welfare areas of healthcare, social issues and education.
Our principal conclusion is that the sometimes lively debate for and against a narrow or broad concept of evidence varies among the different sectors (the healthcare, education and social sectors). The discussions are, to a considerable degree, influenced by the traditions and interests characterising the professional groups in the various sectors. Previously, methodology discussions were mainly conducted among researchers. The evidence movement has moved the methodology discussions beyond the world of research to the political and professional level. The possible consequences are that education, social work and healthcare services may be changed and justified in policy and practice. Research knowledge plays an increasing role in how we design our society, including the public sector. And it seems that the evidence movement can make a significant contribution to this. The danger lies in a narrow definition of the evidence base on the basis of randomised controlled trials and quantitative analyses only. We maintain that the evidence movement should be organised with a broad approach to research methodology.
A number of the organisations and networks that produce and disseminate systematic reviews target certain players in several countries. For example, the Cochrane Collaboration operates in the healthcare area, while the Campbell Collaboration operates in the social and labour-market areas as well as criminology. Other organisations have national target groups, e.g. the UK-based »Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI)«. We described these organisations in a previous report with a European focus (Bhatti, Hansen and Rieper 2006).
The products of these organisations, i.e. the systematic reviews, have great potential influence on what is accepted by the target groups – decision-makers at various levels – as reliable knowledge. This makes them important generators of knowledge that is regarded as reliable and legitimate for policy and practice. They simply define the boundaries of what can be considered »valid« knowledge. The methodology debate therefore centres on the production of systematic reviews, i.e. on which methodology should be applied in preparing them. One particular element of producing systematic reviews is a core issue in the debate: the quality assessment of primary studies to decide whether to include them in a given systematic review. Other methodology issues are also debated, such as methods to synthesise (summarise) the results of multiple primary studies. We will touch on this issue later, but our focus is on the quality assessment of primary studies.
In this report we first describe one of the most widely used approaches to this assessment, the »evidence hierarchy«. The »gold standard« of research design, i.e. randomised controlled trials (RCT), tops the hierarchy. Other designs are further down the scale, e.g. longitudinal studies, and, even further down, case studies. The notion is that research designs can be ranked: assuming optimum implementation, some designs will provide more reliable results than others. We describe the various designs in a typical evidence hierarchy and provide examples of studies based on the various designs.
Secondly, we have reviewed the guidelines and handbooks of 10 evidence-producing organisations in the USA and Europe. Against this background, we have described these organisations' own approaches to the review of primary studies. It turns out that six of the 10 organisations state, in their own guidelines, that they apply the logic of the evidence hierarchy. (In addition, two organisations cite the guidelines of these organisations.) The remaining two organisations state that their approaches are not based on a ranking of research designs. They write, among other things, that their systematic reviews are not only about the effects of interventions, but also about implementation, user perception, etc., for which designs other than RCT are ideally suited. They apply a broader knowledge base than research, taking into account knowledge acquired by professionals in practice as well as knowledge acquired by users. Furthermore, they integrate research based on different designs into one systematic review. However, a quality assessment of primary studies is also required after their design-based selection. After all, like other designs, an RCT may have been carried out more or less successfully. The guidelines indicate that the organisations that attach importance to the evidence hierarchy apply assessment criteria based on what might be termed a neo-positivistic paradigm with internal validity at its core. The organisations with a point of departure other than the evidence hierarchy, on the other hand, emphasise the relevance of primary studies to a given topic and apply a design-based assessment method.
As regards synthesisation of the findings, organisations prioritising RCT (which is at the top of the evidence hierarchy) recommend meta-analysis as the ideal method. The other organisations adopt a pluralistic approach to synthesisation, also recommending narrative and conceptual synthesisation depending on the issue and the design of the primary studies. Some systematic reviews combine several synthesisation methods.
Thirdly, on the basis of examples of systematic reviews and other sources, we present an analysis of evidence-producing organisations' actual compliance with their own methodology guidelines and recommendations. It turns out that the organisations that attach importance to the evidence hierarchy have a greater share of systematic reviews comprising solely RCT-based primary studies. However, even the most ardent advocates of RCT also produce systematic reviews that include quasi-experimental and other designs. One explanation for this is that there are simply not enough high-quality RCTs available in the relevant area. This applies especially to Europe, where RCT-based research in the social and educational sectors is less common than in the USA.
Fourthly, we outline arguments for and against RCT as a research design, since RCT, at the top of the evidence hierarchy, is at the core of the methodology debate in and on the evidence movement. The RCT design is suitable for analysing the effects of limited and specific interventions, e.g. in clinical trials. Randomisation of the intervention and control groups ensures that neither the subjects of the intervention nor the people in charge of the study know who is included in the intervention and control groups, respectively. Furthermore, it is possible to keep all factors constant by establishing baselines before the intervention commences and by measuring the effects of the intervention. The application of RCT also gives rise to challenges and limitations, however. Firstly, RCT designs produce narrow evidence in the sense that they speak only to effects, i.e. to which interventions are effective and which are not. They say nothing about why something works or does not work, or about how the subjects perceive the intervention. Secondly, there are a number of technical problems in some contexts. For example, when applying RCT in the welfare and educational areas, it is often difficult to ensure blinding, i.e. that the participants in the trial do not know whether they are in the control group or the trial group. In addition, critics have formulated a number of arguments against applying RCT in areas with complex and dynamic interventions where the context influences whether and how the interventions work. The discussions about the strengths and weaknesses of applying RCT reveal differences in the understanding of causality and in science-theoretical paradigms.
Finally, we briefly introduce the concept of »evidence typology« as an alternative or supplement to the evidence hierarchy. The thinking behind evidence typologies is that different study designs can potentially answer different study questions. The point of departure is not the notion that some study designs are stronger than others; rather, the challenge lies in adapting the study design to the questions addressed by the study. The typology approach can inspire the development of more holistic study designs and systematic reviews, in which knowledge about various aspects of a given intervention is assessed on the basis of a whole array of study designs.



Danish Institute of Governmental Research | Købmagergade 22 | 1150 København K | E-mail: