 Capacity building resources

Experimental Designs

Stephen Gorard


Stephen is ...


1. Introduction
            1.1 We all need trials
            1.2 The basic idea
2. The value of trials
            2.1 Comparators
            2.2 Matching groups
            2.3 Causal models
3. Simple trial designs
            3.1 Sampling
            3.2 Design
4. Related issues
            4.1 Ethical considerations
            4.2 Warranting conclusions
            4.3 The full cycle of research
5. Resources for those conducting trials
            5.1 Reporting trials
            5.2 Tips for those in the field
6. Alternatives to trials
            6.1 Regression discontinuity
            6.2 Design experiments
7. Some examples of trials
            7.1 STAR
            7.2 High/Scope
            7.3 ICT
8. References


6. Alternatives to randomised controlled trials

6.1 Regression discontinuity

By Thomas Cook, Northwestern University

The RCT is the most efficient method of establishing a causal relationship between an intervention and an outcome; however, it is not the only approach. Theoretically a regression discontinuity (RD) design can produce as unbiased an estimate of effect as an RCT (see Luyten 2006). There are many instances where it is not possible to undertake an RCT, often for political or practical rather than ethical reasons, and other designs to establish a causal relationship have to be sought. The RCT and RD have a common feature: participants are selected for an intervention using a known and measurable variable. In an RCT this is achieved by random allocation. After the random allocation we know the allocation status of each participant, whereas with RD we achieve this by measuring some characteristic of each participant and determining group allocation from this measurement. As an example, Cook and Wong discuss an intervention aimed at children who are gifted and talented. We might measure a child’s IQ and offer an intervention to those who score 140 or higher and not offer it to those who score lower. Or we might offer a school or university scholarship to those who are from families with incomes below a certain threshold and not offer the scholarship to those above the threshold. University students might be offered increased support if they have an A level point score below a threshold and not if they have one above.

To assess the effectiveness of an intervention we would then follow up the whole cohort of participants, those above and those below the threshold, and essentially plot their outcomes against the baseline score which determined their group allocation. If the intervention has an effect on outcomes we would expect to see a break or a ‘discontinuity’ at the point where the assignment on the basis of the allocation score was made. If there was no effect there would be no break in the regression line. Mathematically this approach has been proven to be as unbiased, if implemented properly, as a properly implemented RCT. Nevertheless in the ‘real’ world few RCTs or RDs can be implemented perfectly. Consequently it is important to test empirically whether or not the RD design as implemented produces similar results to a real world RCT. After introducing the RD design and summarising the history of its development, Cook and Wong go on to give an overview of three methodological studies where the outcomes of RD studies were compared with similar RCTs. All of these methodological experiments had some limitations; nevertheless, there was broad concordance between the RD and the RCT, thus giving support to the notion that where an RCT is either not feasible or not practical an RD should be the quasi-experimental design of choice.
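The idea of looking for a break in the regression line at the threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used by Cook and Wong: it assumes a sharp cut-off (everyone at or above the threshold is treated), straight-line relationships on both sides of the threshold, and simulated data. The function names (`fit_line`, `rd_estimate`, `simulate`), the cut-off of 110 and the effect of 5 points are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff):
    """Estimated jump in the outcome at the cut-off (treated minus untreated).

    Assumption: a sharp design in which everyone scoring at or above
    the cut-off receives the intervention and nobody below it does.
    """
    below = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
    above = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]
    b_int, b_slope = fit_line(*zip(*below))
    a_int, a_slope = fit_line(*zip(*above))
    # Compare the two fitted lines at the point of assignment.
    return (a_int + a_slope * cutoff) - (b_int + b_slope * cutoff)


def simulate(n, cutoff, effect, seed):
    """Hypothetical cohort: outcome rises with baseline score, plus a
    constant treatment effect for those at or above the cut-off."""
    rng = random.Random(seed)
    scores = [rng.gauss(100, 15) for _ in range(n)]
    outcomes = [
        0.5 * s + (effect if s >= cutoff else 0.0) + rng.gauss(0, 5)
        for s in scores
    ]
    return scores, outcomes


if __name__ == "__main__":
    scores, outcomes = simulate(n=5000, cutoff=110, effect=5.0, seed=1)
    print("Estimated discontinuity:", round(rd_estimate(scores, outcomes, 110), 2))
```

With an effect built into the simulated data the estimate should be close to that effect, and with no effect it should be close to zero. A real analysis would also need to check that the straight-line assumption holds near the threshold.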

The RD design addresses some important limitations of the RCT. Where participants are at the extreme of a distribution and it is thought to be unethical or infeasible to withhold an intervention from a random proportion of them, the RD approach can allow us to evaluate the intervention in a robust manner. The drawbacks of the RD design are mainly twofold: first, the cut point determining the allocation needs to be respected, otherwise the design becomes ‘fuzzy’ and loses power; second, even when there is a sharp break the design is at least 2.75 times less powerful than the RCT. On the other hand power might be increased by the ability of the RD method to include a larger cohort in the study than an RCT. The RD design is an elegant solution to the problem of not being able to undertake an RCT; however, it is not used widely, partly due to lack of dissemination of, and research support for, this method. When an RCT is not possible researchers should automatically consider using a regression discontinuity design and only choose other quasi-experimental methods if this is not possible.
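One way to read the power penalty is as a sample-size multiplier: if the RD design is 2.75 times less efficient, it would need roughly 2.75 times as many participants as an RCT to estimate the same effect with the same precision. That reading is an assumption made for this sketch, and the function name `rd_sample_size` is hypothetical.

```python
from math import ceil

# Figure quoted in the text; treated here as a sample-size multiplier (assumption).
RD_INFLATION = 2.75


def rd_sample_size(rct_total_n, inflation=RD_INFLATION):
    """Approximate RD cohort needed to match an RCT of rct_total_n participants."""
    return ceil(rct_total_n * inflation)


if __name__ == "__main__":
    # An RCT planned for 200 participants would correspond to an RD cohort of 550.
    print(rd_sample_size(200))
```

Because an RD study can often follow a whole cohort rather than a consenting sample, reaching the larger number may be less of an obstacle than it first appears.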

6.2 Design experiments

Historically, design experiments have been the province of the artificial and design sciences including such disciplines as aeronautics, artificial intelligence, architecture, engineering and medicine. Whereas the natural sciences have been concerned with how things work and how they may be explained, design sciences are more concerned with producing and improving artefacts or design interventions, and establishing how they behave under different conditions. The process of designing an artefact or intervention that can be utilised independently, across a variety of settings, may begin with established theory and may provide insights that reflect on, shape, or create established theory. However, this is not the main aim of the design science approach as it often is for the natural science approach (Kelly 2003).

In 1992, Ann Brown, a psychologist, looked to the field of engineering for inspiration on how to conduct experimental research in classrooms. She called her formulation - a hybrid cycle of prototyping, classroom field-testing, and laboratory study - ‘design experimentation’. This concept has been transmuted into a diverse set of teaching interventions, educational software design projects, and learning environment manipulations loosely termed ‘design experiments’, ‘design studies’, or ‘teaching experiments’. Design studies tend not to be very prescriptive, not having a given set of rules which the researcher should follow. In this sense they represent an approach to doing research. In education, this kind of research has been typically associated with the development of curricular products, teaching and learning methods, or software tools (Collins 1992). At the core of such research is the development of an artefact for the purposes of improving teaching and learning, and recent years have seen a marked growth in the number of investigators associating their work with this genre of research (Cobb et al. 2003).

The main objective of a design experiment is to produce an artefact, intervention or initiative in the form of a workable design. The emphasis, therefore, is on a general solution that can be ‘transported’ to any working environment where others might determine the final product within their particular context. This emphasis also encourages theory-building and model-building as important outcomes from the research process. The strength of this is that the ‘design’ for an intervention, artefact or initiative can then be readily modified in different settings. Or it can be further developed by other practitioners, policy-makers, researchers or designers without having to make a ‘leap of faith’ in its interpretation, generalisation or transformation.

Traditionally, in order to determine what strategies are effective in education, educational processes have been subjected to experiments based on made-up situations in laboratory conditions, which isolate the topic from its context and which rest on the assumption that there is a clear theoretical basis for addressing questions related to the processes being tested. Within a design science approach on the other hand, currently accepted theory is used to develop an educational artefact or intervention that is tested, modified, re-tested and re-designed in both the laboratory and the classroom, until a version is developed that both achieves the educational aims required for the classroom context, and allows reflection on the educational processes involved in attaining those aims. In other words, a design science approach allows the education researcher to study learning in context, while systematically designing and producing usable and effective classroom artefacts and interventions. In doing so, it seeks to learn from sister fields such as engineering product design, the diffusion of innovations, and analysis of institutional change (Zaritsky et al. 2003).

The potential of this shift from traditional to design-based experimental approaches to educational research is illustrated in Brown’s (1992) seminal piece on design experimentation. Brown’s work began by addressing a theoretical question concerning the relative contributions of capacity and strategic activity in children’s learning and instruction. Laboratory experimentation was used to address the question of why children fail to use appropriate strategies in their approach to learning tasks. This experimentation involved teaching children strategies for learning, and then asking them to use the strategies to memorise lists of words. Results suggested that even the most meagre form of strategic activity would increase the children’s memory, but that this improvement was not maintained outside of the laboratory. The strategies were not transferred from the work with word lists as used in laboratory conditions to the coherent content children are expected to learn in complex, realistic settings.

A design experiment approach was used, therefore, to address the question of what the absolutely essential features are that must be in place to cause change in children’s capacities for learning under the conditions that one can reasonably hope to exist in normal school settings. This question was addressed by designing an intervention, known as reciprocal teaching, on the basis of (impoverished) theoretical understanding. The intervention was implemented in the classroom, evaluated, allowed to modify current theoretical understanding, revised, re-evaluated and re-applied in an iterative fashion. It demonstrated feedback-coupling from each stage, within an overall iterative process. Each modification to the design of the reciprocal teaching intervention was monitored and recorded, and represented a new phase in the design experiment. Testing of the design iterated between the laboratory and the classroom as an attempt was made to arrive at an optimal design for the classroom setting, while also building theoretical understanding of the mechanisms involved in learning, and generating questions for further research. Testing relied on the evaluation of each modification to the design on the basis of observational data, measurement data and current theory (just like complex interventions). The researcher and the teachers were able to make in situ changes to the intervention, making it possible to establish via observation which were the critical and non-critical elements of the reciprocal teaching strategy, as well as establishing how the strategy worked. Thus, the design experiment generated an effective classroom intervention that could be used independently by teachers in their own classrooms.

Design activity is a creative process incorporating continuous modification and testing. It begins with proposing the ‘form’ of an artefact, intervention or initiative, which in turn drives its ‘behaviour’. The behaviour constitutes observable and empirical data. An iterative design-and-test cycle is critical for the transformation of both the ‘form’ of the artefact, intervention or initiative and of its intended function to conform to the pragmatic demands of utility and market. The resulting design is not necessarily an actual product. Rather, designs can be thought of as the theory that specifies the parameters of an ostensible product, and the conditions under which a product (if it embodies the design) can be successfully implemented. For example, the concept of a ‘bridge’ and more specifically the design of a bridge are plausible solutions to the problem of getting from point A to point B across a river. The power of a design as a theory rests in the fact that a common design can be enacted across different situations. In other words a common blueprint can guide the construction of two different bridges across two different chasms, each with different geology, elevation and other minor differences, and still be recognised as the same design.

An example from education is a common curriculum. Situational demands (e.g. national and local standards, school calendar, language constraints) dictate that teachers alter the available tasks and materials to meet the needs of their students. Even so, the resulting classroom activity, obstacles, useful tools, models for teaching and subsequent achievement based on a curriculum are likely to be similar from class to class and school to school across the intended age range or Key Stage. In other words, though variation will exist in the outcomes of a design, the range of possible outcomes is finite, and the qualities of the outcomes will be constrained by the behaviour of the design. Whether or not that behaviour facilitates or inhibits learning and instruction is an empirical question of the degree to which the design process accounts for robustness to situational perturbations. This can be assessed in a trial or testing phase.

Traditional approaches to establishing the effectiveness of educational artefacts and interventions, such as laboratory-based experiments, can be unrealistic and they tend to rely on the assumption that the educational processes under study are already well theorised. Because educational processes operate in complex social situations, and are often poorly understood theoretically, design experiments may offer a more appropriate approach to many areas of educational research (National Research Council 2002). Traditional approaches, such as laboratory experiments, end when an artefact or intervention is found to be ineffective and is therefore discarded. Design experiments carry the additional benefit of using, rather than discarding, an ineffective design as the starting point for the next phase of the design process. Whereas laboratory experiments may never indicate why a particular artefact or intervention is ineffective (only that it is ineffective), the changes that are necessary to move from an ineffective to an effective design in a design experiment may well illuminate the sources of the original design’s failure. Additionally, design experiments provide a formal, clear and structured place for the expertise of practitioners or policy-makers to be incorporated within the production of artefacts and interventions designed for use in the classroom. Where other approaches, such as action research, also provide such a role for implementers, the design experiment does so while retaining the benefits and minimising the drawbacks of an experimental approach to education research.

A particular strength of design experiments is an idea that unifies all of engineering – the concept of ‘failure’. From the simplest paper clip to the Space Shuttle, inventions are successful only to the extent that their developers properly anticipate how a device can fail to perform as intended. The good scientist (or engineer) recognises sources of error: errors of model formulation, of model specification, and of model validation. Design experiments must also recognise error but, more importantly, they must understand error or failure, build better practical theory (“humble theory” according to Cobb et al. 2003) and design things that work (whether these are processes or products). Unfortunately, many design experiments have been conducted by advocates of particular approaches or artefacts, and sometimes by those with a commercial interest in their success. This makes it an especial concern that so few of their reports make any mention of comparison or control groups (see ‘Educational Researcher’ 2003, 32, 1). Without these, and without working towards a definitive test, how can they persuade others that they have successfully eliminated plausible rival explanations? Is retrospective narrative analysis enough? We do not believe so.

Most engineers develop failure criteria, which they make explicit from the outset. These criteria provide limits that cannot be exceeded as the design develops. However, failure manifests itself differently in different branches of engineering. Some problems of engineering design do not lend themselves to analytic failure criteria, but rather to models of trial and error, or to build-and-measure techniques. In the design of computer programs, for example, the software is first alpha-tested by its designers and then beta-tested by real users in real settings. These users often uncover bugs that were generated in the design or its modification. Furthermore, these users also serve to show how the program might fail to perform as intended. No matter what method is used to test a design, the central underlying principle of this work is to obviate failure. This is a very different model from the one currently practised by many social scientists.

A design experiment can be thought of as having three phases: a feasibility study, prototyping and trialling, and a field study. The first phase, a feasibility study, would start with an initial design of the intervention, ensuring that the intervention was grounded in whatever theory was available, and an explicit interpretation of the proposed causal mechanism. In Brown’s example above the reciprocal teaching intervention was designed on the basis of a priori theoretical understanding. Without this the intervention might have been entirely inappropriate to the problem being addressed, and without some understanding of the theoretical underpinnings of the intervention there may be great difficulty in understanding how the intervention works, and hence how to modify or evaluate it. The early stages of the feasibility study would involve primarily qualitative methods in the formative evaluation of the intervention, using interviews, focus groups, observation and case studies to identify how the trial intervention is working, to identify barriers and facilitators to its implementation, and to provide early indications as to how it may be improved. These more ‘explanatory’ routes of enquiry powerfully complement any earlier use of secondary data analysis in identifying the initial research problem.

The first phase of the design experiment should, therefore, start with an intervention that has been sufficiently well developed to be tested in a feasibility study, where it can be implemented in full, and tested for acceptability to both providers (policy-makers and teachers) and the target audience (students and learners). The feasibility study is also an opportunity to test the trial procedures, such as the definition of the alternative treatment, which may be the usual care, control, or some alternative intervention, and to pilot and test the eventual outcome measures. It may be used to provide a tentative estimate of the intervention impact, which can then be used to plan the size of the later trials. The results of the feasibility study will help to decide whether the intervention should proceed to the next phase, or whether it is necessary to return to the process of identifying the research problem and developing the theoretical framework on which the intervention was originally based. Given the pragmatic and fiscal constraints of all scientific research the feasibility study may suggest that the entire research process should end at this first phase, although real-life funding structures suggest this is unlikely to happen in practice.
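As an illustration of how a tentative impact estimate can feed into planning the size of a later trial, the sketch below uses the standard normal-approximation formula for comparing two equal-sized groups on a continuous outcome. It is a minimal sketch under stated assumptions rather than a prescribed method: the effect size is expressed in standard deviation units, the default significance level (0.05, two-sided) and power (0.8) are conventional choices not taken from the text, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed in each of two equal groups.

    effect_size: expected difference between group means, in standard
    deviation units (for example, a tentative estimate from a feasibility study).
    Uses the normal approximation n = 2 * (z_alpha + z_beta)^2 / effect_size^2.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Smaller expected effects require much larger trials.
    for d in (0.5, 0.2):
        print(d, n_per_group(d))
```

Since a feasibility study is usually small, its estimate of impact is imprecise, so it is prudent to try a range of plausible effect sizes rather than rely on a single figure.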

The second phase (prototyping and trialling) begins a process of iteration between the testing and further modification of the intervention. Parallel to this is the potential to iterate the process between the laboratory (or other controlled environments) and the classroom (or real-life environments). These iterative processes continue into the third phase (field study). Phase 2 is characterised by piloting small-scale multiple prototypes of the intervention (in the same way that a wind tunnel can be used to test many variations of an aircraft wing at the same time). As the iterations between testing and further design become more sophisticated, and the iterations between laboratory and classroom settings become more robust, advances are made in the intervention’s propositional framework and in outlining its plausible causal models.

It is at this point that the research sequence enters the third phase (the field study), where the intervention is implemented in full and tested for acceptability to both providers and the target audience. The iterative process may continue but the design of instructional sequences becomes stronger and stronger, leading eventually to a robust model that aids the implementation of the intervention in many contexts. At this point the documentation and recording of the process for implementing the intervention should be systematic, as this develops the parameters for future ‘transportation’ of the design. This field study should involve a definitive test. In the design experiment, this definitive trial could take the form of a randomised controlled trial, an interrupted time series analysis or a concurrent quasi-experiment. Borrowing this procedure from the complex intervention model suggests that the outcome(s) of interest for the design experiment must be fixed first. Otherwise, if the outcome is modified as well as the intervention, there will be no fixed point to the research, and the approach simply becomes a ‘trawl’ that will eventually find something.

By allowing theory to be built alongside the design and testing of an intervention of some sort, design experiments negate the need for well-established theory stipulated by the complex intervention (Kelly and Lesh 2002). As a consequence, design experiments can be used more widely. They also allow evaluation, modification, and re-evaluation of a design that is not accommodated by the approach to complex interventions. If a complex intervention fails, a researcher must go back to the drawing board, rather than being able to arrive at an optimal new design in a rigorous way. Put another way, design experiments give us a potentially superior approach to the model- and hypothesis-generating stages of research – of working towards a trial.


de Corte, E., Verschaffel, L. and van De Ven, A. (2001) Improving text comprehension strategies in upper primary school children: a design experiment, British Journal of Educational Psychology, 71, 531-559

Gorard, S., Roberts, K. and Taylor, C. (2004) What kind of creature is a design experiment?, British Educational Research Journal, 30, 4, 577-590

Sloane, F. and Gorard, S. (2003) Exploring modeling aspects of design experiments, Educational Researcher, 32, 1, 29-31


How to reference this page: Gorard, S. (2007) Experimental Designs. London: TLRP. Online at (accessed )

Creative Commons License TLRP Resources for Research in Education by Teaching and Learning Research Programme is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License



