
Experimental Designs

Stephen Gorard

Stephen is Chair in Education Research at the University of Birmingham, UK


1. Introduction
            1.1 We all need trials
            1.2 The basic idea
2. The value of trials
            2.1 Comparators
            2.2 Matching groups
            2.3 Causal models
3. Simple trial designs
            3.1 Sampling
            3.2 Design
4. Related issues
            4.1 Ethical considerations
            4.2 Warranting conclusions
            4.3 The full cycle of research
5. Resources for those conducting trials
            5.1 Reporting trials
            5.2 Tips for those in the field
6. Alternatives to trials
            6.1 Regression discontinuity
            6.2 Design experiments
7. Some examples of trials
            7.1 STAR
            7.2 High/Scope
            7.3 ICT
8. References



This resource is intended for those wishing to increase their understanding and appreciation of experimental designs for the social sciences, especially education. The focus here is on randomised controlled field trials, but there is also discussion of laboratory, quasi-, thought, and design experiments. The early sections explain the fundamental ideas and concerns of these designs, while the later sections provide resources for those conducting (or intending to conduct) a trial. Each section ends with a number of selected articles or web-based resources allowing the reader to pursue these ideas further. These resources have been collated or created by me and colleagues working on two of my ESRC-funded projects: the TLRP Research Capacity-building Network and the RDI project Training in Pragmatic Social Interventions. I have reproduced and attributed here some resources from these sites written by Thomas Cook, Carole Torgerson, and David Torgerson. Both of the websites have further resources and ideas about trials, and associated concepts such as sampling, along with papers describing trials and their results from the annual conference on RCTs in the Social Sciences started by us at the University of York. Other useful references include Chapter 8 in Gorard (2003) for ‘beginners’, Chapters 6 and 7 in Gorard with Taylor (2004) for ‘intermediates’, and Torgerson and Torgerson (2008) for more ‘advanced’ readers and those intending to conduct a randomised controlled trial in the near future. For an excellent introduction to issues of research design in general, see the first few chapters in de Vaus (2001).

1.1 We all need trials

There are many reasons why all researchers in education should know a considerable amount about the conduct and interpretation of work involving experimental designs, and these are considered further in section 3. Perhaps most obviously, the ESRC has required that all research students in the social sciences must have a basic understanding of ‘various forms of recording data from experimental and quasi-experimental research’ and proficiency in analysing data from the same. For education students the requirement is even more explicit - all students will need to understand experimental methods. Therefore, their tutors and supervisors must have the same or higher levels of knowledge. Designs like class-level randomised controlled trials in schools are perfectly achievable for practice-based masters students, and of course for PhDs. Every researcher is, in any case, required to read the research of others and to make appropriate critical judgements of its worth. They do this in their literature reviews and evidence syntheses, and so must be able to understand the unique advantages of trial designs, and the well-theorised threats to internal and external validity (see section 3.2). Most researchers are also asked to peer-review the work of others for publication in books and papers, or in applications for grant funding. Again, it is essential that all researchers understand the advantages and limitations of trials so that they can judge this work effectively. The dangerous alternative in each case would be to leave judging the value of trials to a small clique of researchers already familiar with the concept and probably with each other. Perhaps most importantly of all, understanding the idea of an experimental design allows all researchers to imagine their own research as thought experiments, encouraging scepticism and humility about their own studies (section 4.2), and promoting the use of appropriate comparators (sections 2.1, 2.2).

Research design is an important but neglected part of research methods development in the UK. It is often misinterpreted as referring to the design of instruments and data collection or analysis routines. Such practical issues are related loosely to research design, but only loosely. A design like an experiment or randomised controlled trial does not entail any specific form of data collection or analysis. The outcomes of a trial could be the performance measures of participants, their perceptions or attitudes, or their first hand accounts of experiencing the intervention, for example. Trials do not fit into purported paradigms (see section 4.3), and do not entail any ‘isms’. Anyone who is genuinely curious about how to improve education can gain from understanding about trials. But, to repeat, research design especially concerning trials tends to be neglected in current methods resources and training.

Those popular research methods texts that do cover design (in its genuine sense) often do so after making the reader deal with issues of identity and epistemology, and the dubious qualitative and quantitative paradigms (such as Bryman 2001). In Cohen et al. (2007) experiments are covered as a ‘style’ of research, in the 13th chapter, long after paradigms, constructivism, ethics and so on. Design should come before all of these. In Somekh and Lewin (2005) the idea of experiments appears to have been confused with empirical research itself. This is not surprising when one learns that the chapter on design was written by a post-modernist and appears as part of a section entitled ‘sampling’. In a book of hundreds of pages, trials get one paragraph written by someone who does not do trials and clearly disapproves of them. Probably the biggest gap in such resources is consideration of the issue of causation.

Other than in purely descriptive work (e.g. '17% more of this type of crime was committed by men than women in 1999'), a research report that did not at least imply a causal model might look rather odd. Causes are central to our notion of understanding why things work as they do. Yet despite this prevalence, social science research methods courses and textbooks tend to overlook the discussion of causal models completely, or else prepare the novice researcher simply with the negative advice that a correlation is not the same as causation. In these standard books, everyone is reminded therefore what is not a cause, and what a cause is not. In some methods books there is a section on the potential and limitations of experiments which points to their unique selling point - the claim to be a direct test of cause and effect (Fisher 1935). But this is a scarce and recently revived phenomenon in social science outside psychology. In general, the concept of cause and effect remains untaught and undiscussed. Truly, it is a 'skeleton in the cupboard of philosophy' (Russell, in Ayer 1972). And despite the importance of experimental designs, and the claims of many researchers to understand them, analysis of RAE returns for the UK and of journal articles suggests that less than 1% of published work is experimental (Gorard et al. 2004).

Good experimental designs testing quite narrowly defined hypotheses (to minimise confounding variables) have considerable power, especially as part of a larger cumulative programme of research via replication, expansion and verification of the findings (see section 4.3). Above all they can help us overcome the equivalent of the potted plant theory which is distressingly common in much research, policy-making and practice. For example, this theory suggests that if efficient schools have a potted plant in the foyer then putting a potted plant in the foyer of other, less successful, schools will lead to an improvement in their quality. Sounds ludicrous? I bet that much of the research evidence you have read recently is just as ludicrous in nature, once you think about it carefully. Unless we intervene or rigorously monitor the effect of natural interventions we can never be clear whether our observations of patterns and apparent relationships are real or whether they are superstitions similar to the potted plant theory.

In many ways the experiment is seen as the 'flagship' of research designs. The basic advantage of this approach over any other is its more convincing claim to be testing for cause and effect, via the manipulation of otherwise identical groups, rather than simply observing an unspecified relationship between two variables. In addition, some experiments will allow the size of any effect to be measured. It has been argued that only experiments are thus able to produce secure and uncontested knowledge about the truth of propositions. Their design is flexible, allowing for any number of different groups and variables, and the outcome measures taken can be of any kind (including in-depth observations), although they are normally converted to a coded numeric form. The design is actually so powerful that it requires smaller numbers of participants as a minimum than would be normal in other designs. The analysis of the results is also generally easier than when using other designs, because so much is catered for a priori rather than dredged for post hoc.

Social science research has, for too long, relied on fancy statistical manipulation of poor datasets, rather than well designed studies (FitzGibbon 2000, 2001). When subjected to a definitive trial by experiment, many common interventions and treatments actually show no effect, identifying resources wasted on policies and practices. Perhaps that is also partly why there is considerable resistance to the idea of the use of experimental evidence. Social work was one of the areas where natural experiments were pioneered but, when these seldom showed any positive impact from social work policies, social workers rejected the method itself rather than the ineffective practices (Torgerson and Torgerson 2001). Those with vested interests in other current social science beliefs and theories may, similarly, consider they have little to gain from definitive trials (although this is, of course, not a genuine reason for not using them).

Gorard, S., Rushforth, K. and Taylor, C. (2004) Is there a shortage of quantitative work in education research?, Oxford Review of Education, 30, 3, 371-395

1.2 The basic idea

This section outlines the basic experimental design for two groups. In this, the researcher creates two (or more) groups by using different treatments with two samples drawn randomly from a parent population (or more commonly by dividing one sample into two at random). Each sub-sample becomes a treatment group. The treatment is known as the independent variable(s), and the researcher selects a post-treatment outcome or measure known as the dependent variable(s). Usually one group will receive the treatment and be termed the experimental group, and another will not receive the treatment and be termed the control group (see Table 1.1).

Table 1.1 - The simple experimental design

                     Allocation   Pre-test   Intervention   Post-test
Experimental group   random       measure    treatment      measure
Control group        random       measure    -              measure
In a standard design, the researcher then specifies a null hypothesis (that there will be no difference in the dependent variable between the treatment groups), and an experimental hypothesis (the simplest explanation of any observed difference in the dependent variable between groups). The experimental hypothesis can predict the direction of any observed difference between the groups (a one-tailed hypothesis), or not (a two-tailed hypothesis). Only then does the experimenter obtain the scores or observations of the dependent variable and analyse them. If there is a clear difference between the two groups, it can be said to be caused by the treatment. See sections 3.2 and 5 for some extensions and variations to this simple design.

A one-tailed prediction is intrinsically more convincing. There are always apparent patterns in data. The experimental design tries to maximise the probability that any pattern uncovered is significant, substantial, generalisable and/or replicable. Merely rejecting the null hypothesis as too improbable to explain a set of observations does not make a poorly-crafted experimental hypothesis right. There are, in principle, an infinite number of equally logical explanations for any result. The most useful explanation is therefore that which can be most easily tested by further research. It must be the simplest explanation, usually leading to a further testable prediction.

There are in summary six steps in the basic experiment:

•     formulate a hypothesis (which is confirmatory/disconfirmatory rather than exploratory)
•     randomly assign cases to the intervention or control groups (so that any non-experimental differences are due solely to chance)
•     measure the dependent variable (as a pretest, but note that this step is not always used)
•     introduce the treatment or independent variable
•     measure the dependent variable again (as a posttest)
•     calculate the size or significance of the differences between the groups.
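The six steps above can be sketched in code. This is a minimal illustration only, using the Python standard library: the participant numbers, simulated gain scores and group means are invented assumptions, standing in for the real measurements a trial would collect.

```python
import random
import statistics

random.seed(42)

# Step 1: hypothesis - the treatment group will gain more than the
# control group (one-tailed); the null hypothesis is no difference.

# Step 2: randomly assign 40 hypothetical participant IDs to two groups
participants = list(range(40))
random.shuffle(participants)
control_ids, treatment_ids = participants[:20], participants[20:]

# Steps 3-5: pre-test, intervene, post-test. Here we simply simulate
# plausible gain scores (post-test minus pre-test); a real trial
# would measure them.
def simulate_gains(ids, mean_gain):
    return [random.gauss(mean_gain, 5) for _ in ids]

control_gains = simulate_gains(control_ids, mean_gain=2.0)
treatment_gains = simulate_gains(treatment_ids, mean_gain=6.0)

# Step 6: size of the difference between the groups (an 'effect size',
# here the mean difference divided by the overall standard deviation)
diff = statistics.mean(treatment_gains) - statistics.mean(control_gains)
sd = statistics.stdev(control_gains + treatment_gains)
print(f"mean difference: {diff:.2f}, effect size: {diff / sd:.2f}")
```

Note that the random assignment in step 2 is what licenses the causal reading of the result: any pre-existing difference between the groups is due solely to chance.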

A simple example might involve testing the efficacy of a new lecture plan for teaching a particular aspect of mathematics. A large sample is randomly divided into two groups. Both groups of students sit a test of their understanding of the mathematical concept, giving the researcher a pre-test score. One group is given a lecture (or lectures) on the relevant topic in the usual way. This is the control group. Another group is given a lecture using the new lecture plan. This is the experimental treatment group. Both groups then sit a further test of their understanding of the mathematical concept, giving the researcher a post-test score. The difference between the pre- and post-test scores for each student yields a gain score. The null hypothesis will be that both groups will show the same average gain score. The experimental hypothesis could be that the treatment group will show a higher average gain score than the control group. These hypotheses can be tested using a t-test for unrelated samples. If the null hypothesis is rejected, and if the two groups do not otherwise differ in any systematic way, then the researcher can reasonably claim that the new lecture plan caused the improvement in gain scores. The next stage is to assess the size and value of the improvement, at least partly in relation to the relative cost of the treatment.
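The analysis step of the worked example above can be sketched as follows. The gain scores below are invented for illustration (they are not data from any study), and the t statistic is computed directly from its definition (Welch's form, which does not assume equal group variances) using only the Python standard library.

```python
import math
import statistics

def unrelated_t(group_a, group_b):
    """Welch's t statistic for two independent samples of gain scores."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # standard error of the difference between the two means
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se

# Hypothetical gain scores (post-test minus pre-test) for each student
control = [3, 5, 2, 4, 6, 3, 5, 4]
treatment = [7, 9, 6, 8, 10, 7, 8, 9]

t = unrelated_t(treatment, control)
print(f"t = {t:.2f}")  # prints t = 6.11
```

A large positive t (here, well beyond conventional critical values for 14 degrees of freedom) would lead to the rejection of the null hypothesis of equal average gains; the remaining judgement about effect size and cost is substantive, not statistical.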

Cook, T. (2002) Randomized experiments in educational policy research: a critical examination of the reasons the educational evaluation community has offered for not doing them, Educational Evaluation and Policy Analysis, 24, 3, 175-199




How to reference this page: Gorard, S. (2007) Experimental Designs. London: TLRP. Online at (accessed )

Creative Commons License TLRP Resources for Research in Education by Teaching and Learning Research Programme is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License
