5. Resources for those conducting trials
5.1 Reporting trials
Part of the concern about the quality of UK education research in the late 1990s related to the quality of reporting. Many researchers appear to want to keep the details of their publicly funded research private. In health, deficiencies in the reporting of trials, at least, have been tackled through a widely agreed set of reporting guidelines – the CONSORT statement, http://www.consort-statement.org/. These guidelines require authors to report the key generic characteristics of RCTs. The situation is much weaker in education (Torgerson et al. 2005). While awaiting an equivalent of the CONSORT statement for education, trial authors might want to consider the following issues, among others:
- Are the findings discussed consistent with the data reported?
- Were the data reported, or made available on a website, in sufficient detail to allow a reader to judge their quality and relevance?
- Are the conclusions drawn warranted by the evidence and is the warrant made explicit?
- Is the sample described fully, including why it is of a specific size, the method of allocation to groups, and the number dropping out?
- Was randomisation undertaken by a third party?
- How were drop-out cases dealt with – for example, via intention to treat (or teach)?
- Was the kind of analysis specified before data collection and analysis, and have multiple tests been run?
- What are the threats to validity in this study?
- How much of the design was blind to the researchers?
- Do the researchers have any vested interest in the results?
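Several items on this checklist – third-party randomisation and allocation concealment in particular – can be illustrated with a short sketch. The code below is a minimal, hypothetical example (the function name and block size are my own choices, not from the source): an allocation sequence is generated in advance with permuted blocks, mimicking a list held by a third party rather than assignments made ad hoc by recruiters.

```python
import random

def blocked_allocation(n_participants, block_size=4, seed=2024):
    """Generate a concealed allocation sequence in advance.

    Permuted blocks keep group sizes balanced throughout recruitment;
    the fixed seed stands in for a third party who holds the list and
    reveals each assignment only after a participant is enrolled.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        # each block contains equal numbers of each arm, shuffled internally
        block = (["intervention"] * (block_size // 2)
                 + ["control"] * (block_size // 2))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]

allocation = blocked_allocation(20)
# with 20 participants and blocks of 4, the arms are exactly balanced
print(allocation.count("intervention"), allocation.count("control"))
```

In a real trial the sequence would be generated and held by someone independent of recruitment, so that whoever enrols participants cannot predict the next assignment.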
5.2 Tips for those in the field
Increasing power at no cost
By David Torgerson, University of York
It is almost automatic when randomising participants or clusters to different treatments within a trial to try to have the same number of cases in each treatment group – a 1:1 allocation ratio of intervention to control group. This tradition has grown up because a 1:1 ratio, for any given sample size, usually ensures maximum statistical power – where ‘power’ is the likelihood of correctly finding a difference between the groups for a given effect size. However, where resource shortages limit the number of cases that can be offered the intervention treatment, power can be increased by randomly allocating more participants to the control group.
For example, we might only have sufficient resources to offer an intervention to 50 participants. If we used equal allocation then our sample size is constrained to 100 participants. However, if we set the allocation ratio to 2:1 then we can randomise 150 participants, with 100 in the control group and 50 in the intervention group. By doing this we will get more statistical power than if we had simply used equal allocation and had a total sample size of only 100, but at little or no cost when the control group simply receive ‘normal’ treatment anyway. We can set the allocation ratio as high as we wish, although once it exceeds 3:1 then the extra increase in power tends to be slight and it may not be worth the effort of following up a much larger sample size. In summary, increasing the size of the control group in this way can give increased power for very little cost.
Of course, if the total sample size is constrained then using unequal allocation will reduce power - although not by much unless the ratio exceeds 2:1. For example, in a trial of 100 participants, if we allocated 32 to one group and 68 to the other, the decline in power is only 5% compared with the situation where we put 50 in each group. This loss of power might be worthwhile if we can make considerable resource savings.
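The power figures behind these two examples can be checked with a standard normal approximation for a two-sided, two-sample comparison. The sketch below uses only the Python standard library; the assumed standardised effect size of 0.5 is my own illustrative choice, not a figure from the text.

```python
from statistics import NormalDist

def power_two_groups(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison.

    Normal approximation: power = Phi(d / sqrt(1/n1 + 1/n2) - z_{alpha/2}),
    where d is the standardised effect size.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = effect_size / (1 / n1 + 1 / n2) ** 0.5
    return NormalDist().cdf(noncentrality - z_crit)

d = 0.5  # assumed standardised effect size for illustration

# 1:1 allocation with 100 participants in total
equal = power_two_groups(d, 50, 50)
# 2:1 control:intervention with 150 participants (same 50 intervention cases)
two_to_one = power_two_groups(d, 50, 100)
# unequal 32/68 split of the same 100 participants
unequal = power_two_groups(d, 32, 68)

print(f"50/50:  {equal:.2f}")    # enlarging the control group raises power
print(f"50/100: {two_to_one:.2f}")
print(f"32/68:  {unequal:.2f}")  # a few points below the 50/50 design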
The ‘waiting list’ and the ‘stepped wedge’ designs
By Carole Torgerson, University of York
There generally has to be a prima facie case for trialling some new intervention in public policy, perhaps based on prior but less rigorous evidence of its effectiveness. One potential problem then faced by a randomised controlled trial is that the control group is unhappy with being denied this promising intervention. When we are evaluating an intervention where the evidence is uncertain there is no ethical problem: indeed, it is ethically correct to offer the participants a chance (random possibility) of receiving the most effective intervention, which may very well be the control condition. On the other hand, potential participants may not be convinced that the control intervention is as likely to be beneficial as the novel condition. Because of this anticipated benefit, those allocated to the control may suffer resentful demoralisation, and either refuse to continue with the experiment or, deliberately or subconsciously, do worse merely because they have been refused the novel intervention. We may also wish to evaluate the implementation of an intervention that has been shown to be effective in a laboratory-type RCT (an explanatory or efficacy trial), and to evaluate its effectiveness in the ‘real’ world. Finally, a national policy may be implemented when there is dubious or no real evidence of effectiveness, but the political imperative is to be seen to do ‘something’.
One way of addressing these problems is to use either a ‘waiting list’ or a ‘stepped wedge’ design. In a waiting list study participants are told explicitly that they will receive the intervention; however, some will receive it straight away, whilst others will receive it later. We can then evaluate the effectiveness of the intervention by measuring both groups at pre-test, implementing the intervention in one group, and giving a post-test measurement (after which the intervention is then given to the controls). As an example, consider the RCT by Brooks et al. (2006). This evaluated the use of a computer software package in a secondary school. The package was usually implemented arbitrarily, as there were insufficient laptop computers for all pupils to receive the intervention at the same time. For the evaluation the researchers changed the arbitrary assignment to random allocation and adopted a waiting list design, which permitted a rigorous evaluation of the software package. The use of the waiting list in this instance allowed all the children to receive the package and may have reduced any demoralisation on the part of the children or their teachers.
The stepped wedge design differs from the waiting list design in that it operates as a series of waiting lists. For example, the Sure Start evaluations (which did not use a randomised controlled trial design) could have used the stepped wedge approach. In this case there was huge political pressure to implement ‘something’. An RCT design could have been incorporated into the evaluations by randomising geographical areas into several implementation phases. Baseline pre-tests would have occurred at the beginning of the study, and post-tests would have been included each time a new geographical area implemented the intervention. In this manner it would have been possible to control for potential confounding factors that may have undermined the non-randomised evaluation method that was used for Sure Start.
More information on the stepped wedge design can be found in a recent systematic review of the method (Brown and Lilford, 2006).
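The logic of the stepped wedge – clusters randomised to successive roll-out phases, with everyone eventually receiving the intervention – can be sketched as a small scheduling routine. This is an illustrative toy, not a description of any actual Sure Start design; the numbers of clusters and steps are arbitrary assumptions.

```python
import random

def stepped_wedge_schedule(n_clusters, n_steps, seed=7):
    """Randomise clusters to roll-out steps and build the exposure grid.

    Returns one row per measurement period (a baseline period plus one
    per step); entry 1 means that cluster has crossed over to the
    intervention by that period, 0 means it is still in the control state.
    """
    rng = random.Random(seed)
    clusters = list(range(n_clusters))
    rng.shuffle(clusters)
    # split the shuffled clusters as evenly as possible across the steps
    per_step = [clusters[i::n_steps] for i in range(n_steps)]
    start_period = {c: step + 1
                    for step, group in enumerate(per_step) for c in group}
    schedule = [[1 if period >= start_period[c] else 0
                 for c in range(n_clusters)]
                for period in range(n_steps + 1)]
    return schedule

# e.g. six geographical areas rolled out over three phases
for row in stepped_wedge_schedule(6, 3):
    print(row)
```

At baseline every cluster is in the control state, and by the final period every cluster has received the intervention; because the crossover order is randomised, between-period comparisons are protected against the confounding that a purely administrative roll-out would invite.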
How to reference this page:
Gorard, S. (2007) Experimental Designs. London: TLRP. Online at http://www.tlrp.org/capacity/rm/wt/gorard (accessed