 Capacity building resources

British Household Panel Survey (BHPS) data analysis

Paul Lambe

Paul is a Research Fellow at Peninsula Medical School, University of Plymouth.

Aims of this resource

A panel is a group of people who are surveyed periodically over time. Panel data, also sometimes known as longitudinal data or cross-sectional time series data, provide multiple observations on each individual in the panel over time. Two fundamental types of information can be derived from panel data: cross-sectional information, which tells us about the differences between subjects or groups of subjects at a particular moment in time, and time series information, which tells us about changes within subjects or groups of subjects over time. Longitudinal studies enable the study of the dynamics of change across the life course and the effects of earlier characteristics on later outcomes. For these reasons panel data have become an increasingly used resource in applied social research. A huge range of secondary datasets is available, many containing education related information. However, as pointed out by Anna Vignoles (see The Uses of Large Data Sets in Educational Research on this website), some are not widely used by education researchers. Anna Vignoles also points out that education researchers may need better information on which datasets contain education related information, and on where they might get straightforward guidance on access and basic manipulation of such data. This contribution is aimed at encouraging and enabling [early stage] education researchers to exploit the rich resource of panel data available to them, using the appropriate statistical methods and software programmes.
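The distinction between cross-sectional and time series information can be illustrated with a minimal Python sketch. The data, the identifier `pid` and the variable `score` are invented for illustration and are not taken from any real survey.

```python
# Hypothetical long-format panel: one row per person per wave.
# Identifiers, waves and scores are invented for illustration only.
panel = [
    {"pid": 1, "wave": 1, "score": 10},
    {"pid": 1, "wave": 2, "score": 12},
    {"pid": 1, "wave": 3, "score": 15},
    {"pid": 2, "wave": 1, "score": 20},
    {"pid": 2, "wave": 2, "score": 19},
    {"pid": 2, "wave": 3, "score": 21},
]


def cross_section(rows, wave):
    """Return {pid: score} at one wave: differences between people."""
    return {r["pid"]: r["score"] for r in rows if r["wave"] == wave}


def within_person_change(rows):
    """Return {pid: last score minus first score}: change within people."""
    by_person = {}
    for r in sorted(rows, key=lambda r: (r["pid"], r["wave"])):
        by_person.setdefault(r["pid"], []).append(r["score"])
    return {pid: scores[-1] - scores[0] for pid, scores in by_person.items()}


wave1 = cross_section(panel, 1)         # cross-sectional view at wave 1
changes = within_person_change(panel)   # time series view for each person
print(wave1)
print(changes)
```

The first function compares different people at a single point in time; the second follows each person across waves, which is only possible because the same individuals are observed repeatedly.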

Searching for education related panel data for secondary analysis

A first reference point for those new to panel data, and an essential and accessible introduction to the purpose, design, conduct and analysis of panel studies, is David Rose (editor) (2000) Researching Social and Economic Change: The Uses of Household Panel Studies, Routledge (see also Martin et al., 2006, Strategic Review of Panel and Cohort Studies: Report to the Research Resources Board of the Economic and Social Research Council, available at

The following sites provide easy access to panel survey data, enabling researchers to find secondary data relevant to their particular research aims and objectives.

Institute for Social and Economic Research (ISER), Keeping Track: A Guide to Longitudinal Resources. Provides an up-to-date guide to major longitudinal sources of data, enabling users to locate information about studies which may provide data useful to their research interests.

The Question Bank at the Centre for Applied Social Surveys, University of Surrey, has been designed to help users locate examples of specific research questions and see them in the context within which they have been used for data collection. It is particularly useful in the search for relevant datasets for secondary analysis, enabling the researcher to identify datasets in which topics of interest are covered. The site includes detailed instructions on how to locate questions of interest to the researcher and a very useful resource map for survey researchers, with links to sources of data, social research centres, gateways, survey training courses and resources, and government departments' survey sites.

Economic and Social Data Service (ESDS) and United Kingdom Longitudinal Studies Centre (ULSC). Longitudinal datasets available from the UK Data Archive, download service, specialist user support and user support link with the Centre for Longitudinal Studies (CLS).

United Kingdom Longitudinal Studies Centre (ULSC). Links to UK longitudinal surveys and data resources, and to non-UK longitudinal surveys.

Economic and Social Data Service (ESDS) NESSTAR Catalogue. A delivery system for linked data and documentation that provides online access to a selection of key survey datasets, descriptive information and variable description frequencies, and enables the production of tabulations, analyses and graphical charts.

Current Educational Research in the UK (CERUK) database. Covers ongoing research and completed research since 2000 in education related disciplines.

Accessing panel data sets

Once an appropriate dataset that meets the aims and objectives of the proposed research has been found, the data can be accessed and downloaded via the UK Data Archive. Access to the data requires a simple online registration using the Athens authentication system; however, access to study descriptions and online documentation, including questionnaires, variable lists, keywords and related studies, is freely available. Some survey datasets have special conditions of use, most of which can be agreed to online. Data can be downloaded free of charge or ordered on CD, and quantitative data are supplied in SPSS, Stata and tab-delimited formats. User support for secondary analysis of the Archive's datasets is available at

Panel data analysis and the British Household Panel Survey

The British Household Panel Survey (BHPS) has not been widely used in education research. The following therefore focuses on panel data analysis using the BHPS dataset, in order to provide some specific guidance and to enable and encourage its wider use by education researchers.

The BHPS dataset and permission to use it can be acquired from the UK Data Archive. Access for academic use is open and freely available. The data can be downloaded by registered users in SPSS, Stata and tab-delimited formats, and other formats are available by special order. The BHPS access page contains links to further information and resources, including a list of Frequently Asked Questions, and to BHPS datasets, including subsets of BHPS data created within the NESSTAR system. The online BHPS documentation contains, in Volume A, a User's Guide relating to all waves and, in Volume B, a codebook for each wave, with links to the material via Index Term, Record Type or Wave, and a Subject Category Thesaurus. The thesaurus enables users to locate the variables which are most relevant to their research interests and to find other variables with related data throughout the database. An index of publications of analyses using the BHPS datasets, including links to abstracts and some full versions of papers, is available at ( For a user-friendly introduction to the BHPS see Buck, N., Gershuny, J., Rose, D., and Scott, J. (editors) (1994) Changing Households: The BHPS 1990-1992. Colchester: ESRC Research Centre on Micro-Social Change.

The BHPS dataset at present comprises 15 annual waves, each containing 16 record file types of individual level or household level information, and 3 cross-wave record file types for matching between waves. Restructuring the data to suit the analytical needs of a particular research project requires extensive preparatory work. Depending on the unit of analysis, information from different data files at the same wave must be merged and, because we are working with panel data, files from different waves must be joined with one another. This data construction and data management, including the manipulation of variables, the forming of datasets at different levels of aggregation, the merging of data, and the creation of panel datasets, is often a daunting and complex task requiring programming skills.
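To give a flavour of what joining files from different waves involves, the following is a minimal Python sketch rather than a recipe for the real BHPS files. It assumes, for illustration only, that each wave's file marks its variables with a wave letter prefix (`a`, `b`, ...) and carries a person identifier called `pid` that is constant across waves; the variable names `age` and `educ` and all values are hypothetical. The BHPS documentation should be consulted for the actual file and variable conventions.

```python
def strip_wave_prefix(row, letter, id_var="pid"):
    """Remove the wave letter from variable names, leaving the identifier as is."""
    out = {}
    for name, value in row.items():
        if name != id_var and name.startswith(letter):
            out[name[len(letter):]] = value
        else:
            out[name] = value
    return out


def stack_waves(wave_files, id_var="pid"):
    """Stack per-wave files into one long file with a numeric wave variable."""
    long_rows = []
    # Wave letters are assumed to sort in wave order (a = wave 1, b = wave 2, ...).
    for wave_number, letter in enumerate(sorted(wave_files), start=1):
        for row in wave_files[letter]:
            clean = strip_wave_prefix(row, letter, id_var)
            clean["wave"] = wave_number
            long_rows.append(clean)
    long_rows.sort(key=lambda r: (r[id_var], r["wave"]))
    return long_rows


# Hypothetical individual-level files for two waves (invented values).
# Person 102 is absent from wave b, so the resulting panel is unbalanced.
wave_files = {
    "a": [{"pid": 101, "aage": 30, "aeduc": 2},
          {"pid": 102, "aage": 45, "aeduc": 1}],
    "b": [{"pid": 101, "bage": 31, "beduc": 3}],
}

long_panel = stack_waves(wave_files)
for row in long_panel:
    print(row)
```

The result has one row per person per wave, which is the layout most panel data methods expect. In Stata the equivalent restructuring is usually done with its own data management commands rather than by hand.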

However, the task is, in my opinion, made much easier if the BHPS dataset is in the Stata format. Not only is the handling and manipulation of panel data easier in Stata but, whilst it will do virtually all the statistical procedures found in other such programmes, it is particularly strong on, and designed specifically for, the analysis of survey data, panel data analysis and survival analysis. It is recommended that panel data analysis is carried out using the Stata programme.

Beginner and intermediate Stata courses using the BHPS dataset are available each summer at the Essex Summer School in Social Science Data Analysis, University of Essex, as is a course on panel data analysis which uses the BHPS dataset and the Stata programme.

Stata is a command-driven general purpose statistics package. Its reference manuals provide very detailed information on each command, and each command has an associated help file that may be viewed within a Stata session. There are many books on Stata for particular types of analysis (see for up-to-date information). The Stata website has useful information for learning Stata, and internet courses are also available. The UCLA Academic Technology Services provide examples of how analyses are carried out using Stata. Further useful information about training courses and a brief critique of Stata by Vernon Gayle of the University of Stirling can be accessed at

See also for more on longitudinal data sources, datasets, software, SPSS and Stata support, publications and UK research training, also see Stephen Jenkins

Statistical methods for analysis of panel data

Longitudinal data enable research questions about change and event occurrence to be addressed. However, different types of research question demand different analytic approaches. Research questions about change in a particular attribute over time are addressed using methods known variously as growth modelling, multi-level modelling, hierarchical linear modelling, random coefficient regression and mixed modelling. Research questions concerning event occurrence, when we want to know whether and when an event occurred and how occurrences vary as a function of predictors, are addressed by methods known variously as survival analysis, event history analysis, failure time analysis and hazard modelling. There are many excellent texts available on these methods. However, in my opinion the most accessible is Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, Oxford University Press, 2003, by Judith Singer and John Willett. Furthermore, an extremely useful webpage with links to an extensive range of support materials is available at ( The link to the UCLA Academic Technology website provides access to the datasets, computer programmes and output that form the basis of the book, and examples are programmed in all the major statistical packages, including Stata. Another useful book is Multilevel and Longitudinal Modeling Using Stata, Stata Press Publications (2005), by Sophia Rabe-Hesketh and Anders Skrondal. The focus of the book is regression modelling when data are in some way clustered, for example students in schools, with processes operating at different levels, for example students' characteristics interacting with institutional characteristics (see also A Handbook of Statistical Analyses using Stata, Third Edition, CRC Press, by Sophia Rabe-Hesketh and Brian Everitt (2004)).
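As an indication of what a question about event occurrence looks like in practice, the following minimal Python sketch builds a simple discrete-time life table: for each period it counts who is still at risk, how many experience the event, and the resulting hazard and survival estimates. The records are invented for illustration, and the convention used (a person censored in a period is counted as at risk in that period) is an assumption of this sketch; it includes no predictors and is not a substitute for the models described in the texts above.

```python
def life_table(records):
    """Build a discrete-time life table.

    records: list of (last_period_observed, event_occurred) pairs.
    A person with event_occurred=False is censored after their last period.
    """
    max_period = max(period for period, _ in records)
    table = []
    survival = 1.0
    for t in range(1, max_period + 1):
        # At risk in period t: everyone still under observation at t.
        at_risk = sum(1 for period, _ in records if period >= t)
        # Events in period t: those whose observation ends at t with the event.
        events = sum(1 for period, event in records if period == t and event)
        hazard = events / at_risk
        survival *= 1.0 - hazard
        table.append({"period": t, "at_risk": at_risk, "events": events,
                      "hazard": hazard, "survival": survival})
    return table


# Hypothetical data: five people followed for up to three waves.
records = [(1, True), (2, True), (2, False), (3, True), (3, False)]

table = life_table(records)
for row in table:
    print(row)
```

The hazard answers the question "when", since it is the proportion of those still at risk who experience the event in each period, while the survival estimate answers "whether", since it is the proportion who have not yet experienced the event.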

It is also sometimes necessary and desirable to analyse panel data cross-sectionally. An accessible and essential guide to regression models for categorical outcomes is Regression Models for Categorical Dependent Variables Using Stata, Revised Edition, Stata Press Publications (2003), by J. Scott Long and Jeremy Freese. The first part of the book contains information about installing Stata and the commands created by Long and Freese. The second part includes an introduction to Stata for those who have not used the programme, and the third part covers issues of estimation, testing, assessing model fit and interpretation.

Latent class (cluster) analysis (LCA) is increasingly used to characterise the life course in relation to learning using panel data (see Ross MacMillan and Scott Eliason, 'Characterising the life course as role configurations and pathways', Chapter 24 in Handbook of the Life Course, Handbooks of Sociology and Social Research Series, editors J.T. Mortimer and Michael J. Shanahan, Kluwer Academic, 2003). LCA is a statistical method for finding sub-types of related cases (clusters or latent classes) from multivariate categorical data. LCA identifies a number of mutually exclusive clusters that best summarise survey response patterns. It enables the identification of the relative proportion of a survey sample assigned to each cluster, the probability of being assigned to a particular cluster, the identification of typical and atypical groups, and the characterisation of the respondents in each cluster, or typologies (see Hagenaars and McCutcheon 2002, McCutcheon 1987). Courses on latent class analysis are available at the Essex Summer School in Social Science Data Analysis, University of Essex. A statistical software programme, LEM, publications and much more are freely available on Jeroen K. Vermunt's website at . More sophisticated versions of this software can be purchased from Latent Gold, along with a user's guide, technical guide, tutorials and support, at (

BHPS based analyses from the Learning Lives Project

Some examples of BHPS based panel data analysis are available on the Learning Lives website


How to reference this page: Lambe, P. (2007) British Household Panel Survey (BHPS) data analysis. London: TLRP.

