Administrative data

Administrative data is the term used to describe everyday data about individuals collected by government departments and agencies. Examples include exam results, benefit receipt and National Insurance payments.

Attrition

Attrition is the discontinued participation of study participants in a longitudinal study. Attrition can reflect a range of factors, from the study participant not being traceable to them choosing not to take part when contacted. Attrition is problematic both because it can lead to bias in the study findings (if the attrition is higher among some groups than others) and because it reduces the size of the sample.

Body mass index

Body mass index is a measure used to assess if an individual is a healthy weight for their height. It is calculated by dividing the individual’s weight by the square of their height, and it is typically represented in units of kg/m2.
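The formula can be illustrated with a short Python sketch. The function name `bmi` and the example participant are hypothetical and are not taken from any study or library.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Hypothetical participant: 70 kg and 1.75 m tall
print(round(bmi(70, 1.75), 1))  # 22.9
```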

Cohort studies

Cohort studies are concerned with charting the lives of groups of individuals who experience the same life events within a given time period. The best known examples are birth cohort studies, which follow a group of people born in a particular period.

Complete case analysis

Complete case analysis is the term used to describe a statistical analysis that includes only participants for whom we have no missing data on the variables of interest. Participants with any missing data on those variables are excluded.
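A minimal Python sketch of complete case selection, assuming each participant is a dictionary and a missing value is stored as `None`. The records and the helper name `complete_cases` are hypothetical.

```python
# Hypothetical participant records; None marks a missing value
records = [
    {"id": 1, "age": 42, "income": 31000},
    {"id": 2, "age": None, "income": 27000},
    {"id": 3, "age": 35, "income": None},
    {"id": 4, "age": 50, "income": 45000},
]


def complete_cases(rows, variables):
    """Keep only rows with no missing value on any of the listed variables."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


analysis_sample = complete_cases(records, ["age", "income"])
print([r["id"] for r in analysis_sample])  # [1, 4]
```

Which participants count as "complete" depends on the variables in the analysis: restricting the list to `["age"]` would keep participants 1, 3 and 4.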

Conditioning

Conditioning refers to the process whereby participants’ answers to some questions may be influenced by their participation in the study – in other words, their responses are ‘conditioned’ by their being members of a longitudinal study. Examples would include study respondents answering questions differently or even behaving differently as a result of their participation in the study.

Confounding

Confounding occurs where the relationship between independent and dependent variables is distorted by one or more additional, and sometimes unmeasured, variables. A confounding variable must be associated with both the independent and dependent variables but must not be an intermediate step in the relationship between the two (i.e. not on the causal pathway).

For example, we know that physical exercise (an independent variable) can reduce a person’s risk of cardiovascular disease (a dependent variable). We can say that age is a confounder of that relationship: it is associated with physical exercise (without being caused by it) and is also associated with cardiovascular disease. See also ‘unobserved heterogeneity’, below.

Cross-sectional surveys

Cross-sectional surveys involve interviewing a fresh sample of people each time they are carried out. Some cross-sectional studies are repeated regularly and can include a large number of repeat questions (questions asked on each survey round).

Data harmonisation

Data harmonisation involves retrospectively adjusting data collected by different surveys so that they can be compared. This enables researchers to make comparisons both within and across studies. Repeating the same longitudinal analysis across a number of studies allows researchers to test whether results are consistent across studies, or differ in response to changing social conditions.

Data imputation

Data imputation is a technique for replacing missing data with an alternative estimate. There are a number of different approaches, including mean substitution and model-based multivariate approaches.
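Mean substitution, the simplest approach mentioned above, can be sketched in a few lines of Python. The helper name and the scores are hypothetical, and this is not the model-based multivariate approach.

```python
from statistics import mean


def mean_impute(values):
    """Mean substitution: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical test scores with one missing value
print(mean_impute([4, None, 8, 6]))  # [4, 6, 8, 6]
```

Every imputed value sits exactly at the mean. Mean substitution therefore understates the variability of the variable, which is one reason model-based approaches such as multiple imputation are generally preferred.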

Data linkage

Data linkage simply means connecting two or more sources of administrative, educational, geographic, health or survey data relating to the same individual for research and statistical purposes. For example, linking housing or income data to exam results data could be used to investigate the impact of socioeconomic factors on educational outcomes.
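A simplified Python sketch of linking two sources on a shared participant identifier (an inner join). The dataset names, variables and IDs are all hypothetical.

```python
# Hypothetical sources keyed by a shared participant ID
survey = {
    101: {"household_income": "low"},
    102: {"household_income": "high"},
    103: {"household_income": "middle"},
}
exam_results = {
    101: {"exam_passes": 4},
    103: {"exam_passes": 9},
    104: {"exam_passes": 7},
}


def link(left, right):
    """Combine records for IDs present in both sources (an inner join)."""
    return {pid: {**left[pid], **right[pid]} for pid in sorted(left.keys() & right.keys())}


linked = link(survey, exam_results)
print(linked)
# {101: {'household_income': 'low', 'exam_passes': 4}, 103: {'household_income': 'middle', 'exam_passes': 9}}
```

Real sources often lack a common identifier. Records then have to be matched on details such as name and date of birth, sometimes probabilistically, so this exact-key join is a best-case simplification.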

Dummy variables

Dummy variables, also called indicator variables, are sets of dichotomous (two-category) variables we create to enable subgroup comparisons when we are analysing a categorical variable with three or more categories.
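A categorical variable with k categories is usually represented by k − 1 dummy variables. The omitted category serves as the reference group (described later in this glossary). The Python sketch below uses a hypothetical three-category `region` variable with "north" as the reference.

```python
def make_dummies(values, categories, reference):
    """Create one 0/1 indicator per category, omitting the reference category."""
    dummy_categories = [c for c in categories if c != reference]
    return [{f"is_{c}": int(v == c) for c in dummy_categories} for v in values]


# Hypothetical three-category variable
region = ["north", "south", "midlands", "north"]
rows = make_dummies(region, ["north", "midlands", "south"], reference="north")
print(rows[1])  # {'is_midlands': 0, 'is_south': 1}
print(rows[0])  # {'is_midlands': 0, 'is_south': 0}
```

A participant in the reference group scores 0 on every dummy. Each dummy's coefficient in a regression therefore compares that category with the reference group.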

General ability

General ability is a term used to describe cognitive ability, and is sometimes used as a proxy for intelligence quotient (IQ) scores.

Heterogeneity

Heterogeneity is a term that refers to differences, most commonly differences in characteristics between study participants or samples. It is the opposite of homogeneity, the term used when participants share the same characteristics. Where there are differences between study designs, this is sometimes referred to as methodological heterogeneity. Both participant and methodological differences can cause the findings of individual studies to diverge, and if these divergences are greater than would be expected by chance alone, we call this statistical heterogeneity. See also: unobserved heterogeneity.

Household panel surveys

Household panel surveys collect information about the whole household at each wave of data collection, to allow individuals to be viewed in the context of their overall household. To remain representative of the population of households as a whole, studies will typically have rules governing how new entrants to the household are added to the study.

Kurtosis

Kurtosis is sometimes described as a measure of ‘tailedness’. It is a characteristic of the distribution of observations on a variable and denotes the heaviness of the distribution’s tails. To put it another way, it is a measure of how thin or fat the lower and upper ends of a distribution are.
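One common way to quantify kurtosis is the moment-based excess kurtosis. It is the fourth central moment divided by the squared variance, minus 3, so a normal distribution scores 0. A minimal Python sketch with hypothetical data follows. Statistical packages often apply a small-sample correction, so their values may differ slightly.

```python
def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3


even_spread = [1, 2, 3, 4, 5]                      # thin tails
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]    # a few extreme values
print(round(excess_kurtosis(even_spread), 2))  # -1.3
print(round(excess_kurtosis(heavy_tails), 2))  # 2.0
```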

Longitudinal studies

Longitudinal studies gather data about the same individuals (‘study participants’) repeatedly over a period of time, in some cases from birth until old age. Many longitudinal studies focus upon individuals, but some look at whole households or organisations.

Non-response bias

Non-response bias is a type of bias introduced when those who participate in a study differ from those who do not in a way that is not random (for example, if attrition rates are particularly high among certain sub-groups). Non-random attrition over time can mean that the sample no longer remains representative of the original population being studied. Two approaches are typically adopted to deal with this type of missing data: weighting survey responses to re-balance the sample, and imputing values for the missing information.

Observational studies

Observational studies focus on observing the characteristics of a particular sample without attempting to influence any aspects of the participants’ lives. They can be contrasted with experimental studies, which apply a specific ‘treatment’ to some participants in order to understand its effect.

Panel studies

Panel studies follow the same individuals over time. They vary considerably in scope and scale. Examples include online opinion panels and short-term studies whereby people are followed up once or twice after an initial interview.

Percentile

A percentile is a measure that allows us to explore the distribution of data on a variable. It denotes the percentage of individuals or observations that fall below a specified value on a variable. The value that splits the observations evenly, i.e. 50% of the observations on a variable fall below this value and 50% above, is called the 50th percentile or, more commonly, the median.
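Python's standard `statistics` module can compute the median and other percentile cut points (`quantiles` requires Python 3.8 or later). The scores below are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical test scores for nine participants, sorted for readability
scores = [12, 15, 17, 20, 22, 25, 30, 34, 40]

print(median(scores))  # 22 (the 50th percentile)
# Cut points for the 25th, 50th and 75th percentiles (quartiles)
print(quantiles(scores, n=4))  # [16.0, 22.0, 32.0]
```

`quantiles` interpolates between observations using its default "exclusive" method. Other software may use a different convention and give slightly different cut points for the 25th and 75th percentiles.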

Prospective study

In prospective studies, individuals are followed over time and data about them is collected as their characteristics or circumstances change.

Recall error or bias

Recall error or bias describes the errors that can occur when study participants are asked to recall events or experiences from the past. It can take a number of forms – participants might completely forget something happened, or misremember aspects of it, such as when it happened, how long it lasted, or other details. Certain questions are more susceptible to recall bias than others. For example, it is usually easy for a person to accurately recall the date they got married, but it is much harder to accurately recall how much they earned in a particular job, or how they felt at a particular time.

Record linkage

Record linkage studies involve linking together administrative records (for example, benefit receipts or census records) for the same individuals over time.

Reference group

A reference group is a category on a categorical variable to which we compare other values. It is a term that is commonly used in the context of regression analyses in which categorical variables are being modelled.

Residuals

Residuals are the differences between the observed values of the dependent variable and the values predicted (expected) by the model, i.e. the distance of each actual value from the estimated value on the regression line. They are used as estimates of the model's error term.
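A minimal Python sketch of the definition: it fits a straight line by ordinary least squares to a small hypothetical dataset, then subtracts each predicted value from the observed value. The helper names are our own.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed value minus the value predicted by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
print([round(r, 2) for r in residuals(xs, ys)])  # [0.1, 0.2, -0.7, 0.4]
```

A positive residual means the observed value lies above the regression line. In an least squares fit that includes an intercept, the residuals sum to zero.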

Respondent burden

Respondent burden is a catch-all phrase that describes the perceived burden faced by participants as a result of their being involved in a study. It could include time spent taking part in the interview and inconvenience this may cause, as well as any difficulties faced as a result of the content of the interview.

Retrospective study

In retrospective studies, individuals are sampled and information is collected about their past. This might be through interviews in which participants are asked to recall important events, or by identifying relevant administrative data to fill in information on past events and circumstances.

Sample

A sample is a subset of a population that is used to represent the population as a whole. This reflects the fact that it is often not practical or necessary to survey every member of a particular population. In the case of birth cohort studies, the larger ‘population’ from which the sample is drawn comprises those born in a particular period. In the case of a household panel study like Understanding Society, the larger population from which the sample was drawn comprised all residential addresses in the UK.

Sampling frame

A sampling frame is a list of the target population from which potential study participants can be selected.

Skewness

Skewness is a measure of how asymmetrical the distribution of observations on a variable is. If the distribution has a more pronounced/longer tail at the upper end (right-hand side), we say that it is positively skewed. If the tail is more pronounced/longer at the lower end (left-hand side), we say that it is negatively skewed.
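One common way to measure skewness is the moment-based coefficient: the third central moment divided by the standard deviation cubed. A minimal Python sketch with hypothetical earnings data shows how the sign reflects the side of the longer tail. Software packages may add a small-sample correction.

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical weekly earnings (in £100s): most values low, one very high
earnings = [1, 1, 2, 2, 3, 10]
print(skewness(earnings) > 0)                # True: long right-hand tail
print(skewness([-x for x in earnings]) < 0)  # True: long left-hand tail
print(skewness([1, 2, 3]))                   # 0.0: symmetric
```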

Study participants

Study participants are the individuals who are interviewed as part of a longitudinal study.

Survey weights

Survey weights can be used to adjust a survey sample so it is representative of the survey population as a whole. They may be used to reduce the impact of attrition on the sample, or to correct for certain groups being over-sampled.
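A simple illustration of one weighting approach, often called post-stratification or cell weighting. Each respondent's weight is their group's share of the population divided by its share of the sample. The Python sketch assumes a hypothetical sample in which women are over-represented. Real survey weights usually combine several adjustments, for design and for non-response.

```python
from collections import Counter


def cell_weights(groups, population_shares):
    """Weight for each group = population share / sample share."""
    n = len(groups)
    counts = Counter(groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: women over-represented (60%) relative to a 50/50 population
groups = ["F"] * 6 + ["M"] * 4
has_degree = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

weights = cell_weights(groups, {"F": 0.5, "M": 0.5})
row_weights = [weights[g] for g in groups]
print(sum(has_degree) / len(has_degree))                 # 0.4 (unweighted)
print(round(weighted_mean(has_degree, row_weights), 3))  # 0.375 (weighted)
```

Women get a weight below 1 and men a weight above 1. The weighted estimate therefore matches what a 50/50 sample with the same within-group rates would give.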

Sweep

Sweep is the term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 sweep of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term wave often has the same meaning.

Target population

The population of people that the study team wants to research, and from which a sample will be drawn.

Tracing (or tracking)

Tracing (or tracking) describes the process by which study teams attempt to locate participants who have moved from the address at which they were last interviewed.

Unobserved heterogeneity

Unobserved heterogeneity is a term that describes the existence of unmeasured (unobserved) differences between study participants or samples that are associated with the (observed) variables of interest. The existence of unobserved variables means that statistical findings based on the observed data may be incorrect.

Variables

‘Variables’ is the term usually used to describe data items within a dataset. So, for example, a questionnaire might collect information about a participant’s job (its title, whether it involves any supervision, the type of organisation they work for and so on). This information would then be coded using a code-frame and the results made available in the dataset in the form of a variable about occupation. In data analysis variables can be described as ‘dependent’ and ‘independent’, with the dependent variable being a particular outcome of interest (for example, high attainment at school) and the independent variables being the variables that might have a bearing on this outcome (for example, parental education, gender and so on).

Wave

Wave is the term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 wave of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term sweep often has the same meaning.

Britain’s mobility problem

If you are born into a working class family, what are your chances of moving up the social ladder? Longitudinal research provides some of the most reliable sources of evidence about levels of social mobility in the UK.

Key finding

Children born into working class families are significantly less likely to move up the ladder than their peers from middle class homes. These inequalities have persisted for generations.

About the research

Researchers from the University of Oxford and the London School of Economics and Political Science analysed information on the childhood and adult social class of more than 32,000 Britons across four generations. They concluded that, contrary to public fears, social mobility in the UK was not in decline.

However, among younger generations the experience of downward (as opposed to upward) mobility is more common than in the past. And considerable class inequalities continue to shape the likelihood of people born into different classes ending up in a high social class themselves.

This research was funded by the Economic and Social Research Council.

Research questions

  • Has absolute mobility – the total number moving up and moving down – changed over time? Has this differed for men and women?
  • How has upward mobility changed in relation to downward mobility over time?
  • Have rates of relative mobility changed over time?

Studies used

MRC National Survey of Health and Development (1946 British birth cohort)

Following 5,000 people born in England, Scotland and Wales in a single week of March 1946

National Child Development Study (1958 British birth cohort)

Following 17,000 people born across England, Scotland and Wales in a single week of March 1958

1970 British Cohort Study

Following 17,000 people born across England, Scotland and Wales in a single week of April 1970

Understanding Society

Following 40,000 households from across the UK. In this study, the researchers included Understanding Society participants who were born in 1980-84.

Data and definitions

Defining social class

Each participant’s occupation was classed according to the National Statistics Socio-Economic Classification (NS-SEC), a system used by the Office for National Statistics to understand the structure of socioeconomic positions in modern societies. The current NS-SEC classifications are as follows:

1. Higher managerial, administrative and professional occupations
1.1 Large employers and higher managerial and administrative occupations
1.2 Higher professional occupations
2. Lower managerial, administrative and professional occupations
3. Intermediate occupations
4. Small employers and own account workers
5. Lower supervisory and technical occupations
6. Semi-routine occupations
7. Routine occupations
8. Never worked and long-term unemployed

Why father’s and not mother’s social class?

Information on mother’s occupation was not available for the generations born in 1946 and 1980-84. For this reason, the researchers used father’s occupation instead, as it was available for everyone. However, they did test to see whether their results changed when including mother’s occupation for the 1958 and 1970 cohorts. They found no significant differences.

Absolute mobility

Absolute mobility rates show the percentage of individuals whose class destinations are different from their class origins. Absolute mobility can be subdivided into two categories: upward mobility (individuals who have moved from a lower class of origin into a higher class destination) and downward mobility (individuals who have moved from a higher class of origin to a lower class destination). Absolute rates are influenced by changes in the overall class structure (for example, an expansion in professional jobs).
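A minimal Python sketch of how absolute mobility rates can be calculated from pairs of origin (father's class) and destination (own class). It assumes a hypothetical three-class ordering and invented data. Analyses such as the one described here use fuller class schemes like NS-SEC, which require decisions about how classes are ranked.

```python
# Hypothetical ordered scheme: a lower number means a higher class
RANK = {"professional": 1, "intermediate": 2, "working": 3}


def absolute_mobility(pairs):
    """Percentages mobile, upwardly mobile and downwardly mobile from (origin, destination) pairs."""
    n = len(pairs)
    up = sum(RANK[dest] < RANK[orig] for orig, dest in pairs)
    down = sum(RANK[dest] > RANK[orig] for orig, dest in pairs)
    return {"total": 100 * (up + down) / n, "upward": 100 * up / n, "downward": 100 * down / n}


# Hypothetical father's class (origin) and own class (destination)
pairs = [
    ("working", "professional"),
    ("working", "working"),
    ("intermediate", "working"),
    ("professional", "professional"),
    ("working", "intermediate"),
]
print(absolute_mobility(pairs))  # {'total': 60.0, 'upward': 40.0, 'downward': 20.0}
```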

Relative mobility

Relative mobility rates focus on the relative chances of individuals from different class origins arriving at different class destinations, irrespective of changes in the overall class structure. They assess the ‘stickiness’ of the relationship between a person’s class position (their destination) and that of their parents (their origin).
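Relative mobility is commonly expressed as odds ratios, which compare the odds of reaching a destination for people from two different origins. The Python sketch below uses a simplified two-by-two table with hypothetical counts. Published studies typically model full mobility tables rather than a single ratio.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: origin group 1 reaching / not reaching the destination
    c, d: origin group 2 reaching / not reaching the destination
    """
    return (a / b) / (c / d)


# Hypothetical counts out of 100 people from each origin
# professional origins: 60 end in a professional job, 40 do not
# working-class origins: 20 end in a professional job, 80 do not
print(odds_ratio(60, 40, 20, 80))  # 6.0
```

Here the odds are 1.5 for professional origins and 0.25 for working-class origins. An odds ratio of 1 would mean that origin made no difference to the chances of reaching a professional job, so the further the ratio is from 1, the ‘stickier’ the link between origin and destination.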

Key findings

Absolute mobility

More than three quarters of men ended up in a different social class from that of their fathers – a proportion that was relatively unchanged from the generation born in 1946 to those born in the early 1980s. For women, absolute mobility increased slightly, from around 77 per cent of women born in 1946 to about 82 per cent of women born in the early 1980s.

When the researchers looked at the number of people moving up and down the social ladder, the picture was slightly different. Men born in 1946 were two to three times more likely to move up in social status than down. But over time upward mobility became less likely, and downward mobility more common. The generation born in the early 1980s was just as likely to move up as down.

Women’s mobility followed similar trends, but the differences between upward and downward mobility were less marked than for men.

The proportion of people originating in the middle and upper classes tripled between the generation born in 1946, and those born in 1980-84. The number of people born into the working classes halved over the same period.

Relative mobility

The researchers found that relative mobility rates (which focus on the relative chances of individuals from different backgrounds moving up or down the social ladder) have changed little since the post-war generation. When they looked at rates for men and women, they found little change among men but evidence of improved mobility (described as ‘increasing social fluidity’) over time among women.

However, a key finding from the article concerns the degree of inequality that underpins the relative rates of mobility. For example, it describes the odds of someone born into a professional and managerial class family ending up themselves in that class, compared with the far lower chances of someone born into a working class family.

Not mobility, but inequality

Children born into working class families are significantly less likely to move up into professional or managerial jobs than middle class children are to move down. The scale of these inequalities has stayed more or less the same over time for men, and has decreased – but remained large – for women.

A changing society

It is important to consider how changes in UK society may have affected these findings. For instance, the middle classes grew substantially in the middle of the century as more professional and managerial jobs became available. On the other hand, the working classes have steadily shrunk over time. This means that more people now start life higher up the social ladder (with less room to climb), and there are fewer people at the bottom to make big leaps in social status.

These changes mean that younger generations of men and women face less favourable prospects when it comes to mobility than those faced by their parents or grandparents.

Advantages and challenges of using longitudinal data to study social mobility

To determine whether social mobility is increasing or declining, researchers need information on people’s employment over the course of their whole lives and across multiple generations. Longitudinal studies are ideal sources of data as they follow the same group of people over time, and together cover several generations of Britons.

However, using historical data can also present some challenges. For example, class structures in the UK have changed over time, so to compare the generation born in 1946 to those born in 1980-84, researchers often need to reclassify occupations against a common set of standards. They also need to conduct tests to determine whether this reclassification affects the findings in an unintended way.

Finally, several longitudinal studies record employment status only at the time of each survey visit, which for many studies means only every few years. A participant could be going through a brief spell of unemployment at the time of the visit despite normally being in work, and this can give a misleading picture of their social class.

The authors of this research managed to overcome this last challenge for the 1958 and 1970 cohorts, as these studies ask participants for their complete work history at each visit. So where someone is out of work at the time of the interview, the researchers can use their last known job to determine their social class.

Implications for policy and practice

Over the past two decades, successive governments have made higher levels of social mobility a focus of public policy. Longitudinal research is often used as part of the evidence base because it allows researchers to look within and across cohorts, giving a snapshot of inequality and showing how it has changed over time.

The Social Mobility Commission (SMC) is an advisory non-departmental public body with a duty to promote social mobility in England. The Commission provides an independent scrutiny and advocacy role on social mobility and is required to publish an annual report assessing progress on improving social mobility in the UK: the annual “State of the Nation” report. The Commission also carries out and publishes other research on social mobility throughout the year. A list of recent publications is provided on its website. The Commission published Time For Change: An Assessment of Government Policies on Social Mobility 1997-2017 in June 2017. This report says that government policies to improve social mobility have failed to deliver enough progress, and warns that without major reform, social and economic divisions within Britain are set to widen.

There are also many charities and think tanks in the UK that advocate for greater social mobility and work with young people from disadvantaged backgrounds to improve their life chances. These include The Sutton Trust, a foundation which improves social mobility in the UK through evidence-based programmes, research and policy advocacy, and The Bridge Group, a charitable policy association researching and promoting socio-economic diversity and equality.

Access the paper

Bukodi, E., Goldthorpe, J.H., Waller, L. and Kuha, J. (2015) The mobility problem in Britain: new findings from the analysis of birth cohort data. The British Journal of Sociology 66(1), pp. 93-117.

Discussion topics

  • Researchers often criticise each other and government for equating social mobility with upward mobility. Is the terminology really a problem, or is it all just semantics?
  • Should the UK be placing so much emphasis on the supposed ‘social mobility problem’? These findings suggest absolute mobility has not changed, and the increase in downward mobility is explained by the growth in the middle classes and the shrinking of the working classes. So what’s the ‘problem’?
  • Sociologists and economists are embroiled in a heated debate on the merits of using class vs income to understand social mobility. Divide into two groups and each take a side – which measure helps us get closer to a true understanding of social mobility in the UK and why? How could these methodological differences affect political response?