Administrative data

Administrative data is the term used to describe everyday data about individuals collected by government departments and agencies. Examples include exam results, benefit receipt and National Insurance payments.


Attrition

Attrition is the discontinued participation of study participants in a longitudinal study. Attrition can reflect a range of factors, from the study participant not being traceable to them choosing not to take part when contacted. Attrition is problematic both because it can lead to bias in the study findings (if the attrition is higher among some groups than others) and because it reduces the size of the sample.

Body mass index

Body mass index is a measure used to assess if an individual is a healthy weight for their height. It is calculated by dividing the individual’s weight by the square of their height, and it is typically represented in units of kg/m².
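As an illustration of the calculation described above, here is a minimal Python sketch. The function name and the example weight and height are hypothetical, chosen only to show the arithmetic.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2: weight divided by the square of height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical example values, not taken from any study.
bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
print(round(bmi, 1))  # 22.9
```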

Cohort studies

Cohort studies are concerned with charting the lives of groups of individuals who experience the same life events within a given time period. The best known examples are birth cohort studies, which follow a group of people born in a particular period.

Complete case analysis

Complete case analysis is the term used to describe a statistical analysis that only includes participants for which we have no missing data on the variables of interest. Participants with any missing data are excluded.
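A small sketch of what this means in practice, assuming a hypothetical dataset in which missing values are recorded as None. The variable names and records are invented for illustration.

```python
# Hypothetical records; None marks a missing value.
participants = [
    {"id": 1, "age": 34, "income": 28000},
    {"id": 2, "age": None, "income": 31000},
    {"id": 3, "age": 51, "income": None},
    {"id": 4, "age": 45, "income": 39000},
]
variables_of_interest = ["age", "income"]


def complete_cases(rows, variables):
    """Keep only rows with no missing value on any variable of interest."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


analysis_sample = complete_cases(participants, variables_of_interest)
print([row["id"] for row in analysis_sample])  # [1, 4]
```

Here two of the four participants are dropped, which shows how quickly a complete case analysis can shrink the sample.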


Conditioning

Conditioning refers to the process whereby participants’ answers to some questions may be influenced by their participation in the study – in other words, their responses are ‘conditioned’ by their being members of a longitudinal study. Examples would include study respondents answering questions differently or even behaving differently as a result of their participation in the study.


Confounding

Confounding occurs where the relationship between independent and dependent variables is distorted by one or more additional, and sometimes unmeasured, variables. A confounding variable must be associated with both the independent and dependent variables but must not be an intermediate step in the relationship between the two (i.e. not on the causal pathway).

For example, we know that physical exercise (an independent variable) can reduce a person’s risk of cardiovascular disease (a dependent variable). We can say that age is a confounder of that relationship as it is associated with, but not caused by, physical activity and is also associated with coronary health. See also ‘unobserved heterogeneity’, below.


Cross-sectional surveys

Cross-sectional surveys involve interviewing a fresh sample of people each time they are carried out. Some cross-sectional studies are repeated regularly and can include a large number of repeat questions (questions asked on each survey round).

Data harmonisation

Data harmonisation involves retrospectively adjusting data collected by different surveys to make it possible to compare the data that was collected. This enables researchers to make comparisons both within and across studies. Repeating the same longitudinal analysis across a number of studies allows researchers to test whether results are consistent across studies, or differ in response to changing social conditions.

Data imputation

Data imputation is a technique for replacing missing data with an alternative estimate. There are a number of different approaches, including mean substitution and model-based multivariate approaches.
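Mean substitution, the simplest of the approaches mentioned above, can be sketched as follows. The function name and scores are hypothetical, and missing values are assumed to be recorded as None.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical test scores with two missing entries.
scores = [12, None, 15, 9, None]
imputed = impute_mean(scores)
print(imputed)
```

Mean substitution is easy to apply but makes the data look less varied than they really are, which is one reason model-based approaches are often preferred.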

Data linkage

Data linkage simply means connecting two or more sources of administrative, educational, geographic, health or survey data relating to the same individual for research and statistical purposes. For example, linking housing or income data to exam results data could be used to investigate the impact of socioeconomic factors on educational outcomes.
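The idea can be sketched with two hypothetical sources keyed on a shared participant identifier. The identifiers, field names and values below are invented, and real linkage usually involves consent checks and more careful matching.

```python
# Two hypothetical data sources keyed by the same participant identifier.
exam_results = {
    "P001": {"maths_grade": 7},
    "P002": {"maths_grade": 4},
    "P003": {"maths_grade": 5},
}
household_income = {
    "P001": {"income_band": "high"},
    "P003": {"income_band": "low"},
    "P004": {"income_band": "middle"},
}

# Link only the individuals who appear in both sources.
linked = {
    pid: {**exam_results[pid], **household_income[pid]}
    for pid in exam_results.keys() & household_income.keys()
}
print(sorted(linked))  # ['P001', 'P003']
```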

Dummy variables

Dummy variables, also called indicator variables, are sets of dichotomous (two-category) variables we create to enable subgroup comparisons when we are analysing a categorical variable with three or more categories.
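A minimal sketch of how dummy variables might be created from a three-category variable. The function, the category labels and the choice of reference group are hypothetical.

```python
def make_dummies(values, reference):
    """Create one 0/1 indicator per category, omitting the reference group."""
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical housing tenure variable with three categories.
tenure = ["own", "rent", "social", "rent"]
dummies = make_dummies(tenure, reference="own")
print(dummies[0])  # {'is_rent': 0, 'is_social': 0}
print(dummies[1])  # {'is_rent': 1, 'is_social': 0}
```

A participant in the reference group scores 0 on every dummy, so each dummy compares its category with that group.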

General ability

General ability is a term used to describe cognitive ability, and is sometimes used as a proxy for intelligence quotient (IQ) scores.


Heterogeneity

Heterogeneity is a term that refers to differences, most commonly differences in characteristics between study participants or samples. It is the opposite of homogeneity, which is the term used when participants share the same characteristics. Where there are differences between study designs, this is sometimes referred to as methodological heterogeneity. Both participant and methodological differences can cause divergences between the findings of individual studies and if these are greater than chance alone, we call this statistical heterogeneity. See also: unobserved heterogeneity.

Household panel surveys

Household panel surveys collect information about the whole household at each wave of data collection, to allow individuals to be viewed in the context of their overall household. To remain representative of the population of households as a whole, studies will typically have rules governing how new entrants to the household are added to the study.


Kurtosis

Kurtosis is sometimes described as a measure of ‘tailedness’. It is a characteristic of the distribution of observations on a variable and denotes the heaviness of the distribution’s tails. To put it another way, it is a measure of how thin or fat the lower and upper ends of a distribution are.

Longitudinal studies

Longitudinal studies gather data about the same individuals (‘study participants’) repeatedly over a period of time, in some cases from birth until old age. Many longitudinal studies focus upon individuals, but some look at whole households or organisations.

Non-response bias

Non-response bias is a type of bias introduced when those who participate in a study differ from those who do not in a way that is not random (for example, if attrition rates are particularly high among certain sub-groups). Non-random attrition over time can mean that the sample no longer remains representative of the original population being studied. Two approaches are typically adopted to deal with this type of missing data: weighting survey responses to re-balance the sample, and imputing values for the missing information.

Observational studies

Observational studies focus on observing the characteristics of a particular sample without attempting to influence any aspects of the participants’ lives. They can be contrasted with experimental studies, which apply a specific ‘treatment’ to some participants in order to understand its effect.

Panel studies

Panel studies follow the same individuals over time. They vary considerably in scope and scale. Examples include online opinion panels and short-term studies whereby people are followed up once or twice after an initial interview.


Percentile

A percentile is a measure that allows us to explore the distribution of data on a variable. It denotes the percentage of individuals or observations that fall below a specified value on a variable. The value that splits the number of observations evenly, i.e. 50% of the observations on a variable fall below this value and 50% above, is called the 50th percentile or more commonly, the median.
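To make the link between percentiles and the median concrete, here is a short sketch using hypothetical height measurements. The helper function is an illustration, not a standard library routine.

```python
from statistics import median


def percent_below(values, threshold):
    """Return the percentage of observations that fall below the threshold."""
    return 100 * sum(v < threshold for v in values) / len(values)


# Hypothetical heights in centimetres.
heights_cm = [150, 155, 160, 165, 170, 175, 180, 185]
middle = median(heights_cm)
print(middle)                             # 167.5
print(percent_below(heights_cm, middle))  # 50.0
```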

Prospective study

In prospective studies, individuals are followed over time and data about them is collected as their characteristics or circumstances change.

Recall error or bias

Recall error or bias describes the errors that can occur when study participants are asked to recall events or experiences from the past. It can take a number of forms – participants might completely forget something happened, or misremember aspects of it, such as when it happened, how long it lasted, or other details. Certain questions are more susceptible to recall bias than others. For example, it is usually easy for a person to accurately recall the date they got married, but it is much harder to accurately recall how much they earned in a particular job, or what their mood was at a particular time.

Record linkage

Record linkage studies involve linking together administrative records (for example, benefit receipts or census records) for the same individuals over time.

Reference group

A reference group is a category on a categorical variable to which we compare other values. It is a term that is commonly used in the context of regression analyses in which categorical variables are being modelled.


Residuals

Residuals are the differences between the observed values and the values predicted by the model (from the constant and the predictors), i.e. the distance of each actual value from the estimated value on the regression line. They represent the error in the model’s predictions.
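The sketch below fits a simple regression line by ordinary least squares and computes each residual as the observed value minus the value predicted by the line. The data points are hypothetical, and a one-predictor model is assumed for simplicity.

```python
from statistics import mean

# Hypothetical data: one predictor (x) and one outcome (y).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least squares estimates for a straight line.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
print([round(r, 2) for r in residuals])
```

When the model includes a constant, as here, the residuals sum to zero apart from rounding error.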

Respondent burden

Respondent burden is a catch-all phrase that describes the perceived burden faced by participants as a result of their being involved in a study. It could include time spent taking part in the interview and any inconvenience this may cause, as well as any difficulties faced as a result of the content of the interview.

Retrospective study

In retrospective studies, individuals are sampled and information is collected about their past. This might be through interviews in which participants are asked to recall important events, or by identifying relevant administrative data to fill in information on past events and circumstances.


Sample

A sample is a subset of a population that is used to represent the population as a whole. This reflects the fact that it is often not practical or necessary to survey every member of a particular population. In the case of birth cohort studies, the larger ‘population’ from which the sample is drawn comprises those born in a particular period. In the case of a household panel study like Understanding Society, the larger population from which the sample was drawn comprised all residential addresses in the UK.

Sampling frame

A sampling frame is a list of the target population from which potential study participants can be selected.


Skewness

Skewness is a measure of how asymmetrical the distribution of observations on a variable is. If the distribution has a more pronounced/longer tail at the upper end of the distribution (right-hand side), we say that the distribution is positively skewed. If it is more pronounced/longer at the lower end (left-hand side), we say that it is negatively skewed.

Study participants

Study participants are the individuals who are interviewed as part of a longitudinal study.

Survey weights

Survey weights can be used to adjust a survey sample so it is representative of the survey population as a whole. They may be used to reduce the impact of attrition on the sample, or to correct for certain groups being over-sampled.
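A weighted mean is one common way weights are applied. The sketch below uses hypothetical life satisfaction scores and weights; in real studies the weights are supplied by the study team.

```python
def weighted_mean(values, weights):
    """Return the mean of values, with each value counted according to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical life satisfaction scores and survey weights.
scores = [6, 8, 4]
weights = [1.0, 0.5, 2.0]  # the third respondent represents an under-sampled group

print(sum(scores) / len(scores))                 # unweighted mean: 6.0
print(round(weighted_mean(scores, weights), 2))  # weighted mean: 5.14
```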


Sweep

The term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 sweep of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term wave often has the same meaning.

Target population

The population of people that the study team wants to research, and from which a sample will be drawn.

Tracing (or tracking)

Tracing (or tracking) describes the process by which study teams attempt to locate participants who have moved from the address at which they were last interviewed.

Unobserved heterogeneity

Unobserved heterogeneity is a term that describes the existence of unmeasured (unobserved) differences between study participants or samples that are associated with the (observed) variables of interest. The existence of unobserved variables means that statistical findings based on the observed data may be incorrect.


Variables

Variables is the term that tends to be used to describe data items within a dataset. So, for example, a questionnaire might collect information about a participant’s job (its title, whether it involves any supervision, the type of organisation they work for and so on). This information would then be coded using a code-frame and the results made available in the dataset in the form of a variable about occupation. In data analysis variables can be described as ‘dependent’ and ‘independent’, with the dependent variable being a particular outcome of interest (for example, high attainment at school) and the independent variables being the variables that might have a bearing on this outcome (for example, parental education, gender and so on).


Wave

The term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 wave of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term sweep often has the same meaning.

Learning Hub

Mental health and wellbeing

Why use longitudinal data to study mental health and wellbeing?

There is increasing concern about mental health issues among a range of different groups, as well as growing recognition of the fact that mental health is as important as physical health. These issues matter across society – including the health service, social care, workplaces, and of course families and individuals. In addition, better tools for diagnosis mean increasing numbers of people are being diagnosed with mental health disorders.

Evidence suggests that most adult mental health disorders start in childhood: Kim-Cohen et al (2003) found that 50% of all adult diagnoses were detectable before the age of 15, and 75% before 18. Therefore, by following people throughout their lives, longitudinal studies are uniquely placed to help us to better understand what factors during a person’s life might be contributing to mental health disorders and low levels of wellbeing later in life. They can also shed light on the impact this is having on other areas of their lives, and what policy interventions might help to reduce the impact on society and individuals.

Mental health vs. wellbeing

This page considers how longitudinal studies can help us to understand both mental health and wellbeing.

Mental health disorders can be medically diagnosed using a limited set of indicators. This contrasts with wellbeing, which is a less specific term. This reflects the fact that there can be much more in someone’s life that can contribute to that person’s wellbeing. For example, a range of factors including physical and mental health, education, work (or lack of it), housing and social activities may all contribute to a person’s low – or high – level of wellbeing.

Patalay and Fitzsimons (2016) considered the different factors contributing to mental health and wellbeing in children and concluded that they were largely distinct, although there were some factors that were indicators of both. This is illustrated in the graphic below.

The factors associated with mental illness and wellbeing

Selected longitudinal evidence on mental health and wellbeing

One in four girls is depressed at age 14

A quarter of girls and one in 10 boys are depressed at age 14, according to research at the UCL Institute of Education and the University of Liverpool, which analysed information on more than 10,000 children taking part in the Millennium Cohort Study. Read more.

Mental wellbeing of Generation X directly linked to childhood background

By comparing data on more than 18,000 children from three national birth cohort studies, researchers found that childhood disadvantage is strongly associated with poorer adult mental wellbeing for Generation X. In contrast, Baby Boomers’ childhood background was not linked to their wellbeing in adulthood. A CLOSER-funded team from the MRC Unit for Lifelong Health and Ageing at UCL used data harmonisation to compare the 1970 British Cohort Study, 1958 National Child Development Study and 1946 National Survey of Health and Development. Read more.

A mother's personality can affect their child's mental health

New research using data from 8,000 parents and children taking part in the Avon Longitudinal Study of Parents and Children found that the children of women with personality traits associated with emotional and relationship difficulties were at greater risk of depression, anxiety and self-harm in their late teens than their peers. Read more.

Unaffordable housing in the UK affects the mental health of homeowners more than renters

Using data from the British Household Panel Survey (the predecessor to Understanding Society), researchers found that home purchasers experienced higher levels of stress and anxiety than private renters when housing became unaffordable. Read more.

Childhood bullying is linked to use of mental health services in later life

People bullied frequently or even occasionally as children used more mental health services 39 years later than those who were not bullied, according to findings from the 1958 National Child Development Study. Read more.

Lower levels of cognitive development in childhood are associated with psychotic experiences and affective symptoms later in life

A research team from the MRC Unit for Lifelong Health and Ageing at UCL examined whether cognition in childhood and adolescence was associated with psychiatric disease in later life. They found that people aged 53 who reported psychotic experiences such as hallucinations, or affective symptoms such as insomnia or anxiety, had lower verbal and non-verbal cognition at both the ages of 8 and 15. They used a sample of 2,384 people from the 1946 MRC National Survey of Health and Development. Read more.

Being part of the community is good for wellbeing

People with a strong sense of neighbourhood belonging have better mental wellbeing, according to a research team that compared the experiences of adults in three longitudinal birth cohorts: the 1946 MRC National Survey of Health and Development, 1958 National Child Development Study and Hertfordshire Cohort Study. It analysed data from more than 10,000 men and women aged 50–76. It also found that the link between neighbourhood belonging and wellbeing was stronger for adults in the 1946 and Hertfordshire cohorts, who had average ages of 64 and 73 years respectively, compared to younger adults in the 1958 cohort study, with an average age of 51. Read more.

Men show a greater drop in life satisfaction when they become unemployed

Research by What Works Wellbeing used Understanding Society data to examine gender differences in unemployment and wellbeing, beyond loss of income. It found that, on average, women’s life satisfaction is affected less by becoming unemployed than men’s. But this average gap concealed a range of different experiences: not all women suffer less than men when they lose their job. Read more.

What information do longitudinal studies collect on mental health and wellbeing?

Assessments of mental health and wellbeing typically involve asking study participants a set of questions that have been developed and thoroughly tested.

The Malaise Inventory was used in some of the older longitudinal studies to measure psychological distress. Researchers now use this data as a proxy for measuring mental health in the absence of any other more specific measure taken at the time.

However, other assessments have been developed more recently and are used in newer studies. The Warwick-Edinburgh Mental Wellbeing Scale is often used to measure wellbeing. For example, it was used by the research team considering the links between childhood disadvantage and the poorer wellbeing of Generation X.

Read more about the Malaise Inventory and the Warwick-Edinburgh Mental Wellbeing Scale in the Study Design module.

The Millennium Cohort Study used the short Mood and Feelings Questionnaire to ask participants at age 11 how happy they were with each of six different elements of their lives (school, family, friends, school work, appearance and life as a whole). The participants were asked to rate their level of happiness with each of these on a 7-point scale ranging from ‘not at all happy’ to ‘very happy’. The questionnaire consists of a series of descriptive phrases asking how the respondent has been feeling or acting recently and is a screening tool for depression in children and young people aged 6−17 years.

Their parents were asked to complete a different assessment, the Strengths and Difficulties Questionnaire (SDQ). This is a set of questions designed to assess emotional and behavioural difficulties among children and young people. The questions can be asked of parents or teachers about their child, or directly of the child. The responses from the parents and children were used in the research that concluded one in four girls is depressed at the age of 14.

Read more about the SDQ on the Youthinmind website.

Understanding Society uses a self-completion questionnaire to ask participants about their satisfaction in a number of aspects of their life – including their job, school and leisure time. There are two questionnaires: one for adults and one for young people aged 10-15 years.

Young person's questionnaire

How do longitudinal studies collect information on mental health and wellbeing?

Most information about mental health and wellbeing is collected through questionnaires. Study participants might complete the questionnaires themselves, using a computer or pen and paper, or they might be asked the questions by an interviewer.

Studies need to take into account ethical issues when collecting this data. The questions must be phrased in a way that minimises the impact on the participant of being asked what could be very sensitive questions. For example, a medical professional asking similar questions is usually doing so face to face and will be able to act depending on the answers; a data collection team are not able to intervene in the same way.

Studies have confidentiality contracts with participants to help build trust and ensure that they answer the survey questions truthfully. Often people are more confident sharing their answers with the study team because they are one of many thousands of respondents rather than a named individual at their own medical consultation.

Study teams always consider how sensitive the questions are, and whether participants would be more or less open to discussing their experiences with an interviewer than they would be if they answered the questions on a self-completion questionnaire.

It is important to take into account who the respondent is as this may have implications for how the data is interpreted. While older studies asked participants’ parents whether their children were bullied, many newer studies, such as the Millennium Cohort Study, ask the children directly. And of course, some studies ask both parents and children – and sometimes teachers. You might be interested in looking at how children’s, parents’ and teachers’ reports differ.

If they are able to, studies link to medical records to complement the information collected from survey participants. For example, the Avon Longitudinal Study of Parents and Children asked participants aged 16 questions about self-harm. It also looked at hospital self-harm records for those participants who had agreed to have their medical data linked to their study responses. The researchers could then compare what someone says about self-harm with medically recorded incidents of self-harm.

Find out more about how longitudinal studies are designed, including sampling and the value of different methods and modes of collecting information, in the Study design module.

Advantages of using longitudinal data

There are a number of strengths of longitudinal studies that make them an ideal resource for studying mental health and wellbeing.

Tracking long-term consequences: Cross-sectional studies can tell us how many people are experiencing mental health disorders or low or high levels of wellbeing at a given point in time. They may also be able to differentiate between different groups, depending on how much other information they collect.

However, they are less effective at telling us what factors may have influenced an individual’s mental health − or what impact mental health disorders or lower levels of wellbeing experienced as a child have on the rest of that person’s life. Because longitudinal studies follow people over the course of their lives they can assess the early determinants of mental health disorders and wellbeing, and get closest to identifying these associations. Read more about the differences between longitudinal and cross-sectional studies in the Introduction to longitudinal studies module.

The long-term nature of longitudinal studies also enables them to assess changes and stability in mental health disorders and wellbeing, for example highlighting points of relapse and remission.

Prospective data collection: Individuals may find it hard, or may not want, to accurately recall the state of their past mental health and wellbeing: mental health disorders or lower levels of wellbeing may be something people want to forget or put behind them. By asking people about their mental health at regular intervals during their lives, longitudinal studies can capture how they are feeling at that point in time. Read more about prospective study vs retrospective study design in the Study design module.

Breadth of data available: Longitudinal studies have the added advantage of covering a wide range of different areas of life. People with mental health disorders or lower levels of wellbeing can often suffer from other problems, and it can be difficult to unpick the interplay between these factors and their mental health and wellbeing.

Although there may be other areas of people’s lives that have not been captured, longitudinal data cover significantly more areas than other data sources. This makes it possible for researchers to consider how different factors impact on an individual’s mental health and wellbeing.

Large sample sizes: Birth cohorts and household panels have the advantage of large enough sample sizes to identify particular groups that are at higher risk of poor mental health or low wellbeing. However, other studies have smaller, more specific samples (such as prisoners or children in care) that can provide equally valuable long-term evidence. Read more about longitudinal samples in the Study design module.

Find out more about the strengths of longitudinal data in the Introduction to longitudinal studies module.

Challenges of using longitudinal data

Researchers can face challenges using longitudinal data to study people’s mental health and wellbeing.

There are fewer measures of mental health in older studies, which didn’t ask about mental health disorders and wellbeing in as much detail as more recent ones. However, more recent research suggests it is important to understand how mental health and wellbeing develop as a person gets older – leading to studies such as the Millennium Cohort Study asking about mental health in the very early years.

Mental health disorders or low levels of wellbeing can be hard to admit: Like any survey asking people about their mental health disorders and wellbeing, longitudinal studies will struggle because some people don’t want to report their experiences, or may downplay their severity.

Attrition and missing data: Some study participants drop out over time, and this isn’t always random. This is known as attrition. It is also the case that some participants may choose not to answer every question at every sweep – which can lead to something called missing data. People with mental health disorders may also be more likely to drop out or choose not to answer. This leads to attrition bias.

There are analytical methods that researchers can use to deal with attrition bias and missing data. The teams running particular longitudinal studies can provide useful guidance about how best to deal with missing data from their study.

Timeliness: Determining the long-term effects of mental health disorders experienced by children requires us to wait until study participants have grown up. For example, we can see the longer-term effects of low levels of wellbeing during childhood for study participants born in the 1950s, but not (yet) for those born in the 2000s. When using data from older studies, it is important for researchers to consider how to relate their findings to generations growing up today.

Find out more about the challenges of longitudinal data in the Introduction to longitudinal studies module.

CLOSER studies to consider

Avon Longitudinal Study of Parents and Children

The Avon Longitudinal Study of Parents and Children asked respondents (mothers and children) whether they had had any psychiatric problems, depression, or eating disorders such as anorexia and bulimia, as well as about indicators of low levels of wellbeing. It also asked about children’s feelings about school, including whether they were happy, frightened or being bullied.

Southampton Women’s Survey

This survey asked pregnant women about their mental wellbeing during pregnancy, including whether they felt sad, tired or less self-confident, and about their sleep patterns.

1958 National Child Development Study

This national birth cohort study asked mothers for their opinion on their child’s personality and temperament. The questions covered elements including whether they thought their child was being bullied, had difficulty concentrating, was worried, miserable or irritable, or preferred to do things alone.

Millennium Cohort Study

This national birth cohort study has been following the lives of around 19,000 children born in the UK in 2000-01. Unlike the older studies it has collected information on the mental health and wellbeing of its participants from the start. This has provided valuable data for research into the mental health of the participants who are now teenagers.

Understanding Society (The UK Household Longitudinal Study)

Understanding Society is a panel survey of households with yearly interviews. Adult household members (age 16 or older) are interviewed and the same individuals are re-interviewed in successive years to see how things have changed. Children aged 10-15 years are asked to complete a short self-completion youth questionnaire. Adult study participants are asked questions about their general physical and mental health, life satisfaction, physical activity, sleep quality, smoking and alcohol consumption. Participants are also asked about their social networks, family relationships, friendships, and community interaction. Children are asked about exercise, nutrition, risky behaviours, social networks and their caring responsibilities.