Administrative data

Administrative data is the term used to describe everyday data about individuals collected by government departments and agencies. Examples include exam results, benefit receipt and National Insurance payments.

Attrition

Attrition is the discontinued participation of study participants in a longitudinal study. Attrition can reflect a range of factors, from the study participant not being traceable to them choosing not to take part when contacted. Attrition is problematic both because it can lead to bias in the study findings (if the attrition is higher among some groups than others) and because it reduces the size of the sample.

Body mass index

Body mass index is a measure used to assess if an individual is a healthy weight for their height. It is calculated by dividing the individual’s weight by the square of their height, and it is typically represented in units of kg/m².
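The formula translates directly into code. Here is a minimal Python sketch, assuming weight is given in kilograms and height in metres. The function name `bmi` is our own.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m²: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# A person weighing 70 kg who is 1.75 m tall
print(round(bmi(70, 1.75), 1))  # 22.9
```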

Cohort studies

Cohort studies are concerned with charting the lives of groups of individuals who experience the same life events within a given time period. The best known examples are birth cohort studies, which follow a group of people born in a particular period.

Complete case analysis

Complete case analysis is the term used to describe a statistical analysis that only includes participants for whom we have no missing data on the variables of interest. Participants with any missing data on those variables are excluded.
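Here is a minimal sketch of complete case selection (sometimes called listwise deletion) in Python. The records, the variable names and the helper `complete_cases` are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical participant records; None marks a missing value
participants = [
    {"id": 1, "age": 10, "happiness": 36},
    {"id": 2, "age": 11, "happiness": None},
    {"id": 3, "age": None, "happiness": 33},
    {"id": 4, "age": 12, "happiness": 38},
]


def complete_cases(rows, variables):
    """Keep only rows with an observed (non-None) value on every listed variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


analysis_sample = complete_cases(participants, ["age", "happiness"])
print([row["id"] for row in analysis_sample])  # [1, 4]
```

Participant 2 would be retained in an analysis that used only `age`. Whether a case counts as complete depends on which variables are of interest.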

Conditioning

Conditioning refers to the process whereby participants’ answers to some questions may be influenced by their participation in the study; in other words, their responses are ‘conditioned’ by their being members of a longitudinal study. Examples would include study respondents answering questions differently or even behaving differently as a result of their participation in the study.

Confounding

Confounding occurs where the relationship between independent and dependent variables is distorted by one or more additional, and sometimes unmeasured, variables. A confounding variable must be associated with both the independent and dependent variables but must not be an intermediate step in the relationship between the two (i.e. not on the causal pathway).

For example, we know that physical exercise (an independent variable) can reduce a person’s risk of cardiovascular disease (a dependent variable). Age is a confounder of that relationship because it is associated with physical exercise (but is not caused by it) and is also associated with cardiovascular disease. See also ‘unobserved heterogeneity’, below.
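The distortion can be illustrated with invented numbers. In the hypothetical counts below, older people exercise less and are also at higher risk of cardiovascular disease. Within each age group, exercisers have 0.75 times the risk of non-exercisers. Pooling the groups and ignoring age, however, produces a crude risk ratio of about 0.35, which exaggerates the benefit of exercise. Stratifying by the confounder, as sketched here in Python, is one simple way to adjust for it. Regression adjustment is another. All names and counts are illustrative only.

```python
# Hypothetical counts: (cases of cardiovascular disease, number of people)
data = {
    "younger": {"exercise": (3, 80), "no_exercise": (1, 20)},
    "older": {"exercise": (3, 20), "no_exercise": (16, 80)},
}


def risk_ratio(exposed, unexposed):
    """Risk among the exposed divided by risk among the unexposed."""
    (a, n1), (b, n0) = exposed, unexposed
    return (a / n1) / (b / n0)


def pooled(group):
    """Add up (cases, people) across age groups, ignoring age entirely."""
    cases = sum(data[age][group][0] for age in data)
    people = sum(data[age][group][1] for age in data)
    return cases, people


crude = risk_ratio(pooled("exercise"), pooled("no_exercise"))
by_age = {age: risk_ratio(d["exercise"], d["no_exercise"]) for age, d in data.items()}

print(f"crude risk ratio: {crude:.2f}")  # 0.35
for age, rr in by_age.items():
    print(f"{age}: {rr:.2f}")  # 0.75 in both age groups
```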

Cross-sectional surveys

Cross-sectional surveys involve interviewing a fresh sample of people each time they are carried out. Some cross-sectional studies are repeated regularly and can include a large number of repeat questions (questions asked on each survey round).

Data harmonisation

Data harmonisation involves retrospectively adjusting data collected by different surveys to make it possible to compare the data that was collected. This enables researchers to make comparisons both within and across studies. Repeating the same longitudinal analysis across a number of studies allows researchers to test whether results are consistent across studies, or differ in response to changing social conditions.

Data imputation

Data imputation is a technique for replacing missing data with an alternative estimate. There are a number of different approaches, including mean substitution and model-based multivariate approaches.
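Mean substitution is the simplest of the approaches mentioned, and it can be sketched in a few lines of Python. The helper `mean_impute` is hypothetical and treats `None` as missing. Filling every gap with the same value understates the variability of the variable. That is one reason model-based methods such as multiple imputation are usually preferred.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


scores = [10, None, 14, 12, None]
print(mean_impute(scores))  # [10, 12, 14, 12, 12]
```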

Data linkage

Data linkage simply means connecting two or more sources of administrative, educational, geographic, health or survey data relating to the same individual for research and statistical purposes. For example, linking housing or income data to exam results data could be used to investigate the impact of socioeconomic factors on educational outcomes.
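In the simplest case, linkage is deterministic: records are matched on an identifier shared by both sources. The Python sketch below assumes such an identifier exists. The datasets, the field names and the `link` helper are hypothetical. In practice, linkage often has to match on names, dates of birth or addresses instead, sometimes probabilistically. It is also subject to consent and data-security requirements that this example ignores.

```python
# Hypothetical extracts from two sources, keyed on a shared person identifier
survey = {
    "P001": {"household_income": 28000},
    "P002": {"household_income": 41000},
    "P003": {"household_income": 19500},
}
exam_results = {
    "P001": {"exam_points": 52},
    "P003": {"exam_points": 44},
    "P004": {"exam_points": 60},
}


def link(left, right):
    """Deterministic linkage: combine records whose identifier appears in both sources."""
    shared_ids = left.keys() & right.keys()
    return {pid: {**left[pid], **right[pid]} for pid in sorted(shared_ids)}


linked = link(survey, exam_results)
for pid, record in linked.items():
    print(pid, record)
```

Only P001 and P003 appear in both sources, so only they are linked.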

Dummy variables

Dummy variables, also called indicator variables, are sets of dichotomous (two-category) variables we create to enable subgroup comparisons when we are analysing a categorical variable with three or more categories.
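For a categorical variable with k categories, k − 1 dummy variables are usually created. The omitted category becomes the reference group. Including all k dummies alongside a constant would make the model perfectly collinear. Here is a minimal Python sketch, with hypothetical category labels and a hypothetical `dummy_code` helper.

```python
def dummy_code(values, reference):
    """Create one 0/1 indicator per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


education = ["degree", "school", "other", "school", "degree"]
dummies = dummy_code(education, reference="school")
print(dummies)
# {'degree': [1, 0, 0, 0, 1], 'other': [0, 0, 1, 0, 0]}
```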

General ability

General ability is a term used to describe cognitive ability, and is sometimes used as a proxy for intelligence quotient (IQ) scores.

Heterogeneity

Heterogeneity is a term that refers to differences, most commonly differences in characteristics between study participants or samples. It is the opposite of homogeneity, which is the term used when participants share the same characteristics. Where there are differences between study designs, this is sometimes referred to as methodological heterogeneity. Both participant and methodological differences can cause divergences between the findings of individual studies, and if these divergences are greater than would be expected by chance alone, we call this statistical heterogeneity. See also: unobserved heterogeneity.

Household panel surveys

Household panel surveys collect information about the whole household at each wave of data collection, to allow individuals to be viewed in the context of their overall household. To remain representative of the population of households as a whole, studies will typically have rules governing how new entrants to the household are added to the study.

Kurtosis

Kurtosis is sometimes described as a measure of ‘tailedness’. It is a characteristic of the distribution of observations on a variable and denotes the heaviness of the distribution’s tails. To put it another way, it is a measure of how thin or fat the lower and upper ends of a distribution are.
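One common way to compute it is the moment-based excess kurtosis, m4 / m2² − 3, where m2 and m4 are the second and fourth moments about the mean. A minimal Python sketch follows. The function name is our own, and statistical packages often apply small-sample corrections that give slightly different values.

```python
def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3.

    About 0 for normally distributed data; positive values indicate heavier
    tails than the normal distribution, negative values lighter tails.
    """
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3


print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))      # -1.3 (light tails)
print(round(excess_kurtosis([0] * 8 + [-10, 10]), 2))  # 2.0 (heavy tails)
```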

Longitudinal studies

Longitudinal studies gather data about the same individuals (‘study participants’) repeatedly over a period of time, in some cases from birth until old age. Many longitudinal studies focus upon individuals, but some look at whole households or organisations.

Non-response bias

Non-response bias is a type of bias introduced when those who participate in a study differ from those who do not in a way that is not random (for example, if attrition rates are particularly high among certain sub-groups). Non-random attrition over time can mean that the sample no longer remains representative of the original population being studied. Two approaches are typically adopted to deal with this type of missing data: weighting survey responses to re-balance the sample, and imputing values for the missing information.

Observational studies

Observational studies focus on observing the characteristics of a particular sample without attempting to influence any aspects of the participants’ lives. They can be contrasted with experimental studies, which apply a specific ‘treatment’ to some participants in order to understand its effect.

Panel studies

Panel studies follow the same individuals over time. They vary considerably in scope and scale. Examples include online opinion panels and short-term studies whereby people are followed up once or twice after an initial interview.

Percentile

A percentile is a measure that allows us to explore the distribution of data on a variable. The pth percentile is the value on a variable below which p% of the individuals or observations fall. The value that splits the observations evenly, i.e. 50% of the observations on a variable fall below this value and 50% above, is called the 50th percentile or, more commonly, the median.
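There are several conventions for computing percentiles from a finite sample. The sketch below uses the nearest-rank method, chosen here for simplicity, and compares it with the median from Python's standard `statistics` module. The data values and the helper name are made up.

```python
import math
from statistics import median


def percentile_nearest_rank(data, p):
    """Return the value at ordinal rank ceil(p/100 * n) in the sorted data (0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


scores = [15, 20, 35, 40, 50]
print(percentile_nearest_rank(scores, 30))  # 20
print(percentile_nearest_rank(scores, 50))  # 35
print(median(scores))                       # 35
```

With an even number of observations, `statistics.median` averages the two middle values, whereas the nearest-rank method always returns an observed value, so the two can differ.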

Prospective study

In prospective studies, individuals are followed over time and data about them is collected as their characteristics or circumstances change.

Recall error or bias

Recall error or bias describes the errors that can occur when study participants are asked to recall events or experiences from the past. It can take a number of forms: participants might completely forget something happened, or misremember aspects of it, such as when it happened, how long it lasted, or other details. Certain questions are more susceptible to recall bias than others. For example, it is usually easy for a person to accurately recall the date they got married, but it is much harder to accurately recall how much they earned in a particular job, or what their mood was at a particular time.

Record linkage

Record linkage studies involve linking together administrative records (for example, benefit receipts or census records) for the same individuals over time.

Reference group

A reference group is the category of a categorical variable with which the other categories are compared. It is a term that is commonly used in the context of regression analyses in which categorical variables are being modelled.
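The reference group gives regression coefficients their meaning. Suppose an outcome is regressed by ordinary least squares, with a constant, on the dummy variables for a single categorical variable. The constant then equals the mean outcome in the reference group, and each coefficient equals the difference between that category's mean and the reference mean. The Python sketch below computes those differences directly from hypothetical scores. The group labels and the helper name are illustrative only.

```python
from statistics import mean

# Hypothetical outcome scores for each category of a three-category variable
scores_by_group = {
    "school": [30, 34, 32],  # chosen as the reference group
    "degree": [36, 38, 37],
    "other": [31, 33],
}


def contrasts_with_reference(groups, reference):
    """Each category's mean minus the reference category's mean."""
    reference_mean = mean(groups[reference])
    return {name: mean(values) - reference_mean
            for name, values in groups.items() if name != reference}


print(contrasts_with_reference(scores_by_group, "school"))
# {'degree': 5, 'other': 0}
```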

Residuals

Residuals are the differences between the observed values of the dependent variable and the values predicted (fitted) by the model from the constant and predictors, i.e. the distance of each actual value from the estimated value on the regression line. Residuals are the sample estimates of the unobserved error term.
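For a simple linear regression, the calculation can be sketched in plain Python. First fit the line by ordinary least squares, then subtract each fitted value from the corresponding observed value. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residuals(xs, ys):
    """Observed value minus fitted value for each observation."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
res = residuals(xs, ys)
print([round(r, 2) for r in res])  # [0.1, 0.2, -0.7, 0.4]
```

When the model includes a constant, ordinary least squares residuals always sum to zero (up to rounding), which is a handy check.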

Respondent burden

Respondent burden is a catch-all phrase that describes the perceived burden faced by participants as a result of their being involved in a study. It could include time spent taking part in the interview and the inconvenience this may cause, as well as any difficulties faced as a result of the content of the interview.

Retrospective study

In retrospective studies, individuals are sampled and information is collected about their past. This might be through interviews in which participants are asked to recall important events, or by identifying relevant administrative data to fill in information on past events and circumstances.

Sample

A sample is a subset of a population that is used to represent the population as a whole. This reflects the fact that it is often not practical or necessary to survey every member of a particular population. In the case of birth cohort studies, the larger ‘population’ from which the sample is drawn comprises those born in a particular period. In the case of a household panel study like Understanding Society, the larger population from which the sample was drawn comprised all residential addresses in the UK.

Sampling frame

A sampling frame is a list of the target population from which potential study participants can be selected.

Skewness

Skewness is a measure of how asymmetrical the distribution of observations on a variable is. If the distribution has a more pronounced/longer tail at the upper end of the distribution (right-hand side), we say that the distribution is positively skewed. If the tail is more pronounced/longer at the lower end (left-hand side), we say that it is negatively skewed.
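One widely used measure is the moment-based skewness m3 / m2^1.5, where m2 and m3 are the second and third moments about the mean. In the minimal Python sketch below, a single large value that stretches the upper (right-hand) tail produces a positive result, and mirroring the data produces a negative one. The function name and data are hypothetical.

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5.

    Positive when the right-hand (upper) tail is longer, negative when the
    left-hand (lower) tail is longer, and 0 for a symmetric distribution.
    """
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


values = [1, 2, 2, 3, 10]  # one large value stretches the upper tail
print(skewness(values) > 0)                # True: positively skewed
print(skewness([-x for x in values]) < 0)  # True: negatively skewed
print(skewness([1, 2, 3, 4, 5]))           # 0.0: symmetric
```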

Study participants

Study participants are the individuals who are interviewed as part of a longitudinal study.

Survey weights

Survey weights can be used to adjust a survey sample so it is representative of the survey population as a whole. They may be used to reduce the impact of attrition on the sample, or to correct for certain groups being over-sampled.
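A simple form of weighting is post-stratification. Each respondent is weighted by their group's share of the population divided by its share of the sample, so that over-represented groups count for less. The Python sketch below uses invented data and assumes the population shares are known. Weights supplied with real studies are typically constructed in more elaborate ways, combining design and non-response adjustments.

```python
from collections import Counter


def poststratification_weights(groups, population_shares):
    """Weight each respondent by population share / sample share of their group."""
    n = len(groups)
    counts = Counter(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]


def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample of 10 in which girls are over-represented (6 of 10)
groups = ["girl"] * 6 + ["boy"] * 4
scores = [10] * 6 + [20] * 4
weights = poststratification_weights(groups, {"girl": 0.5, "boy": 0.5})

print(sum(scores) / len(scores))                 # 14.0 (unweighted)
print(round(weighted_mean(scores, weights), 6))  # 15.0 (weighted)
```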

Sweep

The term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 sweep of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term wave often has the same meaning.

Target population

The population of people that the study team wants to research, and from which a sample will be drawn.

Tracing (or tracking)

Tracing (or tracking) describes the process by which study teams attempt to locate participants who have moved from the address at which they were last interviewed.

Unobserved heterogeneity

Unobserved heterogeneity is a term that describes the existence of unmeasured (unobserved) differences between study participants or samples that are associated with the (observed) variables of interest. The existence of unobserved variables means that statistical findings based on the observed data may be incorrect.

Variables

Variables is the term that tends to be used to describe data items within a dataset. So, for example, a questionnaire might collect information about a participant’s job (its title, whether it involves any supervision, the type of organisation they work for and so on). This information would then be coded using a code-frame and the results made available in the dataset in the form of a variable about occupation. In data analysis variables can be described as ‘dependent’ and ‘independent’, with the dependent variable being a particular outcome of interest (for example, high attainment at school) and the independent variables being the variables that might have a bearing on this outcome (for example, parental education, gender and so on).

Wave

The term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 wave of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term sweep often has the same meaning.

What are the effects of social media use on adolescent well-being?

Is social media use harming young people's mental health? Researchers have used longitudinal data to track how increases in social media use can affect adolescent well-being.

Key finding

High levels of social media use in early adolescence were shown to have implications for well-being in later adolescence, particularly for girls.

About the research

Researchers from the University of Essex’s Institute for Social and Economic Research, in collaboration with UCL’s Department of Epidemiology and Public Health, analysed five waves of data from the UK Household Longitudinal Study (Understanding Society) to assess how frequency of social media use among 10- to 15-year-olds affected their mental well-being.

The researchers wanted to see whether there was a link between changes in social media interaction over time and adolescents’ happiness and well-being. In particular, they aimed to see whether there were any differences between boys and girls (controlling for ethnicity, parental education and parents’ marital status).

Prior research has shown that screen-based media interaction increases as young people get older, whilst well-being decreases throughout adolescence, and that these changes differ by gender. However, although previous studies have controlled for age and gender, they have not focused on how associations with well-being change over time or differ between girls and boys.

Research questions

  • Is there a relationship between social media interaction and trajectories of well-being over time among adolescents in the UK?
  • Does the association between social media interaction and well-being trajectories differ by gender?

Studies used

Understanding Society: The UK Household Longitudinal Study

Understanding Society follows 40,000 households from across the UK. In this study, researchers included data from five waves of the youth questionnaire, comprising responses from those aged 10 to 15 years.

Data and definitions

Social media interaction

Each participant’s engagement with social media was established through two questions asking: (i) “Do you belong to a social website such as Bebo, Facebook or MySpace?” and (ii) “How many hours do you spend chatting or interacting with friends through a social website like that on a normal school day?”

Well-being

Two measures of well-being were derived from items in the survey questionnaire. Happiness scores were derived from respondents’ total score on six questions relating to different domains of their life (friends, family, appearance, school, schoolwork and life as a whole). The average happiness score was 35.03 (out of 42) for girls and 35.27 for boys.

Negative well-being was measured using the 20 items of the Strengths and Difficulties Questionnaire (SDQ) that cover hyperactivity/inattention, emotional symptoms, conduct problems and peer relationship problems. Responses for these items were summed to obtain an SDQ total difficulties score. For girls, the average SDQ total difficulties score was 10.61 (out of 40); for boys it was 10.65.
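The scoring rule described here can be sketched in Python. The sketch assumes each item has already been scored 0, 1 or 2 (the SDQ's ‘not true’, ‘somewhat true’ and ‘certainly true’ responses), with reverse-worded items recoded so that a higher score always means more difficulty. The function name and dictionary keys are hypothetical.

```python
SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")


def sdq_total_difficulties(items_by_subscale):
    """Sum the 20 difficulty items (5 per subscale, each scored 0-2); range 0-40.

    Assumes reverse-worded items have already been recoded so that a higher
    score always means more difficulty. The prosocial subscale is not included.
    """
    total = 0
    for name in SUBSCALES:
        items = items_by_subscale[name]
        if len(items) != 5 or any(score not in (0, 1, 2) for score in items):
            raise ValueError(f"{name}: expected five item scores of 0, 1 or 2")
        total += sum(items)
    return total


example = {name: [0, 1, 2, 1, 0] for name in SUBSCALES}
print(sdq_total_difficulties(example))  # 16
```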

Key findings

The findings of the analysis confirm that social media usage increases with age for both boys and girls, with girls’ usage exceeding that of boys throughout adolescence.

Overall, 23% of girls and 28% of boys did not have a social media profile, whilst 10% of girls and 2% of boys spent “4 or more hours per day” interacting via social media.

The analysis indicates that, at age 10, greater social media interaction was correlated with lower levels of happiness and higher levels of socio-emotional difficulties for girls. For boys at the same age, greater social media interaction was correlated with higher levels of socio-emotional difficulties, although not with happiness.

The models also indicated that adolescents with high levels of social media use at age 10 had slower rates of change in usage as they moved through adolescence, relative to those with lower levels of social media use. Similarly, those reporting greater well-being at age 10 experienced smaller changes in happiness and socio-emotional difficulties as they got older, compared with those reporting lower levels of well-being.

For girls, increased interaction with social media was, furthermore, associated with greater increases in socio-emotional difficulties with age, whilst no such association was seen for boys.

Results from this study show that, while socio-emotional difficulties decreased with age for boys, they increased for girls. Worse well-being was associated with greater social media interaction at age 10, and for girls, changes in social media interaction and well-being over time were also associated. Most importantly, greater interaction on social media at age 10 was associated with worsening socio-emotional difficulties with age among girls.

This is one of the first studies to show such apparent gender differences in the association between social media interaction and well-being.

Advantages and challenges of using longitudinal data to study adolescent well-being and social media use

The paper uses a nationally representative sample of young people, and the longitudinal nature of the study allows for the statistical modelling of changes in patterns of behaviour and well-being over time. In particular, it can reveal specific changes during the important developmental life stage of adolescence and during a time of increasing social media usage within the population as a whole and, particularly, amongst young people.

Nonetheless, there are considerable methodological and logistical challenges associated with this type of research. Primary amongst these is the use of panel data, which does not allow for the modelling of individual change over time but relies instead on change by age averaged across individuals. The variables used to measure social media interactions focus only on active ‘chatting with friends’ and therefore do not include other forms of active or passive social media use. Moreover, they refer to patterns of use ‘on a normal school day’ and so do not take into account young people’s social media use at weekends or during holidays, which is likely to be higher.

Implications for policy and practice

Adolescents are increasingly engaged in social media, and the long-term effects on well-being are not fully known. The findings of this study indicate that there is a significant association between social media use and mental well-being for adolescent girls, whilst for boys other factors are likely to be more influential in contributing to the reduction in well-being during adolescence. In particular, boys may interact more through gaming platforms. The paper acknowledges this, but such interaction is not captured by the analysis presented here.

It must be borne in mind, however, that the data refers to social media usage from 2009 to 2015, which is likely to have changed in the years since, given the rapid advances in technology and young people’s shifting interaction with changing platforms.

Nonetheless, this study contributes to the evidence on the impact of social media and screen use on young people’s health. Whilst this is a key priority for policy making in public health, debates are ongoing as to the appropriate direction for regulation or intervention in this area. Longitudinal evidence provides an important opportunity to better understand potential causal links between social media use and mental well-being in adolescence.


Access the paper

Booker, C.L., Kelly, Y.J. & Sacker, A. Gender differences in the associations between age trends of social media interaction and well-being among 10-15 year olds in the UK. BMC Public Health 18, 321 (2018).

Discussion topics

  • Why might it be that social media use has a stronger association with well-being for adolescent girls compared with boys?
  • What are the benefits of increased social media use amongst adolescents? Do these outweigh the potential impact on mental well-being?
  • What other forms of social interaction might influence adolescents’ mental well-being?