Administrative data

Administrative data is the term used to describe everyday data about individuals collected by government departments and agencies. Examples include exam results, benefit receipt and National Insurance payments.


Attrition

Attrition is the discontinued participation of study participants in a longitudinal study. Attrition can reflect a range of factors, from the study participant not being traceable to them choosing not to take part when contacted. Attrition is problematic both because it can lead to bias in the study findings (if the attrition is higher among some groups than others) and because it reduces the size of the sample.

Body mass index

Body mass index is a measure used to assess whether an individual is a healthy weight for their height. It is calculated by dividing the individual’s weight in kilograms by the square of their height in metres, and is expressed in units of kg/m².
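As a worked illustration of the calculation, here is a minimal sketch in Python. The function name `body_mass_index` and the example figures are illustrative assumptions and are not taken from any study.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return weight in kilograms divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Hypothetical adult weighing 70 kg and standing 1.75 m tall.
example_bmi = body_mass_index(70, 1.75)
print(round(example_bmi, 1))  # 22.9
```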

Cohort studies

Cohort studies are concerned with charting the lives of groups of individuals who experience the same life events within a given time period. The best known examples are birth cohort studies, which follow a group of people born in a particular period.

Complete case analysis

Complete case analysis is the term used to describe a statistical analysis that only includes participants for whom we have no missing data on the variables of interest. Participants with any missing data are excluded.
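The idea can be sketched in a few lines of Python. The records, variable names and the use of `None` to mark a missing answer are all hypothetical choices made for illustration.

```python
# Hypothetical records; None marks a missing answer.
participants = [
    {"id": 1, "age": 34, "income": 28000},
    {"id": 2, "age": None, "income": 31000},
    {"id": 3, "age": 51, "income": None},
    {"id": 4, "age": 45, "income": 39000},
]
variables_of_interest = ["age", "income"]

# Keep only participants with no missing values on the variables of interest.
complete_cases = [
    p for p in participants
    if all(p[v] is not None for v in variables_of_interest)
]
print([p["id"] for p in complete_cases])  # [1, 4]
```

In this made-up example, half of the sample is lost, which shows why complete case analysis can sharply reduce the number of participants available for analysis.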


Conditioning

Conditioning refers to the process whereby participants’ answers to some questions may be influenced by their participation in the study – in other words, their responses are ‘conditioned’ by their being members of a longitudinal study. Examples would include study respondents answering questions differently or even behaving differently as a result of their participation in the study.


Confounding

Confounding occurs where the relationship between independent and dependent variables is distorted by one or more additional, and sometimes unmeasured, variables. A confounding variable must be associated with both the independent and dependent variables but must not be an intermediate step in the relationship between the two (i.e. not on the causal pathway).

For example, we know that physical exercise (an independent variable) can reduce a person’s risk of cardiovascular disease (a dependent variable). We can say that age is a confounder of that relationship as it is associated with, but not caused by, physical exercise and is also associated with cardiovascular disease. See also ‘unobserved heterogeneity’, below.


Cross-sectional surveys

Cross-sectional surveys involve interviewing a fresh sample of people each time they are carried out. Some cross-sectional studies are repeated regularly and can include a large number of repeat questions (questions asked on each survey round).

Data harmonisation

Data harmonisation involves retrospectively adjusting data collected by different surveys to make it possible to compare the data that was collected. This enables researchers to make comparisons both within and across studies. Repeating the same longitudinal analysis across a number of studies allows researchers to test whether results are consistent across studies, or differ in response to changing social conditions.

Data imputation

Data imputation is a technique for replacing missing data with an alternative estimate. There are a number of different approaches, including mean substitution and model-based multivariate approaches.
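Mean substitution, the simplest of these approaches, can be sketched as follows. The scores are made up, and `None` is an assumed marker for a missing value.

```python
from statistics import mean

# Hypothetical test scores; None marks a missing value.
scores = [12, None, 15, 9, None, 14]

# Average of the values that were actually observed.
observed = [s for s in scores if s is not None]
observed_mean = mean(observed)  # (12 + 15 + 9 + 14) / 4 = 12.5

# Replace each missing value with the observed mean.
imputed = [observed_mean if s is None else s for s in scores]
print(imputed)  # [12, 12.5, 15, 9, 12.5, 14]
```

Because every missing value receives the same estimate, mean substitution understates how much the variable really varies, which is one reason model-based approaches are often preferred.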

Data linkage

Data linkage simply means connecting two or more sources of administrative, educational, geographic, health or survey data relating to the same individual for research and statistical purposes. For example, linking housing or income data to exam results data could be used to investigate the impact of socioeconomic factors on educational outcomes.
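A minimal sketch of the idea in Python, assuming both sources share a common participant identifier. The identifiers, field names and values below are hypothetical.

```python
# Hypothetical sources keyed by the same (made-up) participant identifier.
household_income = {"P001": 24000, "P002": 41000, "P003": 18500}
exam_results = {"P001": "B", "P003": "C", "P004": "A"}

# Keep only individuals who appear in both sources.
linked = {
    pid: {"income": household_income[pid], "exam_grade": exam_results[pid]}
    for pid in household_income
    if pid in exam_results
}
print(sorted(linked))  # ['P001', 'P003']
```

In practice, sources often lack a shared identifier and records must be matched on details such as name and date of birth, which this sketch does not attempt.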

Dummy variables

Dummy variables, also called indicator variables, are sets of dichotomous (two-category) variables we create to enable subgroup comparisons when we are analysing a categorical variable with three or more categories.
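The sketch below shows one common way of building dummy variables in Python, with one category left out as the reference group. The variable `housing_tenure` and its categories are hypothetical.

```python
# Hypothetical three-category variable.
housing_tenure = ["owner", "private renter", "social renter", "owner"]

reference_group = "owner"  # the category the others are compared against
other_categories = ["private renter", "social renter"]

# One 0/1 variable per non-reference category.
dummies = {
    category: [1 if value == category else 0 for value in housing_tenure]
    for category in other_categories
}
print(dummies["private renter"])  # [0, 1, 0, 0]
print(dummies["social renter"])   # [0, 0, 1, 0]
```

Participants in the reference group score 0 on every dummy variable, so no separate variable is needed for them.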

General ability

General ability is a term used to describe cognitive ability, and is sometimes used as a proxy for intelligence quotient (IQ) scores.


Heterogeneity

Heterogeneity is a term that refers to differences, most commonly differences in characteristics between study participants or samples. It is the opposite of homogeneity, which is the term used when participants share the same characteristics. Where there are differences between study designs, this is sometimes referred to as methodological heterogeneity. Both participant and methodological differences can cause divergences between the findings of individual studies, and if these are greater than would be expected by chance alone, we call this statistical heterogeneity. See also: unobserved heterogeneity.

Household panel surveys

Household panel surveys collect information about the whole household at each wave of data collection, to allow individuals to be viewed in the context of their overall household. To remain representative of the population of households as a whole, studies will typically have rules governing how new entrants to the household are added to the study.


Kurtosis

Kurtosis is sometimes described as a measure of ‘tailedness’. It is a characteristic of the distribution of observations on a variable and denotes the heaviness of the distribution’s tails. To put it another way, it is a measure of how thin or fat the lower and upper ends of a distribution are.

Longitudinal studies

Longitudinal studies gather data about the same individuals (‘study participants’) repeatedly over a period of time, in some cases from birth until old age. Many longitudinal studies focus upon individuals, but some look at whole households or organisations.

Non-response bias

Non-response bias is a type of bias introduced when those who participate in a study differ from those who do not in a way that is not random (for example, if attrition rates are particularly high among certain sub-groups). Non-random attrition over time can mean that the sample no longer remains representative of the original population being studied. Two approaches are typically adopted to deal with this type of missing data: weighting survey responses to re-balance the sample, and imputing values for the missing information.

Observational studies

Observational studies focus on observing the characteristics of a particular sample without attempting to influence any aspects of the participants’ lives. They can be contrasted with experimental studies, which apply a specific ‘treatment’ to some participants in order to understand its effect.

Panel studies

Panel studies follow the same individuals over time. They vary considerably in scope and scale. Examples include online opinion panels and short-term studies whereby people are followed up once or twice after an initial interview.


Percentile

A percentile is a measure that allows us to explore the distribution of data on a variable. It denotes the percentage of individuals or observations that fall below a specified value on a variable. The value that splits the number of observations evenly, i.e. 50% of the observations on a variable fall below this value and 50% above, is called the 50th percentile or, more commonly, the median.
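To make this concrete, here is a small Python sketch. The scores are made up, and `percent_below` is a hypothetical helper that counts observations strictly below a chosen value.

```python
from statistics import median

# Hypothetical scores for nine participants.
scores = [3, 5, 7, 8, 10, 12, 15, 18, 21]

def percent_below(values, threshold):
    """Percentage of observations falling strictly below the threshold."""
    return 100 * sum(v < threshold for v in values) / len(values)

print(median(scores))                       # 10
print(round(percent_below(scores, 15), 1))  # 66.7
```

Here six of the nine scores fall below 15, so a score of 15 sits at roughly the 67th percentile of this small sample.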

Prospective study

In prospective studies, individuals are followed over time and data about them is collected as their characteristics or circumstances change.

Recall error or bias

Recall error or bias describes the errors that can occur when study participants are asked to recall events or experiences from the past. It can take a number of forms – participants might completely forget something happened, or misremember aspects of it, such as when it happened, how long it lasted, or other details. Certain questions are more susceptible to recall bias than others. For example, it is usually easy for a person to accurately recall the date they got married, but it is much harder to accurately recall how much they earned in a particular job, or what their mood was at a particular time.

Record linkage

Record linkage studies involve linking together administrative records (for example, benefit receipts or census records) for the same individuals over time.

Reference group

A reference group is a category on a categorical variable to which we compare other values. It is a term that is commonly used in the context of regression analyses in which categorical variables are being modelled.


Residuals

Residuals are the differences between the observed values and the values predicted by the model (from the constant and the predictors), i.e. the distance of each actual value from the estimated value on the regression line. They represent the error in the model’s predictions.
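A residual is the observed value minus the value predicted by the regression line, as the sketch below shows. The constant, slope and data points are hypothetical, and the line has not been estimated from these data.

```python
# Hypothetical regression line: predicted = constant + slope * x.
constant, slope = 2.0, 0.5

x_values = [2, 4, 6, 8]
observed = [3.5, 3.5, 5.5, 6.0]

# Value on the regression line for each x.
predicted = [constant + slope * x for x in x_values]  # [3.0, 4.0, 5.0, 6.0]

# Residual = observed value minus predicted value.
residuals = [o - p for o, p in zip(observed, predicted)]
print(residuals)  # [0.5, -0.5, 0.5, 0.0]
```

A positive residual means the observation lies above the line, a negative residual means it lies below, and zero means the line predicts it exactly.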

Respondent burden

Respondent burden is a catch-all phrase that describes the perceived burden faced by participants as a result of their being involved in a study. It could include time spent taking part in the interview and any inconvenience this may cause, as well as any difficulties faced as a result of the content of the interview.

Retrospective study

In retrospective studies, individuals are sampled and information is collected about their past. This might be through interviews in which participants are asked to recall important events, or by identifying relevant administrative data to fill in information on past events and circumstances.


Sample

A sample is a subset of a population that is used to represent the population as a whole. This reflects the fact that it is often not practical or necessary to survey every member of a particular population. In the case of birth cohort studies, the larger ‘population’ from which the sample is drawn comprises those born in a particular period. In the case of a household panel study like Understanding Society, the larger population from which the sample was drawn comprised all residential addresses in the UK.

Sampling frame

A sampling frame is a list of the target population from which potential study participants can be selected.


Skewness

Skewness is a measure of how asymmetrical the distribution of observations on a variable is. If the distribution has a more pronounced/longer tail at the upper end (right-hand side), we say that the distribution is positively skewed. If the tail is more pronounced/longer at the lower end (left-hand side), we say that it is negatively skewed.
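One standard way of quantifying skewness is the moment-based coefficient, sketched below in Python. Under this convention a longer upper (right-hand) tail gives a positive value and a longer lower (left-hand) tail gives a negative value. The helper name `skewness` and the data are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])  # population variance
    m3 = fmean([(v - m) ** 3 for v in values])  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data with one very high value (long upper tail).
long_upper_tail = [1, 2, 2, 3, 3, 3, 4, 20]
# The mirror image has a long lower tail.
long_lower_tail = [-v for v in long_upper_tail]

print(skewness(long_upper_tail) > 0)  # True
print(skewness(long_lower_tail) < 0)  # True
```

Statistical packages may apply small-sample corrections, so their reported values can differ slightly from this simple version.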

Study participants

Study participants are the individuals who are interviewed as part of a longitudinal study.

Survey weights

Survey weights can be used to adjust a survey sample so it is representative of the survey population as a whole. They may be used to reduce the impact of attrition on the sample, or to correct for certain groups being over-sampled.
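The effect of weighting can be illustrated with a short Python sketch. The respondents, the `satisfied` indicator and the weights are all made up; real weights are supplied by the study team.

```python
# Hypothetical sample in which renters were over-sampled,
# so each renter carries a smaller weight than each owner.
respondents = [
    {"tenure": "owner", "satisfied": 1, "weight": 1.5},
    {"tenure": "owner", "satisfied": 1, "weight": 1.5},
    {"tenure": "renter", "satisfied": 0, "weight": 0.5},
    {"tenure": "renter", "satisfied": 1, "weight": 0.5},
    {"tenure": "renter", "satisfied": 0, "weight": 0.5},
    {"tenure": "renter", "satisfied": 0, "weight": 0.5},
]

# Unweighted proportion: every respondent counts equally.
unweighted = sum(r["satisfied"] for r in respondents) / len(respondents)

# Weighted proportion: each respondent counts in proportion to their weight.
total_weight = sum(r["weight"] for r in respondents)
weighted = sum(r["satisfied"] * r["weight"] for r in respondents) / total_weight

print(unweighted)  # 0.5
print(weighted)    # 0.7
```

Down-weighting the over-sampled group moves the estimate from 0.5 to 0.7 in this made-up example, closer to what a sample with the population's mix of owners and renters would show.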


Sweep

The term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 sweep of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term wave often has the same meaning.

Target population

The population of people that the study team wants to research, and from which a sample will be drawn.

Tracing (or tracking)

Tracing (or tracking) describes the process by which study teams attempt to locate participants who have moved from the address at which they were last interviewed.

Unobserved heterogeneity

Unobserved heterogeneity is a term that describes the existence of unmeasured (unobserved) differences between study participants or samples that are associated with the (observed) variables of interest. The existence of unobserved variables means that statistical findings based on the observed data may be incorrect.


Variables

Variables is the term that tends to be used to describe data items within a dataset. So, for example, a questionnaire might collect information about a participant’s job (its title, whether it involves any supervision, the type of organisation they work for and so on). This information would then be coded using a code-frame and the results made available in the dataset in the form of a variable about occupation. In data analysis variables can be described as ‘dependent’ and ‘independent’, with the dependent variable being a particular outcome of interest (for example, high attainment at school) and the independent variables being the variables that might have a bearing on this outcome (for example, parental education, gender and so on).


Wave

The term used to refer to a round of data collection in a particular longitudinal study (for example, the age 7 wave of the National Child Development Study refers to the data collection that took place in 1965 when the participants were aged 7). Note that the term sweep often has the same meaning.


Why use longitudinal data to study bullying?

Young people, schools, parents and government are more aware of bullying than ever before. It is an important topic in modern-day policy, practice and academic inquiry.

But to make the strongest case for tackling bullying, campaigners, practitioners and policymakers must prove that bullying isn’t just a part of growing up – that it can have a long-term negative impact on young people's lives.

Longitudinal studies make a unique contribution to our understanding of bullying by tracking its effects right through the course of our lives. The data have been used to understand the long-term consequences of bullying, and find out if different groups are more resilient or susceptible to the damaging effects.

Selected longitudinal evidence on bullying

The scarring effects of childhood bullying are still visible 40 years later

According to findings from the 1958 National Child Development Study, being bullied as a child is associated with a range of negative social, physical and mental health outcomes later in life.

Children with special educational needs are twice as likely to be bullied

Findings from the Millennium Cohort Study have shown that at age 7, 12 per cent of children with special educational needs and 11 per cent of those with a statement of need said they were bullied ‘all of the time’ by other pupils, compared to just 6 per cent of their non-disabled peers.

Being bullied by siblings is linked to being bullied at school

Findings from Understanding Society have shown that the chances of being bullied at school are considerably higher if children are victimised by their siblings.

Bullied children face twice the risk of depression and anxiety

Findings from the Avon Longitudinal Study of Parents and Children have shown that children who are bullied frequently in their early teenage years are two to three times as likely to develop depression and anxiety disorders by age 18.

What information do longitudinal studies collect on bullying?

Many longitudinal studies following younger generations, including the Millennium Cohort Study, the Avon Longitudinal Study of Parents and Children, and Understanding Society, ask a wide range of questions about participants’ experiences of bullying. These questions tend to cover:

  • whether they are the victim or perpetrator of bullying (or both)
  • whether they experience different types of bullying, including physical (hitting, punching, shoving), verbal (name calling, verbal threats), and relational (being left out of activities or friendship groups, being pressured into doing things, or being lied or gossiped about)
  • how often they experience these different types of bullying
  • whether they experience bullying at school, at home or both.

Researchers can use these data alongside the wide range of other information collected from the studies to determine which groups of young people are most at risk of being bullied, and how bullying is related to other areas of their lives, such as educational attainment and health.

However, bullying hasn’t always been as prominent an issue as it is today. This has meant that many older studies did not ask as many questions about bullying when their participants were growing up. For example, when the 1958 National Child Development Study participants were aged 7 and 11, their mothers were asked if their children were bullied and how often. Mothers weren’t asked about different kinds of bullying or whether their child was a bully, and children weren’t asked about their experiences directly.

Does that mean older longitudinal data are not useful for studying bullying? In fact, older studies offer a significant advantage to studies of bullying: their participants have grown up.

Researchers can use data from older longitudinal studies to investigate how childhood experiences of being bullied affect adult life. In this evidence case study, researchers at King’s College London used data from the 1958 cohort to determine that being bullied as a child was associated with a wide range of problems in adulthood, including depression, unemployment and lower life satisfaction.

Anti-bullying charities and practitioners working with young people seized on these findings, as they are some of the best evidence we have that bullying truly does leave a scar for life and cannot be ignored.

Find out more about what information longitudinal studies collect in the Introduction to longitudinal studies module.

How do longitudinal studies collect information on bullying?

Most information about bullying is collected through questionnaires. Study participants might complete the questionnaires themselves, using a computer or pen and paper, or they might be asked the questions by an interviewer. Study teams always consider how sensitive the questions are, and whether participants would be more or less open to discussing their experiences with an interviewer than they would if they answered the questions on a self-completion questionnaire.

It is important to understand who the respondent is when using longitudinal data on bullying. While older studies asked participants’ parents whether their children were bullied, many newer studies ask the children directly. And of course, some studies ask both parents and children – and sometimes even teachers. You might be interested in looking at how children’s, parents’ and teachers’ reports differ.

Find out more about how longitudinal studies are designed, including sampling and the value of different methods and modes of collecting information, in the Study design module.

Advantages of using longitudinal data

There are a number of strengths of longitudinal studies that make them an ideal resource for studying bullying.

Breadth of data available: Longitudinal studies cover a wide range of different areas of life. Victims of bullying often suffer from other problems, and it can be very difficult to unpick the impact of bullying alone. Having information on these other problems allows researchers to take them into account in their analysis. Of course, it is always possible that there are other factors that have not been captured, but longitudinal data cover significantly more than other data sources. Read more about the breadth of data available in the Introduction to longitudinal studies module.

Tracking long-term consequences: Cross-sectional studies can tell us about how many young people are experiencing bullying at a given point in time, and may also be able to differentiate between different groups depending on how much other information they collect. However, what really matters is how this experience shapes the rest of their lives – and longitudinal data can get closest to proving that the effects of bullying last. Read more about the differences between longitudinal and cross-sectional studies in the Introduction to longitudinal studies module.

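Prospective data collection: Past experience of bullying is difficult to remember accurately – it can be an emotionally difficult experience that could cloud people’s memories. Because prospective studies collect information at the time, or soon after, events occur, they are less prone to this kind of recall error than studies that ask participants to look back years later. Read more about prospective study vs retrospective study design in the Study design module.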

Large sample sizes: Many longitudinal studies also have large enough sample sizes to identify particular groups that are at higher risk of being bullied, or those who are more resilient. Read more about longitudinal samples in the Study design module.

Find out more about the strengths of longitudinal data in the Introduction to longitudinal studies module.

Challenges of using longitudinal data

Researchers using longitudinal data to study bullying should be aware of some general challenges.

Being bullied is tough to admit: Bullying can be a traumatic and embarrassing experience. Any survey about bullying (whether longitudinal or not) will struggle with the fact that some people don’t want to report their experiences, or may downplay their severity.

Attrition and missing data: Some study participants drop out over time, and this isn’t always random. This is known as attrition. It is also the case that some participants may choose not to answer every question at every sweep, which results in missing data.

There are analytical methods that researchers can use to deal with attrition and missing data. The teams running particular longitudinal studies can provide useful guidance about how best to deal with missing data from their study.

Timeliness: Determining the long-term effects of childhood bullying requires us to wait until study participants have grown up. For example, we can see the longer-term effects of childhood bullying for study participants born in the 1950s, but not (yet) for those born in the 2000s. When using data from older studies, it is important for researchers to consider how to relate their findings to generations growing up today.

Find out more about the challenges of longitudinal data in the Introduction to longitudinal studies module.

CLOSER studies to consider

Understanding Society

This household panel study has been used to look at bullying among siblings, and how those experiences relate to experiences of school bullying. Because it has boosted samples of ethnic minority and immigrant families, it enables researchers to understand if experiences of bullying are different for these groups.

Millennium Cohort Study

This national birth cohort study has rich data on child development, including mental health and wellbeing, in addition to detailed questions on bullying asked of children, parents and teachers. It over-sampled minority ethnic and disadvantaged children, whose risks of being bullied may differ from those of other children.

Avon Longitudinal Study of Parents and Children

This regional birth cohort study has a strong biomedical focus, allowing for scientifically robust studies of how bullying relates to mental and physical health.

1958 National Child Development Study

This national birth cohort study is the oldest study with data on bullying. Mothers were asked if their children were bullied at ages 7 and 11, and their responses can be related to the study participants’ education, employment, health and wellbeing outcomes throughout adulthood.