All surveys rely on samples, which are selected from a group of interest (often referred to as the ‘target population’).
The first thing study teams need to decide is who the study will focus on.
Think back to the three examples in the last section – each has a different sample population:
Most studies select their sample from within certain geographic limits. This might be for practical or scientific reasons. The geographic limits could be very small, for instance a city or county, or very large, such as the whole of the UK.
The first two examples are known as cohort studies and target specific groups or sections of the population. Cohort study samples share a common experience at a particular point in time. For example, a birth cohort follows children born within a specific period. Other cohorts follow groups of students in the same year at school, patients diagnosed with a certain disease at a particular point in time, or new recruits entering an organisation or industry in a given year.
Some studies, like Understanding Society, target the UK population as a whole. One challenge this presents is the fact that the population is always changing.
Studies that seek to represent the whole population must be ‘dynamic’ – that is, there needs to be a way in which new members can join the sample. Otherwise there is a risk that, over time, the sample will become increasingly different to the population it is meant to represent.
Understanding Society creates a dynamic sample by including people who move into participating households. For example, if the child of a participating household leaves home to move in with a partner, the partner will join the sample. Similarly, if a couple breaks up and forms two new households, both new households become part of the sample.
To select a sample, researchers need a ‘sampling frame’. This is a list of everyone in the target population, from which a sample can be drawn. The choice of sampling frame depends on who the study wants to sample and when it would like to interview them for the first time.
For example, the Southampton Women’s Survey (SWS) wanted to interview women before they became pregnant, which ruled out certain sampling options (such as recruiting the sample through maternity records).
When assessing the sampling frame used for a study, it is important to consider how accurately the frame reflects the target population of interest. For example, does it include people who are not in the target population at all (and who need to be identified and weeded out)? Or is it missing people who are in the target population?
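These two questions can be made concrete by comparing the identifiers on a frame with those in the target population. The sketch below is illustrative only: the function name `frame_coverage` and the toy identifiers are assumptions, and in practice a complete list of the target population is rarely available, so checks like this are usually made against external estimates.

```python
def frame_coverage(frame_ids, population_ids):
    """Compare a sampling frame with the target population (hypothetical helper)."""
    frame = set(frame_ids)
    population = set(population_ids)
    if not population:
        raise ValueError("the target population must not be empty")
    return {
        # On the frame but not in the target population: need weeding out.
        "overcoverage": frame - population,
        # In the target population but missing from the frame.
        "undercoverage": population - frame,
        # Share of the target population that the frame actually lists.
        "coverage_rate": len(frame & population) / len(population),
    }


# Toy data: five people in the target population, four entries on the frame.
population = {"p1", "p2", "p3", "p4", "p5"}
frame = {"p1", "p2", "p3", "x9"}
result = frame_coverage(frame, population)
print(result)
```

In this toy case the frame lists one person who is not in the target population and misses two who are.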
Child Benefit Records were used as the sampling frame for the Millennium Cohort Study. At the time, Child Benefit was universal, which meant that the list of recipients in 2000-01 (when the study started) was an accurate reflection of all UK families with a child born in the study’s target year.
However, Child Benefit Records are no longer as suitable a sampling frame for birth cohort studies because the benefit is no longer universal. Changes made in 2013 mean that the records under-represent higher earners, so a sample drawn from the current records would under-represent higher-income households.
Planning for all surveys involves considering the likely achieved sample size – that is, how many participants are likely to take part.
Cross-sectional study teams will identify the ideal achieved sample size, as well as the likely response rate – that is, the number of people who complete the survey divided by the number of eligible people who were invited to take part (those invited, minus any who turn out to be ineligible). Because response rates are never 100%, study teams usually issue a sample that is larger than their ideal achieved sample size.
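The arithmetic behind these two quantities can be sketched as follows. The function names and the figures are hypothetical, chosen only to illustrate the calculation; real studies use more detailed outcome definitions.

```python
import math


def response_rate(completed, invited, ineligible=0):
    """Completed interviews divided by eligible invitees."""
    eligible = invited - ineligible
    if eligible <= 0:
        raise ValueError("there must be at least one eligible invitee")
    return completed / eligible


def issued_sample_size(target_achieved, expected_response_rate,
                       expected_eligibility_rate=1.0):
    """Number of cases to issue in order to reach the target achieved sample."""
    yield_per_case = expected_response_rate * expected_eligibility_rate
    if not 0 < yield_per_case <= 1:
        raise ValueError("rates must multiply to a value in (0, 1]")
    # Round up: a fraction of a case cannot be issued.
    return math.ceil(target_achieved / yield_per_case)


# Hypothetical figures: 1,050 invited, 50 found ineligible, 600 completed.
print(response_rate(600, 1050, 50))
# Hypothetical target of 6,000 interviews at an expected 75% response rate.
print(issued_sample_size(6000, 0.75))
```

With these assumed figures, a team aiming for 6,000 interviews would need to issue 8,000 cases.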
With longitudinal studies, these calculations are more complex. The study teams need to think about the sample over a longer time period, collecting data a number of times.
An important consideration for longitudinal study teams is attrition – that is, participants dropping out of the study, either permanently or temporarily.
Some attrition is unavoidable (for example, participants might die or leave the country). Other attrition is avoidable but challenging to overcome (for example, keeping in touch with participants who move or persuading reluctant participants to take part).
The sample design for a longitudinal study will involve making judgements about the starting sample size needed to ensure that the study can withstand likely attrition levels over time.
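As a rough illustration of this kind of judgement, the sketch below works backwards from a desired final sample size to a starting sample size. It assumes that the share of participants retained at each wave is known in advance, which is a simplification: the retention rates and function names here are hypothetical, and real attrition varies between groups and over time.

```python
import math


def expected_sample_by_wave(starting_size, retention_rates):
    """Expected sample size at wave 1 and after each follow-up wave."""
    sizes = [float(starting_size)]
    for rate in retention_rates:
        # Each follow-up wave keeps a fraction of the previous wave's sample.
        sizes.append(sizes[-1] * rate)
    return sizes


def required_starting_size(target_final, retention_rates):
    """Starting sample needed to end with target_final participants."""
    cumulative_retention = math.prod(retention_rates)
    if cumulative_retention <= 0:
        raise ValueError("retention rates must all be greater than zero")
    return math.ceil(target_final / cumulative_retention)


# Assumed retention of 87.5% at each of two follow-up waves.
rates = [0.875, 0.875]
print(required_starting_size(6125, rates))
print(expected_sample_by_wave(8000, rates))
```

Under these assumed rates, a study that wants 6,125 participants at its third wave would need to start with 8,000.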
In the case of some longitudinal studies, the target population is much larger than the desired number of participants, so a smaller subsample needs to be selected. Study teams use various methods to make sure that this subsample is as representative of the target population as possible. These sampling methods have become more sophisticated over time.
For example, the first three British birth cohorts selected their sample of births by choosing a specific week within the relevant year (1946, 1958 and 1970). All births within those weeks were eligible to be included in the first round of each study. The 1958 and 1970 birth cohorts included these participants in subsequent waves of the study; in the case of the 1946 birth cohort, a subsample of cases from the first study was followed up.
There were several limitations to this approach. In particular, the sample is potentially not representative of everyone born in that year – only of those born in that season. This makes it impossible to use the data to explore issues like whether season of birth affects later outcomes, such as educational attainment.
This is one of the reasons that the most recent birth cohort, the Millennium Cohort Study, selected its sample of births from across a whole school year. This allows researchers to be confident that the data collected can be used to make inferences about the wider population born at the turn of the century.
However, it is important to be aware that there is a debate within epidemiology about whether the importance of having representative samples drawn from well-defined populations has been overrated. Instead, it is argued, some research questions are better addressed by sample designs that focus upon particular groups of interest rather than by seeking to obtain a representative sample of the relevant population as a whole. For an introduction to this discussion see this article in the Longitudinal and Life Course Studies journal.
As covered in the previous sections, most longitudinal study teams aim to select representative samples that reflect the composition of the target population. However, unless the starting sample is very large indeed, this means that there will be relatively small numbers of participants from minority groups.
While the proportions of participants from minority groups might accurately reflect the make-up of the wider population, the small numbers can constrain the research that can be done using these groups.
For example, imagine a particular group represents 2 per cent of the UK population as a whole. If a longitudinal study achieves 8,000 interviews in its first sweep of data collection, it will include around 160 participants from the minority group – too few for any detailed statistical analysis, especially if some of these participants drop out at subsequent sweeps.
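The arithmetic in this example can be reproduced, and extended to later sweeps, with a short sketch. The 2 per cent share and 8,000 interviews come from the example above; the retention rates at later sweeps are purely hypothetical, and the function name is an assumption.

```python
def expected_group_counts(total_interviews, group_share, retention_rates=()):
    """Expected number of group members at the first sweep and each later sweep."""
    counts = [total_interviews * group_share]
    for rate in retention_rates:
        # Assume the group is retained at the given rate at each later sweep.
        counts.append(counts[-1] * rate)
    return counts


# 2 per cent of 8,000 first-sweep interviews, then an assumed 87.5% retention
# at each of two further sweeps.
counts = expected_group_counts(8000, 0.02, [0.875, 0.875])
print([round(c, 1) for c in counts])
```

Even with fairly high assumed retention, the group shrinks from about 160 participants to just over 120 by the third sweep.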
As a result, some studies now ‘boost’ the number of participants from particular groups. Examples of longitudinal studies that have taken this approach include:
If a study contains a boosted number of participants from a particular group, survey weights should be applied to adjust the overall results so that they are representative of the population as a whole. Under sample weighting, each member of an over-represented (boosted) group counts as less than one case, while members of other groups may count as more than one.
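A minimal sketch of how such weights might be calculated is shown below. It assumes the simplest possible case, where each group's weight is its share of the population divided by its share of the sample. The group names, counts and function name are hypothetical, and real survey weights also adjust for factors such as non-response.

```python
def design_weights(population_shares, sample_counts):
    """Weight per participant in each group: population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Hypothetical boosted sample: the minority group is 2% of the population
# but 10% of the achieved sample.
population_shares = {"minority": 0.02, "majority": 0.98}
sample_counts = {"minority": 1000, "majority": 9000}

weights = design_weights(population_shares, sample_counts)
print(weights)

# After weighting, the minority group's share of the sample matches
# its share of the population.
weighted_totals = {g: sample_counts[g] * weights[g] for g in sample_counts}
weighted_minority_share = weighted_totals["minority"] / sum(weighted_totals.values())
print(round(weighted_minority_share, 4))
```

Here each participant in the boosted group counts as 0.2 of a case, while each other participant counts as slightly more than one.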