First, we are going to examine sex (n622), a dichotomous variable, to see how it is coded. The ‘codebook’ command is particularly useful for looking at categorical variables.
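As a minimal sketch, assuming the dataset is already loaded in Stata, the check could be run as:

```stata
* Show the values, value labels and frequencies for n622
codebook n622
```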
The n622 variable is coded 1=Male and 2=Female. There are 2,141 males in our data and 2,356 females.
For our regression analysis, we will recode the data to create a new binary variable, which we will call ‘sex’, with the values 0=Male and 1=Female. Such binary variables are often known as dummy variables. The coefficient for sex would be the same whether the variable was coded 1/2 or 0/1, but the intercept (labelled “_cons” in the output) would be less intuitive under 1/2 coding, because it would refer to a value of 0 that no respondent has. With 0/1 coding, males are the reference group in our regression analysis, and the intercept is the predicted value for males when any other predictors are zero.
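One way to carry out this recode is sketched below. The label name sex_lbl is a hypothetical choice for illustration, and any value of n622 other than 1 or 2 is left as missing.

```stata
* Create a 0/1 dummy from n622 (1=Male, 2=Female)
gen sex = .
replace sex = 0 if n622 == 1
replace sex = 1 if n622 == 2

* Attach value labels (sex_lbl is a hypothetical label name)
label define sex_lbl 0 "Male" 1 "Female"
label values sex sex_lbl

* Cross-check the new variable against the original
tab n622 sex, missing
```

The final cross-tabulation is a quick way to confirm that every male is coded 0 and every female is coded 1.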
The second variable we are going to look at is father’s social class (n1171).
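As before, a simple way to inspect the coding, sketched here with the ‘tab’ command, is:

```stata
* Frequency table of father's social class, including any missing values
tab n1171, missing
```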
The n1171 variable has 7 categories ranging from 1=‘Social class I’ to 7=‘Social class V’. Some of the categories have low numbers of observations. For example, ‘SC IV non-manual’ has only 75 observations. We will therefore combine some of the categories, so that each one captures more observations, by creating a new variable with fewer categories using the ‘gen’ and ‘replace’ commands.
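The recode can be sketched as follows. Only codes 1 and 7 are stated in the text, so the intermediate codes used here (2=II, 3=III non-manual, 4=III manual, 5=IV non-manual, 6=IV manual) are an assumption that should be checked against the codebook output. The label name n1171_2_lbl is also hypothetical.

```stata
* Assumed coding of n1171: 1=I, 2=II, 3=III non-manual, 4=III manual,
* 5=IV non-manual, 6=IV manual, 7=V (check against the codebook first)
gen n1171_2 = .
replace n1171_2 = 1 if n1171 == 1 | n1171 == 2   // I and II combined
replace n1171_2 = 2 if n1171 == 3                // III non-manual
replace n1171_2 = 3 if n1171 == 4                // III manual
replace n1171_2 = 4 if n1171 == 5 | n1171 == 6   // IV non-manual and manual combined
replace n1171_2 = 5 if n1171 == 7                // V

* Attach value labels (n1171_2_lbl is a hypothetical label name)
label define n1171_2_lbl 1 "I and II professional and managerial" 2 "III non-manual" 3 "III manual" 4 "IV" 5 "V"
label values n1171_2 n1171_2_lbl
```

Starting from a missing value and filling in each category explicitly means that any unexpected code in n1171 stays missing rather than being silently assigned to a group.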
We have now created a new variable, n1171_2, which collapses social class I and II from n1171 into a combined ‘I and II professional and managerial’ category, which we will use as our reference group. These two categories are often combined into a single high social class grouping. The second change we have made is to combine the ‘SC IV non-manual’ category, which has only 75 observations, with the ‘SC IV manual’ category to create a single IV category with 711 observations. With only 75 observations, the small sample size increases the chance that we would fail to detect an association with BMI at age 42 in the ‘SC IV non-manual’ category (compared to the higher social classes), even if there actually is a relationship. We can examine the difference between the original and recoded variable using the ‘tab’ command.
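A minimal sketch of that comparison, assuming n1171_2 has been created as described:

```stata
* Cross-tabulate the original and recoded variables to check the mapping
tab n1171 n1171_2, missing

* Frequency table of the recoded variable on its own
tab n1171_2, missing
```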
As you can see from the output table above, social class n1171_2 now has 5 categories. We can now proceed to the next steps in our analysis, where we will undertake statistical modelling to explore research questions with the data.