To access the CTD, we must download it from the UK Data Service (UKDS). We will need to register or log in to access the data, and then choose the Stata-formatted data from the download options. To do this, click on the icon below, which will take you to the relevant page on UKDS:
The download is in the format of a zipped (compressed) folder. After unzipping the folder, we can then open the ‘CLOSER_training_dataset_complete_cases.dta’ file in Stata.
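As a minimal sketch, the file can also be opened from the Stata command line rather than through the menus. The folder path here is a hypothetical placeholder and should be replaced with wherever the unzipped folder was saved.

```stata
* Assumption: the unzipped folder sits at this hypothetical location
cd "C:/data/CTD"

* Load the training dataset, clearing any data already in memory
use "CLOSER_training_dataset_complete_cases.dta", clear

* Brief summary: number of observations and variables
describe, short
```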
We have prepared a Stata syntax file (a .do file) to accompany this module. It includes all of the commands discussed in the following sections and we recommend you open it up in Stata alongside the CTD data. You can download this syntax file by clicking the icon below:
Now that we have the data, our first step is to simplify the dataset by dropping the variables that are not currently relevant to us. This variable selection is done using Stata’s ‘keep’ command as shown below (note that in the code snippets below and throughout this module, Stata commands are in bold font and variable names are in italics).
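A sketch of the variable selection step might look like the following. The variable names are hypothetical placeholders, not the actual CTD variable names, and should be replaced with the variables needed for the analysis.

```stata
* Hypothetical variable names, used for illustration only
* keep retains the listed variables and drops all others from memory
keep id_var outcome_var exposure_var covariate_var
```

Note that ‘keep’ only changes the dataset held in memory; the .dta file on disk is unchanged unless it is saved over.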
For these analyses, we are adopting a complete case analysis approach. This means that, in preparing the dataset, we exclude any cases with missing data on any of the variables of interest. (Missing data can be handled in alternative ways, such as through the use of data imputation techniques.) To remove the incomplete cases, we first want to ensure that all of the variables use the same missing value code (“.”) as illustrated in the Stata code snippet below.
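One way to harmonise the missing value codes is Stata’s ‘mvdecode’ command, sketched below. Both the variable names and the numeric codes (-9, -8, -1) are assumptions made for illustration; the codes actually used for missing data should be checked in the CTD documentation or with ‘codebook’.

```stata
* Hypothetical variables and missing-data codes, used for illustration only
* mvdecode converts the listed numeric values to system missing (.)
mvdecode outcome_var exposure_var covariate_var, mv(-9 -8 -1)
```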
We then need to run the following set of commands in Stata to create a temporary variable denoting cases with incomplete data (miss1). We can then remove cases with any incomplete data using the ‘drop if’ command.
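The steps above can be sketched as follows, assuming the missing values have already been set to “.”. The variable names other than miss1 are hypothetical placeholders.

```stata
* Count, for each case, how many of the listed variables are missing
* (variable names are hypothetical placeholders)
egen miss1 = rowmiss(outcome_var exposure_var covariate_var)

* Check how many cases have incomplete data before removing them
tabulate miss1

* Remove cases with one or more missing values, then drop the helper variable
drop if miss1 > 0
drop miss1
```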
The data are now ready for some initial exploration of the variables of interest.