Next Steps Longitudinal Dataset [work in progress]

Overview

This repository provides R scripts that harmonise the multiple sweeps of Next Steps into a single tidy dataset, so analysts can get straight to research rather than recoding.
The variables are given consistent names which relate to the content and age of participants (e.g., educ25 for education attainment at age 25).

Included variable domains

Domain	Examples
Demographics	sex, ethnicity, language, sexuality, partnership, region,
Education	qualifications, parents' qualifications.
Socioeconomic	own economic activity, parental economic activity, own NSSEC, parental NSSEC, home ownership, household income, IMD
Physical and mental health	psychological distress, life satisfaction, self-harm,anxiety, depression, loneliness, long-term illness, self‑rated health
Anthropometrics	weight, height, BMI
Behaviours	smoking, alcohol, drug, exercise, school absence, police contact, bullying victimisation

See ns_mseu_userguide.docx for full details.

Data availability

The source datasets are available to download from the UK Data Service. The derived variable dataset will be deposited once completed.

How to run or inspect the code

Depending on your needs, you can either rebuild the full derived dataset, or just inspect how specific variables are derived.

Setting up the raw data

Download Study 5545 (Next Steps, End User Licence) from the UK Data Service in Stata (.dta) format, then unzip it.

Unzipping the file should give you the UKDA-5545-stata folder with the datasets. You can either place this folder in data/ (which will replace the empty UKDA-5545-stata counterpart from this repo). Alternatively, find the .dta files and place them directly in the relevant subfolder (that is, data/UKDA-5545-stata/stata/stata13/safeguarded_eul/).

Reproducing the full pipeline

To rebuild the full dataset, run build-core-dataset.R from the project root. Running this script recreates the derived dataset in data/derived/. The main script calls other scripts in the R/ subfolder, which derive domain-specific variables. It then merges all derived variables into a single dataset.

Inspecting specific scripts

To inspect how specific variables are derived, open the relevant script in R/ (for example, R/02-education.R for education variables). Each domain script can also be run standalone, but you must first run R/00-load-raw-data.R (e.g. source("R/00-load-raw-data.R")).

Version control

The code was run with R v4.5.2. R package versions are tracked using renv. If you wish to rerun the code using the same package versions, make sure the renv package is installed (install.packages("renv")), and run renv::restore() the first time you open R in the project root. This will install those package versions locally and link them with the project. Please note that depending on your machine and setup, this may take time, and additional work might be required if some package installations fail.

Repository structure

.
├── build-core-dataset.R        # Run this to produce the derived dataset from start to finish.
├── R/                          # Domain-specific scripts, sourced by the main script
│   ├── 00-load-raw-data.R      # Loads the raw .dta files
│   ├── 01-demographic.R
│   ├── 02-education.R
│   ├── 03-socioeconomic.R
│   ├── 04-health.R
│   ├── 05-anthropometric.R
│   ├── 06-behaviour.R
│   └── helpers.R               # Shared recoding helper functions.
├── data/
│   ├── UKDA-5545-stata/.../safeguarded_eul/   # Place the raw .dta files from UKDS here
│   └── derived/                               # The derived dataset is exported here.
├── renv.lock                   # Tracks R package versions (see "Version control" above)
└── .Rprofile                   # Auto-activates renv when an R session starts here

User feedback and future plans

We welcome user feedback and plan to expand this dataset in future releases. Please email clsdata@ucl.ac.uk.

Licence

Code: MIT Licence (see LICENSE). Derived datasets remain subject to the Next Steps End‑User Licence terms.

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
R		R
data		data
renv		renv
.Rprofile		.Rprofile
.gitignore		.gitignore
README.md		README.md
build-core-dataset.R		build-core-dataset.R
ns-mseu-data.Rproj		ns-mseu-data.Rproj
renv.lock		renv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Next Steps Longitudinal Dataset [work in progress]

Overview

Included variable domains

Data availability

How to run or inspect the code

Setting up the raw data

Reproducing the full pipeline

Inspecting specific scripts

Version control

Repository structure

User feedback and future plans

Licence

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Next Steps Longitudinal Dataset [work in progress]

Overview

Included variable domains

Data availability

How to run or inspect the code

Setting up the raw data

Reproducing the full pipeline

Inspecting specific scripts

Version control

Repository structure

User feedback and future plans

Licence

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages