SPOTlight was a prospective cohort study of deteriorating ward patients assessed for critical care admission in National Health Service hospitals in the UK. Early admission was defined as within 4 h of assessment. The primary endpoint was 90-day survival. We have provided you with a copy of some key variables from the original study.
```r
# Steve Harris
# 2017-09-19
# Explore outreach data, and examine factors affecting admission to critical care

# Load up the libraries I will use
library(readr)
library(Hmisc)
library(ggplot2)
library(dplyr)

# Load directly from the internet share
df <- read_csv('https://ndownloader.figshare.com/files/5094199?private_link=aff8f0912c76840c7526')

# Basic overview of the data
str(df)
```
First, perform some sanity checks. Make sure you understand your data: how many rows (observations) are there, which variables are there, and how are those variables encoded? Use the functions
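A minimal base-R sketch of these checks follows. So that it runs on its own, it builds a small data frame using three variable names from this worksheet (`age`, `news_score`, `icu_admit`); the values are invented and are not SPOTlight data. With the real data, skip the `data.frame()` step and run the remaining lines on the `df` returned by `read_csv()`.

```r
# Made-up stand-in for df: invented values, NOT SPOTlight data.
# With the real data, skip this step and use the df from read_csv().
df <- data.frame(
  age        = c(34, 58, 71, 82, 45, 67, 90, NA),
  news_score = c(3, 5, 7, 9, 4, 6, 8, 5),
  icu_admit  = c(0, 1, 1, 0, 0, 1, 0, 1)
)

n_rows <- nrow(df)                                        # number of rows (observations)
vars   <- names(df)                                       # variable names
types  <- vapply(df, function(x) class(x)[1], character(1))  # storage class of each column
n_miss <- colSums(is.na(df))                              # missing values per variable
n_dups <- sum(duplicated(df))                             # rows that exactly repeat an earlier row

print(n_rows)
print(types)
print(n_miss)
print(n_dups)
```

Notice that a 1/0 indicator such as `icu_admit` is stored as a number even though it is really categorical; spotting encoding details like this is the point of the exercise.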
There’s a nice function called `describe()` in the library `Hmisc` (so named for ‘Harrell miscellaneous’: all you need to know is that Frank Harrell is a famous(!) statistician from Vanderbilt). Use `describe(df$my_variable)` to inspect a variable: it summarises the values, reports missingness and unique values, and more.
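If `Hmisc` is not available, a rough base-R approximation of a few of the headline numbers that `describe()` reports can be sketched as below. The helper name `quick_describe` is hypothetical, the `age` values are invented, and the output is far less detailed than `describe()` itself.

```r
# Hypothetical helper: a rough base-R stand-in for a few of the
# numbers Hmisc::describe() reports for a single variable.
quick_describe <- function(x) {
  list(
    n        = sum(!is.na(x)),                 # non-missing observations
    missing  = sum(is.na(x)),                  # missing observations
    distinct = length(unique(x[!is.na(x)])),   # distinct non-missing values
    summary  = summary(x)                      # min, quartiles, mean, max (and NA count)
  )
}

age <- c(34, 58, 71, 82, 45, 67, 90, NA)   # invented values, not study data
res <- quick_describe(age)
str(res)
```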
Can you work out what the unit of analysis is (i.e. what does a row represent)? Hospitals? Patients? Referrals?
Let’s try to understand whether age affects your chances of admission. There are two relevant variables (in addition to
| Variable | Description |
|---|---|
| | boolean (1/0) indicator of the fact of admission to ICU within 1 week of referral |
| | boolean (1/0) indicator of the decision to admit to ICU within 1 week of referral |
| | the integer NEWS score recorded on the ward at the time of assessment |
| | the (categorical) NEWS risk class at the time of assessment |
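The NEWS score is a natural severity marker. As a sketch of using it that way, the code below compares the median NEWS score between patients who were and were not admitted. The two vectors are invented stand-ins for `df$news_score` and `df$icu_admit`; with the real data you would pass those columns instead.

```r
# Invented values, not study data: stand-ins for df$news_score and df$icu_admit.
news_score <- c(3, 5, 7, 9, 4, 6, 8, 5)
icu_admit  <- c(0, 1, 1, 0, 0, 1, 0, 1)

# Median NEWS score within each admission group (0 = not admitted, 1 = admitted).
# na.rm = TRUE is passed through to median() so missing scores are ignored.
news_by_admit <- tapply(news_score, icu_admit, median, na.rm = TRUE)
print(news_by_admit)
```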
Always start by trying to understand, in detail, the variables that bear on the question you’re asking. It’s also worth looking at the NEWS score variables as markers of severity.
This means you’ll need to inspect a mixture of continuous (`news_score`), nominal (`icu_admit`), ordinal (
Try doing this both visually (see the exploratory data visualisation lecture) and numerically by using
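For the numeric side, each type of variable calls for a different summary. The base-R sketch below uses quantiles for a score, counts and proportions for a 1/0 indicator, and counts in level order for an ordinal factor. All values are invented, and the ordinal variable `risk` with levels "low", "medium", "high" is hypothetical: the name and coding of the NEWS risk class column are not given here.

```r
# Invented values, not study data.
news_score <- c(3, 5, 7, 9, 4, 6, 8, 5)   # integer score, treated as continuous
icu_admit  <- c(0, 1, 1, 0, 0, 1, 0, 1)   # 1/0 indicator, i.e. nominal

# Hypothetical ordinal variable: level names are illustrative only.
risk <- factor(c("low", "medium", "high", "high", "low", "medium", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)

# Continuous: quantiles (min, quartiles, max)
q <- quantile(news_score)

# Nominal: counts (showing NAs if any exist) and proportions
admit_tab  <- table(icu_admit, useNA = "ifany")
admit_prop <- prop.table(admit_tab)

# Ordinal: counts, tabulated in the declared level order
risk_tab <- table(risk)

print(q)
print(admit_tab)
print(admit_prop)
print(risk_tab)
```

Declaring `ordered = TRUE` with explicit `levels` is what keeps the table in clinical order rather than alphabetical order.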
Here’s a starter looking at age:
```r
# Numeric summary of age (Hmisc)
describe(df$age)

# Histogram of age in 1-year bins (ggplot2)
ggplot(df, aes(x = age)) +
  geom_histogram(binwidth = 1)
```
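To move from describing age to asking whether age affects admission, one simple numeric check is the proportion admitted within age bands. In this sketch the age values and admission flags are invented, and the band cut points (50 and 70 years) are arbitrary illustrative choices rather than anything from the study.

```r
# Invented values, not study data: stand-ins for df$age and df$icu_admit.
age       <- c(34, 58, 71, 82, 45, 67, 90, NA)
icu_admit <- c(0, 1, 1, 0, 0, 1, 0, 1)

# Illustrative bands [0,50), [50,70), [70,Inf); right = FALSE closes each band on the left.
age_band <- cut(age, breaks = c(0, 50, 70, Inf), right = FALSE)

n_per_band <- table(age_band)                    # patients per band (NA ages excluded)
admit_rate <- tapply(icu_admit, age_band, mean)  # proportion admitted per band

print(n_per_band)
print(admit_rate)
```

Rows with a missing age get an `NA` band and are dropped by both `table()` and `tapply()`, so check how many there are first. Always report the count alongside each proportion, because rates from small bands are unstable.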