R base functions
table
table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categories. prop.table is a function that accepts table output, returning proportions of the counts.
Examples
In the DeathRecords data file, create a table of the values in the Race column and how many times that each Race value occurs.
Click to see solution
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
table(deathDF$Race)
1 2 3 4 5 6 7 18 28 38
2241510 309504 18031 13297 8159 700 11074 6778 4711 623
48 58 68 78
4913 316 8737 2818
In the DeathRecords data file, create a table showing how many people are in each of the 6 categories above at the time of their death.
The labels for part a should be the default labels, i.e., like this: (-Inf,18] (18,25] (25,35] (35,55] (55,150] (150, Inf]
"youth": less than or equal to 18 years old "young adult": older than 18 but less than or equal to 25 years old "adult": older than 25 but less than or equal to 35 years old "middle age adult": older than 35 but less than or equal to 55 years old "senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like) "unknown": age of 999 (you could use, say, ages 150 to Inf for this category) |
Click to see solution
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
ages <- death_records$Age
age_groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))
first_table <- table(age_groups)
print(first_table)
age_groups
(0,18] (18,25] (25,35] (35,55] (55,150] (150,Inf]
36033 27691 49540 271181 2246155 571
In the DeathRecords data file, create a table showing how many people are in each of the 6 categories above at the time of their death but also adding labels corresponding to the 6 categories below.
The labels for part a should be the default labels, i.e., like this: (-Inf,18] (18,25] (25,35] (35,55] (55,150] (150, Inf]
"youth": less than or equal to 18 years old "young adult": older than 18 but less than or equal to 25 years old "adult": older than 25 but less than or equal to 35 years old "middle age adult": older than 35 but less than or equal to 55 years old "senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like) "unknown": age of 999 (you could use, say, ages 150 to Inf for this category) |
Click to see solution
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
ages <- death_records$Age
second_age_groups <- cut(ages,
breaks = c(0, 18, 25, 35, 55, 150, Inf),
labels = c("youth", "young adult", "adult", "middle age adult", "senior adult", "unknown"))
second_table <- table(second_age_groups)
print(second_table)
second_age_groups
youth young adult adult middle age adult
36033 27691 49540 271181
senior adult unknown
2246155 571
cut
cut breaks a vector into factors specified by the argument breaks. cut is particularly useful to break Date data into quarters (Q1, Q2), years (1999, 2000, 2001), and so on.
Examples
Use the cut command to classify people at their time of death into 6 categories:
"youth": less than or equal to 18 years old "young adult": older than 18 but less than or equal to 25 years old "adult": older than 25 but less than or equal to 35 years old "middle age adult": older than 35 but less than or equal to 55 years old "senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like) "unknown": age of 999 (you could use, say, ages 150 to Inf for this category) |
Click to see solution
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
ages <- death_records$Age
# sort into categories but no labels
age_groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))
# add labels corresponding to the 6 categories above
second_age_groups <- cut(ages,
breaks = c(0, 18, 25, 35, 55, 150, Inf),
labels = c("youth", "young adult", "adult", "middle age adult", "senior adult", "unknown"))
subset
subset is a function that helps you take subsets of data. By default, subset removes NA rows.
subset does not perform any operation that can’t be accomplished by indexing.
|
Examples
In the DeathRecords data file, show the head of the subset of data for which Sex=='F'.
Click to see solution
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
femaleSubset <- subset(deathDF, Sex == 'F')
head(femaleSubset)
Id ResidentStatus Education1989Revision Education2003Revision EducationReportingFlag MonthOfDeath Sex AgeType Age AgeSubstitutionFlag ... CauseRecode39 NumberOfEntityAxisConditions NumberOfRecordAxisConditions Race BridgedRaceFlag RaceImputationFlag RaceRecode3 RaceRecode5 HispanicOrigin HispanicOriginRaceRecode <int> <int> <int> <int> <int> <int> <chr> <int> <int> <int> ... <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> 3 3 1 0 7 1 1 F 1 75 0 ... 28 2 2 1 0 0 1 1 100 6 6 6 1 0 5 1 1 F 1 93 0 ... 37 5 5 1 0 0 1 1 100 6 9 9 1 0 3 1 1 F 1 86 0 ... 37 1 1 1 0 0 1 1 100 6 11 11 1 0 3 1 1 F 1 79 0 ... 22 2 2 1 0 0 1 1 100 6 13 13 1 0 4 1 1 F 1 85 0 ... 22 5 5 1 0 0 1 1 100 6 14 14 1 0 3 1 1 F 1 84 0 ... 8 2 2 1 0 0 1 1 100 6
In the DeathRecords data file, show the head of the subset of data for which Sex=='F' & Age!=999.
Click to see solution
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
validFemaleSubset <- subset(deathDF, Sex == 'F' & Age != 999)
head(validFemaleSubset)
Id ResidentStatus Education1989Revision Education2003Revision EducationReportingFlag MonthOfDeath Sex AgeType Age AgeSubstitutionFlag ... CauseRecode39 NumberOfEntityAxisConditions NumberOfRecordAxisConditions Race BridgedRaceFlag RaceImputationFlag RaceRecode3 RaceRecode5 HispanicOrigin HispanicOriginRaceRecode <int> <int> <int> <int> <int> <int> <chr> <int> <int> <int> ... <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> 3 3 1 0 7 1 1 F 1 75 0 ... 28 2 2 1 0 0 1 1 100 6 6 6 1 0 5 1 1 F 1 93 0 ... 37 5 5 1 0 0 1 1 100 6 9 9 1 0 3 1 1 F 1 86 0 ... 37 1 1 1 0 0 1 1 100 6 11 11 1 0 3 1 1 F 1 79 0 ... 22 2 2 1 0 0 1 1 100 6 13 13 1 0 4 1 1 F 1 85 0 ... 22 5 5 1 0 0 1 1 100 6 14 14 1 0 3 1 1 F 1 84 0 ... 8 2 2 1 0 0 1 1 100 6