We added GLU_index to the QC-ed totals in the previous script. In this script, we will analyze some of the body measurements and KCAL intake by GLU_index groups using the QC-ed totals data. These examples are not intended to be a complete guide to analysis, but rather to give you some ideas for how to explore this data.
Name the path to DietR directory where input files are pulled.
main_wd <- "~/GitHub/DietR"
Load the necessary functions.
source("lib/specify_data_dir.R")
source("lib/ggplot2themes.R")
You can come back to the main directory by:
setwd(main_wd)
Specify the directory where the data is.
SpecifyDataDirectory(directory.name= "eg_data/NHANES/Laboratory_data/")
Load the data of those to be used in the diabetic status analysis.
glu_2 <- read.table( file="QCtotal_d_ga_body_meta_glu_comp_2.txt", sep= "\t", header= T )
Make GLU_index as a factor for plotting.
glu_2$GLU_index <- factor(glu_2$GLU_index, levels = c("Normal", "Prediabetic", "Diabetic"))
The columnname for BMI is BMXBMI
.
Check the summary data - this will also show the number of missing data if any.
summary(glu_2$BMXBMI)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 16.00 24.10 28.20 29.21 33.00 63.60 13
13 are missing BMI and has NA’s. You can also see that by counting the number of NAs in specified rows.
colSums(is.na(glu_2[, c("SEQN", "BMXBMI")]))
## SEQN BMXBMI
## 0 13
Create a density plot of BMI by GLU_index type.
BMIfreq <- ggplot(data=glu_2, aes(x=BMXBMI, group=GLU_index, fill=GLU_index)) +
geom_density(adjust=1.5, alpha=.4) + space_axes + no_grid +
scale_fill_manual(values= c("steelblue3", "yellow", "hotpink") ) +
labs(x="BMI", y="Density")
Print out the chart. > If there are missing data, it will give a Warning message: > “Removed 13 rows containing non-finite values (stat_density).”
BMIfreq
## Warning: Removed 13 rows containing non-finite values (`stat_density()`).
Save the chart as .pdf. It is helpful to make note of the number of datapoints: n = 1610 - 13 missing = 1597.
ggsave("QCtotal_d_ga_body_meta_glu_comp_2_n1597_BMI_by_GLU_index.pdf",
BMIfreq, device="pdf", width=5.3, height=4.5)
## Warning: Removed 13 rows containing non-finite values (`stat_density()`).
The diabetic population seems to have higher BMI than the prediabetic, and the lowest BMI was the normal population.
The columnname for bodyweight is BMXWT
.
Check the summary data - this will show the number of missing data if any.
summary(glu_2$BMXWT)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 36.00 65.80 78.40 81.35 92.65 198.90 11
11 are missing body weight and has NA’s.
Show histogram of body weight.
hist(glu_2$BMXWT)
Create a density plot of body weight by GLU_index type.
weightfreq <- ggplot(data=glu_2, aes(x=BMXWT, group=GLU_index, fill=GLU_index)) +
geom_density(adjust=1.5, alpha=.4) + space_axes + no_grid +
scale_fill_manual(values= c("steelblue3", "yellow", "hotpink")) +
labs(x="Body weight (kg)", y="Density")
weightfreq
## Warning: Removed 11 rows containing non-finite values (`stat_density()`).
Save the chart as .pdf. n = 1610 - 11 missing = 1599.
ggsave("QCtotal_d_ga_body_meta_glu_comp_2_n1599_weight_by_GLU_index.pdf",
weightfreq, device="pdf", width=5.3, height=4.5)
## Warning: Removed 11 rows containing non-finite values (`stat_density()`).
Check the summary data - this will show the number of missing data if any.
summary(glu_2$KCAL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 604.5 1518.0 1933.5 2022.3 2444.9 5136.0
There is no missing data for KCAL.
Create a line chart of the KCAL frequency of each group.
KCALfreq <- ggplot(data=glu_2, aes(x=KCAL, group=GLU_index, color=GLU_index)) +
geom_density(adjust=1.5, alpha=.4, linewidth=1.2, linetype="longdash") + space_axes + no_grid +
scale_color_manual(values= c("steelblue3", "gold3", "hotpink") ) +
labs(x="KCAL", y="Density") +
scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
KCALfreq
Save the chart as .pdf.
ggsave("QCtotal_d_ga_body_meta_glu_comp_2_n1610_KCAL_by_GLU_index_line.pdf",
KCALfreq, device="pdf", width=5.3, height=4.5)
Come back to the main directory before you start running another script.
setwd(main_wd)