Introduction

In the previous script, we identified males in their 60s and older seem to be an appropriate subsample to analyze their diabetic statuses and diet. We will remove people who are 80 years old or potentially above from our analyses because they may be using meal assistance and thus may not be eating freely.


Load functions and packages

Name the path to DietR directory where input files are pulled.

main_wd <- "~/GitHub/DietR"

Load necessary packages.

library(ggplot2)

Load the necessary functions.

source("lib/specify_data_dir.R")
source("lib/ggplot2themes.R")

You can come back to the main directory by:

setwd(main_wd)


Load NHANES15-16totals data

Specify the directory where the data is.

SpecifyDataDirectory(directory.name = "eg_data/NHANES/Laboratory_data")

Load the data of those to be used in the diabetes status analysis.

glu_2 <- read.delim(file="QCtotal_d_ga_body_meta_glu_comp_2.txt", sep="\t", header=T)

Make GLU_index as a factor for plotting.

glu_2$GLU_index <- factor(glu_2$GLU_index, levels = c("Normal", "Prediabetic", "Diabetic"))

Check the sample size of each category.

table(glu_2$GLU_index, useNA = "always")
## 
##      Normal Prediabetic    Diabetic        <NA> 
##         679         723         208           0

Age - no missing data, and spread pretty evenly.

summary(glu_2$RIDAGEYR)     
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   34.00   50.00   49.27   64.00   80.00
hist(glu_2$RIDAGEYR)

Gender - no missing data. 1: male, 2: female.

table(glu_2$RIAGENDR, useNA="always")     
## 
##    1    2 <NA> 
##  765  845    0


Select only males 60-79 years old, so that our samples are more uniform

Select those who are males and whose ages fall between 60-79.

glu_2_males60to79 <- subset(glu_2, RIAGENDR == 1 & RIDAGEYR >= 60 & RIDAGEYR <= 79) 

Check the dimension of the selected data - 236 rows.

dim(glu_2_males60to79)
## [1] 236 266

Ensure the ages of the selected subpopulation are between 60-79.

table(glu_2_males60to79$RIDAGEYR)
## 
## 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 
## 17 22 14 12 13 17 14 11  8 13 15 11 11  9 11 11  9  7  6  5

Look at the distribution of GLU_index among the selected subpopulation.

table(glu_2_males60to79$GLU_index, useNA="always")
## 
##      Normal Prediabetic    Diabetic        <NA> 
##          47         127          62           0

Save the glu_2_males60to79 as a txt file.

write.table(glu_2_males60to79, "QCtotal_d_ga_body_meta_glu_comp_2_males60to79.txt", 
            sep="\t", row.names = F, quote = F)


Body measurements by diabetic statuses

BMI

Look at the BMI frequency of each group

It is OK to see an error saying it removed rows containing non-finite values or missing values,as long as the charts are produced.

males60to79_BMIfreq <- 
  ggplot(data=glu_2_males60to79, aes(x=BMXBMI, group=GLU_index, fill=GLU_index)) +
  geom_density(adjust=1.5, alpha=0.4) + space_axes + no_grid +
  scale_fill_manual(values= c("aquamarine2", "lightgoldenrod1", "lightpink1") ) +
  labs(x="BMI", y="Density")
males60to79_BMIfreq
## Warning: Removed 3 rows containing non-finite values (`stat_density()`).

Save the chart as .pdf.

ggsave("males60to79_BMI_by_GLU_index.pdf", 
       males60to79_BMIfreq, device="pdf", width=5.3, height=4.5)
## Warning: Removed 3 rows containing non-finite values (`stat_density()`).



Create a boxplot of BMI of each GLU_index group

males60to79_BMIbox <- 
  ggplot(glu_2_males60to79, aes(x=GLU_index, y=BMXBMI, fill=GLU_index)) +
  geom_boxplot(outlier.shape = NA) + no_grid + space_axes +
  scale_fill_manual(values= c("aquamarine2", "lightgoldenrod1", "lightpink1") ) +
  geom_jitter(width=0.3)
males60to79_BMIbox
## Warning: Removed 3 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 3 rows containing missing values (`geom_point()`).

ggsave("males60to79_BMI_by_GLU_index_box.pdf", 
       males60to79_BMIbox, device="pdf", width=5.3, height=4.5)
## Warning: Removed 3 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 3 rows containing missing values (`geom_point()`).



ANOVA between groups

The three GLU_index groups appear to have different BMI means.

Test the difference between groups by ANOVA.

Remove samples (rows) that have missing data in the target variable.

df <- glu_2_males60to79[complete.cases(glu_2_males60to79$BMXBMI), ]

Run ANOVA.

myanova <- aov(BMXBMI ~ GLU_index, data=df)
summary(myanova)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## GLU_index     2    476  237.94   8.406  3e-04 ***
## Residuals   230   6511   28.31                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA results indicate that the groups are different (p<0.05).

But first, test the assumptions for ANOVA.

Create a new dataset of residuals of the model.

res1 <- residuals(myanova)

Check the normality assumption.

hist(res1)

qqnorm(res1, plot.it=TRUE)
qqline(res1)

The histogram and QQplot indicate the residuals have an approximately normal distribution.

check equal variance of residuals with a side-by-side boxplots of the residuals.

boxplot(res1 ~ df$GLU_index)

This boxplot also indicates the variance of the residuals are approximately equal across the factor levels.

Create a new variable of squared residuals.

res1sq <- res1*res1

Levene’s test – Run ANOVA for the squared residuals as the response.
If Pr>0.05, the residuals of the groups have equal variance.

anova(lm(res1sq ~ df$GLU_index))
## Analysis of Variance Table
## 
## Response: res1sq
##               Df Sum Sq Mean Sq F value Pr(>F)
## df$GLU_index   2     67   33.27  0.0128 0.9873
## Residuals    230 597290 2596.91

The p-value (Pr) of the Levene’s test also supported equal variance.

When the assumptions are satisfied, then you can run ANOVA.

summary(aov(BMXBMI ~ GLU_index, data=df))
##              Df Sum Sq Mean Sq F value Pr(>F)    
## GLU_index     2    476  237.94   8.406  3e-04 ***
## Residuals   230   6511   28.31                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If ANOVA is significant, you can do a pairwise t-test.

“holm” (default) is less conservative than Bonferroni p-value adjustment method for multiple comparisons. Other methods that can be used: “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”.

pairwise.t.test(df$BMXBMI, df$GLU_index, p.adjust.method = "holm") 
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  df$BMXBMI and df$GLU_index 
## 
##             Normal  Prediabetic
## Prediabetic 0.12765 -          
## Diabetic    0.00037 0.00324    
## 
## P value adjustment method: holm

Based on the p-values, Normal and Prediabetic have the same mean BMI (p=0.12765 > 0.05). Normal and Diabetic have different mean BMI (p=0.00037 < 0.05), as do Prediabetic and Diabetic (p=0.00324 < 0.05).



KCAL

Look at the KCAL frequency of each group

males60to79_KCALfreq <- 
  ggplot(data=glu_2_males60to79, aes(x=KCAL, group=GLU_index, color=GLU_index)) +
  geom_density(adjust=1.5, alpha=0.4, linewidth=1.2, linetype="longdash") + space_axes + no_grid +
  scale_color_manual(values= c("aquamarine3", "lightgoldenrod3", "lightpink1")) +
  labs(x="KCAL", y="Density") +
  scale_y_continuous(labels= function(x) format(x, scientific = FALSE))
males60to79_KCALfreq

Save the chart as .pdf.

ggsave("males60to79_KCAL_by_GLU_index.pdf", 
       males60to79_KCALfreq, device="pdf", width=5.3, height=4.5)



Create a boxplot of KCAL of each GLU_index group

males60to79_KCALbox <- 
  ggplot(glu_2_males60to79, aes(x=GLU_index, y=KCAL, fill=GLU_index)) +
  geom_boxplot(outlier.shape = NA) + no_grid + space_axes +
  scale_fill_manual(values= c("aquamarine2", "lightgoldenrod1", "lightpink1") ) +
  geom_jitter(width=0.3)
males60to79_KCALbox

ggsave("males60to79_KCAL_by_GLU_index_box.pdf", 
       males60to79_KCALbox, device="pdf", width=5.3, height=4.5)



ANOVA between groups

Test the difference of KCAL between groups by ANOVA.

Remove samples (rows) that have missing data in the target variable.

df <- glu_2_males60to79[complete.cases(glu_2_males60to79$KCAL), ]

Run ANOVA.

summary(aov(KCAL ~ GLU_index, data=df))
##              Df    Sum Sq Mean Sq F value Pr(>F)
## GLU_index     2    997260  498630   1.065  0.347
## Residuals   233 109123888  468343

Based on the p-values for ANOVA, the energy intake in KCAL is not different between the GLU_index groups are not different.



Come back to the main directory.

setwd(main_wd)