In this script, we will analyze correlation between ordination axes values and foods.
Name the path to DietR directory where input files are pulled.
main_wd <- "~/GitHub/DietR"
Load necessary packages.
library(ggplot2)
Load the necessary functions.
source("lib/specify_data_dir.R")
source("lib/corr.axes.foods.R")
source("lib/ggplot2themes.R")
Load the distinct 100 colors for use.
distinct100colors <- readRDS("lib/distinct100colors.rda")
You can come back to the main directory by:
setwd(main_wd)
Specify the directory where the data is.
SpecifyDataDirectory(directory.name = "eg_data/NHANES/Laboratory_data/Ordination/")
From sorted IFC table, generate a table of total amount of food consumed by all the individuals, and a table with correlation coefficients, p-values, and q-values with a desired threshold between food items and Axes that were saved in the ordination section. This function will print out the food items’ names while calculating the correlation measures.
Be careful not to confuse WEIGHTED and UNweighted unifrac distances as you name the files.
CorrAxesFood(food_ifc_soted = "../Foodtree/Food_D12_FC_QC_demo_QCed_males60to79_3Lv.food.ifc_sorted.txt",
AmountSums_out_fn = "Food_D12_FC_QC_demo_QCed_males60to79_3Lv_AmountSums.txt",
qval_threshold = 0.05,
meta_users = "Food_D12_FC_QC_demo_QCed_males60to79_3Lv_ord_WEIGHTED_meta_users.txt",
corr_axes_foods_outfn = "Food_D12_FC_QC_demo_QCed_males60to79_3Lv_ord_WEIGHTED_corr_axes_foods_thr0.05.txt")
Argument | Description |
---|---|
food_ifc_soted | xxx.food.ifc.sorted.txt file, saved in the ordination section |
AmountSums_out_fn | Output filename to be saved which has the total consumption amount of each food |
qval_threshold | q-value threshold to call a correlation significant |
meta_users | xxx.meta_users.txt file, waved in the ordination section |
corr_axes_foods_outfn | Output filename to be saved which has the correlation between foods and Axes |
dat <- read.delim("Food_D12_FC_QC_demo_QCed_males60to79_3Lv_ord_WEIGHTED_corr_axes_foods_thr0.05.txt")
Check the number of food items with significant q-values.
nrow(subset(dat, Significance=="*"))
## [1] 31
There are 31 food items correlated with particular food items.
Show only food items that are significantly correlated with one of the axes.
subset(dat, Significance=="*")
## Food Axis correlation
## 1 Water tap Axis.1 -0.8487995
## 2 Water bottled unsweetened Axis.1 0.8097436
## 3 Coffee brewed Axis.3 -0.7107605
## 4 Soft drink cola Axis.5 -0.5557616
## 5 Coffee brewed Axis.2 0.5292725
## 6 Beer light Axis.6 0.4848544
## 7 Milk whole Axis.2 0.4815899
## 8 Milk reduced fat 2 Axis.2 0.4609677
## 9 Beer light Axis.5 0.4542099
## 10 Coffee brewed Axis.4 0.4453274
## 11 Soft drink cola Axis.4 -0.3908675
## 12 Soft drink cola Axis.6 0.3809187
## 13 Water bottled unsweetened Axis.4 0.3594993
## 14 Beer Axis.7 0.3520281
## 15 Ice cream regular flavors other than chocolate Axis.10 -0.3476261
## 16 Milk reduced fat 2 Axis.9 -0.3410168
## 17 Milk fat free skim Axis.8 0.3205856
## 18 Tea iced brewed black pre sweetened with sugar Axis.6 -0.3165401
## 19 Banana raw Axis.7 -0.3154898
## 20 Milk whole Axis.10 -0.3105564
## 21 Beer Axis.9 -0.3065722
## 22 Milk fat free skim Axis.5 -0.2857583
## 23 Soft drink cola Axis.8 -0.2854799
## 24 Water tap Axis.3 0.2784664
## 25 Water tap Axis.4 0.2763216
## 26 Soft drink cola diet Axis.9 -0.2737740
## 27 Milk reduced fat 2 Axis.3 0.2719744
## 28 Orange juice 100 canned bottled or in a carton Axis.10 -0.2654601
## 29 Tea iced brewed black unsweetened Axis.6 -0.2622820
## 30 Milk low fat 1 Axis.9 0.2607144
## 31 Milk fat free skim Axis.7 -0.2610995
## pval qval Significance
## 1 1.049973e-66 1.761854e-62 *
## 2 4.280100e-56 3.591004e-52 *
## 3 1.299079e-37 7.266184e-34 *
## 4 1.571715e-20 6.593345e-17 *
## 5 1.938004e-18 6.503943e-15 *
## 6 2.560240e-15 7.160137e-12 *
## 7 4.171242e-15 9.999063e-12 *
## 8 8.100359e-14 1.699050e-10 *
## 9 2.051963e-13 3.825770e-10 *
## 10 6.751299e-13 1.132868e-09 *
## 11 4.907704e-10 7.486480e-07 *
## 12 1.445241e-09 2.020928e-06 *
## 13 1.311234e-08 1.692501e-05 *
## 14 2.726032e-08 3.267344e-05 *
## 15 4.158947e-08 4.652476e-05 *
## 16 7.747876e-08 8.125585e-05 *
## 17 4.848080e-07 4.785340e-04 *
## 18 6.862093e-07 6.396995e-04 *
## 19 7.503549e-07 6.626818e-04 *
## 20 1.136569e-06 9.535811e-04 *
## 21 1.580760e-06 1.263103e-03 *
## 22 8.198166e-06 6.108955e-03 *
## 23 8.373419e-06 6.108955e-03 *
## 24 1.416126e-05 9.901082e-03 *
## 25 1.658287e-05 1.113042e-02 *
## 26 1.996896e-05 1.288766e-02 *
## 27 2.274451e-05 1.413529e-02 *
## 28 3.615393e-05 2.166653e-02 *
## 29 4.513158e-05 2.611407e-02 *
## 30 5.029718e-05 2.722538e-02 *
## 31 4.897894e-05 2.722538e-02 *
It is also possible to view each axis separately.
# Select Axis 1 rows
dat_1 <- subset(dat, Axis=="Axis.1")
head(dat_1[order(dat_1$qval), ], 10)
## Food Axis correlation pval qval
## 1 Water tap Axis.1 -0.84879953 1.049973e-66 1.761854e-62
## 2 Water bottled unsweetened Axis.1 0.80974358 4.280100e-56 3.591004e-52
## 51 Coffee brewed Axis.1 -0.21150806 1.079057e-03 3.550310e-01
## 62 Egg whole boiled or poached Axis.1 -0.19837925 2.199611e-03 5.953140e-01
## 68 Milk NFS Axis.1 0.09053595 1.656542e-01 8.020252e-01
## 78 Milk fat free skim Axis.1 -0.13590236 3.694459e-02 8.020252e-01
## 87 Buttermilk reduced fat 2 Axis.1 0.10484862 1.081434e-01 8.020252e-01
## 90 Buttermilk whole Axis.1 0.10484862 1.081434e-01 8.020252e-01
## 93 Kefir NS as to fat content Axis.1 -0.10772118 9.876618e-02 8.020252e-01
## 109 Rice milk Axis.1 0.10389110 1.114203e-01 8.020252e-01
## Significance
## 1 *
## 2 *
## 51
## 62
## 68
## 78
## 87
## 90
## 93
## 109
# Select Axis 2 rows and sort by qval.
dat_2 <- subset(dat, Axis=="Axis.2")
head(dat_2[order(dat_2$qval), ], 10)
## Food Axis correlation pval qval
## 5 Coffee brewed Axis.2 0.5292725 1.938004e-18 6.503943e-15
## 7 Milk whole Axis.2 0.4815899 4.171242e-15 9.999063e-12
## 8 Milk reduced fat 2 Axis.2 0.4609677 8.100359e-14 1.699050e-10
## 40 Milk fat free skim Axis.2 -0.2317225 3.308411e-04 1.387878e-01
## 45 Tea iced bottled black Axis.2 0.2211288 6.228534e-04 2.322551e-01
## 53 Sugar white granulated or lump Axis.2 0.2063469 1.435065e-03 4.543468e-01
## 58 Coffee creamer powder Axis.2 0.2008213 1.932983e-03 5.592321e-01
## 66 Water tap Axis.2 -0.1920650 3.051289e-03 7.757670e-01
## 85 Buttermilk low fat 1 Axis.2 0.1086787 9.578786e-02 8.020252e-01
## 98 Soy milk Axis.2 -0.0818681 2.101663e-01 8.020252e-01
## Significance
## 5 *
## 7 *
## 8 *
## 40
## 45
## 53
## 58
## 66
## 85
## 98
# Select Axis 3 rows and sort by qval.
dat_3 <- subset(dat, Axis=="Axis.3")
head(dat_3[order(dat_3$qval), ], 10)
## Food Axis correlation pval qval
## 3 Coffee brewed Axis.3 -0.71076052 1.299079e-37 7.266184e-34
## 24 Water tap Axis.3 0.27846636 1.416126e-05 9.901082e-03
## 27 Milk reduced fat 2 Axis.3 0.27197439 2.274451e-05 1.413529e-02
## 47 Apple raw Axis.3 0.22009987 6.613055e-04 2.361001e-01
## 50 Water bottled unsweetened Axis.3 0.21549415 8.618017e-04 2.896430e-01
## 70 Milk whole Axis.3 0.08554006 1.903598e-01 8.020252e-01
## 75 Milk low fat 1 Axis.3 -0.10695949 1.011874e-01 8.020252e-01
## 117 Yogurt low fat milk plain Axis.3 0.10484862 1.081434e-01 8.020252e-01
## 124 Yogurt Greek nonfat milk plain Axis.3 0.08761323 1.797989e-01 8.020252e-01
## 144 Yogurt Greek nonfat milk fruit Axis.3 0.08761323 1.797989e-01 8.020252e-01
## Significance
## 3 *
## 24 *
## 27 *
## 47
## 50
## 70
## 75
## 117
## 124
## 144
# Select Axis 4 rows and sort by qval.
dat_4 <- subset(dat, Axis=="Axis.4")
head(dat_4[order(dat_4$qval), ], 10)
## Food Axis correlation pval
## 10 Coffee brewed Axis.4 0.44532739 6.751299e-13
## 11 Soft drink cola Axis.4 -0.39086753 4.907704e-10
## 13 Water bottled unsweetened Axis.4 0.35949933 1.311234e-08
## 25 Water tap Axis.4 0.27632163 1.658287e-05
## 36 Beer light Axis.4 -0.24342502 1.589345e-04
## 65 Muffin English Axis.4 0.19229201 3.016107e-03
## 71 Milk whole Axis.4 -0.09027923 1.668623e-01
## 94 Kefir NS as to fat content Axis.4 0.08378314 1.996592e-01
## 119 Yogurt Greek low fat milk plain Axis.4 -0.07708049 2.381587e-01
## 129 Yogurt low fat milk fruit Axis.4 0.08509615 1.926789e-01
## qval Significance
## 10 1.132868e-09 *
## 11 7.486480e-07 *
## 13 1.692501e-05 *
## 25 1.113042e-02 *
## 36 7.408112e-02
## 65 7.757670e-01
## 71 8.020252e-01
## 94 8.020252e-01
## 119 8.020252e-01
## 129 8.020252e-01
xxx_AmountSums.txt will be generated again, but its content will be the same regardless of which distance method (weighted or unweighted unifrac or else) was used, as long as the food.ifc_sorted is the same.
CorrAxesFood(food_ifc_soted = "../Foodtree/Food_D12_FC_QC_demo_QCed_males60to79_3Lv.food.ifc_sorted.txt",
AmountSums_out_fn = "Food_D12_FC_QC_demo_QCed_males60to79_3Lv_AmountSums.txt",
qval_threshold = 0.05,
meta_users = "Food_D12_FC_QC_demo_QCed_males60to79_3Lv_ord_UNweighted_meta_users.txt",
corr_axes_foods_outfn = "Food_D12_FC_QC_demo_QCed_males60to79_3Lv_ord_UNweighted_corr_axes_foods_thr0.05.txt")
UNweighted can be viewed in the same way.
dat <- read.delim("Food_D12_FC_QC_demo_QCed_males60to79_3Lv_ord_UNweighted_corr_axes_foods_thr0.05.txt")
Check the number of food items with significant q-values.
nrow(subset(dat, Significance=="*"))
## [1] 78
There are as many as 78 food items at alpha=0.05, so it will be probably better to look at most significant food items or a particular Principal Coordinate Axis of your interest.
Show the first 10 food items that are significantly correlated with one of the axes.
head(subset(dat, Significance=="*"), 10)
## Food Axis correlation
## 1 Water bottled unsweetened Axis.3 0.7070852
## 2 Tomatoes raw Axis.4 -0.6803295
## 3 Ice cream regular flavors other than chocolate Axis.9 0.5728066
## 4 Lettuce raw Axis.4 -0.5633409
## 5 Sugar white granulated or lump Axis.2 0.5475776
## 6 Milk reduced fat 2 Axis.1 -0.5377079
## 7 Cheese American Axis.2 0.5362031
## 8 Coffee brewed Axis.3 -0.5349615
## 9 Milk whole Axis.1 -0.5293084
## 10 Ham prepackaged or deli luncheon meat Axis.10 -0.5131655
## pval qval Significance
## 1 4.448087e-37 7.463889e-33 *
## 2 2.010864e-33 1.687115e-29 *
## 3 5.613733e-22 3.139948e-18 *
## 4 3.657487e-21 1.534316e-17 *
## 5 7.281465e-20 2.443660e-16 *
## 6 4.379006e-19 1.224662e-15 *
## 7 5.727158e-19 1.372882e-15 *
## 8 7.139692e-19 1.497550e-15 *
## 9 1.925940e-18 3.590808e-15 *
## 10 2.964411e-17 4.974282e-14 *
Come back to the main directory.
setwd(main_wd)