In this script, we will analyze correlation between ordination axes values and foods.
Name the path to DietR directory where input files are pulled.
main_wd <- "~/GitHub/DietR"
Load the necessary packages.
library(ggplot2)
Load the necessary functions.
source("lib/specify_data_dir.R")
source("lib/corr.axes.foods.R")
source("lib/ggplot2themes.R")
Call color palette.
distinct100colors <- readRDS("lib/distinct100colors.rda")
You can come back to the main directory by:
setwd(main_wd)
Specify the directory where the data is.
SpecifyDataDirectory(directory.name= "eg_data/VVKAJ/Ordination/")
From sorted IFC table, generate a table of total amount of food consumed by all the individuals, and a table with correlation coefficients, p-values, and q-values with desired threshold between food items and axes that were saved in the ordination section.
Be careful not to confuse WEIGHTED and UNweighted unifrac distances.
CorrAxesFood(food_ifc_soted = "../Foodtree/VVKAJ_Items_f_id_s_m_QCed_4Lv.food.ifc_sorted.txt",
AmountSums_out_fn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_AmountSums.txt",
qval_threshold = 0.05,
meta_users = "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_WEIGHTED_meta_users.txt",
corr_axes_foods_outfn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_WEIGHTED_corr_axes_foods_thr0.05.txt")
Argument | Description |
---|---|
food_ifc_soted | xxx.food.ifc.sorted.txt file, saved in the ordination section |
AmountSums_out_fn | output filename to be saved which has the total consumption amount of each food |
qval_threshold | q-value threshold to call a correlation significant |
meta_users | xxx.meta_users.txt file, waved in the ordination section |
corr_axes_foods_outfn | output filename to be saved which has the correlation between foods and Axes |
dat <- read.delim("VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_WEIGHTED_corr_axes_foods_thr0.05.txt")
Check the number of food items with significant q-values.
nrow(subset(dat, Significance=="*"))
## [1] 8
There are 8 food items significantlly correlated with one of the axes.
Show only food items that are significantly correlated with one of the axes.
subset(dat, Significance=="*")
## Food Axis correlation pval
## 1 Soybean soup miso broth Axis.1 0.7127549 3.960138e-08
## 2 Tea hot leaf green Axis.1 0.7216284 2.228514e-08
## 3 Rice white cooked fat not added in cooking Axis.1 0.6943809 1.220027e-07
## 4 Soy sauce Axis.1 0.6323384 3.168152e-06
## 5 Avocado raw Axis.2 -0.6235105 4.755867e-06
## 6 Beer Axis.4 0.5795861 3.019255e-05
## 7 Water tap Axis.4 -0.5817230 2.776637e-05
## 8 Nutritional powder mix whey based NFS Axis.2 -0.5678651 4.731394e-05
## qval Significance
## 1 6.950042e-05 *
## 2 6.950042e-05 *
## 3 1.427431e-04 *
## 4 2.780054e-03 *
## 5 3.338619e-03 *
## 6 1.513941e-02 *
## 7 1.513941e-02 *
## 8 2.075899e-02 *
Axis.1 is positively correlated with miso soup, green tea, white rice, and soy sauce. Those are commonly used in Japanese cuisine. So, it makes sense that Japanese-diet consumers are clustered on the right side of the biplot (Axis.1 x Axis.2 plot). Similarly, Axis.2 is negatively correlated with Avocado and Nutritional powder mix, whey based, and they are commonly found in keto (high protein, high fat, low carb) diet. And keto-diet consuming individuals tend to have lower Axis.2 values in the biplot. Correlation between axes and foods can be useful in identifying characteristics of the diet based on clustered individuals on the ordination biplot.
It is also possible to view each axis separately.
# Select Axis 1 rows
dat_1 <- subset(dat, Axis=="Axis.1")
head(dat_1[order(dat_1$qval), ], 10)
## Food Axis correlation pval
## 1 Soybean soup miso broth Axis.1 0.7127549 3.960138e-08
## 2 Tea hot leaf green Axis.1 0.7216284 2.228514e-08
## 3 Rice white cooked fat not added in cooking Axis.1 0.6943809 1.220027e-07
## 4 Soy sauce Axis.1 0.6323384 3.168152e-06
## 9 Water tap Axis.1 -0.5184238 2.642395e-04
## 12 Soybeans dry cooked fat not added in cooking Axis.1 0.4644421 1.309222e-03
## 13 Soybean curd Axis.1 0.4635566 1.341198e-03
## 17 Seaweed cooked NS as to fat added in cooking Axis.1 0.4387608 2.569919e-03
## 18 Milk whole Axis.1 0.4336419 2.921872e-03
## 41 Almond milk unsweetened Axis.1 -0.2984626 4.643090e-02
## qval Significance
## 1 6.950042e-05 *
## 2 6.950042e-05 *
## 3 1.427431e-04 *
## 4 2.780054e-03 *
## 9 1.030534e-01
## 12 3.621234e-01
## 13 3.621234e-01
## 17 5.306128e-01
## 18 5.697650e-01
## 41 6.984895e-01
# Select Axis 2 rows and sort by qval.
dat_2 <- subset(dat, Axis=="Axis.2")
head(dat_2[order(dat_2$qval), ], 10)
## Food Axis correlation pval
## 5 Avocado raw Axis.2 -0.6235105 4.755867e-06
## 8 Nutritional powder mix whey based NFS Axis.2 -0.5678651 4.731394e-05
## 19 Milk NFS Axis.2 -0.2089448 1.683699e-01
## 23 Milk whole Axis.2 -0.2909457 5.250417e-02
## 30 Milk reduced fat 2 Axis.2 -0.3767086 1.075195e-02
## 35 Milk low fat 1 Axis.2 -0.2734407 6.914211e-02
## 60 Cheese Brie Axis.2 -0.2553770 9.043337e-02
## 71 Cheese Mozzarella whole milk Axis.2 -0.2089448 1.683699e-01
## 75 Cheese Mozzarella part skim Axis.2 -0.2321609 1.248796e-01
## 83 Cheese Swiss Axis.2 0.2339003 1.219991e-01
## qval Significance
## 5 0.003338619 *
## 8 0.020758993 *
## 19 0.698489461
## 23 0.698489461
## 30 0.698489461
## 35 0.698489461
## 60 0.698489461
## 71 0.698489461
## 75 0.698489461
## 83 0.698489461
# Select Axis 3 rows and sort by qval.
dat_3 <- subset(dat, Axis=="Axis.3")
head(dat_3[order(dat_3$qval), ], 10)
## Food Axis correlation pval
## 11 Milk low fat 1 Axis.3 -0.4731188 0.001030000
## 20 Milk NFS Axis.3 0.2321609 0.124879574
## 24 Milk whole Axis.3 0.3953723 0.007185917
## 31 Milk reduced fat 2 Axis.3 0.4047371 0.005820053
## 40 Almond milk sweetened Axis.3 -0.2089448 0.168369896
## 61 Cheese Brie Axis.3 -0.2205528 0.145426875
## 67 Cheese Feta Axis.3 -0.2321609 0.124879574
## 72 Cheese Mozzarella whole milk Axis.3 0.2321609 0.124879574
## 76 Cheese Mozzarella part skim Axis.3 0.2089448 0.168369896
## 99 Beef steak braised lean and fat eaten Axis.3 -0.2321609 0.124879574
## qval Significance
## 11 0.3286635
## 20 0.6984895
## 24 0.6984895
## 31 0.6984895
## 40 0.6984895
## 61 0.6984895
## 67 0.6984895
## 72 0.6984895
## 76 0.6984895
## 99 0.6984895
# Select Axis 4 rows and sort by qval.
dat_4 <- subset(dat, Axis=="Axis.4")
head(dat_4[order(dat_4$qval), ], 10)
## Food Axis correlation pval qval
## 6 Beer Axis.4 0.5795861 3.019255e-05 0.01513941
## 7 Water tap Axis.4 -0.5817230 2.776637e-05 0.01513941
## 15 Spinach raw Axis.4 -0.4528371 1.787181e-03 0.39546611
## 25 Milk whole Axis.4 0.2039034 1.791104e-01 0.69848946
## 36 Milk low fat 1 Axis.4 -0.2421731 1.089806e-01 0.69848946
## 42 Almond milk unsweetened Axis.4 -0.3571222 1.602957e-02 0.69848946
## 47 Coconut milk Axis.4 -0.3482679 1.905603e-02 0.69848946
## 55 Yogurt Greek nonfat milk fruit Axis.4 -0.1973367 1.938259e-01 0.69848946
## 58 Cream heavy Axis.4 0.2205528 1.454269e-01 0.69848946
## 62 Cheese Brie Axis.4 0.2089448 1.683699e-01 0.69848946
## Significance
## 6 *
## 7 *
## 15
## 25
## 36
## 42
## 47
## 55
## 58
## 62
xxx_AmountSums.txt will be generated again, but its content will be the same regardless of which distance method (weighted or unweighted unifrac or else) was used, as long as the food.ifc_sorted is the same.
CorrAxesFood(food_ifc_soted = "../Foodtree/VVKAJ_Items_f_id_s_m_QCed_4Lv.food.ifc_sorted.txt",
AmountSums_out_fn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_AmountSums.txt",
qval_threshold = 0.05,
meta_users = "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_UNweighted_meta_users.txt",
corr_axes_foods_outfn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_UNweighted_corr_axes_foods_thr0.05.txt")
UNweighted can be viewed in the same way.
dat <- read.delim("VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_UNweighted_corr_axes_foods_thr0.05.txt")
Check the number of food items with significant q-values.
nrow(subset(dat, Significance=="*"))
## [1] 6
There are 6 food items significantlly correlated with one of the axes.
Show only food items that are significantly correlated with one of the axes.
subset(dat, Significance=="*")
## Food Axis correlation pval
## 1 Soybean soup miso broth Axis.1 0.7110313 4.417103e-08
## 2 Tea hot leaf green Axis.1 0.7087858 5.086451e-08
## 3 Rice white cooked fat not added in cooking Axis.1 0.6853274 2.061340e-07
## 4 Avocado raw Axis.2 -0.6763446 3.406440e-07
## 5 Soy sauce Axis.1 0.6570469 9.471300e-07
## 6 Beer Axis.3 -0.5547960 7.655434e-05
## qval Significance
## 1 8.926721e-05 *
## 2 8.926721e-05 *
## 3 2.411768e-04 *
## 4 2.989151e-04 *
## 5 6.648852e-04 *
## 6 4.478429e-02 *
Come back to the main directory before you start running another script.
setwd(main_wd)