Introduction

In this script, we will analyze correlation between ordination axes values and foods.


Load functions and packages

Name the path to DietR directory where input files are pulled.

main_wd <- "~/GitHub/DietR"

Load the necessary packages.

library(ggplot2)

Load the necessary functions.

source("lib/specify_data_dir.R")
source("lib/corr.axes.foods.R")
source("lib/ggplot2themes.R")

Call color palette.

distinct100colors <- readRDS("lib/distinct100colors.rda")

You can come back to the main directory by:

setwd(main_wd)

Specify the directory where the data is.

SpecifyDataDirectory(directory.name= "eg_data/VVKAJ/Ordination/")


Weighted unifrac distance ordination results

From sorted IFC table, generate a table of total amount of food consumed by all the individuals, and a table with correlation coefficients, p-values, and q-values with desired threshold between food items and axes that were saved in the ordination section.
Be careful not to confuse WEIGHTED and UNweighted unifrac distances.

CorrAxesFood(food_ifc_soted = "../Foodtree/VVKAJ_Items_f_id_s_m_QCed_4Lv.food.ifc_sorted.txt", 
             AmountSums_out_fn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_AmountSums.txt",
             qval_threshold = 0.05,
             meta_users =            "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_WEIGHTED_meta_users.txt",
             corr_axes_foods_outfn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_WEIGHTED_corr_axes_foods_thr0.05.txt")
Argument Description
food_ifc_soted xxx.food.ifc.sorted.txt file, saved in the ordination section
AmountSums_out_fn output filename to be saved which has the total consumption amount of each food
qval_threshold q-value threshold to call a correlation significant
meta_users xxx.meta_users.txt file, waved in the ordination section
corr_axes_foods_outfn output filename to be saved which has the correlation between foods and Axes



Load and analyze the output

dat <- read.delim("VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_WEIGHTED_corr_axes_foods_thr0.05.txt")

Check the number of food items with significant q-values.

nrow(subset(dat, Significance=="*"))
## [1] 8

There are 8 food items significantlly correlated with one of the axes.

Show only food items that are significantly correlated with one of the axes.

subset(dat, Significance=="*")
##                                         Food   Axis correlation         pval
## 1                    Soybean soup miso broth Axis.1   0.7127549 3.960138e-08
## 2                         Tea hot leaf green Axis.1   0.7216284 2.228514e-08
## 3 Rice white cooked fat not added in cooking Axis.1   0.6943809 1.220027e-07
## 4                                  Soy sauce Axis.1   0.6323384 3.168152e-06
## 5                                Avocado raw Axis.2  -0.6235105 4.755867e-06
## 6                                       Beer Axis.4   0.5795861 3.019255e-05
## 7                                  Water tap Axis.4  -0.5817230 2.776637e-05
## 8      Nutritional powder mix whey based NFS Axis.2  -0.5678651 4.731394e-05
##           qval Significance
## 1 6.950042e-05            *
## 2 6.950042e-05            *
## 3 1.427431e-04            *
## 4 2.780054e-03            *
## 5 3.338619e-03            *
## 6 1.513941e-02            *
## 7 1.513941e-02            *
## 8 2.075899e-02            *

Axis.1 is positively correlated with miso soup, green tea, white rice, and soy sauce. Those are commonly used in Japanese cuisine. So, it makes sense that Japanese-diet consumers are clustered on the right side of the biplot (Axis.1 x Axis.2 plot). Similarly, Axis.2 is negatively correlated with Avocado and Nutritional powder mix, whey based, and they are commonly found in keto (high protein, high fat, low carb) diet. And keto-diet consuming individuals tend to have lower Axis.2 values in the biplot. Correlation between axes and foods can be useful in identifying characteristics of the diet based on clustered individuals on the ordination biplot.

It is also possible to view each axis separately.

# Select Axis 1 rows
dat_1 <- subset(dat, Axis=="Axis.1")
head(dat_1[order(dat_1$qval), ], 10)
##                                            Food   Axis correlation         pval
## 1                       Soybean soup miso broth Axis.1   0.7127549 3.960138e-08
## 2                            Tea hot leaf green Axis.1   0.7216284 2.228514e-08
## 3    Rice white cooked fat not added in cooking Axis.1   0.6943809 1.220027e-07
## 4                                     Soy sauce Axis.1   0.6323384 3.168152e-06
## 9                                     Water tap Axis.1  -0.5184238 2.642395e-04
## 12 Soybeans dry cooked fat not added in cooking Axis.1   0.4644421 1.309222e-03
## 13                                 Soybean curd Axis.1   0.4635566 1.341198e-03
## 17 Seaweed cooked NS as to fat added in cooking Axis.1   0.4387608 2.569919e-03
## 18                                   Milk whole Axis.1   0.4336419 2.921872e-03
## 41                      Almond milk unsweetened Axis.1  -0.2984626 4.643090e-02
##            qval Significance
## 1  6.950042e-05            *
## 2  6.950042e-05            *
## 3  1.427431e-04            *
## 4  2.780054e-03            *
## 9  1.030534e-01             
## 12 3.621234e-01             
## 13 3.621234e-01             
## 17 5.306128e-01             
## 18 5.697650e-01             
## 41 6.984895e-01
# Select Axis 2 rows and sort by qval.  
dat_2 <- subset(dat, Axis=="Axis.2")
head(dat_2[order(dat_2$qval), ], 10)
##                                     Food   Axis correlation         pval
## 5                            Avocado raw Axis.2  -0.6235105 4.755867e-06
## 8  Nutritional powder mix whey based NFS Axis.2  -0.5678651 4.731394e-05
## 19                              Milk NFS Axis.2  -0.2089448 1.683699e-01
## 23                            Milk whole Axis.2  -0.2909457 5.250417e-02
## 30                   Milk reduced fat 2  Axis.2  -0.3767086 1.075195e-02
## 35                       Milk low fat 1  Axis.2  -0.2734407 6.914211e-02
## 60                           Cheese Brie Axis.2  -0.2553770 9.043337e-02
## 71          Cheese Mozzarella whole milk Axis.2  -0.2089448 1.683699e-01
## 75           Cheese Mozzarella part skim Axis.2  -0.2321609 1.248796e-01
## 83                          Cheese Swiss Axis.2   0.2339003 1.219991e-01
##           qval Significance
## 5  0.003338619            *
## 8  0.020758993            *
## 19 0.698489461             
## 23 0.698489461             
## 30 0.698489461             
## 35 0.698489461             
## 60 0.698489461             
## 71 0.698489461             
## 75 0.698489461             
## 83 0.698489461
# Select Axis 3 rows and sort by qval.  
dat_3 <- subset(dat, Axis=="Axis.3")
head(dat_3[order(dat_3$qval), ], 10)
##                                     Food   Axis correlation        pval
## 11                       Milk low fat 1  Axis.3  -0.4731188 0.001030000
## 20                              Milk NFS Axis.3   0.2321609 0.124879574
## 24                            Milk whole Axis.3   0.3953723 0.007185917
## 31                   Milk reduced fat 2  Axis.3   0.4047371 0.005820053
## 40                 Almond milk sweetened Axis.3  -0.2089448 0.168369896
## 61                           Cheese Brie Axis.3  -0.2205528 0.145426875
## 67                           Cheese Feta Axis.3  -0.2321609 0.124879574
## 72          Cheese Mozzarella whole milk Axis.3   0.2321609 0.124879574
## 76           Cheese Mozzarella part skim Axis.3   0.2089448 0.168369896
## 99 Beef steak braised lean and fat eaten Axis.3  -0.2321609 0.124879574
##         qval Significance
## 11 0.3286635             
## 20 0.6984895             
## 24 0.6984895             
## 31 0.6984895             
## 40 0.6984895             
## 61 0.6984895             
## 67 0.6984895             
## 72 0.6984895             
## 76 0.6984895             
## 99 0.6984895
# Select Axis 4 rows and sort by qval.  
dat_4 <- subset(dat, Axis=="Axis.4")
head(dat_4[order(dat_4$qval), ], 10)
##                              Food   Axis correlation         pval       qval
## 6                            Beer Axis.4   0.5795861 3.019255e-05 0.01513941
## 7                       Water tap Axis.4  -0.5817230 2.776637e-05 0.01513941
## 15                    Spinach raw Axis.4  -0.4528371 1.787181e-03 0.39546611
## 25                     Milk whole Axis.4   0.2039034 1.791104e-01 0.69848946
## 36                Milk low fat 1  Axis.4  -0.2421731 1.089806e-01 0.69848946
## 42        Almond milk unsweetened Axis.4  -0.3571222 1.602957e-02 0.69848946
## 47                   Coconut milk Axis.4  -0.3482679 1.905603e-02 0.69848946
## 55 Yogurt Greek nonfat milk fruit Axis.4  -0.1973367 1.938259e-01 0.69848946
## 58                    Cream heavy Axis.4   0.2205528 1.454269e-01 0.69848946
## 62                    Cheese Brie Axis.4   0.2089448 1.683699e-01 0.69848946
##    Significance
## 6             *
## 7             *
## 15             
## 25             
## 36             
## 42             
## 47             
## 55             
## 58             
## 62


Unweighted unifrac distance ordination results

xxx_AmountSums.txt will be generated again, but its content will be the same regardless of which distance method (weighted or unweighted unifrac or else) was used, as long as the food.ifc_sorted is the same.

CorrAxesFood(food_ifc_soted = "../Foodtree/VVKAJ_Items_f_id_s_m_QCed_4Lv.food.ifc_sorted.txt",
             AmountSums_out_fn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_AmountSums.txt",
             qval_threshold = 0.05,
             meta_users =            "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_UNweighted_meta_users.txt",
             corr_axes_foods_outfn = "VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_UNweighted_corr_axes_foods_thr0.05.txt")



Load and analyze the output

UNweighted can be viewed in the same way.

dat <- read.delim("VVKAJ_Items_f_id_s_m_QCed_4Lv_ord_UNweighted_corr_axes_foods_thr0.05.txt")

Check the number of food items with significant q-values.

nrow(subset(dat, Significance=="*"))
## [1] 6

There are 6 food items significantlly correlated with one of the axes.

Show only food items that are significantly correlated with one of the axes.

subset(dat, Significance=="*")
##                                         Food   Axis correlation         pval
## 1                    Soybean soup miso broth Axis.1   0.7110313 4.417103e-08
## 2                         Tea hot leaf green Axis.1   0.7087858 5.086451e-08
## 3 Rice white cooked fat not added in cooking Axis.1   0.6853274 2.061340e-07
## 4                                Avocado raw Axis.2  -0.6763446 3.406440e-07
## 5                                  Soy sauce Axis.1   0.6570469 9.471300e-07
## 6                                       Beer Axis.3  -0.5547960 7.655434e-05
##           qval Significance
## 1 8.926721e-05            *
## 2 8.926721e-05            *
## 3 2.411768e-04            *
## 4 2.989151e-04            *
## 5 6.648852e-04            *
## 6 4.478429e-02            *



Come back to the main directory before you start running another script.

setwd(main_wd)