In this tutorial, we will use mock data from the VVKAJ dataset that was created with ASA24. VVKAJ stands for Vegetarian, Vegan, Keto, American, Japanese and was designed because these different eating patterns reflect differences that are often seen in real data. This mock dataset contains dietary for 15 mock participants who report dietary data while following the 5 different dietary patterns (VVKAJ) for three days. There are a total of XX dietary records in this dataset.
ASA24 data includes the following files: _Items.csv, _INS.csv, _Responses.csv, _TNS.csv, Totals.csv, and TS.csv. Refer to the ASA24 Reserchers’ website for specific explanations for each file, but for the purpose of this tutorial, we focus on using the Items.csv, which has all the food items reported by the participants.
In this script, you will:
Name the path to DietR directory where input files are pulled.
main_wd <- "~/GitHub/DietR"
Load the necessary functions.
source("lib/specify_data_dir.R")
source("lib/load_clean_ASA24.R")
source("lib/Food_tree_scripts/format.foods_2.r")
source("lib/QCOutliers.R")
source("lib/average.by.R")
You can come back to the main directory by:
setwd(main_wd)
Specify the directory where the data is.
SpecifyDataDirectory(directory.name= "eg_data/VVKAJ/")
Load your unprocessed (raw) food items-level data (as downloaded from the ASA24 study website). The csv file will be loaded as a dataframe in R and be named as items_raw.
items_raw <- read.csv("Raw_data/VVKAJ_Items.csv", sep = ",", header=T)
items_raw
has a column called “Food_Description”, but this needs to be changed to “Main.food.description”. Change the column name.
names(items_raw)[names(items_raw) == "Food_Description"] <- "Main.food.description"
Check if any column names match with “Main.food.description”. If there is a match, it will be printed.
names(items_raw)[names(items_raw) == "Main.food.description"]
## [1] "Main.food.description"
[NOTE] The numbers in the square brackets of the output indicate the sequential number of each element to help count the number of elements.
Save the items file as a .txt file. This command saves the object “items_raw” as a .txt file with the specified filename using the write.table function.
write.table(items_raw, "VVKAJ_Items.txt", sep="\t", row.names=F)
Special characters common in food names in dietary data such as "
, '
, ,
, %
may interfere correct data loading in R; thus, we replace them with an underscore _
.
Format foods so that special characters will be replaced with “_“.”_f” stands for “formatted”.
FormatFoods(input_fn = "VVKAJ_Items.txt",
output_fn = "VVKAJ_Items_f.txt")
[NOTE] It is best practice to avoid overwriting your raw data. Always save formatted/manipulated versions as a new file as described above.
Load the Items_f.txt file to take a look at it.
You need the quote=""
and colClasses="character"
arguments to ignore quotation marks (do not regard them as a cell separator) and to load all the columns as characters so that FoodID will keep the trailing “.0”.
items_f <- read.delim("VVKAJ_Items_f.txt", quote="", colClasses="character")
All special characters in the items data should have been replaced with an underscore in the Main.food.description column, the 3rd from the last column of the items_f. We can confirm that by using the head function, which shows the first six rows of the specified dataset by default. In this website version of tutorial, output is folded for visual clarity. You can click and expand the results.
head(items_f)
## RecallRecId UserName
## 1 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 2 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 3 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 4 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 5 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 6 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## UserID RecallNo RecallAttempt RecallStatus
## 1 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 2 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 3 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 4 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 5 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 6 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## IntakeStartDateTime IntakeEndDateTime ReportingDate Lang Occ_No
## 1 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 2 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 3 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 4 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 5 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 2
## 6 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 3
## Occ_Time Occ_Name EatWith WatchTVuseComputer Location FoodNum FoodType
## 1 11/5/2021 7:00 1 <NA> <NA> 1 1 1
## 2 11/5/2021 7:00 1 <NA> <NA> 1 2 1
## 3 11/5/2021 7:00 1 <NA> <NA> 1 3 2
## 4 11/5/2021 7:00 1 <NA> <NA> 1 4 1
## 5 11/5/2021 10:00 7 <NA> <NA> 1 5 1
## 6 11/5/2021 12:00 3 <NA> <NA> 1 6 1
## FoodSrce CodeNum FoodCode ModCode HowMany SubCode PortionCode FoodAmt KCAL
## 1 <NA> 1 92400000 0 9.6 0 30001 297.6 124.992
## 2 <NA> 2 57231250 0 1 0 10205 104 419.12
## 3 <NA> 3 91302010 0 0.5 0 21000 10.5 31.92
## 4 <NA> 4 63200200 0 0.5 0 10205 98.5 50.235
## 5 <NA> 5 94000100 0 8.4 0 30000 252 0
## 6 <NA> 6 32103000 0 0.5 0 10205 111 285.27
## PROT TFAT CARB MOIS ALC CAFF THEO SUGR FIBE CALC
## 1 0 0.744 30.83136 265.93536 0 26.784 0 29.58144 0 2.976
## 2 9.36 10.92 75.92 5.408 0 0 0 16.016 10.088 40.56
## 3 0.0315 0 8.652 1.7955 0 0 0 8.6226 0.021 0.63
## 4 0.4137 0.6304 11.98745 85.29115 0 0 0 8.32325 2.6595 7.88
## 5 0 0 0 251.748 0 0 0 0 0 7.56
## 6 11.3886 25.6854 1.1211 70.9956 0 0 0 1.1211 0 46.62
## IRON MAGN PHOS POTA SODI ZINC COPP SELE VC VB1
## 1 0.05952 0 26.784 14.88 8.928 0.26784 0.020832 0.2976 0 0
## 2 17.992 101.92 307.84 350.48 298.48 2.392 0.416 43.472 0.208 0.728
## 3 0.0441 0.21 0.42 5.46 0.42 0.0231 0.00378 0.084 0.0525 0
## 4 0.1773 4.925 10.835 53.19 0.985 0.06895 0.032505 0.0985 2.4625 0.03152
## 5 0 2.52 0 0 10.08 0.0252 0.0252 0 0 0
## 6 1.11 8.88 157.62 116.55 420.69 0.9657 0.01554 27.861 0 0.06105
## VB2 NIAC VB6 FOLA FA FF FDFE VB12 VARA RET
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0.832 9.984 1.04 199.68 179.92 19.76 326.56 3.016 447.2 447.2
## 3 0.00399 0.012705 0.00252 0.21 0 0.21 0.21 0 0 0
## 4 0.036445 0.5122 0.058115 6.895 0 6.895 6.895 0 1.97 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0.45954 0.05661 0.10878 39.96 0 39.96 39.96 1.0101 135.42 134.31
## BCAR ACAR CRYP LYCO LZ ATOC VK CHOLE SFAT S040 S060 S080
## 1 0 0 0 0 0 0 0 0 0 0 0 0
## 2 5.2 0 1.04 0 142.48 0.7696 2.288 0 1.04 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0
## 4 27.58 0 0 0 66.98 0.4728 16.154 0 0.052205 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0
## 6 11.1 0 9.99 0 316.35 1.6317 35.631 340.77 5.44344 0 0 0.00222
## S100 S120 S140 S160 S180 MFAT M161 M181 M201
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0.00208 0.00208 0.80392 0.20592 6.032 0.00312 5.98936 0.03744
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0.030535 0.010835 0.089635 0.00197 0.085695 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0.00222 0.00222 0.04329 3.80397 1.40859 7.27938 0.29415 6.882 0.07881
## M221 PFAT P182 P183 P184 P204 P205 P225 P226 VITD
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0.00104 3.224 3.05136 0.17264 0 0 0 0 0 1.976
## 3 0 0 0 0 0 0 0 0 0 0
## 4 0 0.274815 0.164495 0.11032 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0.00777 10.96014 9.55599 1.21545 0 0.14319 0.00444 0 0.03441 1.998
## CHOLN VITE_ADD B12_ADD F_TOTAL F_CITMLB F_OTHER F_JUICE V_TOTAL V_DRKGR
## 1 0.8928 0 0 0 0 0 0 0 0
## 2 29.952 0 3.016 0 0 0 0 0 0
## 3 0.231 0 0 0 0 0 0 0 0
## 4 5.0235 0 0 0.65995 0.65995 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 268.398 0 0 0 0 0 0 0 0
## V_REDOR_TOTAL V_REDOR_TOMATO V_REDOR_OTHER V_STARCHY_TOTAL V_STARCHY_POTATO
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## V_STARCHY_OTHER V_OTHER V_LEGUMES G_TOTAL G_WHOLE G_REFINED PF_TOTAL
## 1 0 0 0 0 0 0 0
## 2 0 0 0 2.7456 2.652 0.0936 0.7384
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 1.8093
## PF_MPS_TOTAL PF_MEAT PF_CUREDMEAT PF_ORGAN PF_POULT PF_SEAFD_HI PF_SEAFD_LOW
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## PF_EGGS PF_SOY PF_NUTSDS PF_LEGUMES D_TOTAL D_MILK D_YOGURT D_CHEESE OILS
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0.7384 0 0 0 0 0 6.0736
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 1.8093 0 0 0 0 0 0 0 15.8508
## SOLID_FATS ADD_SUGARS A_DRINKS FoodComp
## 1 0 7.05312 0 1
## 2 0 3.432 0 1
## 3 0 2.05275 0 1
## 4 0 0 0 1
## 5 0 0 0 1
## 6 4.8507 0 0 1
## Main.food.description
## 1 Soft_drink_NFS
## 2 Cereal_Post_Great_Grains_Double_Pecan_Whole_Grain_Cereal_
## 3 Honey
## 4 Berries_frozen_NFS
## 5 Water_tap
## 6 Egg_salad_made_with_mayonnaise
## Old.Main.food.description FoodID
## 1 Soft drink, NFS 92400000.0
## 2 Cereal (Post Great Grains Double Pecan Whole Grain Cereal) 57231250.0
## 3 Honey 91302010.0
## 4 Berries, frozen, NFS 63200200.0
## 5 Water, tap 94000100.0
## 6 Egg salad, made with mayonnaise 32103000.0
Add a human-readable sample identifier (SampleID) with a desired prefix, and save it as a .txt file. SampleIDs are IDs unique to each combination of users and day and represent days of dietary intake in this dataset.
AddSampleIDtoItems(input.fn="VVKAJ_Items_f.txt", user.name="UserName", recall.no="RecallNo",
prefix="vvkaj.", out.fn="VVKAJ_Items_f_id.txt")
Load the formatted Items file with SampleID added.
items_f_id <- read.delim("VVKAJ_Items_f_id.txt", quote="", colClasses="character")
A combination of the specified prefix and sequential number (vvkaj.00001) should be added in the SampleID column, the first column of the items_f_id dataframe. You will probably need to scroll up the output a little bit in the console to view the first column.
head(items_f_id)
## SampleID RecallRecId UserName
## 1 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 2 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 3 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 4 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 5 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## 6 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9 VVKAJ101
## UserID RecallNo RecallAttempt RecallStatus
## 1 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 2 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 3 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 4 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 5 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 6 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## IntakeStartDateTime IntakeEndDateTime ReportingDate Lang Occ_No
## 1 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 2 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 3 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 4 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 5 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 2
## 6 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 3
## Occ_Time Occ_Name EatWith WatchTVuseComputer Location FoodNum FoodType
## 1 11/5/2021 7:00 1 <NA> <NA> 1 1 1
## 2 11/5/2021 7:00 1 <NA> <NA> 1 2 1
## 3 11/5/2021 7:00 1 <NA> <NA> 1 3 2
## 4 11/5/2021 7:00 1 <NA> <NA> 1 4 1
## 5 11/5/2021 10:00 7 <NA> <NA> 1 5 1
## 6 11/5/2021 12:00 3 <NA> <NA> 1 6 1
## FoodSrce CodeNum FoodCode ModCode HowMany SubCode PortionCode FoodAmt KCAL
## 1 <NA> 1 92400000 0 9.6 0 30001 297.6 124.992
## 2 <NA> 2 57231250 0 1 0 10205 104 419.12
## 3 <NA> 3 91302010 0 0.5 0 21000 10.5 31.92
## 4 <NA> 4 63200200 0 0.5 0 10205 98.5 50.235
## 5 <NA> 5 94000100 0 8.4 0 30000 252 0
## 6 <NA> 6 32103000 0 0.5 0 10205 111 285.27
## PROT TFAT CARB MOIS ALC CAFF THEO SUGR FIBE CALC
## 1 0 0.744 30.83136 265.93536 0 26.784 0 29.58144 0 2.976
## 2 9.36 10.92 75.92 5.408 0 0 0 16.016 10.088 40.56
## 3 0.0315 0 8.652 1.7955 0 0 0 8.6226 0.021 0.63
## 4 0.4137 0.6304 11.98745 85.29115 0 0 0 8.32325 2.6595 7.88
## 5 0 0 0 251.748 0 0 0 0 0 7.56
## 6 11.3886 25.6854 1.1211 70.9956 0 0 0 1.1211 0 46.62
## IRON MAGN PHOS POTA SODI ZINC COPP SELE VC VB1
## 1 0.05952 0 26.784 14.88 8.928 0.26784 0.020832 0.2976 0 0
## 2 17.992 101.92 307.84 350.48 298.48 2.392 0.416 43.472 0.208 0.728
## 3 0.0441 0.21 0.42 5.46 0.42 0.0231 0.00378 0.084 0.0525 0
## 4 0.1773 4.925 10.835 53.19 0.985 0.06895 0.032505 0.0985 2.4625 0.03152
## 5 0 2.52 0 0 10.08 0.0252 0.0252 0 0 0
## 6 1.11 8.88 157.62 116.55 420.69 0.9657 0.01554 27.861 0 0.06105
## VB2 NIAC VB6 FOLA FA FF FDFE VB12 VARA RET
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0.832 9.984 1.04 199.68 179.92 19.76 326.56 3.016 447.2 447.2
## 3 0.00399 0.012705 0.00252 0.21 0 0.21 0.21 0 0 0
## 4 0.036445 0.5122 0.058115 6.895 0 6.895 6.895 0 1.97 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0.45954 0.05661 0.10878 39.96 0 39.96 39.96 1.0101 135.42 134.31
## BCAR ACAR CRYP LYCO LZ ATOC VK CHOLE SFAT S040 S060 S080
## 1 0 0 0 0 0 0 0 0 0 0 0 0
## 2 5.2 0 1.04 0 142.48 0.7696 2.288 0 1.04 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0
## 4 27.58 0 0 0 66.98 0.4728 16.154 0 0.052205 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0
## 6 11.1 0 9.99 0 316.35 1.6317 35.631 340.77 5.44344 0 0 0.00222
## S100 S120 S140 S160 S180 MFAT M161 M181 M201
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0.00208 0.00208 0.80392 0.20592 6.032 0.00312 5.98936 0.03744
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0.030535 0.010835 0.089635 0.00197 0.085695 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0.00222 0.00222 0.04329 3.80397 1.40859 7.27938 0.29415 6.882 0.07881
## M221 PFAT P182 P183 P184 P204 P205 P225 P226 VITD
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0.00104 3.224 3.05136 0.17264 0 0 0 0 0 1.976
## 3 0 0 0 0 0 0 0 0 0 0
## 4 0 0.274815 0.164495 0.11032 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0.00777 10.96014 9.55599 1.21545 0 0.14319 0.00444 0 0.03441 1.998
## CHOLN VITE_ADD B12_ADD F_TOTAL F_CITMLB F_OTHER F_JUICE V_TOTAL V_DRKGR
## 1 0.8928 0 0 0 0 0 0 0 0
## 2 29.952 0 3.016 0 0 0 0 0 0
## 3 0.231 0 0 0 0 0 0 0 0
## 4 5.0235 0 0 0.65995 0.65995 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 268.398 0 0 0 0 0 0 0 0
## V_REDOR_TOTAL V_REDOR_TOMATO V_REDOR_OTHER V_STARCHY_TOTAL V_STARCHY_POTATO
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## V_STARCHY_OTHER V_OTHER V_LEGUMES G_TOTAL G_WHOLE G_REFINED PF_TOTAL
## 1 0 0 0 0 0 0 0
## 2 0 0 0 2.7456 2.652 0.0936 0.7384
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 1.8093
## PF_MPS_TOTAL PF_MEAT PF_CUREDMEAT PF_ORGAN PF_POULT PF_SEAFD_HI PF_SEAFD_LOW
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## PF_EGGS PF_SOY PF_NUTSDS PF_LEGUMES D_TOTAL D_MILK D_YOGURT D_CHEESE OILS
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0.7384 0 0 0 0 0 6.0736
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 1.8093 0 0 0 0 0 0 0 15.8508
## SOLID_FATS ADD_SUGARS A_DRINKS FoodComp
## 1 0 7.05312 0 1
## 2 0 3.432 0 1
## 3 0 2.05275 0 1
## 4 0 0 0 1
## 5 0 0 0 1
## 6 4.8507 0 0 1
## Main.food.description
## 1 Soft_drink_NFS
## 2 Cereal_Post_Great_Grains_Double_Pecan_Whole_Grain_Cereal_
## 3 Honey
## 4 Berries_frozen_NFS
## 5 Water_tap
## 6 Egg_salad_made_with_mayonnaise
## Old.Main.food.description FoodID
## 1 Soft drink, NFS 92400000.0
## 2 Cereal (Post Great Grains Double Pecan Whole Grain Cereal) 57231250.0
## 3 Honey 91302010.0
## 4 Berries, frozen, NFS 63200200.0
## 5 Water, tap 94000100.0
## 6 Egg salad, made with mayonnaise 32103000.0
Ensure your items file has the expected dimensions (number of rows x number of columns, shown as number of obs. and number of variables) in the environment window of R Studio. Or by using dim(items_f_id) and dim(items_raw). Note that items_f_id has 3 more columns than items_raw because new columns of FoodID, Old.Main.food.description, and SampleID have been added.
dim(items_f_id)
## [1] 779 133
dim(items_raw)
## [1] 779 130
individuals_to_remove.txt
to filter out users marked as Remove = yesLoad your metadata that has information about which UserName(s) to remove.
ind_to_rm <- read.delim("individuals_to_remove.txt")
Metadata for this purpose (ind_to_rm) has UserName and which one to be removed.
ind_to_rm
## UserName Remove
## 1 VVKAJ101
## 2 VVKAJ102
## 3 VVKAJ103
## 4 VVKAJ104
## 5 VVKAJ105
## 6 VVKAJ106
## 7 VVKAJ107
## 8 VVKAJ108
## 9 VVKAJ109
## 10 VVKAJ110
## 11 VVKAJ111
## 12 VVKAJ112
## 13 VVKAJ113
## 14 VVKAJ114
## 15 VVKAJ115
## 16 VVKAJ116 yes
## 17 VVKAJ117
Show which has “yes” in the “Remove” column.
subset(ind_to_rm, Remove == "yes")
## UserName Remove
## 16 VVKAJ116 yes
As shown in the console, the user named “VVKAJ116” is marked to be removed. VVKAJ116 has only 1 day of data, which may not be complete, thus it is marked as an individual to remove. However, be careful when deleting a datapoint from your study and never remove individuals from the raw dataset, to ensure you can always go back and include them if desired.
Remove the specified individuals.
The output will be saved as a text file with the specified name. This assumes the usernames are in UserName column, and will print which user(s) will be removed.
RemoveRows(data=items_f_id, metadata.file= ind_to_rm,
output.name= "VVKAJ_Items_f_id_s.txt")
## 1 row(s) below are to be removed:
## UserName Remove
## 16 VVKAJ116 yes
Load the output for further processing.
items_f_id_s <- read.delim("VVKAJ_Items_f_id_s.txt", quote="", colClasses="character")
Show unique usernames in items_f_id_s and confirm “VVKAJ116” has been removed.
unique(items_f_id_s$UserName)
## [1] "VVKAJ101" "VVKAJ102" "VVKAJ103" "VVKAJ104" "VVKAJ105" "VVKAJ106"
## [7] "VVKAJ107" "VVKAJ108" "VVKAJ109" "VVKAJ110" "VVKAJ111" "VVKAJ112"
## [13] "VVKAJ113" "VVKAJ114" "VVKAJ115" "VVKAJ117"
ind_metadata
has the participants’ gender, age, height, weight, BMI, and Waist.Circumference, etc. If desired, this individual-specific information can be added to items data.
Load ind_metadata.txt.
ind_metadata <- read.table("ind_metadata.txt", sep="\t", header=T)
Look at what the metadata has.
head(ind_metadata)
## UserName Diet Gender Age Weight Height BMI Waist.Circumference
## 1 VVKAJ101 Vegetarian M 31 79 186 22.83501 80
## 2 VVKAJ102 Vegan F 60 73 163 27.47563 90
## 3 VVKAJ103 Keto M 43 81 175 26.44898 72
## 4 VVKAJ104 American F 25 85 169 29.76086 89
## 5 VVKAJ105 Japanese M 71 60 169 21.00767 75
## 6 VVKAJ106 Vegetarian F 53 68 159 26.89767 85
This includes information on the removed individual, VVKAJ116, but it will not be used if VVKAJ116 is not in the items data.
Add this metadata of each participant to totals or items. ‘NA’ will be inserted to UserNames which are not in ind_metadata.
items_f_id_s_m <- merge(x=items_f_id_s, y=ind_metadata, by="UserName", all.x=T)
Check that the items data and metadata are merged.
head(items_f_id_s_m)
## UserName SampleID RecallRecId
## 1 VVKAJ101 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9
## 2 VVKAJ101 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9
## 3 VVKAJ101 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9
## 4 VVKAJ101 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9
## 5 VVKAJ101 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9
## 6 VVKAJ101 vvkaj.00001 41fee4cf-783f-469b-aadf-62c7e2cd33a9
## UserID RecallNo RecallAttempt RecallStatus
## 1 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 2 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 3 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 4 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 5 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## 6 7bd05142-312e-4648-b1ac-fc258540af52 1 0 2
## IntakeStartDateTime IntakeEndDateTime ReportingDate Lang Occ_No
## 1 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 2 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 3 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 4 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 1
## 5 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 2
## 6 11/5/2021 0:00 11/5/2021 23:59 11/6/2021 1 3
## Occ_Time Occ_Name EatWith WatchTVuseComputer Location FoodNum FoodType
## 1 11/5/2021 7:00 1 <NA> <NA> 1 1 1
## 2 11/5/2021 7:00 1 <NA> <NA> 1 2 1
## 3 11/5/2021 7:00 1 <NA> <NA> 1 3 2
## 4 11/5/2021 7:00 1 <NA> <NA> 1 4 1
## 5 11/5/2021 10:00 7 <NA> <NA> 1 5 1
## 6 11/5/2021 12:00 3 <NA> <NA> 1 6 1
## FoodSrce CodeNum FoodCode ModCode HowMany SubCode PortionCode FoodAmt KCAL
## 1 <NA> 1 92400000 0 9.6 0 30001 297.6 124.992
## 2 <NA> 2 57231250 0 1 0 10205 104 419.12
## 3 <NA> 3 91302010 0 0.5 0 21000 10.5 31.92
## 4 <NA> 4 63200200 0 0.5 0 10205 98.5 50.235
## 5 <NA> 5 94000100 0 8.4 0 30000 252 0
## 6 <NA> 6 32103000 0 0.5 0 10205 111 285.27
## PROT TFAT CARB MOIS ALC CAFF THEO SUGR FIBE CALC
## 1 0 0.744 30.83136 265.93536 0 26.784 0 29.58144 0 2.976
## 2 9.36 10.92 75.92 5.408 0 0 0 16.016 10.088 40.56
## 3 0.0315 0 8.652 1.7955 0 0 0 8.6226 0.021 0.63
## 4 0.4137 0.6304 11.98745 85.29115 0 0 0 8.32325 2.6595 7.88
## 5 0 0 0 251.748 0 0 0 0 0 7.56
## 6 11.3886 25.6854 1.1211 70.9956 0 0 0 1.1211 0 46.62
## IRON MAGN PHOS POTA SODI ZINC COPP SELE VC VB1
## 1 0.05952 0 26.784 14.88 8.928 0.26784 0.020832 0.2976 0 0
## 2 17.992 101.92 307.84 350.48 298.48 2.392 0.416 43.472 0.208 0.728
## 3 0.0441 0.21 0.42 5.46 0.42 0.0231 0.00378 0.084 0.0525 0
## 4 0.1773 4.925 10.835 53.19 0.985 0.06895 0.032505 0.0985 2.4625 0.03152
## 5 0 2.52 0 0 10.08 0.0252 0.0252 0 0 0
## 6 1.11 8.88 157.62 116.55 420.69 0.9657 0.01554 27.861 0 0.06105
## VB2 NIAC VB6 FOLA FA FF FDFE VB12 VARA RET
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0.832 9.984 1.04 199.68 179.92 19.76 326.56 3.016 447.2 447.2
## 3 0.00399 0.012705 0.00252 0.21 0 0.21 0.21 0 0 0
## 4 0.036445 0.5122 0.058115 6.895 0 6.895 6.895 0 1.97 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0.45954 0.05661 0.10878 39.96 0 39.96 39.96 1.0101 135.42 134.31
## BCAR ACAR CRYP LYCO LZ ATOC VK CHOLE SFAT S040 S060 S080
## 1 0 0 0 0 0 0 0 0 0 0 0 0
## 2 5.2 0 1.04 0 142.48 0.7696 2.288 0 1.04 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0
## 4 27.58 0 0 0 66.98 0.4728 16.154 0 0.052205 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0
## 6 11.1 0 9.99 0 316.35 1.6317 35.631 340.77 5.44344 0 0 0.00222
## S100 S120 S140 S160 S180 MFAT M161 M181 M201
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0.00208 0.00208 0.80392 0.20592 6.032 0.00312 5.98936 0.03744
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0.030535 0.010835 0.089635 0.00197 0.085695 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0.00222 0.00222 0.04329 3.80397 1.40859 7.27938 0.29415 6.882 0.07881
## M221 PFAT P182 P183 P184 P204 P205 P225 P226 VITD
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0.00104 3.224 3.05136 0.17264 0 0 0 0 0 1.976
## 3 0 0 0 0 0 0 0 0 0 0
## 4 0 0.274815 0.164495 0.11032 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0.00777 10.96014 9.55599 1.21545 0 0.14319 0.00444 0 0.03441 1.998
## CHOLN VITE_ADD B12_ADD F_TOTAL F_CITMLB F_OTHER F_JUICE V_TOTAL V_DRKGR
## 1 0.8928 0 0 0 0 0 0 0 0
## 2 29.952 0 3.016 0 0 0 0 0 0
## 3 0.231 0 0 0 0 0 0 0 0
## 4 5.0235 0 0 0.65995 0.65995 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 268.398 0 0 0 0 0 0 0 0
## V_REDOR_TOTAL V_REDOR_TOMATO V_REDOR_OTHER V_STARCHY_TOTAL V_STARCHY_POTATO
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## V_STARCHY_OTHER V_OTHER V_LEGUMES G_TOTAL G_WHOLE G_REFINED PF_TOTAL
## 1 0 0 0 0 0 0 0
## 2 0 0 0 2.7456 2.652 0.0936 0.7384
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 1.8093
## PF_MPS_TOTAL PF_MEAT PF_CUREDMEAT PF_ORGAN PF_POULT PF_SEAFD_HI PF_SEAFD_LOW
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## PF_EGGS PF_SOY PF_NUTSDS PF_LEGUMES D_TOTAL D_MILK D_YOGURT D_CHEESE OILS
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0.7384 0 0 0 0 0 6.0736
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 1.8093 0 0 0 0 0 0 0 15.8508
## SOLID_FATS ADD_SUGARS A_DRINKS FoodComp
## 1 0 7.05312 0 1
## 2 0 3.432 0 1
## 3 0 2.05275 0 1
## 4 0 0 0 1
## 5 0 0 0 1
## 6 4.8507 0 0 1
## Main.food.description
## 1 Soft_drink_NFS
## 2 Cereal_Post_Great_Grains_Double_Pecan_Whole_Grain_Cereal_
## 3 Honey
## 4 Berries_frozen_NFS
## 5 Water_tap
## 6 Egg_salad_made_with_mayonnaise
## Old.Main.food.description FoodID
## 1 Soft drink, NFS 92400000.0
## 2 Cereal (Post Great Grains Double Pecan Whole Grain Cereal) 57231250.0
## 3 Honey 91302010.0
## 4 Berries, frozen, NFS 63200200.0
## 5 Water, tap 94000100.0
## 6 Egg salad, made with mayonnaise 32103000.0
## Diet Gender Age Weight Height BMI Waist.Circumference
## 1 Vegetarian M 31 79 186 22.83501 80
## 2 Vegetarian M 31 79 186 22.83501 80
## 3 Vegetarian M 31 79 186 22.83501 80
## 4 Vegetarian M 31 79 186 22.83501 80
## 5 Vegetarian M 31 79 186 22.83501 80
## 6 Vegetarian M 31 79 186 22.83501 80
Furthermore, as a quick way to look at the metadata of only the selected individuals, you can subset the metadata to just the usernames present in the analysis dataset (items_f_id_s) using the %in%
operator.
ind_metadata_s <- ind_metadata[ind_metadata$UserName %in% items_f_id_s$UserName, ]
Use the tail function to show the last six rows of ind_metadata_s. You can see that the last individual in this metadata is now VVKAJ117, and that VVKAJ116, which was not in items_f_id_s, has been omitted.
tail(ind_metadata_s)
## UserName Diet Gender Age Weight Height BMI Waist.Circumference
## 11 VVKAJ111 Vegetarian M 60 74.9000 164.1 27.81408 78.0
## 12 VVKAJ112 Vegan F 48 69.8992 158.0 28.00000 95.5
## 13 VVKAJ113 Keto M 43 80.0000 183.0 23.88844 98.0
## 14 VVKAJ114 American F 62 90.7235 161.0 35.00000 110.0
## 15 VVKAJ115 Japanese M 22 69.6348 174.0 23.00000 70.0
## 17 VVKAJ117 error M 23 76.0000 175.0 24.81633 80.0
Save the merged dataframe as a .txt file.
write.table(items_f_id_s_m, "VVKAJ_Items_f_id_s_m.txt", sep="\t", row.names=F, quote=F)
Use one of the input files saved above as an input for calculating totals for. Specify which columns have usernames and Recall.No., which has the recorded days.
GenerateTotals(inputfn = "VVKAJ_Items_f_id_s_m.txt",
User.Name = "UserName",
Recall.No = "RecallNo",
outfn = "VVKAJ_Tot.txt")
Load the total file generated above.
new_totals <- read.table("VVKAJ_Tot.txt", header=T, sep="\t")
The number of rows should be {No. of users x No. days}. For the example data, 16 users x 3 days = 48 rows (observations).
nrow(new_totals)
## [1] 48
View the new_totals.
head(new_totals)
## User_Day UserName RecallNo FoodAmt KCAL PROT TFAT CARB
## 1 VVKAJ101_1 VVKAJ101 1 2402.500 2314.9040 80.15720 82.97721 337.7973
## 2 VVKAJ101_2 VVKAJ101 2 1671.650 913.5955 35.13461 37.42791 118.5702
## 3 VVKAJ101_3 VVKAJ101 3 1920.588 1604.8050 48.29881 70.08057 215.7318
## 4 VVKAJ102_1 VVKAJ102 1 1957.475 1440.1995 38.54926 80.38863 152.8296
## 5 VVKAJ102_2 VVKAJ102 2 1858.067 1508.9957 46.25199 56.89753 212.5225
## 6 VVKAJ102_3 VVKAJ102 3 2049.400 1834.2306 58.89500 87.53381 211.8413
## MOIS ALC CAFF THEO SUGR FIBE CALC IRON MAGN
## 1 1879.619 0 26.784 0.00 144.20847 52.62550 889.8455 31.282000 589.3865
## 2 1467.035 0 76.800 0.00 47.67748 21.04413 619.2521 7.366001 273.2332
## 3 1572.133 0 30.240 3.36 111.81870 23.27856 1427.6826 14.628896 320.9569
## 4 1669.286 0 50.400 5.04 25.75794 33.90378 912.4538 11.797247 284.4450
## 5 1521.262 0 22.680 2.52 55.29288 31.25067 1144.7673 14.180575 366.8420
## 6 1666.272 0 0.000 0.00 31.41001 29.35169 700.0389 14.009471 373.1694
## PHOS POTA SODI ZINC COPP SELE VC VB1
## 1 1787.4540 4615.400 2298.662 11.776515 1.935984 113.91525 209.0321 1.882480
## 2 794.2740 2093.175 1058.073 5.470231 1.325496 58.42284 147.7427 0.983826
## 3 1249.7137 2571.690 2069.723 8.953749 1.565804 83.81562 137.9377 1.316055
## 4 808.2387 2974.354 2550.088 5.641533 1.278044 42.76070 179.4785 1.137953
## 5 1043.4290 4686.336 2840.423 6.408068 1.427889 38.26618 157.1404 2.909776
## 6 1164.2706 4653.953 4285.981 6.745011 2.066325 72.05487 117.7118 4.570952
## VB2 NIAC VB6 FOLA FA FF FDFE VB12
## 1 2.9713065 18.72793 3.829823 719.3660 179.920 539.4460 846.2460 5.17202
## 2 1.2894892 12.24912 1.386882 372.7493 85.070 287.6793 432.4393 2.23350
## 3 2.0125987 14.49594 4.120610 918.5919 601.800 316.7919 1340.1919 13.23145
## 4 0.9494982 10.92889 1.637870 459.3215 52.025 410.3265 495.5415 1.80990
## 5 1.3127483 13.76033 2.560617 534.1780 20.200 515.9980 548.3180 3.97461
## 6 1.2672856 18.25922 2.540221 506.4956 26.700 481.8156 525.2356 3.70270
## VARA RET BCAR ACAR CRYP LYCO LZ ATOC
## 1 1261.6665 673.1850 7004.547 106.7165 56.11800 13.350 14273.582 9.851310
## 2 1412.8761 82.3500 13464.676 4921.4081 81.65188 1174.626 2444.209 4.628156
## 3 1111.7093 504.9250 6476.761 1614.4750 19.37550 4.704 5437.084 16.765260
## 4 310.6182 165.1115 1551.208 344.1073 68.67350 7582.125 1340.894 18.987955
## 5 1265.3238 222.6810 10784.821 3226.1183 198.74500 3380.685 4032.658 16.267538
## 6 344.2824 144.1200 2307.391 143.9169 12.66863 3374.064 3388.637 9.870323
## VK CHOLE SFAT S040 S060 S080 S100 S120
## 1 674.6513 375.967 18.696057 0.310902 0.11065100 0.1176030 0.2303450 0.18606100
## 2 176.1320 18.300 8.035444 0.137250 0.13725000 0.1401794 0.1440906 0.16560063
## 3 356.2514 64.265 21.762310 0.461520 0.35581000 0.6559880 0.7907570 3.08172600
## 4 193.4643 63.277 12.393455 0.063991 0.04301325 0.0834395 0.1062550 0.06083225
## 5 271.3043 53.980 10.073571 0.086720 0.06900500 0.2074300 0.2042550 0.43357300
## 6 270.4181 65.830 17.110502 0.318160 0.16680000 0.2767274 0.3239814 0.26321963
## S140 S160 S180 MFAT M161 M181 M201 M221
## 1 1.2467585 11.448329 4.130060 28.14091 0.7393160 26.99568 0.2349650 0.00881000
## 2 0.5821610 4.955292 1.532696 15.32206 0.3639200 14.79923 0.1110759 0.00346125
## 3 2.6981170 9.237105 3.415553 24.67066 0.4838810 23.65191 0.1867220 0.00864600
## 4 0.1955330 8.961791 2.439567 34.62131 0.9667318 33.33719 0.2755755 0.00450000
## 5 0.5196547 6.309740 1.897680 22.20331 0.4462590 21.47338 0.2112680 0.02609700
## 6 0.8637470 10.355075 3.764050 33.49530 0.3996655 32.60066 0.3411139 0.00658725
## PFAT P182 P183 P184 P204 P205 P225 P226 VITD
## 1 28.95861 25.201469 3.525870 0 0.157937 0.00444 0.00000 0.03441 4.881200
## 2 10.24234 9.135821 1.092069 0 0.001315 0.00000 0.00000 0.00000 2.379000
## 3 18.75467 16.930229 1.803171 0 0.006573 0.00000 0.00000 0.00000 10.193000
## 4 28.41416 24.796558 3.534361 0 0.038890 0.00000 0.00000 0.00909 3.375425
## 5 20.44748 17.914195 2.412754 0 0.026967 0.00000 0.00000 0.01881 4.985000
## 6 31.17042 27.616109 3.344330 0 0.038623 0.00032 0.00208 0.00670 2.102000
## CHOLN VITE_ADD B12_ADD F_TOTAL F_CITMLB F_OTHER F_JUICE V_TOTAL
## 1 572.5637 0.00000 3.01600 4.037445 0.65995 3.035495 0.3420 3.656365
## 2 143.2537 0.00000 1.41000 0.640500 0.30625 0.323750 0.0105 2.727494
## 3 195.7137 8.34700 10.79370 1.530100 1.01160 0.505000 0.0135 1.906206
## 4 182.4953 6.85640 1.53720 0.000000 0.00000 0.000000 0.0000 4.280535
## 5 212.2748 7.19922 3.21321 0.000000 0.00000 0.000000 0.0000 7.783067
## 6 219.7563 0.00000 2.83410 0.000000 0.00000 0.000000 0.0000 7.084731
## V_DRKGR V_REDOR_TOTAL V_REDOR_TOMATO V_REDOR_OTHER V_STARCHY_TOTAL
## 1 1.6658400 0.000000 0.00000 0.000000 0.0000000
## 2 0.4840000 1.177100 0.04860 1.128500 0.0000000
## 3 1.0143750 0.300500 0.00000 0.300500 0.0000000
## 4 0.0099900 1.485350 1.48535 0.000000 0.1288625
## 5 0.6186667 1.857575 0.84840 1.009175 3.3200000
## 6 0.7412500 0.848400 0.84840 0.000000 4.6146000
## V_STARCHY_POTATO V_STARCHY_OTHER V_OTHER V_LEGUMES G_TOTAL G_WHOLE
## 1 0.0000 0.0000000 1.9905250 1.256750 5.4384 2.6520
## 2 0.0000 0.0000000 1.0663938 0.105000 2.6500 1.7669
## 3 0.0000 0.0000000 0.5913312 0.135000 4.4072 4.2542
## 4 0.0000 0.1288625 2.6563325 1.085300 5.0421 1.5522
## 5 3.3200 0.0000000 1.9868250 0.285000 2.2405 1.0285
## 6 4.6146 0.0000000 0.8804812 0.192375 1.6050 0.0000
## G_REFINED PF_TOTAL PF_MPS_TOTAL PF_MEAT PF_CUREDMEAT PF_ORGAN PF_POULT
## 1 2.7864 3.111700 0.0000 0 0.0000 0 0
## 2 0.8831 3.316731 0.0000 0 0.0000 0 0
## 3 0.1530 2.501000 0.0000 0 0.0000 0 0
## 4 3.4899 0.644600 0.0000 0 0.0000 0 0
## 5 1.2120 2.226680 0.0000 0 0.0000 0 0
## 6 1.6050 4.697931 0.2016 0 0.2016 0 0
## PF_SEAFD_HI PF_SEAFD_LOW PF_EGGS PF_SOY PF_NUTSDS PF_LEGUMES D_TOTAL D_MILK
## 1 0 0 1.8093 0.0000 1.3024000 4.935150 1.34064 0.21168
## 2 0 0 0.0000 1.9200 1.3967312 0.420000 0.75030 0.75030
## 3 0 0 0.0000 0.0000 2.5010000 0.540000 2.39525 0.88850
## 4 0 0 0.3030 0.0000 0.3416000 4.351050 0.15150 0.15150
## 5 0 0 0.2020 1.6660 0.3586800 1.155000 0.57980 0.57980
## 6 0 0 0.2020 3.3558 0.9385313 0.772875 0.70760 0.58900
## D_YOGURT D_CHEESE OILS SOLID_FATS ADD_SUGARS A_DRINKS
## 1 0.00000 0.5544 46.76440 14.582140 12.53787 0
## 2 0.00000 0.0000 19.10275 4.932700 3.95306 0
## 3 1.50675 0.0000 36.48778 18.798000 10.32484 0
## 4 0.00000 0.0000 69.07104 2.439577 1.72710 0
## 5 0.00000 0.0000 44.07012 3.130800 5.21406 0
## 6 0.00000 0.0140 65.58897 9.271000 0.80064 0
Load ind_metadata.txt if you have not done so.
ind_metadata <- read.table("ind_metadata.txt", sep="\t", header=T)
Add this metadata of each participant to totals.’NA’ will be inserted to UserNames which are not in ind_metadata.
new_totals_m <- merge(x=new_totals, y=ind_metadata, by="UserName", all.x=T)
Check that the items data and metadata are merged.
head(new_totals_m)
## UserName User_Day RecallNo FoodAmt KCAL PROT TFAT CARB
## 1 VVKAJ101 VVKAJ101_1 1 2402.500 2314.9040 80.15720 82.97721 337.7973
## 2 VVKAJ101 VVKAJ101_2 2 1671.650 913.5955 35.13461 37.42791 118.5702
## 3 VVKAJ101 VVKAJ101_3 3 1920.588 1604.8050 48.29881 70.08057 215.7318
## 4 VVKAJ102 VVKAJ102_1 1 1957.475 1440.1995 38.54926 80.38863 152.8296
## 5 VVKAJ102 VVKAJ102_2 2 1858.067 1508.9957 46.25199 56.89753 212.5225
## 6 VVKAJ102 VVKAJ102_3 3 2049.400 1834.2306 58.89500 87.53381 211.8413
## MOIS ALC CAFF THEO SUGR FIBE CALC IRON MAGN
## 1 1879.619 0 26.784 0.00 144.20847 52.62550 889.8455 31.282000 589.3865
## 2 1467.035 0 76.800 0.00 47.67748 21.04413 619.2521 7.366001 273.2332
## 3 1572.133 0 30.240 3.36 111.81870 23.27856 1427.6826 14.628896 320.9569
## 4 1669.286 0 50.400 5.04 25.75794 33.90378 912.4538 11.797247 284.4450
## 5 1521.262 0 22.680 2.52 55.29288 31.25067 1144.7673 14.180575 366.8420
## 6 1666.272 0 0.000 0.00 31.41001 29.35169 700.0389 14.009471 373.1694
## PHOS POTA SODI ZINC COPP SELE VC VB1
## 1 1787.4540 4615.400 2298.662 11.776515 1.935984 113.91525 209.0321 1.882480
## 2 794.2740 2093.175 1058.073 5.470231 1.325496 58.42284 147.7427 0.983826
## 3 1249.7137 2571.690 2069.723 8.953749 1.565804 83.81562 137.9377 1.316055
## 4 808.2387 2974.354 2550.088 5.641533 1.278044 42.76070 179.4785 1.137953
## 5 1043.4290 4686.336 2840.423 6.408068 1.427889 38.26618 157.1404 2.909776
## 6 1164.2706 4653.953 4285.981 6.745011 2.066325 72.05487 117.7118 4.570952
## VB2 NIAC VB6 FOLA FA FF FDFE VB12
## 1 2.9713065 18.72793 3.829823 719.3660 179.920 539.4460 846.2460 5.17202
## 2 1.2894892 12.24912 1.386882 372.7493 85.070 287.6793 432.4393 2.23350
## 3 2.0125987 14.49594 4.120610 918.5919 601.800 316.7919 1340.1919 13.23145
## 4 0.9494982 10.92889 1.637870 459.3215 52.025 410.3265 495.5415 1.80990
## 5 1.3127483 13.76033 2.560617 534.1780 20.200 515.9980 548.3180 3.97461
## 6 1.2672856 18.25922 2.540221 506.4956 26.700 481.8156 525.2356 3.70270
## VARA RET BCAR ACAR CRYP LYCO LZ ATOC
## 1 1261.6665 673.1850 7004.547 106.7165 56.11800 13.350 14273.582 9.851310
## 2 1412.8761 82.3500 13464.676 4921.4081 81.65188 1174.626 2444.209 4.628156
## 3 1111.7093 504.9250 6476.761 1614.4750 19.37550 4.704 5437.084 16.765260
## 4 310.6182 165.1115 1551.208 344.1073 68.67350 7582.125 1340.894 18.987955
## 5 1265.3238 222.6810 10784.821 3226.1183 198.74500 3380.685 4032.658 16.267538
## 6 344.2824 144.1200 2307.391 143.9169 12.66863 3374.064 3388.637 9.870323
## VK CHOLE SFAT S040 S060 S080 S100 S120
## 1 674.6513 375.967 18.696057 0.310902 0.11065100 0.1176030 0.2303450 0.18606100
## 2 176.1320 18.300 8.035444 0.137250 0.13725000 0.1401794 0.1440906 0.16560063
## 3 356.2514 64.265 21.762310 0.461520 0.35581000 0.6559880 0.7907570 3.08172600
## 4 193.4643 63.277 12.393455 0.063991 0.04301325 0.0834395 0.1062550 0.06083225
## 5 271.3043 53.980 10.073571 0.086720 0.06900500 0.2074300 0.2042550 0.43357300
## 6 270.4181 65.830 17.110502 0.318160 0.16680000 0.2767274 0.3239814 0.26321963
## S140 S160 S180 MFAT M161 M181 M201 M221
## 1 1.2467585 11.448329 4.130060 28.14091 0.7393160 26.99568 0.2349650 0.00881000
## 2 0.5821610 4.955292 1.532696 15.32206 0.3639200 14.79923 0.1110759 0.00346125
## 3 2.6981170 9.237105 3.415553 24.67066 0.4838810 23.65191 0.1867220 0.00864600
## 4 0.1955330 8.961791 2.439567 34.62131 0.9667318 33.33719 0.2755755 0.00450000
## 5 0.5196547 6.309740 1.897680 22.20331 0.4462590 21.47338 0.2112680 0.02609700
## 6 0.8637470 10.355075 3.764050 33.49530 0.3996655 32.60066 0.3411139 0.00658725
## PFAT P182 P183 P184 P204 P205 P225 P226 VITD
## 1 28.95861 25.201469 3.525870 0 0.157937 0.00444 0.00000 0.03441 4.881200
## 2 10.24234 9.135821 1.092069 0 0.001315 0.00000 0.00000 0.00000 2.379000
## 3 18.75467 16.930229 1.803171 0 0.006573 0.00000 0.00000 0.00000 10.193000
## 4 28.41416 24.796558 3.534361 0 0.038890 0.00000 0.00000 0.00909 3.375425
## 5 20.44748 17.914195 2.412754 0 0.026967 0.00000 0.00000 0.01881 4.985000
## 6 31.17042 27.616109 3.344330 0 0.038623 0.00032 0.00208 0.00670 2.102000
## CHOLN VITE_ADD B12_ADD F_TOTAL F_CITMLB F_OTHER F_JUICE V_TOTAL
## 1 572.5637 0.00000 3.01600 4.037445 0.65995 3.035495 0.3420 3.656365
## 2 143.2537 0.00000 1.41000 0.640500 0.30625 0.323750 0.0105 2.727494
## 3 195.7137 8.34700 10.79370 1.530100 1.01160 0.505000 0.0135 1.906206
## 4 182.4953 6.85640 1.53720 0.000000 0.00000 0.000000 0.0000 4.280535
## 5 212.2748 7.19922 3.21321 0.000000 0.00000 0.000000 0.0000 7.783067
## 6 219.7563 0.00000 2.83410 0.000000 0.00000 0.000000 0.0000 7.084731
## V_DRKGR V_REDOR_TOTAL V_REDOR_TOMATO V_REDOR_OTHER V_STARCHY_TOTAL
## 1 1.6658400 0.000000 0.00000 0.000000 0.0000000
## 2 0.4840000 1.177100 0.04860 1.128500 0.0000000
## 3 1.0143750 0.300500 0.00000 0.300500 0.0000000
## 4 0.0099900 1.485350 1.48535 0.000000 0.1288625
## 5 0.6186667 1.857575 0.84840 1.009175 3.3200000
## 6 0.7412500 0.848400 0.84840 0.000000 4.6146000
## V_STARCHY_POTATO V_STARCHY_OTHER V_OTHER V_LEGUMES G_TOTAL G_WHOLE
## 1 0.0000 0.0000000 1.9905250 1.256750 5.4384 2.6520
## 2 0.0000 0.0000000 1.0663938 0.105000 2.6500 1.7669
## 3 0.0000 0.0000000 0.5913312 0.135000 4.4072 4.2542
## 4 0.0000 0.1288625 2.6563325 1.085300 5.0421 1.5522
## 5 3.3200 0.0000000 1.9868250 0.285000 2.2405 1.0285
## 6 4.6146 0.0000000 0.8804812 0.192375 1.6050 0.0000
## G_REFINED PF_TOTAL PF_MPS_TOTAL PF_MEAT PF_CUREDMEAT PF_ORGAN PF_POULT
## 1 2.7864 3.111700 0.0000 0 0.0000 0 0
## 2 0.8831 3.316731 0.0000 0 0.0000 0 0
## 3 0.1530 2.501000 0.0000 0 0.0000 0 0
## 4 3.4899 0.644600 0.0000 0 0.0000 0 0
## 5 1.2120 2.226680 0.0000 0 0.0000 0 0
## 6 1.6050 4.697931 0.2016 0 0.2016 0 0
## PF_SEAFD_HI PF_SEAFD_LOW PF_EGGS PF_SOY PF_NUTSDS PF_LEGUMES D_TOTAL D_MILK
## 1 0 0 1.8093 0.0000 1.3024000 4.935150 1.34064 0.21168
## 2 0 0 0.0000 1.9200 1.3967312 0.420000 0.75030 0.75030
## 3 0 0 0.0000 0.0000 2.5010000 0.540000 2.39525 0.88850
## 4 0 0 0.3030 0.0000 0.3416000 4.351050 0.15150 0.15150
## 5 0 0 0.2020 1.6660 0.3586800 1.155000 0.57980 0.57980
## 6 0 0 0.2020 3.3558 0.9385313 0.772875 0.70760 0.58900
## D_YOGURT D_CHEESE OILS SOLID_FATS ADD_SUGARS A_DRINKS Diet Gender
## 1 0.00000 0.5544 46.76440 14.582140 12.53787 0 Vegetarian M
## 2 0.00000 0.0000 19.10275 4.932700 3.95306 0 Vegetarian M
## 3 1.50675 0.0000 36.48778 18.798000 10.32484 0 Vegetarian M
## 4 0.00000 0.0000 69.07104 2.439577 1.72710 0 Vegan F
## 5 0.00000 0.0000 44.07012 3.130800 5.21406 0 Vegan F
## 6 0.00000 0.0140 65.58897 9.271000 0.80064 0 Vegan F
## Age Weight Height BMI Waist.Circumference
## 1 31 79 186 22.83501 80
## 2 31 79 186 22.83501 80
## 3 31 79 186 22.83501 80
## 4 60 73 163 27.47563 90
## 5 60 73 163 27.47563 90
## 6 60 73 163 27.47563 90
Save the merged dataframe as a .txt file.
write.table(new_totals_m, "VVKAJ_Tot_m.txt", sep="\t", row.names=F, quote=F)
Calculate the mean of the totals data across all the days for each participant.
AverageBy(data= new_totals, by= "UserName", start.col= "FoodAmt", end.col= "A_DRINKS",
outfn="VVKAJ_Tot_mean.txt")
Load the output for further processing.
new_totals_mean <- read.table("VVKAJ_Tot_mean.txt", header=T, sep="\t")
The number of rows should be equal to the number of users. This example data has 16 users, so there should be 16 rows of mean totals.
nrow(new_totals_mean)
## [1] 16
Load ind_metadata.txt if you have not done so.
ind_metadata <- read.table("ind_metadata.txt", sep="\t", header=T)
Add this metadata of each participant in the mean totals. ‘NA’ will be inserted to UserNames which are not in ind_metadata.
new_totals_mean_m <- merge(x=new_totals_mean, y=ind_metadata, by="UserName", all.x=T)
head(new_totals_mean_m, 1)
## UserName FoodAmt KCAL PROT TFAT CARB MOIS ALC CAFF THEO
## 1 VVKAJ101 1998.246 1611.101 54.5302 63.49523 224.0331 1639.596 0 44.608 1.12
## SUGR FIBE CALC IRON MAGN PHOS POTA SODI
## 1 101.2349 32.31606 978.9267 17.75897 394.5255 1277.147 3093.422 1808.819
## ZINC COPP SELE VC VB1 VB2 NIAC VB6
## 1 8.733498 1.609095 85.38457 164.9042 1.39412 2.091131 15.15766 3.112438
## FOLA FA FF FDFE VB12 VARA RET BCAR ACAR
## 1 670.2357 288.93 381.3057 872.959 6.87899 1262.084 420.1533 8981.995 2214.2
## CRYP LYCO LZ ATOC VK CHOLE SFAT S040
## 1 52.38179 397.5601 7384.958 10.41491 402.3449 152.844 16.1646 0.303224
## S060 S080 S100 S120 S140 S160 S180 MFAT
## 1 0.201237 0.3045901 0.3883975 1.144463 1.509012 8.546908 3.026103 22.71121
## M161 M181 M201 M221 PFAT P182 P183 P184
## 1 0.529039 21.81561 0.1775876 0.006972417 19.31854 17.08917 2.14037 0
## P204 P205 P225 P226 VITD CHOLN VITE_ADD B12_ADD F_TOTAL
## 1 0.055275 0.00148 0 0.01147 5.817733 303.8437 2.782333 5.073233 2.069348
## F_CITMLB F_OTHER F_JUICE V_TOTAL V_DRKGR V_REDOR_TOTAL V_REDOR_TOMATO
## 1 0.6592667 1.288082 0.122 2.763355 1.054738 0.4925333 0.0162
## V_REDOR_OTHER V_STARCHY_TOTAL V_STARCHY_POTATO V_STARCHY_OTHER V_OTHER
## 1 0.4763333 0 0 0 1.216083
## V_LEGUMES G_TOTAL G_WHOLE G_REFINED PF_TOTAL PF_MPS_TOTAL PF_MEAT
## 1 0.4989167 4.1652 2.891033 1.274167 2.976477 0 0
## PF_CUREDMEAT PF_ORGAN PF_POULT PF_SEAFD_HI PF_SEAFD_LOW PF_EGGS PF_SOY
## 1 0 0 0 0 0 0.6031 0.64
## PF_NUTSDS PF_LEGUMES D_TOTAL D_MILK D_YOGURT D_CHEESE OILS SOLID_FATS
## 1 1.733377 1.96505 1.495397 0.6168267 0.50225 0.1848 34.11831 12.77095
## ADD_SUGARS A_DRINKS Diet Gender Age Weight Height BMI
## 1 8.93859 0 Vegetarian M 31 79 186 22.83501
## Waist.Circumference
## 1 80
Save the merged dataframe as a .txt file.
write.table(new_totals_mean_m, "VVKAJ_Tot_mean_m.txt", sep="\t", row.names=F, quote=F)
Totals data may contain outliers due to errors in dietary reporting. These errors may be due to omission or inaccurate over- or under-estimation of portion size, leading to improbable nutrient totals. ASA24 provides General Guidelines for Reviewing & Cleaning Data for identifying and removing suspicious records.
Here, we will identify records that contain values that fall outside typically observed ranges of kilocalories (KCAL), protein (PROT), total fat (TFAT), and vitamin C (VC). The ASA24 guide provides ranges of beta carotene (BCAR), too, however, outlier checking for BCAR is omitted in this tutorial but can be considered if you identify it as a nutrient that has a high variance in your study dataset.
[NOTE] Your input dataframe (QCtotals) will be overwritten after each outlier removal.
Run all these QC steps in this order. When asked, choose to remove the outliers that fall outside the specified range for each nutrient.
Load your totals if necessary - to be used as input for QC.
new_totals_mean_m <- read.table("VVKAJ_Tot_mean_m.txt", sep="\t", header=T)
Split your dataset to males and females because different thresholds apply for males and females.
new_totals_mean_m_M <- subset(new_totals_mean_m, Gender=="M")
new_totals_mean_m_F <- subset(new_totals_mean_m, Gender=="F")
Define your males totals dataset to be used as input.
QCtotals <- new_totals_mean_m_M
Flag if KCAL is <650 or >5700 → ask remove or not → if yes, remove those rows.
QCOutliers(input.data = QCtotals, target.colname = "KCAL", min = 650, max = 5700)
This function will print out rows that fall outside the specified min-max range, and a dialogue box will appear outside the R Studio (shown below), asking whether to remove them. You should make sure to review these records carefully to double-check if the removal is warranted. It is possible to have a valid record that could meet the threshold for removal. Only you will know if you can trust the record when working with real data.
If you find potential outlier(s) here, click “No”, and view those total(s) with their other nutrient intake information by running the following;
KCAL_outliers <- subset(QCtotals, KCAL < 650 | KCAL > 5700)
# Sort the rows by KCAL and show only the specified variables.
KCAL_outliers[order(KCAL_outliers$KCAL, decreasing = T),
c('UserName', 'KCAL', 'FoodAmt', 'PROT', 'TFAT', 'CARB')]
## UserName KCAL FoodAmt PROT TFAT CARB
## 16 VVKAJ117 5789.476 5639.525 194.1051 263.8755 674.0867
If you think it is a true outlier, then run the QCOutliers command for KCAL again, and click “Yes” to remove the outlier. Here for this tutorial, we will remove this individual.
QCOutliers(input.data = QCtotals, target.colname = "KCAL", min = 650, max = 5700)
## There are 1 observations with < 650 or > 5700 .
## Remove? (Yes/no/cancel)
## Outlier rows were removed; the cleaned data is saved as an object called "QCtotals".
## 8 rows remained.
Continue the QC process with other variables.
Flag if PROT is <25 or >240 → ask remove or not → if yes, remove those rows
QCOutliers(input.data = QCtotals, target.colname = "PROT", min = 25, max = 240)
## There are 0 observations with < 25 or > 240 .
## There are no outlier rows, but the input data was renamed as QCtotals.
## 8 rows remained.
Flag if TFAT is <25 or >230 → ask remove or not → if yes, remove those rows
QCOutliers(input.data = QCtotals, target.colname = "TFAT", min = 25, max = 230)
## There are 0 observations with < 25 or > 230 .
## There are no outlier rows, but the input data was renamed as QCtotals.
## 8 rows remained.
Flag if VC (Vitamin C) is <5 or >400 → ask remove or not → if yes, remove those rows.
QCOutliers(input.data = QCtotals, target.colname = "VC", min = 5, max = 400)
## There are 0 observations with < 5 or > 400 .
## There are no outlier rows, but the input data was renamed as QCtotals.
## 8 rows remained.
Name the males totals after QC.
QCed_M <- QCtotals
Define your female totals dataset to be used as input.
QCtotals <- new_totals_mean_m_F
Flag if KCAL is <600 or >4400 → ask remove or not → if yes, remove those rows.
QCOutliers(input.data = QCtotals, target.colname = "KCAL", min = 600, max = 4400)
Flag if PROT is <10 or >180 → ask remove or not → if yes, remove those rows
QCOutliers(input.data = QCtotals, target.colname = "PROT", min = 10, max = 180)
## There are 0 observations with < 10 or > 180 .
## There are no outlier rows, but the input data was renamed as QCtotals.
## 7 rows remained.
Flag if TFAT is <15 or >185 → ask remove or not → if yes, remove those rows
QCOutliers(input.data = QCtotals, target.colname = "TFAT", min = 15, max = 185)
## There are 0 observations with < 15 or > 185 .
## There are no outlier rows, but the input data was renamed as QCtotals.
## 7 rows remained.
Flag if VC (Vitamin C) is <5 or >350 → ask remove or not → if yes, remove those rows.
QCOutliers(input.data = QCtotals, target.colname = "VC", min = 5, max = 350)
## There are 0 observations with < 5 or > 350 .
## There are no outlier rows, but the input data was renamed as QCtotals.
## 7 rows remained.
Name the females totals after QC.
QCed_F <- QCtotals
Combine the rows of M and F.
QCtotals_MF <- rbind(QCed_M, QCed_F)
Save as a .txt file.
write.table(QCtotals_MF, "VVKAJ_Tot_mean_m_QCed.txt", sep="\t", quote=F, row.names=F)
In the previous section, we have removed individual(s) that did not pass the QC from mean total data. We will remove those individual(s) from the totals (before taking means of days), so that we will have the same individuals in the mean_total and total.
Among the individuals in new_totals_m, retain only those in QCtotals_MF.
new_totals_m_QCed <- new_totals_m[ new_totals_m$UserName %in% QCtotals_MF$UserName, ]
Save as a .txt file. This will be the total for each of the “QC-ed” individuals for each day, to be used for clustering analyses.
write.table(new_totals_m_QCed, "VVKAJ_Tot_m_QCed.txt", sep="\t", quote=F, row.names=F)
Among the individuals in new_totals_m, pick up only those in QCtotals_MF.
items_f_id_s_m_QCed <- items_f_id_s_m[ items_f_id_s_m$UserName %in% QCtotals_MF$UserName, ]
Save as a .txt file. This will be the items for each of the “QC-ed” individuals for each day, to be used for ordination etc.
write.table(items_f_id_s_m_QCed, "VVKAJ_Items_f_id_s_m_QCed.txt", sep="\t", quote=F, row.names=F)
Come back to the main directory before you start running another script.
setwd(main_wd)