Pre-calculate joint distributions of genotypes
We can use SPAGRM.NullModel function in GRAB package to pre-calculate the joint distribution of genotypes. This function requires products of step 0 and 1.
Prepare input files
SPAGRM.NullModel needs the SparseGRMFile and PairwiseIBDFile from step 0, and ResidMatFile from step 1. SparseGRMFile and PairwiseIBDFile only need to be calculated once for a certain cohort. ResidMatFile has the same format regardless of various phenotypes and statistical models.
Run SPAGRM.NullModel to pre-calculate genotype distributions
SPAGRM.NullModel(ResidMatFile,
SparseGRMFile,
PairwiseIBDFile,
control = list(MaxQuantile = 0.75,
MinQuantile = 0.25,
OutlierRatio = 1.5,
ControlOutlier = TRUE,
MaxNuminFam = 5,
MAF_interval = c(0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)))
ResidMatFile: required. A path to the ResidMat file.SparseGRMFile: required. A path to the sparse GRM file.PairwiseIBDFile: required. A path to the pairwise IBD file.control: optional.controlis to specify a list of parameters for controlling the association testing process.MaxQuantile/MinQuantile/OutlierRatio: we use these three parameters to divide model residuals into outliers and non-outliers. The interquartile range (IQR) is defined as $MaxQuantile - MinQuantile$; The interval of outlier detection is defined as $[MinQuantile - OutlierRatio \times IQR, MaxQuantile + OutlierRatio \times IQR]$ which divides model residuals into non-outliers and outliers.MaxQuantile: optional. Upper quantile, ranging from 0 to 1. (default=0.75)MinQuantile: optional. Lower quantile, ranging from 0 to 1. (default=0.25)OutlierRatio: optional. Multiples of IQR, more than 0. (default=1.5)
ControlOutlier: optional. If TRUE, we will control the proportion of outliers $\leq$ 10% to remain computational efficiency. (default=TRUE)MaxNuminFam: optional. Control the family size $\leq$MaxQuantileto remain computational efficiency. (default=5)MAF_interval: optional. we divide the minor allele frequency (MAF) region into several intervals and pre-calculate joint distributions of genotypes at each MAF cutoff.
Example:
# Load in the filepath
library(GRAB)
SparseGRMFile = system.file("SparseGRM", "SparseGRM.txt", package = "GRAB")
PairwiseIBDFile = system.file("PairwiseIBD", "PairwiseIBD.txt", package = "GRAB")
ResidMatFile = system.file("extdata", "ResidMat.txt", package = "GRAB")
obj.SPAGRM = SPAGRM.NullModel(ResidMatFile = ResidMatFile,
SparseGRMFile = SparseGRMFile,
PairwiseIBDFile = PairwiseIBDFile,
control = list(ControlOutlier = FALSE))
# cutoffVec: -1.930831 1.920325
# Outliers information is as below
# SubjID Resid Outlier
# 1: Subj-217 -2.368746 TRUE
# 2: f11_6 -2.315659 TRUE
# 3: Subj-133 -2.179490 TRUE
# 4: f28_9 -2.135892 TRUE
# 5: f24_1 1.948334 TRUE
# 6: Subj-45 1.953527 TRUE
# 7: f15_7 2.015207 TRUE
# 8: Subj-280 2.044026 TRUE
# 9: Subj-487 2.131917 TRUE
# 10: Subj-8 2.300450 TRUE
# 11: Subj-263 2.521091 TRUE
# 12: f5_3 2.579543 TRUE
# Start process the related residual information.
# Start process the Chow-Liu tree.
# Completed processing the CLT for families with outliers: 0 , 5 / 50
# Save the objSPAGRM product
objSPAGRMFile = system.file("results", "objSPAGRMFile.RData", package = "GRAB")
save(obj.SPAGRM, file = objSPAGRMFile)
Note
- Users should check model residuals in
ResidMatFilebefore runningSPAGRM.NullModel, including:
- If model residuals are all zeros or a constant number, please check the null model fitting;
- If sum of model residuals is not equal to zero, please rescale them to make sure the sum of model residuals is equal to zero.
- If
ControlOutlieris set TRUE, we will automatically alterOutlierRatioto make sure that the proportion of outliers $\leq$ 10%. But we suggest settingControlOutlierto FALSE when analyzing ultra-rare variants and/or phenotypic distribution is extremely unbalanced.MaxNuminFamis set to 5 by default, in order to improve computing efficiency while maintaining accuracy. Our real data analysis indicates that increasingMaxNuminFamhas no impact on the association results, but the run time will increase exponentially.