Pre-calculate joint distributions of genotypes

We can use SPAGRM.NullModel function in GRAB package to pre-calculate the joint distribution of genotypes. This function requires products of step 0 and 1.

Prepare input files

SPAGRM.NullModel needs the SparseGRMFile and PairwiseIBDFile from step 0, and ResidMatFile from step 1. SparseGRMFile and PairwiseIBDFile only need to be calculated once for a certain cohort. ResidMatFile has the same format regardless of various phenotypes and statistical models.

Run SPAGRM.NullModel to pre-calculate genotype distributions

SPAGRM.NullModel(ResidMatFile,
                 SparseGRMFile,
                 PairwiseIBDFile,
                 control = list(MaxQuantile = 0.75,
                                MinQuantile = 0.25,
                                OutlierRatio = 1.5,
                                ControlOutlier = TRUE,
                                MaxNuminFam = 5,
                                MAF_interval = c(0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)))
  • ResidMatFile: required. A path to the ResidMat file.
  • SparseGRMFile: required. A path to the sparse GRM file.
  • PairwiseIBDFile: required. A path to the pairwise IBD file.
  • control: optional. control is to specify a list of parameters for controlling the association testing process.
    • MaxQuantile/MinQuantile/OutlierRatio: we use these three parameters to divide model residuals into outliers and non-outliers. The interquartile range (IQR) is defined as $MaxQuantile - MinQuantile$; The interval of outlier detection is defined as $[MinQuantile - OutlierRatio \times IQR, MaxQuantile + OutlierRatio \times IQR]$ which divides model residuals into non-outliers and outliers.
      • MaxQuantile: optional. Upper quantile, ranging from 0 to 1. (default=0.75)
      • MinQuantile: optional. Lower quantile, ranging from 0 to 1. (default=0.25)
      • OutlierRatio: optional. Multiples of IQR, more than 0. (default=1.5)
    • ControlOutlier: optional. If TRUE, we will control the proportion of outliers $\leq$ 10% to remain computational efficiency. (default=TRUE)
    • MaxNuminFam: optional. Control the family size $\leq$ MaxQuantile to remain computational efficiency. (default=5)
    • MAF_interval: optional. we divide the minor allele frequency (MAF) region into several intervals and pre-calculate joint distributions of genotypes at each MAF cutoff.

Example:

# Load in the filepath
library(GRAB)
SparseGRMFile = system.file("SparseGRM", "SparseGRM.txt", package = "GRAB")
PairwiseIBDFile = system.file("PairwiseIBD", "PairwiseIBD.txt", package = "GRAB")
ResidMatFile = system.file("extdata", "ResidMat.txt", package = "GRAB")
obj.SPAGRM = SPAGRM.NullModel(ResidMatFile = ResidMatFile, 
                              SparseGRMFile = SparseGRMFile, 
                              PairwiseIBDFile = PairwiseIBDFile,
                              control = list(ControlOutlier = FALSE))
# cutoffVec:	 -1.930831 1.920325 
# Outliers information is as below
#       SubjID     Resid Outlier
#  1: Subj-217 -2.368746    TRUE
#  2:    f11_6 -2.315659    TRUE
#  3: Subj-133 -2.179490    TRUE
#  4:    f28_9 -2.135892    TRUE
#  5:    f24_1  1.948334    TRUE
#  6:  Subj-45  1.953527    TRUE
#  7:    f15_7  2.015207    TRUE
#  8: Subj-280  2.044026    TRUE
#  9: Subj-487  2.131917    TRUE
# 10:   Subj-8  2.300450    TRUE
# 11: Subj-263  2.521091    TRUE
# 12:     f5_3  2.579543    TRUE
# Start process the related residual information.
# Start process the Chow-Liu tree.
# Completed processing the CLT for families with outliers:	 0 ,  5 / 50 
# Save the objSPAGRM product
objSPAGRMFile = system.file("results", "objSPAGRMFile.RData", package = "GRAB")
save(obj.SPAGRM, file = objSPAGRMFile)

Note

  • Users should check model residuals in ResidMatFile before running SPAGRM.NullModel, including:
    1. If model residuals are all zeros or a constant number, please check the null model fitting;
    2. If sum of model residuals is not equal to zero, please rescale them to make sure the sum of model residuals is equal to zero.
  • If ControlOutlier is set TRUE, we will automatically alter OutlierRatio to make sure that the proportion of outliers $\leq$ 10%. But we suggest setting ControlOutlier to FALSE when analyzing ultra-rare variants and/or phenotypic distribution is extremely unbalanced.
  • MaxNuminFam is set to 5 by default, in order to improve computing efficiency while maintaining accuracy. Our real data analysis indicates that increasing MaxNuminFam has no impact on the association results, but the run time will increase exponentially.