Within-subject variability

Recently, TrajGWAS method introduced a mixed-effects multiple location scale model into large-scale longitudinal trait GWAS. Building on regular linear mixed effect models, they further formulated $\sigma_{\varepsilon_{ij}}$ via

\[\sigma_{\varepsilon_{ij}} = exp(W_{ij}^T \tau + G_i \tau_g + w_i)\]

The standard error $\sigma_{\varepsilon_{ij}}$ is determined by covariate vector $W_{ij}$, genotype $G_i$, and a random intercept $w_i$. This additional formula allows TrajGWAS to test genetic effects on within-subject (WS) variability (i.e. $\tau_g$), which is also an important risk factor for complex diseases. The Julia package WiSER can fit the above model.

Quick start for TrajGWAS

Please run the following code in Julia. Julia package TrajGWAS is builded upon the WiSER method and can also be used for model fitting. More details can be seen in WiSER documentation or TrajGWAS documentation. In this section, our main focus is to demonstrate how to obtain model residuals for testing $\beta$_g = 0 (i.e., the mean level) from the fitted null model.

Installation and library

using DataFrames, CSV, DelimitedFiles
using Ipopt, NLopt, KNITRO
using WiSER, TrajGWAS

# for TrajGWAS new version, solver settings:
solver = Ipopt.Optimizer(); solver_config = Dict("print_level"=>0, "mehrotra_algorithm"=>"yes", "warm_start_init_point"=>"yes", "max_iter"=>100)
# solver = Ipopt.Optimizer(); solver_config = Dict("print_level"=>0, "watchdog_shortened_iter_trigger"=>3, "max_iter"=>100)
# solver = KNITRO.Optimizer(); solver_config = Dict("outlev"=>3) # (Knitro is commercial software)
# solver = NLopt.Optimizer(); solver_config = Dict("algorithm"=>:LD_MMA, "maxeval"=>4000)
# solver = NLopt.Optimizer(); solver_config = Dict("algorithm"=>:LD_LBFGS, "maxeval"=>4000)

Fit the null model

PhenoFile = "../GRAB/extdata/simuLongPHENO.txt" # please copy the filepath of simuLongPHENO.txt.
LongPheno = CSV.read(PhenoFile, DataFrame)

nullmodel = trajgwas(@formula(LongPheno ~ 1 + AGE + GENDER),
                    @formula(LongPheno ~ 1 + AGE),
                    @formula(LongPheno ~ 1 + AGE + GENDER),
                    :IID,
                    LongPheno,
                    nothing;
                    solver=solver,
                    solver_config = solver_config)
# ******************************************************************************
# This program contains Ipopt, a library for large-scale nonlinear optimization.
# Ipopt is released as open source code under the Eclipse Public License (EPL).
# For more information visit https://github.com/coin-or/Ipopt
# ******************************************************************************
# run = 1, ‖Δβ‖ = 0.379288, ‖Δτ‖ = 0.076489, ‖ΔL‖ = 0.653537, status = LOCALLY_SOLVED, time(s) = 0.800000
# run = 2, ‖Δβ‖ = 0.015746, ‖Δτ‖ = 0.028828, ‖ΔL‖ = 0.169548, status = LOCALLY_SOLVED, time(s) = 0.086000
# Within-subject variance estimation by robust regression (WiSER)
# Mean Formula:
# LongPheno ~ 1 + AGE + GENDER
# Random Effects Formula:
# LongPheno ~ 1 + AGE
# Within-Subject Variance Formula:
# LongPheno ~ 1 + AGE + GENDER
# Number of individuals/clusters: 1000
# Total observations: 10515
# Fixed-effects parameters:
# ─────────────────────────────────────────────────────────
#                    Estimate  Std. Error       Z  Pr(>|Z|)
# ─────────────────────────────────────────────────────────
# β1: (Intercept)  37.2366      1.13462     32.82    <1e-99
# β2: AGE          -0.327652    0.022705   -14.43    <1e-46
# β3: GENDER        0.712463    0.094477     7.54    <1e-13
# τ1: (Intercept)  -1.33006     2.07856     -0.64    0.5222
# τ2: AGE           0.0566825   0.042218     1.34    0.1794
# τ3: GENDER        0.132336    0.0642011    2.06    0.0393
# ─────────────────────────────────────────────────────────
# Random effects covariance matrix Σγ:
#  "γ1: (Intercept)"  67.3923   -1.32643
#  "γ2: AGE"          -1.32643   0.0266415

Obtain model residuals of testing $\tau$_g = 0 (i.e., the within-subject variability)

rownames = nullmodel.ids

ResidMatFile_tau = split(PhenoFile,"simuLongPHENO.txt")[1] * "ResidMat.txt"

f2 = open(ResidMatFile_tau, "w")
writedlm(f2, ["SubjID" "Resid"])
for j in 1:length(nullmodel.data)
        Resid_tau = - sum(nm.data[j].diagDVRV)
        writedlm(f2, [rownames[j] Resid_tau])
end
close(f2)

ResidMat_tau = CSV.read(ResidMatFile_tau, DataFrame)
# 1000×2 DataFrame
#   Row │ SubjID    Resid      
#       │ String15  Float64    
# ──────┼──────────────────────
#     1 │ Subj-1      6.63029
#     2 │ Subj-10    10.1477
#     3 │ Subj-100    6.96377
#     4 │ Subj-101    2.00466
#     5 │ Subj-102    0.762797
#     6 │ Subj-103   -9.68518
#     7 │ Subj-104    7.93268
#   ⋮   │    ⋮          ⋮
#   995 │ f9_4       -9.50279
#   996 │ f9_5        1.54764
#   997 │ f9_6       -7.54868
#   998 │ f9_7        9.64257
#   999 │ f9_8        1.72898
#  1000 │ f9_9        0.151542
#              987 rows omitted

Note

The column names of ResidMatFile must be exactly SubjID in the first column and Resid in the second column.

Each subject should match its corresponding residual.

Above codes are mostly the same as in Linear mixed-effect models. In practice, model residuals for testing the mean and WS variability can be obtained simultaneously from the fitted WiSER model.

Within-subject variability

Quick start for TrajGWAS

Installation and library

Fit the null model

Obtain model residuals of testing $\tau$g = 0 (i.e., the within-subject variability)

Obtain model residuals of testing $\tau$_g = 0 (i.e., the within-subject variability)