Modeling Strategy

The FGMB resource was generated by training molecular trait prediction models, or RWAS weights, across multiple aging-brain molecular QTL datasets. These models estimate the genetically regulated component of molecular traits, including gene expression, protein abundance, and splicing, across brain regions, cell types, cohorts, and molecular modalities. To accommodate differences in genetic architecture across molecular traits, FGMB used a competitive modeling strategy rather than relying on a single prediction method.

For each gene–molecular-trait pair, prediction models were trained using genotype and molecular trait data from the corresponding molecular reference panel. Model training was performed within TAD-boundary-enhanced cis-regulatory windows, combining gene-centered cis windows with generalized TAD-boundary domains defined by the companion FunGen-xQTL workflow. FGMB analyses were restricted to protein-coding genes.
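One way to picture a TAD-boundary-enhanced window is as the union of a gene-centered cis window with any TAD domain that overlaps the gene. The sketch below is illustrative only: the function name, the 1 Mb flank, and the overlap rule are assumptions made for this example, not values or logic taken from the FunGen-xQTL workflow.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs on one chromosome


def tad_enhanced_window(
    gene_start: int,
    gene_end: int,
    tads: List[Interval],
    flank: int = 1_000_000,  # hypothetical cis flank; not specified in the text
) -> Interval:
    """Union of the gene-centered cis window and TAD domains overlapping the gene."""
    # Gene-centered cis window, clipped at the chromosome start.
    start = max(0, gene_start - flank)
    end = gene_end + flank
    for tad_start, tad_end in tads:
        # Assumed rule: extend the window only with TADs that overlap the gene body.
        if tad_start <= gene_end and tad_end >= gene_start:
            start = min(start, tad_start)
            end = max(end, tad_end)
    return start, end


# Example with made-up coordinates: only the first TAD overlaps the gene.
window = tad_enhanced_window(
    5_000_000, 5_050_000, [(3_500_000, 5_200_000), (7_000_000, 8_000_000)]
)
print(window)
```

In this toy case the first TAD extends the window upstream, while the downstream edge is still set by the cis flank.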

Prediction Model Training

FGMB evaluated eight methods for molecular trait prediction: six univariate methods that model one molecular trait at a time and two multivariate methods that jointly model multiple molecular traits. The univariate methods were SuSiE, mr.ash, Lasso, PrediXcan-style elastic net, BayesL, and BayesR. The multivariate methods were mr.mash and mvSuSiE, which borrow information across related molecular traits to learn shared and context-specific regulatory effects.

This multi-method strategy was designed to capture a broad range of regulatory architectures. Sparse methods such as Lasso and SuSiE are useful when genetic regulation is driven by a small number of variants, while Bayesian shrinkage methods such as mr.ash, BayesL, and BayesR can accommodate moderately sparse or more polygenic effects. Multivariate methods such as mr.mash and mvSuSiE further leverage sharing across molecular contexts, which is particularly useful for multi-brain-region, multi-cell-type, and multi-modal molecular datasets.

Cross-Validation and Imputability Criteria

Model performance was evaluated using five-fold cross-validation across all candidate prediction methods. For each gene–molecular-trait pair and method, prediction accuracy was assessed using cross-validation r² and the p-value from a linear regression of observed molecular trait values on predicted molecular trait values.
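The evaluation described above can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy are available. The closed-form ridge regression is only a stand-in for the eight prediction methods, the data are simulated, and pooling out-of-fold predictions before regressing observed on predicted values is an assumption (per-fold averaging would be another reasonable choice).

```python
import numpy as np
from scipy import stats


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; a stand-in for any prediction method."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta


def cross_validate(X, y, n_folds=5, lam=1.0, seed=0):
    """Return (r2, p) from regressing observed on out-of-fold predicted values."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    fold_of = rng.permutation(n) % n_folds  # balanced random fold labels
    y_pred = np.empty(n)
    for k in range(n_folds):
        test = fold_of == k
        beta, intercept = ridge_fit(X[~test], y[~test], lam)
        y_pred[test] = X[test] @ beta + intercept
    fit = stats.linregress(y_pred, y)  # observed ~ predicted
    return fit.rvalue ** 2, fit.pvalue


# Simulated genotype dosages (200 samples x 20 variants) and a molecular trait
# driven by two of the variants plus noise.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(200, 20)).astype(float)
trait = 0.8 * G[:, 0] - 0.5 * G[:, 3] + rng.normal(size=200)

cv_r2, cv_p = cross_validate(G, trait)
print(f"CV r2 = {cv_r2:.3f}, p = {cv_p:.2e}")
```

Because every sample is predicted by a model that never saw it during training, the resulting r² and p-value reflect out-of-sample accuracy rather than in-sample fit.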

A gene–molecular-trait pair was considered imputable if at least one prediction method achieved cross-validation r² and \(p < 0.05\). Gene–molecular-trait pairs that did not meet this threshold were excluded from downstream RWAS and fine-mapping analyses to improve reliability of imputed molecular trait estimates.

Best-Performing Weight Selection

For each imputable gene–molecular-trait pair, FGMB selected the best-performing prediction model based on cross-validation performance. RWAS association results were generated for all imputable models across the eight prediction methods. For downstream causal fine-mapping, FGMB used the best-performing prediction weights together with their corresponding RWAS association results.
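The imputability filter and best-method selection could be sketched as below. The function name and data layout are hypothetical, ranking by cross-validation r² among passing methods is an assumption about what "best-performing" means, and the `r2_min` value in the example is purely illustrative rather than a threshold taken from FGMB.

```python
from typing import Dict, Optional, Tuple


def select_best_method(
    cv_results: Dict[str, Tuple[float, float]],  # method -> (cv_r2, cv_p)
    r2_min: float,  # r2 threshold; must be supplied by the caller
    p_max: float = 0.05,
) -> Optional[str]:
    """Return the passing method with the highest CV r2, or None if not imputable."""
    passing = {
        method: r2
        for method, (r2, p) in cv_results.items()
        if r2 > r2_min and p < p_max
    }
    if not passing:
        return None  # no method passes: the pair is treated as not imputable
    return max(passing, key=passing.get)


# Made-up cross-validation results for one gene-molecular-trait pair.
cv_results = {
    "SuSiE": (0.12, 1e-6),
    "Lasso": (0.09, 3e-5),
    "mr.ash": (0.004, 0.31),
}
best = select_best_method(cv_results, r2_min=0.01)  # illustrative threshold only
print(best)
```

Returning `None` for pairs with no passing method makes it straightforward to exclude them before the RWAS and fine-mapping steps.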

Downstream Use in RWAS and Causal TWAS (cTWAS) Fine-Mapping

The selected FGMB weights can be applied to GWAS summary statistics to test gene-level associations between genetically predicted molecular traits and the GWAS trait through RWAS. In this step, prediction weights, GWAS z-scores, and LD matrices are harmonized to the same variant set and allele orientation before RWAS association statistics are computed.
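A minimal sketch of the harmonization and association step is shown below. It uses the standard summary-statistic TWAS form, \(z = w^\top z_{\text{GWAS}} / \sqrt{w^\top R\, w}\), where \(R\) is the LD correlation matrix; whether FGMB computes its RWAS statistic in exactly this way is an assumption. The function names and data layout are hypothetical, and the LD matrix is assumed to be oriented to the same alleles as the weights.

```python
import math
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (effect_allele, other_allele, value)


def harmonize(weights: Dict[str, Record], gwas: Dict[str, Record]):
    """Keep shared variants and flip GWAS z-scores whose alleles are swapped."""
    ids: List[str] = []
    w: List[float] = []
    z: List[float] = []
    for vid, (ea, oa, weight) in weights.items():
        if vid not in gwas:
            continue  # variant missing from the GWAS
        g_ea, g_oa, g_z = gwas[vid]
        if (g_ea, g_oa) == (ea, oa):
            sign = 1.0
        elif (g_ea, g_oa) == (oa, ea):
            sign = -1.0  # effect allele swapped: flip the z-score
        else:
            continue  # alleles do not match: drop the variant
        ids.append(vid)
        w.append(weight)
        z.append(sign * g_z)
    return ids, w, z


def rwas_z(ids, w, z, ld: Dict[str, Dict[str, float]]) -> float:
    """Association z-score: w'z / sqrt(w'Rw), with R looked up by variant ID."""
    num = sum(wi * zi for wi, zi in zip(w, z))
    var = sum(
        w[i] * ld[a][b] * w[j]
        for i, a in enumerate(ids)
        for j, b in enumerate(ids)
    )
    if var <= 0:
        raise ValueError("non-positive predicted-trait variance")
    return num / math.sqrt(var)


# Toy inputs: rs2 has swapped alleles in the GWAS, rs3 is absent from the GWAS.
weights = {"rs1": ("A", "G", 0.5), "rs2": ("C", "T", -0.2), "rs3": ("G", "A", 0.1)}
gwas = {"rs1": ("A", "G", 2.0), "rs2": ("T", "C", 1.0)}
ld = {"rs1": {"rs1": 1.0, "rs2": 0.3}, "rs2": {"rs1": 0.3, "rs2": 1.0}}

ids, w, z = harmonize(weights, gwas)
z_rwas = rwas_z(ids, w, z, ld)
print(ids, round(z_rwas, 3))
```

This sketch ignores strand-ambiguous variants and strand flips, which a production harmonization step would also need to handle.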

For causal fine-mapping, FGMB integrates selected prediction weights with RWAS association results and LD information to prioritize candidate causal genes, molecular traits, and SNP-level signals. This step supports both multi-group cTWAS (M-cTWAS) and single-group cTWAS (cTWAS) fine-mapping, allowing association evidence to be evaluated jointly across molecular traits, contexts, and regulatory modalities.

The workflow examples in the next section provide an implementation-oriented outline of the FGMB analysis steps.