RWAS Association

This page provides command templates for applying FGMB prediction weights to GWAS summary statistics for transcriptome-wide association testing. The examples are based on the StatFunGen/xqtl-protocol pipeline and focus on the twas section of its SoS workflow, which performs the RWAS association testing.

  • twas: tests gene-level molecular-trait associations by combining xQTL-derived prediction weights, GWAS z-scores, and LD reference data. See the pipeline tutorial.

The paths below are templates. Replace GWAS study names, metadata files, LD reference files, xQTL weight metadata, and region identifiers with files generated for each FGMB analysis.

Required Inputs

The RWAS workflow expects harmonized GWAS, LD reference, genomic-region, and xQTL weight metadata. The input conventions follow the twas_ctwas.ipynb documentation in xqtl-protocol.

| Input | Purpose |
| --- | --- |
| --gwas_meta_data | GWAS summary-statistics metadata table. This can point to one or more tabix-indexed GWAS files and optional column-mapping YAML files. |
| --ld_meta_data | LD reference metadata table with chromosome, region boundaries, LD matrix path, BIM path, and genome-build information. |
| --ld_reference_sample_size | Effective sample size of the LD reference panel used by the RWAS workflow. |
| --regions | LD block or analysis-region file. The workflow uses these regions to extract matching GWAS, LD, and xQTL weight data. |
| --xqtl_meta_data | xQTL prediction-weight metadata table. Essential columns include gene or molecular-trait coordinates, region_id, and paths to the trained weight databases. |
| --xqtl_type_table | Optional table mapping molecular contexts to molecular modality labels such as eQTL, pQTL, or sQTL. |
| --rsq_cutoff | Cross-validation adjusted R-squared cutoff used to define imputable gene–molecular-trait pairs. |
| --rsq_pval_cutoff | Cross-validation p-value cutoff used to define imputable gene–molecular-trait pairs. |
| --region-name | Optional single LD block or region identifier for smoke testing. Omit this or provide a larger region list for production runs. |

The RWAS step assumes that molecular-trait prediction weights have already been trained and exported by the expression-predictor workflow.
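Because every path on this page is a template, a quick pre-flight check can catch a missing or empty input before the workflow starts. The sketch below is illustrative only: `check_inputs` is a hypothetical helper, not part of xqtl-protocol, and it only tests that each file exists and is non-empty, not that its contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xqtl-protocol).
# Prints one "MISSING <path>" line for every argument that does not exist
# or is an empty file, and returns non-zero if any such path was found.
check_inputs() {
    local status=0 path
    for path in "$@"; do
        if [ ! -s "$path" ]; then
            echo "MISSING $path"
            status=1
        fi
    done
    return "$status"
}

# Example usage with the template paths used on this page
# (replace them with the files for your own analysis):
# check_inputs data/rwas/gwas_meta.tsv \
#              resource/ADSP_R4_EUR_LD/ld_meta_file.tsv \
#              resource/EUR_LD_blocks.bed \
#              data/rwas/ROSMAP_twas_wgw_xqtl_meta_data.tsv \
#              resource/data_type_table.txt && echo "all inputs present"
```

Running the check first keeps a typo in a template path from surfacing only after the workflow has started.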

RWAS Association: twas Example Command

Use twas to apply trained FGMB/xQTL molecular prediction weights to GWAS summary statistics for RWAS association testing. The workflow extracts GWAS z-scores and matching LD information for each region, harmonizes variants and alleles, applies the available prediction-weight models, and exports gene-level molecular-trait-specific association statistics.
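For orientation, summary-statistic association tests of this kind are commonly written in the form below. This is the standard expression from the TWAS literature, given here as an assumption about what the twas step evaluates for each weight model rather than a transcription of the pipeline code. The symbols w, z, and R are notation introduced for this illustration.

```latex
z_{\mathrm{TWAS}} = \frac{w^{\top} z}{\sqrt{w^{\top} R \, w}},
\qquad
p_{\mathrm{TWAS}} = 2\,\Phi\!\left(-\lvert z_{\mathrm{TWAS}} \rvert\right)
```

Here w is the vector of prediction weights for the variants retained after harmonization, z is the vector of GWAS z-scores for the same variants, R is their LD correlation matrix from the reference panel, and Φ is the standard normal cumulative distribution function.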

For input previews and detailed tutorials, please refer to the pipeline tutorial vignette.

The example below illustrates a simple RWAS command.

sos run ./xqtl-protocol/code/pecotmr_integration/twas_ctwas.ipynb twas \
   --cwd ../output/ --name FGMB_AD_RWAS \
   --gwas_meta_data data/rwas/gwas_meta.tsv \
   --ld_meta_data resource/ADSP_R4_EUR_LD/ld_meta_file.tsv \
   --regions resource/EUR_LD_blocks.bed \
   --xqtl_meta_data data/rwas/ROSMAP_twas_wgw_xqtl_meta_data.tsv \
   --xqtl_type_table resource/data_type_table.txt \
   --rsq_pval_cutoff 0.05 --rsq_cutoff 0.01 \
   --region-name chr11_84267999_86714492 \
   -s build
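The region identifier in the command above has the form chrom_start_end. Assuming identifiers are built this way from the rows of the --regions file (an inference from the example, not something this page confirms), a small helper can list candidate identifiers for a smoke test. `region_ids` is a hypothetical function name, and the column layout (chrom, start, end in the first three whitespace-delimited columns) is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xqtl-protocol).
# Prints chrom_start_end identifiers for the first N data rows of a BED-like file.
# Usage: region_ids <bed_file> [N]   (N defaults to 1)
region_ids() {
    local bed="$1" n="${2:-1}"
    awk -v n="$n" '
        /^#/ { next }                   # skip comment or header lines
        $2 !~ /^[0-9]+$/ { next }       # skip rows whose start is not numeric
        { print $1 "_" $2 "_" $3; if (++count >= n) exit }
    ' "$bed"
}

# Example: list three candidate values for --region-name
# region_ids resource/EUR_LD_blocks.bed 3
```

Check any identifier printed this way against the region_id values in the xQTL weight metadata before using it, since the naming convention is assumed here.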

Expected Outputs

The primary RWAS output is a table of gene–molecular-trait association statistics. The upstream workflow documents output columns such as:

gwas_study, chrom, start, end, block, gene, TSS, context, is_imputable, method, is_selected_method, rsq_adj_cv, pval_cv, twas_z, twas_pval

Key columns include:

| Column | Meaning |
| --- | --- |
| gwas_study | GWAS study label from the GWAS metadata file. |
| gene / molecular_id | Molecular trait or gene identifier tested by RWAS. |
| context | Brain region, cell type, cohort, or molecular modality context for the prediction model. |
| method | Prediction-weight method used for the test. |
| is_imputable | Whether the gene–molecular-trait pair passed cross-validation performance filters. |
| is_selected_method | Whether the method was selected as the best-performing prediction model for that gene–molecular-trait pair. |
| rsq_adj_cv / pval_cv | Cross-validation performance statistics used for model filtering. |
| twas_z / twas_pval | RWAS association statistic and p-value. |
| block | LD block or analysis region used for harmonization and testing. |

These outputs are the starting point for manuscript association summaries, cross-context RWAS heatmaps, and downstream causal RWAS fine-mapping.
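A first pass over the output table might keep only imputable pairs tested with the selected method. The sketch below assumes an uncompressed, tab-delimited file with a header row that uses the column names listed above, and assumes the two flag columns hold TRUE/FALSE values. `selected_hits` is a hypothetical helper, and none of these format details are confirmed by this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xqtl-protocol).
# Keeps rows where is_imputable and is_selected_method are both TRUE
# (case-insensitive) and prints gene, context, twas_z, and twas_pval.
# Column positions are looked up from the header row, so column order
# in the input file does not matter.
selected_hits() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            print "gene", "context", "twas_z", "twas_pval"
            next
        }
        toupper($col["is_imputable"]) == "TRUE" &&
        toupper($col["is_selected_method"]) == "TRUE" {
            print $col["gene"], $col["context"], $col["twas_z"], $col["twas_pval"]
        }
    ' "$1"
}

# Example (the file name is a placeholder):
# selected_hits rwas_results.tsv > rwas_selected.tsv
```

If the workflow writes compressed output, decompress it to a temporary file first. The helper does not verify that the expected columns are present, so inspect the header before relying on the result.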