Workflow Examples

This section contains notebook pages for the main FGMB Atlas analysis workflow: predictor construction, RWAS association testing, and multi-group causal TWAS (M-cTWAS) fine-mapping.

The pages in this section are command illustrations for the analysis workflows used by the FGMB Atlas project. They are meant to document how the major pipeline stages are launched and what inputs they expect; they are not intended to be executed inside the Jupyter Book build environment.

In this documentation, a molecular trait refers to a source-specific molecular measurement panel used for predictor training. Each molecular trait is defined by both its biological context and molecular modality, such as a brain region, cell type, or cell subtype measured for gene expression, protein abundance, or splicing.

Workflow Overview

The workflow examples are organized around three main analysis steps.

| Step | Page | Purpose |
|------|------|---------|
| 1 | Build Expression Predictors | Train molecular-trait prediction weights from xQTL data using univariate and multi-context modeling workflows. |
| 2 | RWAS Association | Apply trained FGMB prediction weights to GWAS summary statistics to generate gene-molecular-trait RWAS association results. |
| 3 | Multi-Group Causal TWAS Fine-Mapping | Integrate RWAS association statistics, molecular-trait-specific prediction weights, GWAS summary statistics, and an LD reference to prioritize candidate causal genes, molecular traits, and SNP-level signals. |

Together, these steps move from molecular predictor construction to disease-trait association testing and then to downstream causal fine-mapping.

Computing Dependencies

The commands in these workflow examples are intended to be run in a StatFunGen/xqtl-protocol analysis environment. For complete upstream workflow details, refer to the Computing environment setup page of the StatFunGen/xqtl-protocol repository.

The protocol notebooks used by the examples:

Before launching production analyses, make sure the following software and resources are available.

| Requirement | Purpose |
|-------------|---------|
| xqtl-protocol repository | Provides the SoS workflow notebooks, helper scripts, and protocol documentation used by the command examples. |
| SoS workflow engine | Runs notebook workflow sections such as sos run ... susie_twas, sos run ... mnm, and sos run ... twas. |
| Python environment for SoS | Provides the command-line interface and workflow execution layer. |
| R environment | Runs statistical modeling, RWAS testing, data handling, and plotting code inside the workflow sections. |
| Protocol R packages | Required for fine-mapping, RWAS-weight training, LD/GWAS harmonization, RWAS association testing, and result export. Typical packages include pecotmr, susieR, data.table, dplyr, readr, tidyr, and the multivariate modeling dependencies used by the active protocol release. |
| Genotype and molecular phenotype files | Required for building molecular-trait prediction models. Inputs should be harmonized to the samples, variants, regions, and covariates used for each FGMB context. |
| Tabix-indexed GWAS summary statistics | Required for regional extraction of GWAS z-scores during RWAS association testing. |
| LD reference files | Required for regional LD matrices, variant harmonization, RWAS testing, and causal fine-mapping. |
| FGMB/xQTL prediction weight metadata | Points RWAS and cTWAS workflows to trained molecular-trait prediction weights and context annotations. |
| Analysis-region files | Define cis windows, genes, LD blocks, or other regions used to coordinate genotype, GWAS, LD, and xQTL inputs. |
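Some of these resources can be sanity-checked before a run. As one illustration, the sketch below checks that each GWAS summary-statistics file has a tabix index stored next to it. It assumes the index sits alongside the data file as either `<file>.tbi` (the tabix default) or `<file>.csi`; the function name `check_tabix_index` and the example path are hypothetical and not part of xqtl-protocol.

```shell
# Hypothetical helper (an assumption, not part of xqtl-protocol):
# verify that each file exists and has a tabix index
# (<file>.tbi or <file>.csi) stored alongside it.
check_tabix_index() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "not found: $f"
            status=1
        elif [ -f "$f.tbi" ] || [ -f "$f.csi" ]; then
            echo "indexed: $f"
        else
            echo "no index: $f"
            status=1
        fi
    done
    # Non-zero return if any file was missing or unindexed.
    return "$status"
}

# Example call with a placeholder path; replace with real GWAS files.
# check_tabix_index gwas/example_sumstats.tsv.gz
```

A file reported as unindexed can be indexed with tabix, using the sequence, start, and end column options (`-s`, `-b`, `-e`) that match its layout.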

For a quick environment check, confirm that the main command-line tools are visible before launching a run.

which sos
which Rscript
which tabix
which python
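
The same check can be wrapped in a small loop that reports every missing tool in one pass. This is a minimal sketch rather than anything shipped with xqtl-protocol: the function name `check_tools` is hypothetical, and it uses the POSIX `command -v` built-in in place of `which`.

```shell
# Hypothetical helper (an assumption, not part of xqtl-protocol):
# report which of the given command-line tools are visible on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero return if at least one tool was not found.
    return "$missing"
}

# Check the same four tools as the quick check.
check_tools sos Rscript tabix python || echo "Activate or install the missing tools before launching a run."
```

Because the function returns a non-zero status when any tool is missing, it can also gate a launch script, for example by exiting early when the check fails.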

The workflow pages below document command structure and expected file conventions. Exact package versions, containers, cluster settings, and input paths should follow the active FGMB analysis environment and the corresponding xqtl-protocol release.