# Workflow Examples

This section contains notebook pages for the main FGMB Atlas analysis workflow: predictor construction, RWAS association testing, and multi-group causal TWAS (M-cTWAS) fine-mapping.

The pages in this section are command illustrations for the analysis workflows used by the FGMB Atlas project. They are meant to document how the major pipeline stages are launched and what inputs they expect; they are not intended to be executed inside the Jupyter Book build environment.

In this documentation, a `molecular trait` refers to a source-specific molecular measurement panel used for predictor training. Each molecular trait is defined by both its biological context and molecular modality, such as a brain region, cell type, or cell subtype measured for gene expression, protein abundance, or splicing.

## Workflow Overview

The workflow examples are organized around three main analysis steps.

| Step | Page | Purpose |
| --- | --- | --- |
| 1 | [Build Expression Predictors](1_build_expression_predictor.ipynb) | Train molecular-trait prediction weights from xQTL data using univariate and multi-context modeling workflows. |
| 2 | [RWAS Association](2_rwas_association.ipynb) | Apply trained FGMB prediction weights to GWAS summary statistics to generate gene-molecular-trait RWAS association results. |
| 3 | [Multi-Group Causal TWAS Fine-Mapping](3_multi_group_causal_twas_finemapping.ipynb) | Integrate RWAS association statistics, molecular-trait-specific prediction weights, GWAS summary statistics, and an LD reference to prioritize candidate causal genes, molecular traits, and SNP-level signals. |

Together, these steps move from molecular predictor construction to disease-trait association testing and then to downstream causal fine-mapping.

## Computing Dependencies

The commands in these workflow examples are intended to be run in a [`StatFunGen/xqtl-protocol`](https://github.com/StatFunGen/xqtl-protocol) analysis environment. For complete upstream workflow details, refer to the [Computing environment setup](https://github.com/StatFunGen/xqtl-protocol#computing-environment-setup) section of the `StatFunGen/xqtl-protocol` repository.

The examples use the following protocol notebooks:

- [`mnm_regression.ipynb`](https://github.com/StatFunGen/xqtl-protocol/blob/main/code/mnm_analysis/mnm_methods/mnm_regression.ipynb) for univariate and multi-context prediction-weight training.
- [`twas_ctwas.ipynb`](https://github.com/StatFunGen/xqtl-protocol/blob/main/code/pecotmr_integration/twas_ctwas.ipynb) for RWAS association testing and downstream cTWAS workflows.

Before launching production analyses, make sure the following software and resources are available.

| Requirement | Purpose |
| --- | --- |
| `xqtl-protocol` repository | Provides the SoS workflow notebooks, helper scripts, and protocol documentation used by the command examples. |
| SoS workflow engine | Runs notebook workflow sections such as `sos run ... susie_twas`, `sos run ... mnm`, and `sos run ... twas`. |
| Python environment for SoS | Provides the command-line interface and workflow execution layer. |
| R environment | Runs statistical modeling, RWAS testing, data handling, and plotting code inside the workflow sections. |
| Protocol R packages | Required for fine-mapping, RWAS-weight training, LD/GWAS harmonization, RWAS association testing, and result export. Typical packages include `pecotmr`, `susieR`, `data.table`, `dplyr`, `readr`, `tidyr`, and the multivariate modeling dependencies used by the active protocol release. |
| Genotype and molecular phenotype files | Required for building molecular-trait prediction models. Inputs should be harmonized to the samples, variants, regions, and covariates used for each FGMB context. |
| Tabix-indexed GWAS summary statistics | Required for regional extraction of GWAS z-scores during RWAS association testing. |
| LD reference files | Required for regional LD matrices, variant harmonization, RWAS testing, and causal fine-mapping. |
| FGMB/xQTL prediction weight metadata | Points RWAS and cTWAS workflows to trained molecular-trait prediction weights and context annotations. |
| Analysis-region files | Define cis windows, genes, LD blocks, or other regions used to coordinate genotype, GWAS, LD, and xQTL inputs. |
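Because regional extraction relies on tabix-indexed GWAS summary statistics, it can help to confirm that an index file sits next to each summary-statistics file before launching a run. The following is a minimal sketch, not part of the protocol: the helper name `has_tabix_index` and the path `gwas/example_trait.sumstats.tsv.gz` are hypothetical placeholders, and the check only looks for a `.tbi` or `.csi` file alongside the data file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of xqtl-protocol):
# return 0 if FILE exists and has a tabix index (.tbi or .csi) next to it.
has_tabix_index() {
    local file="$1"
    [ -f "$file" ] && { [ -f "$file.tbi" ] || [ -f "$file.csi" ]; }
}

# Hypothetical path; replace with the summary statistics used in your analysis.
gwas_file="gwas/example_trait.sumstats.tsv.gz"

if has_tabix_index "$gwas_file"; then
    echo "indexed: $gwas_file"
else
    echo "missing file or index: $gwas_file (an index can be built with tabix)" >&2
fi
```

The check does not validate the index contents or the column layout of the file; it only catches the common case of a summary-statistics file that was copied without its index.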

For a quick environment check, confirm that the main command-line tools are visible before launching a run.

```bash
which sos
which Rscript
which tabix
which python
```
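The same check can be wrapped in a small loop that reports every missing tool at once and, when `Rscript` is available, also looks for the R packages listed in the requirements table. This is a sketch under stated assumptions: the function name `check_tools` is hypothetical, and the package list is copied from the table above rather than from any particular protocol release, so the multivariate modeling dependencies are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of xqtl-protocol):
# report which of the given command-line tools are visible on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok       %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools sos Rscript tabix python \
    || echo "Some tools are missing; activate the analysis environment first." >&2

# Optional: check the R packages named in the requirements table.
# The list is an assumption taken from that table, not a complete manifest.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e '
pkgs <- c("pecotmr", "susieR", "data.table", "dplyr", "readr", "tidyr")
found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (all(found)) cat("All listed R packages found\n") else cat("Missing R packages:", pkgs[!found], "\n")
'
fi
```

Neither check is fatal by design, so the full report is printed even when several tools or packages are absent.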

The workflow pages below document command structure and expected file conventions. Exact package versions, containers, cluster settings, and input paths should follow the active FGMB analysis environment and the corresponding `xqtl-protocol` release.

```{tableofcontents}
```
