# Atlas Overview

This section introduces the FGMB Atlas resource and summarizes the modeling strategy used to connect molecular QTL resources, prediction models, regulome-wide association study (RWAS) testing, and downstream causal fine-mapping.

In this documentation, a `molecular trait` refers to a source-specific molecular measurement panel used for predictor training. Each molecular trait is defined by both its biological context and molecular modality, such as a brain region, cell type, or cell subtype measured for gene expression, protein abundance, or splicing.

Core resource components include:

- Molecular QTL-derived prediction models for genetically regulated molecular traits.
- RWAS-ready summary resources for disease and aging-related brain traits.
- Cross-context annotations that connect signals across modalities, cohorts, and biological settings.
- Figure and workflow notebooks that document the manuscript analyses.


## Molecular Modalities

The current FGMB Atlas includes the following molecular modalities:
- Gene expression
- Protein abundance
- Splicing regulation

Within the atlas, each molecular trait is defined as a unique context–modality combination, and these data are used to train prediction models of genetically regulated molecular traits (expression, protein abundance, or splicing) for downstream RWAS and causal TWAS analyses.
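
As a minimal sketch of this definition, a molecular trait can be represented as a small immutable record. The class name `MolecularTrait` and its field names are hypothetical and are not part of any FGMB code. Because the same context and modality can appear in more than one source dataset (for example, astrocyte gene expression in the CUIMC1, MIT, and mega resources), the sketch also carries a dataset label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolecularTrait:
    """Hypothetical record for one molecular trait (illustrative only)."""
    dataset: str    # source dataset label, e.g. "ROSMAP" or "MSBB"
    context: str    # tissue, cell type, or cell subtype
    modality: str   # gene expression, protein abundance, or splicing


# Example values taken from panels described in this section.
traits = [
    MolecularTrait("ROSMAP", "DLPFC", "gene expression"),
    MolecularTrait("ROSMAP", "DLPFC", "protein abundance"),
    MolecularTrait("ROSMAP", "PCC", "gene expression"),
    MolecularTrait("MSBB", "PHG", "gene expression"),
]

# Frozen dataclasses are hashable, so a set confirms every trait is unique.
assert len(set(traits)) == len(traits)

# Group contexts by modality to see which contexts each modality covers.
contexts_by_modality = {}
for t in traits:
    contexts_by_modality.setdefault(t.modality, []).append(t.context)
```

Using an immutable record keeps each trait usable as a dictionary key, which is convenient when prediction models or association results need to be indexed per trait.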



## ROSMAP Resources
ROSMAP provides the largest and most diverse set of molecular reference panels in FGMB. These data include both bulk tissue and single-nucleus resources across multiple brain contexts and modalities.
### Bulk tissue RNA-seq
Bulk RNA-seq gene expression data were incorporated from three ROSMAP brain regions:
- Dorsolateral prefrontal cortex (DLPFC), N = 777
- Posterior cingulate cortex (PCC), N = 441
- Anterior cingulate cortex (AC), N = 593
### Monocyte gene expression
FGMB also includes a peripheral blood-derived monocyte gene expression panel from ROSMAP:
- Monocyte (Mono), N = 226
### ROSMAP protein abundance
Protein abundance data from ROSMAP were incorporated for:
- DLPFC protein abundance (pQTL), N = 416
  
### ROSMAP splicing regulation
Splicing enrichment data derived from bulk RNA-seq were included for three ROSMAP brain regions:
- DLPFC splicing regulation (sQTL), N = 806
- PCC splicing regulation (sQTL), N = 449
- AC splicing regulation (sQTL), N = 603

### ROSMAP single-nucleus RNA-seq resources
FGMB includes three ROSMAP-derived single-nucleus RNA-seq resources from dorsolateral prefrontal cortex.

#### 1. CUIMC1 single-nucleus resource
The CUIMC1 dataset includes pseudo-bulk profiles for six major cell types from the DLPFC:
- Astrocytes (Ast), N = 419
- Inhibitory neurons (Inh), N = 419
- Excitatory neurons (Exc), N = 419
- Oligodendrocytes (Oli), N = 419
- Oligodendrocyte progenitor cells (OPC), N = 418
- Microglia (Mic), N = 419
  
#### 2. MIT single-nucleus resource
The MIT dataset includes major cell types and selected subtypes, with sample sizes ranging from 80 to 387:
- Astrocytes (Ast), N = 385
- Inhibitory neurons (Inh), N = 379
- Excitatory neurons (Exc), N = 386
- Oligodendrocytes (Oli), N = 387
- Oligodendrocyte progenitor cells (OPC), N = 383
- Microglia (Mic), N = 377
- Astrocyte subtype 10 (Ast_10), N = 113
- Microglia subtype 12 (Mic_12), N = 106
- Microglia subtype 13 (Mic_13), N = 80

#### 3. Mega single-nucleus resource
The mega-analysis resource integrates multiple single-nucleus datasets and includes six major cell types with sample sizes ranging from 733 to 737:
- Astrocytes (Ast), N = 737
- Inhibitory neurons (Inh), N = 736
- Excitatory neurons (Exc), N = 737
- Oligodendrocytes (Oli), N = 737
- Oligodendrocyte progenitor cells (OPC), N = 735
- Microglia (Mic), N = 733

## Mount Sinai Brain Bank (MSBB)
The Mount Sinai Brain Bank contributes bulk tissue molecular reference panels to FGMB. These data broaden regional representation beyond ROSMAP and add complementary bulk brain contexts for expression and protein prediction modeling.
### MSBB gene expression panels
Bulk RNA-seq gene expression data were included from four cortical brain regions:
- Frontal pole (FP, Brodmann area 10), N = 274
- Superior temporal gyrus (STG, Brodmann area 22), N = 254
- Parahippocampal gyrus (PHG, Brodmann area 36), N = 230
- Inferior frontal gyrus (IFG, Brodmann area 44), N = 256

### MSBB protein abundance panel
FGMB also includes protein abundance data from:
- Parahippocampal gyrus (PHG pQTL), N = 184

## Knight-ADRC
The Knight Alzheimer’s Disease Research Center contributes parietal cortex molecular reference panels to FGMB. These data add an independent aging-brain cohort and broaden atlas coverage across both cohort and modality.
### Knight-ADRC gene expression panel
Gene expression reference data were included for:
- Parietal cortex (PC eQTL), N = 354

### Knight-ADRC protein abundance panel
Protein abundance reference data were included for:
- Parietal cortex (PC pQTL), N = 412


## FGMB Molecular Trait Summary

```{list-table} Summary of FGMB molecular trait prediction models.
:header-rows: 1
:name: fgmb-molecular-trait-prediction-models

* - Source Dataset
  - Molecular Dataset
  - Context
  - Molecular Modality
  - Sample Size
  - Genes Trained
  - Imputable Genes
* - ROSMAP ([De Jager et al. 2018](https://www.nature.com/articles/sdata2018142))
  - DLPFC eQTL
  - Dorsolateral Prefrontal Cortex
  - Gene Expression
  - 777
  - 16,307
  - 9,948
* -
  - PCC eQTL
  - Posterior Cingulate Cortex
  - Gene Expression
  - 441
  - 16,110
  - 10,720
* -
  - AC eQTL
  - Anterior Cingulate Cortex
  - Gene Expression
  - 593
  - 16,104
  - 10,918
* -
  - Mono eQTL
  - Monocyte
  - Gene Expression
  - 226
  - 12,801
  - 5,397
* - ROSMAP ([Bennett et al. 2018](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380522/))
  - DLPFC pQTL
  - Dorsolateral Prefrontal Cortex
  - Protein Abundance
  - 416
  - 7,396
  - 3,925
* - ROSMAP ([Najar et al. 2025](https://www.biorxiv.org/content/10.1101/2025.04.06.646893v1))
  - DLPFC sQTL
  - Dorsolateral Prefrontal Cortex
  - Splicing Enrichment
  - 806
  - 12,474
  - 7,261
* -
  - PCC sQTL
  - Posterior Cingulate Cortex
  - Splicing Enrichment
  - 449
  - 12,663
  - 9,832
* -
  - AC sQTL
  - Anterior Cingulate Cortex
  - Splicing Enrichment
  - 603
  - 12,585
  - 8,472
* - ROSMAP ([Fujita et al. 2024](https://www.nature.com/articles/s41588-024-01685-y))
  - Ast eQTL CUIMC1
  - Astrocyte
  - Gene Expression
  - 419
  - 11,392
  - 6,023
* -
  - Inh eQTL CUIMC1
  - Inhibitory Neuron
  - Gene Expression
  - 419
  - 11,266
  - 6,609
* -
  - Exc eQTL CUIMC1
  - Excitatory Neuron
  - Gene Expression
  - 419
  - 11,111
  - 8,085
* -
  - Oli eQTL CUIMC1
  - Oligodendrocyte
  - Gene Expression
  - 419
  - 10,912
  - 5,595
* -
  - OPC eQTL CUIMC1
  - Oligodendrocyte Progenitor
  - Gene Expression
  - 418
  - 7,742
  - 3,582
* -
  - Mic eQTL CUIMC1
  - Microglia
  - Gene Expression
  - 419
  - 7,130
  - 3,071
* - ROSMAP ([Comandante-Lou et al. 2025](https://www.biorxiv.org/content/10.1101/2025.02.24.639868v1))
  - Ast eQTL MIT
  - Astrocyte
  - Gene Expression
  - 385
  - 9,125
  - 5,530
* -
  - Ast.10 eQTL MIT
  - Astrocyte Subtype 10
  - Gene Expression
  - 113
  - 1,927
  - 1,927
* -
  - Inh eQTL MIT
  - Inhibitory Neuron
  - Gene Expression
  - 379
  - 10,760
  - 7,141
* -
  - Exc eQTL MIT
  - Excitatory Neuron
  - Gene Expression
  - 386
  - 10,645
  - 8,000
* -
  - Oli eQTL MIT
  - Oligodendrocyte
  - Gene Expression
  - 387
  - 10,021
  - 6,748
* -
  - OPC eQTL MIT
  - Oligodendrocyte Progenitor
  - Gene Expression
  - 383
  - 8,707
  - 5,163
* -
  - Mic eQTL MIT
  - Microglia
  - Gene Expression
  - 377
  - 5,404
  - 3,306
* -
  - Mic.12 eQTL MIT
  - Microglia Subtype 12
  - Gene Expression
  - 106
  - 702
  - 701
* -
  - Mic.13 eQTL MIT
  - Microglia Subtype 13
  - Gene Expression
  - 80
  - 692
  - 692
* - ROSMAP ([Comandante-Lou et al. 2025](https://www.biorxiv.org/content/10.1101/2025.02.24.639868v1))
  - Ast eQTL mega
  - Astrocyte
  - Gene Expression
  - 737
  - 7,742
  - 4,428
* -
  - Inh eQTL mega
  - Inhibitory Neuron
  - Gene Expression
  - 736
  - 9,577
  - 5,970
* -
  - Exc eQTL mega
  - Excitatory Neuron
  - Gene Expression
  - 737
  - 10,138
  - 7,856
* -
  - Oli eQTL mega
  - Oligodendrocyte
  - Gene Expression
  - 737
  - 8,196
  - 4,826
* -
  - OPC eQTL mega
  - Oligodendrocyte Progenitor
  - Gene Expression
  - 735
  - 5,897
  - 2,787
* -
  - Mic eQTL mega
  - Microglia
  - Gene Expression
  - 733
  - 3,514
  - 1,562
* - MSBB ([Wang et al. 2018](https://www.nature.com/articles/sdata2018185))
  - FP eQTL
  - Frontal Pole
  - Gene Expression
  - 274
  - 9,275
  - 8,088
* -
  - STG eQTL
  - Superior Temporal Gyrus
  - Gene Expression
  - 254
  - 9,275
  - 7,645
* -
  - PHG eQTL
  - Parahippocampal Gyrus
  - Gene Expression
  - 230
  - 9,275
  - 7,494
* -
  - IFG eQTL
  - Inferior Frontal Gyrus
  - Gene Expression
  - 256
  - 9,275
  - 7,997
* -
  - PHG pQTL
  - Parahippocampal Gyrus
  - Protein Abundance
  - 184
  - 11,224
  - 8,325
* - Knight-ADRC ([Fernandez et al. 2024](https://www.nature.com/articles/s41597-024-03485-9))
  - PC eQTL
  - Parietal Cortex
  - Gene Expression
  - 354
  - 15,941
  - 9,070
* -
  - PC pQTL
  - Parietal Cortex
  - Protein Abundance
  - 412
  - 1,018
  - 233
```


*Molecular Dataset* refers to a source-specific molecular measurement panel used for predictor training, such as a brain region, cell type, or cell subtype measured for gene expression, protein abundance, or splicing. *Context* denotes the tissue, cell type, or cell subtype, whereas *Molecular Modality* denotes the type of molecular phenotype being modeled.
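
One simple way to compare panels in the summary table is the share of trained genes that are imputable. The sketch below computes that ratio for a few rows copied from the table. The helper name `imputable_fraction` is hypothetical, and the ratio is only a descriptive summary: the criteria that make a gene imputable are not specified in this section.

```python
# (molecular dataset, genes trained, imputable genes), copied from the table above
rows = [
    ("DLPFC eQTL", 16307, 9948),
    ("DLPFC pQTL", 7396, 3925),
    ("Mic eQTL mega", 3514, 1562),
    ("PC pQTL", 1018, 233),
]


def imputable_fraction(trained: int, imputable: int) -> float:
    """Share of trained genes that are imputable (hypothetical helper)."""
    if trained <= 0:
        raise ValueError("trained must be a positive gene count")
    return imputable / trained


fractions = {name: imputable_fraction(t, i) for name, t, i in rows}

# Print panels from highest to lowest imputable share.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {frac:.1%}")
```

The same calculation extends to every row of the table if the counts are loaded from a file rather than typed inline.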




