Atlas Overview

This section introduces the FGMB Atlas resource and summarizes the modeling strategy that links molecular QTL resources, prediction models, regulome-wide association study (RWAS) testing, and downstream causal fine-mapping.

In this documentation, a molecular trait refers to a source-specific molecular measurement panel used for predictor training. Each molecular trait is defined by its biological context (a brain region, cell type, or cell subtype) together with its molecular modality (gene expression, protein abundance, or splicing).
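To make the definition concrete, the following minimal sketch represents a molecular trait as a record keyed by source, context, and modality. The class and field names are hypothetical illustrations, not part of any FGMB interface; the example values are taken from the single-nucleus astrocyte panels in Table 1.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record type for illustration only; not an FGMB API.
@dataclass(frozen=True)
class MolecularTrait:
    source: str    # source-specific panel, e.g. "CUIMC1", "MIT", "mega"
    context: str   # brain region, cell type, or cell subtype
    modality: str  # gene expression, protein abundance, or splicing

    @property
    def key(self) -> tuple[str, str, str]:
        """Trait identity: context and modality alone are not unique,
        because the same cell type can be profiled by several sources."""
        return (self.source, self.context, self.modality)


# Three astrocyte expression panels share context and modality but are
# still distinct molecular traits because they come from different sources.
traits = [
    MolecularTrait("CUIMC1", "Astrocyte", "Gene Expression"),
    MolecularTrait("MIT", "Astrocyte", "Gene Expression"),
    MolecularTrait("mega", "Astrocyte", "Gene Expression"),
]
registry = {t.key: t for t in traits}
```

Keying on the source as well as the context and modality keeps the CUIMC1, MIT, and mega astrocyte panels separate in the registry.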

Core resource components include:

  • Molecular QTL-derived prediction models for genetically regulated molecular traits.

  • RWAS-ready summary resources for disease- and aging-related brain traits.

  • Cross-context annotations that connect signals across modalities, cohorts, and biological settings.

  • Figure and workflow notebooks that document the manuscript analyses.

Molecular Modalities

The current FGMB atlas includes the following molecular modalities:

  • Gene expression

  • Protein abundance

  • Splicing regulation

Within the atlas, each molecular trait corresponds to a unique context–modality combination within a given source dataset. These data are used to train prediction models of genetically regulated molecular traits (gene expression, protein abundance, and splicing) for downstream RWAS and causal TWAS analyses.
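Association testing with trained prediction models is often performed from GWAS summary statistics rather than individual-level genotypes. As an illustration only, the sketch below implements the widely used S-PrediXcan-style gene-level statistic, z_g ≈ Σ_j w_j (σ_j / σ_g) z_j, where w_j are model weights, z_j are GWAS z-scores, σ_j is the SNP standard deviation, and σ_g² = wᵀΣw is the predicted-trait variance from a reference genotype covariance Σ. This is an assumed, standard formulation; FGMB's RWAS pipeline may use a different test, and the function name is hypothetical.

```python
import numpy as np


def summary_twas_z(weights, snp_z, snp_cov):
    """Gene-level z-score from prediction weights and GWAS summary statistics.

    weights: model weights w_j for the SNPs in one prediction model.
    snp_z:   GWAS z-scores for the same SNPs, aligned to the same alleles.
    snp_cov: genotype covariance matrix of those SNPs from a reference panel.
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(snp_z, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)
    snp_sd = np.sqrt(np.diag(cov))      # sigma_j for each SNP
    predicted_var = float(w @ cov @ w)  # sigma_g^2 = w' Sigma w
    if predicted_var <= 0:
        raise ValueError("prediction model has zero predicted variance")
    return float(np.sum(w * snp_sd * z) / np.sqrt(predicted_var))


# A single-SNP model returns that SNP's z-score, signed by the weight.
single = summary_twas_z([2.0], [3.0], [[1.0]])
```

With one SNP the statistic reduces to the SNP's own z-score (with the sign of its weight), which is a quick sanity check on allele alignment.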

ROSMAP Resources

ROSMAP provides the largest and most diverse set of molecular reference panels in FGMB. These data include bulk brain tissue, single-nucleus, and peripheral monocyte resources across multiple contexts and modalities.

Bulk tissue RNA-seq

Bulk RNA-seq gene expression data were incorporated from three ROSMAP brain regions:

  • Dorsolateral prefrontal cortex (DLPFC), N = 777

  • Posterior cingulate cortex (PCC), N = 441

  • Anterior cingulate cortex (AC), N = 593

Monocyte gene expression

FGMB also includes a peripheral blood-derived monocyte gene expression panel from ROSMAP:

  • Monocyte (Mono), N = 226

ROSMAP protein abundance

Protein abundance data from ROSMAP were incorporated for:

  • DLPFC protein abundance (pQTL), N = 416

ROSMAP splicing regulation

Splicing enrichment data derived from bulk RNA-seq were included for three ROSMAP brain regions:

  • DLPFC splicing regulation (sQTL), N = 806

  • PCC splicing regulation (sQTL), N = 449

  • AC splicing regulation (sQTL), N = 603

ROSMAP single-nucleus RNA-seq resources

FGMB includes several ROSMAP-derived single-nucleus RNA-seq resources from the dorsolateral prefrontal cortex (DLPFC).

1. CUIMC1 single-nucleus resource

The CUIMC1 dataset includes pseudo-bulk expression profiles for six major DLPFC cell types:

  • Astrocytes (Ast), N = 419

  • Inhibitory neurons (Inh), N = 419

  • Excitatory neurons (Exc), N = 419

  • Oligodendrocytes (Oli), N = 419

  • Oligodendrocyte progenitor cells (OPC), N = 418

  • Microglia (Mic), N = 419
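Pseudo-bulk profiles are typically built by aggregating single-nucleus counts per donor and cell type, so each donor contributes at most one sample per cell type. The sketch below shows that aggregation under the assumption that raw counts are simply summed; this is one common choice, the actual CUIMC1 processing (filtering, normalization) may differ, and the function name is hypothetical.

```python
import numpy as np


def pseudobulk_counts(counts, donors, cell_types):
    """Sum raw counts over all nuclei sharing a (donor, cell type) label.

    counts:     array of shape (n_nuclei, n_genes) with raw counts.
    donors:     per-nucleus donor labels, length n_nuclei.
    cell_types: per-nucleus cell-type labels, length n_nuclei.
    Returns a dict mapping (donor, cell_type) -> summed count vector.
    """
    counts = np.asarray(counts)
    donors = np.asarray(donors)
    cell_types = np.asarray(cell_types)
    profiles = {}
    for donor in np.unique(donors):
        for cell_type in np.unique(cell_types):
            # Boolean mask selecting nuclei from this donor and cell type.
            mask = (donors == donor) & (cell_types == cell_type)
            if mask.any():
                profiles[(str(donor), str(cell_type))] = counts[mask].sum(axis=0)
    return profiles


result = pseudobulk_counts(
    [[1, 0], [2, 3], [5, 5], [4, 1]],
    ["d1", "d1", "d2", "d1"],
    ["Mic", "Mic", "Mic", "Ast"],
)
```

Donor and cell-type combinations with no nuclei are skipped, which is one reason per-cell-type sample sizes can differ slightly (for example, 418 versus 419 in the list above).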

2. MIT single-nucleus resource

The MIT dataset includes major cell types and selected subtypes, with sample sizes ranging from 80 to 387:

  • Astrocytes (Ast), N = 385

  • Inhibitory neurons (Inh), N = 379

  • Excitatory neurons (Exc), N = 386

  • Oligodendrocytes (Oli), N = 387

  • Oligodendrocyte progenitor cells (OPC), N = 383

  • Microglia (Mic), N = 377

  • Astrocyte subtype 10 (Ast_10), N = 113

  • Microglia subtype 12 (Mic_12), N = 106

  • Microglia subtype 13 (Mic_13), N = 80

3. Mega single-nucleus resource

The mega-analysis resource integrates multiple single-nucleus datasets and includes six major cell types, with sample sizes ranging from 733 to 737:

  • Astrocytes (Ast), N = 737

  • Inhibitory neurons (Inh), N = 736

  • Excitatory neurons (Exc), N = 737

  • Oligodendrocytes (Oli), N = 737

  • Oligodendrocyte progenitor cells (OPC), N = 735

  • Microglia (Mic), N = 733

Mount Sinai Brain Bank (MSBB)

The Mount Sinai Brain Bank contributes bulk tissue molecular reference panels to FGMB. These data broaden regional representation beyond ROSMAP and add complementary bulk brain contexts for expression and protein prediction modeling.

1. MSBB gene expression panels

Bulk RNA-seq gene expression data were included from four cortical brain regions:

  • Frontal pole (FP, Brodmann area 10), N = 274

  • Superior temporal gyrus (STG, Brodmann area 22), N = 254

  • Parahippocampal gyrus (PHG, Brodmann area 36), N = 230

  • Inferior frontal gyrus (IFG, Brodmann area 44), N = 256

2. MSBB protein abundance panel

FGMB also includes protein abundance data from:

  • Parahippocampal gyrus (PHG pQTL), N = 184

Knight-ADRC

The Knight Alzheimer’s Disease Research Center contributes parietal cortex molecular reference panels to FGMB. These data add an independent aging-brain cohort and extend atlas coverage across cohorts and modalities.

1. Knight-ADRC gene expression panel

Gene expression reference data were included for:

  • Parietal cortex (PC eQTL), N = 354

2. Knight-ADRC protein abundance panel

Protein abundance reference data were included for:

  • Parietal cortex (PC pQTL), N = 412

FGMB Molecular Trait Summary

Table 1. Summary of FGMB molecular trait prediction models.

| Source Dataset | Molecular Dataset | Context | Molecular Modality | Sample Size | Genes Trained | Imputable Genes |
|---|---|---|---|---|---|---|
| ROSMAP (De Jager et al. 2018) | DLPFC eQTL | Dorsolateral Prefrontal Cortex | Gene Expression | 777 | 16,307 | 9,948 |
| | PCC eQTL | Posterior Cingulate Cortex | Gene Expression | 441 | 16,110 | 10,720 |
| | AC eQTL | Anterior Cingulate Cortex | Gene Expression | 593 | 16,104 | 10,918 |
| | Mono eQTL | Monocyte | Gene Expression | 226 | 12,801 | 5,397 |
| ROSMAP (Bennett et al. 2018) | DLPFC pQTL | Dorsolateral Prefrontal Cortex | Protein Expression | 416 | 7,396 | 3,925 |
| ROSMAP (Najar et al. 2025) | DLPFC sQTL | Dorsolateral Prefrontal Cortex | Splicing Enrichment | 806 | 12,474 | 7,261 |
| | PCC sQTL | Posterior Cingulate Cortex | Splicing Enrichment | 449 | 12,663 | 9,832 |
| | AC sQTL | Anterior Cingulate Cortex | Splicing Enrichment | 603 | 12,585 | 8,472 |
| ROSMAP (Fujita et al. 2024) | Ast eQTL CUIMC1 | Astrocyte | Gene Expression | 419 | 11,392 | 6,023 |
| | Inh eQTL CUIMC1 | Inhibitory Neuron | Gene Expression | 419 | 11,266 | 6,609 |
| | Exc eQTL CUIMC1 | Excitatory Neuron | Gene Expression | 419 | 11,111 | 8,085 |
| | Oli eQTL CUIMC1 | Oligodendrocyte | Gene Expression | 419 | 10,912 | 5,595 |
| | OPC eQTL CUIMC1 | Oligodendrocyte Progenitor | Gene Expression | 418 | 7,742 | 3,582 |
| | Mic eQTL CUIMC1 | Microglia | Gene Expression | 419 | 7,130 | 3,071 |
| ROSMAP (Comandante-Lou et al. 2025) | Ast eQTL MIT | Astrocyte | Gene Expression | 385 | 9,125 | 5,530 |
| | Ast.10 eQTL | Astrocyte Subtype 10 | Gene Expression | 113 | 1,927 | 1,927 |
| | Inh eQTL MIT | Inhibitory Neuron | Gene Expression | 379 | 10,760 | 7,141 |
| | Exc eQTL MIT | Excitatory Neuron | Gene Expression | 386 | 10,645 | 8,000 |
| | Oli eQTL MIT | Oligodendrocyte | Gene Expression | 387 | 10,021 | 6,748 |
| | OPC eQTL MIT | Oligodendrocyte Progenitor | Gene Expression | 383 | 8,707 | 5,163 |
| | Mic eQTL MIT | Microglia | Gene Expression | 377 | 5,404 | 3,306 |
| | Mic.12 eQTL MIT | Microglia Subtype 12 | Gene Expression | 106 | 702 | 701 |
| | Mic.13 eQTL MIT | Microglia Subtype 13 | Gene Expression | 80 | 692 | 692 |
| ROSMAP (Comandante-Lou et al. 2025) | Ast eQTL mega | Astrocyte | Gene Expression | 737 | 7,742 | 4,428 |
| | Inh eQTL mega | Inhibitory Neuron | Gene Expression | 736 | 9,577 | 5,970 |
| | Exc eQTL mega | Excitatory Neuron | Gene Expression | 737 | 10,138 | 7,856 |
| | Oli eQTL mega | Oligodendrocyte | Gene Expression | 737 | 8,196 | 4,826 |
| | OPC eQTL mega | Oligodendrocyte Progenitor | Gene Expression | 735 | 5,897 | 2,787 |
| | Mic eQTL mega | Microglia | Gene Expression | 733 | 3,514 | 1,562 |
| MSBB (Wang et al. 2018) | FP eQTL | Frontal Pole | Gene Expression | 274 | 9,275 | 8,088 |
| | STG eQTL | Superior Temporal Gyrus | Gene Expression | 254 | 9,275 | 7,645 |
| | PHG eQTL | Parahippocampal Gyrus | Gene Expression | 230 | 9,275 | 7,494 |
| | IFG eQTL | Inferior Frontal Gyrus | Gene Expression | 256 | 9,275 | 7,997 |
| | PHG pQTL | Parahippocampal Gyrus | Protein Expression | 184 | 11,224 | 8,325 |
| Knight-ADRC (Fernandez et al. 2024) | PC eQTL | Parietal Cortex | Gene Expression | 354 | 15,941 | 9,070 |
| | PC pQTL | Parietal Cortex | Protein Expression | 412 | 1,018 | 233 |

Molecular Dataset refers to a source-specific molecular measurement panel used for predictor training, such as a brain region, cell type, or cell subtype measured for gene expression, protein abundance, or splicing. Context denotes the tissue, cell type, or cell subtype, whereas Molecular Modality denotes the type of molecular phenotype being modeled.
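Because Table 1 reports both the number of genes trained and the number of imputable genes for each panel, simple per-panel summaries can be derived directly from it. The sketch below uses a hypothetical `PanelSummary` record and a subset of rows copied from Table 1 to compute the ratio of the Imputable Genes column to the Genes Trained column and the sample-size range of the mega single-nucleus panels. It is an illustrative calculation, not part of the FGMB tooling.

```python
from dataclasses import dataclass


# Hypothetical record mirroring a few Table 1 columns.
@dataclass(frozen=True)
class PanelSummary:
    dataset: str
    sample_size: int
    genes_trained: int
    imputable_genes: int

    @property
    def imputable_fraction(self) -> float:
        """Imputable Genes divided by Genes Trained."""
        return self.imputable_genes / self.genes_trained


# Rows copied from Table 1.
MEGA = [
    PanelSummary("Ast eQTL mega", 737, 7_742, 4_428),
    PanelSummary("Inh eQTL mega", 736, 9_577, 5_970),
    PanelSummary("Exc eQTL mega", 737, 10_138, 7_856),
    PanelSummary("Oli eQTL mega", 737, 8_196, 4_826),
    PanelSummary("OPC eQTL mega", 735, 5_897, 2_787),
    PanelSummary("Mic eQTL mega", 733, 3_514, 1_562),
]
BULK = [
    PanelSummary("DLPFC eQTL", 777, 16_307, 9_948),
    PanelSummary("PC pQTL", 412, 1_018, 233),
]


def sample_size_range(panels):
    """Return (smallest N, largest N) across the given panels."""
    sizes = [p.sample_size for p in panels]
    return min(sizes), max(sizes)


mega_range = sample_size_range(MEGA)  # (733, 737)
for panel in MEGA + BULK:
    print(f"{panel.dataset}: {panel.imputable_fraction:.1%} of trained genes imputable")
```

Computing ranges from the rows rather than restating them by hand helps keep prose summaries consistent with the table.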