Using blood transcriptome analysis for Alzheimer's disease diagnosis and patient stratification
Huan Zhong and Xiaopu Zhou contributed equally to this work.
Abstract
INTRODUCTION
Blood protein biomarkers demonstrate potential for Alzheimer's disease (AD) diagnosis. However, few studies have examined the molecular changes in the blood cells of patients with AD.
METHODS
Bulk RNA-sequencing of blood cells was performed on AD patients of Chinese descent (n = 214 and 26 in the discovery and validation cohorts, respectively) with normal controls (n = 208 and 38 in the discovery and validation cohorts, respectively). Weighted gene co-expression network analysis (WGCNA) and deconvolution analysis identified AD-associated gene modules and blood cell types. Regression and unsupervised clustering analysis identified AD-associated genes, gene modules, cell types, and established AD classification models.
RESULTS
WGCNA on differentially expressed genes revealed 15 gene modules, with 6 accurately classifying AD (areas under the receiver operating characteristics curve [auROCs] > 0.90). These modules stratified AD patients into subgroups with distinct disease states. Cell-type deconvolution analysis identified specific blood cell types potentially associated with AD pathogenesis.
DISCUSSION
This study highlights the potential of blood transcriptome for AD diagnosis, patient stratification, and mechanistic studies.
Highlights
- We comprehensively analyze the blood transcriptomes of a well-characterized Alzheimer's disease cohort to identify genes, gene modules, pathways, and specific blood cells associated with the disease.
- Blood transcriptome analysis accurately classifies and stratifies patients with Alzheimer's disease, with some gene modules achieving classification accuracy comparable to that of the plasma ATN biomarkers.
- Immune-associated pathways and immune cells, such as neutrophils, have potential roles in the pathogenesis and progression of Alzheimer's disease.
1 BACKGROUND
Alzheimer's disease (AD), one of the most common aging-associated diseases, has become a leading cause of death worldwide.1 Despite its huge socioeconomic burden, intervention strategies for the disease are limited. Current diagnosis mostly relies on physicians’ subjective judgements based on family history and cognitive tests. However, those diagnostic methods are often insufficiently sensitive to detect the early stages of the disease or insufficiently specific to distinguish AD from other types of dementia. Meanwhile, the implementation of positron emission tomography (PET) and lumbar puncture for detecting AD hallmarks, including amyloid plaques, tau tangles, and neurodegeneration (collectively termed the “ATN” biomarker panel), can quantitatively assess disease severity, greatly improving the sensitivity and accuracy of AD diagnosis. In particular, the sequential changes in the ATN biomarkers enable us to examine the detailed molecular phenotypic changes during disease progression, which facilitates disease staging and the development of tailored intervention strategies. Furthermore, the recent identification of blood-based biomarkers that reflect the respective changes of the ATN biomarkers in the brain, including amyloid-beta (Aβ), phosphorylated tau at threonine 181 (p-tau181), and neurofilament light polypeptide (NfL), has revolutionized the diagnosis and monitoring of AD.2-5 However, it is unclear if these biomarkers fully reflect the various disease states during AD progression. Moreover, the ATN biomarkers might not provide a detailed picture of the molecular mechanisms that underlie AD progression.
The clinical use of blood-based diagnostics for AD is being intensely investigated because of the ease of access to blood samples. Besides the blood ATN biomarkers, the expressions of hundreds of plasma proteins—many of which are expressed by blood cells—are altered in patients with AD.6-8 Therefore, in parallel to the plasma ATN biomarkers, blood cells may also undergo pathophysiological changes along AD progression.
Transcriptome analysis is a widely used method for studying how genes behave in different biological contexts, including the brains of individuals with AD. Specifically, analyses of the brain transcriptome in AD provide a comprehensive view of the molecular changes in genes, pathways, gene modules, and cell types.9-15 For example, co-expression network analysis of transcriptomic data has revealed that the immune system is involved in AD,14 while cell-type deconvolution analysis has shown a decrease in the number of neurons and an increase in the numbers of microglia and astrocytes within the AD brain.15 Specific gene sets from transcriptomic data can also distinguish AD-affected brains from undemented brain tissues.16 Considering the changes in blood cells that occur upon AD onset and progression, transcriptome analysis may reveal molecular changes in the blood cells of patients with AD, providing critical insights into the disease's pathophysiological mechanisms. Furthermore, recent studies demonstrate that blood transcriptome analysis can aid the diagnosis of uncommon muscle and mitochondrial disorders, underscoring its potential utility for the diagnosis and characterization of diseases, including AD.17 Hence, applying blood transcriptome analysis to AD studies may also reveal new diagnostic biomarkers. Nevertheless, few studies involving transcriptome analysis have investigated AD blood cells or the application of blood transcriptomic data to disease diagnosis or stratification. Therefore, the present study aims to assess the potential usage of blood transcriptome analysis for AD, including delineating the molecular phenotypic changes in blood cells as well as clinical diagnosis, stratification, and monitoring.
2 METHODS
2.1 Study cohorts
Two cohorts of Hong Kong Chinese individuals aged ≥60 years were included in this study. The discovery cohort (n = 422) comprised 214 patients with AD and 208 normal controls (NCs) who visited the Specialist Outpatient Department of the Prince of Wales Hospital of the Chinese University of Hong Kong from April 2013 to February 2018. The validation cohort (n = 64) included 38 NCs and 26 patients with AD with plasma p-tau181 levels ≤2.55 and > 2.55 pg/mL, respectively (the p-tau181 cutoff was based on our previous publication8). The validation cohort comprised 18 participants (14 patients with AD and 4 NCs) from the Specialist Outpatient Department of the Prince of Wales Hospital of the Chinese University of Hong Kong recruited from April 2013 to February 2018; 15 participants (12 patients with AD and 3 NCs) who visited Queen Elizabeth Hospital from February 2018 to March 2020; and 31 NCs who visited the Community CareAge Foundation or Haven of Hope Christian Service from October 2019 to January 2020 (Table S1).
All participants recruited from hospitals underwent medical history assessment, clinical assessment, and cognitive and functional assessment (i.e., the Montreal Cognitive Assessment [MoCA]).18 Participants with any psychiatric disorder or significant neurological disease other than AD were excluded. For participants recruited from the Specialist Outpatient Department of the Prince of Wales Hospital of the Chinese University of Hong Kong, the clinical diagnosis of AD was based on the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).19 Participants recruited from Queen Elizabeth Hospital also underwent neuroimaging assessment by MRI.20 For these participants, the clinical diagnosis of AD was based on the US National Institute on Aging and Alzheimer's Association (NIA-AA) workgroup 2011 revised criteria.21, 22 In addition, the participants recruited from the Community CareAge Foundation or Haven of Hope Christian Service (representing population-level NCs) underwent medical history assessment as well as cognitive and functional assessment with the MoCA.18
RESEARCH IN CONTEXT
- Systematic review: The authors reviewed the literature using traditional sources (e.g., PubMed). While the utility of blood transcriptome analysis for investigating disease mechanisms, diagnosis, and stratification has not been extensively researched, a few publications describe its applications in Alzheimer's disease. These relevant citations are cited appropriately.
- Interpretation: Our findings suggest that there are molecular changes in the blood of patients with Alzheimer's disease, including alterations to genes, gene modules, and biological pathways, particularly immune-associated pathways and certain blood cell types. In addition, the results demonstrate the potential use of blood transcriptomic data to classify and stratify patients with Alzheimer's disease.
- Future directions: This study presents evidence of systemic alterations to the molecular phenotypes of blood cells in Alzheimer's disease. It also suggests the potential utility of blood transcriptomic data for investigating the disease's pathophysiological mechanisms and developing tools for diagnosis and stratification. Future studies might focus on the specific involvement of different blood cell types in the pathogenesis and progression of Alzheimer's disease as well as the use of blood transcriptome analysis for diagnosis and stratification in diverse ethnic groups and investigating other diseases.
This study was approved by the Clinical Research & Ethics Committees of the Joint Chinese University of Hong Kong-New Territories East Cluster for the Prince of Wales Hospital (CREC Ref. No. 2015.461), the Kowloon Central Cluster/Kowloon East Cluster for Queen Elizabeth Hospital (KC/KE-15-0024/FR-3), the Ethics and Research Committee of Haven of Hope Christian Service, and the Human Participants Research Panel of The Hong Kong University of Science and Technology (CRP#180 & CRP#225). All participants provided written informed consent for both study participation and sample collection.
2.2 Blood transcriptome sequencing and analysis
RNA samples were extracted with PAXgene Blood RNA Tubes using PAXgene Blood miRNA Extraction kits (Qiagen, PreAnalytiX GmbH, Hilden, Germany). For each participant, 3 to 10 μg total RNA, quantified by a BioDrop DUO machine (BioDrop Ltd., UK), was treated with a GLOBINclear-Human Kit (Invitrogen, Waltham, MA, USA) to remove hemoglobin transcripts. Then, 2 to 5-μg cleaned RNA samples were subjected to RNA sequencing by Novogene (Beijing, China) using an Illumina NovaSeq 6000 platform. For each participant, 40 million 150-bp paired-end reads were produced.
The raw sequencing reads were first subjected to FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc/) to evaluate data quality and then to Trimmomatic23 to trim and filter low-quality reads. The cleaned reads were then mapped to the GRCh37 human reference genome (GRCh37.87 GTF file, sourced from ENSEMBL) using the STAR aligner (“spliced transcripts alignment to a reference”; version 2.5.3a)24 with the default settings. Gene and transcript abundance were then quantified by analyzing the mapped BAM files with Stringtie2 version 2.1.4.25 A total of 63,677 genes from ENSEMBL encompassing different categories were analyzed, including 22,810 protein-coding genes, pseudogenes, and long noncoding RNAs. After excluding genes with low expression (i.e., retaining only genes with at least one read count in each sample), 28,010 genes remained.
2.2.1 Differential expression analysis
For the 28,010 genes that passed the quality control and low-abundance filtering, the raw read counts were subjected to DESeq226 to determine the genes that were differentially expressed in patients with AD compared to NCs, with age, sex, and population structure (represented by the top five principal components from the genomic data) included as covariates. A cutoff (i.e., adjusted p < 0.1 calculated using the Benjamini and Hochberg method) was applied to identify the 12,837 differentially expressed genes.
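The Benjamini and Hochberg adjustment behind the cutoff above can be illustrated with a minimal Python sketch. This is not the authors' DESeq2 workflow; the function name `benjamini_hochberg` and the five p-values are hypothetical and serve only to show how adjusted p-values are obtained and thresholded at 0.1.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (not data from the study).
raw_p = [0.001, 0.04, 0.03, 0.2, 0.5]
adj_p = benjamini_hochberg(raw_p)

# Genes passing the adjusted p < 0.1 cutoff described in the text.
significant = [i for i, p in enumerate(adj_p) if p < 0.1]
```

In this toy input, the third gene inherits the adjusted value of the second because the procedure forces adjusted p-values to be non-decreasing in the rank of the raw p-values.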
2.2.2 Gene ontology enrichment analysis
The online tool PANTHER version 14.027 was utilized to perform Gene Ontology and pathway enrichment analyses on the differentially expressed genes (i.e., upregulated, downregulated, or from specific gene modules). Gene Ontology terms with a false discovery rate (FDR) < 0.05 were considered statistically significant.
2.2.3 Co-expression network analysis
WGCNA28 was applied to construct co-expression networks and identify genes that are co-regulated in human blood. Before analysis, the lm() function was applied to regress out the confounding effects of age and sex on the gene expression matrix. The resultant clean gene expression matrix (measured as transcripts per million [TPM]) was then subjected to WGCNA, with a signed similarity matrix (i.e., Pearson's correlation coefficients) generated to represent the strength of the correlations among the gene expression profiles. Gene modules were identified using the default settings, and those with at least 30 genes were retained for downstream analysis. To determine the correlations between each gene module and phenotype or other AD-associated endophenotypes (e.g., plasma biomarkers), the first principal component of the module's gene expression (termed the “module eigengene”) was used.
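To make the notion of a module eigengene concrete, the following Python sketch computes the first principal component of a small, invented expression matrix by power iteration. It is an illustration under stated assumptions rather than the WGCNA implementation: the function names, the iteration count, and the sign convention (orienting the eigengene to follow the mean standardized expression) are choices made here for clarity.

```python
import math


def standardize(values):
    """Center a gene's expression to mean 0 and scale to unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


def module_eigengene(expr, n_iter=200):
    """expr: one row per sample, one column per gene in the module.

    Returns one score per sample: the projection onto the first
    principal component of the standardized expression matrix.
    """
    n_samples, n_genes = len(expr), len(expr[0])
    cols = [standardize([row[g] for row in expr]) for g in range(n_genes)]
    # Gene-by-gene Pearson correlation matrix of the module.
    corr = [[sum(cols[a][s] * cols[b][s] for s in range(n_samples)) / (n_samples - 1)
             for b in range(n_genes)] for a in range(n_genes)]
    # Power iteration converges to the leading eigenvector (gene loadings).
    v = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(n_genes)) for a in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(cols[g][s] * v[g] for g in range(n_genes)) for s in range(n_samples)]
    # The sign of a principal component is arbitrary; orient it so that it
    # follows the mean standardized expression (an assumption of this sketch).
    mean_expr = [sum(cols[g][s] for g in range(n_genes)) / n_genes for s in range(n_samples)]
    if sum(a * b for a, b in zip(scores, mean_expr)) < 0:
        scores = [-x for x in scores]
    return scores


# Hypothetical TPM-like values: 4 samples x 3 co-expressed genes.
toy_module = [
    [1.0, 2.0, 1.5],
    [2.0, 2.9, 2.4],
    [3.0, 4.2, 3.6],
    [4.0, 5.1, 4.4],
]
eigengene = module_eigengene(toy_module)
```

Because the three toy genes rise together across samples, the eigengene orders the samples in the same way as each individual gene does.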
2.2.4 Disease classification analysis
The clean gene expression matrix (quantified in TPM) mentioned above was subjected to a generalized linear model using the glm function in the R Stats Package version 3.6.0. The trained models were then applied to both the discovery and validation cohorts, and the model output—termed the “module score”—was used to evaluate the accuracy of AD classification. As an indicator of model performance, the area under the receiver operating characteristics curve (auROC) was calculated using the roc function in the R pROC package.29
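As a worked illustration of the auROC metric, the sketch below computes it in plain Python as the probability that a randomly chosen patient receives a higher score than a randomly chosen control (the Mann-Whitney formulation, which equals the area under the empirical ROC curve). The study used the roc function in pROC; the function name `au_roc` and the scores here are hypothetical.

```python
def au_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for a case (e.g., AD), 0 for a control (e.g., NC).
    Ties between a case and a control count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical module scores for three cases and three controls.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = au_roc(toy_scores, toy_labels)
```

Here eight of the nine case-control pairs are ranked correctly, so the toy auROC is 8/9; a value of 0.5 would indicate no discrimination.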
2.2.5 Participant stratification analysis
The predicted score for each selected gene module—M01, M02, M05, M09, M13, and M15—was subjected to k-means clustering using the kmeans() function in the R stats package to determine the subgroups of participants in the discovery cohort. The optimal number of clusters was determined using the elbow method, which was implemented using the fviz_nbclust() function in the R factoextra package.30 Then, for uniform manifold approximation and projection (UMAP) analysis, the module score was subjected to the umap() function in the umap package in R to project individual participants into a two-dimensional plane for visualization.
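The elbow method can be sketched as follows: run k-means for several values of k and look for the point at which the within-cluster sum of squares stops dropping sharply. This Python example uses Lloyd's algorithm with a deterministic initialization, which differs from the defaults of R's kmeans(); the two-dimensional points stand in for the six-dimensional module scores and are invented for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    # Deterministic start: k points spread evenly through the input list.
    centers = [list(points[(i * len(points)) // k]) for i in range(k)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


# Hypothetical participants described by two scores, forming two groups.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
wss_by_k = {k: kmeans(toy_points, k)[1] for k in range(1, 5)}
labels_k2, _ = kmeans(toy_points, 2)
```

In this toy data the within-cluster sum of squares collapses between k = 1 and k = 2 and changes little afterwards, so the elbow sits at two clusters.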
2.2.6 Cell-type deconvolution analysis
The lm() function was applied to regress out the confounding effects of age and sex on gene expression before analysis. The clean gene expression matrix (measured in TPM) was subjected to CIBERSORTx software31 for cell-type deconvolution analysis using the LM22 signature matrix as a reference. One-tailed robust regression analysis was performed to examine the associations between gene modules (represented by predicted scores) and cell-type enrichment scores (obtained from CIBERSORTx) using data from the NCs in the discovery cohort. Association analysis was conducted to identify cell types that were altered in AD using multiple linear regression, taking all the analyzed cell types jointly and with age and sex included as covariates.
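Regressing out age and sex amounts to fitting an ordinary least-squares model of expression on the covariates and keeping the residuals. The sketch below does this in plain Python for a single gene by solving the normal equations; it is an analogue of the lm()-based step rather than the authors' code, and the helper names and the six invented participants are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residualize(y, covariates):
    """Return the residuals of y after an OLS fit on an intercept plus covariates."""
    design = [[1.0] + list(c) for c in covariates]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]


# Hypothetical participants: (age in years, sex coded 0/1) and one gene's TPM.
toy_covariates = [(62, 0), (70, 1), (75, 0), (81, 1), (68, 1), (79, 0)]
toy_expression = [8.1, 10.2, 9.6, 11.0, 9.7, 10.1]
clean_expression = residualize(toy_expression, toy_covariates)
```

By construction, the residuals sum to zero and are uncorrelated with each covariate, which is what makes them suitable as covariate-adjusted input for downstream analysis.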
2.2.7 Functions and tools for statistical analysis
Linear regression was performed using the lm() function in the R stats package, and robust regression analysis was performed using the lmrob() function in the R robustbase package.32
2.3 Calculation of population structure using whole-genome sequencing data
Whole-genome sequencing data for the studied participants were retrieved from previous studies.33, 34 In brief, whole-genome sequencing (5× coverage with 150-bp paired-end reads) was performed by Novogene on Illumina HiSeq X Ten and NovaSeq platforms (San Diego, CA, USA). The GotCloud pipeline35 was adopted to detect variants from our whole-genome sequencing data. Specifically, the sequencing data were subjected to FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for quality control and Trimmomatic36 to trim and filter low-quality reads. The clean data were mapped to the GRCh37 reference genome containing decoy fragments using BWA-mem. The data were then subjected to the GotCloud pipeline for data processing and variant detection using the default settings.37 The clean genotype files were subsequently subjected to Beagle38 and Thunder39 for genotyping refinement. After passing the GATK Variant Quality Score Recalibration and minor allele frequency filter (> 5%), the common variants underwent linkage disequilibrium pruning using PLINK software (version 1.9). The linkage disequilibrium-pruned variants were subsequently utilized for principal component analysis with PLINK to identify population structure.
2.4 Measurement of plasma protein biomarker levels
Data for plasma biomarkers, including the Aβ42/40 ratio and total tau, p-tau181, and NfL levels, were obtained from a previous publication.40 Among the 422 participants in the discovery cohort, 184 including 100 patients with AD and 84 NCs had available plasma biomarker data. Plasma biomarker data were available for all 64 participants in the validation cohort.
For brain imaging analysis, T1-weighted magnetization-prepared rapid gradient-echo (MPRAGE) and fluid-attenuated inversion recovery (FLAIR) sequences were retrieved for 54 patients with AD and 85 NCs from the Prince of Wales Hospital in the discovery cohort. The raw imaging files were deidentified and sent to BrainNow Medical Technology (HKSAR, China) to analyze the volumes of different brain regions and white matter hyperintensity levels.
2.5 Single-cell analysis
2.5.1 Study participants
Five additional elderly Hong Kong Chinese individuals (all females aged 69 to 83 years) were recruited for the blood single-cell RNA sequencing analysis. Three patients with AD or mild cognitive impairment were recruited from the Specialist Outpatient Department of the Prince of Wales Hospital at the Chinese University of Hong Kong, and two individuals without a history of major disease were recruited from the Prince of Wales Hospital. Patients were diagnosed with AD or mild cognitive impairment according to the DSM-5 criteria.19 This study was approved by the Clinical Research & Ethics Committees of Joint Chinese University of Hong Kong-New Territories East Cluster for the Prince of Wales Hospital (CREC Ref. No. 2015.461) and the Human Participants Research Panel of the Hong Kong University of Science and Technology (CRP#180 and CRP#225). All participants provided written informed consent for both study participation and sample collection.
2.5.2 Blood collection and leukocyte isolation
Whole-blood samples (2 mL) were collected from individuals using K3EDTA tubes (VACUETTE; Greiner Bio-One). To lyse red blood cells, the blood samples were treated with 40 mL ACK Lysing Buffer (A1049201, Thermo Fisher) for 5 min at room temperature. The remaining cells were then washed with 10 mL phosphate buffered saline (PBS) and pelleted by centrifugation at 300 × g for 5 min at room temperature. For a second round of red blood cell lysis, the cell pellets were then resuspended in 15 mL ACK Lysing Buffer (A1049201) and incubated for 5 min at room temperature. After a second washing step with PBS and centrifugation, the cell pellets were resuspended in 2 mL PBS with 0.04% bovine serum albumin (BSA) and filtered with a 40-μm cell strainer (352340, Corning). The cell suspensions were then assessed for cell viability and counted under a microscope. The concentration was adjusted to 1000 cells per μL with PBS and 0.04% BSA.
2.5.3 Single-cell RNA sequencing library construction and sequencing
We prepared scRNA-seq libraries using a Chromium Next GEM Single Cell 3′ Library & Gel Bead Kit v3.1 (1000121; 10x Genomics) according to the manufacturer's protocol. In brief, 10 μL cell suspension (10,000 cells) was mixed with reverse-transcription reagents and loaded into a chip to partition single cells into droplets. The droplets were incubated on a thermocycler to generate barcoded cDNA from polyadenylated mRNA by reverse-transcription. Libraries were constructed from cDNA according to the manufacturer's instructions. The library concentration was measured using Qubit (Thermo Fisher Scientific), and fragment lengths were measured using a Fragment Analyzer (Advanced Analytical Technologies). The libraries were then sent to Novogene for sequencing on a NovaSeq 6000 system, which generated 150-bp paired-end reads for downstream analysis.
2.5.4 Single-cell RNA sequencing preprocessing and quality control
Demultiplexed FASTQ files were aligned to the hg19 reference genome using CellRanger version 6.1.2 to obtain gene counts. Single cells with ≥200 unique molecular identifiers and genes detected in ≥5 cells were selected for analysis. For quality control, cell-free mRNA contamination was estimated and removed using the SoupX package.41 Potential dead cells (i.e., ≥ 10% mitochondrial genes or ≤200 features) and multiplets (i.e., aggregation of two or more cells into single droplets; ≥5000 features) were excluded using Seurat version 4.0.6.42 After filtering, 19,958 cells remained in the dataset.
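The per-cell thresholds above can be expressed as a simple filter. The Python sketch below is only an analogue of the Seurat-based step: the function `classify_cell`, the field names, and the four example cells are hypothetical, and "≥ 10% mitochondrial genes" is interpreted here as the percentage of a cell's counts that map to mitochondrial genes.

```python
def classify_cell(n_features, pct_mito):
    """Apply the quality-control thresholds described in the text.

    n_features: number of genes detected in the cell.
    pct_mito: percentage of counts from mitochondrial genes (assumed meaning).
    """
    if pct_mito >= 10.0 or n_features <= 200:
        return "dead"        # potential dead or low-quality cell
    if n_features >= 5000:
        return "multiplet"   # potential aggregation of two or more cells
    return "keep"


# Hypothetical cells (not data from the study).
toy_cells = [
    {"barcode": "cell_a", "n_features": 1500, "pct_mito": 3.2},
    {"barcode": "cell_b", "n_features": 150, "pct_mito": 2.0},
    {"barcode": "cell_c", "n_features": 2500, "pct_mito": 14.5},
    {"barcode": "cell_d", "n_features": 6200, "pct_mito": 4.1},
]
kept_barcodes = [c["barcode"] for c in toy_cells
                 if classify_cell(c["n_features"], c["pct_mito"]) == "keep"]
```

Only the first toy cell passes all three thresholds; the others are flagged for too few features, a high mitochondrial fraction, and too many features, respectively.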
2.5.5 Single-cell RNA sequencing integration, cell clustering, and cell-type annotation
The gene count matrices of each sample were normalized and integrated with the reciprocal principal component analysis and SCTransform-based workflows in Seurat. In brief, variable features were identified and normalized using the SCTransform function. The first 30 principal components from reciprocal principal component analysis were selected to identify integration anchors between samples using the FindIntegrationAnchors function. The samples were then integrated into a single Seurat object using the anchors via the IntegrateData function. We performed principal component analysis on the integrated data and selected the first 30 principal components for cluster identification using a k-nearest neighbor algorithm. We identified the differentially expressed genes of each cluster using the Wilcoxon rank-sum test with the FindAllMarkers function. Only genes with a log2(fold-change) > 0.25 that were expressed in ≥ 10% of cells in the cluster were analyzed. Known cell-type marker genes among the differentially expressed genes were used to assign clusters to 13 cell types based on the literature.43, 44
2.6 Data visualization
A heatmap of the differentially expressed genes was generated using the heatmap.2() function in the gplots package in R. The heatmap of the co-regulation of gene expression was generated using the TOMplot function in the WGCNA package in R. Bar charts, heatmaps of the association results between gene modules and disease-associated endophenotypes, and receiver operating characteristic curves were generated using GraphPad Prism version 8.0.2. Radar plots were generated using the ggradar() function in the R ggradar package, and ridgeline plots were generated using the geom_ridgeline() function in the R ggridges package.
3 RESULTS
3.1 Blood transcriptome analysis reveals dysregulated genes and pathways in patients with AD
To understand the molecular changes in the blood cells of patients with AD, we performed bulk RNA-seq on blood cells from an AD cohort recruited in Hong Kong,45, 46 comprising 422 participants (patients with AD: n = 214, normal controls [NCs]: n = 208), hereafter referred to as the “discovery cohort” (Table S1; see Figure 1 for study design). Based on these RNA-seq data, we first identified genes that were dysregulated in the blood in AD by using a generalized linear model with age, sex, and population structure (i.e., the top five principal components estimated from genomic data) as covariates. Out of 28,010 genes detected in the blood, 12,837 were differentially expressed between the patients with AD and NCs, including 7736 upregulated and 5101 downregulated genes (adjusted p < 0.1; Figure 2A, Table S2). We subsequently conducted Gene Ontology enrichment analysis to examine the biological processes associated with those differentially expressed genes. The upregulated genes are involved in the metabolic process (false discovery rate [FDR] = 1.22 × 10^−16), translation (FDR = 3.99 × 10^−10), and oxidative phosphorylation (FDR = 5.31 × 10^−4; Figure 2B, Table S3), whereas the downregulated genes are involved in the cell cycle (FDR = 1.77 × 10^−17), histone modification (FDR = 3.18 × 10^−8), and immune-associated pathways such as autophagy (FDR = 4.59 × 10^−5) and lymphocyte activation (FDR = 6.20 × 10^−4; Figure 2B, Table S4). Hence, these results indicate that specific genes and pathways are altered in the blood cells of patients with AD.
To validate the observed gene expression changes in AD blood, we repeated the RNA-seq and differential expression analyses on blood cells from an independent cohort (n = 64; comprising 26 p-tau–positive patients with AD and 38 p-tau–negative NCs), hereafter referred to as the “validation cohort,” with a stricter diagnosis based on the examination of plasma p-tau181 levels. The changes in transcript levels of those differentially expressed genes (i.e., the fold-change between the AD and NC groups) were strongly correlated between the discovery and validation cohorts (R^2 = 0.89, p < 2.2 × 10^−16). This finding validates the observed molecular changes in the blood cells of patients with AD (Figure 2C, Table S5). Furthermore, in both the discovery and validation cohorts, we examined the expression changes of key AD risk genes implicated in a genome-wide association study (GWAS).47 Many of these key AD GWAS risk genes, such as apolipoprotein E (APOE), ABCA7, and CR1, which are associated with inflammation and immune function, were differentially expressed at the gene level in the blood of patients with AD in the discovery cohort, with a consistent trend of dysregulation in the validation cohort (Figure 2D). Indeed, single-cell RNA-seq studies have provided compelling evidence suggesting that APOE is expressed in specific blood cell types, particularly in human macrophages and mouse leukocytes.48, 49 We extended the analysis to all genes that resided in AD GWAS risk loci based on the annotations from Kunkle et al.50 Notably, for specific AD risk loci, multiple genes are significantly dysregulated in both the discovery and validation cohorts (e.g., ABCA7, CR1, HLA-DRB1, MS4A2, MYAP1, and SPI1; Figure S1).50 These results are consistent with the above Gene Ontology analysis, which highlights the dysregulation of immune-associated pathways in AD, further indicating that the molecular changes in blood cells are associated with AD pathogenesis.
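The cross-cohort concordance of fold-changes reported in this subsection can be illustrated by correlating per-gene log2 fold-changes from two cohorts. The Python sketch below assumes that the reported R2 is the squared Pearson correlation coefficient, which the text does not state explicitly; the function name and the four fold-change pairs are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 fold-changes (AD vs. NC) for four genes in each cohort.
discovery_lfc = [1.0, 2.0, 3.0, 4.0]
validation_lfc = [1.1, 1.9, 3.2, 3.8]

r = pearson_r(discovery_lfc, validation_lfc)
r_squared = r ** 2  # share of variance in one cohort explained by the other
```

A value of R^2 near 1 means that genes shifted in one cohort are shifted in the same direction and by a similar magnitude in the other.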
3.2 Gene modules in blood cells are involved in diverse biological processes in AD
Given that the above-mentioned differentially expressed genes are involved in diverse biological pathways, we sought to identify gene modules that capture the key biological processes in AD. As genes related to a given biological process likely have overlapping regulatory mechanisms and similar expression patterns, we performed WGCNA to identify modules of co-expressed genes that are altered in the blood from patients with AD and whose expression levels are highly correlated. In brief, we constructed a co-expression network from the 12,837 differentially expressed genes in the blood samples from the discovery cohort and identified 15 AD-associated gene modules, which we designated M01–M15 (Figure 3A, B; Tables S6-19). Interestingly, Gene Ontology analysis revealed that many of these gene modules are involved in immune-associated pathways. For example, M07 is involved in B cell activation (FDR = 1.04 × 10^−5), M09 in adaptive immune response (FDR = 3.96 × 10^−11), M11 in innate immune response (FDR = 5.82 × 10^−24), and M15 in myeloid leukocyte activation (FDR = 1.53 × 10^−5). These results further corroborate the dysregulation of immune pathways in blood cells in AD and suggest the parallel involvement of diverse immune-associated pathways during AD pathogenesis and progression. In addition, M06 is associated with synaptic transmission (FDR = 3.42 × 10^−2; Table S8), suggesting a link between the molecular changes in blood cells and the brain during AD pathogenesis and progression.
3.3 Associations between gene modules and AD
We subsequently evaluated whether these gene modules contribute equally to AD. Accordingly, we used lasso logistic regression to condense the gene expression data from each module into a single numeric score, termed the “module score,” that reflects the relative overall expression changes of genes within the module. We trained the model using data from the discovery cohort and determined whether it could distinguish patients with AD from NCs in both cohorts. We then evaluated the ability of the identified gene modules to identify AD by calculating the auROCs in both cohorts (as a measure of the strength of the association between the identified gene modules and AD). Interestingly, all 15 AD-associated gene modules were associated with AD (auROC > 0.7), and 6 modules (M01, M02, M05, M09, M13, and M15) exhibited consistently strong associations with AD (auROC > 0.9) in both cohorts (Figure 4A; Figures S2, S3; Table S20). Among these six modules, M09 and M15 are related to immune functions: adaptive immune response and myeloid leukocyte activation, respectively. These results indicate that several distinct biological processes, especially those related to immune cells or pathways, are dysregulated in blood cells in AD, highlighting the roles of these processes in AD.
3.4 Associations between gene modules and plasma ATN biomarkers
During AD progression, the plasma ATN biomarkers are correlated with key pathological hallmarks in the brain.51, 52 Given that the above-mentioned gene modules can identify AD based on the blood cell transcriptome, we subsequently investigated if these modules are associated with pathological changes of AD. Accordingly, we evaluated the plasma ATN biomarkers using the ultrasensitive SIMOA (“single-molecule array”) platform and determined their correlations with the module scores of each gene module. Interestingly, in the discovery cohort, only 9 of the 15 gene modules were significantly correlated with the plasma ATN biomarkers (p < 0.05; Figure S4). Among the six modules that were closely associated with AD (i.e., M01, M02, M05, M09, M13, and M15), only M02, M13, and M15 were correlated with the plasma ATN biomarkers (“ATN modules” hereafter; Figure 4B). We further compared the auROCs for AD classification between the three ATN modules and the three modules that were not correlated with the plasma ATN biomarkers (i.e., M01, M05, and M09; “non-ATN modules” hereafter). Interestingly, there was no significant difference between the auROCs of the ATN and non-ATN modules (Figure 4C). Hence, our results suggest that there are specific AD-associated molecular changes that occur in blood cells but are not correlated with the plasma ATN biomarkers. Accordingly, these findings indicate that blood transcriptome analysis could supplement the ATN biomarkers for the diagnosis of AD. Of note, these non-ATN modules might still be associated with brain functions.
Next, to examine whether the six modules that are closely associated with AD are associated with the brain structural and cognitive changes in AD, we determined the correlations between the module scores and cognitive performance measured by the MoCA as well as brain volume (i.e., the amygdala and hippocampus) measured by MRI among participants from the discovery cohort. Interestingly, both the ATN and non-ATN modules were correlated with cognitive performance as well as volumetric changes in the amygdala and hippocampus in all participants (Figures S5, S6). This suggests that the molecular changes in blood cells captured by transcriptome analysis are correlated with changes in cognitive function and brain structure, which might provide additional information for determining disease status beyond the existing ATN biomarkers.
3.5 Accuracy of the identified gene modules for AD classification
We subsequently investigated whether blood transcriptomic data can assist AD classification. Accordingly, we trained logistic regression models that take as input the module scores from the six modules closely associated with AD (M01, M02, M05, M09, M13, and M15) and output a single numeric score representing an individual's disease status. We again calculated the auROCs to evaluate the performance of the AD classification models. The models based on the ATN modules (i.e., M02, M13, and M15) were highly accurate in both the discovery (auROC = 0.990) and validation cohorts (auROC = 0.897), as were those based on the non-ATN modules (i.e., M01, M05, and M09) (auROC = 0.983 and 0.927 in the discovery and validation cohorts, respectively). Furthermore, when combining all six modules, the classification model exhibited similarly high accuracy (auROC = 0.998 and 0.920 in the discovery and validation cohorts, respectively; Figure 4D, E). Thus, our results indicate that blood transcriptomic data can be used to classify AD with relatively high accuracy. We then compared the accuracy of AD classification between the plasma ATN biomarkers and the blood gene modules. In the discovery cohort, the models based on the blood gene modules (auROC = 0.98–1.00) classified AD more accurately than those based on the plasma ATN biomarkers (auROC = 0.70–0.82; Figure 4D, Table S21). We obtained similar results in the validation cohort: the models developed using the blood gene modules performed better (auROC = 0.90–0.92) than that based on plasma Aβ42/40 ratio (auROC = 0.76) and comparably to that based on plasma NfL level (auROC = 0.95; Figure 4E, Table S21; data for p-tau181 are not presented, because plasma p-tau181 was used as a selection criterion for the validation cohort). Taken together, these results suggest that blood transcriptome analysis can accurately distinguish patients with AD from NCs.
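As an illustration of the classification step, the following is a minimal sketch of a logistic regression model that maps module scores to a single disease score, evaluated by auROC on a training set and a held-out set. It is not the pipeline used in this study: the gradient-descent fitting routine, the function names, and the two-module toy data are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_scores(X, w, b):
    """Single numeric score per individual (predicted probability of AD)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical module scores: column 0 is higher in AD, column 1 is lower in AD.
train_X = [[1.0, -0.8], [0.7, -0.5], [0.9, -1.1], [0.4, -0.3],
           [-0.6, 0.9], [-0.9, 0.4], [-0.2, 0.7], [-0.5, 1.0]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = AD, 0 = NC
valid_X = [[0.8, -0.6], [0.3, -0.2], [-0.4, 0.5], [-0.7, 0.8]]
valid_y = [1, 1, 0, 0]

w, b = fit_logistic(train_X, train_y)
train_auc = auroc(train_y, predict_scores(train_X, w, b))
valid_auc = auroc(valid_y, predict_scores(valid_X, w, b))
print(f"auROC: training = {train_auc:.3f}, held-out = {valid_auc:.3f}")
```

The toy data are perfectly separable, so both auROCs are trivially high. The point of the sketch is the workflow: fit on one cohort, then score and evaluate a separate cohort without refitting.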
3.6 Stratification of participants based on blood transcriptomic data
Given the progressive nature and complex multifactorial etiology of AD, accurate staging is crucial for effective disease management. Despite recent studies suggesting the existence of multiple AD subtypes, there is limited information about the molecular mechanisms associated with these subtypes or stages. Of note, the AD-associated blood gene modules identified herein involve distinct biological processes that may be altered along with disease progression. Hence, the dysregulation of these gene modules may represent individuals’ specific status through AD progression.
Accordingly, to determine if the identified gene modules from blood cells can be used to stratify participants into subgroups, we performed unsupervised clustering analysis using the module scores from M01, M02, M05, M09, M13, and M15. The participants in the discovery cohort clustered into five subgroups, designated C1–C5 (Figure 5A), with increasing proportions of patients with AD across the subgroups, suggesting that the stratification is associated with disease progression (Figure S7). Therefore, we investigated whether individuals from different subgroups had different disease statuses by comparing the AD endophenotypes among subgroups. Notably, there were significant differences in plasma ATN biomarker levels (Figure 5B), cognitive performance (Figure S8a), and the volumes of the amygdala (Figure S8b) and hippocampus (Figure S8c) among subgroups (p < 0.05). Thus, besides classifying AD, blood transcriptomic data can also stratify individuals into subgroups corresponding to different disease stages or states.
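The stratification idea can be sketched as follows. This section does not name the clustering algorithm, so k-means (Lloyd's algorithm) is used here purely as a stand-in; the deterministic initialization, the function names, and the two-dimensional toy scores are assumptions for illustration only.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def ad_fraction_by_cluster(labels, is_ad):
    """Proportion of AD cases within each cluster."""
    totals, cases = {}, {}
    for lab, ad in zip(labels, is_ad):
        totals[lab] = totals.get(lab, 0) + 1
        cases[lab] = cases.get(lab, 0) + ad
    return {lab: cases[lab] / totals[lab] for lab in totals}

# Hypothetical module scores for nine participants (two modules each).
scores = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
          (1.0, 1.1), (1.1, 1.0), (0.95, 1.05),
          (2.0, -1.0), (2.1, -0.9), (1.9, -1.1)]
is_ad = [0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = AD, 0 = NC (toy labels)

labels, centers = kmeans(scores, k=3)
fractions = ad_fraction_by_cluster(labels, is_ad)
for cluster in sorted(fractions, key=fractions.get):
    print(f"cluster {cluster}: AD fraction = {fractions[cluster]:.2f}")
```

Ordering the clusters by their AD fraction mirrors the way subgroups can be arranged along a putative disease-progression axis. Diagnosis labels are used only after clustering, to interpret the clusters, not to form them.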
Of note, the module score used for stratification analysis was calculated based on the expression levels of genes within each module, which are associated with specific biological processes (Figure 3). Therefore, alterations to these biological processes can be represented by changes in the associated module score. Accordingly, to investigate whether specific biological processes associated with the six gene modules are altered among the identified subgroups, we compared the module scores from these modules among subgroups C1–C5. Our analysis revealed significant alterations in the gene module scores among these subgroups. For instance, the module scores of M01, whose genes are involved in cellular component organization, were significantly higher in C2–C5 than in C1 (p < 1 × 10⁻³; Figure 5C, D). Meanwhile, the module scores of M15, whose genes are involved in myeloid leukocyte activation, were significantly lower in C2–C5 than in C1 (p < 1 × 10⁻³; Figure 5C, D). These results indicate that blood transcriptome analysis can further elucidate the molecular changes that occur during AD progression, providing insights into possible pathophysiological mechanisms associated with AD pathogenesis and progression.
3.7 Changes in blood cell subtypes are implicated in AD
The transcriptomic changes described herein are likely due to molecular changes occurring within individual blood cell subtypes. Therefore, we identified which gene modules are associated with the roles of specific blood cell types (Figure S9a). Accordingly, to examine how the molecular changes in individual blood cell subtypes are associated with AD or the identified gene modules, we conducted cell-type deconvolution analysis using the blood transcriptomic data from the discovery cohort. The enrichment scores estimated from the deconvolution analysis represent the contribution of each blood cell subtype to the transcriptomic data. We subsequently performed multiple regression analysis to determine the associations of the enrichment scores with the module scores and AD phenotypes. Notably, the key ATN and non-ATN gene modules were significantly associated with the expression profiles of various blood cell types (Figure 6A, Figure S9a, Table S22). In particular, some associations were closely aligned with the annotated biological functions of the individual gene modules. For instance, M09 is involved in adaptive immune response and associated with plasma cells (p = 6.82 × 10⁻¹³⁰; Figure 6A, Table S22), whereas M15 is involved in myeloid leukocyte activation and associated with neutrophils (p = 8.13 × 10⁻⁷; Figure 6A, Table S22). Of note, among the analyzed cell types, neutrophils were most closely associated with AD (p = 1.09 × 10⁻²²; Figure 6B, Figure S9b, Table S23). Therefore, the dysregulation of genes in M15 is likely linked to molecular changes in neutrophils during AD pathogenesis and progression.
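The multiple regression step can be sketched as an ordinary least-squares fit of a module score on several cell-type enrichment scores. This is a simplified illustration: the covariates, the exact model specification, and the significance testing used in the study are not described in this section, and the function names and toy values below are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bd] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    d = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(d)]
    return solve_linear(XtX, Xty)

# Hypothetical enrichment scores for two cell types (e.g., neutrophils, plasma cells).
enrichment = [(0.1, 0.5), (0.4, 0.2), (0.3, 0.3),
              (0.8, 0.1), (0.6, 0.4), (0.2, 0.7)]
# Toy module score built as an exact linear function, so the fit is recoverable.
module_score = [1.0 + 2.0 * a - 3.0 * b for a, b in enrichment]

coef = ols(enrichment, module_score)
print("intercept, cell type 1, cell type 2:", [round(c, 3) for c in coef])
```

Each fitted coefficient estimates how the module score changes with one cell type's enrichment score while holding the others fixed, which is why a multiple regression is preferable to separate pairwise correlations when cell-type proportions covary.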
To further examine the relationship between M15 and neutrophils, we performed single-cell RNA-seq analysis of the blood cells from three patients with AD and two NCs. We estimated the module scores of M15 for individual cell types and then visualized the distribution of M15 module scores across different cell types. The module score of M15 was significantly higher for neutrophils than for other cell types (Figure 6C, D), which is consistent with the correlation between M15 module scores and neutrophil enrichment scores in our cell deconvolution analysis (Figure 6A, Table S22). Thus, these results collectively suggest that the dysregulation of M15 in AD is predominantly driven by neutrophils.
To better understand changes in blood cell types during AD progression or in different disease stages, we compared the cell-type enrichment scores across the subgroups classified according to gene modules from the blood transcriptomic data. We observed significant alterations in key blood cell subtypes among these subgroups. For instance, the enrichment scores of M0 macrophages and resting mast cells were significantly higher in C3–C5 than in C1 (p < 1 × 10⁻³; Figure 6E, F). Meanwhile, the enrichment scores of neutrophils were significantly lower in C4 and C5 than in C1 (p < 1 × 10⁻²; Figure 6E, F). These results show that, besides diagnosis and patient stratification, blood transcriptome profiling can reveal molecular changes in specific blood cell types that may play critical roles in AD pathogenesis and progression.
4 DISCUSSION
Recent advancements in “omics” research have revolutionized the prediction of AD risk and diagnosis.53-56 Notably, transcriptome analysis is now widely used to accurately quantify the levels of all genes expressed in tissues of interest and is extensively applied in disease studies. Therefore, blood transcriptome analysis is expected to comprehensively profile disease-associated molecular changes in blood cells during AD pathogenesis and progression. Accordingly, as one of the first comprehensive analyses of the AD blood transcriptome, we assayed the blood transcriptome of individuals from well-defined AD cohorts with available data on plasma ATN biomarkers, brain volumetric data, and cognitive function to identify molecular signatures in blood cells that are associated with AD (i.e., dysregulated genes, modules, and biological pathways). In particular, we identified specific immune pathways and immune-associated blood cell types that are altered in the blood of patients with AD. We verified some of these alterations using single-cell transcriptomic data in independent participants. Of note, the gene modules identified from blood transcriptomes also classified patients with AD with accuracy comparable to that with the plasma ATN biomarkers. Furthermore, these gene modules could also stratify participants into distinct subgroups corresponding to different disease stages or states. Thus, our findings collectively suggest that there are changes in the molecular phenotypes of blood cells during the onset and development of AD, highlighting the potential utility of blood transcriptomic data for both mechanistic and drug development studies in AD as well as patient classification and stratification.
Importantly, our findings support the potential of using blood transcriptome analysis to identify presymptomatic AD patients who already show molecular changes. Specifically, blood transcriptome analysis stratified AD patients and NCs into five subgroups, of which C1 consists entirely of undemented participants and C2 primarily consists of undemented individuals (87 out of 94; see Figure S7). Of note, compared to individuals in C1, individuals in C2 have a lower plasma Aβ42/40 ratio but no obvious differences in cognitive performance or in the volume of the hippocampus or amygdala. This suggests that, despite being classified as NCs according to other analyses, individuals in C2 may already exhibit amyloid pathology (Figure 5 and Figure S8). Therefore, analyzing the blood transcriptome could also aid in the early detection of individuals at risk of developing AD before symptoms manifest.
Numerous studies demonstrate alterations in the blood transcriptome of AD patients, although the overlap in the dysregulated genes among different published studies is small.57-60 This discrepancy may be attributable to the small sample sizes of those studies (Figure S10 and Table S24). Nonetheless, it is worth highlighting that approximately half of the dysregulated genes reported in those studies were differentially regulated in AD patients in the present study. Thus, comprehensive AD-associated blood transcriptome profiling can be used to detect biomarkers of AD and provide insights into the pathological mechanisms of the disease.
Recent genetic and functional studies suggest the involvement of immune-associated signaling pathways, including the peripheral immune system,61, 62 in the pathogenesis of AD.37, 61-63 Concordantly, our blood transcriptome and gene ontology analyses demonstrate the potential involvement of immune-associated pathways (i.e., innate and adaptive immune-associated pathways) and specific immune cell types (i.e., B cells, neutrophils, and leukocytes) in AD. Given that several studies suggest that neutrophils are involved in AD,64-69 we further examined the genes associated with myeloid leukocyte activation from module M15, which is associated with neutrophils. Among these genes, we identified LYN, a member of the Src family of nonreceptor protein tyrosine kinases that is involved in essential neutrophil functions such as phagocytosis, chemotaxis, and respiratory burst response (Figure S11a). LYN is mainly expressed in blood cells (Figure S11b), and our previous high-throughput proteomic analysis highlights LYN as one of the most differentially regulated proteins in AD;46 both its blood transcript and plasma protein levels are consistently lower in patients with AD than in NCs. Our analysis of single-cell blood data confirms the high expression of LYN in neutrophils (Figure S11c). Moreover, the fraction of LYN-positive neutrophils was significantly lower in individuals with high p-tau181 levels than in those with low levels (Figure S11d). Meanwhile, although LYN plasma protein and blood transcript levels were both lower in patients with AD, the two measures were correlated only in NCs and not in patients with AD (Figures S11e–g). These observations again highlight the involvement of the peripheral immune system, including neutrophil functioning, in AD pathogenesis and progression.
Notably, a phase 3 clinical trial of masitinib, a tyrosine kinase inhibitor that blocks LYN activity, was recently reported to be successful.70, 71 This supports the feasibility of targeting LYN or its associated pathways as an intervention strategy for AD.72-74
In summary, our comprehensive analysis of the blood transcriptome in patients with AD reveals key molecular phenotypes, including genes, modules, pathways, and subtypes of blood cells, that may be closely associated with AD pathogenesis and progression. Our findings also suggest the potential utility of blood transcriptome analysis for disease classification and stratification. Given the ease of blood sampling, blood transcriptome analysis could provide insights into human diseases, aiding the development of technologies for disease diagnosis, monitoring, and patient stratification. This could ultimately facilitate both early intervention and precision medicine for AD and other human diseases.75
AUTHOR CONTRIBUTIONS
Huan Zhong, Xiaopu Zhou, Amy K. Y. Fu, and Nancy Y. Ip conceived of the project; Andrew Lung Tat Chan, Timothy C. Y. Kwok, Vincent C. T. Mok, and Fanny C. F. Ip organized patient recruitment and sample collection; Ronnie Ming Nok Lo, Bonnie Wing Yan Wong, and Elaine Yee Ling Cheng organized the clinical data and prepared the samples for sequencing analysis; Huan Zhong and Xiaopu Zhou set up the data-processing pipelines; Tiffany T. W. Mak coordinated the data organization; Huan Zhong, Xiaopu Zhou, Hyebin Uhm, Yuanbing Jiang, Han Cao, Yu Chen, Kin Y. Mok, John Hardy, Amy K. Y. Fu, and Nancy Y. Ip analyzed the data; and Huan Zhong, Xiaopu Zhou, Amy K. Y. Fu, and Nancy Y. Ip wrote the manuscript with the input from all the co-authors.
ACKNOWLEDGMENTS
The authors thank Pauline Kwan, Hazel Mok, Dr Phillip Y.C. Chan, Choi Ying Ling, and Estella P S Tong for coordinating the collection of clinical data. The authors also thank Ka Chun Lok and Cara Wing Si Kwong for their excellent technical assistance as well as other members of the Ip Laboratory for many helpful discussions. John Hardy has served as a consultant for Eli Lilly and Eisai. This work was supported in part by the National Key R&D Program of China (2021YFE0203000); the Research Grants Council of Hong Kong (the Collaborative Research Fund [C6027-19GF], the Theme-Based Research Scheme [T13-605/18 W], and the General Research Fund [HKUST16103122]); the Areas of Excellence Scheme of the University Grants Committee (AoE/M-604/16); the Innovation and Technology Commission (InnoHK, ITCPD/17-9, ITS/207/18FP, MRP/042/18X and MRP/097/20X); the Guangdong Provincial Key S&T Program Grant (2018B030336001); the Guangdong Provincial Fund for Basic and Applied Basic Research (2019B1515130004), the NSFC-RGC Joint Research Scheme (32061160472), the Shenzhen Knowledge Innovation Program Grants (JCYJ20180507183642005, JCYJ20200109115631248, and ZDSYS20200828154800001); the Guangdong–Hong Kong–Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence Fund (2019001 and 2019003); and the Fundamental Research Program of Shenzhen Virtual University Park (2021Szvup137).
CONFLICT OF INTEREST STATEMENT
Y.J., F.C.I., A.K.F., and N.Y.I. are inventors of the protein biomarker-related technology licensed to Cognitact. Y.J. and F.C.I. are co-founders of Cognitact. The remaining authors declare no competing interests. Author disclosures are available in the supporting information.
CONSENT STATEMENT
This study was approved by the Clinical Research & Ethics Committees of the Joint Chinese University of Hong Kong-New Territories East Cluster for the Prince of Wales Hospital (CREC Ref. No. 2015.461), the Kowloon Central Cluster/Kowloon East Cluster for Queen Elizabeth Hospital (KC/KE-15-0024/FR-3), Ethics and Research Committee of Haven of Hope Christian Service and the Human Participants Research Panel of The Hong Kong University of Science and Technology (CRP#180 & CRP#225). All participants provided written informed consent for both study participation and sample collection.
CODE AVAILABILITY
The code used in this manuscript is available upon request.
Open Research
DATA AVAILABILITY STATEMENT
The data and results of the genetic and Alzheimer's disease-associated endophenotypic analysis are provided in the Supplementary Information. For raw data, the consent form signed by each participant states that the research content will remain private under the supervision of the hospital and research team. Therefore, these data will be made available and shared only in the context of a formal collaboration; applications for data sharing and project collaboration will be processed and reviewed by a Review Panel hosted at the Hong Kong University of Science and Technology. Researchers may contact [email protected] for further details on project collaboration and the sharing of the data from this study.