Transcriptomic predictors of rapid progression from mild cognitive impairment to Alzheimer's disease

Abstract

Background

Effective treatment for Alzheimer’s disease (AD) remains an unmet need. Thus, identifying patients with mild cognitive impairment (MCI) who are at high risk of progressing to AD is crucial for early intervention.

Methods

Blood-based transcriptomics analyses were performed using a longitudinal study cohort to compare progressive MCI (P-MCI, n = 28), stable MCI (S-MCI, n = 39), and AD patients (n = 49). Statistical DESeq2 analysis and machine learning methods were employed to identify differentially expressed genes (DEGs) and develop prediction models.

Results

We discovered a remarkable gender-specific difference in DEGs that distinguish P-MCI from S-MCI. Machine learning models achieved high accuracy in distinguishing P-MCI from S-MCI (AUC 0.93), AD from S-MCI (AUC 0.94), and AD from P-MCI (AUC 0.92). An 8-gene signature was identified for distinguishing P-MCI from S-MCI.

Conclusions

Blood-based transcriptomic biomarker signatures show great utility in identifying high-risk MCI patients, with mitochondrial processes emerging as a crucial contributor to AD progression.

Background

Alzheimer’s disease (AD) is a prevalent age-related neurodegenerative disease. It is responsible for 60% to 70% of age-related neurodegenerative disease [1]. In addition, it has been projected that over 100 million people worldwide will have been diagnosed with AD by 2050 [2]. Given that an effective treatment for AD is still lacking, intervention at the disease's early stage, namely mild cognitive impairment (MCI), has become a global priority. However, early diagnosis of AD by brain imaging or by β-amyloid (Aβ) measurement in cerebrospinal fluid [3] is not suitable for population screening because of the invasive nature and relatively high cost of these methods. Importantly, many longitudinal studies have revealed substantial heterogeneity among MCI patients. Some patients with MCI advance rapidly to AD, while others remain stable, sometimes for more than 10 years; interestingly, some patients even appear to revert to a cognitively unimpaired state [4, 5]. Therefore, predicting the subtypes of MCI according to the different trajectories of disease progression is crucial for aiding physicians with clinical diagnosis, as well as for designing intervention trials.

Transcriptomics analysis of human brains has been used previously to elucidate potential mechanisms behind neurodegenerative diseases. Using a transcriptomic approach, one recent study of postmortem brain tissues identified three major molecular subtypes of AD with specific signatures [6], which suggests that multiple distinct pathogenic processes can lead to the development of AD. However, most such studies were based on transcriptomic datasets obtained from the postmortem brain tissues of AD patients, which limits their applicability to clinical practice. Accordingly, blood-based biomarkers that are able to identify MCI and AD have the potential to be a more cost-effective and easily accessible tool that should help to facilitate timely diagnosis [7]. For example, a single-cell transcriptomic study of peripheral blood mononuclear cells (PBMCs) has revealed that differential patterns of gene expression in specific cell types appear to correlate with AD pathology [8, 9]. However, robust and reliable biomarkers obtained from peripheral blood that are able to predict the longitudinal outcome of patients with MCI remain an unmet clinical need.

In this study, we longitudinally followed a cohort of patients with either MCI or AD between 2012 and 2022. Patients with MCI were classified into two groups: (a) progressive MCI (P-MCI), patients who were initially diagnosed with MCI and progressed rapidly to AD within a three-year follow-up period; and (b) stable MCI (S-MCI), patients with MCI who either remained clinically stable for a minimum period of four years or reverted to a cognitively unimpaired state. We generated transcriptomic datasets from the peripheral white blood cells (WBCs) of three groups of patients, namely P-MCI patients, S-MCI patients and AD patients. We analyzed the differentially expressed genes (DEGs) by DESeq2 and adopted a machine learning approach to construct prediction models for discriminating the three disease classes. Furthermore, we carried out pathway analyses of the DEGs and of the features (mRNAs) selected by machine learning in order to determine the potential roles of these genes in the molecular pathogenesis of AD and to delineate the complex biological processes associated with MCI heterogeneity.

Methods

Study subjects

A total of 116 patients were enrolled from Taipei Veterans General Hospital (Taipei-VGH) to undergo RNA-sequencing; these consisted of 49 patients with AD, 28 patients classified as P-MCI, and 39 patients classified as S-MCI. Representative timelines for S-MCI and P-MCI cases are shown in Supplementary Figure S1. MCI and AD were diagnosed at Taipei-VGH based on the criteria recommended by the National Institute on Aging/Alzheimer’s Association workgroups in 2011 [10]. Patients with frontotemporal dementia and vascular dementia were excluded. Furthermore, the patients with AD were all considered to be sporadic cases of late-onset AD. MCI was diagnosed as a change in cognition involving impairment in one or more cognitive domains, while preserving independence in the areas of social and occupational functioning [11]. Cognitive function was assessed using a set of cognitive performance tests including the mini-mental state examination (MMSE), the Wechsler memory scale-logical memory (WMS-LM) test, the Chinese version verbal learning test (CVVLT), the trail making test part B (TMT-B), the Taylor complex figure test (TY-CFT), the Boston naming test (BNT) and the verbal fluency-animal (VF-animal) score test.

Sample collection and RNA extraction

Fasting peripheral blood was drawn using EDTA-coated vacuum tubes. After centrifugation at 4 °C and 3,000 × g for 10 min, the samples were processed in order to obtain white blood cells (WBCs) using RBC lysis buffer. Isolation of RNA from WBCs was performed using TRI Reagent (T9424, Sigma-Aldrich) and the phenol/chloroform method. The isolated RNA was stored at −80 °C until analysis. The quality of the total RNA was measured by RNA Integrity Number (RIN) and samples with RIN values higher than 8 were used for RNA sequencing.

RNA sequencing and analysis of DEGs

The library preparation and sequencing were conducted by the Cancer Progression Research Center at National Yang Ming Chiao Tung University. Paired-end sequencing was performed using a NovaSeq 6000 sequencing system (Illumina Inc.), which generated a sequencing depth of at least 20 million reads for each sample. Low-quality reads (Q < 20) were filtered out, and trimmed clean reads were mapped onto the human reference genome GRCh38. Transcripts per million (TPM) and expected counts for each gene were calculated using the RNA-Seq by Expectation Maximization (RSEM) software package version 1.3.3 [12]. Pseudogenes and non-annotated genes were excluded from further analysis. To remove noise, all genes with TPM values below 4 were considered not expressed (background) and were set to zero; for the remaining genes, 4 was subtracted from the TPM value before machine learning analysis. For differential expression analysis using DESeq2 version 1.36.0 [13], the expected counts of genes were used. A total of 11,406 genes were retained after filtering for expressed genes (a minimum expected count of ≥ 40 for the gene, with the gene detected in at least 50% of samples). Gender-stratified differential expression analyses were conducted using DESeq2. For each gender-specific cohort, we performed pairwise comparisons across three groups (P-MCI versus S-MCI; AD versus P-MCI; and AD versus S-MCI). Differentially expressed genes (DEGs) were identified using a nominal p-value threshold of 0.05. Additional analyses, incorporating age as a covariate, were performed to evaluate the potential impact of age on the differential expression results.
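The TPM background correction described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code; the function name `subtract_background` and the toy values are hypothetical, and the threshold of 4 TPM is the only parameter taken from the text.

```python
def subtract_background(tpm_values, threshold=4.0):
    """Apply the background rule to one sample's TPM values.

    Values below `threshold` are treated as not expressed and set to zero;
    all other values have `threshold` subtracted from them.
    """
    corrected = []
    for value in tpm_values:
        if value < threshold:
            corrected.append(0.0)
        else:
            corrected.append(value - threshold)
    return corrected


if __name__ == "__main__":
    # Toy TPM values for four hypothetical genes in one sample.
    print(subtract_background([0.0, 3.9, 4.0, 10.5]))
```

Under this reading, a gene sitting exactly at the threshold ends up at zero after subtraction, so the corrected scale is continuous at the cutoff.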

Machine learning analysis of the RNA-seq data

The machine learning approach was conducted as illustrated in Supplementary Figure S2. Because of the limited number of samples, the transcriptome datasets of males and females were combined. For each leave-one-out cross-validation (LOOCV) iteration, independent t-tests were used for initial feature screening. Age, gender, and mRNA features exhibiting a p-value < 0.05 in the following comparisons were utilized for each classification model: (a) P-MCI vs. S-MCI, (b) AD vs. S-MCI, and (c) AD vs. P-MCI. For each binary classification task, the dataset was partitioned into a 90% training set and a 10% validation set during the feature selection process. The Lasso [14], support vector machine (SVM) [15], and random forest (RF) [16] algorithms were employed to rank the informative features. This process was repeated 100 times, and the top 20 features were selected by computing the selection frequency across the three feature selection methods. The final set of selected features comprised those consistently ranked within the top 20 across all LOOCV iterations. SVM, logistic regression (LR) and RF algorithms were used to construct the prediction models, which were assessed in terms of accuracy and the area under the receiver operating characteristic curve (AUC). These assessments of the binary classification tasks were calculated via stratified five-fold cross-validation.
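One way to read the frequency-based ranking step is sketched below. It assumes that each repeat yields one ranked feature list per method (Lasso, SVM, RF) and that a feature's selection frequency is the number of lists in which it appears within the top k; the exact counting rule is not stated in the text, so this is an assumption, as are the function names and the toy feature labels.

```python
from collections import Counter


def top_by_selection_frequency(rankings, k=20):
    """Return the k features that most often appear in the top k of a ranking.

    `rankings` is a list of ranked feature lists, one per method per repeat
    (an assumed data layout, best feature first).
    """
    counts = Counter()
    for ranked in rankings:
        counts.update(ranked[:k])
    # Sort by frequency (descending), then by name so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered[:k]]


def consensus_features(per_iteration_top):
    """Keep only features present in the top list of every LOOCV iteration."""
    sets = [set(features) for features in per_iteration_top]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    # Toy rankings with hypothetical feature names, using k=2 for brevity.
    toy = [["g1", "g2", "g3"], ["g2", "g1", "g4"], ["g2", "g3", "g1"]]
    print(top_by_selection_frequency(toy, k=2))
    print(consensus_features([["g2", "g1"], ["g1", "g2"], ["g2", "g5"]]))
```

Taking the intersection across iterations is a strict criterion: a feature that drops out of the top list in even one left-out fold is discarded, which favors stability over panel size.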

Our performance evaluation revealed that the SVM-based prediction models have slightly lower performance metrics compared to LR and RF classifiers across all three comparison groups. Accordingly, we selected the models with the best performance for our final analysis: (a) LR for P-MCI vs. S-MCI; (b) RF for AD vs. S-MCI; (c) LR for AD vs. P-MCI. Data processing and feature selection were conducted in R (v3.6.3) using various packages including data.table (v1.14.2), dplyr (v1.1.4), doParallel (v1.0.15), randomForest (v4.6.14), e1071 (v1.7.3), glmnet (v4.1.4), pROC (v1.16.1), and cvAUC (v1.1.0). Model construction and cross-validation were performed in Python (v3.7) using pandas (v1.3.5), numpy (v1.21.2), and scikit-learn (v1.0.2).
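The performance metrics used to compare the classifiers can be made concrete with a small sketch. The code below is a plain-Python illustration, not the scikit-learn or pROC routines the authors list; it assumes binary labels coded as 1 (positive class) and 0 (negative class), and the function names are hypothetical. The AUC is computed as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
def classification_metrics(labels, predictions):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive).

    Assumes both classes are present in `labels`.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels and scores, not study data.
    print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a five-fold setting, these quantities would be computed on each held-out fold and then averaged; that averaging step is omitted here for brevity.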

Functional enrichment analysis

The DEGs from DESeq2 or the important features (mRNAs) that had been selected by machine learning were analyzed by Ingenuity Pathway Analysis (IPA) (http://www.ingenuity.com). Significant canonical pathways with a p-value < 0.05 and a |z-score| ≥ 1 were grouped based on their functional categories; a bubble plot was used to present the relationship between the three numeric variables. Gene set enrichment analysis (GSEA) was conducted using normalized count data obtained from DESeq2 using GSEA software version 4.3.2. The KEGG pathway gene sets were predefined and obtained from the Molecular Signatures Database (MSigDB v5.0).
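The pathway significance rule stated above (p-value < 0.05 and |z-score| ≥ 1) amounts to a simple filter. The sketch below assumes the IPA results have been exported into a list of records with `name`, `p_value` and `z_score` fields; that layout, the function name and the placeholder pathway names are hypothetical and do not reflect IPA's actual export format.

```python
def significant_pathways(pathways, p_cutoff=0.05, z_cutoff=1.0):
    """Return names of pathways with p < p_cutoff and |z| >= z_cutoff."""
    selected = []
    for pathway in pathways:
        passes_p = pathway["p_value"] < p_cutoff
        passes_z = abs(pathway["z_score"]) >= z_cutoff
        if passes_p and passes_z:
            selected.append(pathway["name"])
    return selected


if __name__ == "__main__":
    # Placeholder records, not results from the study.
    toy = [
        {"name": "pathway_a", "p_value": 0.01, "z_score": -1.5},
        {"name": "pathway_b", "p_value": 0.20, "z_score": 2.0},
        {"name": "pathway_c", "p_value": 0.03, "z_score": 0.4},
    ]
    print(significant_pathways(toy))
```

Using the absolute z-score keeps both predicted activation (positive z) and predicted inhibition (negative z).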

Statistical analysis and data visualization

Partial least-squares discriminant analysis (PLS-DA) among the selected groups was carried out by EZinfo (Umetrics, version 3.0.3). Bar charts, bubble plots, Chi-square tests, and ANOVA were conducted using GraphPad Prism v9.0. Heatmaps were created by uploading log-transformed data into Multi Experiment Viewer (MEV) 4.9 software [17]. ANCOVA with age as a covariate was performed using SPSS software version 24.

Results

Clinical characteristics of the study subjects

After a 10-year follow-up, a total of 116 patients enrolled at Taipei-VGH for whom good-quality total RNA isolated from WBCs was available were selected for RNA-sequencing. Of these subjects, 39 patients (33.6%) were classified as S-MCI, 28 patients (24.1%) were classified as P-MCI, and 49 patients (42.2%) were classified as having AD (Fig. 1A). The demographic characteristics of the patients with either MCI or AD are shown in Table 1; these include age, gender, education, APOE ε4 status and a variety of cognitive evaluation tests. There is no significant difference in gender, years of education, or percentage of APOE ε4 carriers between the S-MCI, P-MCI, and AD groups. However, the mean age of the S-MCI group (69.9 ± 6.0 years old) is significantly lower than that of the P-MCI group (75.8 ± 8.0 years old) and the AD group (75.8 ± 7.4 years old). Notably, the various cognitive assessments revealed the following results: (1) both the P-MCI and AD groups were significantly different from the S-MCI group across a number of cognitive measurements, namely MMSE, WMS-LM, CVVLT recall, TY-CFT recall, and VF-animal scores. (2) When the P-MCI group was compared to the S-MCI group, no significant differences were found for BNT, TMT-B time and TMT-B lines scores. (3) When comparing the AD group with the P-MCI group, the AD group was found to have a significantly lower performance than the P-MCI group in terms of MMSE scores, TMT-B times, and BNT scores. Supplementary Figure S3 illustrates the demographic characteristics stratified by gender using bar and dot plots.

Fig. 1

Integrated study by statistical analysis using DESeq2 and machine learning methods to discover biomarker signatures that allow identification of MCI patients at high risk of AD. A During the 10-year longitudinal study, patients with MCI were classified as stable MCI (S-MCI, ≥ 4y) or progressive MCI (P-MCI, < 3y progressing to AD). Independently, 49 patients with AD were also selected. All subjects underwent whole transcriptome profiling of white blood cells via RNA-seq. B Schematic flow chart for identification of key genes derived from the RNA-seq data. Differential gene expression analysis was conducted by DESeq2 using the expected counts as inputs. For machine learning, count data were transformed into TPM values for feature selection and model construction. C, D Volcano plots showing significantly upregulated and downregulated genes of P-MCI versus S-MCI based on DESeq2 analyses of males (C) or females (D). E Heatmap for 501 genes (204 upregulated and 297 downregulated) that were differentially modulated in the male P-MCI group. F Heatmap for 879 genes (440 upregulated and 439 downregulated) that were differentially modulated in the female P-MCI group. G A Venn diagram comparing significantly regulated genes found in male and female patients defined as P-MCI compared with S-MCI patients. H A Venn diagram comparing significantly dysregulated pathways found in male and female P-MCI patients compared with S-MCI patients. MCI, mild cognitive impairment; DEG, differentially expressed gene; IPA, ingenuity pathway analysis

Table 1 Characteristics of 116 samples defined as P-MCI, S-MCI or AD

Gender-specific differences in DEGs and pathways enriched in P-MCI

We integrated two methodologies, namely DESeq2 and a machine learning approach, to analyze the transcriptomic datasets and to identify biomarkers and molecular signatures that are able to differentiate the S-MCI, P-MCI, and AD groups (Fig. 1B). Since patients classified in the P-MCI group have a high risk of rapid progression towards AD, we prioritized the comparison between P-MCI and S-MCI. To mitigate potential gender-related confounding effects, we stratified the patients by gender and compared their gene expression profiles by DESeq2 in order to obtain statistical inferences about the DEGs. We used a Wald test (p-value < 0.05) to identify DEGs. In the P-MCI vs. S-MCI comparison, the results yielded 501 DEGs (204 up-regulated and 297 down-regulated) in the male group and 879 DEGs (440 up-regulated and 439 down-regulated) in the female group (Fig. 1C-F). Of the identified DEGs, seven genes remained significant after FDR correction using the Benjamini–Hochberg procedure. These were: UVRAG in males and RGPD5, IL18RAP, WHRN, FCRL5, ZNF683, and TXNIP in females. Intriguingly, most of the DEGs identified in males and females are different, with only a small portion of the DEGs overlapping (Fig. 1G). In addition, IPA analysis of the DEGs identified 20 pathways enriched in males and 98 pathways enriched in females; however, only two pathways were shared by both genders (Fig. 1H). These two pathways are the neutrophil degranulation pathway and the pathogen-induced cytokine storm signaling pathway (Fig. 2A).
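To show why far fewer genes survive FDR correction than pass the nominal threshold, the Benjamini–Hochberg adjustment can be sketched in a few lines. This is the plain textbook procedure written in Python; it does not reproduce any additional steps DESeq2 applies internally (such as its filtering of low-count genes), and the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value of rank r (1 = smallest) among m tests, the raw
    adjustment is p * m / r; a running minimum taken from the largest rank
    downwards enforces monotonicity of the adjusted values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy p-values, not taken from the study.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because each p-value is scaled by the number of tests divided by its rank, a nominal p-value just under 0.05 among thousands of tested genes will usually have an adjusted value far above 0.05.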

Fig. 2

Gender-specific differences in the biological pathways enriched in P-MCI patients compared to S-MCI patients. A Bubble plot depicting the significant canonical pathways based on male P-MCI vs. S-MCI DEGs. B GSEA shows enrichment of Alzheimer’s disease and B cell receptor signaling pathways in the male P-MCI vs. S-MCI. C Bubble plot depicting the significant canonical pathways based on female P-MCI vs. S-MCI DEGs. D GSEA shows enrichment of chemokine signaling pathway, natural killer cell-mediated cytotoxicity, and neurotrophin signaling pathway in the female P-MCI vs. S-MCI. E A schematic diagram of B cell survival signaling and GnRH Signaling pathways based on male P-MCI vs. S-MCI DEGs. F A schematic diagram of LPS-TLR4, Insulin-PI3K-AKT-FOXO, and IL-6-HIF1α pathways based on female P-MCI vs. S-MCI DEGs. All up-regulated DEGs are colored in red, while down-regulated are colored in blue. Black squares represent non-significant changes and grey squares demonstrate undetected gene expression. *, p < 0.05 in the DESeq2 analysis result with age as a covariate. GSEA, gene set enrichment analysis; NES, normalized enrichment score; ERAD, Endoplasmic Reticulum-Associated Degradation. MCI, mild cognitive impairment; P-MCI, Progressive MCI; S-MCI, Stable MCI

In males, the significantly enriched pathways in P-MCI compared with S-MCI are mainly related to immune response and signaling (Fig. 2A). Interestingly, GSEA revealed that P-MCI is positively associated with Alzheimer’s disease (NES = 1.88; FDR = 0.026), but negatively associated with the B cell receptor signaling pathway (NES = -2.33; FDR < 0.001) (Fig. 2B). In females, the significantly enriched pathways in P-MCI compared with S-MCI can be grouped into four major categories: (a) immune response, (b) signaling, (c) cell death, and (d) stress response (Fig. 2C). Notably, GSEA revealed three enriched pathways that are positively associated with P-MCI, namely the chemokine signaling pathway (NES = 1.91; FDR = 0.041), the natural killer cell-mediated cytotoxicity (NES = 1.85; FDR = 0.017), and the neurotrophin signaling pathway (NES = 1.71; FDR = 0.043) (Fig. 2D).

To summarize the remarkable gender-specific difference in the DEGs and to highlight the gender dimorphism of the enriched biological pathways, we created a schematic diagram to illustrate the pathway networks (Fig. 2E and F). In male P-MCI patients, B cell survival signaling and gonadotropin-releasing hormone (GnRH) signaling are significantly down-regulated compared with S-MCI patients (Fig. 2E). In female P-MCI patients, three pathways are identified, namely the LPS-TLR4, Insulin-PI3K-AKT-FOXO, and IL-6-HIF1α pathways. Many of the DEGs involved in these pathways are up-regulated, while other DEGs are down-regulated, suggesting that all three pathways are dysregulated in P-MCI patients compared with S-MCI patients (Fig. 2F).

Moreover, a previous study has shown that the immune microenvironment may have an impact on the etiology and pathology of AD [18]. To elucidate a possible relationship between the transcriptomic changes and the abundance of each immune cell type, we employed CIBERSORT deconvolution analysis [19] to estimate the composition of 22 immune cell types based on the whole-transcriptome datasets. Our analysis revealed that the numbers of naïve B cells and M1 macrophages (pro-inflammatory) are significantly lower in both the P-MCI and AD groups compared with the S-MCI group (p < 0.05); however, there is no significant difference between the P-MCI and AD groups (Supplementary Figure S4). These results indicate that the two cell types (naïve B cells and M1 macrophages) may have a slight influence on the DEGs. Alternatively, changes in the numbers of the two cell types could be a consequence of the differences in DEGs between P-MCI and S-MCI patients.

Construction of machine learning models to identify high-risk patients with P-MCI

We utilized age, gender, and mRNA transcripts of annotated protein-coding genes as inputs for feature selection (Fig. 1B; Supplementary Figure S2). For each model, the optimal number of features was determined by choosing the minimum number of features able to achieve high AUC and accuracy values (Fig. 3A). Notably, the PCA results revealed that patients classified as P-MCI, S-MCI, and AD could be separated from one another using a combination of selected features (62 features for discriminating P-MCI vs. S-MCI, 74 features for discriminating AD vs. P-MCI, and 58 features for discriminating AD vs. S-MCI). This indicates that these features have strong discriminative ability (Fig. 3B). Remarkably, our machine learning models also performed well when five-fold cross-validation was carried out. The results were as follows: (1) a panel of eight features was able to achieve an accuracy rate of 0.87 and an AUC of 0.93 when discriminating between P-MCI and S-MCI; (2) a panel of nine features was able to achieve an accuracy rate of 0.85 and an AUC of 0.94 when discriminating between AD and S-MCI; and (3) a panel of ten features was able to achieve an accuracy rate of 0.88 and an AUC of 0.92 when discriminating between AD and P-MCI (Fig. 3C and D). The performance of the three machine learning models, including optimal feature number, sensitivity, specificity, accuracy, and AUC for each binary classifier, is presented in Fig. 3D. Our results revealed that the ML models maintain robust predictive power across all patients when both sexes were analyzed together, as well as when the male and female subgroups were analyzed separately.

Fig. 3

Machine learning models to differentiate MCI and AD, and identify high-risk P-MCI patients. A The top differentiating features ranked by selection frequency are shown for three different classification models: (i) P-MCI vs. S-MCI; (ii) AD vs. S-MCI; (iii) AD vs. P-MCI. B PCA score plot of all 174 features from joint selected genes of all three classifications. C ROC curves of three groups of classifiers: (i) P-MCI vs. S-MCI; (ii) AD vs. P-MCI; (iii) AD vs. S-MCI using optimal number of selected features by five-fold cross validation. The 95% confidence interval (CI) values are also indicated. D Performance assessment of all three classifiers: (i) P-MCI vs. S-MCI; (ii) AD vs. S-MCI; (iii) AD vs. P-MCI using the top selected features. For each classifier, results are shown for all subjects combined (♀ + ♂), for females only (♀), and for males only (♂). The results are presented as means. MCI, mild cognitive impairment; P-MCI, Progressive MCI; S-MCI, Stable MCI

Interestingly, among the eight features discriminating between P-MCI and S-MCI, five genes (ARRDC4, TMEM187, RABEPK, ZC3H3 and MC1R) are significantly up-regulated in P-MCI, while two genes (PRRC2A and RORC) are significantly down-regulated in P-MCI (Supplementary Figure S5). In addition, the three panels of features do not overlap, with the exception of TMEM187, which is the only gene selected by more than one model (P-MCI vs. S-MCI and AD vs. P-MCI). The proteins and biological functions of the three panels of genes are shown in Supplementary Table S1.

Pathways associated with the transcriptomic features selected by machine learning

To explore the biological functions associated with the three panels of features identified by machine learning, we performed pathway analysis based on the mRNA features. Firstly, a total of 62 features were identified when discriminating P-MCI vs. S-MCI. Many of the selected genes are involved in mitochondrial processes, including complex I assembly (NDUFAF1), electron transport chain (UQC), mitochondrial calcium ion transport (MICU2) and coenzyme Q biosynthesis (COQ7) (Fig. 4A and B). Notably, the drug transporter ABCB1 (rank 10) has been reported previously to be implicated in AD pathogenesis through its role in clearing Aβ from the brain into the blood circulation [20]. Furthermore, a secreted protein, TSP1 (rank 14), has been shown previously to be associated with mitochondrial dysfunction and may be involved in enhancing cell senescence and modulating the immune response [21, 22].

Fig. 4

Enriched pathways based on the top features selected by machine learning that are then used to discriminate between P-MCI, S-MCI, and AD. A, C, E Bubble plots representing functional enrichment of important/key mRNAs selected by P-MCI vs. S-MCI, AD vs. P-MCI, and AD vs. S-MCI models. B A representation of ABCB1, TSP1, and mitochondrial function-related mRNAs selected by machine learning during the P-MCI vs. S-MCI classifier. D A representation of mRNAs associated with chromatin remodeling and DNA repair, mitochondrial process, proteostasis, and transporters, identified by machine learning during the AD vs. P-MCI classifier. Figure created with BioRender.com

Secondly, a total of 74 features were identified when discriminating AD vs. P-MCI. The selected genes can be functionally categorized into four major groups (Fig. 4C and D). The first group consists of genes involved in chromatin remodeling and DNA repair. For example, INO80C (rank 14) plays a crucial role in regulating chromatin structure through its ability to exchange histone variants and modulate nucleosome spacing [23]. Another example is BABAM2 (rank 63), which is a component of the BRCA1-A complex and possesses deubiquitinase activity that targets histones H2A and H2AX at DNA lesion sites. The second group consists of genes involved in mitochondrial processes. For example, GATB (rank 40), NME4 (rank 64) and FDXR (rank 69) are associated with mitochondrial biosynthesis and function, while NDUFAF1 (rank 21) is involved in complex I assembly. The third group consists of genes involved in proteostasis. Examples include PSMF1 (proteasome inhibitor subunit 1), UBC (ubiquitin C) and FBXL17 (substrate-recognition component of the SCF E3 ubiquitin ligase complex), all of which are involved in the ubiquitin-proteasome system. The final group is made up of genes encoding transporters. Examples include the transporters SLC2A8 and SLC3A2, which are functionally involved in the transportation of glucose and amino acids during AD pathogenesis. These findings suggest that a wide range of cellular processes is affected and that these changes are associated with progression from P-MCI to AD.

Thirdly, a total of 58 features were identified when discriminating AD vs. S-MCI. However, in this case, we found no significant enrichment of mitochondrial processes among these genes. The enriched pathways are mainly associated with metabolism and signaling (such as activation of NMDA receptors and postsynaptic events), proteostasis, cell death and cell cycle, as well as autophagy (Fig. 4E).

New insights into the multifaceted biological processes involved in AD progression

Because of the limited numbers of study subjects, our machine learning approach was conducted using a combined dataset of males and females. In order to compare the DEGs identified by DESeq2 and the features selected by machine learning, we performed additional DESeq2 analysis using the pooled datasets of both genders. When P-MCI and S-MCI were compared, a total of 437 DEGs were identified by DESeq2. Among these DEGs, 30 genes overlapped with the features selected by machine learning (Supplementary Figure S6A). Furthermore, IPA analysis revealed that there are 13 significant pathways associated with the 437 DEGs (Supplementary Figure S6B). However, no common pathway was found that was associated with the DEGs and features. DEGs obtained from the statistical model (DESeq2) are mainly related to signaling pathways and the immune response; while features obtained from machine learning are mainly related to mitochondrial function, ABCB1 function, and TSP1 function (Fig. 5). These results suggest that multiple biological pathways and processes are involved when there is rapid progression of high-risk patients from P-MCI to AD, and that an integrated study using both a statistical method (DESeq2) and machine learning models should help to accelerate the discovery of biomarker signatures and reveal the multifaceted nature of molecular pathogenesis during disease progression of AD.

Fig. 5

Multifaceted biological processes and pathways that are identified to be associated with AD progression. Our study revealed that multiple biological pathways and processes appear to be involved in the rapid progression of high-risk patients from P-MCI to AD, and that an integrated investigation of DESeq2 and machine learning methods should accelerate the discovery of biomarker signatures that will reveal the multifaceted nature of the molecular pathogenesis during AD disease progression

Discussion

This study produced several pivotal findings using a longitudinal study cohort and a 10-year follow-up. Firstly, using statistical analysis by DESeq2, we discovered a remarkable gender-specific difference in DEGs that distinguish P-MCI from S-MCI. Pathway analyses revealed that in male P-MCI, B cell survival signaling and gonadotropin-releasing hormone signaling are significantly down-regulated. However, in female P-MCI, three pathways, namely the LPS-TLR4, Insulin-PI3K-AKT-FOXO, and IL-6-HIF1α pathways, are highly enriched compared with S-MCI. Secondly, using machine learning methods, we established three prediction models with good performance that can discriminate between the three disease states; these are (a) P-MCI versus S-MCI (an accuracy rate of 0.87 and an AUC of 0.93), (b) AD versus S-MCI (an accuracy rate of 0.85 and an AUC of 0.94), and (c) AD versus P-MCI (an accuracy rate of 0.88 and an AUC of 0.92). Lastly, a panel of eight mRNA features (ARRDC4, PRRC2A, RORC, TMEM187, RABEPK, ZC3H3, MC1R, and TNNI2) emerged as a biomarker signature for distinguishing high-risk P-MCI from S-MCI. Pathway analysis of the features suggests that mitochondrial processes are a crucial contributor to AD progression. These results demonstrate that transcriptomic signatures obtained from peripheral WBCs are of great utility for identifying MCI patients who are at high risk of accelerated progression to AD. We highlight the potential involvement of these genes in biological processes and pathways, the study of which should help us to gain insights into the complex mechanisms that underlie the heterogeneity of MCI.

A greater number of DEGs and pathways were identified in females than in males when comparing P-MCI to S-MCI (Fig. 1). Most of the DEGs and pathways were unique to each gender. Notably, the GSEA results indicate that immune response-related pathways are positively associated with P-MCI in females, whereas in males, these genes are regulated in a negative direction (Fig. 2). Previous studies have reviewed sex differences in AD pathogenesis and identified possible roles for sex hormones and sex chromosomes [24, 25]. Our findings support the hypothesis that there are gender-biased transcriptome signatures in patients with MCI, and suggest a female-specific relationship between neuroinflammation and rapid AD progression. The opposite regulation trends of these immune-related genes are similar to those in a previous report in which two female-specific genes, namely NCL and KIF2A, were discovered in the cerebral cortex; these were up-regulated in females with AD but down-regulated in males with AD compared to their respective controls [26].

Dysregulation of the gonadotropin-releasing hormone (GnRH) signaling pathway was found to be specifically enriched in males with P-MCI. GnRH not only regulates reproductive function, but is also recognized as being involved in neurodegenerative diseases such as AD [27]. The potential differential effects of GnRH on AD pathology in males and females require further investigation. In females, the dysregulated insulin-PI3K-AKT-FOXO signaling pathway was specifically enriched. Anti-inflammatory interventions and anti-diabetic drugs targeting insulin pathways have been considered as possible AD disease-modifying treatments [28, 29]. Our findings suggest that the use of these treatments to reduce the risk of progression from MCI to AD may be effective only in females. Interestingly, the machine learning models for distinguishing P-MCI from S-MCI performed well when both genders were included. As gender is not represented in the feature panel, we hypothesize that our machine learning models utilized genes that allow for gender-independent classification.

Previously, many studies have suggested that biomarkers obtained by comparing AD to controls can be used to predict conversion from MCI to AD. However, recent studies have indicated that this change follows a non-linear trajectory whereby cognitively unimpaired (CU) individuals progress to MCI, and then further progress to AD [30, 31]. Compared to AD, earlier stages, such as MCI or subjective cognitive decline (SCD), are generally believed to be more heterogeneous, possibly due to the slow progressive nature of neurodegenerative diseases. A recent imaging study found that molecular biomarkers are not as helpful for predicting CU-to-MCI conversion as they are for predicting MCI-to-AD conversion [32]. To address this issue, we focused on patients with MCI and AD, excluding CU participants. Our study revealed the presence of dynamic regulation based on the following observations: (1) the three feature panels incorporated into our prediction models were distinct, with few features consistently selected across all three models; and (2) the majority of the top-ranked genes displayed different regulation trends, with many genes significantly up-regulated in P-MCI, but remaining unchanged or even being down-regulated in AD (Supplementary Figure S5). Collectively, these findings suggest that searching for additional biomarkers beyond the established AD biomarkers would be very useful for predicting an individual's future conversion risk. Our platform, which utilizes a panel of 8 mRNA biomarkers, may serve as a diagnostic tool for identifying patients who are at high risk of developing AD.

In the present study, we investigated the correlation between blood-based transcriptomics obtained from WBCs and disease progression of patients from MCI to AD. Most deconvoluted immune cell types remained unchanged among the groups, except for a reduction in the naïve B and M1 macrophage cell populations of P-MCI and AD individuals compared to S-MCI individuals. Previous studies have reported conflicting results regarding B cell populations in AD subjects [8, 9], which highlights the need for further research. Whether variation in WBC-derived immune cell populations can predict progression from MCI to AD remains to be established and requires thorough investigation. To our knowledge, differences in immune cell-type proportions have not been reported for the various MCI subtypes based on blood transcriptome. Given the compromised blood–brain barrier in Alzheimer's disease and the complex interaction between peripheral and central immune systems [18], increased neuroinflammation is thought to raise the risk of developing AD [33]. Notably, a portion of DEGs have been found to be common to both blood and brain in patients with AD and MCI, especially in the prefrontal cortex region [31]. Another study has shown that there is a striking similarity between peripheral and brain pathological mechanisms [34]. These findings suggest that the blood transcriptome could be used as a surrogate indicator for monitoring the progression and pathogenesis of AD, at least to some extent.

Regarding the biomarker signature of the eight mRNA features (ARRDC4, PRRC2A, RORC, TMEM187, RABEPK, ZC3H3, MC1R, TNNI2) used by the machine learning model to distinguish P-MCI vs. S-MCI, two genes, namely PRRC2A and RORC, seem to be directly associated with AD pathogenesis. PRRC2A is involved in the proliferation and cell fate determination of oligodendrocytes, and its deficiency in the brain leads to hypomyelination [35]. RORC encodes RORγt, a transcription factor known to control the differentiation of T helper 17 cells [36]. At the pathway level, mitochondrial process changes are significantly enriched in P-MCI, with all of the features (mRNAs) associated with mitochondria-related genes being up-regulated in P-MCI compared to S-MCI (Fig. 4A and B). Given the important role of mitochondrial dysfunction in the pathogenesis of MCI and AD [37], one possible explanation is that this increase might reflect a compensatory response against impaired mitochondrial functions. Interestingly, in the comparison of AD vs. P-MCI, mitochondrial processes were also identified as significantly enriched and were among the features used by the machine learning model (Fig. 4C and D). Nevertheless, whether the up-regulation of genes associated with mitochondrial function in WBCs indeed reflects a similar situation in the brain, and whether this precedes the progression to AD, is of great interest and requires further investigation.

Limitations and future perspectives

There are several limitations to the present study, despite the careful application of an integrated methodology that combines machine learning and statistical methods to leverage the transcriptomics datasets and gain insights into disease progression. Firstly, the sample size of this longitudinal study is relatively limited compared with many cross-sectional studies on MCI and AD. This is due to the requirement for long-term follow-up visits in order to define patients as S-MCI or P-MCI. Thus, the current small sample size may have constrained our statistical power to detect differences that persist after multiple testing correction. Secondly, even though the machine learning models show promising performance in terms of prediction accuracy and AUC, our findings need to be validated using a larger sample size and more ethnically diverse external cohorts. This is essential so that the prediction models can be widely used on patients with MCI and AD for personalized medicine purposes. Thirdly, transcriptomic profiling of peripheral blood can be influenced by a number of confounding factors, such as comorbidities and medication, which were not considered here. Fourthly, the lack of brain transcriptomic datasets for the patients in our cohort precluded an analysis to identify dysregulated genes and pathways common to white blood cells and brain tissue. Finally, the cause-and-effect relationship between the genes and pathways identified in this study needs to be investigated using in vitro and in vivo approaches.
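To illustrate why a small sample size interacts with multiple testing correction, the sketch below implements the Benjamini-Hochberg adjustment, which is the default p-value adjustment in DESeq2. This section does not state which correction was applied in the analysis, so the choice of method here is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank, where m is the number of
    tests and rank is its position when sorted in ascending order;
    a running minimum from the largest rank downward keeps the
    adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes (not from this study).
    raw = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {q:.4f}")
```

Because each raw p-value is multiplied by the number of tests divided by its rank, a transcriptome-wide analysis with many thousands of genes requires very small raw p-values to remain significant after adjustment, which is harder to achieve with few samples per group.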

Conclusion

Our findings support the notion that a variety of specific pathways and biological processes are associated with cognitive decline. In particular, various mitochondrial processes, and their dysfunction, appear to be associated with the conversion of high-risk patients to AD; this finding was obtained via a machine learning approach. In contrast, statistical analysis by DESeq2 revealed dysregulation of the neuroinflammatory, GnRH, and PI3K-AKT-FOXO signaling pathways, and these changes show prominent sexual dimorphism. Considering that machine learning and statistical models are very different in approach, combining these two methodologies should provide a more comprehensive way of exploring the multifaceted nature of AD pathogenesis in patients with MCI.

Data availability

The authors declare that all supporting data, methods, and materials are available within the main body of the manuscript and in the online supplementary data. The RNA-seq data, including TPM and expected count matrices, have been deposited in the NCBI Gene Expression Omnibus (GEO) database under accession number GSE282742.

References

  1. Barker WW, Luis CA, Kashuba A, Luis M, Harwood DG, Loewenstein D, et al. Relative frequencies of Alzheimer disease, Lewy body, vascular and frontotemporal dementia, and hippocampal sclerosis in the State of Florida Brain Bank. Alzheimer Dis Assoc Disord. 2002;16:203–12.

  2. Wortmann M. Dementia: a global health priority - highlights from an ADI and World Health Organization report. Alzheimer’s Res Ther. 2012;4:40.

  3. Ahmed RM, Paterson RW, Warren JD, Zetterberg H, O’Brien JT, Fox NC, et al. Biomarkers in dementia: clinical utility and new directions. J Neurol Neurosurg Psychiatry. 2014;85:1426–34.

  4. Koepsell TD, Monsell SE. Reversion from mild cognitive impairment to normal or near-normal cognition: risk factors and prognosis. Neurology. 2012;79:1591–8.

  5. Pandya SY, Clem MA, Silva LM, Woon FL. Does mild cognitive impairment always lead to dementia? A review. J Neurol Sci. 2016;369:57–62.

  6. Neff RA, Wang M, Vatansever S, Guo L, Ming C, Wang Q, et al. Molecular subtyping of Alzheimer's disease using RNA sequencing data reveals novel mechanisms and targets. Sci Adv. 2021;7:eabb5398.

  7. Hampel H, O’Bryant SE, Molinuevo JL, Zetterberg H, Masters CL, Lista S, et al. Blood-based biomarkers for Alzheimer disease: mapping the road to the clinic. Nat Rev Neurol. 2018;14:639–52.

  8. Xiong LL, Xue LL, Du RL, Niu RZ, Chen L, Chen J, et al. Single-cell RNA sequencing reveals B cell-related molecular biomarkers for Alzheimer’s disease. Exp Mol Med. 2021;53:1888–901.

  9. Xu H, Jia J. Single-Cell RNA sequencing of peripheral blood reveals immune cell signatures in Alzheimer’s disease. Front Immunol. 2021;12:645666.

  10. Croisile B, Auriacombe S, Etcharry-Bouyx F, Vercelletto M, National Institute on A, Alzheimer A. The new 2011 recommendations of the National Institute on Aging and the Alzheimer’s Association on diagnostic guidelines for Alzheimer’s disease: Preclinal stages, mild cognitive impairment, and dementia. Revue Neurologique. 2012;168:471–82.

  11. Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011;7:270–9.

  12. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.

  13. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.

  14. Tibshirani R. Regression Shrinkage and Selection via the Lasso. J Roy Stat Soc: Ser B (Methodol). 1996;58:267–88.

  15. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.

  16. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  17. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003;34:374–8.

  18. Bettcher BM, Tansey MG, Dorothee G, Heneka MT. Peripheral and central immune system crosstalk in Alzheimer disease - a research prospectus. Nat Rev Neurol. 2021;17:689–701.

  19. Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol Biol. 2018;1711:243–59.

  20. Elali A, Rivest S. The role of ABCB1 and ABCA1 in beta-amyloid clearance at the neurovascular unit in Alzheimer’s disease. Front Physiol. 2013;4:45.

  21. Isenberg JS, Roberts DD. Thrombospondin-1 in maladaptive aging responses: a concept whose time has come. Am J Physiol Cell Physiol. 2020;319:C45–63.

  22. Kang S, Byun J, Son SM, Mook-Jung I. Thrombospondin-1 protects against Abeta-induced mitochondrial fragmentation and dysfunction in hippocampal cells. Cell Death Discov. 2018;4:31.

  23. Gerhold CB, Hauer MH, Gasser SM. INO80-C and SWR-C: guardians of the genome. J Mol Biol. 2015;427:637–51.

  24. Guo L, Zhong MB, Zhang L, Zhang B, Cai D. Sex differences in Alzheimer’s disease: insights from the multiomics landscape. Biol Psychiatry. 2022;91:61–71.

  25. Paranjpe MD, Belonwu S, Wang JK, Oskotsky T, Gupta A, Taubes A, et al. Sex-specific cross tissue meta-analysis identifies immune dysregulation in women with Alzheimer’s disease. Front Aging Neurosci. 2021;13:735611.

  26. Caceres A, Gonzalez JR. Female-specific risk of Alzheimer’s disease is associated with tau phosphorylation processes: A transcriptome-wide interaction analysis. Neurobiol Aging. 2020;96:104–8.

  27. Wickramasuriya N, Hawkins R, Atwood C, Butler T. The roles of GnRH in the human central nervous system. Horm Behav. 2022;145:105230.

  28. Ardura-Fabregat A, Boddeke E, Boza-Serrano A, Brioschi S, Castro-Gomez S, Ceyzériat K, et al. Targeting neuroinflammation to treat Alzheimer’s disease. CNS Drugs. 2017;31:1057–82.

  29. Muñoz-Jiménez M, Zaarkti A, García-Arnés JA, García-Casares N. Antidiabetic drugs in Alzheimer’s disease and mild cognitive impairment: a systematic review. Dement Geriatr Cogn Disord. 2020;49:423–34.

  30. Di Costanzo A, Paris D, Melck D, Angiolillo A, Corso G, Maniscalco M, et al. Blood biomarkers indicate that the preclinical stages of Alzheimer’s disease present overlapping molecular features. Sci Rep. 2020;10:15612.

  31. Li X, Wang H, Long J, Pan G, He T, Anichtchik O, et al. Systematic analysis and biomarker study for Alzheimer’s disease. Sci Rep. 2018;8:17394.

  32. Karaman BK, Mormino EC, Sabuncu MR. Machine learning based multi-modal prediction of future decline toward Alzheimer’s disease: an empirical study. PLoS ONE. 2022;17:e0277322.

  33. Kinney JW, Bemiller SM, Murtishaw AS, Leisgang AM, Salazar AM, Lamb BT. Inflammation as a central mechanism in Alzheimer’s disease. Alzheimers Dement (N Y). 2018;4:575–90.

  34. Iturria-Medina Y, Khan AF, Adewale Q, Shirazi AH. Blood and brain gene expression trajectories mirror neuropathology and clinical deterioration in neurodegeneration. Brain. 2020;143:661–73.

  35. Wu R, Li A, Sun B, Sun JG, Zhang J, Zhang T, et al. A novel m(6)A reader Prrc2a controls oligodendroglial specification and myelination. Cell Res. 2019;29:23–41.

  36. Chi X, Jin W, Zhao X, Xie T, Shao J, Bai X, et al. RORgammat expression in mature T(H)17 cells safeguards their lineage specification by inhibiting conversion to T(H)2 cells. Sci Adv. 2022;8:eabn7774.

  37. Wang W, Zhao F, Ma X, Perry G, Zhu X. Mitochondria dysfunction in the pathogenesis of Alzheimer’s disease: recent advances. Mol Neurodegener. 2020;15:30.

Acknowledgements

We thank Ching-Cheng Lin, Yu-Han Luo and Wei-Ju Chang for their technical assistance. The authors acknowledge the sequencing and bioinformatic services provided by the National Genomics Center for Clinical and Biotechnological Applications of the Cancer and Immunology Research Center at National Yang Ming Chiao Tung University, and the National Core Facility for Biopharmaceuticals (NCFB) of National Science and Technology Council.

Funding

This work was supported by grants from the Ministry of Health and Welfare (NHRI-11A1-CG-CO-07–2225-1, NHRI-12A1-CG-CO-07–2225-1 and NHRI-13A1-CG-CO-07–2225-1 to TFT) and the National Science and Technology Council (MOST 110–2320-B-A49A-529-MY3 to TFT). We also acknowledge support by the Interdisciplinary Research Center for Healthy Longevity of National Yang Ming Chiao Tung University from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

Author information

Contributions

All the authors contributed to the manuscript preparation. P.N.W. recruited subjects and defined the clinical stages. P.N.W. and T.F.T. co-designed the research framework. T.H.T. designed and supervised the ML method. Y.H.C. and C.W.T. contributed to ML analysis. Y.L.H., Z.Q.S. and C.Y.T. participated in DESeq2 analyses. Y.L.H., T.H.T. and Z.Q.S. prepared the figures and drafted the manuscript. T.F.T. wrote the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Pei-Ning Wang or Ting-Fen Tsai.

Ethics declarations

Ethics approval and consent to participate

The protocol for this longitudinal study cohort (from 2012 to 2022) was approved by the Institutional Review Boards of Taipei-VGH (IRB no. 2012–11-003A and 2017–07-015C) and was conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

13195_2024_1651_MOESM1_ESM.pdf

Supplementary Material 1: Supplementary Table S1. Functional implications of the gene panels selected by three machine learning models. Supplementary Figure S1. Representative timeline of sample selection for Stable MCI (S-MCI) and Progressive MCI (P-MCI) subjects in this study. Supplementary Figure S2. Machine learning pipeline for identification of prognostic features associated with conversion from MCI to AD. Supplementary Figure S3. Box and dot plots showing the cognitive characteristics of patients classified as P-MCI, S-MCI, and AD stratified by gender. Supplementary Figure S4. Estimation of immune cell-type proportions for the S-MCI, P-MCI, and AD samples on the RNA-seq data analyzed by CIBERSORT. Supplementary Figure S5. Box and Whisker plots depicting top WBC transcript levels differentiating the S-MCI, P-MCI, and AD groups from the machine learning models. Supplementary Figure S6. Comparison of DEGs and pathways from the various DESeq2 analyses with features selected by machine learning, incorporating data from both sexes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Huang, YL., Tsai, TH., Shen, ZQ. et al. Transcriptomic predictors of rapid progression from mild cognitive impairment to Alzheimer's disease. Alz Res Therapy 17, 3 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13195-024-01651-0

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13195-024-01651-0

Keywords