Development of an individualized dementia risk prediction model using deep learning survival analysis incorporating genetic and environmental factors

Yuan, Shiqi; Liu, Qing; Huang, Xiaxuan; Tan, Shanyuan; Bai, Zihong; Yu, Juan; Lei, Fazhen; Le, Huan; Ye, Qingqing; Peng, Xiaoxue; Yang, Juying; Ling, Yitong; Lyu, Jun

doi:10.1186/s13195-024-01663-w

Research
Open access
Published: 30 December 2024


Alzheimer's Research & Therapy volume 16, Article number: 278 (2024)


Abstract

Background

Dementia is a major public health challenge in modern society. Early detection of high-risk dementia patients and timely intervention or treatment are of significant clinical importance. Neural network survival analysis represents the most advanced technology for survival analysis to date. However, there is a lack of deep learning-based survival analysis models that integrate both genetic and clinical factors to develop and validate individualized dynamic dementia risk prediction models.

Methods and results

This study is based on a large prospective cohort from the UK Biobank, which includes a total of 41,484 participants with an average follow-up period of 12.6 years. Initially, 364 candidate features (predictor variables) were screened. The top 30 key features were then identified by ranking the importance of each predictor variable using the Gradient Boosting Machine (GBM) model. A multi-model comparison strategy was employed to evaluate the predictive performance of four survival analysis models: DeepSurv, DeepHit, Kaplan–Meier estimation, and the Cox proportional hazards model (CoxPH). The results showed that the average Harrell's C-index for the DeepSurv model was 0.743, for the DeepHit model it was 0.633, for the CoxPH model it was 0.749, and for the Kaplan–Meier estimator model it was 0.500. In addition, the average D-Calibration Survival Measure was 6.014, 4408.086, 32274.743, and 1.508, respectively. The Brier score (BS) was used to assess the importance of features for the DeepSurv dementia prediction model, and the relationship between features and dementia was visualized using a partial dependence plot (PDP). To facilitate further research, the team deployed the DeepSurv dementia prediction model on AliCloud servers and designated it as the UKB-DementiaPre Tool.

Conclusion

This study successfully developed and validated the DeepSurv dementia prediction model for individuals aged 60 years and above, integrating both genetic and clinical data. The model was then deployed on AliCloud servers to promote its clinical translation. It is anticipated that this prediction model will provide more accurate decision support for clinical treatment and will serve as a valuable tool for the primary prevention of dementia.

Background

Dementia is a general term used to describe a range of progressive cognitive declines, characterized by the gradual loss of previously acquired cognitive abilities. The primary symptom is the progressive impairment of multiple cognitive functions, including memory, reasoning, judgment, and language [1]. According to estimates by the World Health Organization, between 5 and 8% of individuals aged 60 and above worldwide are affected by dementia. It is estimated that by 2030, the total number of dementia patients worldwide will reach 82 million, and by 2050, it will increase to 152 million [2]. Dementia encompasses more than 100 distinct diseases and conditions, with Alzheimer's disease (AD) representing the most prevalent form, accounting for 60–70% of cases [3].

Although the number of people with dementia is rising, studies have shown that the risk of dementia in some age groups in high-income countries may have actually declined over the past 25 years. This decline is likely due to improved education levels and better control of major cardiovascular risk factors, such as hypertension, diabetes, and hypercholesterolemia [4, 5]. These studies suggest that AD and other forms of dementia are not necessarily an inevitable consequence of aging. It is possible that some individuals may be able to prevent or delay the onset and progression of dementia by modifying their exposure to specific risk factors, such as hypertension, smoking, obesity, and diabetes.

Nevertheless, the current pharmacological treatments for dementia, particularly AD, are not optimal. Although some drugs can improve the symptoms of dementia or AD, they cannot completely halt the progression of the disease [6]. Consequently, the timely identification of individuals at high risk for dementia, along with the implementation of targeted interventions or treatments at an early stage, is crucial. Such approaches are expected to delay the onset of dementia, improve the prognosis for patients, reduce the overall mortality rate, and mitigate the social and familial impacts of the disease.

As research on dementia continues, an increasing number of risk factors associated with the disease have been identified. In recent years, there has been growing interest in developing new models for predicting dementia. In addition to traditional methods, such as logistic regression and the Cox proportional hazards regression model (CoxPH) for establishing dementia risk prediction models [7,8,9], the advancement of artificial intelligence has led to the application of machine learning techniques for the detection and prediction of dementia. These techniques hold great potential for enhancing our understanding of the disease and advancing the fields of psychiatry and neurology [10].

The CoxPH model is a standard semiparametric survival analysis model used to quantify the influence of observed covariates on the risk of an event, such as mortality. The model assumes that a patient's log-risk of an event is a linear combination of the patient's covariates, an assumption referred to as the linear proportional hazards condition [11]. However, in many applications, including the provision of personalized treatment recommendations, the assumption that the log-risk function is linear may be overly simplistic. Therefore, a more comprehensive set of survival models is required to more accurately reflect the nonlinear log-risk functions observed in survival data [12].
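
In standard notation (the symbols below are introduced here for illustration and are not taken from the original text), the Cox model described above can be written as

\[
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\log\frac{h(t \mid x)}{h_0(t)} = \beta^{\top} x ,
\]

where \(h_0(t)\) is an unspecified baseline hazard, \(x\) is the covariate vector, and \(\beta\) is the coefficient vector. The hazard ratio between two participants, \(\exp\!\left(\beta^{\top}(x_a - x_b)\right)\), does not depend on \(t\), which is the proportional hazards property. A nonlinear model of the kind discussed here keeps the same structure but replaces \(\beta^{\top} x\) with a flexible function \(g_\theta(x)\).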

Neural network survival analysis represents the most advanced technology currently available for survival analysis [13]. Notable examples of this include DeepSurv, DeepHit, Logistic-Hazard, and others. The DeepSurv model employs deep learning to express the risk function of sensitive factors as a multilayer perceptron. This approach incorporates additional nonlinear activation functions and dropout techniques, which enhance the model's ability to capture the relationships between variables [12]. The complexity of the model increases when applied to real-world medical data. By considering the interactions between multi-gene information and clinical parameters, the integration of genetic data can be promoted, thereby providing insights for the primary prevention of dementia. Nevertheless, there is a lack of dynamic, personalized dementia risk prediction models that integrate genetic and clinical factors using deep learning survival analysis.
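
As a rough illustration of what a DeepSurv-style network is trained to optimize, the sketch below computes the average negative Cox partial log-likelihood from a list of predicted log-risk scores. It is a minimal pure-Python sketch, not the implementation used in this study; the function name and argument names are hypothetical, and tied event times are handled in the simplest possible way.

```python
import math

def neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative Cox partial log-likelihood.

    risk_scores: predicted log-risk per subject (the network output in a
                 DeepSurv-style model, beta . x in a linear Cox model)
    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        log_sum = math.log(sum(math.exp(h) for h in risk_set))
        total += risk_scores[i] - log_sum
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events
```

The loss is lower when subjects who experience the event earlier receive higher risk scores, so minimizing it pushes the model toward a correct risk ordering.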

The objective of this study was to construct and validate a dynamic, personalized dementia risk prediction model based on the UK Biobank database, which contains large-scale population genetic and clinical data, using the DeepSurv model. This model can assist medical practitioners and clinical teams in more accurately assessing the risk of dementia in patients, thereby facilitating the development of more personalized prevention and treatment plans and providing a reference for early dementia prevention.

Methods

Data source: UK biobank

The UK Biobank is a large-scale, population-based prospective study designed to comprehensively investigate the genetic and non-genetic determinants of disease in middle-aged and older individuals. Its objective is to combine broad and precise exposure assessments with detailed tracking and characterization of numerous health-related outcomes, aiming to contribute to the development of innovative scientific knowledge by optimizing resource utilization. Between 2006 and 2010, the UK Biobank recruited over 500,000 individuals aged 40–69 years. The database enables the tracking of health-related events for all participants through a UK-wide networked system. Additionally, all participants provided written informed consent and were enrolled in the study only after approval from the Northwest Multicenter Research Ethics Committee (11/NW/0382). As a result, approval from the UK Biobank Ethics Committee and the Human Organisms Research Organization Bank meant that independent ethics approval was not required for the resources the researchers wished to use, unless re-engagement of participants was necessary [14, 15].

Inclusion and exclusion criteria

A total of 502,367 participants were initially included in this study. Participants were screened based on the following inclusion and exclusion criteria: Inclusion criteria: 1) Registered to participate in the research between 2006 and 2010; 2) Signed the UK Biobank subject research consent form; 3) Aged 60 years or older. Exclusion criteria: 1) Individuals with a history of all-cause dementia, AD, or vascular dementia (VD); 2) Individuals with incomplete genetic data. Finally, participants without dementia were randomly selected from those who remained dementia-free at follow-up, in a 1:5 ratio, to participate in the modeling. The flowchart for participant selection is shown in Fig. 1, and data from 41,484 participants were ultimately included in the modeling.
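
The 1:5 sampling step can be sketched as follows. This is an illustrative sketch only: the function name, the seed, and the identifier lists are hypothetical, and the study's actual sampling procedure is not described beyond the ratio. The arithmetic at the end uses the 6,914 incident cases reported in the Results and reproduces the stated cohort size.

```python
import random

def sample_controls(case_ids, control_ids, ratio=5, seed=0):
    """Randomly draw `ratio` dementia-free participants per incident case."""
    rng = random.Random(seed)
    n_controls = min(len(control_ids), ratio * len(case_ids))
    return rng.sample(list(control_ids), n_controls)

# Consistency check against the counts reported in this study.
n_cases = 6914                # incident dementia cases (Results)
n_controls = 5 * n_cases      # 1:5 sampling gives 34,570 dementia-free participants
n_total = n_cases + n_controls
print(n_total)                # 41484, the modeling cohort size
```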

Definition of dementia outcome

The determination of the dementia outcome event was based on outcomes defined by the UK Biobank database algorithm and hospital diagnostic records (Field IDs: 42018, 130840, 130842, 42020, 130836, 42022, 130838, and 42024). The main causes of dementia included AD, vascular dementia, and other forms of dementia. Follow-up for each participant ended at the earliest of the dementia outcome event, the recorded date of death, or the date of the final recorded dementia outcome/death in the data (13 November 2021).
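
One way to turn this definition into a time-to-event pair is sketched below. The sketch assumes that death without a prior dementia record is treated as censoring, which the text does not state explicitly; the function name and the baseline date argument are hypothetical.

```python
from datetime import date

CENSOR_DATE = date(2021, 11, 13)  # end of follow-up stated in the text

def follow_up(baseline, dementia_date=None, death_date=None,
              censor_date=CENSOR_DATE):
    """Return (years_of_follow_up, event) for one participant.

    event is 1 if dementia is the earliest end point, otherwise 0
    (censored at death or at the end of follow-up).
    """
    candidates = [(censor_date, 0)]
    if death_date is not None:
        candidates.append((death_date, 0))
    if dementia_date is not None:
        candidates.append((dementia_date, 1))
    # Earliest date wins; on a tie, the dementia event takes precedence.
    end, event = min(candidates, key=lambda c: (c[0], -c[1]))
    years = (end - baseline).days / 365.25
    return years, event
```

For example, a participant enrolled on 1 January 2008 with a dementia record on 1 January 2015 contributes roughly seven years of follow-up with event = 1.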

Comparative analysis of dementia prediction models

Determination of features

Inclusion of features

A total of 364 features (predictor variables), including sociodemographic, family history, physical measurements, genetic data, and others, were initially included in this study. These features consisted primarily of clinically relevant data collected during the participant's baseline visit. Initial data screening was performed to: (1) exclude candidate predictor variables with missing values exceeding 10% of all participants, and (2) manually clean procedural metric variables (e.g., biospecimen processing metrics, diagnostic codes, meter IDs) that were not clinically meaningful. However, relatively lenient inclusion criteria were applied to avoid overlooking potential associations. Ultimately, 213 features were used in the study, including several generated features not directly available from the UK Biobank, notably history of myocardial infarction, history of hypertension, history of stroke, polygenic risk score (PRS) for dementia, and APOE ε4 carrier status.

The diagnosis of myocardial infarction, hypertension, and stroke at the time of study inclusion was further defined based on the timing of these diagnoses relative to participant enrollment. To assess genetic risk for dementia, this study used a polygenic risk score method, which involves a weighted assessment of single nucleotide polymorphisms (SNPs). To minimize the risk of false-positive genetic risk scores, newly identified SNPs associated with AD in the UK Biobank database were not included in the genetic score. Instead, 29 SNP loci strongly associated with AD, as identified in previous genome-wide association studies (GWAS), were selected for this study (Supplementary material Table S1) [16,17,18]. Using the PRS calculation method published in a previous study [19], we calculated each participant's dementia PRS based on their SNPs and the corresponding weights (β coefficients) derived from the GWAS results [17]. Additionally, APOE genotypes were determined using the combined variants of rs429358 and rs7412. Participants carrying at least one APOE ε4 allele were classified as APOE ε4 carriers. All candidate features are listed in Supplementary material Table S2.
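
A weighted score of this kind is commonly computed as the sum of risk-allele counts multiplied by their GWAS effect sizes. The sketch below shows that general form only; the exact method of reference [19] may include steps (such as scaling) that are not reproduced here, and the SNP identifiers and β values in the example are made up for illustration.

```python
def polygenic_risk_score(dosages, betas):
    """Weighted sum of risk-allele dosages.

    dosages: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2)
    betas:   dict mapping SNP id -> GWAS effect size (beta coefficient)
    """
    missing = set(betas) - set(dosages)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(betas[snp] * dosages[snp] for snp in betas)

# Hypothetical three-SNP example (identifiers and weights are not real data).
example_betas = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.05}
example_dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
example_prs = polygenic_risk_score(example_dosages, example_betas)
```

With 29 loci, as in this study, the same function would simply receive 29 entries in each dictionary.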

Feature filtering and missing value imputation

To identify the best subset of predictor features contributing to model performance, we used the Gradient Boosting Machine (GBM) algorithm with default hyperparameters to calculate the importance of each feature based on a feature importance filtering method [20]. At this stage, the data were not preprocessed with multiple imputation or normalization; only the categorical features were converted to factors. As shown in Fig. 2, the top 30 features, ranked by their importance to the GBM model, were selected for the study. Although features with more than 10% missing values were excluded, some participant data still had a very small proportion of missing values. We performed a missing completely at random (MCAR) test, which showed P < 0.001, indicating that the data did not meet the MCAR assumption. Assuming the missing data were missing at random (MAR), we imputed the missing values using Random Forest (RF) multiple imputation (using the mice package) [21]. This choice was made because the RF imputation algorithm (1) is well-suited for data missing at random, (2) can effectively handle both continuous and categorical variables, and most importantly, (3) does not require parametric forms and can effectively account for any non-linear relationships, complex interactions, and high dimensionality in the imputation model [22].

Model development and evaluation

Feature engineering

Based on the DeepSurv and DeepHit neural network models [12, 23], feature engineering is required before formal data analysis. We performed one-hot encoding to transform categorical data and applied feature scaling (standardization) to normalize the data [24, 25]. This process helps prevent certain features from disproportionately influencing the model and improves its stability and convergence speed [26].
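
The two transformations can be sketched in a few lines. This is a minimal pure-Python illustration with hypothetical function names; the study itself was carried out in R, and the exact encoding options it used are not stated.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Map a categorical column to 0/1 indicator columns, one per level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

def standardize(values):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]

housing = one_hot(["own", "rent", "own"])   # {'own': [1, 0, 1], 'rent': [0, 1, 0]}
age_z = standardize([60, 65, 70])           # symmetric around 0
```

After standardization every numeric feature is on a comparable scale, so no single feature dominates the gradient updates merely because of its units.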

Hyperparameter tuning

To optimize the performance and effectiveness of the neural network model, we fine-tuned the parameters of the neural network structure and training process. Hyperparameter tuning was performed, focusing on key parameters such as dropout, weight decay, learning rate, and the number of nodes per layer [23]. To obtain the optimal hyperparameter combination, this study employed an automatic tuner (mlr3 package), which automates the tuning process based on monitored metrics, specifically Harrell's C-index.

The hyperparameter tuning space is as follows:

Dropout is primarily used to address the overfitting problem in neural network models by randomly "dropping" a fraction of nodes during training, preventing them from participating in updates [27]. The value of dropout typically ranges from 0 to 1. In this study, we set the dropout rate between 0 and 0.5.

Weight decay is a regularization technique (L2 regularization) designed to reduce the impact of data noise and model variance by encouraging smaller weight values. This helps mitigate overfitting [28]. In this study, the weight decay parameter was set between 0 and 0.5.

Learning rate controls the step size for parameter updates in each iteration of training [29]. A well-chosen learning rate ensures effective training, avoiding slow convergence (if too small) or instability (if too large). The learning rate was adjusted within a range of 0 to 1 in this study.

Number of nodes per layer refers to the number of units in each hidden layer of the neural network [30]. In this study, the range for the number of nodes was set between 1 and 32.

Hyperparameter tuning strategy

In the hyperparameter tuning process, this study employs a random search strategy with an iteration termination condition set to 60 iterations. Additionally, since both hyperparameter selection and model performance estimation are performed on the same dataset, traditional K-fold cross-validation may result in an overly optimistic evaluation of model performance. To overcome this issue, nested cross-validation techniques are used to address common problems related to overfitting and data bias [31]. In our study, threefold cross-validation was applied to generate different inner training and validation sets (inner resampling), while fivefold cross-validation was used to create different non-test and test sets (outer resampling). This nested sample resampling approach allows for a more accurate evaluation of model performance and facilitates hyperparameter tuning. Furthermore, an early stopping strategy was employed to halt training when model performance stopped improving, preventing overfitting. The Adam optimizer was used to achieve optimal results in a short time [32].
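
The resampling scheme can be sketched as below. This is a simplified pure-Python illustration, not the mlr3 code used in the study: `score_fn` is a hypothetical callback that trains a model with a given configuration and returns a score to maximize (for example Harrell's C-index), and the search space simply copies the ranges quoted earlier in this section.

```python
import random

SEARCH_SPACE = {                  # ranges quoted in the text
    "dropout": (0.0, 0.5),
    "weight_decay": (0.0, 0.5),
    "learning_rate": (0.0, 1.0),
    "nodes_per_layer": (1, 32),   # integer-valued
}

def sample_config(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    cfg = {}
    for name, (lo, hi) in SEARCH_SPACE.items():
        cfg[name] = rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
    return cfg

def k_fold(indices, k, rng):
    """Shuffle indices and yield (train, held_out) pairs for k folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def nested_cv(n, score_fn, n_outer=5, n_inner=3, n_evals=60, seed=0):
    """Return one test score per outer fold."""
    rng = random.Random(seed)
    outer_scores = []
    for non_test, test in k_fold(range(n), n_outer, rng):
        inner_splits = list(k_fold(non_test, n_inner, rng))
        best_cfg, best_score = None, float("-inf")
        for _ in range(n_evals):  # random search over configurations
            cfg = sample_config(rng)
            inner = sum(score_fn(cfg, tr, va) for tr, va in inner_splits) / n_inner
            if inner > best_score:
                best_cfg, best_score = cfg, inner
        # The winning configuration is scored once on the untouched test fold.
        outer_scores.append(score_fn(best_cfg, non_test, test))
    return outer_scores
```

Because the test fold is never seen during the inner search, the five outer scores are not biased by the hyperparameter selection, which is the motivation for nesting given above.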

Comparison of dementia prediction models

To objectively compare the performance of the DeepSurv, DeepHit, Kaplan–Meier estimator, and CoxPH models built on the UK Biobank dataset, a multi-model comparison approach using benchmarking was employed in this study (https://mlr3benchmark.mlr-org.com/index.html). Harrell's C-index: Also known as the concordance index (C-index), this metric is used to assess the predictive ability of a model and reflects its discriminatory power, i.e., the model's ability to rank participants correctly by risk [33]. D-Calibration Survival Measure: This measure indicates whether the probability estimates produced by the model's predictions are meaningful. It evaluates the calibration of the models by calculating their calibration level and comparing them to determine which model is better calibrated [34]. The formula for the calibration statistic s is as follows: \(s=\frac{B}{n}\sum_{i=1}^{B}\left(P_{i}-\frac{n}{B}\right)^{2}\), where B denotes the number of 'buckets' that divide [0, 1] into equal-width intervals, n denotes the number of predictions, and \(P_{i}\) denotes the number of participants whose predicted survival probability at their event (death or illness) time falls in the ith interval ([0, 1/B), [1/B, 2/B), …, [(B − 1)/B, 1]). In this method, the degree of calibration is assessed by calculating the statistic s. If \(s_{i} < s_{j}\), model i is considered better calibrated than model j. Conversely, if \(s_{j} < s_{i}\), model j is considered better calibrated than model i.
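
Both metrics can be illustrated with a short sketch. The code below is a simplified pure-Python version with hypothetical function names: the C-index uses the usual comparable-pair definition with risk ties counted as 0.5, and the calibration statistic assumes B equal-width buckets on [0, 1] and ignores censored participants, which a full D-calibration implementation would handle separately.

```python
def harrell_c_index(risk_scores, times, events):
    """Fraction of comparable pairs ranked correctly (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

def d_calibration_statistic(surv_at_event, n_buckets=10):
    """s = B/n * sum_i (P_i - n/B)^2 over B equal-width buckets of [0, 1].

    surv_at_event: predicted survival probability at the observed event time,
                   one value per participant with an observed event.
    """
    n = len(surv_at_event)
    counts = [0] * n_buckets
    for p in surv_at_event:
        bucket = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes last
        counts[bucket] += 1
    expected = n / n_buckets
    return n_buckets / n * sum((c - expected) ** 2 for c in counts)
```

A model that assigns every participant the same prediction, as the Kaplan–Meier estimator does, produces only tied pairs and therefore a C-index of exactly 0.5. For the calibration statistic, predictions spread evenly across the buckets give s = 0, and s grows as the predictions pile up in a few buckets.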

Development and interpretation of the DeepSurv dementia prediction model

In this study, the performance of the DeepSurv, DeepHit, Kaplan–Meier estimator, and CoxPH models was compared. The final selection of the DeepSurv model as the optimal choice for building the dementia prediction model, named the DeepSurv Dementia Prediction Model, was based on the results of two assessment metrics: Harrell's C-index and the D-Calibration Survival Measure.

The global interpretation of the DeepSurv Dementia Prediction Model is as follows: Brier score (BS): The BS measures the accuracy of probabilistic predictions, serving as an indicator of the model's calibration. It assesses the discrepancy between the predicted probabilities and the actual outcomes. The BS is one of the most commonly used evaluation metrics for this purpose [35]. Consequently, the importance of a feature in the model can be assessed by examining its impact on the model's calibration level. Partial dependence plot (PDP): A PDP is a tool used in machine learning for model interpretation. It illustrates the marginal effect of a particular feature on the prediction of a model, while accounting for the effects of all other features. PDPs offer visualizations of the relationship between the outcome and the feature, clearly showing whether the relationship is linear, monotonic, or more complex [36, 37].
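
The two interpretation tools can be sketched generically. The code below assumes that Brier-score-based importance is computed by permutation (shuffling one feature and measuring the increase in Brier score), which is a common approach but is not spelled out in the text; it also uses a plain binary Brier score at a fixed horizon rather than a censoring-adjusted survival version. `predict` is a hypothetical function that maps one participant's feature dictionary to a predicted event probability.

```python
import random

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def permutation_importance(predict, rows, outcomes, feature, seed=0):
    """Increase in Brier score after shuffling one feature across rows."""
    rng = random.Random(seed)
    base = brier_score([predict(r) for r in rows], outcomes)
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    permuted = [{**row, feature: v} for row, v in zip(rows, shuffled)]
    return brier_score([predict(r) for r in permuted], outcomes) - base
```

Under this reading, a feature the model ignores has an importance of zero, while a feature the model relies on raises the Brier score when shuffled. The partial dependence curve averages over the observed values of all other features, so it shows the marginal effect of one feature.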

Creation and implementation of the UKB-DementiaPre tool

To facilitate the wider dissemination of the DeepSurv dementia prediction model developed in this study, we chose to deploy the model on an AliCloud server and named it the UKB-DementiaPre Tool. This will allow a larger number of individuals to utilize the application, benefiting clinical practices and providing a foundation for predicting dementia risk in the population. The model can be accessed via the provided link or QR code associated with the UKB-DementiaPre Tool.

Statistical analyses

The baseline features (the top 30 most important features) of all participants included in the study were statistically analyzed. The Kolmogorov–Smirnov test, suitable for testing the normality of large sample-size data [38], was employed to assess the data distribution of the two groups: participants diagnosed with dementia and those without a dementia diagnosis. Normally distributed data exhibit symmetry, where the mean effectively represents the central tendency, and variance measures the degree of dispersion. Conversely, non-normally distributed data tend to display skewness or extreme values; in such cases, the median is a more robust measure of central tendency, unaffected by outliers, while quartiles provide a reliable measure of dispersion [39]. As a result, numerical data for features are expressed as mean ± standard deviation for normally distributed variables and as median (interquartile range, IQR) for non-normally distributed variables. Categorical data are presented as frequencies and proportions. Additionally, the chi-square test and Wilcoxon rank-sum test (using the epiDisplay package) were employed to compare features between participants with and without a dementia diagnosis during the follow-up period. All statistical analyses were conducted using R software (version 4.1.0). All tests were two-sided, and results were considered statistically significant when P < 0.05 [40].

Results

Baseline characteristics

Among the 41,484 UK Biobank participants included in the study, a total of 6,914 were newly diagnosed with dementia during follow-up. The mean follow-up period was 12.6 years. A statistical summary of the baseline data is presented in Table 1. Participants with new-onset dementia were found to have a higher proportion of APOE ε4 carriers at baseline, were older at enrollment, and had a higher proportion receiving attendance, disability, or mobility allowances. They also had a higher proportion of individuals with a long-standing illness, disability, or infirmity. Additionally, these participants had higher proportions with a history of diabetes and a history of stroke, slower cognitive function reaction times, a higher prevalence of family history of AD or dementia, Parkinson’s disease, and a higher PRS for dementia. They also had a lower average total household income before tax and a lower proportion of homeownership.

Table 1 Descriptive statistics of baseline characteristics for UK Biobank participants included in the study


Performance evaluation of dementia prediction models

As illustrated in Fig. 3, each data point represents the outcome of a specific evaluation of the DeepSurv, DeepHit, Kaplan–Meier estimator, and CoxPH models, with five evaluations conducted for each model. A greater divergence in the distribution of the data points or a wider span of the confidence intervals for each model indicates a more pronounced discrepancy in the results of the five model performance assessments. Figure 3A shows that the mean Harrell's C-index for the DeepSurv, DeepHit, CoxPH, and Kaplan–Meier models were 0.743, 0.633, 0.749, and 0.500, respectively. This suggests that, among the four models, the DeepSurv and CoxPH models—both designed to predict the onset of dementia—demonstrated superior discriminatory ability. Figure 3B presents the average D-Calibration Survival Measure for the DeepSurv, DeepHit, CoxPH, and Kaplan–Meier models, which were 6.014, 4408.086, 32,274.743, and 1.508, respectively. These results indicate that the DeepSurv model exhibited superior calibration and provided more meaningful probability estimates for model predictions. In contrast, the CoxPH model showed inferior calibration in predicting dementia onset. Therefore, to achieve an optimal balance between discriminative ability and calibration, the DeepSurv model was ultimately selected as the most suitable for constructing the dementia prediction model in this study. Additionally, the optimal hyperparameters for the DeepSurv model were extracted, and the learner parameters were updated accordingly.

Global interpretation of the DeepSurv dementia prediction model

To assess the importance of features (predictor variables) in the DeepSurv dementia prediction model, the Brier score (BS) was initially employed in this study. A higher Brier score for a feature indicates that the feature is more important to the model. As shown in Fig. 4, the final 30 features included in the DeepSurv model were ranked based on their relative importance. The results revealed that APOE ε4 carriage, age, cognitive function-reaction time, history of diabetes mellitus, family history (mother's disease), and consumption of eggs, dairy, wheat, and sugar, as well as the percentage of right leg fat, were among the most influential variables in the model.

To further explore the relationship between individual features and dementia over time, a PDP was employed. As illustrated in Fig. 5, the following key observations were made:
1) Age: Older participants had a higher risk of developing dementia.
2) Genetic factors: APOE ε4 carriers exhibited a higher PRS for dementia. Participants with these genetic features were at increased risk if their mothers had AD/dementia, chronic bronchitis/emphysema, or major depression. Chronic illnesses and underlying physical conditions were also identified as risk factors.
3) Medical history and physical factors: The following factors were identified to be associated with an increased risk of dementia: history of diabetes, history of stroke, lower right leg fat ratio or left leg fat mass, receipt of attendance/disability/mobility allowance, long-term illness with disability or infirmity, living in sheltered accommodation or care homes, a higher number of self-reported non-cancer illnesses, more falls in the past year, use of prescription medication, lower peak expiratory flow rate, and doctor's diagnosis of other serious illness/disability. The presence of these features indicated a higher risk of dementia.
4) Economic factors: Lower average gross pre-tax household income, renting without owning a home, and being unemployed or unable to work due to illness or disability were associated with a higher risk of dementia. Additionally, a longer cognitive function-response time was linked to an increased dementia risk. Conversely, participants who frequently or mostly drove at high speeds on the motorway exhibited a lower risk of dementia.
5) Dietary and lifestyle features: Participants who consumed eggs (or egg-containing products), dairy products, wheat, and sugar, or who did not consume sugar or sugar-containing foods or drinks, exhibited a lower risk of dementia. Conversely, participants with longer daily TV time or engaged in physically demanding manual work (e.g., carpentry, digging) were found to have a higher risk of dementia. Interestingly, participants who consumed more alcohol 10 years ago (compared to their current intake) had a lower risk of dementia. No significant association was found between changes in alcohol intake (more, about the same, or less) and dementia risk.
6) Psychiatric factors: Participants who experienced no moodiness, nervousness, or lack of interest over the past two weeks were found to have a lower risk of dementia.

Deployment of the UKB-DementiaPre tool

To facilitate the clinical translation of the established predictive models, the UKB-DementiaPre Tool can be accessed via the following link: http://8.137.113.161:3838/UKBDementiaPre/ or by scanning the UKB-DementiaPre Tool QR code (Supplementary Material, Figure S1). The layout of the UKB-DementiaPre Tool page is shown in Fig. 6. Additionally, a concise overview of how to use the UKB-DementiaPre Tool is provided in Supplementary Material, Introduction 1.

Discussion

The primary findings of this study are as follows: Neural network survival analysis represents the state-of-the-art technique for survival analysis. In this study, a dynamic, individualized dementia risk prediction model for individuals aged 60 and above was developed using the DeepSurv model. The model was based on data from 41,484 participants with a mean follow-up of 12.6 years from the UK Biobank, incorporating both genetic and clinical factors. To make the DeepSurv dementia prediction model clinically applicable, we developed and deployed it on an AliCloud server, where it can be accessed via the provided link or QR code. Additionally, this study identified the top 30 features out of 213 that were most important to the model. A global interpretation of these 30 features was provided, offering deeper insights into the relationship between these features and dementia. This understanding is expected to aid in the early identification and prevention of dementia risk.

Survival analysis is a common method for analyzing medical time-to-event data. It is primarily used to examine statistical patterns of events (such as recurrence, death, or cure) over time in longitudinal studies. Through survival analysis, potential sensitive or risk factors can be further identified [41, 42]. The CoxPH model and the Kaplan–Meier estimator are traditional survival analysis models, while the DeepSurv and DeepHit models are survival analysis models based on deep learning techniques [43]. DeepSurv is a nonlinear version of the CoxPH model that leverages deep learning techniques. It is a neural network designed to predict the effect of patient covariates on their hazard rates by learning network weights [44]. Moreover, the DeepSurv model integrates deep learning concepts with the CoxPH model, expressing the sensitive factor risk function as a multilayer perceptron and incorporating additional nonlinear activation functions and techniques such as dropout [12]. In this study, the mean Harrell's C-index values for the DeepSurv, DeepHit, CoxPH, and Kaplan–Meier models were 0.743, 0.633, 0.749, and 0.500, respectively. The average D-Calibration Survival Measures for the four models were 6.014, 4408.086, 32,274.743, and 1.508, respectively. These results demonstrated that the DeepSurv model outperformed the other models in balancing discriminative ability and calibration. Standardization and Min–Max scaling are two commonly used scaling methods in machine learning. Standardization transforms the data into a distribution with a mean of 0 and a standard deviation of 1, which helps prevent certain features from disproportionately influencing the model. It also improves the stability and convergence speed of the model [26]. Min–Max scaling scales the data to a specified range (usually [0, 1]), transforming each feature’s minimum value to 0 and maximum value to 1.
Although Min–Max scaling ensures all features are on the same scale, it does not handle outliers well [45]. In this study, we chose Standardization based on the meaning and characteristics of the data in clinical contexts. We acknowledge that other scaling methods could potentially improve model performance; however, due to the time-consuming nature and high computational cost of model training and comparison, we did not further evaluate the impact of different scaling methods on model performance. Future research should explore this aspect further. The efficacy and functionality of neural networks depend not only on the network configuration and parameters established during training but also on the calibration of hyperparameters [46]. Commonly used hyperparameters in neural networks include dropout, weight decay, learning rate, and the number of nodes per layer [23, 47]. For model training in this study, we employed three-fold cross-validation to generate distinct inner training and validation sets (inner layer resampling), and five-fold cross-validation for different non-test and test sets (outer layer resampling). This approach allowed for a more precise evaluation of model performance and facilitated hyperparameter tuning.

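The nested resampling scheme described above (five outer folds, three inner folds) can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `k_fold_indices` and `nested_cv_splits`, the seed handling, and the sample size of 30 are hypothetical and do not reproduce the study's actual pipeline.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (test, inner_splits) for each outer fold.

    inner_splits is a list of (train, valid) index lists built only
    from the non-test data, so the test fold never informs tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, test in enumerate(outer_folds):
        non_test = [idx for j, fold in enumerate(outer_folds)
                    if j != i for idx in fold]
        inner_folds = k_fold_indices(non_test, inner_k, seed + i + 1)
        inner_splits = []
        for m, valid in enumerate(inner_folds):
            train = [idx for n, fold in enumerate(inner_folds)
                     if n != m for idx in fold]
            inner_splits.append((train, valid))
        yield test, inner_splits


splits = list(nested_cv_splits(n_samples=30))
for test, inner_splits in splits:
    print(len(test), [(len(tr), len(va)) for tr, va in inner_splits])
```

Hyperparameters would be selected on the inner validation folds, and the selected configuration would then be scored once on the corresponding outer test fold.
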
For the established DeepSurv dementia prediction model, interpreting the model features is crucial. To this end, this study employed the BS to assess feature importance. Additionally, PDP was used to clarify the nature of the relationship between features and outcomes, whether linear, monotonic, or more complex.

Age is widely recognized as the most significant risk factor for dementia [48]. Dementia, particularly AD, results from a combination of genetic and environmental factors [49]. The PRS, which aggregates the effects of numerous disease-related genetic variants into a single score, has shown predictive value for a range of prevalent conditions, including dementia [50, 51]. This study found that participants whose mothers had AD exhibited an elevated risk of developing dementia, a finding consistent with previous research [52,53,54]. Additionally, a strong correlation exists between chronic illnesses, physical frailty, and the risk of dementia [55, 56]. A history of diabetes was identified as a significant risk factor for dementia among the participants [57, 58]. Moreover, a history of stroke was found to be a significant risk factor for dementia. Post-stroke cognitive impairment, which occurs between three and six months after a stroke, is characterized by specific regional cognitive deficits related to the location of stroke damage [59]. The results of this study further confirmed a significant correlation between poor physical health and an elevated risk of dementia. These findings are consistent with previous studies, which have shown that poor physical health and the presence of multiple health conditions are associated with an increased risk of dementia [60]. A higher self-reported number of non-cancerous diseases and a higher prevalence of other severe conditions or disabilities diagnosed by a medical professional were significantly associated with an increased risk of dementia. Prior research has established that the presence of multiple diseases is associated with an elevated risk of developing dementia, AD, and vascular dementia (VD). Furthermore, a robust correlation exists between economic status and the likelihood of developing dementia [61]. Previous work has also shown that an increase in reaction time variability or a lengthening of the mean reaction time is associated with an elevated risk of developing dementia within the subsequent four years [62]. A more detailed discussion of the possible mechanisms by which the features in this dementia prediction model are associated with dementia can be found in Supplementary material, Discussion 1.
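
As a rough illustration of how a partial dependence plot is computed, the sketch below averages a model's predictions over the data while one feature is forced to each value on a grid. Everything here is an assumption made for illustration: `toy_risk` is a hypothetical stand-in for a fitted model (it is not the DeepSurv model), and the coefficients, feature names, and rows are invented.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over rows with `feature` set to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_risk(row):
    """Hypothetical risk function standing in for a fitted model."""
    z = 0.08 * (row["age"] - 65) + 0.9 * row["diabetes"]
    return 1.0 / (1.0 + math.exp(-z))


rows = [
    {"age": 61, "diabetes": 0},
    {"age": 66, "diabetes": 1},
    {"age": 72, "diabetes": 0},
]
age_grid = [60, 65, 70, 75]
pdp_age = partial_dependence(toy_risk, rows, "age", age_grid)
print(list(zip(age_grid, pdp_age)))
```

Plotting the grid values against the averaged predictions gives the PDP curve, whose shape indicates whether the modeled relationship is linear, monotonic, or more complex.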

To facilitate the clinical utilization of the DeepSurv dementia prediction model developed in this study, we have deployed the trained model on Alibaba Cloud servers. It can be accessed via a link or QR code. The development and deployment of this application will support its use in clinical settings and serve as a valuable tool for predicting dementia risk in the population.

Strengths and limitations

Main strengths: 1) This study is based on data from 41,484 participants in the UK Biobank, with an average follow-up time of 12.6 years. By combining rich genetic and clinical data, the study employs advanced deep learning survival analysis techniques to establish a dynamic and personalized dementia risk prediction model for individuals aged 60 and above. 2) To facilitate clinical translation, the DeepSurv dementia prediction model developed in this study is deployed on Alibaba Cloud servers and can be accessed for free through a link or QR code. 3) This study identified the top 30 features from a total of 364 candidate features and provided a global explanation of the model, offering valuable insights for future research on the primary prevention of dementia.

Limitations of this study: 1) The UK Biobank cohort predominantly represents European populations, mainly of white ethnicity. While most of the characteristics (predictor variables) included in the model are well-established factors influencing dementia risk, the DeepSurv dementia prediction model developed in this study may not be directly applicable to populations in other countries or regions. When used in different regions, some variables may need to be adjusted and further validated based on local demographics. 2) The age of the population included in this study was 60 years and above, which limits the applicability of the DeepSurv dementia prediction model to individuals outside this age range. 3) Although the UK Biobank’s dementia diagnoses are derived from hospital records and are updated dynamically, some participants may not have received regular or timely medical treatment. This limitation may affect the accuracy and generalizability of the model’s training data.

Conclusion

This study successfully developed and validated the DeepSurv dementia prediction model for individuals aged 60 and above by integrating genetic and clinical data. The model was then deployed on AliCloud servers to facilitate clinical translation. It is anticipated that this prediction model will provide more accurate treatment decision support in clinical practice and serve as a valuable reference for the primary prevention of dementia.

Data availability

The data underpinning our study's findings are accessible from the UK Biobank. However, due to a rigorous approval process, access to these data is restricted, and they are not publicly available.

Abbreviations

AD: Alzheimer's disease
BS: Brier Score
C-index: Concordance index
CoxPH: Cox proportional hazards model
DeepSurv: Deep Learning Survival Analysis
GBM: Gradient Boosting Machine
GWAS: Genome-Wide Association Studies
IV: Instrumental Variable
IVW: Inverse variance weighted
LightGBM: Light Gradient Boosting Machine
NN: Neural network
PDP: Partial Dependence Plot
PRS: Polygenic risk score
SNPs: Single nucleotide polymorphisms

References

1. Fong TG, Inouye SK. The inter-relationship between delirium and dementia: the importance of delirium prevention. Nat Rev Neurol. 2022;18:579–96.
2. Heng X, Liu X, Li N, Lin J, Zhou X. Spatial disparity and factors associated with dementia mortality: A cross-sectional study in Zhejiang Province, China. Front Public Health. 2023;11:1100960.
3. Page A, Potter K, Clifford R, McLachlan A, Etherton-Beer C. Prescribing for Australians living with dementia: study protocol using the Delphi technique. BMJ Open. 2015;5:e008048.
4. Langa KM, Larson EB, Crimmins EM, Faul JD, Levine DA, Kabeto MU, et al. A Comparison of the Prevalence of Dementia in the United States in 2000 and 2012. JAMA Intern Med. 2017;177:51–8.
5. Wu YT, Fratiglioni L, Matthews FE, Lobo A, Breteler MM, Skoog I, et al. Dementia in western Europe: epidemiological evidence and implications for policy making. Lancet Neurol. 2016;15:116–24.
6. Tao M, Liu H, Cheng J, Yu C, Zhao L. Motor-Cognitive Interventions May Effectively Improve Cognitive Function in Older Adults with Mild Cognitive Impairment: A Randomized Controlled Trial. Behav Sci (Basel). 2023;13:737.
7. Walters K, Hardoon S, Petersen I, Iliffe S, Omar RZ, Nazareth I, et al. Predicting dementia risk in primary care: development and validation of the Dementia Risk Score using routinely collected data. BMC Med. 2016;14:6.
8. Park KM, Sung JM, Kim WJ, An SK, Namkoong K, Lee E, et al. Population-based dementia prediction model using Korean public health examination data: A cohort study. PLoS One. 2019;14:e0211957.
9. Wang L, Li P, Hou M, Zhang X, Cao X, Li H. Construction of a risk prediction model for Alzheimer’s disease in the elderly population. BMC Neurol. 2021;21:271.
10. Merkin A, Krishnamurthi R, Medvedev ON. Machine learning, artificial intelligence and the prediction of dementia. Curr Opin Psychiatr. 2022;35:123–9.
11. Li W, Lin S, He Y, Wang J, Pan Y. Deep learning survival model for colorectal cancer patients (DeepCRC) with Asian clinical data compared with different theories. Arch Med Sci. 2023;19:264–9.
12. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18:24.
13. Steinfeldt J, Buergel T, Loock L, Kittner P, Ruyoga G, Zu BJ, et al. Neural network-based integration of polygenic and clinical information: development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort. Lancet Digit Health. 2022;4:e84–94.
14. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779.
15. Raisi-Estabragh Z, Petersen SE. Cardiovascular research highlights from the UK Biobank: opportunities and challenges. Cardiovasc Res. 2020;116:e12–5.
16. Marioni RE, Harris SE, Zhang Q, McRae AF, Hagenaars SP, Hill WD, et al. GWAS on family history of Alzheimer’s disease. Transl Psychiat. 2018;8:99.
17. Jansen IE, Savage JE, Watanabe K, Bryois J, Williams DM, Steinberg S, et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat Genet. 2019;51:404–13.
18. Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol. 2021;89:177–81.
19. Fan M, Sun D, Zhou T, Heianza Y, Lv J, Li L, et al. Sleep patterns, genetic susceptibility, and incident cardiovascular disease: a prospective study of 385 292 UK biobank participants. Eur Heart J. 2020;41:1182–9.
20. Sharma A, Verbeke W. Understanding importance of clinical biomarkers for diagnosis of anxiety disorders using machine learning models. PLoS One. 2021;16:e0251365.
21. WS Miceforest. GitHub. https://github.com/AnotherSamWilson/miceforest. 2021.
22. Wang Q, Hall GJ, Zhang Q, Comella S. Predicting implementation of response to intervention in math using elastic net logistic regression. Front Psychol. 2024;15:1410396.
23. Prasanna C, Realmuto J, Anderson A, Rombokas E, Klute G. Using Deep Learning Models to Predict Prosthetic Ankle Torque. Sensors (Basel). 2023;23:7712.
24. Huang D, Chen K, Song B, Wei Z, Su J, Coenen F, et al. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res. 2022;50:10290–310.
25. Kang IA, Njimbouom SN, Kim JD. Optimal Feature Selection-Based Dental Caries Prediction Model Using Machine Learning for Decision Support System. Bioengineering (Basel). 2023;10:245.
26. Liu Y, Fan L, Wang L. Urban virtual environment landscape design and system based on PSO-BP neural network. Sci Rep-UK. 2024;14:13747.
27. Rozet A, Kronish IM, Schwartz JE, Davidson KW. Using Machine Learning to Derive Just-In-Time and Personalized Predictors of Stress: Observational Study Bridging the Gap Between Nomothetic and Ideographic Approaches. J Med Internet Res. 2019;21:e12910.
28. Sanga P, Singh J, Dubey AK, Khanna NN, Laird JR, Faa G, et al. DermAI 1.0: A Robust, Generalized, and Novel Attention-Enabled Ensemble-Based Transfer Learning Paradigm for Multiclass Classification of Skin Lesion Images. Diagnostics. 2023;13:3159.
29. Yang W, Zhang X, Lei Q, Cheng X. Research on Longitudinal Active Collision Avoidance of Autonomous Emergency Braking Pedestrian System (AEB-P). Sensors (Basel). 2019;19:4671.
30. Nguyen TP, Cho MY. Insulator Leakage Current Prediction Using Hybrid of Particle Swarm Optimization and Gene Algorithm-Based Neural Network and Surface Spark Discharge Data. Comput Intel Neurosc. 2022;2022:6379141.
31. Jiao SJ, Liu LY, Liu Q. A Hybrid Deep Learning Model for Recognizing Actions of Distracted Drivers. Sensors (Basel). 2021;21:7424.
32. Kingma DP, Ba JL. ADAM: A method for stochastic optimization. Cornell University - arXiv. 2014.
33. Harrell FJ, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA. 1982;247:2543–6.
34. Haider H, Hoehn B, Davis S, Greiner R. Effective Ways to Build and Evaluate Individual Survival Distributions. J Mach Learn Res. 2020;21:1–63.
35. Trigg LE, Lyons S, Mullan S. Risk factors for, and prediction of, exertional heat illness in Thoroughbred racehorses at British racecourses. Sci Rep-UK. 2023;13:3063.
36. Tsuzuki S, Fujitsuka N, Horiuchi K, Ijichi S, Gu Y, Fujitomo Y, et al. Factors associated with sufficient knowledge of antibiotics and antimicrobial resistance in the Japanese general population. Sci Rep-UK. 2020;10:3502.
37. Zhou Q, Soldat DJ. Creeping Bentgrass Yield Prediction With Machine Learning Models. Front Plant Sci. 2021;12:749854.
38. Salwa M, Islam S, Tasnim A, Al MM, Bhuiyan MR, Choudhury SR, et al. Health Literacy Among Non-Communicable Disease Service Seekers: A Nationwide Finding from Primary Health Care Settings of Bangladesh. Health Lit Res Pract. 2024;8:e12–20.
39. Zou X, Ren Y, Yang H, Zou M, Meng P, Zhang L, et al. Screening and staging of chronic obstructive pulmonary disease with deep learning based on chest X-ray images and clinical parameters. BMC Pulm Med. 2024;24:153.
40. Sayed HY, Ghaly RM, Mostafa AA, Hemeda MS. Cardiovascular effects and clinical outcomes in acute opioid toxicity: A case-control study from Port Said and Damietta Governorates Egypt. Toxicol Rep. 2024;13:101756.
41. Johnson LL, Shih JH. CHAPTER 20 - An introduction to survival analysis. Academic Press; 2007. p. 273–82. https://doi.org/10.1016/B978-012369440-9/50024-4.
42. Bashiri A, Ghazisaeedi M, Safdari R, Shahmoradi L, Ehtesham H. Improving the Prediction of Survival in Cancer Patients by Using Machine Learning Techniques: Experience of Gene Expression Data: A Narrative Review. Iran J Public Health. 2017;46:165–72.
43. Feng J, Zhang H, Li F. Investigating the relevance of major signaling pathways in cancer survival using a biologically meaningful deep learning model. BMC Bioinformatics. 2021;22:47.
44. Chen JB, Yang HS, Moi SH, Chuang LY, Yang CH. Identification of mortality-risk-related missense variant for renal clear cell carcinoma using deep learning. Ther Adv Chronic Dis. 2021;12:1755284400.
45. Kaur G, Rana PS, Arora V. State-of-the-art techniques using pre-operative brain MRI scans for survival prediction of glioblastoma multiforme patients and future research directions. Clin Transl Imaging. 2022;10:355–89.
46. Surianarayanan C, Lawrence JJ, Chelliah PR, Prakash E, Hewage C. A Survey on Optimization Techniques for Edge Artificial Intelligence (AI). Sensors (Basel). 2023;23:1279.
47. Lin Y, Zhang W, Cao H, Li G, Du W. Classifying Breast Cancer Subtypes Using Deep Neural Networks Based on Multi-Omics Data. Genes (Basel). 2020;11:888.
48. Fayosse A, Nguyen DP, Dugravot A, Dumurgier J, Tabak AG, Kivimaki M, et al. Risk prediction models for dementia: role of age and cardiometabolic risk factors. BMC Med. 2020;18:107.
49. Xu W, Tan L, Wang HF, Jiang T, Tan MS, Tan L, et al. Meta-analysis of modifiable risk factors for Alzheimer’s disease. J Neurol Neurosur PS. 2015;86:1299–306.
50. Lambert SA, Abraham G, Inouye M. Towards clinical utility of polygenic risk scores. Hum Mol Genet. 2019;28:R133–42.
51. Chen H, Chen J, Cao Y, Sun Y, Huang L, Ji JS, et al. Sugary beverages and genetic risk in relation to brain structure and incident dementia: a prospective cohort study. Am J Clin Nutr. 2023;117:672–80.
52. Edland SD, Silverman JM, Peskind ER, Tsuang D, Wijsman E, Morris JC. Increased risk of dementia in mothers of Alzheimer’s disease cases: evidence for maternal inheritance. Neurology. 1996;47:254–6.
53. Gomez-Tortosa E, Barquero MS, Baron M, Sainz MJ, Manzano S, Payno M, et al. Variability of age at onset in siblings with familial Alzheimer disease. Arch Neurol. 2007;64:1743–8.
54. Oh DJ, Bae JB, Lipnicki DM, Han JW, Sachdev PS, Kim TH, et al. Parental history of dementia and the risk of dementia: A cross-sectional analysis of a global collaborative study. Psychiat Clin Neuros. 2023;77:449–56.
55. Shang X, Roccati E, Zhu Z, Kiburg K, Wang W, Huang Y, et al. Leading mediators of sex differences in the incidence of dementia in community-dwelling adults in the UK Biobank: a retrospective cohort study. Alzheimers Res Ther. 2023;15:7.
56. Zhang JJ, Wu ZX, Tan W, Liu D, Cheng GR, Xu L, et al. Associations among multidomain lifestyles, chronic diseases, and dementia in older adults: a cross-sectional analysis of a cohort study. Front Aging Neurosci. 2023;15:1200671.
57. Ninomiya T. Diabetes mellitus and dementia. Curr Diabetes Rep. 2014;14:487.
58. Chatterjee S, Peters SA, Woodward M, Mejia AS, Batty GD, Beckett N, et al. Type 2 Diabetes as a Risk Factor for Dementia in Women Compared With Men: A Pooled Analysis of 2.3 Million People Comprising More Than 100,000 Cases of Dementia. Diabetes Care. 2016;39:300–7.
59. Rost NS, Brodtmann A, Pase MP, van Veluw SJ, Biffi A, Duering M, et al. Post-Stroke Cognitive Impairment and Dementia. Circ Res. 2022;130:1252–71.
60. Minami Y, Tsuji I, Fukao A, Hisamichi S, Asano H, Sato M, et al. Physical status and dementia risk: a three-year prospective study in urban Japan. Int J Soc Psychiatr. 1995;41:47–54.
61. Cooper C, Lodwick R, Walters K, Raine R, Manthorpe J, Iliffe S, et al. Inequalities in receipt of mental and physical healthcare in people with dementia in the UK. Age Ageing. 2017;46:393–400.
62. Kochan NA, Bunce D, Pont S, Crawford JD, Brodaty H, Sachdev PS. Reaction Time Measures Predict Incident Dementia in Community-Living Older Adults: The Sydney Memory and Ageing Study. Am J Geriat Psychiat. 2016;24:221–31.

Funding

This study was funded by the Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization (Grant No. 2021B1212040007) and the Special Projects for Scientific and Technological Research in Chinese Medicine and Ethnomedicine (Grant No. QZYY-2024-035).

Author information

Shiqi Yuan and Qing Liu contributed equally to this work.

Authors and Affiliations

Department of Neurology, The First Affiliated Hospital of Jinan University, No.613, Huangpu Road West, Guangzhou, Guangdong Province, 510630, China
Shiqi Yuan, Xiaxuan Huang, Shanyuan Tan, Zihong Bai & Yitong Ling
Department of Neurology, The Second People’s Hospital of Guiyang (The Affiliated Jinyang Hospital of Guizhou Medical University), Guiyang, Guizhou Province, 550000, China
Shiqi Yuan, Qing Liu, Juan Yu, Fazhen Lei, Huan Le, Qingqing Ye, Xiaoxue Peng & Juying Yang
Department of Clinical Research, The First Affiliated Hospital of Jinan University, No.613, Huangpu Road West, Guangzhou, Guangdong Province, 510630, China
Jun Lyu
Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, Guangdong, 510630, China
Jun Lyu

Contributions

Jun Lyu, Yitong Ling, and Qing Liu designed the research. Shiqi Yuan, Qing Liu, and Xiaxuan Huang wrote the first draft of the manuscript. Shiqi Yuan, Yitong Ling, Xiaxuan Huang, Shanyuan Tan, and Zihong Bai participated in the data analysis. Juan Yu, Fazhen Lei, Huan Le, Qingqing Ye, Xiaoxue Peng, and Juying Yang provided comments on and revisions to the manuscript. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Yitong Ling or Jun Lyu.

Ethics declarations

Ethics approval and consent to participate

All procedures in the UK Biobank study were conducted in accordance with ethical standards at both the institutional and national levels, as well as the Helsinki Declaration of 1975 (revised in 2008) (5). All participants provided written informed consent and were enrolled in the study only after approval from the Northwest Multicenter Research Ethics Committee (11/NW/0382).

Consent for publication

All authors gave their consent for the publication of this article.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yuan, S., Liu, Q., Huang, X. et al. Development of an individualized dementia risk prediction model using deep learning survival analysis incorporating genetic and environmental factors. Alz Res Therapy 16, 278 (2024). https://doi.org/10.1186/s13195-024-01663-w

Received: 13 October 2024
Accepted: 20 December 2024
Published: 30 December 2024
DOI: https://doi.org/10.1186/s13195-024-01663-w
