Artificial intelligence for neurodegenerative experimental models
Alexi Nott, Carlo Sala Frigerio and Sandrine Willaime-Morawek contributed equally to this study.
Janice M. Ranson and David J. Llewellyn are considered as joint senior authors.
Abstract
INTRODUCTION
Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials.
METHODS
Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research.
RESULTS
Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross-model reproducibility and translation to human biology, while sustaining biological interpretability.
DISCUSSION
AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data.
Highlights
- There are increasing applications of AI in experimental medicine.
- We identified issues in reproducibility, cross-species translation, and data curation in the field.
- Our review highlights data resources and AI approaches as solutions.
- Multi-omics analysis with AI offers exciting future possibilities in drug discovery.
1 INTRODUCTION
The past decades have seen a steep rise in the availability of quantitative biological data within the context of experimental medicine.1 Preclinical experimental models for dementia research are no exception, with large amounts of genomic, cellular, and functional phenotyping data generated and released in relation to neurodegenerative diseases.2 As a testing ground for biological hypotheses and novel drugs, these models are of crucial importance to the field. A multitude of studies are published each year, with in vitro work spanning cell culture,3-5 induced pluripotent stem cell (iPSC)-derived cultures,6-8 organotypic slice cultures,9-11 and organoids,12-15 while most preclinical research using in vivo model systems focuses on rodents, including transgenic animals,16 knock-ins,17, 18 exposure models,19, 20 and, more recently, multi-species models, such as human-mouse chimeras.21 Other model species such as non-human primates offer some advantages over murine models due to their phylogenetic similarity with humans, longer lifespans, and natural presentation of histological, neuroanatomical, or cognitive features of disease pathology.22 Yet, despite this plethora of models, what stands out is the high failure rate of clinical trials for neurodegenerative disease treatments, particularly in Alzheimer's disease (AD).23 This raises questions not only about the biological hypotheses underpinning drug discovery, but also about the appropriateness of existing animal models and whether methods used to translate insights from the model to human biology are up to the task.24 Improvements to clinical translation will require high-quality experimental work in robust and valid model systems, in which both experimental screening and validation25 and improved prediction of clinical effectiveness, harnessing artificial intelligence (AI) and machine learning (ML) approaches, will be important.
In this position paper we discuss AI approaches used in experimental medicine, specifically focusing on approaches used to translate between model systems and human disease biology. Any advanced data analytical approaches, including ML, require robust and reproducible data as input, but equally can contribute to improving reproducibility in experimental research. This review discusses the key challenges of reproducibility, cross-species translation, data curation, and interpretability of AI and ML approaches. We provide recommendations and future directions for driving forward progress in this relatively new field of application. This review is one of a series of eight articles in a Special Issue on “Artificial Intelligence for Alzheimer's Disease and Related Dementias” published in Alzheimer's & Dementia. Together, this series provides a comprehensive overview of current applications of AI to dementia, and future opportunities for innovation to accelerate research. Each review focuses on a different area of dementia research, including experimental models (this article), drug discovery and trials optimization,26 genetics and omics,27 biomarkers,28 neuroimaging,29 prevention,30 applied models and digital health,31 and methods optimization.32
2 REPRODUCIBILITY
2.1 Defining reproducibility
Reproducibility is the extent to which the results of a study can be recreated by applying the same analysis code and data used in the original study33 and can be stratified as computational, empirical (data), and statistical.34 Complementing this, replicability is often discussed in concert with reproducibility and refers to the degree to which a future study employing the same method produces the same scientific conclusions with independent analysis of new data. Both are essential to generate findings that are robust and generalizable and will be discussed collectively as reproducibility. Any advanced data analytical approaches, including ML, will only yield accurate and meaningful results if the underlying data are high quality and reproducible. Conversely, ML approaches can be used to improve computational aspects of reproducibility. Robust datasets should provide qualitatively similar outputs across analytical approaches and should be generalizable across studies.35 Publishing reproducible experiments will be critical for meta-analysis of datasets across laboratories, will accelerate advancements, and maximize research investments. To address this, the Turing Way project (the-turing-way.netlify.app) has provided open access guidance on project design, communication, collaboration, and ethics of reproducibility in data science. In addition, the FAIR Guiding Principles for scientific data management and stewardship have provided actionable recommendations to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets.36 Reproducibility issues can also be inherent to the data type, for example, issues associated with technical noise and bias in single cell genomics data. 
Tools such as single-cell variational inference (scVI) use deep neural networks and stochastic optimization to account for batch and sensitivity effects when approximating gene expression distributions across cell types.37 Deep learning strategies can also map single-cell datasets onto existing references: single-cell architectural surgery (scArches)38 allows integration across experimental models (including mapping of disease-affected tissue onto control references).
RESEARCH IN CONTEXT
- Systematic review: Experimental models in dementia research are important tools for fundamental medical research and drug discovery. Here we reviewed challenges in preclinical experimental neurodegenerative disease modeling and translation to clinic, highlighting machine learning (ML) and artificial intelligence (AI) approaches used to overcome these issues.
- Interpretation: We identified four key challenges: a lack of reproducibility, poor data curation, species divergence, and insufficient interpretability. We offer recommendations and examples of how to address these challenges, using careful experimental design and targeted ML/AI approaches.
- Future directions: While only recently adopted in preclinical dementia research, AI and ML models have great potential to improve prediction, diagnostics, and biological understanding of neurodegenerative diseases. With high quality, well-curated data and the specific adaptation of approaches including transfer learning, structural equation modeling (SEM), simulations and neural networks, both reproducibility and cross-species translation could be improved, while continued efforts should address the interpretability of these models.
2.2 Reproducibility issues in stem cell technologies
Acquiring high-quality ex vivo neural tissue samples directly from human patients is usually either ethically infeasible or logistically intractable. However, in recent years stem cell technologies such as human iPSCs have allowed brain cell types to be derived from patient biopsies, opening a new era for modeling neurodegenerative diseases. iPSC models have allowed researchers to study disease mechanisms and genotype-phenotype associations in cell type-specific, physiologically relevant human models. iPSCs are as genetically diverse as the donors they are derived from, enabling researchers to study sporadic disease and polygenic risk factors, while simultaneously presenting major challenges in reproducibility. The Human Induced Pluripotent Stem Cells Initiative (HipSci) reported that 5% to 46% of phenotype variation is due to individual genetic background.39 Another source of heterogeneity in iPSC models is somatic mutations that arise from environmental factors, such as UV exposure, and through the reprogramming process.40 Non-genetic contributions to heterogeneity include the differentiation protocols, as well as cell culture and storage conditions. A study across five laboratories using standardized protocols on identical iPSC lines found that laboratory-based sources of variation can overpower genotypic effects.41 The development of multiple iPSC-derived cell type co-culture systems and 3D organoids has allowed modeling of neurodegeneration at the tissue and organ level. However, heterogeneity again remains a major challenge, owing to the genetic variability of iPSC lines and to the complex protocols required for multi-cellular models, which introduce further layers of non-genetic variability. Encouragingly, studies have reproducibly generated human brain organoids exhibiting consistent cell type diversity and developmental trajectories.42 However, consistency remains to be demonstrated for disease-relevant readouts.
Challenges to reproducibility may also arise from the origin of the iPSC line, that is, whether it is patient-derived and allogeneic or genetically modified from a control line. Finally, iPSCs fail to capture gene regulatory signatures caused by environmental factors during a person's lifetime.43
2.3 Challenges to reproducibility in mouse models
Mice represent the most commonly used animal model in neurodegenerative disease research, complementing human and in vitro studies with a relatively quick reproduction time, inbred strains that minimize genetic heterogeneity, and easy commercial availability across multiple strains. For mouse studies the genetic background of the line can have a major influence on reproducibility.44 For example, mice expressing human amyloid precursor protein (hAPP) when backcrossed to different strains exhibit differences in viability,45 disease course,46 and neuronal excitability.47 Specific genetic loci have since been identified that modify net amyloid-β (Aβ) accumulation in these lines.48, 49 The Jackson Laboratory recently crossed the 5xFAD mouse model of AD onto diverse genetic backgrounds to explore the contribution of genetic variation in AD. This approach more closely mirrored variation within human disease and identified marked effects of background-line specific genetic variation on the molecular and behavioral phenotypes of the AD model mice.50, 51 Such efforts are beyond the scope of most laboratories and may be prohibitively costly. Nevertheless, to improve reproducibility and to allow the identification and further investigation of key background genetic modifiers of the disease phenotype, we recommend whole-genome sequencing (WGS) of new genetic mouse lines by the host laboratory. Ideally, sequencing should be repeated after extensive backcrossing by academic laboratories and when mice are bred to congenic lines by commercial suppliers.
3 TRANSLATING BETWEEN SPECIES
3.1 Quantifying translatability of model species
Translating biological insights from model systems, particularly non-human species, to human disease biology is one of the primary challenges in preclinical dementia research. Notably, reproducibility is a prerequisite for animal use in drug-discovery pipelines: Poor reproducibility and design of experiments undermine the practical relevance of any model system, impairing clinical trial design and translation.52 Beyond this, limitations of the models themselves further complicate and restrict what can be learned about human disease. Induced and genetically engineered models for Parkinson's disease (PD), AD, frontotemporal dementia (FTD), and amyotrophic lateral sclerosis (ALS) recreate initial biochemical events, such as misfolded protein aggregates, RNA toxicity or repeat expansion mutations,53-59 but often fail to reproduce the whole breadth of downstream cellular and phenotypic responses. For AD, effects of drug treatment are poorly predicted by current models, as highlighted by the divergent outcomes presented by different models when treated with the same drug; their efficacy varies by type of intervention and species, with best results achieved for cholinergic/glutamatergic drugs.60 For example, use of transgenic mice16 and macaques61 enabled prediction of cognitive and behavioral improvements from administration of donepezil.62 In the context of aging, dogs have been considered as suitable models for preclinical studies of AD, to investigate effects of dietary factors, behavior, and therapies targeting Aβ aggregation.63-65 To quantify the extent to which AD models can reflect human AD pathologies, integrated omics platforms for studying the molecular signatures of neurodegenerative diseases in preclinical models and post mortem human brains have proven useful,66 and have led to an increased understanding of disease-specific cellular responses.
AD models have provided valuable insights into disease mechanisms, yet their translation rate to late-stage clinical trials has been extremely low, likely due to the complexity of human pathogenesis.67-70 Drugs targeting N-methyl-D-aspartate (NMDA) and cholinergic receptors provide only symptomatic treatments for patients,71 and phase II/III clinical failures of anti-Aβ antibodies72, 73 have led to a reevaluation of the Aβ cascade hypothesis.74 However, promising recent results in phase III clinical trials of lecanemab75 and donanemab76 have put monoclonal Aβ antibodies back at center stage. At the same time, concerns remain regarding the efficacy and costs of anti-amyloid immunotherapy as well as adverse side effects, most commonly in the form of amyloid-related imaging abnormalities (ARIA).77-79
3.2 Challenges to cross-species translation
Translating a drug or treatment from the bench to the bedside is a considerable challenge. Lack of reproducibility within the same model, or across other models, is a major impediment to translatability. As discussed above, the adoption of rigorous standards should be a priority to make drug discovery more efficient and avoid wasted time and resources.80 Similarly, negative results and replication failures are often not reported even though they would help raise red flags early on and would prevent other researchers from treading down futile paths.81 Perhaps the greatest stumbling block in developing an effective drug for dementia remains our imperfect knowledge of dementia biology. This is compounded by models that do not faithfully reproduce all aspects of a pathology, prompting over-interpretation and over-extrapolation of experimental results. The appropriate choice of an experimental model for the question at hand should involve not only consideration of whether the aspects of biology under investigation are being captured (eg, is the neuronal circuitry conserved? To what extent is the gene of interest conserved, expressed and part of the same network?), but also practical and ethical considerations. Model choice will also be affected by the type of research question: A basic science experiment, for example, discovery of the mechanism of action of a gene or protein, may require a different setup than a pharmacological analysis, such as the quantification of a drug's bioavailability in the brain. A frequently neglected aspect is the role of sex-related differences in dementia biology and incidence.
Given that women are more likely to be affected by AD, while prevalence of PD is substantially higher in men, sex-related biological differences are clearly relevant and need to be investigated rigorously.82, 83 However, many rodent experiments are conducted in animals of one sex only (often males) for practical reasons, potentially biasing findings and often overlooking the sex-specific efficacy of drugs.84 It is therefore important to assess sex-balanced cohorts, both at preclinical and clinical drug development stages. The correct use and development of better preclinical models should prevent several pitfalls of the clinical phases of drug development, such as inadequate evaluation of pharmacokinetics and pharmacodynamics.70
Further exacerbating these challenges is the lack of a controlled ontology for describing or annotating the results of a model experiment and for relating these terms across species. This can include, for example, the mouse phenotypes that correspond to specific symptoms of AD or PD in humans. The Human Phenotype Ontology (HPO)85, 86 and its counterparts in non-human species aim to catalogue the full breadth of clinical, behavioral, morphological, functional, physiological, and cellular phenotypes observed in each species. All terms in these ontologies are logically organized in a controlled hierarchical structure, which helps reduce ambiguity when comparing results within and across species. The Unified Phenotype Ontology (uPheno)87, 88 further extended this by providing mappings between homologous terms across species, including human, mouse, frog, zebrafish, fly, worm, and fission yeast. Adopting these standardized ontologies will help minimize misinterpretations, enable downstream meta-analyses, and provide extensive labelled training datasets for more complex ML models. uPheno and other ontologies can be accessed and searched through public resources such as Bioportal.89, 90
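To illustrate how mappings between homologous phenotype terms could be used in practice, the following minimal Python sketch looks up the terms in other species that share a species-neutral parent term with a given query term. It is an illustration only: the term identifiers, the mapping table, and the function names are hypothetical placeholders and do not correspond to real HPO, mouse, zebrafish, or uPheno entries or to any ontology library.

```python
from collections import defaultdict

# Hypothetical mapping from species-specific phenotype terms to a shared,
# species-neutral term. All identifiers are placeholders, not real ontology IDs.
TERM_TO_SHARED = {
    "HP:EX001": "UPHENO:EX100",  # human: impaired memory (placeholder)
    "MP:EX001": "UPHENO:EX100",  # mouse: impaired spatial learning (placeholder)
    "ZP:EX001": "UPHENO:EX100",  # zebrafish: impaired learning (placeholder)
    "HP:EX002": "UPHENO:EX200",  # human: tremor (placeholder)
    "MP:EX002": "UPHENO:EX200",  # mouse: tremors (placeholder)
}


def build_index(term_to_shared):
    """Invert the mapping: shared term -> set of species-specific terms."""
    index = defaultdict(set)
    for term, shared in term_to_shared.items():
        index[shared].add(term)
    return index


def homologous_terms(term, term_to_shared, index):
    """Return terms from other species that map to the same shared term."""
    shared = term_to_shared.get(term)
    if shared is None:
        return set()  # unmapped term: no cross-species counterpart known
    return index[shared] - {term}


INDEX = build_index(TERM_TO_SHARED)
print(sorted(homologous_terms("HP:EX001", TERM_TO_SHARED, INDEX)))
```

In a real analysis the mapping table would be taken from a curated ontology release rather than written by hand; the sketch shows only why a shared parent term makes cross-species annotation comparisons unambiguous.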
3.3 Strategies and positive examples of translation
In thinking about how insights from models of neurodegeneration can be translated more effectively and considering improvements over the years, it stands out that we still lack a model which recapitulates any given neurodegenerative disease in its entirety. To effectively translate between models of disease, experimental design must account for disease-relevant factors such as developmental age, biomarker choice, and appropriateness of model species. There is a critical need for identifying dementia biomarkers present in both preclinical models and patients. An informative and quantifiable biomarker would enable earlier diagnosis, assessment of disease progression, and evaluation of treatment efficacy. Advances have been made in peripheral biomarkers. For example, serum tau protein levels correlate with cognitive impairment in AD and with progression of pathology in a transgenic mouse model of AD,91 although they lack the accuracy and consistency of imaging-based primary biomarkers. Positron emission tomography (PET) imaging studies using 18F-fluorodeoxyglucose (FDG) have illustrated consistent patterns in AD patients and various mouse models of AD,92, 93 although there are concerns that rodent models are too small for imaging studies to be translatable. Another class of systems which may prove vital in advancing preclinical models for dementia research are humanized animal models, wherein human genes are introduced into a mouse genome, or human cells are grafted into mouse tissue. A humanized mouse model of ALS expressing human fused in sarcoma (FUS) protein has already proved to better recapitulate the disease in mice, namely, exhibiting midlife-onset progression of motor-neuron degeneration not seen in previous models. The use of the FUS Delta14 mouse has already yielded novel insights into ALS development, demonstrating that neurodegeneration occurs even in the absence of FUS protein accumulation.94
3.4 How evolution impedes (and promotes) translation
Augmenting many of the above challenges is the fact that animal models are highly evolutionarily divergent from humans. Humans last shared a common ancestor with macaques ∼30 million years ago (MYA), with rodents ∼90 MYA, with zebrafish ∼430 MYA, and with fruit flies ∼800 MYA.95 The degree of homology (conserved features) varies across anatomical regions, cell types, pathways, and genes.96, 97 For example, while it is well-established that the nervous system is homologous across vertebrates and invertebrates, even rodents and macaques do not possess all brain regions, cell types, and connections present in humans.98 Furthermore, brain structures central to dementia pathology, such as the hippocampus,99, 100 striatum,101 and prefrontal cortex,102, 103 have undergone extensive anatomical and molecular reorganization, strongly indicating equally drastic changes in function.98 At the level of genes, mice share only a subset of identifiable 1:1 orthologs with humans (around 75%),103, 104 a problem that is further exacerbated in more distant species (Figure 1). This means that biomarkers and cell type markers discovered in one species may not be readily applicable to another. It also implies that the introduction of transgenically humanized animal models may only partly alleviate this issue, as the broader genetic background in which the humanized genes are acting is still drastically different. The degree of evolutionary conservation varies across molecular pathways105 and cell types (eg, neurons,106 astrocytes,107 microglia,108, 109 oligodendrocytes110). Therefore, systematic and quantitative investigation is first needed to assess a feature's degree of conservation, and thus validity, when using a given species to model human biology,111 including in the development of novel dementia therapeutics.
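As a rough illustration of how the proportion of 1:1 orthologs could be quantified from an ortholog table, the Python sketch below counts the genes that have exactly one counterpart in the other species, in both directions. The gene names and pairs are invented toy data, and the function is a hypothetical helper rather than part of orthogene or any other ortholog-mapping tool.

```python
from collections import Counter

# Toy (human_gene, mouse_gene) ortholog pairs; names are invented placeholders.
PAIRS = [
    ("H1", "M1"),                   # 1:1
    ("H2", "M2"),                   # 1:1
    ("H3", "M3a"), ("H3", "M3b"),   # 1:many (duplication in mouse)
    ("H4", "M4"), ("H5", "M4"),     # many:1 (duplication in human)
]
HUMAN_GENES = ["H1", "H2", "H3", "H4", "H5", "H6"]  # H6 has no mouse ortholog


def one_to_one_fraction(pairs, human_genes):
    """Fraction of human genes with exactly one ortholog that is itself unique."""
    human_counts = Counter(h for h, _ in pairs)
    mouse_counts = Counter(m for _, m in pairs)
    one_to_one = {
        h for h, m in pairs
        if human_counts[h] == 1 and mouse_counts[m] == 1
    }
    return len(one_to_one) / len(human_genes)


print(one_to_one_fraction(PAIRS, HUMAN_GENES))
```

Here only two of the six toy genes qualify, because genes with no ortholog, with several orthologs, or sharing an ortholog with another gene are all excluded; markers falling in those excluded classes are the ones that may not transfer between species.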

At the same time, interspecies variation can be a powerful source of information to fuel ML models, because each species reflects billions of years of natural experiments. State-of-the-art protein folding models such as AlphaFold2112 and ESMFold113 use multi-species protein sequences to help accurately predict 3D protein structure. Another application of ML models is to predict the effect of genetic variants by learning the mapping between DNA sequence and functional genomic annotations. Specifically, PrimateAI114 and its successor PrimateAI-3D115 predict whether genetic variants observed in humans are likely to be deleterious or benign based on whether the variant is common in non-human primate populations. The underlying premise is that if a variant is tolerated in species closely related to humans, it is more likely to be benign in humans as well. In this way data from non-human species can be a valuable source of information for human clinical research. Other variant-to-function prediction models, including Enformer,116 Basenji,117 and a semi-supervised deep learning approach proposed by Mourad118 that implements a convolutional neural network within a graph neural network (CNN-GNN), all observed significantly boosted performance when training on data from multiple species, as opposed to a single species. The developers of Nvwa,119 a deep learning model designed to learn DNA sequence motifs controlling cell type-specific gene expression, also observed a boost in performance when the model was trained on multiple closely related species at once. Other models, such as the melanoma enhancer prediction model DeepMEL,120 were successfully trained on cancer cell lines in one species (human) to predict in another species (dog), which can be considered a form of transfer learning.
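The premise behind using primate population data can be sketched with a simple labeling rule in Python. This is not the PrimateAI model, which is a deep neural network; it only illustrates how a variant observed at appreciable frequency in a non-human primate population might be assigned a "likely benign" training label. The frequency threshold, species keys, and function name are assumptions chosen for illustration.

```python
def label_variant(primate_freqs, common_threshold=0.001):
    """Assign a weak training label from non-human primate allele frequencies.

    primate_freqs: dict mapping a species name to the allele frequency of the
    variant in that species (hypothetical input format).
    common_threshold: assumed cut-off above which a variant counts as common.
    """
    if any(freq >= common_threshold for freq in primate_freqs.values()):
        # Tolerated in a closely related species, so more likely benign in humans.
        return "likely_benign"
    # Absence from primate data is not evidence of harm, so leave it unlabelled.
    return "unlabelled"


# Toy variants with invented frequencies.
print(label_variant({"chimpanzee": 0.02, "macaque": 0.0}))
print(label_variant({"chimpanzee": 0.0, "macaque": 0.0}))
```

The rule is deliberately asymmetric: it produces benign labels only, since a variant that is simply unobserved in other primates could be either rare or harmful.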
4 RESOURCES
4.1 Existing major initiatives
In order to facilitate systematic and quantitative analyses of cross-model and cross-species translatability, we describe a variety of resources that can be applied to experimental modeling of dementia (Table 1). Available databases include atlases of gene expression and genetic variation in humans and some animal models, as well as phylogenetic resources. However, the rapid increase in knowledge relating to somatic genomic and gene regulatory variation requires an additional level of detail and curation to identify disease-specific determinants. Limited proteomic information is available for proteins identified as being involved in the pathogenesis of dementia, although this is supported in part by detailed compilations of types of protein post-translational modifications, many of which are dependent on manual curation. Since protein conformation is directly related to protein function, there is a critically important role for structural biology databases to predict secondary and tertiary structures of disease-associated proteins and their interactors. Recent breakthroughs by models such as AlphaFold2112 and ESMFold (Evolutionary Scale Modeling)113 have enabled near-perfect 3D protein structure prediction at scale across many species. Precomputed predictions for millions of proteins have already been deposited in public databases, such as the AlphaFold Protein Structure Database112, 121 and the ESM Metagenomic Atlas,113, 122 rapidly accelerating a wide variety of biomedical research fields.123 Combining verified structures of biologically relevant proteins with information available in pharmacological and pathway databases is essential for the identification of new potentially druggable targets. Harmonizing these databases will promote more sophisticated design of small molecules and other therapies targeting specific proteins or pathways and thereby expedite drug development.
Validating biomarkers for dementia, whether identified by imaging or biochemical means, brings with it the challenge of determining which are the most accurate and informative predictors of disease progression and/or response to therapeutic intervention. It will thus become increasingly important to establish detailed and accurate databases that can be linked to each other as well as to electronic health records of individuals to advance the prospect of personalized medicine for dementia.
Resource | Description | Link | Reference |
---|---|---|---|
AD Knowledge Portal | Data, analyses, and tools from the National Institute on Aging's Alzheimer's Disease Translational Research Program | https://adknowledgeportal.synapse.org | 124 |
AlphaFold | Protein structure database | https://alphafold.ebi.ac.uk/ | 112 |
AlzGene | Human AD variants | http://www.alzgene.org | 125 |
Alzforum | AD related biomarkers, genes, mutations, risks, rodent models, therapeutics, tools | https://www.alzforum.org/databases | |
Clinico-Genomic Database (CGDB) | Detailed, curated, electronic health records linked with comprehensive genomic profiling data of over 300 cancer related genes for over 60,000 oncology patients | https://www.roche.com/about/priorities/personalised_healthcare/combining-data-to-advance-personalised-healthcare.html | |
Dementias Platform UK | Online environment to work with some of the richest cohort data optimized for dementia research. | https://www.dementiasplatform.uk/ | 126 |
ESMFold | Protein structure prediction using a large language model | https://esmatlas.com/ | 113 |
FlyBase | D. melanogaster resources | http://flybase.org | 127 |
Gene Ontology | Gene function compendium | http://geneontology.org | 128 |
Genotype-Tissue Expression (GTEx) | Human genotype-tissue expression | https://gtexportal.org/home/ | 129 |
GnomAD | Human genetic variation | https://gnomad.broadinstitute.org | 130 |
Human Integrated Protein-Protein Interaction Reference (HIPPIE) | Human protein-protein interaction networks | http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/ | 131 |
Model AD Explorer | Gene expression and pathology data from AD mouse models developed by the MODEL-AD consortium | https://sagebio.shinyapps.io/MODEL_AD_Explorer/ | |
Mouse Genomes Project | Catalog of all forms of genetic variation between the common laboratory mouse strains and construction and annotation of reference genomes for the key strains | https://www.sanger.ac.uk/data/mouse-genomes-project/ | 132 |
MRC-Wellcome Trust Human Developmental Biology Resource | Tissue bank of human embryonic and fetal tissues since 1999 | https://www.hdbr.org/ | 133 |
Neuro C-BIG Repository (Canada) | Collection of biospecimens, longitudinal clinical and neuropsychiatric information, imaging and genetic data from patients with neurological disease as well as healthy controls | https://cbigr-open.loris.ca/ | 134 |
orthogene | R package for easy and comprehensive mapping of orthologous genes across hundreds of species | https://github.com/neurogenomics/orthogene | 104 |
Open Targets | Drug target identification and prioritization | https://www.opentargets.org | 135 |
PhosphoNET | Repository of known and predicted information on human phosphorylation sites, their evolutionary conservation, the identities of protein kinases that may target these sites and related phosphosites | http://www.phosphonet.ca/ | 136 |
PhosphoSitePlus | Information and tools for the study of protein post-translational modifications including phosphorylation, acetylation, and more | https://www.phosphosite.org/ | 137 |
Protein Lysine Modifications Database (PLMD) | Lysine modifications | http://plmd.biocuckoo.org/ | 138 |
quantification of Post-Translational Modifications (qPTM) database | Database of 6 types of PTMs in 4 different organisms including human, mouse, rat and yeast | http://qptm.omicsbio.info/ | 139, 140 |
Reactome | Pathway database | https://reactome.org | 141 |
RCSB Protein Data Bank | 3D structure data for large biological molecules (proteins, DNA, and RNA) | https://www.rcsb.org | 142 |
scArches | Python package to computationally integrate scRNA-seq datasets into existing ML models | https://github.com/theislab/scarches | 38 |
UK Biobank | Biomedical database containing in-depth genetic and health information from half a million UK participants | https://www.ukbiobank.ac.uk/ | 143 |
UK Brain Banks Network | Various tissue resources, supplies tissue samples to academic and industry researchers | https://brainbanknetwork.ac.uk/ | |
UKCRC Tissue Directory and Coordination Centre (TDCC) | UK's only register of sample collections that covers multiple diseases | https://biobankinguk.org/ | |
UK Stem Cell Bank | Facilitates the use and sharing of quality-controlled stem cell lines | https://www.nibsc.org/ukstemcellbank | 144 |
UniPep | Library of putative glycopeptides | http://www.unipep.org/ | 145 |
WormBase | Genetics, genomics and biology of C. elegans and related nematodes | https://wormbase.org/ | 146 |
4.2 Resource gaps and how to fill them
Validating new models and biomarkers is an essential part of the diagnostic and therapeutic discovery process. However, benchmarks for validation are not always readily identified. When human data are available, were they collected at the same time points, both in terms of age and disease progression? Are they reproducible across labs, and across populations? When using animal and in vitro models, how can developmental and aging stages be matched with human progression?147 Are benchmarks reproducible across models, genetic backgrounds, and different laboratories? How can data from in vitro experiments best be linked with model organisms and to human disease? To answer some of these questions, closer integration of the available data on disease pathogenesis needs to be pursued, together with metadata on timelines, genetic background and other relevant variables pertaining to models and experiments, which are not always systematically and accurately reported. Better patient stratification and quality control of genetic and functional metadata recording would make the selection of optimal benchmarks more accurate. It would also better inform the choice of a specific model for a given hypothesis to be tested. For example, whether the model accurately recapitulates the neural circuit or the signaling pathway in question will considerably affect the choice of model. Integration of the various existing datasets, coupled with a user-friendly database query mechanism, would be ideal to facilitate the design of high-quality experimental studies relevant to human disease.
5 USING ML TO SOLVE REPRODUCIBILITY AND TRANSLATION CHALLENGES
5.1 Structural equation modeling
Despite widespread adoption and necessity of experimental models in preclinical research, they present significant limitations both in terms of reproducibility of results within models, and translation from models to human patients.148, 149 Various computational approaches, including ML, have been implemented to address each of these issues. Here we highlight several approaches that have been used to enhance reproducibility and translation, including those that have yet to be adopted in dementia research. A more comprehensive overview of ML/AI methodology can be found in the methods optimization paper from the same series.32 Mathematical and statistical modeling of experimental models based on prior domain knowledge (eg, structural equation modeling [SEM]) is an approach that can be used to support hypothesis generation and testing, provide insight into biological mechanisms, and predict the effects of interventions.150 In contrast to conventional deterministic approaches in SEM, in which fixed, predefined parameters determine predicted outcomes, probabilistic simulation-based modeling allows researchers to operationalize and test the effects of uncertainty at multiple levels within the model system. This includes inputs (eg, drug uptake or sequestering), model structure (eg, variation in body size or genetic background), and experimental measurements (eg, behavioral outcomes, molecular assay readouts).151-153 Simulation has been used within preclinical mouse models of metabolic disease to predict disease onset and progression and to more accurately estimate the impact of pharmacological interventions,154, 155 though these approaches have not yet been extended to experimental models of dementia.
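The contrast between a deterministic prediction and a probabilistic simulation can be illustrated with a minimal sketch. The dose-response function, parameter values, and noise levels below are hypothetical and chosen only for illustration; they are not drawn from any of the cited studies.

```python
import random
import statistics

def biomarker_reduction(dose, uptake, clearance):
    """Toy dose-response: fraction of a pathology marker removed by a drug."""
    exposure = dose * uptake / clearance
    return exposure / (1.0 + exposure)  # saturating effect between 0 and 1

# Deterministic run: one fixed, predefined parameter set gives one prediction.
fixed = biomarker_reduction(dose=10.0, uptake=0.3, clearance=2.0)

def simulate(n_animals, dose, seed=0):
    """Monte Carlo run: sample uncertain inputs and add measurement noise."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_animals):
        uptake = min(max(rng.gauss(0.3, 0.1), 0.01), 1.0)  # input uncertainty
        clearance = rng.lognormvariate(0.7, 0.3)            # between-animal variation
        true_effect = biomarker_reduction(dose, uptake, clearance)
        outcomes.append(true_effect + rng.gauss(0.0, 0.05))  # assay noise
    return outcomes

outcomes = sorted(simulate(n_animals=5000, dose=10.0))
mean_effect = statistics.mean(outcomes)
low = outcomes[int(0.025 * len(outcomes))]
high = outcomes[int(0.975 * len(outcomes))]
print(f"deterministic prediction: {fixed:.2f}")
print(f"simulated mean: {mean_effect:.2f} (95% interval {low:.2f} to {high:.2f})")
```

The deterministic run returns a single value, whereas the simulation returns a distribution whose spread reflects the assumed uncertainty in inputs, animal-to-animal variation, and measurement error.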
5.2 (Semi-)supervised ML approaches
Relative to both deterministic SEM and probabilistic simulations, ML approaches deviate even further from reliance on predefined model assumptions and manual parameter assignment.156 Instead, ML generates an in silico model of the experimental system learned primarily, if not entirely, from data. Regarding the issue of reproducibility, supervised and semi-supervised ML has been applied to learn more robust generalizations when mapping experimental inputs to outputs. Read-across structure activity relationships (RASAR) was trained on hundreds of thousands of animal toxicology experiments to learn the relationship between binarized chemical fingerprints and safety outcome metrics, achieving an average prediction accuracy greater than that of any single animal experiment.157 Other ML approaches have been used to address the issues of translatability by predicting experiment outcomes in humans from matched in vivo or in vitro model data. Efforts such as the systems biology verification (sbv) project's IMPROVER Species Translation Challenge aimed to advance methods for cross-species translation but were met with limited success in part due to insufficient training data.158 Later attempts trained various ML architectures on matched pairs of mouse models and human disease samples to predict disease-associated gene signatures in the latter.159-161 However, this paired interspecies case-control approach is limited by the time it takes to manually curate such datasets. It also presupposes the a priori validity of the animal model when learning model-to-human mappings.
High-quality data with large sample sizes are not always available in human cohorts. Transfer learning, another variant of supervised ML, addresses this by pretraining a model on a larger dataset that is less specific to the task at hand (eg, histological images from a large cohort of animal models) to learn basic features common to all data of that modality (eg, anatomical borders, cell contours, subcellular features), and then fine-tuning the model with a smaller but more task-specific dataset (eg, disease-associated pathologies in post mortem histological samples). Substantial progress has recently been made in applying AI-based strategies towards rapid and accurate quantification of hallmark AD and PD pathologies at both the micro- (eg, microscopy) and macro- (eg, magnetic resonance imaging) scales.162-164 Transfer learning is also increasingly being applied to omics data. For example, Stumpf et al. (2020) employed this strategy to first train a cell type classifier using single-cell transcriptomic profiles from mouse bone marrow, and then accurately predict human bone marrow cell types.165
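The pretrain-then-fine-tune pattern can be sketched with a deliberately small example. Logistic regression stands in here for the deep architectures used in practice, and the simulated "source" and "target" datasets, their weights, and all hyperparameters are assumptions made for illustration only.

```python
import math
import random

def sigmoid(z):
    z = max(min(z, 30.0), -30.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def make_data(rng, n, true_weights, noise=0.5):
    """Simulate n samples whose binary label depends on a noisy linear score."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_weights]
        score = sum(w * xi for w, xi in zip(true_weights, x)) + rng.gauss(0.0, noise)
        data.append((x, 1 if score > 0 else 0))
    return data

def train(data, init_weights, epochs, lr):
    """Logistic regression by batch gradient descent, starting from init_weights."""
    w = list(init_weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

def accuracy(data, w):
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        correct += pred == y
    return correct / len(data)

rng = random.Random(0)
source_w = [2.0, -1.0, 1.5, 0.0, 0.5]  # hypothetical signal in the large source dataset
target_w = [1.8, -1.2, 1.5, 0.3, 0.5]  # related but not identical signal in the target
source_train = make_data(rng, 2000, source_w)
target_train = make_data(rng, 20, target_w)    # small task-specific dataset
target_test = make_data(rng, 1000, target_w)

zeros = [0.0] * len(source_w)
pretrained = train(source_train, zeros, epochs=100, lr=0.5)
fine_tuned = train(target_train, pretrained, epochs=20, lr=0.1)  # starts from pretrained weights
from_scratch = train(target_train, zeros, epochs=20, lr=0.1)     # same data, no pretraining

acc_fine_tuned = accuracy(target_test, fine_tuned)
acc_scratch = accuracy(target_test, from_scratch)
print(f"fine-tuned: {acc_fine_tuned:.2f}, from scratch: {acc_scratch:.2f}")
```

Any benefit of pretraining in such a setting depends on how closely the source and target signals resemble each other, which is precisely the assumption that needs to be checked when transferring from a model system to human data.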
5.3 Unsupervised ML approaches
Unlike supervised learning, unsupervised ML aims to learn an in silico model of the data using only the data itself (without the need for labels). Within the domain of single-cell omics (eg, genomics, transcriptomics, epigenomics, proteomics, and multi-omics) there has been an explosion of such methods.166, 167 Specific unsupervised ML frameworks like autoencoders and generative adversarial networks (GANs) have been used extensively for dimensionality reduction, data denoising (eg, dropout correction), artifact removal (eg, batch, species), feature selection (eg, differential gene expression), data labelling (eg, cell type, disease state), data integration (eg, across datasets and/or omics modalities), clustering, data visualization, and other downstream analyses.166-169
Unsupervised ML methods can also be applied to the problem of reproducibility. Despite the large number of cells, the number of donors in a given single-cell dataset is usually quite low (often just a few individuals per study). This small effective sample size of “large” datasets can introduce inflated false positive rates (eg, differentially expressed genes between cases and controls) when trying to reproduce the results in other datasets derived from different individuals using traditional statistics.170-172 Here too, ML can be of great utility. Resources like the scArches database38 store models previously trained on one or more datasets (eg, an unsupervised dimensionality reduction model trained on a large single-cell dataset of cell lines from controls). Other users can then download these pretrained models and apply them to their own smaller datasets (eg, embedding single-cell data from a cohort of AD and control cell lines into the same low-dimensional space as the large dataset). In this example of unsupervised transfer learning, models can learn patterns across many datasets, increasing the effective sample size and the likelihood that the results will generalize to new unseen datasets. It also obviates the need for direct access to all of the pretraining datasets, which can be non-trivial to acquire and reprocess. Similar transfer learning approaches have been used to successfully predict the effect of novel drug-induced perturbations in cancer cell lines from previously learned latent embeddings.173, 174 Recently, several models trained on a large corpus of scRNA-seq data (eg, single-cell Generative Pre-Trained Transformer [scGPT],175 single-cell bidirectional encoder representations from transformers [scBERT],176 Geneformer177), have been put forward as generalist base models to be fine-tuned by users with smaller, more targeted datasets. The goal of such resources is to make transfer learning both more robust and easily accessible to the research community.38
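The idea of embedding a small new dataset into a space learned from a large reference can be sketched as follows. A single principal axis, found by power iteration, stands in for the deep generative models that such resources actually distribute; this is not the scArches interface, and the simulated cell states and gene values are hypothetical.

```python
import math
import random

def simulate_cells(rng, n_per_state):
    """Toy expression profiles for two cell states across four genes."""
    state_means = {"state_a": [5.0, 1.0, 2.0, 2.0], "state_b": [1.0, 5.0, 2.0, 2.0]}
    cells, labels = [], []
    for label, means in state_means.items():
        for _ in range(n_per_state):
            cells.append([rng.gauss(m, 0.5) for m in means])
            labels.append(label)
    return cells, labels

def fit_reference(cells, n_iter=50):
    """Learn gene-wise means and the leading principal axis from reference cells."""
    n_genes = len(cells[0])
    means = [sum(c[g] for c in cells) / len(cells) for g in range(n_genes)]
    centered = [[c[g] - means[g] for g in range(n_genes)] for c in cells]
    axis = [1.0] + [0.0] * (n_genes - 1)
    for _ in range(n_iter):
        # Power iteration on the covariance matrix, computed as X^T (X v).
        scores = [sum(v * a for v, a in zip(row, axis)) for row in centered]
        new_axis = [sum(s * row[g] for s, row in zip(scores, centered))
                    for g in range(n_genes)]
        norm = math.sqrt(sum(a * a for a in new_axis))
        axis = [a / norm for a in new_axis]
    return {"means": means, "axis": axis}

def embed(model, cells):
    """Project new cells into the reference space without refitting the model."""
    return [sum((x - m) * a for x, m, a in zip(cell, model["means"], model["axis"]))
            for cell in cells]

rng = random.Random(1)
reference_cells, _ = simulate_cells(rng, n_per_state=1000)
model = fit_reference(reference_cells)  # "pretrained" once, then shared

query_cells, query_labels = simulate_cells(rng, n_per_state=6)  # small new study
query_scores = embed(model, query_cells)
scores_a = [s for s, lab in zip(query_scores, query_labels) if lab == "state_a"]
scores_b = [s for s, lab in zip(query_scores, query_labels) if lab == "state_b"]
print(f"state_a range: {min(scores_a):.2f} to {max(scores_a):.2f}")
print(f"state_b range: {min(scores_b):.2f} to {max(scores_b):.2f}")
```

Only the small `model` dictionary needs to be shared with other users, not the reference cells themselves, which mirrors the point that access to the original pretraining data is not required.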
Finally, unsupervised learning methods can be flexibly combined with supervised ML, simulations, SEM and/or traditional statistical approaches to form innovative solutions to problems inadequately addressed by any one method. Graph neural networks (GNNs) are particularly adept at utilizing supervised and/or unsupervised ML architectures and can efficiently handle hierarchical or semi-structured data.178 Specifically, they have been used to predict pathway-specific disease mechanisms,179 protein function across multiple species,180 disease-associated disruptions in brain connectivity,181 AD status,182, 183 rare disease gene targets,184 and drug response,185, 186 as well as to integrate multi-modal data.187, 188 In this way, ML can be used to aid in the design, reproduction, interpretation, and translation of studies in experimental models even prior to investment of extra time and resources in new experiments or clinical trials.
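The core operation that lets graph-based models exploit network structure is message passing, in which each node updates its representation using those of its neighbours. The sketch below shows this step with fixed rather than learned weights; the four-node chain and the gene names are hypothetical.

```python
def message_passing(features, edges, rounds=1, self_weight=0.5):
    """Mix each node's value with the mean of its neighbours' values."""
    neighbours = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    state = dict(features)
    for _ in range(rounds):
        updated = {}
        for node, value in state.items():
            nbrs = neighbours[node]
            # Isolated nodes keep their own value.
            nbr_mean = sum(state[n] for n in nbrs) / len(nbrs) if nbrs else value
            updated[node] = self_weight * value + (1.0 - self_weight) * nbr_mean
        state = updated
    return state

# A disease signal starts on gene_a and spreads along a chain of interactions.
features = {"gene_a": 1.0, "gene_b": 0.0, "gene_c": 0.0, "gene_d": 0.0}
edges = [("gene_a", "gene_b"), ("gene_b", "gene_c"), ("gene_c", "gene_d")]
after_two = message_passing(features, edges, rounds=2)
print(after_two)
# {'gene_a': 0.375, 'gene_b': 0.25, 'gene_c': 0.0625, 'gene_d': 0.0}
```

After two rounds the signal has reached nodes two steps away from its origin but not three, which illustrates how the number of layers in such a model bounds the neighbourhood each prediction can draw on. In a trained GNN the mixing weights would be learned from data rather than fixed.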
5.4 Interpretability and trust in ML approaches
Despite their many advantages, a major hurdle for the widespread adoption of cutting-edge ML approaches is the lack of trust in black-box predictions,189 particularly in healthcare environments where there are concerns of patient safety and privacy.190, 191 This lack of trust is not entirely unfounded, as ML algorithms exploit patterns in data, even if they are not relevant to the problem of interest.192 To minimize this risk, there is increasing focus on making models more easily interpretable,193 less biased,194 and less susceptible to adversarial attacks.195 Interpretability is a particularly difficult challenge as improvement in this domain is often (though certainly not always) accompanied by decreased predictive performance.196 Nevertheless, advances continue to be made by way of text-based explanations, visualizations, explanations by example or simplification, and feature relevance.194, 197 These techniques have increasingly been applied to biomedical sciences and healthcare,178, 198, 199 such as for drug discovery200, 201 and prediction of drug-drug interactions.202 Advances have particularly been reported in the domain of medical imaging analysis,203 employing techniques such as visual attention,204 saliency maps,205 and SHapley Additive exPlanations (SHAP)206 for dementia diagnosis based on neuroimaging data. Overall, however, the adoption of explainable and interpretable AI for dementia-related applications remains scarce to date, leaving ample opportunities for progress.
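Feature relevance, one of the explanation strategies mentioned above, can be illustrated with permutation importance: a feature is shuffled across samples and the resulting loss in accuracy is taken as a measure of how much the model relies on it. This is a simpler, model-agnostic relative of the methods cited, not an implementation of SHAP, and the black-box function and data below are hypothetical.

```python
import random

def black_box(x):
    """Stand-in for an opaque trained model; it ignores feature 2 entirely."""
    return 1 if 2.0 * x[0] + 0.5 * x[1] > 0 else 0

def accuracy(rows, labels):
    return sum(black_box(x) == y for x, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature, rng):
    """Accuracy lost when one feature column is shuffled across samples."""
    column = [x[feature] for x in rows]
    rng.shuffle(column)
    shuffled = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(rows, column)]
    return accuracy(rows, labels) - accuracy(shuffled, labels)

rng = random.Random(42)
rows = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]
labels = [black_box(x) for x in rows]  # noise-free labels keep the toy simple
importances = [permutation_importance(rows, labels, j, rng) for j in range(3)]
for j, imp in enumerate(importances):
    print(f"feature {j}: accuracy drop {imp:.3f}")
```

Shuffling the feature that the model ignores leaves its predictions unchanged, so its importance is exactly zero, whereas the heavily weighted feature produces the largest drop. Applied to a real model, the same procedure can reveal reliance on a nuisance variable such as batch or scanner.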
5.5 Future applications
Recent advances in AI, driven by composite deep learning models with near human-like intelligence, have the potential to change the landscape of neurodegenerative research in the future. The success of these approaches often relies on large-scale, high-dimensional, uniform datasets, which are required for training complex algorithms. For experimental researchers, generating such datasets is both costly and time consuming. To keep up with the pace at which AI is advancing, rather than wait for large uniform datasets to be created, researchers should focus on developing novel, composite methods for large heterogeneous datasets, integrated from different sources, such as those recently developed in the single-cell genomics field.175-177, 37, 38 Focusing on developing methods that do not rely on high-dimensional uniform data will ensure experimental research into neurodegenerative disease advances alongside AI.
Large-scale population cohorts are likely to facilitate the development of massive uniform datasets that lend themselves to application of AI approaches. UK Biobank has collected genetic information and deep phenotyping data on half a million individuals in the UK.207 In the context of neurodegeneration, there is an important argument to facilitate brain donation from UK Biobank participants in the future, so that phenotypic data can be linked to neuropathological measures and genetic variation. Complementing this, whole-genome sequencing (WGS) data collected to screen for genetic disorders as a part of the Newborn Genomes Programme may be instrumental in accelerating the diagnostic process of infants born with rare genetic conditions.208 Automated AI analysis pipelines could streamline the process of detecting rare genetic variants or phenotypic associations. For example, pathogenic variants could be filtered and ranked using deep phenotype integration based on natural language processing of the medical literature. In the context of dementia, the detection of variants known to increase risk of AD could be used as a proxy for testing family members. In the long run, ML could be used to evaluate genotype–phenotype correlations,209 identify biomarkers,210 and predict individual disease risk and gene function.211 This improved knowledge of disease biology could then be experimentally validated in model systems to develop better diagnostics and therapeutics.
Generative large language models (LLMs) are generalist AI models trained on a massive corpus of text to achieve convincing natural language capabilities with an extensive breadth of knowledge. Open-access implementations of LLMs, including OpenAI's ChatGPT, Google's Bard, Bing, or Meta's LLaMA, have recently gained much public interest. Iterations of these models are rapidly evolving, even as they continue to be applied to a wide variety of real-world problems, including biology212 and medicine.213, 214 Several examples that have been specifically trained to synthesize, mine, or infer biomedical knowledge are Flan-PaLM/Med-PaLM (instruction fine-tuned/medical Pathways Language Models),215 BiomedGPT (Biomedical Generative Pre-trained Transformer),216 PubMedGPT/BioMedLM,217 GeneGPT,218 BioGPT,219 PubMedBERT,220 BioLinkBERT,221 Galactica,222 and BioMegatron.223 These approaches have even been adapted for non-language based biological data (eg, scRNA-seq).175, 176 While static versions of LLMs trained on a snapshot of data from a particular time point are prone to hallucinations (ie, providing real-sounding but objectively false answers), this can be partly ameliorated through the addition of internet-search capabilities. Open-source projects like AutoGPT224 seek to extend this even further by forcing the model to query itself in order to identify what information it is currently lacking to answer the user's question, enabling a semi-automated loop of knowledge gathering and knowledge synthesis. While there are plenty of remaining challenges to address, LLMs are uniquely positioned to offer human-understandable justifications for their reasoning by querying them with natural language, just as one would with another human. For example, one may ask an LLM to predict whether a particular drug will have a side effect of motor impairment in mice, whether this side effect will also occur in humans, and to provide well-cited justifications for its reasoning. In combination with proper validation, human oversight, and ethical implementation, LLMs are likely to open entirely new avenues of biomedical research and healthcare at scale.
5.6 Key recommendations
To improve the quality and scope of the application of AI to experimental models of neurodegenerative diseases and overcome major existing challenges (Figure 2), we make four key recommendations:

- Enhancing reproducibility across model systems and experiments: To enhance applications of AI and ML approaches in model systems, reproducibility should become a priority, driven by sufficiently large, well-controlled experiments that allow the statistical study and resolution of biases and artifacts. Conversely, ML approaches including simulations can improve model reproducibility in experimental research, as can pretrained unsupervised clustering methods in the context of single-cell genomics.
- Improving upon small and disjointed datasets: AI and ML methods often require large and high-dimensional training datasets to yield robust and appropriately fitted models. We recommend increasing experimental sample sizes and enhancing integration of existing data resources with biological and clinical data to facilitate this. Numerous data resources are already openly available spanning genomics, proteomics, phylogeny, and clinical databases. These should be expanded and leveraged for ML analyses in experimental dementia research. Ultimately, we should aim to generate massive, uniform datasets, while continuing to develop methods to deal with heterogeneity in the meantime.
- Accounting for species divergence through evolution: Inherent differences in biology between species, some driven by millions of years of evolution, complicate translation of biological insights from animal models to human disease. We recommend using information on evolutionary distances in combination with transfer learning or autoencoder approaches to improve cross-species translation.
- Enhancing interpretability and transparency of AI/ML approaches: As with applications of AI and ML more generally, there is a risk for opacity and distrust in the methods, especially where clinical data are concerned. A focus on addressing these issues by adapting existing approaches and continued research advances in this domain are needed to increase trust and model interpretability.
6 CONCLUSIONS
Animal models are an important tool for assessing mechanisms of neurodegenerative disease in complex in vivo settings and prioritizing therapeutic approaches. However, promising drugs in animal models have repeatedly shown high failure rates in human clinical trials. Here we reviewed challenges to translation from model to human, including issues surrounding reproducibility, with the aim of making recommendations to enhance reproducible research and translatability via the adoption of AI approaches. Successful applications of AI and ML in the domain of experimental dementia research are limited; however, other biomedical research fields have witnessed promising advances. Such methodological developments and applications can be adapted to research questions in neurodegeneration, building on existing and novel high-dimensional datasets, including single-cell and spatial omics, proteomics, metabolomics, and biomarker profiles. With the projected growth of quantitative data on preclinical models for dementia research, we are optimistic that translational efficiency and model reproducibility can be enhanced by appropriate and careful application of AI and ML approaches in the field.
AUTHOR CONTRIBUTIONS
Sarah J. Marzi contributed to the conception of the work, coordinated the writing team, and contributed to drafting and revising the manuscript for intellectual content. Brian M. Schilder, Alexi Nott, Carlo Sala Frigerio, and Sandrine Willaime-Morawek contributed to coordinating the writing team, and to drafting and revising the manuscript for intellectual content. Wendy Noble, Jalil-Ahmad Sharif, Zhi Yao, Maria Tsalenchuk, Ümran Yaman, Diane P. Hanger, Patrick A. Lewis, Magda Bucholc, Charlotte James, Francisco Rodriguez-Algarra, and Laura M. Winchester contributed to drafting and revising the manuscript for intellectual content. Janice M. Ranson and David J. Llewellyn contributed to the conception of the work, conceived and organized the symposium from which this paper and others in the series originated, revised the manuscript for intellectual content, and harmonized the manuscript with other papers in the series. Ilianna Lourida revised the manuscript for intellectual content and harmonized the manuscript with other papers in the series. All authors read and approved the final manuscript.
ACKNOWLEDGEMENTS
This manuscript was facilitated by the Alzheimer's Association International Society to Advance Alzheimer's Research and Treatment (ISTAART), through the AI for Precision Dementia Medicine Professional Interest Area (PIA). The views and opinions expressed by authors in this publication represent those of the authors and do not necessarily reflect those of the PIA membership, ISTAART, or the Alzheimer's Association. With thanks to the Deep Dementia Phenotyping (DEMON) Network State of the Science symposium participants (in alphabetical order): Peter Bagshaw, Robin Borchert, Magda Bucholc, James Duce, Charlotte James, David Llewellyn, Donald Lyall, Sarah Marzi, Danielle Newby, Neil Oxtoby, Janice Ranson, Tim Rittman, Nathan Skene, Eugene Tang, Michele Veldsman, Laura Winchester, Zhi Yao. This paper was the product of a DEMON Network State of the Science symposium entitled “Harnessing Data Science and AI in Dementia Research” funded by Alzheimer's Research UK. JMR and DJL are supported by Alzheimer's Research UK and the Alan Turing Institute/Engineering and Physical Sciences Research Council (EP/N510129/1). DJL also receives funding from the Medical Research Council (MR/X005674/1), National Institute for Health Research (NIHR) Applied Research Collaboration South West Peninsula, National Health and Medical Research Council (NHMRC), and National Institute on Aging/National Institutes of Health (RF1AG055654). SJM and AN are funded by the Edmond and Lily Safra Early Career Fellowship Program and the UK Dementia Research Institute, which receives its funding from UK DRI Ltd, funded by the UK Medical Research Council, Alzheimer's Society, and Alzheimer's Research UK. MB is supported by Alzheimer's Research UK, Economic and Social Research Council (ES/W010240/1), EU (SEUPB) INTERREG (ERDF/SEUPB), and HSC R&D (COM/5750/23). PAL acknowledges generous support from the Michael J. Fox Foundation and Parkinson's UK. CJ and LMW are supported by Alzheimer's Research UK.
CONFLICTS OF INTEREST STATEMENT
The authors declare no conflicts of interest. Author disclosures are available in the supporting information.