Novel measures of cognition and function for the AD spectrum in the Novel Measures for Alzheimer's Disease Prevention Trials (NoMAD) project: Psychometric properties, convergent validation, and contrasts with established measures
Abstract
INTRODUCTION
This study derived composite scores for two novel measures, the No Practice Effect (NPE) battery and the Miami Computerized Functional Skills Assessment and Training system, for use in early-stage Alzheimer's disease (AD) clinical trials. Their psychometric properties and associations with AD risk markers were compared to those of well-established measures.
METHODS
In 291 older adults with healthy cognition or early mild cognitive impairment, exploratory factor analyses were used to identify the factor structure of the NPE. Factor and total scores were examined for their psychometric properties and associations with AD risk biomarkers.
RESULTS
Composite scores from the novel cognitive and functional measures demonstrated better psychometric properties (distribution and test-retest reliability) and stronger associations with AD-related demographic, genetic, and brain risk markers than well-established measures.
DISCUSSION
These novel measures have potential for use as primary cognitive and functional outcomes in early-stage AD clinical trials.
Highlights
- Well-established cognitive tests may not accurately detect subtle cognitive changes.
- No Practice Effect (NPE) and Computerized Functional Skills Assessment and Training are novel measures designed to have improved psychometric properties.
- NPE had Executive Function, Cognitive Control/Speed, and Episodic Memory domains.
- Novel measures had better psychometric properties compared to established measures.
- Significant associations with Alzheimer's disease biomarkers were found with novel measures.
1 BACKGROUND
The Novel Measures for Alzheimer's Disease Prevention Trials (NoMAD) project was initiated in response to the need for improved assessment strategies in early-stage Alzheimer's disease (AD) clinical trials. While typical AD clinical trials have targeted improvements in cognition over time or reduced progression of AD using serial assessments, there are validity challenges in the longitudinal measurement of cognitive changes. Many common outcome measures (e.g., Alzheimer's Disease Assessment Scale–Cognitive Subscale [ADAS-Cog] or the Mini-Mental State Examination [MMSE]) were initially devised for more impaired AD populations; thus, ceiling effects in more intact populations are likely.1-4 Although more recently developed batteries, such as the CogState Brief Battery, the NIH Toolbox Cognitive Battery, and Online Repeated Cognitive Assessment (ORCA), were designed to enhance diagnostic accuracy, their psychometric properties and associations with AD biomarkers are unclear.5 Similar challenges arise for AD-focused measures such as the Functional Activities Questionnaire (FAQ), a measure of everyday functioning that has been shown to have large ceiling effects. Furthermore, many assessment tools show practice effects in placebo-treated participants, making it difficult to identify improvements in an active treatment group,6-8 and changes in performance with exposure (i.e., practice effects or learning effects) are commonly seen in both cognitively impaired and non-impaired individuals.9 These changes could obscure detection of a treatment signal.5, 6 These limitations could be reduced with a cognitive assessment method that has robust psychometric properties, including limited floor, ceiling, or practice effects. Additionally, convergent validation through demonstrated associations with AD biomarkers would help determine whether cognitive and functional changes align with progression of neurodegenerative processes.
The NoMAD project tests whether newly developed performance-based cognitive and functional measures (“novel measures”) exhibit advantages over established measures in terms of psychometric properties, practice effects, and associations with AD-associated genetic and imaging markers (i.e., apolipoprotein E [APOE] genotype, brain atrophy) in individuals with healthy cognition or with early mild cognitive impairment (eMCI). The novel measures included in NoMAD are the No Practice Effect (NPE) battery, a comprehensive neuropsychological test battery, and the Computerized Functional Skills Assessment and Training (CFSAT) system, a performance-based measure of functional capacity. Some measures are predominantly computerized, and all have alternate forms. In the current study, we analyzed baseline data of NoMAD to examine (1) psychometric properties (e.g., normality of distributions, ceiling and floor effects) of the novel cognitive and functional measures, (2) the factor structure of the novel cognitive (NPE) battery, and (3) the associations of factor and composite scores of the novel measures with various AD-related demographic, clinical, and genetic factors. Additionally, we assessed psychometric properties of well-established, standard neuropsychological batteries that are currently used in many clinical trials and cohort studies examining cognitive changes in preclinical AD (ADAS-Cog and the modified version of the Preclinical Alzheimer's Cognitive Composite [mPACC]) and examined these measures’ associations with AD risk markers in order to make side-by-side comparisons with those of our novel measures.
2 METHODS
2.1 Overview
Eligible participants who were cognitively unimpaired or who had eMCI were randomized in a 1:1 allocation to one of two assessment strategy arms, one group receiving novel cognitive and functional measures and the other receiving established measures. This assessment design allows for comparative examination of differences in clinical and neurobiological outcomes across strategies. The parallel study design eliminates potential interference between established and novel measures, especially those involving verbal memory. The parallel design is also advantageous in that it reduces subject burden and potentially leads to reduced attrition.10-13 Assessments were performed at in-person visits at baseline (Week 0), Week 12, and Week 52. Magnetic resonance imaging (MRI) was conducted at baseline and Week 52. While the current report primarily focuses on the dataset collected at baseline, future analyses will include assessment of changes on the batteries at Week 12 and Week 52. Notably, we examined a portion of data from 12-week follow-up to assess the test-retest reliability, which is a crucial psychometric property. This project is registered on ClinicalTrials.gov as Development of NoMAD (Identifier: NCT03900273).
2.2 Participant eligibility
Potential participants were screened with two performance-based assessments, the MMSE14 and Wechsler Memory Scale-III Logical Memory Story A (Logical Memory).15 Inclusion criteria were: English-speaking participants, ages 60–85 years, with normal cognition or eMCI as determined by MMSE and Logical Memory (e.g., MMSE ≥ 24 and Logical Memory Delayed Recall ≥9 for education years of 16 or greater, ≥5 for education years 8–15, and ≥3 for education years 0–7), and ability to provide a family member or friend to serve as an informant. These criteria for eMCI categorization were based on the widely used Petersen criteria as operationalized by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (http://www.admin-info.org).16 We excluded individuals with diagnoses of stroke, cardiovascular disease, neurologic disease, major psychiatric conditions (e.g., major depression, bipolar disorder, schizophrenia, or alcohol or substance abuse disorder), untreated diabetes, and self-reported active treatment for cancer. A full description of the inclusion/exclusion criteria was previously published.17
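The example cut-offs quoted above can be read as a simple rule. The following Python sketch is illustrative only: the function and argument names are hypothetical, it encodes just the example thresholds given in this section (age range, MMSE, and education-adjusted Logical Memory Delayed Recall), and it does not reproduce the full inclusion/exclusion criteria or the distinction between normal cognition and eMCI described in the cited protocol.17

```python
def logical_memory_cutoff(education_years: float) -> int:
    """Education-adjusted Logical Memory Delayed Recall cut-off (example values from the text)."""
    if education_years >= 16:
        return 9
    if education_years >= 8:
        return 5
    return 3


def meets_example_screen(age: float, mmse: int, lm_delayed: int,
                         education_years: float) -> bool:
    """Return True if the example age, MMSE, and Logical Memory thresholds are all met.

    Hypothetical helper: covers only the quoted example cut-offs,
    not the complete eligibility criteria.
    """
    in_age_range = 60 <= age <= 85
    mmse_ok = mmse >= 24
    memory_ok = lm_delayed >= logical_memory_cutoff(education_years)
    return in_age_range and mmse_ok and memory_ok


# Made-up participant: 70 years old, MMSE 28, delayed recall 10, 16 years of education
print(meets_example_screen(age=70, mmse=28, lm_delayed=10, education_years=16))
```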
Participants were recruited from New York State Psychiatric Institute/Columbia University Irving Medical Center (NYSPI), Litwin-Zucker Alzheimer's Research Center/Feinstein Institute for Medical Research, University of Miami – Miller School of Medicine, and University of Southern California – Keck School of Medicine. Of note, diverse geographic locations of the sites allowed for diverse racial distribution within the study sample (∼20% Hispanics or African–Americans; see the Results and Discussion sections for more details). A clinical interview was performed to screen for neurological and psychiatric disorders that may affect cognition (e.g., severe mental illness, neurodegenerative diseases, cerebrovascular diseases, etc.). The presence of depression was evaluated via the Geriatric Depression Scale (GDS-Short Form [15 items]), and individuals who reported mild or more severe depression (GDS > 5) were excluded from participation.18
RESEARCH IN CONTEXT
- Systematic review: Articles relevant to psychometric limitations of well-established cognitive tests were identified by an electronic database search using PubMed and MEDLINE. We identified additional articles from reference lists of the studies we reviewed. We have primarily focused on studies that used well-established cognitive measures in Alzheimer's disease (AD) clinical trials and additionally searched for the associations between AD risk factors and performance on common cognitive measures.
- Interpretation: Findings from this study indicated that the novel cognitive and functional measures demonstrated enhanced psychometric properties and stronger associations with AD risk markers, compared with well-established cognitive measures. We also found that there is an interpretable cognitive structure underlying the novel cognitive battery. Our results suggest that these novel measures may be able to detect cognitive changes more effectively and reliably in AD clinical trials than well-established measures.
- Future directions: Enhanced psychometric properties and significant correlations with AD risk factors of the NPE and CFSAT make them well-positioned for use in preclinical or prodromal AD trials. Future studies will include longitudinal analyses to examine whether novel measures exhibit reduced practice effects and ability to detect genetic- and biomarker-related changes in cognitive performance in preclinical AD spectrum.
2.3 Measures
Novel Cognitive/Functional Measures: The novel cognitive measure included in NoMAD is the NPE battery, which implements principles from the cognitive science literature to reduce practice effects (e.g., item encoding in memory to reduce strategy changes over time; interference from restricted stimulus sets in the N-Back, Brown-Peterson, and Number-Letter spans to reduce memory for individual items) and includes three alternative forms and components that attenuate learning of study protocols. For instance, memory tasks include initial forced encoding of items, in which participants answer questions about the items (e.g., “Is this living?”), to reduce strategy differences between individuals. The Brown-Peterson paradigm, a measure of working memory and executive function, involves showing a triad of letters to recall, followed by interference tasks, in order to eliminate potential rehearsal of learned stimuli. In addition, the NPE subtests are predominantly computerized. NPE subtests include the N-Back,19 Simple Letter Number Span, Executive Letter Number Span,20 Brown-Peterson,21, 22 Symbol Coding,23 Verbal Fluency,24 and Word Recognition Memory Test (Word RMT).25 The battery was designed to encompass a wide range of cognitive domains, including executive function, processing speed, and episodic memory. Table S1 provides descriptions of the subtests and their associated cognitive domains (see Bell et al., 2021, for the rationale underlying the choices).
The CFSAT system is a performance-based measure designed to measure functional capacity in domains of technology-related instrumental activities of daily living (IADLs). The CFSAT uses computerized simulations of real-world scenarios and contains four computerized simulations of technology-demanding IADLs. These tasks include ATM banking, online banking, using a transit ticket kiosk, and a medication management module (e.g., Figure S1). The CFSAT has three alternative forms of each task. The outcome measures for the CFSAT consist of Efficiency (total correct responses/time of completion), Accuracy (total correct responses/administered trials), Time to Completion, and Errors. This performance-based outcome (PRFO) style eliminates the risks of bias from informant reports. The CFSAT has been validated in several different populations, and information on diagnostic sensitivity, convergence with cognitive measures, and sensitivity to treatment has been published.26-34 Orders of administration of NPE and CFSAT versions were counterbalanced across participants.
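The Efficiency and Accuracy ratios defined above can be written out directly. The sketch below is a minimal illustration under stated assumptions: the `TaskResult` container and its field names are hypothetical, and completion time is assumed to be recorded in seconds, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one simulated task (field names are illustrative)."""
    correct_responses: int
    administered_trials: int
    completion_time_s: float  # assumption: time to completion in seconds


def efficiency(result: TaskResult) -> float:
    """Efficiency = total correct responses / time of completion."""
    if result.completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return result.correct_responses / result.completion_time_s


def accuracy(result: TaskResult) -> float:
    """Accuracy = total correct responses / administered trials."""
    if result.administered_trials <= 0:
        raise ValueError("at least one trial must be administered")
    return result.correct_responses / result.administered_trials


# Made-up example: 18 of 20 trials correct, completed in 120 seconds
example = TaskResult(correct_responses=18, administered_trials=20, completion_time_s=120.0)
print(accuracy(example), efficiency(example))
```

Because Efficiency divides by time, its scale depends on the time unit chosen, so the same unit would need to be used for every participant and visit.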
Well-Established Cognitive/Functional Measures: The well-established measures include the modified Preclinical Alzheimer Cognitive Composite (mPACC),35 which contains the following subtests: Logical Memory Delayed Recall,15 Total Recall from the Selective Reminding Test,36 Digit Symbol Coding,37 and the MMSE.14 We used the version modified by the substitution of Selective Reminding for the Free and Cued Selective Reminding Test. The well-established group also completed the ADAS-Cog,38 which measures cognition and has been used in clinical trials of AD in prodromal/preclinical and more advanced stages. The ADAS-Cog-11 (11-item version) comprises subtests of Word Recall, Naming, Commands, Constructional Praxis, Ideational Praxis, Orientation, Word Recognition, Language, Comprehension of Spoken Language, and Word Finding Difficulty. Functional abilities were measured using the FAQ,39 a measure of everyday function that was administered by telephone to an informant.
Brain MRI: High-resolution T1-weighted MRI was obtained at each site to measure regional volume and cortical thickness. Using each participant's T1-weighted image, we derived structural imaging measures of both global and regional brain volume and regional measures of cortical thickness using FreeSurfer v6.0 (http://surfer.nmr.mgh.harvard.edu/). The volumetric measures used in this analysis were bilateral hippocampal volume and the total cortical volume, corrected for intracranial volume.40, 41
APOE genotyping: Apolipoprotein E (APOE) genetic analysis was completed with DNA extracted from a blood sample collected from participants and examined at the laboratory of the Human Genetics Resources Core at Columbia University Medical Center.
2.4 Statistical analyses
To identify the factor structure of the NPE battery, we performed exploratory factor analysis (EFA) on the correlation matrix of test measures. For estimation, a maximum likelihood principal components extraction was used, followed by varimax rotation. The number of latent factors was determined using parallel analysis as implemented in the psych R package.42 The Tucker-Lewis Index (TLI) of factoring reliability and the root mean square error of approximation (RMSEA) index with its 95% confidence interval were reported. Z-scores were calculated based on the means and standard deviations from the current study sample, in order to track changes from baseline in our longitudinal analyses (e.g., changes in z-scores between baseline and the final visit). Thus, z-scores for the 3- and 12-month assessments were calculated based on the mean and SD for each measure at baseline. This is considered standard practice in clinical trials and can facilitate the interpretation of changes in cognitive performance. We followed this methodology given that our study models clinical trials.
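The baseline-referenced standardization described above can be sketched as follows. This is an illustration, not the study code: the function name is hypothetical, the input values are made up, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which SD estimator was used.

```python
from statistics import mean, stdev


def baseline_z_scores(baseline, follow_up):
    """Standardize baseline and follow-up scores against the BASELINE mean and SD.

    Using the same reference for both visits keeps any change from baseline
    visible in z-score units. Assumption: sample (n - 1) standard deviation.
    """
    mu = mean(baseline)
    sd = stdev(baseline)
    z_baseline = [(x - mu) / sd for x in baseline]
    z_follow_up = [(x - mu) / sd for x in follow_up]
    return z_baseline, z_follow_up


# Made-up raw scores for five participants at two visits
z_base, z_follow = baseline_z_scores([10, 12, 14, 16, 18], [11, 12, 15, 17, 18])
print([round(z, 2) for z in z_base])
print([round(z, 2) for z in z_follow])
```

Standardizing each visit against its own mean would force every visit to average zero and hide group-level change, which is why the baseline parameters are reused for later visits.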
The summary scores for NPE factors were calculated by averaging unweighted z-scores of the measures that loaded >0.30 on each factor. The total NPE Composite score was calculated by averaging z-scores from all measures. CFSAT measures included the number of subtasks completed without errors and efficiency (correct responses/time to completion). For the ADAS-Cog and FAQ in the well-established measures, raw scores were used in all analyses due to prominent ceiling effects (see the Results section). For the mPACC, a composite was created by averaging the z-scores as derived from raw scores for each of the individual measures (Logical Memory II, MMSE, Digit Symbol, and Selective Reminding total recall scores).
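The scoring rule above (unweighted mean of z-scores for measures loading above 0.30 on a factor, plus a total composite across all measures) can be sketched as below. The measure names, factor labels, and loadings are made-up placeholders rather than values from the study, and the function name is hypothetical.

```python
from __future__ import annotations

from statistics import mean


def factor_composites(z_scores: dict[str, float],
                      loadings: dict[str, dict[str, float]],
                      threshold: float = 0.30) -> tuple[dict[str, float], float]:
    """Return (per-factor composites, total composite) for one participant.

    z_scores: measure name -> z-score.
    loadings: factor name -> {measure name -> loading}.
    A measure contributes to a factor only if its loading exceeds `threshold`.
    """
    composites = {}
    for factor, factor_loadings in loadings.items():
        selected = [m for m, loading in factor_loadings.items() if loading > threshold]
        composites[factor] = mean(z_scores[m] for m in selected)
    total = mean(z_scores.values())  # unweighted mean over all measures
    return composites, total


# Placeholder data for a single participant (not study values)
z = {"span": 1.0, "coding": -0.5, "recall": 0.2}
lam = {
    "factor_1": {"span": 0.70, "coding": 0.35, "recall": 0.10},
    "factor_2": {"span": 0.20, "coding": 0.10, "recall": 0.80},
}
by_factor, total = factor_composites(z, lam)
print(by_factor, round(total, 3))
```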
Distribution of the data was examined using the Kolmogorov-Smirnov Test of Normality, Kurtosis, and Skewness. The test-retest reliabilities between baseline and 3-month follow-up measures were examined by computing intraclass correlation coefficients (ICCs) and corresponding 95% CIs. Reliabilities in the 0.50–0.70 range were considered adequate and those greater than 0.70 were considered good.
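For readers unfamiliar with ICCs, a minimal test-retest sketch follows. The text does not state which ICC form was used, so the choice here of ICC(3,1) (two-way model, consistency, single measurement) is an assumption, as are the function names and the label for values below 0.50; the cut-offs for "adequate" and "good" are those given above.

```python
from statistics import mean


def icc_consistency(scores):
    """ICC(3,1) from a participants x sessions table (list of equal-length rows)."""
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 participants and 2 sessions, no missing cells")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def classify_reliability(icc: float) -> str:
    """Apply the cut-offs stated in the text; the lowest label is an assumption."""
    if icc > 0.70:
        return "good"
    if icc >= 0.50:
        return "adequate"
    return "below adequate"


# Made-up baseline and 3-month scores for four participants
retest = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
print(icc_consistency(retest), classify_reliability(icc_consistency(retest)))
```

A consistency-type ICC ignores a uniform shift between sessions, whereas an absolute-agreement ICC would penalize it; which is appropriate depends on whether systematic practice gains should count against reliability.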
Analyses to test the association between cognitive and functional measures and various AD-related factors (age, education, functional measures, APOE e4, and brain volumes) were conducted using correlations (Pearson's r, after standardizing the cognitive measures with z-scores) and multiple regressions (adjusted for age, sex, and education). To make direct comparisons between effect sizes of novel measures and those of well-established measures, we used an interaction model in which the test type × biomarker term was examined in a regression analysis for categorical biomarkers (e.g., cognitive and APOE status). To compare the correlation coefficients, Fisher's z-transformations (for continuous measures) and z-tests were performed. All statistical analyses were performed using R version 4.2.1. P-values < 0.05 were considered statistically significant, and false discovery rate (FDR) corrections were used to control for multiple comparisons.
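The two generic procedures named above, a Fisher z-test for two independent correlations and an FDR adjustment, can be sketched as follows. Several points are assumptions rather than the authors' documented method: the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR method, and the example inputs are rounded values from Table 2 with the group sizes from Table 1 standing in for the unreported analytic sample sizes. The ADAS-Cog correlation is sign-reversed in the example because higher ADAS-Cog scores indicate worse performance.

```python
import math


def fisher_z_compare(r1: float, n1: int, r2: float, n2: int):
    """Two-sided z-test for the difference between two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher z-transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal p-value
    return z, p


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                     # walk from largest to smallest p
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative inputs: education vs. NPE total (r = 0.33) and ADAS-Cog (r = -0.10, sign-reversed)
z, p = fisher_z_compare(0.33, 145, 0.10, 146)
print(round(z, 2), round(p, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the exact sample sizes and sign conventions used in the analyses are not given here, the printed statistic should be read as an illustration of the calculation and not as a reproduction of the reported values.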
3 RESULTS
3.1 Characteristics of the study sample
We examined baseline data of 291 participants with complete data, which included 128 participants from Columbia/NYSPI, 60 from University of Miami, 40 from USC, and 63 from Litwin-Zucker Alzheimer's Research Center.
Demographic characteristics of the study sample are presented in Table 1. The age ranges for the Well-Established Measure and the Novel Measures groups were comparable (60–84 and 59–85, respectively). There were more women than men in each group (68.5% in well-established and 65.5% in novel measures group). The majority of study participants were non-Latinx and White individuals. Among the current sample, 26% of the well-established group and 16.6% of novel measures group were identified as meeting criteria for eMCI.
Parameter | New measures group, overall (N = 145) | Established group, overall (N = 146) | Group difference (p) |
---|---|---|---|
Age | | | 0.7 |
Mean (SD) | 70.1 (6.49) | 69.7 (6.43) | |
Median [Min, Max] | 70.0 [59.0, 85.0] | 70.0 [60.0, 84.0] | |
Missing | 0 (0%) | 1 (0.7%) | |
Race | | | 0.7 |
Asian | 4 (2.8%) | 6 (4.1%) | |
Black or African–American | 17 (11.7%) | 12 (8.2%) | |
Hispanic | 19 (13.1%) | 15 (10.1%) | |
Other | 1 (0.7%) | 2 (1.4%) | |
White | 104 (71.7%) | 109 (74.7%) | |
Missing | 0 (0%) | 1 (0.7%) | |
Education | | | 0.8 |
Mean (SD) | 16.6 (2.69) | 16.6 (2.35) | |
Median [Min, Max] | 16.0 [6.00, 24.0] | 16.0 [12.0, 27.0] | |
Missing | 1 (0.7%) | 1 (0.7%) | |
Gender | | | 0.5 |
Male | 50 (34.5%) | 45 (30.8%) | |
Female | 95 (65.5%) | 100 (68.5%) | |
Missing | 0 (0%) | 1 (0.7%) | |
WMS-III Logical Memory II | | | 0.9 |
Mean (SD) | 13.8 (3.29) | 13.8 (3.47) | |
Median [Min, Max] | 14.0 [6.00, 21.0] | 14.0 [5.00, 22.0] | |
Missing | 0 (0%) | 1 (0.7%) | |
MCI group | | | 0.048 |
eMCI | 24 (16.6%) | 38 (26.0%) | |
Normal | 120 (82.8%) | 107 (73.3%) | |
Missing | 1 (0.7%) | 1 (0.7%) | |
APOE status | | | 0.11 |
Non-e4 carrier | 60 (70%) | 69 (80%) | |
E4 carrier | 26 (30%) | 17 (20%) | |
Unknown | 59 | 60 | |
- Abbreviations: APOE, apolipoprotein E; eMCI, early mild cognitive impairment; SD, standard deviation; WMS-III, Wechsler Memory Scale-3rd Edition.
3.2 Psychometric properties of the NPE and well-established cognitive measures
Means and SDs of novel and well-established cognitive measures are presented in Table S2.
Visual presentation (violin plots) of data distribution in novel measures indicates that their z-scores are well-distributed with no ceiling or floor effects (Figure S2a, 2b). NPE and CFSAT Total Composite scores were both moderately symmetric (D = 0.19, p < 0.0001, Skewness = −0.45, Kurtosis = 3.79 for NPE Total; and D = 0.11, p = 0.06, Skewness = −0.20, Kurtosis = 2.51 for CFSAT Total Composite score).
In contrast, the violin plots of the well-established measures generally indicated narrow ranges of scores (e.g., z-scores for the well-established composite score range from −1.6 to 1.5) and skewed distributions, particularly in the MMSE (a subtest of the mPACC) and FAQ (Figure S2c). Examination of the violin plots from well-established raw scores also demonstrated ceiling effects on the ADAS-Cog and FAQ, with approximately 21% of individuals scoring 0–1 on the ADAS-Cog and most FAQ scores reported as 0.
The test-retest reliability of the NPE Total Composite and CFSAT was examined as part of the psychometric properties. Test-retest reliabilities of the Total Composite scores between baseline and 3-month follow-up were good (ICC = 0.74 and 0.87, respectively). Well-established measures had test-retest reliabilities of 0.61 for ADAS-Cog and 0.74 for mPACC Composite.
Comparisons between NPE alternative forms (n = 46 for Form A, n = 51 for Form B, n = 45 for Form C) indicated that there were no significant differences in scores by forms in the NPE Total Composite score (F = 0.32, p = 0.70). When comparing two versions of CFSAT, Version 2 yielded a better performance on the CFSAT Efficiency score (F = 5.56, p = 0.02) but no difference in CFSAT Accuracy Score (F = 0.08, p = 0.80), compared with Version 1.
Comparison of cognitive scores across the four study sites indicated that there was no significant difference in novel measures (both NPE and CFSAT) by site (p ≥ 0.14) (data not shown). However, comparison of well-established measures indicated that there were significant site differences in ADAS-Cog Immediate Recall and some subtests from the mPACC (Logical Memory Immediate Recall, MMSE, Digit Symbol total correct, and mPACC Total Composite score) (Ps ≤ 0.03), indicating potential impact from between-site administration differences.
3.3 Factor structures of the NPE battery
All measures from the NPE battery were included in the factor analyses. The parallel analysis suggested two factors (TLI = 0.874, RMSEA = 0.051, 95% CI 0 to 0.086) based on the comparison of simulated and resampled data; however, the three-factor model had a clearer interpretation and better goodness of fit indices (TLI = 0.976, RMSEA = 0.021, 95% CI 0 to 0.73). Thus, we report the EFA with three factors. The EFA yielded three cognitive domains within the NPE battery: (1) Executive Functions (including Executive Number-Letter, Simple Number-Letter, and Brown-Peterson), (2) Cognitive Control and Speed (including Digit Symbol, Verbal Fluency Letter and Categorical Fluency, 1-Back Correct, and RMT Encoding Total Correct and Recognition Trial-Correctly Rejected), and (3) Episodic Memory Consolidation (RMT Free Recall and Recognition Hits). The factor analysis model is presented in Figure 1.
3.4 Associations between cognitive measures and demographic factors
Higher age was associated with poorer performance on most of the NPE Composite scores, including the Total Composite score, Executive Functions Composite score, and Cognitive Control and Speed Composite score, along with CFSAT Efficiency (Table 2). Higher education was associated with better performance on NPE Total Composite score and Cognitive Control and Speed, CFSAT Efficiency, and CFSAT Total Composite score but not with other NPE variables (Ps ≥ 0.07). Sex was not associated with performance on any of the NPE or CFSAT tasks (Ps ≥ 0.25).
Novel cognitive measures: NPE variables
Parameter | Executive functions: R | Executive functions: p-value | Executive functions: 95% CI | Cognitive control and speed: R | Cognitive control and speed: p-value | Cognitive control and speed: 95% CI | Episodic memory consolidation: R | Episodic memory consolidation: p-value | Episodic memory consolidation: 95% CI | NPE total composite: R | NPE total composite: p-value | NPE total composite: 95% CI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Age | −0.28 | 0.0007* | −0.42 to −0.12 | −0.32 | 0.0001* | −0.45 to −0.16 | −0.11 | 0.21 | −0.27 to 0.06 | −0.35 | 0.00001* | −0.49 to −0.20 |
Sex (male) | 0.01 | 0.97 | −0.33 to 0.34 | 0.128 | 0.42 | −0.19 to 0.44 | 0.08 | 0.67 | −0.27 to 0.43 | 0.16 | 0.33 | −0.16 to 0.47 |
Education | 0.15 | 0.08 | −0.02 to 0.31 | 0.37 | <0.0001* | 0.22 to 0.51 | 0.12 | 0.14 | −0.04 to 0.28 | 0.33 | <0.0001* | 0.18 to 0.47 |
Novel functional measures: CFSAT variables
Parameter | CFSAT efficiency: R | CFSAT efficiency: p-value | CFSAT efficiency: 95% CI | CFSAT accuracy: R | CFSAT accuracy: p-value | CFSAT accuracy: 95% CI | CFSAT total composite: R | CFSAT total composite: p-value | CFSAT total composite: 95% CI |
---|---|---|---|---|---|---|---|---|---|
Age | −0.42 | <0.0001* | −0.55 to −0.27 | −0.26 | 0.002* | −0.41 to −0.09 | −0.40 | <0.0001* | −0.53 to −0.25 |
Sex (male) | 0.02 | 0.90 | −0.30 to 0.34 | −0.06 | 0.72 | −0.41 to 0.28 | −0.02 | 0.90 | −0.34 to 0.30 |
Education | 0.29 | 0.0007* | 0.12 to 0.43 | 0.16 | 0.07 | −0.01 to 0.31 | 0.26 | 0.002* | 0.10 to 0.41 |
Well-established cognitive measures
Parameter | ADAS-Cog total: R | ADAS-Cog total: p-value | ADAS-Cog total: 95% CI | FAQ: R | FAQ: p-value | FAQ: 95% CI | mPACC composite: R | mPACC composite: p-value | mPACC composite: 95% CI |
---|---|---|---|---|---|---|---|---|---|
Age | 0.17 | 0.04 | 0.009 to 0.33 | 0.10 | 0.25 | −0.07 to 0.26 | −0.18 | 0.02 | −0.29 to −0.06 |
Sex (male) | −0.07 | 0.72 | −0.42 to 0.29 | 0.04 | 0.81 | −0.32 to 0.41 | 0.21 | 0.01 | 0.05 to 0.38 |
Education | −0.10 | 0.23 | −0.26 to 0.06 | −0.06 | 0.48 | −0.22 to 0.11 | 0.21 | 0.001* | 0.09 to 0.32 |
- Abbreviations: ADAS-Cog, Alzheimer's Disease Assessment Scale–Cognitive Subscale; CFSAT, Computerized Functional Skills Assessment and Training system; FAQ, Functional Activities Questionnaire; mPACC, modified Preclinical Alzheimer's Cognitive Composite; NPE, No Practice Effect test battery.
- * Statistical significance after false discovery rate (FDR) correction.
In well-established measures, age and sex were not associated with any cognitive or functional scores after FDR correction (Table 2). Education was associated with greater mPACC Composite score but not with any other well-established measures. A direct comparison between well-established and novel measures indicated that the NPE had a significantly larger correlation with education than the ADAS-Cog (Z = −2.05, p = 0.04) but not than the mPACC (Z = 0.28), while its correlation with age was not significantly greater than those of the ADAS-Cog and mPACC (Ps ≥ 0.10). Comparison of functional measures indicated that the CFSAT had a significantly greater correlation with age (Z = 2.68, p = 0.007) but not with education (Z = −1.73, p = 0.08), compared with the FAQ.
3.5 Associations between cognitive measures and AD risk factors
APOE e4 genotype was associated with NPE Total Composite score (p = 0.02) and Executive Functions Composite score (p = 0.03), along with CFSAT Efficiency (p = 0.01) (Figure 2A) among the novel measures, with participants having an e4 allele showing lower scores. Individual factor scores were not associated with APOE e4 genotype. However, the APOE e4 genotype was not associated with any well-established measures. The novel measures also exhibited greater effect sizes compared with well-established measures; in particular, the NPE and CFSAT Total Composite scores showed modest to strong effects of APOE e4 status (Cohen's d = 0.60 and 0.72, respectively). A direct comparison of effect sizes using interaction models indicated that the NPE's association with APOE e4 status was greater than that of the mPACC (T = 2.15, p = 0.03). There were no significant differences between NPE and ADAS-Cog or between CFSAT and FAQ (Ps = 0.25 and 0.13, respectively).
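As a reminder of how a standardized group difference such as Cohen's d is obtained, a minimal sketch follows. The pooled-standard-deviation form is assumed here because the text does not say which variant was used; the function name and the example values are hypothetical and are not study data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b) -> float:
    """Cohen's d = (mean_a - mean_b) / pooled SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up composite z-scores for non-carriers and e4 carriers
non_carriers = [0.4, 0.1, 0.6, -0.2, 0.3]
carriers = [-0.3, 0.0, -0.6, 0.2, -0.4]
print(round(cohens_d(non_carriers, carriers), 2))
```

The sign of d depends only on the order of the two groups, so reports usually state which group scored lower alongside the magnitude.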
Nearly all NPE and CFSAT Composite scores were associated with global cognitive status (Figure 2B). The eMCI group had lower scores on NPE Total Composite score, Cognitive Control and Speed (B = −0.50, p = 0.02), and Executive Functions, in addition to CFSAT Efficiency and Accuracy scores. Among well-established measures, the presence of eMCI was associated with poorer ADAS-Cog, FAQ, and mPACC Composite scores. However, the categorization of cognitive status groups is based on performance on Logical Memory Delayed Recall, a measure within the mPACC. For the same reason, the effect size of the mPACC on eMCI status was particularly strong (Cohen's d = 1.52). Nonetheless, a direct comparison of effect sizes using the regression interaction model between novel and well-established measures indicated that there were no significant differences between NPE and mPACC or ADAS-Cog in terms of associations with cognitive status (Ps = 0.52 and 0.53, respectively). Functional measures (CFSAT vs. FAQ) also had comparable effect sizes (p = 0.53).
3.6 Associations between cognitive measures and brain volumetric indices
Cortical volume was associated with performance on CFSAT Accuracy, but not with any NPE measures (Ps ≥ 0.10) (Table 3). Hippocampal volume was not associated with novel measures, after FDR corrections were performed.
Novel cognitive measures: NPE variables
Parameter | Executive functions: R | Executive functions: p-value | Executive functions: 95% CI | Cognitive control and speed: R | Cognitive control and speed: p-value | Cognitive control and speed: 95% CI | Episodic memory consolidation: R | Episodic memory consolidation: p-value | Episodic memory consolidation: 95% CI | NPE total composite: R | NPE total composite: p-value | NPE total composite: 95% CI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Cortex volume | 0.03 | 0.75 | −0.18 to 0.25 | 0.18 | 0.10 | −0.03 to 0.38 | −0.04 | 0.69 | −0.25 to 0.17 | 0.12 | 0.29 | −0.10 to 0.32 |
Hippocampus volume | −0.10 | 0.42 | −0.29 to 0.12 | 0.13 | 0.24 | −0.08 to 0.33 | 0.03 | 0.77 | −0.18 to 0.24 | 0.06 | 0.62 | −0.16 to 0.26 |
Novel functional measures: CFSAT variables
Parameter | CFSAT efficiency: R | CFSAT efficiency: p-value | CFSAT efficiency: 95% CI | CFSAT accuracy: R | CFSAT accuracy: p-value | CFSAT accuracy: 95% CI | CFSAT total composite: R | CFSAT total composite: p-value | CFSAT total composite: 95% CI |
---|---|---|---|---|---|---|---|---|---|
Cortex volume | 0.21 | 0.06 | −0.006 to 0.40 | 0.25 | 0.02* | 0.04 to 0.44 | 0.27 | 0.01* | 0.06 to 0.46 |
Hippocampus volume | 0.11 | 0.34 | −0.11 to 0.31 | 0.25 | 0.02 | 0.04 to 0.44 | 0.20 | 0.07 | −0.009 to 0.40 |
Well-established cognitive measures
Parameter | ADAS-Cog total: R | ADAS-Cog total: p-value | ADAS-Cog total: 95% CI | FAQ: R | FAQ: p-value | FAQ: 95% CI | mPACC composite: R | mPACC composite: p-value | mPACC composite: 95% CI |
---|---|---|---|---|---|---|---|---|---|
Cortex volume | 0.09 | 0.43 | −0.13 to 0.29 | −0.25 | 0.03 | −0.44 to −0.04 | 0.12 | 0.12 | −0.03 to 0.26 |
Hippocampus volume | 0.07 | 0.52 | −0.14 to 0.28 | 0.04 | 0.75 | −0.18 to 0.25 | 0.18 | 0.02 | 0.03 to 0.32 |
- Abbreviations: ADAS-Cog, Alzheimer's Disease Assessment Scale–Cognitive Subscale; CFSAT, Computerized Functional Skills Assessment and Training system; FAQ, Functional Activities Questionnaire; mPACC, modified Preclinical Alzheimer's Cognitive Composite; NPE, No Practice Effect test battery.
- * Statistical significance after false discovery rate (FDR) correction.
Performance on well-established measures was not associated with any brain volumetric indices after FDR correction (Table 3).
With respect to functional measures, effect sizes were not significantly different between the CFSAT and FAQ for hippocampal and cortical volume (Ps ≥ 0.12). The associations between brain morphometric measures and neuropsychological measures were comparable between NPE and ADAS-Cog or mPACC (Ps ≥ 0.19).
3.7 Associations between cognitive measures and functional measures
CFSAT variables and NPE variables were strongly correlated overall (Ps ≤ 0.0004) (Table S3). The correlation between NPE Total Composite and CFSAT Total Composite scores was significant and shared 36% of the variance. There were also significant associations between well-established cognitive and functional measures (Ps ≤ 0.01).
4 DISCUSSION
We examined various psychometric properties of our novel cognitive measures, NPE and CFSAT, with the aim of determining their potential utility in AD clinical trials. Our results indicated that the novel measures demonstrated stronger evidence for suitable psychometric properties and metrics reflecting associations with AD risk and biomarkers, compared with well-established measures (ADAS-Cog, mPACC, and FAQ), with effect sizes also in favor of the novel measures.
4.1 Psychometric properties of novel cognitive and functional measures
The novel measures exhibited strong psychometric properties, including adequate distribution/normality, absence of ceiling/floor effects, and strong test-retest reliability. The ICCs for both the NPE and CFSAT can be considered particularly strong, especially given that comparisons were made between different forms. The correlation between the novel cognitive (NPE) and functional (CFSAT) measures was also strong, and it was particularly robust between the CFSAT and higher-order frontal network functioning domains, such as executive functions and cognitive control tasks. These properties were generally superior to those of well-established measures, namely, the ADAS-Cog, mPACC, and FAQ, which showed ceiling effects and moderately skewed distributions. The multi-site nature of this study also allowed us to examine site-wise differences, and we found that, compared with well-established measures, our novel measures were less vulnerable to test administration differences between sites.
4.2 Comparison between novel and established measures on AD-related risk markers
Results from EFA revealed three factors within the NPE battery that pertained to Executive Functions, Cognitive Control and Speed, and Episodic Memory Consolidation. When factor-based composite scores from these cognitive domains were correlated with demographic factors, the novel measures exhibited stronger associations with age and educational attainment than the well-established measures did. Age and education (as it relates to cognitive reserve) are also AD risk factors. For example, the NPE Composite score and the Executive Function and Cognitive Control and Speed domain scores were associated with older age and lower educational attainment, which are key predictors of AD development.43, 44
The novel measures were also more sensitive than established measures to the well-known genetic risk marker, the APOE e4 genotype. They were also sensitive to the presence of early-stage cognitive impairment (i.e., eMCI). Cognitive control and executive function impairments are common in early symptomatic stages of AD (e.g., mild cognitive impairment [MCI]),45 and the working memory and executive function components of the novel battery, especially given their enhanced psychometric properties, may help detect subtle cognitive changes in AD clinical trials.
When our novel measures were examined along with brain volumetric indices, we found that performance on the CFSAT was strongly associated with volumes in AD-related brain regions, such as cortical and hippocampal volume. In contrast, the NPE measures were not associated with brain volumes. Speculatively, this may suggest that brain morphometric differences were more strongly associated with detectable changes in instrumental functioning (i.e., daily functioning level as indexed by proxy tasks) and less strongly with neurocognitive scores. Previous studies using the UCSD Performance-Based Skills Assessment also demonstrated that performance-based measures are robustly impaired in individuals with MCI, indicating that performance-based functional capacity assessments may be particularly sensitive to progression in the AD spectrum.46, 47 Irrespective of associations with MRI morphometry, subtle cognitive decline in the eMCI group could occur before neurodegeneration biomarkers become detectable.48, 49 It is possible that score differences or changes in our neuropsychological measures (NPE) reflect pre-clinical, subtle decline in cognition.
4.3 Implications for the AD clinical trials
Findings from this study have important implications for clinical trials. Cognitive changes in the preclinical stage are typically subtle and only manifest over repeated assessments,50 so it is crucial that clinical trials use cognitive measures that can detect these small changes over time. Ceiling effects hinder the capacity of many commonly used tests to identify these changes. For instance, in the ADNI data set, 80% of cognitively unimpaired individuals score 10/10 on MMSE orientation items, and another 17% score 9/10.6, 51 The same pattern holds true for functional measures, with most cognitively normal individuals scoring a zero (i.e., no impairment) on the widely used FAQ.52 Ceiling effects were also observed in the ADAS-Cog, MMSE, and FAQ in our well-established group, which is consistent with previous findings.1-3, 52 On the other hand, both the NPE and CFSAT showed minimal ceiling effects and an enhanced distribution of scores. These results indicate that the novel measures may be able to detect subtle cognitive changes more accurately over the course of a longitudinal study. The NPE is also specifically designed to reduce practice effects, which obscure cognitive decline;6 this feature will be addressed in a subsequent publication.
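A ceiling effect of the kind described above can be quantified as the proportion of participants who obtain the best possible score. The sketch below is illustrative only: the function name is hypothetical, the score list is synthetic and built to mirror the ADNI percentages quoted above (it is not real ADNI data), and the 15% flag threshold is an assumed rule of thumb, not a criterion stated in this section.

```python
def proportion_at_best_score(scores, best_score):
    """Fraction of participants who obtained the best possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s == best_score) / len(scores)


# Synthetic orientation scores mirroring the quoted percentages:
# 80% at 10/10 and 17% at 9/10; the remaining 3% are placed at 8 as arbitrary filler.
orientation_scores = [10] * 80 + [9] * 17 + [8] * 3

# Assumed rule-of-thumb threshold for flagging a ceiling effect (not from the study).
CEILING_THRESHOLD = 0.15

ceiling = proportion_at_best_score(orientation_scores, best_score=10)
print(f"{ceiling:.0%} at ceiling; flagged: {ceiling > CEILING_THRESHOLD}")
```

For a scale such as the FAQ, where zero indicates no impairment, the same function applies with `best_score=0`.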
In addition to their improved psychometric properties compared with the ADAS-Cog, mPACC, and FAQ, the NPE and CFSAT have practical and logistical benefits for clinical trials. Serial assessments can be time-consuming and add to participant burden (e.g., the ORCA demonstrates reduced ceiling effects but involves 25-min testing sessions every day for 6 consecutive days).53 In contrast, the NPE and CFSAT offer a more suitable level of challenge and brevity, with minimal ceiling effects, test administration time under 60 minutes, and follow-up assessments several (3 to 6) months apart. Further, because much of the NPE is computerized, its administration is more standardized and less prone to administration error than paper-and-pencil tests. This is particularly useful for large, multi-site clinical trials where oversight is difficult. The absence of site-related differences in the NPE scores also suggests that they are less vulnerable to biases from the test environment and staff, again demonstrating their potential use in multi-site studies. The practical advantages of the CFSAT stem from its ability to directly assess functional ability instead of relying upon informant reports, which eliminates potential subjective bias and allows socially isolated participants without informants to be included in clinical trials. The latest generation of the CFSAT is now fully remotely deliverable, and in a study of 92 older individuals with MCI diagnosed with Jak-Bondi criteria, the baseline and follow-up assessments were validly completed at home in 90/92 cases.34 Studies show that isolation is a risk factor for AD,54 so inclusion of this population in AD research is vital.
Overall, these administrative advantages, including computerization of test measures, offer substantial benefits over well-established cognitive measures in the post-coronavirus disease 2019 (COVID-19) era, given that the demand for remote and technology-assisted cognitive assessments will be high in the medical setting.
4.4 Strengths and limitations
A major strength of NoMAD is that it follows an innovative “intent-to-treat” design that is reflective of the structure of clinical trials. This type of design has an advantage over designs that test intra-individual differences between test measures: it reduces interference between test items and tests outcomes as a function of group (well-established vs. novel cognitive measures). Its multi-site design also strengthens reproducibility of findings and reduces site-related variance. It is also important to emphasize that approximately 20% of our sample comprised African American and Hispanic participants, which is more diverse than most clinical trials and is not significantly different from the general population demographics. When the novel measures were compared across different sites (e.g., Louisiana, New York, Florida), there were no site differences. Nonetheless, we are aware of the potential disadvantage in certain racial/ethnic groups with varying education levels; thus, we adjusted for education in our key analyses. Additionally, we aim to enhance the use of our novel measures in diverse populations by translating them into Spanish.
A limitation of the NoMAD study is that our sample does not include individuals with late-MCI, given its focus on pre-clinical populations. It is also limited by a lack of analyses with several AD biomarkers, such as amyloid and tau, although our future analyses will investigate these in plasma. The single-retest nature of the current analyses is not suited to testing for practice effects across multiple reassessments, and we will examine longitudinal changes in the test measures in future analyses. If these future studies confirm our novel measures’ reduced practice effects and enhanced correlations with the status of AD risk factors, the measures will have the potential to advance clinical assessments within the setting of AD clinical trials, such as trials of pharmacological and non-pharmacological interventions in early-stage AD.
5 CONCLUSION
Overall, the enhanced psychometric properties of the NPE and CFSAT and their significant correlations with AD risk factors make them well-positioned for use in preclinical or prodromal AD trials. Our future analyses with longitudinal data from 3-month and 12-month follow-up assessments will provide further information about the psychometric properties of our novel measures, in addition to their ability to detect genetic and biomarker-related changes in cognitive performance along the preclinical AD spectrum.
ACKNOWLEDGEMENTS
The authors thank all participants and research staff of the NoMAD Study for their contribution to data collection. This study is supported by grants funded by the National Institute on Aging (NIA) (1 R01 AG051346-01A1 and 1K23AG080117-01).
CONFLICT OF INTEREST STATEMENT
A.L., B.H., A.B., D.C., S.C., S.A.B., A.M.R., S.S. have none to report. H.K. receives consulting fees from YBrain. M.L.G. reports grants from Alector, Janssen, Novo Nordisk, AbbVie, and Eisai. M.L.G. reports providing expert testimony for Wilson, Elser, Moskowitz, Edelman & Dicker and Morgan and Morgan and participating on data safety advisory boards for Labcorp and Corium. D.P.D. reports participation on a monitoring board for Acadia, BioXcel, Corium and TauRx. L.S.S. reports consulting fees from AC Immune, Alpha-cognition, Athira, Corium, Merck, Neurim Ltd., Roche/Genentech, Lighthouse, and ImmunoBrain, Ltd. A.M.B. reports receiving consulting fees from Cogstate and Cognito Therapeutics, issued and pending patents, serving on the monitoring or advisory board for Albert Einstein College of Medicine and University of Illinois, Urbana-Champaign, and serving as a section editor for Alzheimer's and Dementia. L.S.S. receives support for attending meetings from Della Martin Foundation and participates on Advisory Boards for Merck, Genentech and UCB. P.D.H. reports grants from NIMH and the US Department of Veterans Affairs. P.D.H. reports royalties from WCG Endpoint Solutions. P.D.H. reports consulting fees and support for attending meetings from Alkermes, Boehringer-Ingelheim, Karuna Therapeutics, Minerva Neurosciences, Roche Pharma, Sunovion/DSP Pharma/Angelini. P.D.H. reports editorial fees from Elsevier BV. P.D.H. reports stock in i-Functions and software access from Brain HQ. Author disclosures are available in the supporting information.
CONSENT STATEMENT
All human subjects provided informed consent for this study.