Development, initial validation, and application of a visual read method for [18F]MK-6240 tau PET
Joanna L. Shuping and Dawn C. Matthews contributed equally to this work.
Abstract
Background
The positron emission tomography (PET) radiotracer [18F]MK-6240 exhibits high specificity for neurofibrillary tangles (NFTs) of tau protein in Alzheimer's disease (AD), high sensitivity to medial temporal and neocortical NFTs, and low within-brain background. Objectives were to develop and validate a reproducible, clinically relevant visual read method supporting [18F]MK-6240 use to identify and stage AD subjects versus non-AD and controls.
Methods
Five expert readers used their own methods to assess 30 scans of mixed diagnosis (47% cognitively normal, 23% mild cognitive impairment, 20% AD, 10% traumatic brain injury) and provided input regarding regional and global positivity, features influencing assessment, confidence, practicality, and clinical relevance. Inter-reader agreement and concordance with quantitative values were evaluated to confirm that regions could be read reliably. Guided by input regarding clinical applicability and practicality, read classifications were defined. The readers read the scans using the new classifications, establishing by majority agreement a gold standard read for those scans. Two naïve readers were trained and read the 30-scan set, providing initial validation. Inter-rater agreement was further tested by two trained independent readers in 131 scans. One of these readers used the same method to read a full, diverse database of 1842 scans; relationships between read classification, clinical diagnosis, and amyloid status as available were assessed.
Results
Four visual read classifications were determined: no uptake, medial temporal lobe (MTL) only, MTL and neocortical uptake, and uptake outside MTL. Inter-rater kappas were 1.0 for the naïve readers' read of the gold standard scans and 0.98 for the independent readers' 131-scan read. All scans in the full database could be classified; classification frequencies were concordant with NFT histopathology literature.
Discussion
This four-class [18F]MK-6240 visual read method captures the presence of medial temporal signal, neocortical expansion associated with disease progression, and atypical distributions that may reflect different phenotypes. The method demonstrates excellent trainability, reproducibility, and clinical relevance supporting clinical use.
Highlights
- A visual read method has been developed for [18F]MK-6240 tau positron emission tomography.
- The method is readily trainable and reproducible, with inter-rater kappas of 0.98.
- The read method has been applied to a diverse set of 1842 [18F]MK-6240 scans.
- All scans from a spectrum of disease states and acquisitions could be classified.
- Read classifications are consistent with histopathological neurofibrillary tangle staging literature.
1 BACKGROUND
Tau is central to Alzheimer's disease (AD), aggregating into neurofibrillary tangles (NFTs) that in concert with fibrillar amyloid are hallmark pathologies of the disease.1 NFT spread correlates with cognitive decline and neurodegeneration as disease progresses,2-4 making it a priority target for therapeutic development and diagnosis. [18F]MK-6240 (Cerveau Technologies) is a positron emission tomography (PET) tracer with high affinity and selectivity for 3R/4R NFTs characteristic of AD.5-7 In clinical studies, it has exhibited a favorable safety profile, wide dynamic range with low within-brain background, minimal off-target binding in striatum and choroid plexus, sensitivity to medial temporal NFTs that emerge early in disease, and reproducibility.8-11 These properties suggest potential value in therapeutic trials, diagnosis, and monitoring.12 Objectives of this work were to develop, preliminarily validate, and evaluate broad applicability of a standardized, clinically meaningful visual read method supporting [18F]MK-6240 use in the clinical setting and clinical trials.
AD autopsy studies have established that NFTs typically aggregate earliest in the transentorhinal cortex, spreading to the entorhinal cortex and into the hippocampus (Braak stages I/II, “B1”), then increasing in limbic regions including the amygdala, expanding to the temporal/occipito-temporal neocortex (Braak stages III/IV, “B2”), and spreading to other neocortical tissue (Braak stages V/VI, “B3”).13, 14 Cognitive impairment is typically observed when NFT deposition reaches stage B2, suggesting this stage is clinically relevant.15 Medial temporal NFTs (B1) have been implicated in preclinical AD memory effects and may predict progression to B2.16, 17 However, medial temporal NFTs are also found in non-AD aging18 and may not always predict clinical progression. In later Braak stages, NFT spatial distribution varies, reflecting clinical phenotype.19, 20 These factors were considered in establishing [18F]MK-6240 read classifications.
Investigators using [18F]MK-6240 as a biomarker in 15 trials in three countries have contributed to a database consisting of >1840 deidentified [18F]MK-6240 scans with demographic, clinical, structural magnetic resonance imaging (MRI), and amyloid PET data from participants with a wide range of amyloid burden and clinical diagnoses. This database enabled evaluation of the method's broad applicability, and insight into relationships among NFT distribution, clinical diagnosis, and amyloid status.
2 METHODS
2.1 Study design
The process used to develop, initially validate, further test, and apply the [18F]MK-6240 read method is summarized in Figure 1. Method development was conducted in consultation with five experts credentialed in neurology, neuropsychology, and/or nuclear medicine, experienced in AD and [18F]MK-6240 PET imaging. Readers were first asked to apply their own methods of assessing tau PET positivity or negativity to predefined regions and overall for each of 30 scans. A PET-only approach was prioritized to support clinical use where MRI may not be available. Results were analyzed to determine concordance between raters and with quantitative regional standardized uptake value ratios (SUVRs) to assess visual read reliability.
RESEARCH IN CONTEXT
- Systematic Review: The authors reviewed the literature using PubMed, keyword searches, meeting abstracts, and presentations. Published findings regarding neurofibrillary tangle (NFT) accumulation in Alzheimer's disease (AD) progression; the binding characteristics of the tau positron emission tomography radiotracer [18F]MK-6240; and relationships among tau, amyloid, and clinical phenotype are of particular relevance. Pertinent publications are appropriately cited.
- Interpretation: This work describes the development, initial validation, and broad application of a visual read method for [18F]MK-6240 that is readily trainable and highly reproducible across readers. The visual read classifications align with clinically relevant disease progression and with published tau distribution findings.
- Future Directions: This work establishes a visual read approach and initial groundwork that can be further validated in a larger multi-reader study. Findings support use in clinical trials as well as in clinical settings for patient assessment as the clinical utility of NFT assessment is established.
Reader input regarding uptake features influencing overall assessment, method practicality, and clinical relevance to AD pathology and phenotype was used to establish read classifications. The expert readers then read the scans using the new four-class read method, and a gold standard read for those scans was established by majority agreement. Two naïve readers (board-certified neuroradiologists) were trained on the new method and read the 30-scan set, providing inter-rater agreement statistics and initial validation. Inter-rater agreement was further tested by two independent readers in 131 scans. One of these readers then applied the method to the full database of 1842 scans. Results were used to determine whether all scans in this diverse set could be characterized by one of the classifications, and to evaluate classification frequencies related to clinical diagnosis and amyloid status as available.
2.2 Data characteristics
All image data were provided by Cerveau, acquired from investigator-initiated Phase 1 and 2 clinical studies that used [18F]MK-6240 (Table 1). Data were collected with institutional review board/ethics committee approval obtained by each institution, written informed patient consent, and oversight of the institutions’ government authorities. Baseline and follow-up image data were collected whenever available.
Table 1. Subject characteristics (data shown to the extent provided to the database by contributing studies at the time of read)

Data set | Age (yrs): mean (SD), range | Sex: F | Sex: M | Sex: n/a | Amyloid status: Neg | Amyloid status: Pos | Amyloid status: n/a | Diagnosis: CUa | Diagnosis: MCI | Diagnosis: AD <65y | Diagnosis: AD ≥65y | Diagnosis: Other demb | Diagnosis: TBIc | Diagnosis: Other or n/ad |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30-scan gold standard set | 70 (11.2), 36–81 | 16 | 14 | - | 9 | 18 | 3 | 14 | 7 | 2 | 4 | - | 3 | - |
131-scan set | 70 (8.5), 44–92 | 62 | 66 | 3 | 50 | 61 | 20 | 49 | 20 | 12 | 17 | - | 6 | 27 |
All database (N = 1842) | 68.4 (8.6)e, 27–97 | 975 | 814 | 53 | 1001 | 491 | 350h | 797 | 202 | 36 | 134 | 10 | 148 | 515 |
Baseline scan onlyg (N = 1563) | 68.1 (8.5)f, 27–97 | 835 | 696 | 32 | 891 | 386 | 286i | 632 | 161 | 32 | 121 | 9 | 148 | 460 |

Acquisition information | Details |
---|---|
Scanners | Siemens: Biograph mCT, Biograph HiRes 1080, Biograph TruePoint 1093, Biograph mMR, ECAT HR+; GE Healthcare: Discovery MI, Discovery IQ, Discovery STE, Discovery 610, Discovery 710, Optima 560; Philips: Gemini TF 64. |
Injected dose | For database scans from studies that used a 5 mCi (185 MBq) dose: N = 1336, 4.82 ± 0.39 mCi (178.34 ± 14.43 MBq), range 3.49 to 7.40 mCi (129.13 to 273.8 MBq). For database scans from studies that used a 10 mCi (370 MBq) dose: N = 409, 10.40 ± 0.44 mCi (384.80 ± 16.28 MBq), range 8.00 to 11.00 mCi (296.00 to 407.00 MBq). (N = 206 not available at time of read.) Dose levels in the gold standard set (N = 30): 6.15 ± 2.14 mCi (227.55 ± 79.18 MBq), range 4.53 to 10.70 mCi (167.61 to 395.90 MBq). |
Contributing sites | Austin Health and University of Melbourne, Australia; Biogen Incorporated; Columbia University Medical Center, New York; Massachusetts General Hospital; and University of Wisconsin. Additional data contributing to the read of 131 scans and the full database came from the University of Pittsburgh Medical Center and University Hospital Leuven, Belgium. |
- Abbreviations: AD, Alzheimer's disease; CU, cognitively unimpaired; n/a, not available; MCI, mild cognitive impairment; PET, positron emission tomography; TBI, traumatic brain injury.
- aWithin the CU group for the 30-scan, 131-scan, full database, and baseline/1-scan only data sets, 0, 11, 37, and 21 participants, respectively, were identified in one study as having subjective memory complaints (SMC), but are not grouped separately because SMC participants were not differentiated from CU in other studies.
- bOther dementia included clinical diagnoses of frontotemporal dementia, suspected vascular disease, mixed AD/vascular disease, mixed Lewy body dementia/AD, and other unspecified dementia.
- cTBI was defined as traumatic brain injury due to a single moderate to severe TBI occurring at least 10 years previously.
- dClinical diagnoses that are n/a were not available in the database at the time of read, and “Other” was the non-specific term entered for some participants.
- eN = 1834.
- fN = 1557.
- gOf these, three scans were a first follow-up scan but no baseline scan was available.
- hOf 350, 31 were categorized as indeterminate and 319 were not available.
- iOf 286, 28 were categorized as indeterminate and 258 were not available.
Data characteristics are shown in Table 1. The database included baseline scans from 1560 participants; 259 participants had at least one follow-up timepoint, and 23 had a second follow-up timepoint, which were all read. Participants included cognitively unimpaired (CU) adults, patients with mild cognitive impairment (MCI), AD dementia, other dementia diagnoses (e.g., frontotemporal dementia), and traumatic brain injury (TBI), ages 27 to 97 years. Image acquisition protocols were based on site procedures, using 15 different PET scanner models (Table 1), various reconstruction algorithms, and a range of acquisition windows from which frames were extracted for timing similarity across scans (Supplement in supporting information). Injected activity varied by study as approximately 5 mCi (185 MBq) or 10 mCi (370 MBq) because a single standardized injection protocol was not yet established. A 3D T1-weighted MRI was available at time of read for 1786 of the [18F]MK-6240 scans. Amyloid status, when available, was determined by amyloid PET imaging and designated by each site according to local protocols.
For method development and initial validation, 30 [18F]MK-6240 scans were selected from the database. To ensure heterogeneous representation, scans included images acquired on different PET scanners from five different institutions in participants with a range of clinical diagnoses expected to include a spectrum of NFT burden: 14 CU (47%), 7 MCI (23%), 6 AD dementia (20%), 3 TBI (10%). As the read method was not yet developed, inclusion was not based on read classifications. The 131 [18F]MK-6240 PET scans were selected to represent diverse clinical diagnoses, scanners, protocols, uptake, and lesion examples, one scan per participant (Supplement).
2.3 Image processing
The 30 gold standard images used for method development and validation consisted of four or six successive 5-minute frames beginning at 80 or 90 minutes post-tracer injection, motion-corrected and averaged in native space. For quantitative measurement only, the images were smoothed to an approximate uniform resolution based upon the scanner model21 and partial volume correction (PVC, Müller-Gärtner method)22 was applied to most closely quantify actual binding rather than spill-in or atrophy; this was because visual reads on uncorrected scans must be able to differentiate between true uptake and off-target confounds. Regional mean SUVRs were calculated for medial temporal, lateral temporal, medial parietal, and lateral parietal regions using PETSurfer in FreeSurfer (version 6.0),23, 24 with inferior cerebellar gray matter as the reference region. SummedSUVR was defined as the sum of each scan's four regional mean SUVRs.
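The SummedSUVR definition above can be illustrated with a minimal sketch. It assumes that regional mean uptake values and the reference-region mean have already been extracted (for example, by a PETSurfer-style pipeline); the function names, region keys, and numeric values are hypothetical and are not taken from the study.

```python
from typing import Dict

# The four regions named in the text; the key spellings are illustrative.
REGIONS = ("medial_temporal", "lateral_temporal",
           "medial_parietal", "lateral_parietal")


def regional_suvr(region_mean: float, reference_mean: float) -> float:
    """SUVR: mean uptake in a target region divided by mean uptake in the
    reference region (inferior cerebellar gray matter in the text)."""
    if reference_mean <= 0:
        raise ValueError("reference uptake must be positive")
    return region_mean / reference_mean


def summed_suvr(region_means: Dict[str, float], reference_mean: float) -> float:
    """Sum of the four regional mean SUVRs for one scan."""
    return sum(regional_suvr(region_means[r], reference_mean) for r in REGIONS)


# Hypothetical uptake values for a single scan (arbitrary units).
example_means = {
    "medial_temporal": 1.8,
    "lateral_temporal": 1.5,
    "medial_parietal": 1.2,
    "lateral_parietal": 1.5,
}
example_summed = summed_suvr(example_means, reference_mean=1.0)
```

Because every region shares one reference value, SummedSUVR scales inversely with the reference mean, which is why reference-region choice matters for this kind of summary.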
To ensure scans used for further evaluation of the visual read method were consistent with the gold standard scans used to develop the method, for the 131-scan and full database sets, image frames of similar post-injection start time and duration were selected from each acquisition protocol (detail in Supplement). Frames were motion-corrected and averaged, and PET images co-registered with their respective MRI scans in a common orientation and voxel size without spatial warping. Scans were provided in unsmoothed form; a 4 mm smoothed version was provided for optional comparison. This was consistent with clinical settings in which readers may slightly smooth images, particularly if acquired on scanners with fine pixelation. For report generation for the visual reads and for the supplemental quantitative measurement only, images were warped to a template.
2.4 Visual read method development
The questionnaire used to gather visual assessment information from expert readers is shown in Figure S2 in supporting information. Predefined regions consisted of Braak stage regions (stages I–VI),13 and MeTeR categories (mesial-temporal [Me], temporoparietal [Te], and rest of neocortex [R] divided into a frontal subregion and a superior temporal/anterior cingulate subregion).25 Readers indicated any other regions used to determine the overall read, features contributing to overall assessment, confidence level, and the clinical relevance of various regions, for example, whether uptake indicated AD. Neither computed tomography (CT) or MRI images nor clinical and demographic information were provided. Evaluations were performed independently, and readers used their own display software, reading approaches, and preferred color scale.
Reader responses were analyzed to determine whether [18F]MK-6240 uptake could be assessed visually and whether assessments were consistent between readers and with quantitative tracer uptake. The summations of the percentage of readers who read (1) each of the Braak stage regions as positive or (2) each of the MeTeR regions as positive were compared to summedSUVR values (Supplement). Consideration was given to clinical relevance, and whether regions could be grouped into a readily trained and implemented set of classifications supporting clinical use. Based upon these inputs, classifications were defined with criteria for regions included in each class and when to consider intensity elevated. Grayscale was agreed upon for the gold standard and naïve reader reads to avoid variability due to potential color perception and concern that color thresholds could create artificial boundaries. The expert readers then re-read the scans using the new read method. Inter-rater agreement was calculated and the majority read determined, with consensus discussion in the single case of a reader tie. This 30-scan set with expert majority reads was designated as the gold standard set for visual reads.
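The majority-read step can be sketched as follows. This is an illustration only: it assumes a strict majority rule (more than half of the readers), with any case lacking such a majority flagged for consensus discussion. The function name and the use of `None` as the flag are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

CLASSES = ("no uptake", "MTL only", "MTL AND", "outside MTL")


def majority_read(reads: Sequence[str]) -> Optional[str]:
    """Return the class chosen by more than half of the readers.

    Returns None when no class reaches a strict majority, signalling that
    the case needs consensus discussion (an assumed convention).
    """
    if not reads:
        raise ValueError("at least one read is required")
    for read in reads:
        if read not in CLASSES:
            raise ValueError(f"unknown classification: {read!r}")
    label, votes = Counter(reads).most_common(1)[0]
    return label if 2 * votes > len(reads) else None
```

With five readers, a 3-2 split yields a majority read, whereas a 2-2-1 split returns `None` and would go to adjudication.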
For initial validation, the two naïve readers were trained on the visual read method. In a first session, they were shown one [18F]MK-6240 image representing each of the four read classifications. In a second session, readers read nine [18F]MK-6240 images as test cases. Training/test scans were not from the 30-scan set. Readers then visually assessed the 30 gold-standard [18F]MK-6240 images using a grayscale display and axial, coronal, and sagittal views, PET only. Inter-rater agreement and agreement with the gold standard reads were calculated.
2.5 Additional rater assessment
To prepare for the additional inter-rater testing, an internal training process was developed (Supplement). Two scientists with >15 years of experience each in brain image analysis were trained and applied the method to 28 of the gold standard scans available at time of read, using only PET scans. A third, junior scientist new to the field was also trained and read the gold standard scans to further explore trainability and consistency. Results were compared to the gold standard and validation reads, providing evidence that the readers were qualified to read additional scans.
For the inter-rater testing using a larger data set, the 131 images were randomly ordered, and 20 scans selected for re-read without reference to the initial read. The two experienced readers then each read all 131 images in gray/color scale of choice (20 scans twice). Unsmoothed and smoothed images were available for comparison. Each read was performed using only PET to simulate clinical practice in which MRI may not be available, and then with MRI underlay, simulating the approach likely to be used in clinical trials in which MRI scans are typically acquired. This enabled comparison of classification and reader confidence. Final reads were based upon the combination of PET with and without MRI underlay. Results were compared between readers, and in cases of discrepancy, discussed to reach consensus.
All 1842 scans in the database were then read by one of the readers who read the 131 scans, with (color) and without (inverted gray) MRI underlay. Classification frequencies were evaluated in relation to clinical diagnosis and amyloid status, using one scan per participant with available data (baseline if more than one present). SUVRs were measured on an exploratory subset of 215 scans and compared to read classifications (Supplement).
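The one-scan-per-participant selection and the frequency tabulation described above could be organized along the following lines. This is a sketch under stated assumptions: the record layout (`ScanRead`), the integer timepoint coding, and the grouping by (diagnosis, amyloid status) are hypothetical and do not describe the database schema used in the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class ScanRead(NamedTuple):
    participant_id: str
    timepoint: int      # 0 = baseline, 1 = first follow-up, ... (assumed coding)
    diagnosis: str      # e.g. "CU", "MCI", "AD"
    amyloid: str        # e.g. "pos", "neg"
    read_class: str     # one of the four visual read classifications


def first_available(scans: Iterable[ScanRead]) -> Dict[str, ScanRead]:
    """Keep the earliest scan per participant (baseline when present)."""
    earliest: Dict[str, ScanRead] = {}
    for scan in scans:
        current = earliest.get(scan.participant_id)
        if current is None or scan.timepoint < current.timepoint:
            earliest[scan.participant_id] = scan
    return earliest


def class_frequencies(
    scans: Iterable[ScanRead],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Fraction of each read class within each (diagnosis, amyloid) group."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for scan in first_available(scans).values():
        counts[(scan.diagnosis, scan.amyloid)][scan.read_class] += 1
    return {
        group: {cls: n / sum(c.values()) for cls, n in c.items()}
        for group, c in counts.items()
    }
```

Selecting one scan per participant before tabulating keeps participants with follow-up scans from being counted more than once in a group.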
2.6 Statistical analysis
For the expert reader survey and initial read method validation, inter-reader agreements were assessed by percent agreement (pairwise agreements/total assessments) and free-marginal Fleiss’ kappa.26 Correlation between confidence levels and case-level inter-reader concordance, and between SUVR metrics and visual reads were assessed by Spearman's correlation (rho) (Supplement). For the 131-scan read, inter- and intra-rater kappas were calculated using JMP v16 statistical software (SAS).
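For readers unfamiliar with the free-marginal statistic, the following sketch shows one common formulation: observed agreement is the proportion of agreeing reader pairs, and chance agreement is fixed at 1/k for k categories. It is an assumption that this matches the exact computation used here; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def observed_agreement(ratings: Sequence[Sequence[str]]) -> float:
    """Mean, over cases, of the proportion of reader pairs that agree.

    `ratings` holds one sequence of reader labels per case.
    """
    if not ratings:
        raise ValueError("at least one case is required")
    total = 0.0
    for case in ratings:
        n = len(case)
        if n < 2:
            raise ValueError("each case needs at least two readers")
        # Ordered agreeing pairs divided by all ordered pairs of readers.
        agreeing = sum(c * (c - 1) for c in Counter(case).values())
        total += agreeing / (n * (n - 1))
    return total / len(ratings)


def free_marginal_kappa(ratings: Sequence[Sequence[str]], n_categories: int) -> float:
    """Free-marginal multirater kappa: (Po - 1/k) / (1 - 1/k)."""
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    p_o = observed_agreement(ratings)
    p_e = 1.0 / n_categories
    return (p_o - p_e) / (1.0 - p_e)
```

Under this formulation, 83% pairwise agreement on a two-category read gives (0.83 − 0.5)/(1 − 0.5) ≈ 0.66, close to the 0.67 reported for the binary read in Table 2.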
3 RESULTS
3.1 Results of expert reader input used in developing read paradigm
Inter-reader agreement values across binary, Braak, and MeTeR classifications, and with SUVR measurements are shown in Table 2. Percent reader agreement ranged across regions from 73% to 86%. The Spearman's correlations (rho) between SummedSUVR and the summations of (1) Braak stage % of readers positive or (2) MeTeR regions % of readers positive were 0.861 (P < 0.0001) and 0.914 (P < 0.0001), respectively (Supplement). In this method development phase, when reader methods were unguided and region definitions were relatively complex, these metrics were sufficient to provide confidence that images could be visually read without use of quantitation.
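The rank correlation between per-scan summed reader positivity and SummedSUVR can be sketched with the standard library alone. This is a generic Spearman implementation (Pearson correlation of average ranks, which handles ties), offered as an illustration and not as the software used in the study; the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would be, per scan, the sum across regions of the percentage of readers scoring each region positive, and `y` the scan's SummedSUVR. Ties are expected in `x` because it takes a limited set of values with four or five readers.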
Read/region | Number of readers | Mean kappa | 95% CI low | 95% CI high | % agreement | SUVR comparisona: ρ | SUVR comparisona: P-value |
---|---|---|---|---|---|---|---|
Binary (overall) | 5 | 0.67 | 0.49 | 0.84 | 83% | NA | NA |
Braak I | 4 | 0.58 | 0.37 | 0.78 | 79% | 0.86 (Braak I–VI summed) | 0.000000004 |
Braak II | 4 | 0.52 | 0.32 | 0.72 | 76% | | |
Braak III | 4 | 0.68 | 0.50 | 0.86 | 84% | | |
Braak IV | 4 | 0.69 | 0.50 | 0.88 | 84% | | |
Braak V | 4 | 0.69 | 0.50 | 0.88 | 84% | | |
Braak VI | 4 | 0.47 | 0.27 | 0.66 | 73% | | |
Me | 5 | 0.49 | 0.30 | 0.68 | 75% | 0.91 (MeTeR regions summed) | 0.00000000001 |
Te | 5 | 0.56 | 0.38 | 0.74 | 78% | | |
R-Frontal | 5 | 0.72 | 0.56 | 0.88 | 86% | | |
R-Sup T | 5 | 0.56 | 0.38 | 0.74 | 78% | | |
- Notes: The sum of mean SUVR for all regions defined using the Braak or MeTeR regions of interest were tested for correlation with the sum of the percentage of readers scoring the regions as positive using Spearman's rank correlation test. Binary read was not considered to be applicable (NA). All reads were performed using PET-only, without an MRI underlay.
- aSUVR values were available for 28 subjects.
- Abbreviations: MeTeR, mesial-temporal, temporoparietal, and rest of neocortex; MRI, magnetic resonance imaging; PET, positron emission tomography; SUVR, standardized uptake value ratio.
Several points emerged from expert reader input regarding clinical relevance of patterns of [18F]MK-6240 uptake. Uptake seen in medial temporal plus cortical regions was consistently interpreted as NFT-positive and clinically meaningful (reflective of AD pathology). Readers agreed that cases with elevated uptake in neocortical regions but no uptake in medial temporal lobe (MTL) could be clinically relevant, but that this was distinct from the pattern of NFT pathology seen in typical cases of AD. The readers did not reach consensus on the clinical significance of [18F]MK-6240 uptake limited to medial temporal regions, and whether this pattern should be interpreted as NFT-positive if limited to transentorhinal/entorhinal cortex.
3.2 [18F]MK-6240 visual read classifications and assessment method
Visual read development resulted in definition of four classifications: “no uptake,” “MTL only,” “MTL AND,” and “outside MTL” (examples in Figure 2). “No uptake” is defined as a lack of elevated [18F]MK-6240 signal in medial temporal or neocortical regions; signal in an “off-target” brain region such as striatum, cerebellum, or midbrain that does not exceed signal in the retina (high due to neuromelanin4) is also read as “no uptake.” “MTL only” is defined as elevated intensity in any MTL structure (defined to include transentorhinal, entorhinal, subiculum, hippocampus, parahippocampus, and amygdala) in either hemisphere, without any neocortical uptake. “MTL AND” indicates elevated signal in MTL and at least one additional neocortical region, in either or both hemispheres. Uptake in lateral temporal cortex (including anterolateral tissue and fusiform), frontal, parietal, occipital, and/or cingulate regions all contribute to “AND.” “Outside MTL” is defined as uptake in neocortical regions, or in subcortical regions other than MTL that exceeds retinal intensity, without elevated medial temporal uptake. Cases include scans in which focal uptake is found only in a region such as occipital or frontal cortex, or with a medial temporal-sparing pattern having neocortical uptake otherwise typical of AD. Non-brain off-target signal, such as in meninges,6, 8 does not constitute positive uptake.
The presence of uptake is determined visually, without generating an SUVR image or applying a numeric or color threshold. Visual intensity range is set through contrast adjustment so that regions of low (such as cerebellum) and high (such as retina) intensity are distinct, with maximal range between them.
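The classification logic can be summarized as a small decision rule. The sketch below is illustrative only: the inputs are a reader's visual judgments (not thresholds or SUVR values), meningeal and other non-brain signal is assumed to have been excluded already, and the field and function names are hypothetical. Treating MTL uptake accompanied only by subcortical signal as “MTL only” is an assumption drawn from the wording of the definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadFindings:
    """A reader's visual judgments for one scan (hypothetical structure)."""
    mtl_elevated: bool                        # any MTL structure, either hemisphere
    neocortical_elevated: bool                # at least one neocortical region
    other_brain_exceeds_retina: bool = False  # non-MTL subcortical signal above retina


def classify_read(findings: ReadFindings) -> str:
    """Map visual findings to one of the four read classifications."""
    if findings.mtl_elevated:
        # Neocortical involvement distinguishes "MTL AND" from "MTL only".
        return "MTL AND" if findings.neocortical_elevated else "MTL only"
    if findings.neocortical_elevated or findings.other_brain_exceeds_retina:
        return "outside MTL"
    return "no uptake"
```

Every combination of findings maps to exactly one class, which mirrors the observation that all scans could be assigned one of the four classifications.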
3.3 Gold standard and validation reads
As indicated in Table 2, percent agreement among the five gold-standard readers using the new method was 72% (Fleiss’ kappa 0.62), with the majority of readers assigning 11 cases “no uptake,” 2 “MTL only,” 16 “MTL AND,” and 1 “outside MTL.” One case required adjudication due to lack of majority read, and the resulting consensus majority read was “MTL only.” Disagreements between expert readers were limited to differing opinions regarding how transentorhinal/entorhinal signal should be rated, pertinent to several of the 30 cases. Percent agreement between naïve readers was 100% (Fleiss’ kappa 1), assigning 11 cases “no uptake,” 2 cases “MTL only,” and 17 cases “MTL AND.” Percent agreement between naïve and the gold standard expert reader majority was 90%.
3.4 Inter-rater results for the additional validation set read and 131 scan set
After training, the two experienced scientist readers and one junior reader read 28 gold standard scans available from the set of 30 with 100% agreement, agreeing with validation read results (one scan that had disagreement between expert and naïve readers was unavailable). For the 131 scans, the inter-rater kappa between the two readers was 0.988 (95% confidence interval [CI]: 0.96, 1.00) prior to consensus (two discrepant cases, related to lesion interpretation), and final reads were 57 “no uptake,” 17 “MTL only,” 56 “MTL AND,” and 1 “outside MTL”; intra-rater kappas were 0.92 (95% CI: 0.77, 1.0) and 1.0. Scans having no elevated uptake, “MTL only” signal reaching Braak Stage III/B2 structures beyond transentorhinal/entorhinal cortex (Figure 3B), and neocortical uptake were readily assessed with PET only. Scans with emerging transentorhinal/entorhinal uptake could often be discriminated using PET only (Figure 3A). Figure 3C,E and Supplement show examples of neocortical variability accommodated within the “MTL AND” classification. Inter-rater agreement was achieved despite different reader preferences and use of inverted grayscale versus color. Classifications differed between PET only and with MRI underlay for 5.3% and 7.6% of 131 scans for the two readers, respectively. Differences pertained to distinguishing off-center inter-hemispheric binding related to off-target lesions, uptake limited to transentorhinal/entorhinal cortex, and a case of occipital binding that was interpreted with PET-only as signal spill-in. Variations in dose or late frame time window did not impact cross-sectional readability.
3.5 Full database read classification
Figure 4A shows the read classification distribution for the 1842 [18F]MK-6240 database scans. All scans could be read as one of the four classifications. Figures 4B–D show classification distributions by reported clinical diagnosis (CU, MCI, and AD in 4B–C, TBI in 4D) and amyloid status, using only baseline or first available scan for each participant. Among the 341 reported amyloid positive (A+) subjects in Figure 4C, the frequency of “MTL AND” classification increased from those who were CU (32%) to those with MCI (75%) and was greatest (88%) with AD dementia (6% had “no uptake”). In the only contributing study that identified a subgroup of A+ CU as having subjective memory complaints (SMC), of the 13 participants in that subgroup 5 were classified as “MTL only” and 6 were classified as “MTL AND.” Of the 890 baseline or first available scans from participants classified as amyloid negative, 92% were “no uptake” regardless of clinical diagnosis. As evident in Figure 4D (top panel), most (91%) of the TBI participants were A– and 97% of them were classified as “no uptake” (examples including potential read confounds in Supplement).
4 DISCUSSION AND CONCLUSIONS
A standardized, readily learned, clinically relevant visual read method has been developed for use with the [18F]MK-6240 tau PET tracer. Initial validation using two naïve readers followed by two additional independent readers yielded inter-rater kappas >0.9, in line with kappas for other tracers in clinical use.27, 28 Requiring only a PET scan with optional anatomical underlay, the method is applicable for clinical use and clinical trials. Application to 1842 [18F]MK-6240 scans demonstrated the method's ability to characterize a broad spectrum of clinical and amyloid states, and concordance with published NFT distributions.
The four-class [18F]MK-6240 read method aligns well with Braak staging,14 with the note that tau PET sensitivity differs from (lags) neuropathology in that by the time NFTs reach the threshold of PET detection in Braak stage I/II, they have likely spread to Braak stage III/IV regions microscopically.29 “MTL only” captures the sensitivity of [18F]MK-6240 to medial temporal NFTs30 and facilitates detection of preclinical AD using the A/T/N framework.31 Clinical interpretation of medial temporal NFTs is not yet standardized,30 and “MTL only” identifies this uptake while distinguishing it from neocortical spread associated with amyloid-dependent disease progression.32, 33 Within the “MTL only” classification, uptake reaching Braak Stage III/B2 involving amygdala and extending into hippocampus (Figure 3B with “hook-like” uptake)34 can further be distinguished from transentorhinal/entorhinal uptake (Figure 3A), presenting subclassification opportunity. The former has been associated with the limbic predominant variant of AD35, 15 while early transentorhinal/entorhinal uptake may indicate either primary age-related tauopathy (PART)36 or the earliest stages of AD.
“MTL AND” captures neocortical NFT spread associated with AD progression and an accelerated rate of NFT accumulation, the slowing of which is the goal of several clinical trials.37 This spatial discrimination is relevant to monitoring disease progression and treatment response. While aligned with Braak staging, “MTL AND” does not require discrimination of individual cortical regions and accommodates the spatial variability observed in AD (Supplement). The “MTL AND” class also allows for subclassification depending on clinical trial or patient assessment objectives (examples in Supplement). “Outside MTL” allows identification of cases that may involve clinical phenotypes differing from typical AD. While “outside MTL” cases were infrequent (N = 14), this category was important to capture scans with other atypical distributions.
[18F]MK-6240 binds to leptomeninges, more frequently in scans that do not exhibit binding in NFT-associated regions.8, 9 While meningeal binding was apparent in [18F]MK-6240 scans, all readers were able to visually distinguish it from likely NFT binding in cortical regions. Visual approaches to discern between meningeal signal and cortical uptake were identified and can be incorporated into training for broader use. For MTL uptake, the “hook”-like uptake34 (Figure 2B) associated with spread into hippocampus and amygdala provides a useful indicator of NFT binding, and is spatially distinct from meninges. For the early uptake limited to entorhinal cortex (a region with unclear clinical relevance), examining multiple slices and multiple views (sagittal, coronal, axial) was helpful in differentiating from meningeal signal. Meningeal uptake tended to be ring-like and at least partially contiguous, or inter-hemispheric, or concentrated between cerebellum and occipital tissue, and did not follow sulcal patterns or cluster in specific tissue as with cortical uptake. It also occurred most frequently in younger cases without cortical tau uptake as has been reported by others.38 For all reads, scans were not masked and included the meninges, which allowed determination of whether elevated signal at the cortical periphery was spilling in from meningeal binding. We note that the visual assessment of whether uptake is present in cortex beyond that adjacent to meninges has analogy to strategies of peripheral erosion and meningeal masking used by various groups for quantitative analysis.34, 40 Harrison et al. point out the importance of adjusting both cortex and reference, which a visual assessment takes into account.38
Findings from the 131-scan read supported consistency between PET-only and with-MRI reads. The primary benefit of an MRI underlay was increased reader confidence. There are, however, cases in which MRI could be helpful for confirming lesions or atypical morphology, or for discerning uptake in entorhinal cortex (examples in Supplement).
Application of the read method to 1842 scans with a comprehensive range of ages, clinical diagnoses, amyloid, and atrophy demonstrated the broad relevance of the tracer. Read classifications were consistent in reader agreement as well as in association with amyloid and clinical status despite variations of dose, acquisition protocol, and scanner. Results from scans acquired using 5 mCi (185 MBq) injected [18F]MK-6240 activity (n = 1336) suggest that this lower amount of radioactivity could be adopted as a standard dose to minimize radiation exposure. Concordance between visual read classification and measured SUVRs supported visual read reliability (Supplement).
While SUVR quantitation was intended only as a confirmation of a correlation between visual observation and measured intensity, concordance was observed both in the SUVR comparison to visual detection in the 30 gold standard scans and in the comparison using 215 database scans (Supplement). These comparisons both showed general agreement between visual detection and regional quantitation. Complete concordance is not expected due to the use of averaged values across a region for quantitation rather than detection of a cluster within that region. The method used to adjust for potential off-target spill-in, which was PVC for the gold standard scans and region of interest erosion for the supplemental database comparison, did not impact concordance between visual reads and SUVRs. This is consistent with the finding by Harrison et al. that different off-target binding adjustment methods, including PVC and erosion,38 did not significantly impact relationships between measured uptake and other endpoints. It can be noted that different compensation methods have various advantages and disadvantages when applied to longitudinal evaluation (such as increased variability with PVC),39 but this was not the focus or scope of the present work. The finding of general concordance between visual detection of uptake and quantitative values for [18F]MK-6240 was consistent with results reported by others.34, 40
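The reason complete concordance is not expected can be illustrated with a small numerical sketch. All values here are hypothetical and chosen only for illustration: a region of 100 voxels in which a focal cluster of 10 voxels has an SUVR of 2.0 against a background of 1.0, and an assumed positivity threshold of 1.3. The per-voxel maximum is used as a crude stand-in for a focal cluster that a reader might see; it is not the method used in this work.

```python
from statistics import mean


def region_summary(voxel_suvrs, threshold):
    """Compare a regional mean with a crude focal-signal check.

    `threshold` is a hypothetical SUVR cutoff, not a value from this study.
    """
    regional_mean = mean(voxel_suvrs)
    return {
        "regional_mean": regional_mean,
        # What region-averaged quantitation would report
        "mean_exceeds_threshold": regional_mean > threshold,
        # Crude proxy for a visible focal cluster within the region
        "any_voxel_exceeds_threshold": max(voxel_suvrs) > threshold,
    }


# Hypothetical region: 90 background voxels and a 10-voxel focal cluster
voxels = [1.0] * 90 + [2.0] * 10
summary = region_summary(voxels, threshold=1.3)

if __name__ == "__main__":
    print(summary)
```

In this example the regional mean is (90 × 1.0 + 10 × 2.0) / 100 = 1.1, which falls below the assumed threshold even though the cluster itself is well above it, so averaged quantitation and visual detection of the same region can disagree.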
Database visual classification frequencies are consistent with NFT burden reported in other studies for clinical stage and amyloid status.16, 34, 41, 42 The 30% of A+ CU subject scans in the “MTL AND” category (Figure 3) is similar to the 32% of cognitively normal subjects reported T+ by Therriault et al.41 The 75% of “MTL AND” scans among A+ MCI participants is similar to the 74% of A+ MCI who were also T+ as reported by Therriault et al.,41 as well as the percentage of A+ MCI participants who were NFT Braak stage III or higher as reported by Maass et al.,16 using different data sets of similar mean age. Occurrences of “outside MTL” were infrequent, potentially due to [18F]MK-6240 sensitivity in detecting relatively low NFT levels in MTL and the consideration of subtle MTL uptake as MTL positive.
4.1 Limitations
The lack of a comparison to a histopathological standard of truth is a limitation, as contributing trials did not include end-of-life populations or brain banking endpoints. A post mortem standard of truth may be less feasible due to the time that typically elapses between PET imaging and autopsy. Validation of the “MTL only” designation would prove particularly difficult, as these early-stage patients may accumulate additional NFT for many years, extending into neocortex, prior to autopsy. The variety of protocols and study designs used to acquire the data, and differences in the tracers and thresholds used to assign amyloid status, created a heterogeneous data set, which may limit some conclusions. Amyloid and clinical data were not available for a portion of database scans. As this work was intended only as a preliminary validation and application to a broad set of images, the number of different readers was limited.
4.2 Conclusions
The four-class visual read method developed for [18F]MK-6240 captures medial temporal tau as well as neocortical expansion and variability that occur with AD progression. Results of initial validation, further testing, and application to 1842 scans representing diverse disease states and acquisition parameters have demonstrated the method's reproducibility and robustness as a basis for its larger-scale validation. Clinical relevance observed in this work supports use in pharmaceutical trials and clinical assessment as clinical utility of NFT accumulation is established.
AUTHOR CONTRIBUTIONS
All authors have provided intellectual contributions to this work and have approved it for publication. Joanna Shuping and Dawn Matthews contributed equally to this work. Joanna Shuping managed the overall development, validation, and application of the visual read method, and provided direction, input, and editing for the manuscript. After the method development, Dawn Matthews directed the independent experienced reader training, participated in the read of 131 scans, visually read the full database and analyzed results, and was primary author of the manuscript. Jeffrey Evelhoch performed classification result analyses and provided input and editing to the manuscript. David Scott and Kate Adamczuk led the visual read method development and provided method expertise, hosted the expert and naïve reader reads, performed quantitative assessments for use during method development, and provided input to the manuscript. Christopher Rowe, William (Chuck) Kreisl, and Sterling Johnson provided data, served as expert readers and assisted in developing the method, and provided manuscript input. Ana Lukic performed image processing and configuration of image visualization and reporting tools for reads. Pedro Rosa-Neto served as an expert reader and provided method input. Keith Johnson contributed data, served as an expert reader, and provided input to the method. Lindsay Cordes contributed to manuscript drafting. Claire Wilde was data management lead at Cerveau. Randolph Andrews served as one of the experienced independent readers and assisted with data management at ADMdx. Jerome Barakos and Derk D. Purcell served as the two naïve readers for initial validation. Koen Van Laere contributed data and provided review and editing of the manuscript. Larry Ward, Davangere Devanand, Yaakov Stern, Jose Luchsinger, Adam Brickman, Cyrille Sur, Julie Price, William Klunk, Adam Boxer, and Patrick Lao all contributed data. Sulantha Mathotaarachchi assisted with the transfer of data, review of read results, and review of the manuscript.
ACKNOWLEDGMENTS
This project used data from trials supported by the National Institute on Aging (R01AG063888, K23AG052633, R01AG021155, SV2AAG062285, RF1AG027161, P30AG062715, R01AG062167, S10OD025245, R01AG055422, R01AG038465, R01AG050440, RF1AG051556, RF1AG051556-01S2, R01AG055299, K24AG045334, R01AG050436, R01AG052414, P01AG025204, and RF1AG058067); the Australian Imaging, Biomarkers & Lifestyle Flagship Study of Ageing; the Australian Dementia Network and the University of California Cures Alzheimer's Disease program; and Biogen, Inc. Institutions that acquired and/or contributed data were: Austin Health and University of Melbourne, Australia; Columbia University Medical Center, New York; Massachusetts General Hospital; University of Wisconsin; University of Pittsburgh Medical Center; University Hospital Leuven, Belgium; Biogen; and Molecular Neuroimaging. We thank those who were involved in acquiring the study data and providing it for use in the database. Special thanks are extended to the patients, healthy controls, and families involved in the studies. We are grateful to Cristian Salinas, Jonathan DuBois, Ajay Purohit, Raj Rajagovindan, Laurent Martarello, and John Beaver from Biogen Inc. for their roles in contributing data; Courteney Gerken for data retrieval and reconciliation; and Laura Matthews for assisting with report generation, data review, and readability input. The development and validation of the visual read method was fully funded by Enigma Biointelligence, Inc. (EBI), an affiliate of Cerveau Technologies, Inc.
CONFLICTS OF INTEREST
J. Shuping is a consultant to Enigma Biomedical Group. C. Wilde was a consultant to Cerveau Technologies. J. Evelhoch, W. Kreisl, K. Johnson, C. Rowe, and K. Van Laere are consultants/advisors to Cerveau Technologies. D. Matthews, A. Lukic, and R. Andrews are employees of ADM Diagnostics, Inc. D. Scott, K. Adamczuk, J. Barakos, and D. Purcell are employees of Clario. C. Rowe is an employee of Austin Health, Melbourne and University of Melbourne, Australia. S. Johnson is an employee of the University of Wisconsin Madison. W. Kreisl, A. Brickman, D. Devanand, J. Luchsinger, Y. Stern, and P. Lao are employees of Columbia University. P. Rosa-Neto is an employee of McGill University. K. Van Laere is an employee of UZ Leuven. K. Johnson and J. Price are employees of Massachusetts General Hospital and Harvard University. A. Boxer is an employee of the University of California San Francisco. W. Klunk is an employee of the University of Pittsburgh. C. Sur was an employee of Merck & Co., Inc. L. Cordes is an employee of StatKing Clinical Services. L. Ward is an employee of Florey Department of Neuroscience and Mental Health and University of Melbourne. S. Mathotaarachchi is an employee of Enigma Biomedical Group. Author disclosures are available in the supporting information.