Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias
Hum Brain Mapp. Robert J Jirsaraie et al.
Machine learning has been increasingly applied to neuroimaging data to predict age, deriving a personalized biomarker with potential clinical applications. The scientific and clinical value of these models depends on their applicability to independently acquired scans from diverse sources. Accordingly, we evaluated the generalizability of two brain age models that were trained across the lifespan by applying them to three distinct early-life samples with participants aged 8-22 years. These models were chosen based on the size and diversity of their training data, but they also differed greatly in their processing methods and predictive algorithms. Specifically, one brain age model was built by applying gradient tree boosting (GTB) to extracted features of cortical thickness, surface area, and brain volume. The other model applied a 2D convolutional neural network (DBN) to minimally preprocessed slices of T1-weighted scans. Additional model variants were created to understand how generalizability changed when each model was trained with data that became more similar to the test samples in terms of age and acquisition protocols. Our results illustrated numerous trade-offs. The GTB predictions were relatively more accurate overall and yielded more reliable predictions when applied to lower quality scans. In contrast, the DBN displayed the most utility in detecting associations between brain age gaps and cognitive functioning. Broadly speaking, the largest limitations affecting generalizability were acquisition protocol differences and biased brain age estimates. If such confounds could eventually be removed without post-hoc corrections, brain age predictions may have greater utility as personalized biomarkers of healthy aging.