Empirical Evaluation of Cross-Site Reproducibility in Radiomic Features for Characterizing Prostate MRI
MRI; stability; prostate; feature analysis; multi-site; radiomics; reproducibility; variance
The recent advent of radiomics has enabled the development of prognostic and predictive tools that use routine imaging, but a key remaining question is how reproducible these features are across multiple sites and scanners. This is especially relevant in the context of MRI data, where signal intensity values lack tissue-specific, quantitative meaning and depend on acquisition parameters (magnetic field strength, image resolution, type of receiver coil). In this paper we present the first empirical study of the reproducibility of 5 different radiomic feature families in a multi-site setting, specifically for characterizing prostate MRI appearance. Our cohort comprised 147 patient T2w MRI datasets from 4 different sites, all of which were first pre-processed to correct for acquisition-related artifacts such as bias field, differing voxel resolutions, and intensity drift (non-standardness). A total of 406 3D voxel-wise radiomic features were extracted and evaluated in a cross-site setting to determine how reproducible they were within a relatively homogeneous non-tumor tissue region, using 2 different measures of reproducibility: Multivariate Coefficient of Variation and Instability Score. Our results demonstrated that Haralick features were the most reproducible across all 4 sites. By comparison, Laws features were among the least reproducible between sites and performed highly variably across their entire parameter space. Similarly, the Gabor feature family demonstrated good cross-site reproducibility, but only for certain parameter combinations. These trends indicate that despite extensive pre-processing, only a subset of radiomic features and associated parameters may be reproducible enough for use within radiomics-based machine learning classifier schemes.
Chirra P; Leo P; Yim M; Bloch B N; Rastinehad A R; Purysko A; Rosen M; Madabhushi A; Viswanath S
Medical Imaging 2018: Computer-Aided Diagnosis
2018
2018
Book Chapter
<a href="http://doi.org/10.1117/12.2293992" target="_blank" rel="noreferrer noopener">10.1117/12.2293992</a>
Comparing radiomic classifiers and classifier ensembles for detection of peripheral zone prostate tumors on T2-weighted MRI: a multi-site study
adenocarcinoma; benign; cancer; Classifiers; Comparison; features; machine-learning-methods; MRI; Nuclear Medicine & Medical Imaging; Prostate cancer; Radiology; Radiomics; texture analysis
Background: For most computer-aided diagnosis (CAD) problems involving prostate cancer detection via medical imaging data, the choice of classifier has been largely ad hoc, or been motivated by classifier comparison studies that have involved large synthetic datasets. More significantly, it is currently unknown how classifier choices and trends generalize across multiple institutions, due to heterogeneous acquisition and intensity characteristics (especially when considering MR imaging data). In this work, we empirically evaluate and compare a number of different classifiers and classifier ensembles in a multi-site setting, for voxel-wise detection of prostate cancer (PCa) using radiomic texture features derived from high-resolution in vivo T2-weighted (T2w) MRI. Methods: Twelve different supervised classifier schemes: Quadratic Discriminant Analysis (QDA), Support Vector Machines (SVMs), naive Bayes, Decision Trees (DTs), and their ensemble variants (bagging, boosting), were compared in terms of classification accuracy as well as execution time. Our study utilized 85 prostate cancer T2w MRI datasets acquired from across 3 different institutions (1 for discovery, 2 for independent validation), from patients who later underwent radical prostatectomy. Surrogate ground truth for disease extent on MRI was established by expert annotation of pre-operative MRI through spatial correlation with corresponding ex vivo whole-mount histology sections. Classifier accuracy in detecting PCa extent on MRI on a per-voxel basis was evaluated via area under the ROC curve. Results: The boosted DT classifier yielded the highest cross-validated AUC (= 0.744) for detecting PCa in the discovery cohort. However, in independent validation, the boosted QDA classifier was identified as the most accurate and robust for voxel-wise detection of PCa extent (AUCs of 0.735, 0.683, 0.768 across the 3 sites). The next most accurate and robust classifier was the single QDA classifier, which also enjoyed the advantage of significantly lower computation times compared to any of the other methods. Conclusions: Our results therefore suggest that simpler classifiers (such as QDA and its ensemble variants) may be more robust, accurate, and efficient for prostate cancer CAD problems, especially in the context of multi-site validation.
Viswanath S E; Chirra P V; Yim M C; Rofsky N M; Purysko A S; Rosen M A; Bloch B N; Madabhushi A
BMC Medical Imaging
2019
2019-02
Journal Article
<a href="http://doi.org/10.1186/s12880-019-0308-6" target="_blank" rel="noreferrer noopener">10.1186/s12880-019-0308-6</a>