Multivariable Diagnostic Prediction Model to Detect Hormone Secretion Profile From T2W MRI Radiomics with Artificial Neural Networks in Pituitary Adenomas
Original Article
VOLUME: 37 ISSUE: 1
P: 36 - 43
March 2022

Medeni Med J 2022;37(1):36-43
1. Istanbul Goztepe Prof. Dr. Suleyman Yalcin City Hospital, Clinic of Radiology, Istanbul, Turkey
Received Date: 03.12.2021
Accepted Date: 25.01.2022
Publish Date: 18.03.2022

ABSTRACT

Objective:

This study aims to develop neural networks to detect hormone secretion profiles in the pituitary adenomas based on T2 weighted magnetic resonance imaging (MRI) radiomics.

Methods:

This retrospective model-development study included a cohort of patients with pituitary adenomas (n=130) from January 2015 to January 2020 in one tertiary center. The mean age was 46.49±13.69 years, and 76/130 (58.46%) were women. Three observers segmented lesions on coronal T2 weighted MRI, and interrater agreement was evaluated using the Dice coefficient. Predictors were determined as radiomics features (n=851). Feature selection was based on the intraclass correlation coefficient, coefficient of variation, variance inflation factor, and LASSO regression analysis. Outcomes were identified as 7 hormone secretion profiles [nonfunctioning pituitary adenoma, growth hormone-secreting adenomas, prolactinomas, adrenocorticotropic hormone-secreting adenomas, pluri-hormonal secreting adenomas (PHA), follicle-stimulating hormone and luteinizing hormone-secreting adenomas, and thyroid-stimulating hormone adenomas]. A multivariable diagnostic prediction model was developed with artificial neural networks (ANN) for 7 outcomes. ANN performance was presented as the area under the receiver operating characteristic curve (AUC) and accepted as successful if the AUC was >0.85 and the p-value was <0.01.

Results:

The performance of the ANN distinguishing prolactinomas from other adenomas was validated (AUC=0.95, p<0.001, sensitivity: 91%, and specificity: 98%). The model distinguishing PHA had the lowest AUC (AUC=0.74 and p<0.001). The AUC values for the other five ANNs were >0.85, and their p values were <0.001.

Conclusions:

This study was successful in training neural networks that could differentiate the hormone secretion profile of pituitary adenomas.

Keywords:
Pituitary adenoma, magnetic resonance imaging, machine learning, artificial intelligence, radiomics

INTRODUCTION

Pituitary adenoma is the second most common primary central nervous system tumor and constitutes approximately 14% of intracranial masses1,2,3. Pituitary adenomas are classified according to their size as microadenoma (<1 cm), macroadenoma (≥1 cm) and giant adenoma (>4 cm). Additionally, 54-62% of pituitary adenomas are active hormone-secreting tumors [growth hormone-secreting adenomas (GHSA), prolactinomas, adrenocorticotropic hormone-secreting adenomas (ACTHSA), pluri-hormonal adenomas (PHA), follicle-stimulating hormone and luteinizing hormone-secreting adenomas (FSH&LHSA), and thyroid-stimulating hormone adenomas (TSHA)], and 38-46% of them are non-functioning4,5,6. The hormone secretion profile can be determined from plasma hormone concentrations. However, because of the increasing use of radiological imaging, many pituitary adenomas are now detected incidentally7. For these tumors, it may be possible to estimate the hormone secretion profile at the time of imaging by exploiting tumor heterogeneity8.

Radiomics is a quantitative approach that extracts many image features from medical images and allows the development of diagnostic tools8. The success of this approach in determining tumor subtypes has been studied and confirmed in some other tumors9,10. In addition, in a limited number of recent studies, models based on radiomics features were developed to predict tumor consistency in patients with pituitary adenoma11,12,13,14,15.

In prior radiomics studies, the stability of radiomics features was evaluated only at the level of interrater agreement, using the intraclass correlation coefficient11. This approach may be inadequate for establishing feature stability, because a recent statement proposed that stable features should also have high precision and accuracy16. Therefore, creating diagnostic models based on stable radiomics features may positively affect reproducibility, precision, and accuracy.

This study aims to develop neural networks to detect hormone secretion profiles in the pituitary adenomas based on T2 weighted magnetic resonance imaging (MRI) radiomics.

MATERIALS and METHODS

Ethical Considerations

This retrospective model-development study was conducted after approval by the Istanbul Medeniyet University Goztepe Prof. Dr. Suleyman Yalcin Training and Research Hospital Clinical Research Ethics Committee (decision no: 2020/0304, date: 05.18.2020), and written informed consent was waived. The STARD 2015 statement was followed to document the study, and white papers and statements of multiple societies were followed17,18,19,20,21. This study scored 18/36 on the radiomics quality score17.

Study Population and Data Collection

This model-development study was carried out in a single tertiary-care center. Of the patients documented between January 2015 and January 2020, 130 patients who met the inclusion criteria were included in the study. The inclusion criteria were as follows: (1) an MRI examination including T2W sequences had to be available, and (2) image quality had to be sufficient to allow segmentation. Patients who were diagnosed in our center but whose imaging was performed in another center were excluded. The MRI protocol is described in Table 1.

Predictors: Analysis of the T2W Images

Three radiologists with 8 years, 3 years, and 1 year of experience performed segmentation using 3D Slicer software, version 4.10.2 (https://www.slicer.org). Segmentation was done volumetrically on T2W images. The 851 radiomics features, which were the predictors of this study, were extracted with PyRadiomics (version 2.2.0). All the features (shape, first order, and high order) in this module were selected. Resampling was done, normalization was enabled, and wavelet-based filters were activated (Figure 1).

Outcomes

Outcomes were identified as 7 hormone secretion profiles [non-functioning pituitary adenoma (n=19), GHSA (n=21), prolactinomas (n=64), ACTHSA (n=6), PHA (n=6), FSH&LHSA (n=8), and TSHA (n=6)].

Features Stability Analyses: Interobserver Agreement Evaluation and Coefficient of Variation Analysis

Segmentations and radiomics features were assessed separately for interobserver agreement. For segmentation, the Dice similarity coefficient was used to measure interobserver reliability, while the intraclass correlation coefficient [ICC (3,k), two-way random effects model, absolute agreement] was used for radiomics features22. Features with an ICC>0.75 were included in the coefficient of variation (CoV) analysis, and those with a CoV >15% were eliminated16. The predictor features that passed the CoV analysis were subjected to Spearman’s correlation (SC) analysis, and correlation matrices were constructed for variance inflation factor (VIF) analysis.
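
The two simplest stability measures named above can be illustrated with a minimal sketch. It assumes binary masks represented as sets of voxel indices and a CoV computed across the repeated measurements of one feature using the population standard deviation; the source does not state either convention. The function names and toy values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev

def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks are treated as perfect agreement
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))

def coefficient_of_variation(values):
    """CoV = standard deviation / |mean| (population SD is an assumption)."""
    m = mean(values)
    if m == 0:
        return float("inf")
    return pstdev(values) / abs(m)

def stable_features(measurements, cov_threshold=0.15):
    """Keep the features whose CoV does not exceed the threshold."""
    return [name for name, vals in measurements.items()
            if coefficient_of_variation(vals) <= cov_threshold]

# Hypothetical toy segmentations from two observers (voxel coordinates).
observer_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
observer_2 = {(0, 1), (1, 0), (1, 1), (2, 1)}
dice = dice_coefficient(observer_1, observer_2)  # 3 shared voxels of 4 + 4

# Hypothetical repeated measurements of two features.
features = {
    "feature_a": [10.0, 10.5, 9.8],  # low spread, retained
    "feature_b": [2.0, 4.0, 9.0],    # high spread, eliminated
}
kept = stable_features(features)
```

In this toy case the two masks share three of four voxels each, and only the low-spread feature passes the 15% threshold.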

Features Selection Analyses: Collinearity-multicollinearity Evaluation and Least Absolute Shrinkage and Selection Operator Regression

VIF analyses were performed to reduce the collinearity-multicollinearity using the formula VIF = 1/(1 − R²). If the VIF was above 10, the feature was eliminated23. The features with smaller CoV were preserved in this elimination process. Further, validated imaging biomarkers were evaluated using SC analysis between features and outcomes (p<0.01).
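
As a worked illustration of the VIF formula, the sketch below handles only the pairwise case, where a feature is regressed on a single other feature and R² equals the squared Pearson correlation. With more predictors, R² would come from a multiple regression, which is not shown here. The feature values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_pairwise(x, y):
    """VIF = 1 / (1 - R^2); with one other predictor, R^2 equals r^2."""
    r2 = pearson_r(x, y) ** 2
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

# Hypothetical feature values for five lesions.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [1.1, 2.0, 2.9, 4.2, 5.0]  # nearly collinear with feature_a
feature_c = [2.0, 1.0, 4.0, 3.0, 5.0]  # moderately correlated (r = 0.8)

vif_ab = vif_pairwise(feature_a, feature_b)  # far above the cut-off of 10
vif_ac = vif_pairwise(feature_a, feature_c)  # 1 / (1 - 0.64), below 10
```

Under the rule described above, one feature of the nearly collinear pair would be eliminated, and the one with the smaller CoV would be kept.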

Features were selected with the least absolute shrinkage and selection operator (LASSO) with L1 regularization. Random sampling and 5-fold cross-validation were used for seeding LASSO.
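
The way an L1 penalty removes weak predictors can be sketched with a small coordinate-descent solver. This is an illustration only: it is not the routine used by the study software, it omits the 5-fold cross-validation, and it assumes centered data with no intercept. The objective, the penalty value alpha, and the toy data are all assumptions.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w

# Hypothetical centered data: column 0 drives y, column 1 contributes very little.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, -1.0]]
y = [-3.9, -2.2, 0.0, 2.2, 3.9]  # y = 2 * x0 + 0.1 * x1
weights = lasso_coordinate_descent(X, y, alpha=0.5)
```

With this penalty the weak second predictor receives a coefficient of exactly zero and is dropped, while the strong first predictor is retained with a shrunken coefficient.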

Structuring Artificial Neural Networks

For training, multilayer perceptron and radial basis function networks were selected. The software automatically determined the number of layers, the number of neurons, the error function, the hidden activation, and the output activation in these models. For each training session, the software used a random number generator to assign 70% of the patients to the training set, 15% to the test set, and 15% to the validation (hold-out) set. These subgroups had similar distributions in terms of predictors and outcomes. Hyperparameter tuning was performed with the “early stopping” algorithm, which trains the neural networks with the training set and evaluates them with the test set at the end of each epoch. Training continues as long as the error rate decreases in both sets and is terminated when the error rate starts to increase in the test set. Finally, network performance is measured with the validation (hold-out) set.
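
The sampling scheme and stopping rule described above can be sketched as follows. This is not the procedure implemented by the study software: the split here is a plain random shuffle without the stratification that would keep outcome distributions similar, and the stopping rule looks only at a list of test-set errors. The function names, seed, and error values are hypothetical.

```python
import random

def split_indices(n, seed=0):
    """Randomly assign n patients to 70% training, 15% test, and the rest to validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 70 // 100
    n_test = n * 15 // 100
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def early_stopping_epoch(test_errors, patience=1):
    """Return the epoch with the lowest test error seen before the error starts to rise."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(test_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # test error stopped improving, so training ends
    return best_epoch

train_idx, test_idx, val_idx = split_indices(130)

# Hypothetical test-set error per epoch: it falls, then begins to rise.
errors = [0.90, 0.70, 0.50, 0.45, 0.50, 0.60]
stop_at = early_stopping_epoch(errors)
```

For 130 patients this integer arithmetic gives 91 training, 19 test, and 20 validation cases, and training would stop at the fourth epoch (index 3), where the test error is lowest.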

Statistical Analysis

Statistical analyses and neural network development were performed using TIBCO Statistica version 13.0.5 (TIBCO Software, Palo Alto, CA). Neural network results with the highest diagnostic accuracy are presented as the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals. In receiver operating characteristic curve analysis, a neural network was considered a validated classifier if the AUC was >0.85 and the p-value was <0.01 16.
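
For a one-versus-rest classifier such as "prolactinoma versus other adenomas", the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below shows that calculation together with sensitivity and specificity at a fixed threshold. It does not compute the p-value or the confidence interval, and the labels, scores, and threshold are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical hold-out cases: 1 = target profile, 0 = any other adenoma.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = auc_from_scores(labels, scores)  # 11 of 12 pairs ranked correctly
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
passes_auc_criterion = auc > 0.85  # only the AUC half of the validation rule
```

In this toy example 11 of the 12 positive-negative pairs are ordered correctly, so the classifier meets the AUC part of the criterion; the p-value condition would have to be checked separately.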

RESULTS

Patient Characteristics

This study included 130 consecutive patients with pituitary adenoma. The mean age was 46.49±13.69 years, and 76/130 (58.46%) were women. All patients were Caucasians. A full summary of clinicopathologic characteristics of the patients is presented in Table 2.

Model Development and Specification

The interobserver median Dice coefficient values for segmentations were 0.84 [interquartile range (IQR): 0.06] between observers 1 and 2; 0.84 (IQR: 0.17) between observers 1 and 3; and 0.79 (IQR: 0.20) between observers 2 and 3.

A total of 204 features were eliminated by ICC analysis (ICC<0.75), and 552 features were eliminated by CoV analysis (CoV>0.15). Finally, another 44 features were eliminated by VIF analysis because of collinearity. Most of the radiomics features were found to be unstable (n=800, 94%).
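
The counts reported above can be checked with simple arithmetic. The sketch below only restates the numbers given in the text; the dictionary keys are descriptive labels chosen for this illustration.

```python
total_features = 851

# Features eliminated at each step, as reported in the text.
eliminated = {"ICC < 0.75": 204, "CoV > 0.15": 552, "VIF > 10": 44}

n_eliminated = sum(eliminated.values())
n_stable = total_features - n_eliminated
unstable_percent = round(100 * n_eliminated / total_features)
```

The three steps remove 800 features in total, which leaves the 51 stable predictors used in the correlation analysis and corresponds to 94% of the extracted features.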

Stable predictors (n=51) and all outcomes were used for correlation analysis, and correlation matrices were created to evaluate the unadjusted relation between each candidate predictor and the outcomes (Figure 2). In this analysis, all SC coefficients were below 0.30, with p<0.01 for only five predictors (Figure 3). Finally, LASSO regression was used for regularization, and the most relevant predictors were selected for neural network training.

Diagnostic Prediction Model Results

The performance of the ANN distinguishing prolactinomas from other adenomas was validated (AUC=0.95, p<0.001, sensitivity: 91%, and specificity: 98%). The model distinguishing PHA had the lowest AUC (AUC=0.74 and p<0.001). Results of seven neural networks are presented in Table 3.

DISCUSSION

The most obvious result of this study was that prolactinomas, which were found in about half of the included patients, were predicted with high accuracy based on the heterogeneity in the T2W MRI images. However, the model distinguishing PHA had the lowest AUC. The difficulty in distinguishing these tumors, which contain more than one cell group, suggests that the results are not random but are related to tissue heterogeneity.

There are limited studies in the literature on the classification of pituitary adenomas from MRI images11,12,13,14,15. Four of these studies investigated the consistency of adenomas determined at surgical excision11,13,14,15. In a study that included 89 macroadenomas, Cuocolo et al.11 predicted the outcomes of 28 patients in the test group; only two soft tumors were misclassified as fibrous tumors, and all fibrous tumors were correctly classified. Fan et al.14 reported that adding clinical data such as age, sex, and hormone levels improved the model’s accuracy. These results mean that patients who might require re-surgery can be identified by imaging in the early phase of the disease. This information can give the surgeon confidence in surgical planning and reduce residual tumor and recurrence rates. A second benefit is that the patient can be informed that the tumor has a firm consistency and may need re-surgery in the future. In another study, Peng et al.12 used T1W, contrast-enhanced T1W, and T2W MRI images and three different machine learning algorithms to predict three different immunohistochemical classes of pituitary adenomas preoperatively. They observed that the T2W radiomics-based model had the highest accuracy and that the best classifier was the support vector machine. Considering these results, we conducted our study with T2W radiomics features and pre-trained neural networks.

Currently, radiomics studies are facing a reproducibility crisis. Therefore, the European Society of Radiology (ESR) has recently presented a statement on the stability of imaging biomarkers such as radiomics16. Cuocolo et al.11 and Zeynalova et al.13 evaluated the reproducibility of radiomics features by using ICC and included the features with ICC>0.75 and ICC>0.90, respectively. Peng et al.12, Fan et al.14, and Rui et al.15 did not evaluate the reproducibility of radiomics features. In this study, we followed the ESR statement to evaluate feature stability. Accordingly, we eliminated high-variance features by using CoV analysis and highly collinear features by using VIF analysis16. Although Cuocolo et al.11 did not treat variance and collinearity as criteria of stability, they also eliminated such features, similar to our study.

The incidence of incidental adenoma is increasing due to the increasing frequency of imaging7. Detecting these lesions’ secretion profiles and consistency at the time of imaging can be beneficial for accelerating patient management. Because several studies have already addressed tumor stiffness and consistency, we focused on the secretion profile in this study11,13,14,15. We hypothesized that the cells that determine the secretion profile could be detected by quantitative analysis, and we consider that estimating PHA with the lowest accuracy while estimating prolactinoma with the highest accuracy supports this hypothesis. Because each pluri-hormonal tumor contains different amounts of different secretory cell types, imaging profiling is restricted in these tumors, whereas it is successful in a tumor containing a single type of cell, such as a prolactinoma.

This study had several limitations. First, prolactinomas were found in half of the patients, so this neural network was trained on a balanced distribution, whereas the other networks were not. Second, the ground truth was plasma hormone levels because our patient population consisted of patients admitted to the endocrinology outpatient clinic. Third, the study was single-centered. However, radiomics features were subjected to rigorous stability analyses to increase reproducibility and precision, and internal validation methods were used in training the neural networks to increase accuracy.

CONCLUSIONS

Soon, this study and previous studies will become parts of a complex web and accumulate, allowing us to obtain much more quantitative data on patients than is currently available. Until then, we need to increase our quantitative data and closely test the reproducibility, precision, and accuracy of our imaging biomarkers. This study shows that the ANN can distinguish whether a pituitary adenoma is a prolactinoma with an AUC of 0.95.