Abstract
Background. Previously published predictive models for microdissection testicular sperm extraction (micro-TESE) generally assumed that patients with nonobstructive azoospermia (NOA) form a homogeneous population, i.e., that laboratory predictors are associated with the sperm retrieval rate (SRR) in the same way across subpopulations. In addition, previous studies regarded successful sperm retrieval as the sole endpoint, although live birth is the ultimate goal for couples. Objectives. The main objective was to develop a model predicting the clinical outcome of micro-TESE in a heterogeneous population of men with NOA and to evaluate its clinical benefit. Pregnancy outcomes were taken into account by assessing the association between the predicted outcome of micro-TESE and pregnancy. Materials and Methods. A development cohort of 1,292 patients with NOA and an external validation cohort of 530 patients were included. Sperm retrieval was performed using micro-TESE. Clinical outcomes, including sperm retrieval, clinical pregnancy, and live birth, were collected. We developed a model using the random forest machine learning method and provide it as a web-based calculator. Results. The SRR was 38.1% (492/1,292) in the development cohort and 48.5% (257/530) in the validation cohort. The final model includes etiology, anti-Müllerian hormone (AMH), sperm retrieval surgical history, testicular volume, follicle-stimulating hormone (FSH), luteinizing hormone (LH), and age as predictors (ordered by variable importance). The area under the curve (AUC) of our model was 0.76 (0.74–0.79) in the development cohort and 0.75 (0.71–0.79) in the external validation cohort. Decision curve analysis showed that personalized, model-based surgical decisions provide additional clinical benefit. The clinical pregnancy rate (CPR) and cumulative live birth rate (CLBR) were 45.3% (405/895) and 57.6% (338/587), respectively, in the overall population. Among patients with successful sperm retrieval, CPR and CLBR were similar across groups with different predicted SRRs.
Discussion and Conclusion. Our model predicting the SRR of micro-TESE was generalizable and easy to use. Predicted pregnancy outcomes such as CPR and CLBR can also be derived from the predicted SRR. A model-based surgical decision after personalized consultation would be beneficial to patients with NOA.
1. Introduction
Azoospermia is defined as the complete absence of sperm in the ejaculate. Based on its etiology, azoospermia is classified as obstructive azoospermia or nonobstructive azoospermia (NOA) [1]. The first-line treatment of NOA is testicular sperm extraction (TESE) combined with intracytoplasmic sperm injection (ICSI) [1, 2]. Microdissection testicular sperm extraction (micro-TESE) was first described in 1999 and is widely recommended for sperm retrieval because microscopic visualization allows the surgeon to identify the larger, dilated tubules that are more likely to contain foci of intact spermatogenesis [3].
Despite these surgical improvements, sperm are successfully retrieved from only approximately half of patients with NOA, and the sperm retrieval rate (SRR) may be even lower in patients with idiopathic NOA [4, 5]. Consequently, a considerable number of patients with NOA undergo unnecessary surgery that has little chance of success. Micro-TESE is an invasive procedure with risks of complications that may result in the loss of testicular tissue [6]. Because sperm retrieval is often scheduled together with oocyte retrieval after ovarian stimulation, the oocytes are lost if micro-TESE fails. Although emergency oocyte cryopreservation is a feasible strategy, it still carries potential harm. Furthermore, sperm retrieval and ICSI cycles impose emotional and financial burdens [4, 7, 8]. It would therefore be beneficial to predict the success of sperm retrieval before attempting treatment [7, 9].
A number of factors, such as testicular volume, serum follicle-stimulating hormone (FSH) level, and serum inhibin B level, have been suggested to have predictive value in distinguishing patients with a good chance of sperm retrieval from those with a poor chance [10–16]. However, none of these parameters serves as a stand-alone marker of persistent spermatogenesis in men with NOA [7], and there is currently no consensus regarding the predictive factors for sperm retrieval by micro-TESE in men with NOA [9].
Several prediction models have been reported for sperm retrieval in patients with NOA [17, 18]. Most predict the outcome of TESE or fine needle aspiration (FNA), rather than micro-TESE, with moderate efficacy [19, 20], and most were not validated with external data [19]. Cissen et al. developed and validated a model to predict the SRR of TESE in men with NOA. Its predictive capacity was moderate, with an area under the curve (AUC) of 0.69 in the development cohort and 0.65 in the validation cohort; however, the development and validation cohorts comprised very similar populations [7]. Ma et al. developed a model that correctly identified 86.4% of the patients who were likely to experience failed sperm retrieval by FNA [9]. When we applied these previously reported models to our cohorts, none of them showed reproducible performance. We suggest that previous research overlooked critical population heterogeneity, which substantially limited the generalizability of SRR prediction in patients with NOA.
In addition, data linking these prediction models to ICSI pregnancy outcomes are lacking [7, 17, 18]. Although sperm retrieval is a crucial first step, the ultimate goal of the couple is pregnancy and a live birth. Patients should be well informed about their chances of successful sperm retrieval by micro-TESE and of subsequently having a baby [4, 21, 22]. This information would be valuable in surgical decision-making.
To our knowledge, no reliable, externally validated model based on a large sample exists for predicting sperm retrieval by micro-TESE. Studies linking such prediction models to ICSI outcomes remain rare, and the clinical benefit of personalized, model-based surgical decisions has not been well discussed. The main aim of this study was to develop and validate a reliable model to predict the SRR of micro-TESE in men with NOA and to evaluate the clinical benefit of model-based decision-making. Pregnancy outcomes were further taken into account by assessing the association between the predicted outcome of micro-TESE and pregnancy.
2. Materials and Methods
2.1. Patients
We retrospectively collected data on 1,292 patients with NOA treated at Shanghai General Hospital between March 2015 and August 2021 as a development cohort and on 530 patients treated at The Sixth Affiliated Hospital of Sun Yat-sen University between January 2016 and August 2021 as a validation cohort. The inclusion criterion for both cohorts was NOA in men who underwent micro-TESE. Azoospermia was confirmed by analysis of at least two centrifuged ejaculate samples in accordance with the WHO laboratory manual for the examination and processing of human semen (5th edition) [23]. The diagnosis of NOA was based on a comprehensive medical history, physical examination, and auxiliary examinations. Testicular volume was measured with a Prader orchidometer. Varicocele was evaluated by both physical examination and ultrasound. Abnormalities such as abnormal karyotype, Y chromosome deletions, cryptorchidism, and mumps orchitis were considered etiological factors of NOA, and varicocele was considered a risk factor for NOA. Patients with no obvious abnormality were diagnosed with idiopathic NOA. Patients with NOA usually had small or normal testicular volume and elevated (or normal) serum FSH levels. Patients with AZFa and/or AZFb deletions and those who had undergone chemoradiotherapy were excluded from the present study. Men with any evidence of obstruction (e.g., history of vasectomy or congenital bilateral absence of the vas deferens) or ejaculatory abnormality (e.g., low semen volume or decreased pH) were also excluded.
2.2. Surgery
Micro-TESE was performed under general anesthesia as described previously [3]. Briefly, a midline scrotal incision was made, the testis was delivered, and the tunica vaginalis was opened. The procedure began on the testis with the larger volume, or on the right testis if the two testes did not differ. Larger, more opaque seminiferous tubules were teased out and examined for sperm. The procedure was terminated when sperm were retrieved or when further dissection was considered likely to jeopardize the testicular blood supply. The tubules were mechanically dissected, and the presence of sperm was confirmed under a phase-contrast microscope at 200× magnification. Histological examination was performed on all testicular biopsies. ICSI was performed using fresh sperm, and embryos were transferred at the cleavage or blastocyst stage. The age of the women was in the development cohort and in the validation cohort.
2.3. Ethical Approval
The study was approved by the Ethics Committee of Shanghai General Hospital (Number: 2020SQ041).
2.4. Model Development and Evaluation Method
The candidate predictors included age, etiology, testicular volume, serum concentrations of luteinizing hormone (LH), FSH, testosterone, estradiol, prolactin, anti-Müllerian hormone (AMH), and inhibin B, surgical history of sperm retrieval, and varicocele. Testosterone, estradiol, prolactin, and inhibin B were dropped because they did not contribute to predictive performance. Several machine learning methods, including logistic regression, Lasso regression, random forest (RF), and XGBoost, were used to build candidate prediction models on the development cohort, and the RF model was selected according to its performance under 10-fold cross-validation. The selected RF model was then evaluated in both the development and validation cohorts. Discrimination was assessed with the receiver-operating characteristic (ROC) curve and the area under the curve (AUC); calibration was assessed with the calibration plot and calibration-in-the-large [24]. Decision curve analysis (DCA) was applied to assess the clinical benefit of the RF model in surgical consultation. A variable importance plot (VIP) based on the increase in root mean square error was used to estimate the relative importance of each predictor, and partial dependence plots were used to interpret the RF model and explore the dependence of SRR on each quantitative predictor. Model development, validation, and reporting followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement.
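The model-selection step above (fit several learners, keep the one that performs best under 10-fold cross-validation) can be sketched without any third-party library. This is an illustrative sketch only: the paper used R and does not report its fold scheme, so stratifying folds by outcome, the fixed seed, and using mean out-of-fold AUC as the selection metric are assumptions. The `fit` callables are hypothetical stand-ins for logistic regression, Lasso, RF, and XGBoost; each takes training rows and labels and returns a function that maps one row to a predicted probability of sperm retrieval.

```python
import random
from statistics import mean


def stratified_kfold(y, k=10, seed=2021):
    """Split row indices into k folds so each fold has a similar share of successes (y == 1)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    all_idx = set(range(len(y)))
    return [(sorted(all_idx - set(f)), sorted(f)) for f in folds]


def auc(y, p):
    """AUC as P(random success scores higher than random failure); ties count 1/2."""
    pos = [s for s, v in zip(p, y) if v == 1]
    neg = [s for s, v in zip(p, y) if v == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(fit, X, y, k=10, seed=2021):
    """Mean out-of-fold AUC of one candidate learner (hypothetical `fit` interface)."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(auc([y[i] for i in test], [predict(X[i]) for i in test]))
    return mean(scores)


def select_model(candidates, X, y, k=10):
    """Return (best candidate name, {name: mean CV AUC}) for a dict of fit callables."""
    results = {name: cv_auc(fit, X, y, k) for name, fit in candidates.items()}
    return max(results, key=results.get), results
```

Stratification keeps every fold supplied with both successes and failures, which the fold-level AUC needs; with very small classes (fewer than k cases of either outcome) some folds would lack one class and this sketch would fail.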
2.5. Statistical Analysis
Categorical data were summarized as frequencies and percentages, and continuous data as mean (standard deviation) or median (interquartile range), as appropriate. Proportions were compared between groups using the chi-square test, and AUCs were compared using DeLong's method. A two-tailed P value < 0.05 was considered significant. Missing data were handled by multiple imputation. Statistical analyses were performed with R software, version 3.5.1 (R Studio, Boston, MA).
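DeLong's method compares two AUCs estimated on the same patients while accounting for their correlation. The following is a minimal, dependency-free sketch of the standard two-sided DeLong test, written for illustration; it is not the authors' R code, and the function names are hypothetical. It computes each model's structural components (one value per success and per failure), estimates the variance of the AUC difference from their sample covariances, and converts the z statistic to a two-sided normal P value.

```python
import math


def _components(y, scores):
    """DeLong structural components for one model: V10 per success, V01 per failure."""
    pos = [s for s, v in zip(scores, y) if v == 1]
    neg = [s for s, v in zip(scores, y) if v == 0]

    def psi(x, z):  # 1 if the success is ranked higher, 1/2 for ties, else 0
        return 1.0 if x > z else 0.5 if x == z else 0.0

    v10 = [sum(psi(x, z) for z in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, z) for x in pos) / len(pos) for z in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (w - mb) for x, w in zip(a, b)) / (len(a) - 1)


def delong_test(y, scores1, scores2):
    """Two-sided DeLong test of H0: AUC1 == AUC2; returns (AUC1, AUC2, P value)."""
    v10a, v01a = _components(y, scores1)
    v10b, v01b = _components(y, scores2)
    m, n = len(v10a), len(v01a)  # numbers of successes and failures
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Var(AUC1 - AUC2) = Var(AUC1) + Var(AUC2) - 2 Cov(AUC1, AUC2)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        return auc_a, auc_b, 1.0  # identical rankings: no detectable difference
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))
```

The pairwise loops cost O(successes × failures) per model, which is acceptable for cohorts of a few thousand patients; production implementations use a faster rank-based formulation.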
3. Results
3.1. Clinical Characteristics
A total of 1,822 micro-TESE procedures were included, comprising 1,292 cases in the development cohort and 530 cases in the external validation cohort. Baseline characteristics are shown in Table 1.
3.2. Surgical Outcomes
The SRR was 38.1% (492/1292) in the development cohort and 48.5% (257/530) in the validation cohort.
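Section 2.5 states that proportions were compared with the chi-square test. As an illustration only, the sketch below applies a 2 × 2 Pearson chi-square test to the two cohort SRRs reported above; the paper does not report this particular comparison, and the function name is hypothetical. The continuity correction is on by default because R's `chisq.test()` applies Yates' correction to 2 × 2 tables by default; the P value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math


def chi_square_2x2(success1, n1, success2, n2, yates=True):
    """Chi-square test (1 df) comparing two proportions, e.g. SRR in two cohorts.

    Returns (chi-square statistic, P value).
    """
    a, b = success1, n1 - success1  # cohort 1: retrieved / not retrieved
    c, d = success2, n2 - success2  # cohort 2: retrieved / not retrieved
    n = n1 + n2
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    chi2 = n * diff ** 2 / (n1 * n2 * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Usage with the cohort counts from Section 3.2 (illustrative comparison only)
chi2, p = chi_square_2x2(492, 1292, 257, 530)
```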
The SRR of patients with different etiologies is shown in Figure 1 and Table S1. In our study, the SRR of patients with mumps orchitis was the highest among all groups, followed by patients with cryptorchidism, AZFc deletion, Klinefelter syndrome, and others.

3.3. Model Specification
The random forest (RF) prediction model was finally selected with optimized hyperparameters (, , ). The final model included the following predictors: etiology, testicular volume, AMH, FSH, LH, age, and sperm retrieval surgical history. To make the model easy to use, we provide it as a web-based calculator that patients and clinicians can access at https://wit004.shinyapps.io/prednoa (Figure S1).
3.4. Model Evaluation
The discriminative performance of our model is presented in Figure 2(a). The AUCs were 0.76 (0.74–0.79) in the development cohort and 0.75 (0.71–0.79) in the external validation cohort. The small difference in AUC between the two cohorts indicates good generalizability. The calibration plots in Figure 2(b) show good calibration in both cohorts.
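A calibration plot compares mean predicted SRR with observed SRR within groups of patients. The sketch below shows one common way to build such a plot (equal-size bins of predicted risk) together with a simple calibration-in-the-large summary. Both choices are assumptions for illustration: the paper does not state its binning, and calibration-in-the-large is often estimated instead as the intercept of a logistic recalibration model with the logit of the prediction as an offset; here it is the simpler difference between observed and mean predicted SRR.

```python
def calibration_table(y, p, n_bins=10):
    """Equal-size bins of predicted SRR -> list of (mean predicted, observed SRR, n)."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    size = len(order) / n_bins
    rows = []
    for b in range(n_bins):
        idx = order[round(b * size):round((b + 1) * size)]
        if idx:  # skip empty bins when there are fewer patients than bins
            mean_pred = sum(p[i] for i in idx) / len(idx)
            observed = sum(y[i] for i in idx) / len(idx)
            rows.append((mean_pred, observed, len(idx)))
    return rows


def calibration_in_the_large(y, p):
    """Observed SRR minus mean predicted SRR; 0 means no systematic over- or underestimation."""
    return sum(y) / len(y) - sum(p) / len(p)
```

Plotting the first two columns of each row against each other, with the 45-degree line as reference, gives the calibration plot; points above the line indicate that the model underestimates SRR in that range.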

Figure 2: (a) ROC curves and (b) calibration plots of the model in the development and validation cohorts.
3.5. Complications
Three men developed postoperative scrotal hematoma (3/1,822, 0.2%), all of whom recovered after debridement. No scrotal edema, infection, or testicular atrophy occurred during follow-up.
3.6. Pregnancy Outcome
The pregnancy outcomes for all patients are shown in Table 2. The clinical pregnancy rate (CPR) and cumulative live birth rate (CLBR) were 45.3% (405/895) and 57.6% (338/587), respectively, in the overall population. We classified the subjects into three groups according to their predicted SRR: low (0–30.0%), medium (30.0%–60.0%), and high (60.0%–100.0%). Among patients with successful sperm retrieval, the CPR and CLBR were similar across the three groups (, 0.29, respectively).
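The group comparison above can be sketched as follows: assign each patient to a predicted-SRR group, then compute an outcome rate (CPR or CLBR) only among patients whose sperm retrieval succeeded. The record layout and function names are hypothetical, and treating the group boundaries as half-open intervals (a predicted SRR of exactly 30% counts as medium) is an assumption, since the text does not say which group owns each boundary.

```python
def srr_group(pred):
    """Predicted-SRR group from Section 3.6 (half-open boundaries assumed)."""
    if pred < 0.30:
        return "low"
    if pred < 0.60:
        return "medium"
    return "high"


def outcome_rate_by_group(records):
    """records: iterable of (predicted SRR, sperm retrieved?, outcome achieved?).

    Returns {group: (events, denominator, rate)} among patients with successful
    retrieval; include only patients whose outcome (e.g. live birth) is known.
    """
    counts = {}
    for pred, retrieved, outcome in records:
        if not retrieved:
            continue  # CPR/CLBR here are conditioned on successful sperm retrieval
        group = srr_group(pred)
        events, total = counts.get(group, (0, 0))
        counts[group] = (events + int(outcome), total + 1)
    return {g: (e, t, e / t) for g, (e, t) in counts.items()}
```

The resulting events and denominators per group form the contingency table to which a chi-square test can then be applied.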
3.7. Decision Curve Analysis
A patient-oriented surgical decision should be based not only on the predicted SRR of a specific patient but also on how that patient personally values a successful birth relative to the cost of surgery. To quantify this personal valuation, a patient's decision threshold is defined as the minimum SRR at which he would choose to undergo micro-TESE. The model-based surgical decision strategy requires identifying the patient's personalized decision threshold through careful consultation and performing micro-TESE when the model-predicted SRR exceeds that threshold. Figure 3(a) shows how many micro-TESE surgeries would be performed, and how many would be successful, per 1,000 patients under the model-based decision strategy at a given decision threshold. Figure 3(b) (the decision curve) shows the standardized net benefit, which is the average surgical benefit per patient without consideration of surgical injury or cost. A standardized net benefit of 0.099 means that assessing 100 patients with our model provides additional clinical utility equivalent to performing 9.9 additional surgeries at no cost. As shown in Figure 3, the model-based decision strategy provides additional benefit compared with micro-TESE for all patients, micro-TESE for no patients, and micro-TESE decisions based on etiology alone. The exception is patients with a decision threshold below 0.2 (i.e., those willing to undergo surgery with a very small chance of success), for whom the model-based strategy performs similarly to micro-TESE for all patients.
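The decision-curve quantities described above can be sketched with the usual decision curve analysis formula: at threshold pt, the net benefit of "operate if predicted SRR > pt" is TP/n − (FP/n) × pt/(1 − pt), where TP and FP count successful and unsuccessful surgeries. Two points are assumptions: the standardized net benefit is taken here as net benefit divided by the overall SRR (a common normalization, not confirmed by the text), and the function names are hypothetical. The per-1,000 counts mirror the quantities in Figure 3(a); operating on no one has a net benefit of 0 at every threshold.

```python
def net_benefit(y, p, threshold):
    """Decision-curve metrics for 'micro-TESE if predicted SRR > threshold' (0 <= threshold < 1).

    Returns (net benefit, standardized net benefit, surgeries per 1000, successes per 1000).
    """
    n = len(y)
    srr = sum(y) / n  # overall sperm retrieval rate (prevalence of success)
    operated = [v for v, s in zip(y, p) if s > threshold]
    tp = sum(operated)  # successful surgeries
    fp = len(operated) - tp  # unsuccessful surgeries
    odds = threshold / (1 - threshold)  # harm weight implied by the threshold
    nb = tp / n - fp / n * odds
    return nb, nb / srr, 1000 * len(operated) / n, 1000 * tp / n


def net_benefit_operate_all(y, threshold):
    """Net benefit of performing micro-TESE on every patient."""
    srr = sum(y) / len(y)
    return srr - (1 - srr) * threshold / (1 - threshold)
```

Evaluating these functions over a grid of thresholds and plotting net benefit against threshold, with the operate-all curve and the zero line for comparison, yields a decision curve of the kind shown in Figure 3(b).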

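Figure 3: (a) Numbers of micro-TESE surgeries performed and successful per 1,000 patients under the model-based decision strategy; (b) decision curves (standardized net benefit).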
3.8. Model Interpretation
Random forests are black-box machine learning models: they provide predictions but do not explain how those predictions are made. Model interpretation is therefore needed to make the model understandable and more trustworthy.
Figure 4 shows the variable importance plot (VIP) of the RF model, in which the relative importance of each predictor was estimated from the increase in root mean squared error. From highest to lowest, the predictors were ranked as etiology, AMH, surgical history of sperm retrieval, testicular volume, FSH, LH, and age.
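Permutation importance, the idea behind an "increase in error" VIP, can be sketched as follows: shuffle one predictor to break its link with the outcome, re-predict, and record how much the root mean squared error rises. This is a simplified, model-agnostic version for illustration; R's randomForest computes its importance on out-of-bag samples tree by tree, which this sketch does not reproduce. The `predict` callable and list-based rows are hypothetical interfaces.

```python
import math
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=1):
    """Mean increase in RMSE when each predictor column is randomly shuffled.

    predict: callable mapping a feature row (list) to a predicted SRR in [0, 1].
    X: list of feature rows (lists); y: observed retrieval outcomes (0/1).
    """
    rng = random.Random(seed)

    def rmse(rows):
        return math.sqrt(sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y))

    baseline = rmse(X)
    importance = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and the outcome
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(shuffled) - baseline)
        importance.append(sum(increases) / n_repeats)
    return importance
```

A predictor that the model ignores gets an importance of exactly zero, while a predictor the model relies on produces a clear rise in error when shuffled.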

Figure 5 shows the partial dependence plots (PDPs), which reveal how the predicted SRR depends on each predictor. As Figure 5 shows, low AMH (<2 ng/ml) and small testicular volume (<5 ml) were associated with a high SRR, but the strength of these associations varied among etiologies. Low FSH and LH were associated with a higher SRR in the mumps orchitis, cryptorchidism, and AZFc deletion subpopulations but with a lower SRR in Klinefelter syndrome; when patients with mumps orchitis, cryptorchidism, AZFc deletion, or Klinefelter syndrome were considered together, the relationship between FSH (or LH) and SRR was U-shaped. These findings reveal complex nonlinearity and interaction effects in the prediction model.
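A partial dependence curve sets one predictor to each value on a grid for every patient, averages the model's predictions, and can be restricted to one etiological subgroup to expose interactions. The sketch below is a generic illustration, not the authors' code; the dictionary keys (`"etiology"`, `"FSH"`), the subgroup labels, and the toy model are hypothetical. The toy model has opposite FSH slopes in two etiologies, so the subgroup curves move in opposite directions while the pooled curve is flat. This is a simple demonstration of how pooling subgroups can hide or reshape within-etiology associations.

```python
def partial_dependence(predict, X, feature, grid, subset=None):
    """Average predicted SRR with `feature` fixed at each grid value for all rows.

    X: list of dicts (one per patient); subset: optional row predicate, e.g. one etiology.
    """
    rows = [r for r in X if subset is None or subset(r)]
    curve = []
    for value in grid:
        preds = [predict({**r, feature: value}) for r in rows]  # copy row, override feature
        curve.append(sum(preds) / len(preds))
    return curve


def toy_predict(r):
    """Hypothetical model: SRR rises with FSH in 'KS' but falls with FSH otherwise."""
    slope = 0.01 if r["etiology"] == "KS" else -0.01
    return 0.5 + slope * (r["FSH"] - 20)


X = [{"etiology": "KS", "FSH": 30}, {"etiology": "AZFc", "FSH": 10}]
ks = partial_dependence(toy_predict, X, "FSH", [10, 30], subset=lambda r: r["etiology"] == "KS")
azfc = partial_dependence(toy_predict, X, "FSH", [10, 30], subset=lambda r: r["etiology"] == "AZFc")
pooled = partial_dependence(toy_predict, X, "FSH", [10, 30])
```

A subset with no matching rows would cause a division by zero, so in practice subgroups should be checked for size first.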

4. Discussion
We developed and validated a model with seven clinical features (etiology, AMH, surgical history of sperm retrieval, testicular volume, FSH, LH, and age, ordered by relative importance for prediction) to predict the SRR of micro-TESE. Our model showed good accuracy and generalizability and may benefit patient consultation and surgical decision-making. The model-predicted SRR can also be translated into expected pregnancy outcomes.
Several studies have developed models for predicting sperm retrieval outcomes [7, 9, 17, 20], but none of these models showed replicable predictive power in our cohorts (Table S2). There may be several reasons for this inconsistency. First, some models were developed for slightly different clinical settings: Ma et al.'s model aimed to predict the outcome of FNA [9], while Cissen et al.'s model aimed to predict the outcome of TESE [7]. Second, and more importantly, previous models did not fully address the population heterogeneity of patients with NOA [7, 9, 11, 17]. Previous studies generally assumed that laboratory predictors have similar predictive effects in subpopulations with different etiologies. However, the partial dependence plots show that this assumption does not hold. For example, Cissen et al. found a U-shaped correlation between FSH (or LH) and SRR [7], whereas our results suggest that this U shape may be a composite of positive correlations in some etiological subpopulations and negative correlations in others.
Model interpretation can also shed light on the disease mechanisms of NOA with different etiologies. Serum FSH and AMH levels have been used to predict the outcome of sperm retrieval [10, 15, 16]. In the variable importance analysis, AMH had the highest importance of all laboratory indicators, much higher than FSH and LH, which implies that AMH may play an important role in NOA, at least in subpopulations with certain etiologies. We hypothesize that germ cell defects and Sertoli cell defects are two major pathophysiological mechanisms of NOA. Serum FSH concentration may be an important predictor of germ cell defects, which are typical of patients with AZFc deletion and cryptorchidism [25, 26]. Serum AMH level is closely related to the number of Sertoli cells. In NOA with Sertoli cell defects, which is more typical of patients with Klinefelter syndrome, the few remaining Sertoli cells may retain sufficient function to support intact spermatogenesis in some segments of the seminiferous tubules [27]. Other forms of NOA may involve a mixture of these two mechanisms.
Most predictive models in previous studies regarded successful sperm retrieval as the sole endpoint. However, a live birth is the ultimate goal of the couple, so it is important to investigate how model predictions translate into pregnancy outcomes [22, 28]. A potential concern is that patients with a low SRR might also tend to have poorer sperm function, which would further impair pregnancy outcomes. Our results showed that this was not the case: among patients with successful sperm retrieval, CPR and CLBR were similar across predicted SRR groups. Therefore, a patient's personalized CPR and CLBR can be derived from his model-predicted SRR, and a patient with a higher predicted SRR than another would have proportionally higher expected CPR and CLBR. For instance, for a patient with a predicted SRR of 40%, the CPR could be estimated at 23.03% (i.e., the predicted SRR multiplied by the rate observed among patients with successful sperm retrieval). Some studies have reported that the outcomes of ICSI using sperm from micro-TESE differ among azoospermic men with different etiologies: patients with mumps orchitis or cryptorchidism usually had high rates of clinical pregnancy and live birth, whereas patients with AZFc microdeletion had low rates [5, 29]. In our study, the sample size was too limited to analyze the influence of etiology on ICSI outcomes.
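Because the post-retrieval CPR and CLBR did not differ across predicted SRR groups, a patient's expected outcome can be written as a product: P(outcome) = P(sperm retrieved) × P(outcome | sperm retrieved). The sketch below encodes that product; the function names are hypothetical, and the conditional rates passed in the usage line are placeholder numbers, not study results. In practice they would be taken from Table 2.

```python
def expected_outcome(predicted_srr, rate_given_retrieval):
    """Expected per-patient rate = P(sperm retrieved) * P(outcome | sperm retrieved)."""
    for name, value in (("predicted_srr", predicted_srr),
                        ("rate_given_retrieval", rate_given_retrieval)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {value}")
    return predicted_srr * rate_given_retrieval


def personalized_outcomes(predicted_srr, conditional_rates):
    """Apply expected_outcome to several endpoints, e.g. {"CPR": ..., "CLBR": ...}."""
    return {k: expected_outcome(predicted_srr, r) for k, r in conditional_rates.items()}


# Placeholder conditional rates for illustration only (take real values from Table 2)
example = personalized_outcomes(0.5, {"CPR": 0.5, "CLBR": 0.25})
```

This multiplication is valid only under the condition reported in Section 3.6, namely that post-retrieval outcomes do not depend on the predicted SRR. If that condition failed in another population, the conditional rates would need to be estimated within each SRR group.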
The present study has the following strengths. First, the machine learning method we used to build the prediction model can capture interaction and nonlinear effects, which were shown to be important for the SRR of micro-TESE, thereby enhancing the model's generalizability in heterogeneous populations. Second, we included all patients with NOA who sought surgical treatment, except those who were definitely untreatable by micro-TESE, e.g., men with AZFa deletion. This large number of patients was treated by a small, consistent team of expert surgeons and laboratory technicians over the past six years. These factors reduce the risk of selection bias and of technical bias [30]. Third, pregnancy outcomes were taken into account, so the model can also be interpreted as a prediction of ICSI outcomes. Finally, the model is less invasive to apply because testicular histopathology is not required as a predictor.
The present study has some limitations. First, AMH was not routinely measured during the early part of the study period in our hospital, and the resulting missing AMH data may decrease the accuracy of the prediction model. Second, some etiological subgroups contained few patients, which may weaken the model's performance in these subgroups.
Appropriate counselling of men undergoing sperm retrieval is important from both physiological and psychological viewpoints [7]. The ideal surgical decision should combine an accurate prediction of the surgical outcome with the patient's personal preferences. Couples whose predicted prognosis does not meet their desired threshold might decline surgical therapy and instead consider adoption or fertilization with donor sperm. We provide an online calculator based on our predictive model, and we propose that using this calculator, properly interpreting its output for the patient, and making a model-based surgical decision constitute a plausible framework for consultation that could provide significant clinical benefit for patients with NOA.
In conclusion, by combining etiology with other preoperative clinical features, we built a model that predicted the SRR of micro-TESE with generalizable accuracy, and the model-based surgical strategy could provide a benefit in patient decision-making.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
No conflict of interest is declared.
Authors’ Contributions
Ruhui Tian performed surgery, analyzed and interpreted the data, and wrote the manuscript. Jing Zhang performed surgery, analyzed and interpreted the data, and wrote the manuscript. Yuan Xu performed sperm processing and ICSI and conducted the investigation. Shiwei Liu analyzed and interpreted the data. Cunzhong Deng performed sperm processing. Chen Huixing performed surgery. Li Peng performed surgery. Huang Yuhua managed patients and collected and analyzed data. Erlei Zhi managed patients and collected and analyzed data. Guihua Liu managed patients and collected data. Guihua Sun performed sperm processing. Xiaoyan Liang, Fujun Zhao, and Yu Wu provided clinical treatment and interpreted data. Chencheng Yao collected and analyzed data and critically revised the manuscript. Weituo Zhang and Zheng Li designed and supervised the study. Ruhui Tian and Jing Zhang contributed equally to this work.
Acknowledgments
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16020701), Interdisciplinary Program of Shanghai Jiao Tong University (YG2017ZD04), Clinical Research Innovation Plan of Shanghai General Hospital (KD007-ly01, CTCCR-C04), and National Natural Science Foundation of China (82171590, 81903417). We are very grateful to all our colleagues involved in this work. Special thanks are due to Jianxiong Zhang, Ningjing Ou, Jiayuan Jiang, Xueying Ding, and Yan Qiu. Their assistance in the management of patients and suggestions for the data analysis and interpretations have enhanced the quality of our work.
Supplementary Materials
Supplementary 1. Figure S1. Layout of the web-based calculator. We provide a web-based calculator for our predictive model. To use it, visit https://wit004.shinyapps.io/prednoa, enter the predictor values for a specific patient, and press the calculate button; the predicted SRR is then shown.
Supplementary 2. Table S1. The SRR of patients with different etiologies.
Supplementary 3. Table S2. Summary of performance of predictive models in different cohorts.