Abstract

In end-stage renal disease (ESRD), risk factors for vascular calcification strongly affect the survival of hemodialysis patients. To assess the level of vascular calcification effectively, machine learning algorithms can be used to predict vascular calcification risk in ESRD patients. Because the amount of data collected at different risk levels is imbalanced, the classification task is affected. We therefore propose a fuzzy support vector machine based on self-representation (FSVM-SR) to predict vascular calcification risk. We also compare our method with conventional machine learning methods, and the results show that it classifies vascular calcification risk more accurately.

1. Introduction

Chronic kidney disease-mineral bone disease (CKD-MBD) is one of the most serious complications in patients with end-stage renal failure. It includes abnormal metabolism of calcium, phosphorus, parathyroid hormone, and vitamin D, abnormal bone turnover, vascular calcification, and ultimately cardiovascular disease.

In recent years, fibroblast growth factor 23 (FGF23) has been recognized as a protein that plays an important role in phosphate regulation. Klotho protein is the receptor protein of FGF23. Through the formation of FGF23-klotho complexes, it participates in regulating bone metabolism and calcium and phosphorus metabolism, protecting the integrity of blood vessels, and inhibiting vascular calcification. Therefore, FGF23 and klotho are key participants in CKD-MBD, and they are closely related to the occurrence of vascular calcification and cardiovascular disease (CVD). Existing evidence shows a clear correlation between FGF23 and the occurrence of vascular calcification and CVD. Elevated FGF23 can be regarded as a risk factor for CVD in patients with end-stage renal disease (ESRD) [1].

Fetuin-A is considered to be an inhibitor of vascular calcification and can delay the progression of abdominal aortic calcification [2]. Studies have shown a close correlation between fetuin-A and the malnutrition-microinflammatory state of ESRD patients [2]. It is currently believed that a low serum fetuin-A level in ESRD patients is an independent risk factor for vascular calcification.

Malnutrition is a common complication in ESRD patients, and it is closely related to vascular calcification, cardiovascular events, and all-cause mortality. Factors affecting the nutrition of ESRD patients include protein-energy expenditure, digestion and absorption, inflammation, and endocrine hormone level disorders [3].

A variety of tools are available to assess the nutritional status of dialysis patients. Among them, the geriatric nutritional risk index (GNRI) is considered to be an important predictor of cardiovascular death [4]. Recent research also shows a certain positive correlation between GNRI and the degree of aortic calcification in CKD patients [5].

Vascular calcification (VC) scores of the artery or aorta on plain radiographs are associated with CVD events and may be predictive of CVD in dialysis patients [6]. Many research results show that abdominal aortic calcification as assessed on a lateral lumbar X-ray is predictive for the presence of significant coronary artery disease in asymptomatic dialysis patients [7].

Our previous work shows that patients with end-stage renal failure have abnormal levels of FGF23 and klotho as well as a microinflammatory state. The interaction and mutual influence of these factors are involved in the occurrence and development of vascular calcification and CKD-MBD [8].

Therefore, to further explore the risk factors of vascular calcification in patients with ESRD, this article studies how accurately different forecasting models predict vascular calcification risk in ESRD patients. The aim is to help clinicians detect and intervene early, thereby delaying the occurrence and development of CKD-MBD, reducing the incidence of CVD, and improving the prognosis. Machine learning (ML) has been widely used to assess the dry weight (DW) of hemodialysis patients [9] and has achieved good results. Many ML-based models have also been applied successfully in drug discovery [10–12], protein function [13–16], and disease analysis [17, 18].

In this study, we employ a support vector machine (SVM) to build a predictive model. SVM has the following advantages: (1) Nonlinear mapping is the theoretical basis of the SVM method. SVM uses an inner-product kernel function in place of an explicit nonlinear mapping to a high-dimensional space. (2) The goal of SVM is to find the optimal hyperplane that divides the feature space, and maximizing the classification margin is the core idea of the method. (3) A small number of support vectors determine the final result, which not only helps to capture the key samples but also "removes" a large number of redundant samples. For imbalanced datasets, however, the standard SVM is not good at classifying minority classes. In this work, we propose a fuzzy support vector machine based on self-representation (FSVM-SR) to identify vascular calcification in hemodialysis patients under imbalanced data. FSVM estimates a weight for each training sample. When constructing the classification hyperplane, FSVM gives little influence to low-weight samples (noise samples), which alleviates the effect of imbalanced datasets.

2. Materials and Methods

2.1. Materials

This work employs 29 features to describe each patient, including gender, age, body mass index (BMI), diabetes mellitus (DM), cerebral infarction (CI), and coronary heart disease (CHD). Table 1 shows the details of our dataset. The mean and standard deviation of the samples are also listed there.

During the data collection process, we classified 59 patients into 7 risk levels (0 to 6). We then grouped these 7 levels into coarser classes according to two classification schemes, in which adjacent levels are merged into one category. The classification results are shown in Table 2. In the first classification scheme (CS1), levels {0, 1, 2} and levels {3, 4, 5, 6} are assigned to classes 1 and 2, respectively. In the second scheme (CS2), levels {0, 1}, {2, 3}, and {4, 5, 6} are assigned to classes 1, 2, and 3, respectively.
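The two groupings can be written down as a small lookup. The following is a minimal Python sketch; the names `CS1`, `CS2`, and `level_to_class` are illustrative and do not come from the study.

```python
# Grouping of the 7 risk levels (0-6) into classes, as described in the text.
# The dictionary and function names are illustrative, not from the study.
CS1 = {1: {0, 1, 2}, 2: {3, 4, 5, 6}}        # two-class scheme
CS2 = {1: {0, 1}, 2: {2, 3}, 3: {4, 5, 6}}   # three-class scheme


def level_to_class(level, scheme):
    """Return the class label whose level set contains the given risk level."""
    for label, levels in scheme.items():
        if level in levels:
            return label
    raise ValueError(f"risk level {level} is outside the range 0-6")


if __name__ == "__main__":
    for lvl in range(7):
        print(lvl, level_to_class(lvl, CS1), level_to_class(lvl, CS2))
```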

2.2. Methods
2.2.1. Abdominal Aortic Calcification

All patients underwent a lateral lumbar X-ray examination within 1 week of the blood biochemical examination to assess calcification of the abdominal aorta at the level of lumbar vertebrae 1-4 [7]. For each segment, the anterior and posterior walls of the abdominal aorta were each scored from 0 to 3 according to the length of the calcified plaques: no calcification scores 0, calcification of less than 1/3 of the arterial wall length scores 1, calcification of 1/3 to 2/3 of the wall length scores 2, and calcification of more than 2/3 of the wall length scores 3. The total score is between 0 and 24. Two radiologists scored the images independently, and their scores were averaged. The geriatric nutritional risk index (GNRI) [19] can be estimated by

$$\mathrm{GNRI} = 14.89 \times \text{serum albumin (g/dL)} + 41.7 \times \frac{\text{body weight (kg)}}{\text{ideal body weight (kg)}}. \tag{1}$$
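The scoring rule and the index can be illustrated with a short Python sketch. It assumes four aortic segments with one anterior and one posterior wall each (which yields the stated maximum of 24) and the commonly used GNRI formula with albumin in g/dL; all function names are hypothetical.

```python
def wall_score(fraction):
    """Score one wall of one segment from the calcified fraction of its length."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction == 0.0:
        return 0          # no calcification
    if fraction < 1 / 3:
        return 1          # less than 1/3 of the wall length
    if fraction <= 2 / 3:
        return 2          # between 1/3 and 2/3
    return 3              # more than 2/3


def aac_score(anterior, posterior):
    """Total score over four segments, anterior plus posterior wall (0-24)."""
    if len(anterior) != 4 or len(posterior) != 4:
        raise ValueError("expected four segments per wall")
    return sum(wall_score(f) for f in anterior) + sum(wall_score(f) for f in posterior)


def averaged_score(reader_a, reader_b):
    """Average of the totals given by two independent readers."""
    return (reader_a + reader_b) / 2


def gnri(albumin_g_dl, weight_kg, ideal_weight_kg):
    """Assumed form: 14.89 * albumin (g/dL) + 41.7 * weight / ideal weight."""
    return 14.89 * albumin_g_dl + 41.7 * weight_kg / ideal_weight_kg
```

For example, `aac_score([0, 0.2, 0.5, 0.9], [0, 0, 0, 0])` adds the wall scores 0, 1, 2, and 3 for the anterior wall and nothing for the posterior wall.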

Serum levels of intact FGF23, soluble klotho, fetuin-A, and interleukin-6 were measured using two-site enzyme-linked immunoassays (reagents from Elabscience Biotech, Wuhan, China).

2.2.2. Fuzzy Support Vector Machine

SVM is a robust machine learning method based on statistical learning theory. It considers the empirical risk and adds a regularization term to reduce the structural risk. It is a sparse and robust classifier [20]. SVM can also perform nonlinear classification through the kernel method and is one of the common kernel learning methods. In many practical classification tasks, the number of samples differs between categories. On such an imbalanced dataset, the SVM model produces a large deviation. To avoid this, Lin and Wang proposed the fuzzy SVM (FSVM) [21]. Unlike SVM, FSVM uses a membership value to describe the weight of each training sample. In general, outlier samples receive a lower membership value, so their contribution to the decision hyperplane is weakened during training.

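For FSVM, the training set can be defined as $\{(x_i, y_i, s_i)\}_{i=1}^{N}$, where $N$ is the number of training samples, and $x_i$, $y_i$, and $s_i$ are the feature vector, label, and membership value of sample $i$, respectively. The feature vector dimension of the model is $d$. The objective optimization function of FSVM is

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N} s_i \xi_i \quad \text{s.t.} \quad y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, N, \tag{2}$$

where $C$ denotes the regularization parameter and $\xi_i$ is the error measure of $x_i$. To build a robust model, different training samples should be given different regularization parameters. $s_i \xi_i$ is the error measure weighted by the membership value. Outliers (noise) have a lower weight, whereas important sample points have a higher weight. In an imbalanced dataset, the class with the larger number of samples often contains more outliers. FSVM reduces their impact and thus the resulting deviation. Equation (2) can also be rewritten as the Lagrange dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{N} y_i\alpha_i = 0, \; 0 \le \alpha_i \le s_i C, \tag{3}$$

where $\alpha_i$ is the Lagrange multiplier coefficient for sample $x_i$, and $K(x_i, x_j)$ is the value of samples $x_i$ and $x_j$ in the kernel matrix. The kernel matrix can be calculated by the radial basis function (RBF):

$$K(x_i, x_j) = \exp\left(-\gamma\|x_i - x_j\|^{2}\right), \tag{4}$$

where $\gamma$ is the Gaussian kernel bandwidth.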

The final decision function of classification is

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i y_i K(x_i, x) + b\right). \tag{5}$$
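To make the kernel and the decision rule concrete, here is a minimal NumPy sketch. It assumes the RBF parameterization $\exp(-\gamma\|x_i - x_j\|^2)$ and that the multipliers `alpha` and the bias `b` have already been obtained from a dual solver; the function names are illustrative. The helper `within_fuzzy_box` only checks the per-sample upper bound $s_i C$, which is the point where FSVM differs from the standard SVM.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for row-wise samples."""
    sq = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * (A @ B.T)
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fsvm_decision(X_train, y_train, alpha, b, X_test, gamma):
    """sign(sum_i alpha_i * y_i * K(x_i, x) + b) for every test row."""
    K = rbf_kernel(X_train, X_test, gamma)     # shape (n_train, n_test)
    scores = (alpha * y_train) @ K + b
    return np.where(scores >= 0, 1, -1)


def within_fuzzy_box(alpha, s, C, tol=1e-9):
    """Check the dual box constraint 0 <= alpha_i <= s_i * C."""
    return bool(np.all(alpha >= -tol) and np.all(alpha <= s * C + tol))
```

In practice the dual problem would be handed to an existing solver. For instance, scikit-learn's `SVC.fit` accepts a `sample_weight` argument that rescales `C` per sample, which corresponds to the bound $s_i C$; whether the study used that library is not stated.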

The basic SVM can only perform binary classification. In this work, we use the one-against-one strategy to achieve multiclass classification.
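The one-against-one strategy trains one binary classifier per pair of classes and lets them vote. A minimal Python sketch follows; `binary_predict` is a hypothetical callable standing in for the binary classifier trained on a given pair of classes, and the tie-breaking rule (first class to reach the top count) is an assumption, since the text does not specify one.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, binary_predict):
    """Majority vote over all class pairs.

    binary_predict(x, a, b) must return either a or b. It is a placeholder
    for the binary SVM trained on classes a and b.
    """
    votes = Counter(binary_predict(x, a, b) for a, b in combinations(classes, 2))
    # most_common orders equal counts by first occurrence, which breaks ties.
    return votes.most_common(1)[0][0]


def number_of_pairwise_classifiers(k):
    """k classes need k * (k - 1) / 2 binary classifiers."""
    return k * (k - 1) // 2
```

With three classes, as in CS2, this gives three binary classifiers.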

2.2.3. Self-Representation-Based Membership Function

In this work, we propose a method based on a reconstruction error to construct the membership function. This method can measure the consistency between the overall data structure and a single data point. The reconstruction error can quantify the outlier degree of the noise sample, which helps to improve the robustness of the model.

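Let $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{d \times N}$ be the training data matrix. The self-representation function is defined as follows:

$$X = XZ + E, \tag{6}$$

where $Z \in \mathbb{R}^{N \times N}$ and $E \in \mathbb{R}^{d \times N}$ are the coefficient and error matrices. $X z_i$ is the new representation of sample $x_i$ by the other training samples, where $z_i$ is the $i$th column of $Z$. The self-representation formulation can be optimized by

$$\min_{Z} \; \|X - XZ\|_F^{2} + \beta\,\mathrm{tr}\left(Z L Z^{T}\right), \tag{7}$$

where $\mathrm{tr}(Z L Z^{T})$ is the Laplacian regular term to smooth the coefficient $Z$:

$$\mathrm{tr}\left(Z L Z^{T}\right) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}\left\|\frac{z_i}{\sqrt{D_{ii}}} - \frac{z_j}{\sqrt{D_{jj}}}\right\|_2^{2}, \tag{8}$$

where $W$ is the similarity matrix between samples. It can also be replaced by the kernel matrix. $L$ is the normalized Laplacian matrix, $L = I - D^{-1/2} W D^{-1/2}$, and $D_{ii} = \sum_{j} W_{ij}$ is an element of the diagonal matrix $D$. In this work, $\beta$ denotes the coefficient of the Laplacian regular term, which is set to 0.01. Setting the derivative of Equation (7) with respect to $Z$ to zero, the solution can be obtained from

$$X^{T}X Z + \beta Z L = X^{T}X, \tag{9}$$

which is a Sylvester equation. For each training sample, the reconstruction error of $x_i$ can be calculated as

$$e_i = \|x_i - X z_i\|_2. \tag{10}$$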

To map the reconstruction error into $[0, 1]$, we define the following formula:

$$s_i = 1 - \frac{e_i - e_{\min}}{e_{\max} - e_{\min}}, \tag{11}$$

where $e_{\min}$ and $e_{\max}$ are the minimum and maximum reconstruction errors, respectively. The process of our method is listed in Algorithm 1.

Algorithm 1: FSVM-SR.
Require: training set $\{X, y\}$, test set $X_{te}$, the parameters $C$ and $\beta$; the Gaussian kernel bandwidth $\gamma$.
Ensure: The predictive values of $X_{te}$.
1. Calculate the training kernel matrix ($K_{tr}$), test kernel matrix ($K_{te}$), and Laplacian matrix $L$ by Equation (4) and $L = I - D^{-1/2} W D^{-1/2}$;
2. Estimate the self-representing coefficient matrix $Z$ of $X$ by Equation (9);
3. Calculate the reconstruction error $e_i$ for each training sample via Equation (10);
4. Obtain the final membership value $s_i$ of each training sample by Equation (11);
5. Train FSVM-SR (obtaining $\alpha$ and $b$) via solving Eq. (3);
6. Predict by the SVM decision function in Equation (5).
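Steps 1 to 4 of the procedure can be sketched in NumPy as follows. This is only an illustration under several assumptions that are not confirmed by the text: the objective is taken to be $\|X - XZ\|_F^2 + \beta\,\mathrm{tr}(ZLZ^T)$, the RBF kernel matrix is used as the similarity matrix, a small ridge term is added so that the Sylvester system stays solvable when there are more samples than features, and the error is mapped to a membership by a decreasing min-max rule. Samples are stored as rows here, so the Gram matrix is `X @ X.T`. All function names are illustrative.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Kernel matrix exp(-gamma * ||x_i - x_j||^2) for row-wise samples."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)


def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix of W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def solve_sylvester_dense(A, B, Q):
    """Solve A Z + Z B = Q through the vectorized system (small n only)."""
    n = A.shape[0]
    eye = np.eye(n)
    M = np.kron(eye, A) + np.kron(B.T, eye)    # acts on column-major vec(Z)
    z = np.linalg.solve(M, Q.reshape(-1, order="F"))
    return z.reshape((n, n), order="F")


def reconstruction_errors(X, gamma=0.5, beta=0.01, ridge=1e-6):
    """Per-sample norm of x_i minus its self-representation."""
    G = X @ X.T                                 # Gram matrix of the samples
    L = normalized_laplacian(rbf_kernel(X, gamma))
    # The ridge term is an assumption added for numerical stability.
    Z = solve_sylvester_dense(G + ridge * np.eye(len(X)), beta * L, G)
    return np.linalg.norm(X - Z.T @ X, axis=1)


def membership(errors):
    """Decreasing min-max map: the largest error gets the lowest membership."""
    e_min, e_max = errors.min(), errors.max()
    if e_max == e_min:
        return np.ones_like(errors)
    return 1.0 - (errors - e_min) / (e_max - e_min)
```

The dense Kronecker solve is chosen only to keep the sketch dependency-free; for larger sample counts a dedicated routine such as `scipy.linalg.solve_sylvester` would be the more practical choice.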

3. Results

3.1. Evaluation Measurements

In our study, the accuracy (ACC) is employed to evaluate the predictive performance of our model. In addition, a 10-fold cross-validation method [22–25] was used in this work. ACC is calculated as follows:

$$\mathrm{ACC} = \frac{\sum_{k=1}^{M}\mathrm{TP}_k}{n}, \qquad \mathrm{ACC}_k = \frac{\mathrm{TP}_k}{n_k}, \tag{12}$$

where $\mathrm{TP}_k$ is the number of true positives (TP) in subclass $k$, $M$ is the number of classes, $n$ and $n_k$ denote the number of all test samples and of test samples in subclass $k$, and $\mathrm{ACC}_k$ is the accuracy of subclass $k$.
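As a worked illustration of the overall and per-class accuracy, the Python sketch below computes both from per-class counts. The counts in the usage example are hypothetical and are not taken from the tables of this study.

```python
def accuracy_report(true_positives, class_sizes):
    """Return (per-class accuracies, overall accuracy).

    Per-class accuracy is TP_k / n_k; overall accuracy is sum(TP_k) / n.
    """
    if len(true_positives) != len(class_sizes):
        raise ValueError("need one true-positive count per class")
    per_class = [tp / n for tp, n in zip(true_positives, class_sizes)]
    overall = sum(true_positives) / sum(class_sizes)
    return per_class, overall


if __name__ == "__main__":
    # Hypothetical two-class example: 40 of 42 and 9 of 17 correctly classified.
    per_class, overall = accuracy_report([40, 9], [42, 17])
    print(per_class, overall)
```

Note that the overall accuracy is weighted by class size, so a large majority class can mask poor accuracy on the minority class.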

4. Selection of Optimal Parameters

To obtain the best prediction performance, we use the grid search method to find the optimal parameters $C$ and $\gamma$. The search ranges are from $2^{-5}$ to $2^{10}$ for $C$ and from $2^{-10}$ to $2^{5}$ for $\gamma$, with a multiplicative step of $2^{1}$. Figures 1 and 2 show the average ACC for different values of $C$ and $\gamma$ under CS1 and CS2, respectively.

As shown in the figures, the model reaches an ACC of 83.05% under CS1 and 64.40% under CS2 at the corresponding optimal values of $C$ and $\gamma$.
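The search described above can be sketched in a few lines of Python. The exponent ranges follow the text; `evaluate` is a hypothetical callable that would return the mean cross-validated ACC for a given pair of parameters, and ties are resolved in favor of the first pair encountered, which is an assumption.

```python
from itertools import product


def power_of_two_grid(lo_exp, hi_exp):
    """Values 2**lo_exp, ..., 2**hi_exp with a multiplicative step of 2."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1)]


def grid_search(evaluate, c_exps=(-5, 10), gamma_exps=(-10, 5)):
    """Return (best_score, C, gamma) over the full grid.

    evaluate(C, gamma) is a placeholder for the cross-validated accuracy.
    """
    best = None
    for C, gamma in product(power_of_two_grid(*c_exps), power_of_two_grid(*gamma_exps)):
        score = evaluate(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best
```

Each range contains 16 values, so the full grid has 256 parameter pairs, each of which requires one complete cross-validation run.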

4.1. Comparison to Other Classifiers

To further evaluate the performance of our model, we introduced other machine learning models for comparison [26, 27], including logistic regression, back propagation (BP) neural network, radial basis function (RBF) neural network, Takagi-Sugeno-Kang fuzzy system (TSK-FS) [28–30], and standard SVM. Under CS1 (two classes, Table 3), logistic regression, BP network, RBF network, and TSK-FS achieve overall ACC of 71.18%, 66.10%, 77.96%, and 76.27%, respectively. The overall ACC of SVM (79.66%) and FSVM-SR (83.05%) is better than that of the other models, and FSVM-SR obtains the best prediction accuracy. In subclasses 1 and 2, FSVM-SR also achieves the best accuracy of 95.23% and 52.94%, respectively. The results suggest that, for small-sample learning, SVM has advantages over the neural network models. As a fuzzy model, FSVM performs better than TSK-FS on this dataset. FSVM-SR can effectively suppress the influence of noise samples on the model. The receiver operating characteristic (ROC) curves of the different models are shown in Figure 3. Our method also obtains the highest area under the curve (AUC) value of 0.7955.

Under CS2 (three classes), FSVM-SR is also compared with these predictors, and the results are listed in Table 4. SVM and FSVM-SR achieve the best overall ACC of 64.40%. In subclass 1, the RBF neural network and SVM both reach the best ACC of 72.00%. In subclass 2, FSVM-SR and TSK-FS obtain 65.21%, and FSVM-SR has the smaller standard deviation (30.88%). In addition, SVM and FSVM-SR perform best (54.54%) in subclass 3. FSVM-SR is therefore also stable and effective under CS2.

A two-sample t-test is employed to evaluate the significance of the differences in average ACC under CS1 and CS2, respectively. The significance level is 0.05. FSVM-SR is compared with the other models via 10-fold cross-validation repeated 20 times. The results of the significance tests are shown in Table 5. In CS1, the differences between FSVM-SR and the other models are all significant (P value < 0.05). The maximum P value is 0.0064 (for SVM), and the minimum is $3.4341 \times 10^{-11}$ (for BP neural network). In CS2, the differences are significant for all models except SVM (P value 0.1965). These results indicate that the proposed method (FSVM-SR) outperforms most methods under both patient risk classification schemes.
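The test statistic behind such a comparison can be computed with the Python standard library alone. The sketch below uses the unequal-variance (Welch) form, which is an assumption because the text does not say which variant was applied; with equal group sizes, as with 20 repetitions per model, the pooled-variance statistic has the same value. The P value additionally requires the t distribution, which a statistics package such as SciPy provides through `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)    # squared standard error of mean(a)
    vb = variance(b) / len(b)    # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Here `a` and `b` would hold the ACC values of two models over the repeated cross-validation runs.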

5. Discussion

Cardiovascular death is the main cause of death in ESRD patients. Studies have confirmed that the occurrence of abdominal aortic calcification in ESRD patients is an important contributor to cardiovascular death [31]. In recent years, the influence of nontraditional risk factors on vascular calcification, such as FGF23 and klotho abnormalities, the microinflammatory state, and malnutrition, has attracted much attention.

As a protein that plays a key role in phosphate regulation, FGF23 is involved in controlling the metabolism of phosphate, parathyroid hormone, and 1,25-dihydroxyvitamin D. FGF23 not only regulates phosphate homeostasis but also promotes disease progression and left ventricular hypertrophy and increases the occurrence of, and death from, CVD [1]. Many studies have confirmed that klotho, an antiaging gene, participates in cardiovascular protection in ESRD patients by inhibiting phosphate-driven vascular calcification [32]. Previous studies at our center showed that serum FGF23 levels increased and soluble klotho levels decreased in these patients. Moreover, in patients who also had abdominal aortic sclerosis, the abnormalities of serum FGF23 and klotho were more pronounced [8]. The present study is consistent with these earlier results. The levels of FGF23 and klotho are abnormal in ESRD patients, and they are risk factors for vascular calcification. The analysis results of the different risk prediction models all support this conclusion.

Fetuin-A, as a protective factor against vascular calcification, can inhibit the calcification process [33]. The results of this study show that serum fetuin-A is significantly reduced in ESRD patients and decreases even more in patients with abdominal aortic calcification. The degree of decrease has a certain early warning value for vascular calcification. Both FSVM and the traditional prediction models suggest that fetuin-A is an independent risk factor for abdominal aortic calcification. Therefore, clinicians should pay close attention to serum fetuin-A levels during the course of CKD and intervene as soon as abnormalities occur.

Studies have shown that malnutrition is closely related to vascular calcification, cardiovascular death, and all-cause death in ESRD patients [4]. In this work, the results of the different risk prediction models support that malnutrition is an independent risk factor for abdominal aortic calcification. Existing studies have shown that malnutrition, the microinflammatory state, insulin resistance, FGF23/klotho axis abnormalities, and other factors are interconnected and jointly promote the occurrence of vascular calcification [19]. Therefore, many risk factors for vascular calcification are often mixed, and it is difficult for clinicians to determine the main risk factors accurately and provide precise treatment. By comparing the accuracy of different models for predicting the risk of abdominal aortic calcification, we find that the FSVM and SVM models are more accurate than the traditional logistic regression model. Under CS1, FSVM-SR achieves the best accuracy of 95.23% and 52.94% in subclasses 1 and 2, respectively, and it is significantly better than the other methods (P values range from $3.4341 \times 10^{-11}$ for the BP network to 0.0064 for SVM; see Table 5). Moreover, FSVM-SR and SVM achieve the best overall ACC of 64.40% under CS2. The self-representation-based membership function estimates a weight for each training sample. The reconstruction error of outliers is relatively large, and the corresponding membership value is low. When constructing the classification hyperplane, FSVM gives little influence to low-weight samples (outliers), which alleviates the effect of imbalanced datasets. Fuzzy methods [34] improve the interpretability and robustness of the model, and there are related applications in the medical field [35, 36].

6. Conclusions

In this work, we propose an FSVM based on a self-representation method to filter noise samples and improve the generalization ability of the model, and it obtains good results. Although our method achieves better accuracy, it still has the following limitations: (1) The sample size needs to be increased further to minimize the prediction bias. (2) There is no detailed analysis of the various patient factors [37, 38]. (3) The interpretability of the model is not as good as that of a linear model. Based on the above, we will propose a sparse linear model in future work to address the problems of poor interpretability and factor analysis.

Data Availability

The data used to support the research can be obtained from the corresponding authors according to the requirements of the institution.

Ethical Approval

This study was approved by the ethics committee of the hospital (ethical approval no. KS2019041). The experimental protocol was established according to the ethical guidelines of the Helsinki Declaration and was approved by the Human Ethics Committee (Wuxi People’s Hospital Ethics Committee).

Written informed consent for publication was obtained from all participants.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Authors’ Contributions

Xiaobin Liu, Xiran Zhang, and Xiaoyi Guo are joint first authors.

Acknowledgments

We thank the Department of Nephrology of Wuxi People’s Hospital for collecting data in this study. This work is supported by a grant from the Top Talent Support Program for young and middle-aged people of Wuxi Health Committee (HB2020008), the Scientific research project of Wuxi health committee (MS201927), the Scientific research project of Wuxi health committee (Z201914), the Scientific Research Projects of Jiangsu Provincial Health Commission (LGY201801), the Jiangsu Province “333” project (BRA2020142), National Natural Science Foundation of China (NSFC 61902271), and the Natural Science Research of Jiangsu Higher Education Institutions of China (19KJB520014).