Over the past few decades, cardiovascular disease and its complications have become the leading cause of death worldwide, including in India. Detecting cardiovascular problems early is essential for improving patient care and reducing treatment costs. This project uses the UCI Heart Disease Dataset to develop ML and DL models capable of detecting cardiac disease. Heart disease was classified using Convolutional Neural Network (CNN) models, which can detect intricate patterns in the input data. The model was evaluated using a confusion matrix, F1-score, ROC curve, accuracy, precision, and recall. It outperformed the Neural Network, Deep Neural Network (DNN), and Gradient Boosted Trees (GBT) models, achieving 91.71% accuracy, 88.88% precision, 82.75% recall, and an 85.70% F1-score. The comparative study showed that CNN was the most accurate model, while the other models struck different balances between precision and recall. The experimental results show that the proposed CNN model is an effective way to identify cardiovascular disease, which means it could be used in healthcare systems to find diseases earlier and treat patients better.
An Analysis of Performance and Comparison of Models for Cardiovascular Disease Prediction via Machine Learning Models in Healthcare
August 02, 2024
September 27, 2024
November 20, 2024
December 17, 2024
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
1. Introduction
The human heart is a vital organ: it must deliver blood to all parts of the body. If it stops, the brain and other systems fail within minutes, and the person dies. Many heart diseases are becoming more likely because of changing lifestyles, workplace stress, and poor living habits. Cardiovascular problems are now a major cause of death all over the world. The WHO reports that 17.7 million people die every year from heart disease, accounting for one-third of all fatalities globally. Worldwide, cardiovascular illnesses and strokes continue to rank high among public health concerns and the leading causes of mortality. There are many types of CVD, but ischemic heart disease (IHD) and stroke are the two most common and important causes of death and health loss around the world, and they are consistently ranked as the main sources of CVD-related health problems [1]. Heart disease accounts for 75% of all deaths in low- and middle-income countries (LMICs) [2]. Their GDP drops by 7% as a result, which strains their budgets. CVD costs have shifted from high-income to low- and middle-income countries, according to [3]. This is still the case even though fewer risk factors are present in low- and middle-income nations [4]. Modifiable risk factors for cardiovascular disease include tobacco use, poor nutrition, obesity, insufficient physical activity, and heavy alcohol consumption [5]. Reducing the prevalence of heart disease can be achieved by adopting the lifestyle adjustments indicated above [6]. Identifying cardiovascular illness at an early stage is vital for its successful treatment [7]. For instance, by giving the right medicine and counsel at the right moments, patients can improve their outcomes and regain control of their illness [8].
ML-based methods have already been used in medicine, but researchers are still trying to improve them. For example, ensemble learning [9] has been shown to improve performance on such tasks. This work aims to find heart problems using a system that learns on its own. An ensemble classifier is a group of separate classifiers that work together through a combination method, such as majority voting, to make predictions. In the healthcare business, ML can speed up data processing and analysis [10], and it can also help doctors make better predictions for their patients [11]. ML allows predictive analytics algorithms to be trained on massive datasets; once deployed, they can perform in-depth analysis on a number of factors with little modification [12].
1.1. Motivation and Contribution of Paper
The development of reliable and practical prediction models is crucial for the early detection and treatment of heart disease, a leading cause of death globally. Innovative ML methods, such as CNN and other DL models, can help improve patient outcomes and make diagnoses more accurate. This project was motivated by the need for a reliable, automated method for classifying heart disease that could serve many people; the UCI Heart Disease Dataset is an example of the kind of data such a method requires. The aim of this study is to find better ways to prepare data, choose features, and evaluate models, so that AI-powered healthcare detection systems can be built that help doctors make decisions by quickly and correctly identifying cardiovascular illness.
- Created a systematic ML pipeline that predicts heart disease using features extracted from the UCI Heart Disease Dataset beginning from data acquisition up to evaluation.
- Proposed a method to improve prediction accuracy through a systematic approach to data gathering, preprocessing, feature selection, model training, and evaluation.
- Improved dataset quality through preprocessing that handled missing values, encoded categorical data, and identified and eliminated duplicate records. Applied dimensionality-reduction and feature-selection techniques to make the model both more effective and easier to interpret.
- Developed a CNN-based DL architecture tailored for cardiovascular disease classification. Employed the sigmoid activation function to facilitate binary classification of heart disease presence.
- Conducted a comparative study demonstrating that CNN outperforms NN, DNN, and GBT models in accuracy and overall predictive capability.
1.2. Novelty and Justification
This research explores a new method for cardiovascular disease classification through the use of CNN on UCI Heart Disease Dataset while applying improved preprocessing procedures along with selection features and performance enhancement measures. Unlike traditional ML models, the proposed CNN model effectively captures intricate patterns within the dataset, enhancing predictive accuracy. The novelty lies in the combination of dimensionality reduction, recursive feature elimination, and DL for robust classification. The CNN model achieves a good balance of recall, accuracy, and F1-score when compared to other models, such as NN, DNN, and GBT. This research justifies the adoption of CNN for heart disease detection by demonstrating its superior classification capability, which can facilitate early diagnosis and improve patient outcomes in healthcare applications.
1.3. Structure of the Paper
The study is organized as follows: Section II presents related work on heart disease and ML in healthcare. Section III goes into more depth about the steps, methods, and materials used. Section IV presents, analyzes, and discusses the results of the suggested system. Section V concludes and lists future work.
2. Literature Review
This segment provides a synopsis of the reviewed literature, which centers on ML and DL frameworks for automated cardiovascular disease detection and their use in healthcare.
Maiga, Hungilo and Pranowo (2019) evaluate various ML models for cardiovascular risk factor-based heart disease prediction. The dataset contains 70,000 patient records and comes from Kaggle ML contests. RF, NB, KNN, and LR are the ML methods used in this research. The tests show that Random Forest is the best classifier, with 73% accuracy, 65% precision, and 80% sensitivity [13].
Talab et al. (2019) describe a safe and inexpensive way to find heart problems using the cell phones most people have access to these days. A collection of recorded human heart sounds and their corresponding diagnoses is used to train the neural network, and fine-tuning methods such as ADAM optimization with regularization are used to smooth the prediction process. The suggested method was tried on 5- to 8-second heart sound signals and achieved 94.2% accuracy on the validation set [14].
Atallah and Al-Mousa (2019) provide a way to use a majority-vote ensemble to ascertain the likelihood of cardiac disease, using simple, low-cost medical tests that any local clinic can administer. A secondary objective of this endeavor is to instill greater trust in medical professionals' conclusions, since the program's training data consisted of actual individuals in various health states. Because several models vote on how to classify the patient, the ensemble can give more correct answers than a single model. This method achieved 90% accuracy when the hard-voting ensemble model was used [15].
Nita, Bitam and Mellouk (2018) developed a technique based on simulated annealing (SA) to improve the random forest method and determine the best number of trees, with the purpose of correctly categorizing ECG signals. The system's main components are preprocessing and denoising the ECG data, extracting features, and classification using the extended random forest method. Two datasets from the UCI ML repository were utilized, the USA Heart Disease Data Set and the Arrhythmia Data Set, along with two prominent European datasets, PhysioNet ST-T and MIT/BIH. The improved random forest was found to classify correctly 99.62% of the time when the right number of trees is used [16].
Alić, Gurbeta and Badnjević (2017) summarize the state of the art in using ANN and BN for diabetes and CVD classification. In this comparative study, publications published between 2008 and 2017 were chosen at random. Research commonly uses ANNs of the multilayer feedforward variety trained with the Levenberg-Marquardt approach, while BNs come in many varieties. These methods can correctly diagnose CVD (99.51%) and diabetes (97.92%) more often than any other test [17].
Casas et al. (2016) broke MIT-BIH's 108,653 classified ECG beats down into 80 different traits used to sort the beats into Normal, PVC, and other types. Three ML-based classifier systems were put through a series of tests with different factors to see how well they worked, giving a total of 14 models evaluated by F1 score and predictive ability. Their tests show F1 scores close to unity, meaning the models are correct more than 93% of the time [18].
The Table 1 below shows the literature review summary of cardiovascular disease in healthcare on different studies, methods, datasets used, their key findings, and their limitations and future work.
3. Methodology
The main purpose of this research is the creation and evaluation of machine learning models for cardiovascular disease prediction in healthcare. The main objective is to assess the performance of various algorithms when classifying heart disease cases with the UCI Heart Disease Dataset. The proposed methodology, illustrated as a flowchart in Figure 1, begins with acquiring the UCI Heart Disease Dataset, which contains labeled records. Classification of cardiac disease is carried out using a pipeline-based methodology built on this dataset. Data collection marks the first step, when raw data is acquired. Preprocessing follows, through operations such as data labeling, completion of missing values, duplicate-entry elimination, and dimensionality-reduction techniques, to strengthen data quality. The next step, to enhance model efficiency, is feature selection to retain the most important attributes. The data is then split into two parts: 80% for training and 20% for testing. A CNN model classifies the data by finding subtle patterns in it, and the sigmoid activation function gives the model binary classification probabilities. To make sure the model works well, it is carefully checked using key evaluation measures: F1-score, recall, precision, accuracy, and loss. This end-to-end method delivers accurate labelling, making it easier to find heart disease early.
3.1. Data Collection
The dataset collected from Kaggle contains 1050 cases with 76 attributes; 14 of the 76 attributes were used to predict whether someone would get heart disease, because these characteristics have a bigger effect on the disease than the others. Before the data is classified, it is filtered and cleaned to remove duplicate or missing values. The cleaned data was then randomly split into two groups: 205 records were kept for testing and 820 records for training. The held-out data was then used to measure how well the model worked.
The following section provides a brief explanation of each step shown in the flowchart:
3.2. Data Analysis and Visualization
Data visualization highlights the significance of certain attributes in diagnosing heart disease. These visual insights help in understanding the critical thresholds that contribute to heart disease diagnosis.
In Figure 2, a heatmap shows the size and direction of the linear relationships between quantitative features in a correlation matrix. The color scale shows the strength of the correlations: darker shades indicate weaker relationships, while lighter shades indicate stronger positive associations. This view makes it easy to see how features are connected, how they affect the target variable, and which strongly correlated features to pick out for further study.
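As a minimal sketch of how such a matrix is produced (the column names and values below are illustrative stand-ins, not the full UCI schema), the correlations behind the heatmap can be computed with pandas:

```python
import pandas as pd

# Hypothetical subset of heart-disease columns, for illustration only.
df = pd.DataFrame({
    "age":     [63, 37, 41, 56, 57],
    "chol":    [233, 250, 204, 236, 354],
    "thalach": [150, 187, 172, 178, 163],
    "target":  [1, 1, 1, 1, 0],
})

# Pearson correlation matrix of the quantitative features.
corr = df.corr()
print(corr.round(2))

# A heatmap like Figure 2 could then be drawn with, e.g., seaborn:
# import seaborn as sns; sns.heatmap(corr, annot=True)
```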
3.3. Data Preprocessing
Building trustworthy detection models, particularly when comparing them, requires data processing as a basic step. It's critical to supply consistent data to the models so that when their performance is measured, the models themselves are evaluated, not the manner in which the data was supplied. Below is a list of the pre-processing actions:
- Check null values: Null values in a dataset pose challenges to data analysis and model accuracy, necessitating appropriate handling methods like imputation, removal, or substitution to maintain data quality and ensure robust predictive performance [19].
- Remove duplicate values: Remove any duplicate records from the dataset. Rows with many empty entries are also deleted, and only missing values of the MAR (missing-at-random) type are removed.
- Data labelling: The dataset has categorical variables that need handling before any ML method can be used. Each category is split into dummy columns in order to work with categorical variables; the get_dummies method was used for this, which enlarges the dataset.
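The three steps above can be sketched with pandas on a toy frame (column names are hypothetical; the real pipeline operates on the full dataset):

```python
import pandas as pd

# Toy frame standing in for the raw dataset.
raw = pd.DataFrame({
    "age":    [63, 37, 37, None, 56],
    "cp":     ["typical", "atypical", "atypical", "nonanginal", "typical"],
    "target": [1, 1, 1, 0, 0],
})

df = raw.dropna()                            # drop rows with null values
df = df.drop_duplicates()                    # remove duplicate records
df = pd.get_dummies(df, columns=["cp"])      # dummy-encode categoricals
print(df.shape)
```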
3.4. Feature selection
Feature selection cuts down on training time and boosts performance by picking out relevant characteristics and leaving out those that don't predict the goal variable very well. The recursive feature elimination algorithm performs selection by building successive models which evaluate various feature combinations for identifying the optimal subset.
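A hedged sketch of recursive feature elimination, using scikit-learn's RFE on synthetic data (the estimator and feature counts here are illustrative choices, not the paper's exact configuration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the heart-disease features (not the real UCI data).
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# RFE repeatedly fits the estimator and drops the weakest feature
# until the requested number of features remains.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the retained features
```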
3.5. Data Splitting
A random split divided the data into two groups, one for training and one for testing. The training data made up eighty percent of the whole and the testing data twenty percent. The CNN was trained on the training data, while the test data was used to evaluate its operational performance.
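The 80/20 split can be sketched with scikit-learn's train_test_split; on 1025 usable records this reproduces the 820/205 partition reported earlier (the random_state value is an arbitrary illustrative choice):

```python
from sklearn.model_selection import train_test_split

# 1025-record stand-in for the cleaned dataset.
X = list(range(1025))
y = [i % 2 for i in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
print(len(X_train), len(X_test))   # 820 205
```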
3.6. Classification of Convolutional Neural Network (CNN) Model
A deep learning network, the CNN has been very important in computer vision (CV). Figure 3 shows a famous CNN, LeNet-5 [20]; Yann LeCun developed this architecture, which combines convolution layers and pooling layers with multiple MLP layers:
The convolution kernel's job in the convolution layer is to filter out features in the input data, often images. Equation (1) gives the pixel located at position y in the p-th convolution layer for the f-th filter, L_f^p(y):

L_f^p(y) = Σ_{c=1}^{C} Σ_{m=1}^{t} W_{f,m}^p · L_c^{p-1}(y + m) + b_f      (1)

Here L_c^{p-1}(x) is the pixel at position x in the c-th channel of the preceding (p−1)-th convolution layer, W_{f,m}^p is the m-th element of the f-th filter in the p-th convolution layer, t is the number of elements in that filter, and b_f is the filter's bias. The convolution kernel is crucial to understanding convolution [21]. A kernel has several parameters, e.g., a size S ∗ S ∗ C ∗ N, where S is the spatial size of the kernel, C is the depth of the kernel, and N is the number of kernels. C is usually determined by the depth of the input data, for example three for an RGB image.
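As a minimal pure-Python sketch of the 1-D, single-channel case of Equation (1) (channels and activation omitted for clarity):

```python
# Each output value is the weighted sum of t input values plus a bias,
# with the filter sliding one position at a time over the signal.
def conv1d(signal, kernel, bias=0.0):
    t = len(kernel)
    return [sum(signal[y + m] * kernel[m] for m in range(t)) + bias
            for y in range(len(signal) - t + 1)]

# An edge-detecting kernel on a ramp signal.
out = conv1d([1, 2, 3, 4, 5], [1, 0, -1])
print(out)   # [-2, -2, -2]
```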
In a CNN, the pooling layer is mostly used to downsample the data and reduce image resolution. After convolution, the feature-map size grows considerably, and the pooling layer can be used to shrink it. Equation (2) shows max pooling, one of the most popular pooling methods: a region the size of the max pool [22] is selected, the highest value in that region becomes the output value, the pool window is then moved to the adjacent position, and the computation is repeated. Finally, the maximum values are assembled into a new matrix by the pooling layer:

L(x) = max_{0 ≤ i < pat_h, 0 ≤ j < pat_w} M(x_h + i, x_w + j)      (2)

where pat_h and pat_w stand for the pooling window's height and width, M is the input feature map, and L(x) is the pooled result.
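The max-pooling step can be sketched in the same spirit for the 1-D case (window size and input values are illustrative):

```python
# Slide a non-overlapping window over the input and keep the largest
# value in each window -- the 1-D analogue of Equation (2).
def max_pool1d(values, pool_size):
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]

print(max_pool1d([1, 3, 2, 8, 5, 4], 2))   # [3, 8, 5]
```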
The multilayer perceptron (MLP) layer classifies the CNN model's data into groups. A number of convolution and pooling operations applied to the input data produce the extracted feature map; with one-dimensional data, the MLP maps the feature data onto output data of N groups. Equation (3) explains how each perceptron layer in the MLP works:

O_l = f( Σ_{k=1}^{K} W_k · O_{l-1}(k) + b_l )      (3)

where W_k is the weight of the k-th input feature, b_l the bias of layer l, and O_{l-1} the output of the preceding layer; O_l denotes the output of the current layer, and K is the total number of features fed into it.
The sigmoid function maps any real number into the interval (0, 1) via an S-shaped curve. It is often used as an activation function in neural networks [23], particularly in the output layer for binary classification. In the last layer of the CNN model, the sigmoid function transformed the study's outputs into probability scores for predicting heart disease.
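A small pure-Python sketch of that mapping (input values are illustrative):

```python
import math

# The sigmoid squashes any real score into (0, 1), so the CNN's final
# layer output can be read as a probability of heart disease.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5 -> maximally uncertain
print(sigmoid(4.0))   # ~0.982 -> strong "disease present" score
```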
3.7. Performance Metrics
This section describes the performance results that were collected, and discusses the success parameters used in this study. All metrics are computed from the FN, FP, TP and TN counts, which give the numbers of test instances identified as false negatives, true positives, false positives, and true negatives. P and N stand for the number of sampled positive- and negative-class examples, respectively. Truly negative cases should be labeled negative, and truly positive cases should be labeled positive [24]. The performance evaluation of the classifier relies on four metrics: accuracy, precision, recall and F1-score.
Accuracy: One way to evaluate a machine learning model's recognition ability is the ratio of correct predictions to total predictions, as Equation (4) makes clear:

Accuracy = (TP + TN) / (TP + TN + FP + FN)      (4)
Precision: Precision describes the proportion of correctly predicted positive cases among all cases predicted positive; in other words, it is the positive predictive value. Equation (5) gives the precision:

Precision = TP / (TP + FP)      (5)
Recall: Recall, also known as sensitivity, represents the proportion of accurate positive predictions among all actual positive inputs. Equation (6) can be used to compute the recall:

Recall = TP / (TP + FN)      (6)
F1-score: Selecting between a model with good precision but poor recall, or vice versa, can be challenging when comparing several models. The F1-score combines recall and precision, as in Equation (7):

F1-score = 2 × (Precision × Recall) / (Precision + Recall)      (7)
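The four metrics can be computed directly from confusion-matrix counts. The counts below are illustrative (they happen to reproduce the precision, recall, and F1 values reported for the CNN, though not its accuracy, since they are not the full test-set tally):

```python
# Equations (4)-(7) computed from raw confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=24, tn=29, fp=3, fn=5)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")
```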
4. Result Analysis and Discussion
Hardware and software both supported the experiments. The research used a computer with a 2.50 GHz Intel(R) Core(TM) i5-2520M CPU and 12 GB RAM, with associated tools, situated in the nearby testing laboratory; the machine had sufficient memory and processing power to accomplish the tasks efficiently. The proposed CNN model was tested against the UCI heart disease dataset, which Table 2 summarizes. The CNN model is applied in a healthcare context to analyze it and make detailed predictions about heart disease.
Figure 4 shows the outcomes of applying the CNN model to the UCI Heart Disease dataset: accuracy of 91.71%, recall of 82.75%, precision of 88.88%, and F1-score of 85.7%. This visual representation of the model-evaluation elements shows that the model has significant predictive capacity, with high accuracy and precision, while the recall and F1-score values indicate the model maintains a close balance between sensitivity and specificity.
Figure 5 presents the changes in training and validation accuracy over the course of training. In this line chart, accuracy is displayed on the y-axis and epochs on the x-axis. It shows two curves, one in purple for training accuracy and one in green for validation accuracy; the legend at the top-left labels them "accuracy" and "val_accuracy." The title "Train Accuracy vs. Valid Accuracy" describes the relation depicted. Both training and validation accuracy grow continuously over the epochs, while validation accuracy shows brief drops in its pattern. The plot includes grid lines that improve visibility.
Figure 6 visualizes training and validation loss across all training epochs as a line graph. The chart displays loss on the y-axis and the number of epochs on the x-axis. Validation loss is shown as a purple dashed line and training loss as a green one; the two curves are distinct. A legend in the upper right labels them "loss" and "val_loss," and the title "Train Loss vs. Valid Loss" states the relation depicted. As the number of epochs grows, both training and validation loss decrease overall, with validation loss fluctuating. The grid makes the plot easier to read.
The ROC curve in Figure 7 illustrates the proposed model's efficacy relative to a random predictor, with the TPR on one axis and the FPR on the other. The orange curve represents the proposed model, whereas the blue diagonal line depicts a random classifier. The model outperforms guesswork, with an area under the curve (AUC) of around 0.91. A higher true positive rate is achieved across varying false positive rates because the model balances specificity and sensitivity, as seen in the elevated position of the ROC curve.
Figure 8 shows the confusion matrix of the CNN model, which demonstrates the model's proficiency in classifying cases. In this grid, the individual cells are TN = 29, FP = 3, TP = 24, and FN = 5 (consistent with the reported precision of 88.88% and recall of 82.75%). The diagonal cells TN and TP count correctly classified examples, while the off-diagonal cells FP and FN count misclassified cases. The color intensity shows how often each outcome occurs, with darker colors indicating more instances.
4.1. Comparative Analysis and Discussion
This section compares the proposed CNN model with the existing models NN [25], DNN [26], and GBT [27] on the same dataset. The findings in Table 3 show the model comparisons used as evidence.
This work evaluated different ML and DL models on their ability to predict cardiovascular disease using the UCI heart disease dataset; F1-score, accuracy, precision, and recall played the most critical role. The CNN ranked highest, with an F1-score of 85.70%, accuracy of 91.71%, precision of 88.88%, and recall of 82.75%. The NN model also reached strong results, with 86.9% accuracy, 86.4% precision, 84.7% recall, and an 86.4% F1-score. The DNN obtained an F1-score of 85.71%, recall of 93.51%, accuracy of 83.67%, and precision of 79.12%. The GBT model had the highest precision (94.1%), with a recall of 80.7%, F1-score of 86.8%, and accuracy of 78.3%. According to the results, CNN offers the best overall accuracy, although the precision and recall of the different models must be weighed for each application.
The CNN model for cardiovascular disease prediction offers multiple key benefits. The developed prediction model reached an accuracy of 91.71%, guaranteeing dependable and efficient analysis. It automatically finds complex patterns directly from the input data instead of demanding manual feature engineering, which improves processing efficiency. It demonstrates a balanced combination of precision, recall, and F1-score, enabling it to manage both false-positive and false-negative cases correctly. Through a systematic workflow of data cleanup, feature selection, and model evaluation, the system's performance reaches new heights. Healthcare facilities benefit from CNN models that provide superior performance to Neural Networks, DNNs, and GBTs and serve as an effective tool for identifying heart disease at an early stage.
5. Conclusion and Future Direction
ML classifiers were used to predict whether heart problems exist; the research data originated from the UCI repository. The examination of heart disease using the UCI dataset establishes the CNN's effectiveness for disease identification, achieving 91.71% accuracy. The structured framework, beginning with data collection followed by preprocessing and feature selection before classification, achieves the best model results. The CNN delivers more accurate predictions than the other relevant models (DNN, Gradient Boosted Trees, and Neural Networks) while maintaining an adept precision/recall balance. The achieved results demonstrate that CNN provides maximum accuracy, but the other models offer distinct trade-offs between precision and recall depending on specific application requirements. The research demonstrates that deep learning matters for medical practice because it enables prompt disease identification and enhances the precision of medical diagnoses, leading to faster healthcare responses.
In the future, more datasets will be used to obtain more reliable results. Metaheuristics and nature-inspired algorithms will be utilized to enhance ML classifiers and deep learning techniques for improved cardiac disease screening using a range of condition-related datasets. Additionally, the accuracy of the current algorithms will be enhanced.
References
- M. Woodward, “Cardiovascular disease and the female disadvantage,” International Journal of Environmental Research and Public Health. 2019. doi: 10.3390/ijerph16071165.[CrossRef] [PubMed]
- M. I. Qrenawi and W. Al Sarraj, “Identification of Cardiovascular Diseases Risk Factors among Diabetes Patients Using Ontological Data Mining Techniques,” in Proceedings - 2018 International Conference on Promising Electronic Technologies, ICPET 2018, 2018. doi: 10.1109/ICPET.2018.00030.[CrossRef]
- R. Casas, S. Castro-Barquero, R. Estruch, and E. Sacanella, “Nutrition and cardiovascular health,” International Journal of Molecular Sciences. 2018. doi: 10.3390/ijms19123988.[CrossRef] [PubMed]
- C. N. Zhao et al., “Fruits for prevention and treatment of cardiovascular diseases,” Nutrients. 2017. doi: 10.3390/nu9060598.[CrossRef] [PubMed]
- S. N. Yoon and D. H. Lee, “Artificial intelligence and robots in healthcare: What are the success factors for technology-based service encounters?,” International Journal of Healthcare Management. 2019. doi: 10.1080/20479700.2018.1498220.[CrossRef]
- S. C. Thompson, L. Nedkoff, J. Katzenellenbogen, M. A. Hussain, and F. Sanfilippo, “Challenges in managing acute cardiovascular diseases and follow up care in rural areas: A narrative review,” International Journal of Environmental Research and Public Health. 2019. doi: 10.3390/ijerph16245126.[CrossRef] [PubMed]
- A. Ravera et al., “Nutrition and cardiovascular disease: Finding the perfect recipe for cardiovascular health,” Nutrients. 2016. doi: 10.3390/nu8060363.[CrossRef] [PubMed]
- E. O. Olaniyi, O. K. Oyedotun, and K. Adnan, “Heart Diseases Diagnosis Using Neural Networks Arbitration,” Int. J. Intell. Syst. Appl., 2015, doi: 10.5815/ijisa.2015.12.08.[CrossRef]
- J. A. M. Sidey-Gibbons and C. J. Sidey-Gibbons, “Machine learning in medicine: a practical introduction,” BMC Med. Res. Methodol., 2019, doi: 10.1186/s12874-019-0681-4.[CrossRef] [PubMed]
- D. Sorriento and G. Iaccarino, “Inflammation and cardiovascular diseases: The most recent findings,” International Journal of Molecular Sciences. 2019. doi: 10.3390/ijms20163879.[CrossRef] [PubMed]
- M. Ganesan and N. Sivakumar, “IoT based heart disease prediction and diagnosis model for healthcare using machine learning models,” in 2019 IEEE International Conference on System, Computation, Automation and Networking, ICSCAN 2019, 2019. doi: 10.1109/ICSCAN.2019.8878850.[CrossRef] [PubMed]
- A. Golande and T. Pavan Kumar, “Heart disease prediction using effective machine learning techniques,” Int. J. Recent Technol. Eng., 2019.
- J. Maiga, G. G. Hungilo, and Pranowo, “Comparison of Machine Learning Models in Prediction of Cardiovascular Disease Using Health Record Data,” in Proceedings - 1st International Conference on Informatics, Multimedia, Cyber and Information System, ICIMCIS 2019, 2019. doi: 10.1109/ICIMCIS48181.2019.8985205.[CrossRef]
- E. Talab, O. Mohamed, L. Begum, F. Aloul, and A. Sagahyroon, “Detecting heart anomalies using mobile phones and machine learning,” in Proceedings - 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019, 2019. doi: 10.1109/BIBE.2019.00083.[CrossRef]
- R. Atallah and A. Al-Mousa, “Heart Disease Detection Using Machine Learning Majority Voting Ensemble Method,” in 2019 2nd International Conference on New Trends in Computing Sciences, ICTCS 2019 - Proceedings, 2019. doi: 10.1109/ICTCS.2019.8923053.[CrossRef]
- S. Nita, S. Bitam, and A. Mellouk, “An Enhanced Random Forest for Cardiac Diseases Identification based on ECG signal,” in 2018 14th International Wireless Communications and Mobile Computing Conference, IWCMC 2018, 2018. doi: 10.1109/IWCMC.2018.8450361.[CrossRef]
- B. Alić, L. Gurbeta, and A. Badnjević, “Machine learning techniques for classification of diabetes and cardiovascular diseases,” in 2017 6th Mediterranean Conference on Embedded Computing, MECO 2017 - Including ECYPS 2017, Proceedings, 2017. doi: 10.1109/MECO.2017.7977152.[CrossRef] [PubMed]
- M. M. Casas, R. L. Avitia, M. A. Reyna, and A. Cardenas, “Evaluation of three machine learning algorithms as classifiers of premature ventricular contractions on ECG beats,” in 2016 Global Medical Engineering Physics Exchanges/Pan American Health Care Exchanges, GMEPE/PAHCE 2016, 2016. doi: 10.1109/GMEPE-PAHCE.2016.7504615.[CrossRef] [PubMed]
- P. Singh, S. Singh, and G. S. Pandi-Jain, “Effective heart disease prediction system using data mining techniques,” Int. J. Nanomedicine, 2018, doi: 10.2147/IJN.S124998.[CrossRef] [PubMed]
- A. D’Acremont, R. Fablet, A. Baussard, and G. Quin, “CNN-based target recognition and identification for infrared imaging in defense systems,” Sensors (Switzerland), 2019, doi: 10.3390/s19092040.[CrossRef] [PubMed]
- X. Xu, H. Zheng, Z. Guo, X. Wu, and Z. Zheng, “SDD-CNN: Small data-driven convolution neural networks for subtle roller defect inspection,” Appl. Sci., 2019, doi: 10.3390/app9071364.[CrossRef]
- J. Lee, J. Kim, I. Kim, and K. Han, “Cyber Threat Detection Based on Artificial Neural Networks Using Event Profiles,” IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2953095.[CrossRef]
- Md. Fazle Rabbi et al., “Performance Evaluation of Data Mining Classification Techniques for Heart Disease Prediction,” Am. J. Eng. Res., 2018.
- N. Khare and S. Yunus Sait, “Credit Card Fraud Detection Using Machine Learning Models and Collating Machine Learning Models,” Int. J. Pure Appl. Math., vol. 118, no. 20, pp. 825–838, 2018.
- H. A. Esfahani and M. Ghazanfari, “Cardiovascular Disease Detection using a New Ensemble Classifier,” in 2017 IEEE 4Th International Conference on Knowledege Based Engineering and Innovation (KBEI), 2017.[CrossRef]
- K. H. Miao and J. H. Miao, “Coronary heart disease diagnosis using deep neural networks,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 10, pp. 1–8, 2018, doi: 10.14569/IJACSA.2018.091001.[CrossRef]
- S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2923707.[CrossRef]
- D. S. Kuraku, D. Kalla, N. Smith, and F. Samaah, “Safeguarding FinTech: elevating employee cybersecurity awareness in financial sector,” Int. J. Appl. Inf. Syst. (IJAIS), vol. 12, no. 42, 2023.
- D. Kalla, N. Smith, F. Samaah, and K. Polimetla, “Enhancing Early Diagnosis: Machine Learning Applications in Diabetes Prediction,” J. Artif. Intell. Cloud Comput., vol. 1, no. 191, pp. 2–7, 2022, doi: 10.47363/JAICC/2022.[CrossRef]
- S. Kuraku, D. Kalla, F. Samaah, and N. Smith, “Cultivating proactive cybersecurity culture among IT professionals to combat evolving threats,” Int. J. Electr. Electron. Comput., vol. 8, no. 6, 2023.[CrossRef]
- D. S. Kuraku and D. Kalla, “Phishing Website URL’s Detection Using NLP and Machine Learning Techniques,” Journal on Artificial Intelligence-Tech, 2023.
- D. Kalla and A. Chandrasekaran, “Heart disease prediction using machine learning and deep learning,” Int. J. Data Min. Knowl. Manag. Process (IJDKP), vol. 13, no. 3, 2023.[CrossRef]