Intelligent Detection of Injection Attacks via SQL Based on Supervised Machine Learning Models for Enhancing Web Security

Table 1.

Summary of studies on the detection of SQL injection attacks using machine learning

Author | Proposed Work | Dataset | Key Findings | Challenges/Gaps
Hasan, Balbahaith, and Tarique (2019) | Developed a heuristic ML-based algorithm and GUI application using the top 5 of 23 classifiers | 616 SQL statements | Achieved 93.8% accuracy in detecting SQLi attacks | Small dataset size; scalability to real-world scenarios not validated
Noor et al. (2019) | Proposed a semantic ML-based approach that connects risks and TTPs via probabilistic networks | TTP taxonomy dataset (133 TTPs, 45 threat families) | Detected threats with 92% accuracy; low false positives; 0.15 s average detection time | Specific to TTP-based threats; generalization to SQLi-specific detection not tested
Zhang (2019) | Designed ML classifiers (CNN, MLP) to detect SQLi vulnerabilities in PHP code using code-level features | PHP source code files | CNN achieved 95.4% precision; MLP achieved 63.7% recall and an F-measure of 0.746 | Limited to PHP; performance varies across classifiers
Ul Islam et al. (2019) | Created a supervised learning tool for NoSQL injection detection with a novel dataset | Custom-designed NoSQL injection dataset | Achieved an F2-score of 0.93; outperformed Sqreen by 36.25%; database-agnostic | Limited availability of NoSQL datasets; manual feature engineering required
McWhirter et al. (2018) | Used a Gap-Weighted String Subsequence Kernel with an SVM to classify SQL query strings | Amnesia testbed datasets | Achieved 97.07% (Select) and 92.48% (Insert) accuracy; adapted to unseen threats | Lower accuracy with unsanitized quotation marks; sensitive to input anomalies
Chattopadhyay et al. (2018) | Examined the difficulties in implementing ML methods for identifying malware | Multiple datasets (unspecified) | Compared various ML techniques across datasets; summarized performance on different metrics; identified optimal techniques for evolving patterns | Dataset specifics unclear; difficulty defining and generalizing ML approaches to dynamic, real-world intrusion patterns; scalability concerns
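
The studies in Table 1 report different metrics (accuracy, precision, recall, F-measure, F2-score), so their figures are not directly comparable. As a reading aid, the following minimal sketch shows how precision, recall, and the general F-beta score are computed from confusion-matrix counts; F1 (the usual F-measure) uses beta = 1, and the F2-score uses beta = 2, which weights recall more heavily. The function names and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the F1 score; beta = 2 gives the F2 score,
    which weights recall more heavily than precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denom


def scores_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts.

    tp: attacks correctly flagged, fp: benign queries flagged,
    fn: attacks missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_beta(precision, recall, beta)


# Hypothetical counts for illustration only (not from Table 1).
p, r, f1 = scores_from_counts(tp=90, fp=10, fn=30, beta=1.0)
_, _, f2 = scores_from_counts(tp=90, fp=10, fn=30, beta=2.0)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} F2={f2:.3f}")
```

With these illustrative counts, recall is lower than precision, so the F2 value falls below F1. A detector that misses attacks is penalized more under F2, which is one reason an injection-detection study might prefer to report it.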