Journal of Artificial Intelligence and Big Data


An Analysis of Crime Prediction and Classification Using Data Mining Techniques

Table 1.

Summary of Crime Prediction and Classification Using Data Mining


Authors	Methods	Dataset	Key Findings	Limitations and Future Work

Almaw & Kadam (2018)	Random Tree, 1-ensemble, 3-ensemble, Statistical Analysis	Crime Dataset	Random Tree: 82.02% accuracy	More ensemble techniques are needed, and crime trend analysis should be extended.
Feng et al.	Stateful LSTM with Keras and Prophet Model	Crime data (3 years training)	Compared to conventional neural network models, the Prophet model and Keras LSTM produced superior prediction results, which helped law enforcement allocate resources.	Further optimization of training dataset sizes and exploration of hybrid deep learning methods.
Crimes prediction using spatiotemporal data and kernel density estimation	Gradient Boosting Machine (GBM)	Spatiotemporal and zoning datasets	Kernel density estimation (KDE) with zoning district characteristics and smoothing improves model performance; achieved a multiclass logarithmic loss of 2.356104 on validation and 2.35443 on test sets.	Expand to real-time prediction applications and evaluate generalizability across various cities and regions.
Kim et al.	Boosted Decision Tree and K-Nearest Neighbour (KNN)	Vancouver crime data (15 years)	The prediction accuracy of the KNN and Boosted Decision Tree models varied between 39% and 44%.	Improve accuracy through advanced preprocessing, feature engineering, and incorporating contextual external data.
Almaw and Kadam	Naive Bayes, J48, and Random Tree	Experimental dataset	Random Tree outperformed the others with 82.0227% accuracy. Ensemble models achieved 81.6073% (1-ensemble) and 79.2353% (3-ensemble).	Limited focus on computational efficiency; ensemble models with novel classifiers remain to be explored.
Sivaranjani, Sivakumari, and Aasha	K-Means and K-Nearest Neighbor (KNN)	Crime data visualized on Google Maps	K-Means clustering visualized with Google Maps enhances usability; KNN was used for prediction and evaluated using precision, recall, and F-measure.	Spatial accuracy needs refinement, and more advanced algorithms for geospatial clustering and prediction should be investigated.
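
The studies in Table 1 report results using different metrics (accuracy, multiclass logarithmic loss, and precision, recall, and F-measure), which makes direct comparison difficult. The following is a minimal sketch of how these metrics are commonly computed, not the implementation used by any of the cited authors. The function names, the crime labels, and the toy predictions are hypothetical and serve only as illustration.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    probs is a list of dicts mapping class label -> predicted probability.
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p_true = min(max(p.get(label, 0.0), eps), 1 - eps)
        total -= math.log(p_true)
    return total / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: four incidents, three crime categories.
y_true = ["theft", "assault", "theft", "burglary"]
y_pred = ["theft", "theft", "theft", "burglary"]

# A classifier that assigns equal probability to every class.
classes = ["theft", "assault", "burglary"]
uniform = [{c: 1 / len(classes) for c in classes} for _ in y_true]

acc = accuracy(y_true, y_pred)
loss = multiclass_log_loss(y_true, uniform)
prec, rec, f1 = precision_recall_f1(y_true, y_pred, positive="theft")
```

A uniform guess over K classes yields a log loss of ln K, which gives a simple baseline against which a reported log loss can be read once the number of crime categories in the dataset is known. Accuracy and F-measure, by contrast, depend only on the predicted labels and not on the predicted probabilities.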