Open-Source Datasets for Recommender Systems Analysis
Table 2. Datasets – specification and availability.
| Dataset | Specification | Availability |
| MovieLens | Collection of movie ratings with 27000 movies & 140000 users | https://grouplens.org/datasets/movielens/ |
| Jester | Joke rating dataset with 6 million ratings of 150 jokes | http://eigentaste.berkeley.edu/ |
| Book-Crossings | Book rating dataset with 1.1 M ratings (90000 users & 270000 books) | http://www2.informatik.uni-freiburg.de/~cziegler/BX/ |
| Last.fm | Music recommendations dataset with aggregated data. | https://grouplens.org/datasets/hetrec-2011/ |
| Wikipedia | Collaborative encyclopedia used for general applications. | https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia |
| OpenStreetMap | Collaborative project for maps. | https://planet.openstreetmap.org/planet/full-history/ |
| Python Git Repositories | Git repositories of Python code. | https://github.com/python |
| MovieLens 25M | Collection of movie ratings with 62423 movies & 25000095 ratings. | https://grouplens.org/datasets/movielens/25m/ |
| Social Network Influencer | Pairwise preference learning task (predicting which of two users is more influential). | https://www.kaggle.com/c/predict-who-is-more-influential-in-a-social-network/data |
| Million Song | Audio features for music tracks. | http://millionsongdataset.com/ |
| Free Music Archive | Audio downloads for music analysis. | https://github.com/mdeff/fma |
| Netflix Prize | Movie ratings used in the Netflix Prize competition. | https://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a |
| Amazon Review | Collection of Amazon product reviews. | https://nijianmo.github.io/amazon/index.html |
| Yahoo! Music User Ratings | Collection of user preferences for musical artists. | https://webscope.sandbox.yahoo.com/catalog.php?datatype=r |
| Steam Video Games | Collection of user behaviors on Steam. | https://www.kaggle.com/datasets/tamber/steam-video-games |
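
Most rating datasets in Table 2 can be checked against their stated specification by counting users, items, and ratings after download. The sketch below is one possible way to do this in Python, assuming a CSV file with the columns `userId,movieId,rating,timestamp` (the layout of the MovieLens `ratings.csv` file). The helper name `summarize_ratings` and the sample rows are made up for illustration and are not taken from any dataset.

```python
import csv
import io

# Made-up sample rows in the assumed ratings.csv layout
# (userId,movieId,rating,timestamp); these are not real records.
SAMPLE = """userId,movieId,rating,timestamp
1,10,4.0,1000000000
1,20,3.5,1000000001
2,10,5.0,1000000002
3,30,2.0,1000000003
"""


def summarize_ratings(fileobj):
    """Count distinct users and items, total ratings, and matrix density."""
    users, items, n_ratings = set(), set(), 0
    for row in csv.DictReader(fileobj):
        users.add(row["userId"])
        items.add(row["movieId"])
        n_ratings += 1
    # Density = observed ratings / all possible user-item pairs.
    cells = len(users) * len(items)
    density = n_ratings / cells if cells else 0.0
    return {
        "users": len(users),
        "items": len(items),
        "ratings": n_ratings,
        "density": density,
    }


summary = summarize_ratings(io.StringIO(SAMPLE))
print(summary)
```

For a downloaded file, the same function could be called on an open file handle, for example `open("ml-25m/ratings.csv", newline="")`, where the path is an assumption about where the archive was extracted. Other datasets in the table use different column names and delimiters, so the field names would need adjusting.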