
Open-Source Datasets for Recommender Systems Analysis

International Journal of Mathematical, Engineering, Biological and Applied Computing | Vol 1, Issue 2

Table 2. Datasets – specification and availability.

Dataset

Specification

Availability

MovieLens

Collection of movie ratings covering about 27,000 movies and 140,000 users.

https://grouplens.org/datasets/movielens/

Jester

Joke rating dataset with 6 million ratings of 150 jokes.

http://eigentaste.berkeley.edu/

Book-Crossing

Book rating dataset with 1.1 million ratings from 90,000 users on 270,000 books.

http://www2.informatik.uni-freiburg.de/~cziegler/BX/

Last.fm

Music recommendation dataset (HetRec 2011 release) with aggregated user-artist listening counts.

https://grouplens.org/datasets/hetrec-2011/

Wikipedia

Database dumps of the collaborative encyclopedia, suitable for general-purpose applications.

https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia

OpenStreetMap

Full edit history of the collaborative mapping project.

https://planet.openstreetmap.org/planet/full-history/

Python Git Repositories

Git repositories containing Python code.

https://github.com/python

MovieLens 25M

Collection of movie ratings with 25,000,095 ratings of 62,423 movies.

https://grouplens.org/datasets/movielens/25m/

Social Network Influencer

Preference-learning task: predict which of two users in a social network is more influential.

https://www.kaggle.com/c/predict-who-is-more-influential-in-a-social-network/data

Million Song

Audio features and metadata for one million popular music tracks.

http://millionsongdataset.com/

Free Music Archive

Downloadable audio and metadata for music analysis.

https://github.com/mdeff/fma

Netflix Prize

Movie ratings released for the Netflix Prize competition.

https://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a

Amazon Review

Collection of Amazon product reviews with ratings, review text, and product metadata.

https://nijianmo.github.io/amazon/index.html

Yahoo! Music User Ratings

User ratings of musical artists.

https://webscope.sandbox.yahoo.com/catalog.php?datatype=r

Steam Video Games

Steam user behaviors: game purchases and hours played.

https://www.kaggle.com/datasets/tamber/steam-video-games
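A common first check on the rating datasets in Table 2 is how sparse the user-item matrix is, where density = ratings / (users × items). The Python below is a minimal sketch, not taken from the article. It computes that density from the rounded Book-Crossing figures in the table. It also loads a ratings file into a sparse user → {item: rating} mapping, assuming the `ratings.csv` layout with a `userId,movieId,rating,timestamp` header used by the MovieLens 20M/25M releases. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

# Hypothetical helper names; nothing here comes from the original article.
Ratings = Dict[int, Dict[int, float]]


def density(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of user-item cells that hold an observed rating."""
    return n_ratings / (n_users * n_items)


def load_ratings(f: TextIO) -> Ratings:
    """Parse a CSV with a userId,movieId,rating,timestamp header
    (assumed MovieLens 20M/25M layout) into user -> {item: rating}."""
    ratings: Ratings = defaultdict(dict)
    for row in csv.DictReader(f):
        ratings[int(row["userId"])][int(row["movieId"])] = float(row["rating"])
    return dict(ratings)


def observed_density(ratings: Ratings) -> float:
    """Density over only the users and items that appear in the file."""
    items = {item for user_items in ratings.values() for item in user_items}
    n_ratings = sum(len(user_items) for user_items in ratings.values())
    return density(n_ratings, len(ratings), len(items))


if __name__ == "__main__":
    # Rounded Book-Crossing figures from Table 2: 1.1 M ratings, 90,000 users, 270,000 books.
    print(f"Book-Crossing density: {density(1_100_000, 90_000, 270_000):.4%}")

    # Hypothetical rows in the assumed ratings.csv layout; with a real
    # download, pass open("ratings.csv", newline="") instead.
    sample = io.StringIO(
        "userId,movieId,rating,timestamp\n"
        "1,296,5.0,1147880044\n"
        "1,306,3.5,1147868817\n"
        "2,296,4.0,1141415820\n"
    )
    ratings = load_ratings(sample)
    print(len(ratings), "users, density", observed_density(ratings))
```

With the rounded table figures, fewer than 5 in every 100,000 user-book cells of Book-Crossing hold a rating. A dict of dicts keeps memory proportional to the number of observed ratings; for the full 25-million-row MovieLens file, a dedicated sparse-matrix library is likely the more practical choice.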