﻿<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.2 20190208//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article
    xmlns:mml="http://www.w3.org/1998/Math/MathML"
    xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="mini-review">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJMEBAC</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Mathematical, Engineering, Biological and Applied Computing</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2832-5273</issn>
      <publisher>
        <publisher-name>Science Publications</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.31586/ijmebac.2022.350</article-id>
      <article-id pub-id-type="publisher-id">IJMEBAC-350</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Mini Review</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>
          Open-Source Datasets for Recommender Systems Analysis
        </article-title>
      </title-group>
      <contrib-group>
<contrib contrib-type="author">
<name>
<surname>Marappan</surname>
<given-names>Raja</given-names>
</name>
<xref rid="af1" ref-type="aff">1</xref>
<xref rid="c1" ref-type="corresp">*</xref>
</contrib>
      </contrib-group>
<aff id="af1"><label>1</label>School of Computing, SASTRA Deemed University, Thanjavur, India</aff>
<author-notes>
<corresp id="c1">
<label>*</label>Corresponding author at: School of Computing, SASTRA Deemed University, Thanjavur, India
</corresp>
</author-notes>
      <pub-date pub-type="epub">
        <day>27</day>
        <month>06</month>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <issue>2</issue>
      <history>
        <date date-type="received">
          <day>27</day>
          <month>06</month>
          <year>2022</year>
        </date>
        <date date-type="rev-recd">
          <day>27</day>
          <month>06</month>
          <year>2022</year>
        </date>
        <date date-type="accepted">
          <day>27</day>
          <month>06</month>
          <year>2022</year>
        </date>
        <date date-type="pub">
          <day>27</day>
          <month>06</month>
          <year>2022</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>&#xa9; Copyright 2022 by authors and Trend Research Publishing Inc. </copyright-statement>
        <copyright-year>2022</copyright-year>
        <license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p>
        </license>
      </permissions>
      <abstract>
        Both traditional and nontraditional datasets are available for evaluating the performance of recommender systems. This article surveys open-source datasets used in recommender systems research and compares them by number of users, items, and ratings, density, and rating scale.
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Traditional Datasets</kwd>
        <kwd>Nontraditional Datasets</kwd>
        <kwd>Systems Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec1">
<title>Introduction</title><p>Recommender systems are defined in terms of three concepts: items, users, and ratings, as summarized in Table <xref ref-type="table" rid="tab1">1</xref> [<xref ref-type="bibr" rid="R1">1</xref>,<xref ref-type="bibr" rid="R2">2</xref>,<xref ref-type="bibr" rid="R3">3</xref>,<xref ref-type="bibr" rid="R4">4</xref>,<xref ref-type="bibr" rid="R5">5</xref>].</p>
</sec><sec id="sec2">
<title>Datasets</title><p>This section surveys datasets used to evaluate recommender systems. Their specifications and availability are listed in Table <xref ref-type="table" rid="tab2">2</xref> [<xref ref-type="bibr" rid="R6">6</xref>,<xref ref-type="bibr" rid="R7">7</xref>,<xref ref-type="bibr" rid="R8">8</xref>,<xref ref-type="bibr" rid="R9">9</xref>].</p>
<p>Table <xref ref-type="table" rid="tab3">3</xref> compares the datasets by number of items, users, and ratings, density, and rating scale [<xref ref-type="bibr" rid="R10">10</xref>,<xref ref-type="bibr" rid="R11">11</xref>,<xref ref-type="bibr" rid="R12">12</xref>,<xref ref-type="bibr" rid="R13">13</xref>,<xref ref-type="bibr" rid="R14">14</xref>,<xref ref-type="bibr" rid="R15">15</xref>].</p>
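<p>The density values in Table <xref ref-type="table" rid="tab3">3</xref> appear consistent with the common definition of rating-matrix density: the number of observed ratings divided by the number of possible user-item pairs (users times items). The following Python sketch assumes that definition, which the article does not state explicitly. The function name <monospace>density_percent</monospace> is hypothetical, and the input counts are copied from Table <xref ref-type="table" rid="tab3">3</xref>.</p>
<preformat preformat-type="code">
```python
def density_percent(n_users, n_items, n_ratings):
    """Return rating-matrix density as a percentage.

    Assumes density = ratings / (users * items); this definition is an
    assumption and is not stated in the article itself.
    """
    if n_users == 0 or n_items == 0:
        raise ValueError("n_users and n_items must be nonzero")
    return 100.0 * n_ratings / (n_users * n_items)


# (users, items, ratings) copied from Table 3
TABLE3 = {
    "MovieLens 1M": (6040, 3883, 1000209),
    "Jester": (124113, 150, 5865235),
    "Last.fm": (1892, 17632, 92834),
}

for name, (users, items, ratings) in TABLE3.items():
    print(f"{name}: {density_percent(users, items, ratings):.2f}%")
```
</preformat>
<p>Under this assumption the three rows evaluate to 4.26%, 31.50%, and 0.28%, which agree with the density column of Table <xref ref-type="table" rid="tab3">3</xref>. A low density means that most user-item pairs have no recorded rating.</p>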
</sec><sec id="sec3">
<title>Conclusions and Future Work</title><p>This article surveyed datasets used to evaluate recommender systems and compared them by number of users, items, and ratings, density, and rating scale. In future work, recommender systems can be developed using soft computing models [<xref ref-type="bibr" rid="R16">16</xref>,<xref ref-type="bibr" rid="R17">17</xref>,<xref ref-type="bibr" rid="R18">18</xref>,<xref ref-type="bibr" rid="R19">19</xref>,<xref ref-type="bibr" rid="R20">20</xref>].</p>
<table-wrap id="tab1">
<label>Table 1</label>
<caption>
<p>Terms for recommendation systems</p>
</caption>
<table>
<tr><td><p><b>Term</b></p></td><td><p><b>Definition</b></p></td></tr>
<tr><td><p>Item</p></td><td><p>The entity to be recommended, for example information, movies, or products.</p></td></tr>
<tr><td><p>User</p></td><td><p>A user rates items and receives recommendations for new items.</p></td></tr>
<tr><td><p>Rating</p></td><td><p>An expression of the user's preference for an item, for example like or dislike, 1 to 5 stars, or an integer or floating-point value.</p></td></tr>
</table>
</table-wrap>
<table-wrap id="tab2">
<label>Table 2</label>
<caption>
<p>Datasets &#x02013; specification and availability.</p>
</caption>
<table>
<tr><td><p><b>Dataset</b></p></td><td><p><b>Specification</b></p></td><td><p><b>Availability</b></p></td></tr>
<tr><td><p>MovieLens</p></td><td><p>Collection of movie ratings with 27000 movies and 140000 users.</p></td><td><p>https://grouplens.org/datasets/movielens/</p></td></tr>
<tr><td><p>Jester</p></td><td><p>Joke rating dataset with 6 million ratings of 150 jokes.</p></td><td><p>http://eigentaste.berkeley.edu/</p></td></tr>
<tr><td><p>Book-Crossing</p></td><td><p>Book rating dataset with 1.1 million ratings (90000 users and 270000 books).</p></td><td><p>http://www2.informatik.uni-freiburg.de/~cziegler/BX/</p></td></tr>
<tr><td><p>Last.fm</p></td><td><p>Music recommendation dataset with aggregated data.</p></td><td><p>https://grouplens.org/datasets/hetrec-2011/</p></td></tr>
<tr><td><p>Wikipedia</p></td><td><p>Collaborative encyclopedia used for general applications.</p></td><td><p>https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia</p></td></tr>
<tr><td><p>OpenStreetMap</p></td><td><p>Collaborative mapping project.</p></td><td><p>https://planet.openstreetmap.org/planet/full-history/</p></td></tr>
<tr><td><p>Python Git Repositories</p></td><td><p>Git repositories of Python code.</p></td><td><p>https://github.com/python</p></td></tr>
<tr><td><p>MovieLens 25M</p></td><td><p>Collection of movie ratings with 62423 movies and 25000095 ratings.</p></td><td><p>https://grouplens.org/datasets/movielens/25m/</p></td></tr>
<tr><td><p>Social Network Influencer</p></td><td><p>Learning task preferences.</p></td><td><p>https://www.kaggle.com/c/predict-who-is-more-influential-in-a-social-network/data</p></td></tr>
<tr><td><p>Million Song</p></td><td><p>Audio features for music tracks.</p></td><td><p>http://millionsongdataset.com/</p></td></tr>
<tr><td><p>Free Music Archive</p></td><td><p>Audio downloads for music analysis.</p></td><td><p>https://github.com/mdeff/fma</p></td></tr>
<tr><td><p>Netflix Prize</p></td><td><p>Dataset used in the Netflix Prize competition.</p></td><td><p>https://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a</p></td></tr>
<tr><td><p>Amazon Review</p></td><td><p>Collection of Amazon reviews.</p></td><td><p>https://nijianmo.github.io/amazon/index.html</p></td></tr>
<tr><td><p>Yahoo! Music User Ratings</p></td><td><p>Collection of user preferences for musical artists.</p></td><td><p>https://webscope.sandbox.yahoo.com/catalog.php?datatype=r</p></td></tr>
<tr><td><p>Steam Video Games</p></td><td><p>Collection of user behaviors.</p></td><td><p>https://www.kaggle.com/datasets/tamber/steam-video-games</p></td></tr>
</table>
</table-wrap>
<table-wrap id="tab3">
<label>Table 3</label>
<caption>
<p>Comparison of datasets by items, users, density, ratings, and rating scale</p>
</caption>
<table>
<tr><td><p><b>Dataset</b></p></td><td><p><b>Items</b></p></td><td><p><b>Users</b></p></td><td><p><b>Density</b></p></td><td><p><b>Ratings</b></p></td><td><p><b>Rating Scale</b></p></td></tr>
<tr><td><p>Book-Crossing</p></td><td><p>271379</p></td><td><p>92107</p></td><td><p>0.0041%</p></td><td><p>1031175</p></td><td><p>[1, 10], and implicit</p></td></tr>
<tr><td><p>Wikipedia</p></td><td><p>4936761</p></td><td><p>5583724</p></td><td><p>0.0015%</p></td><td><p>417996366</p></td><td><p>Interactions</p></td></tr>
<tr><td><p>Git</p></td><td><p>1757</p></td><td><p>790</p></td><td><p>0.95%</p></td><td><p>13165</p></td><td><p>Interactions</p></td></tr>
<tr><td><p>Jester</p></td><td><p>150</p></td><td><p>124113</p></td><td><p>31.50%</p></td><td><p>5865235</p></td><td><p>[-10, 10]</p></td></tr>
<tr><td><p>Last.fm</p></td><td><p>17632</p></td><td><p>1892</p></td><td><p>0.28%</p></td><td><p>92834</p></td><td><p>Play Counts</p></td></tr>
<tr><td><p>OpenStreetMap</p></td><td><p>108330</p></td><td><p>231</p></td><td><p>0.82%</p></td><td><p>205774</p></td><td><p>Interactions</p></td></tr>
<tr><td><p>MovieLens 1M</p></td><td><p>3883</p></td><td><p>6040</p></td><td><p>4.26%</p></td><td><p>1000209</p></td><td><p>[1, 5]</p></td></tr>
<tr><td><p>MovieLens 10M</p></td><td><p>10681</p></td><td><p>69878</p></td><td><p>1.33%</p></td><td><p>10000054</p></td><td><p>[0.5, 5]</p></td></tr>
<tr><td><p>MovieLens 20M</p></td><td><p>27278</p></td><td><p>138493</p></td><td><p>0.52%</p></td><td><p>20000263</p></td><td><p>[0.5, 5]</p></td></tr>
</table>
</table-wrap>
</sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      
<ref id="R1">
<label>[1]</label>
<mixed-citation publication-type="other">G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. (2005), 10.1109/TKDE.2005.99
</mixed-citation>
</ref>
<ref id="R2">
<label>[2]</label>
<mixed-citation publication-type="other">J. Chen, X. Wang, S. Zhao, F. Qian, Y. Zhang. Deep attention user-based collaborative filtering for recommendation. Neurocomputing, 383 (2020), 10.1016/j.neucom.2019.09.050
</mixed-citation>
</ref>
<ref id="R3">
<label>[3]</label>
<mixed-citation publication-type="other">A. Da'u, N. Salim, I. Rabiu, A. Osman. Recommendation system exploiting aspect-based opinion mining with deep learning method. Inf. Sci., 512 (2020), 10.1016/j.ins.2019.10.038
</mixed-citation>
</ref>
<ref id="R4">
<label>[4]</label>
<mixed-citation publication-type="other">Lu J., Zhang Q., Zhang G. Recommender Systems: Advanced Developments. World Scientific (2020)
</mixed-citation>
</ref>
<ref id="R5">
<label>[5]</label>
<mixed-citation publication-type="other">Zhang S., Yao L., Sun A., Tay Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. (2019)
</mixed-citation>
</ref>
<ref id="R6">
<label>[6]</label>
<mixed-citation publication-type="other">Zhongying Zhao, Xuejian Zhang, Hui Zhou, Chao Li, Maoguo Gong, Yongqing Wang. Hetnerec: heterogeneous network embedding based recommendation. Knowl. Base Syst., 204 (2020), Article 106218
</mixed-citation>
</ref>
<ref id="R7">
<label>[7]</label>
<mixed-citation publication-type="other">Liao W., Zhang Q., Yuan B., Zhang G., Lu J. Heterogeneous multidomain recommender system through adversarial learning. IEEE Trans. Neural Netw. Learn. Syst. (2022)
</mixed-citation>
</ref>
<ref id="R8">
<label>[8]</label>
<mixed-citation publication-type="other">Zhang Q., Liao W., Zhang G., Yuan B., Lu J. A deep dual adversarial network for cross-domain recommendation. IEEE Trans. Knowl. Data Eng. (2021)
</mixed-citation>
</ref>
<ref id="R9">
<label>[9]</label>
<mixed-citation publication-type="other">Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong, Qing He. A Survey on Knowledge Graph-Based Recommender Systems. IEEE Transactions on Knowledge and Data Engineering (2020)
</mixed-citation>
</ref>
<ref id="R10">
<label>[10]</label>
<mixed-citation publication-type="other">Zhang Y., Chen X. Explainable recommendation: A survey and new perspectives (2020). arXiv preprint arXiv:1804.11192
</mixed-citation>
</ref>
<ref id="R11">
<label>[11]</label>
<mixed-citation publication-type="other">Bhaskaran, S.; Marappan, R.; Santhi, B. Design and Comparative Analysis of New Personalized Recommender Algorithms with Specific Features for Large Scale Datasets. Mathematics 2020, 8, 1106. https://doi.org/10.3390/math8071106
</mixed-citation>
</ref>
<ref id="R12">
<label>[12]</label>
<mixed-citation publication-type="other">Bhaskaran, S.; Marappan, R.; Santhi, B. Design and Analysis of a Cluster-Based Intelligent Hybrid Recommendation System for E-Learning Applications. Mathematics 2021, 9, 197. https://doi.org/10.3390/math9020197
</mixed-citation>
</ref>
<ref id="R13">
<label>[13]</label>
<mixed-citation publication-type="other">Marappan, R. (2022). Classification and Analysis of Recommender Systems. International Journal of Mathematical, Engineering, Biological and Applied Computing, 1(1), 17-21. DOI: 10.31586/ijmebac.2022.331
</mixed-citation>
</ref>
<ref id="R14">
<label>[14]</label>
<mixed-citation publication-type="other">Marappan, R., &#x00026; Bhaskaran, S. (2022). Movie Recommendation System Modeling Using Machine Learning. International Journal of Mathematical, Engineering, Biological and Applied Computing 2022, 1(1), 12-16. DOI: 10.31586/ijmebac.2022.291
</mixed-citation>
</ref>
<ref id="R15">
<label>[15]</label>
<mixed-citation publication-type="other">Marappan, R., &#x00026; Bhaskaran, S. (2022). Analysis of Network Modeling for Real-world Recommender Systems. Interna-tional Journal of Mathematical, Engineering, Biological and Applied Computing, 1(1), 1-7. DOI: 10.31586/ijmebac.2022.283
</mixed-citation>
</ref>
<ref id="R16">
<label>[16]</label>
<mixed-citation publication-type="other">Marappan, R.; Sethumadhavan, G. Complexity Analysis and Stochastic Convergence of Some Well-known Evolutionary Operators for Solving Graph Coloring Problem. Mathematics 2020, 8, 303. https://doi.org/10.3390/math8030303
</mixed-citation>
</ref>
<ref id="R17">
<label>[17]</label>
<mixed-citation publication-type="other">Marappan, R., Sethumadhavan, G. Solution to Graph Coloring Using Genetic and Tabu Search Procedures. Arab J Sci Eng 43, 525-542 (2018). https://doi.org/10.1007/s13369-017-2686-9
</mixed-citation>
</ref>
<ref id="R18">
<label>[18]</label>
<mixed-citation publication-type="other">Raja Marappan: A New Multi-Objective Optimization in Solving Graph Coloring and Wireless Networks Channels Allocation Problems. Int. J. Advanced Networking and Applications Volume: 13 Issue: 02 Pages: 4891-4895 (2021)
</mixed-citation>
</ref>
<ref id="R19">
<label>[19]</label>
<mixed-citation publication-type="other">R. Marappan and G. Sethumadhavan, "A New Genetic Algorithm for Graph Coloring," 2013 Fifth International Conference on Computational Intelligence, Modelling and Simulation, 2013, pp. 49-54, doi: 10.1109/CIMSim.2013.17.
</mixed-citation>
</ref>
<ref id="R20">
<label>[20]</label>
<mixed-citation publication-type="other">Raja Marappan, S. Bhaskaran. (2022). Analysis of Recent Trends in E-Learning Personalization Techniques. The Educational Review, USA, 6(5), 167-170. DOI: http://dx.doi.org/10.26855/er.2022.05.003
</mixed-citation>
</ref>
    </ref-list>
  </back>
</article>