
MovieLens Dataset

The MovieLens dataset was collected by GroupLens Research. It contains user information, movie information, and movie ratings on a scale of 1 to 5. The dataset is published in several versions that differ in size. We use the MovieLens 1M Dataset as a demo dataset; it contains 1 million ratings from about 6,000 users on about 4,000 movies and was released in 2/2003.

Dataset Features

The ml-1m dataset provides several features. Its data files (those with the ".dat" extension) are essentially CSV files that use "::" as the delimiter. The descriptions below are quoted from the dataset's README.
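As a minimal sketch of what reading such a file can look like: Python's `csv` module accepts only a one-character delimiter, so the example below splits each line on "::" directly. The helper name `split_records` and the sample lines are illustrative assumptions, not part of the dataset or its README.

```python
from typing import Iterable, Iterator, List

DELIMITER = "::"  # multi-character separator used by the ".dat" files


def split_records(lines: Iterable[str], delimiter: str = DELIMITER) -> Iterator[List[str]]:
    """Yield the list of string fields for every non-empty line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield line.split(delimiter)


# Illustrative lines, not quoted from the dataset.
sample_lines = ["1::1193::5::978300760\n", "2::661::3::978302109\n"]
rows = list(split_records(sample_lines))
print(rows)
```

Because an open text file iterates over its lines, the same helper can be handed a file object instead of the sample list.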


All ratings are contained in the file "ratings.dat" and are in the following format:

    UserID::MovieID::Rating::Timestamp

  • UserIDs range between 1 and 6040
  • MovieIDs range between 1 and 3952
  • Ratings are made on a 5-star scale (whole-star ratings only)
  • Timestamp is represented in seconds since the epoch as returned by time(2)
  • Each user has at least 20 ratings
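
The constraints above can be sketched as a small parser. It assumes each line holds four fields in the order UserID, MovieID, Rating, Timestamp; the `Rating` type, the `parse_rating` name, and the sample line are hypothetical and serve only as an illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    rated_at: datetime


def parse_rating(line: str) -> Rating:
    """Parse one "::"-delimited rating line and check the documented ranges."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    record = Rating(
        int(user_id),
        int(movie_id),
        int(rating),
        # Seconds since the epoch, interpreted here as UTC.
        datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
    )
    if not 1 <= record.user_id <= 6040:
        raise ValueError(f"UserID out of range: {record.user_id}")
    if not 1 <= record.movie_id <= 3952:
        raise ValueError(f"MovieID out of range: {record.movie_id}")
    if not 1 <= record.rating <= 5:
        raise ValueError(f"Rating out of range: {record.rating}")
    return record


# Illustrative line, assumed to follow the field order described above.
example = parse_rating("1::1193::5::978300760")
print(example.rated_at.isoformat())  # 2000-12-31T22:12:40+00:00
```

Rejecting out-of-range values is a design choice of this sketch; a loader that prefers to keep going could log and skip such lines instead.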


User information is in the file "users.dat" and is in the following format:

    UserID::Gender::Age::Occupation::Zip-code

All demographic information is provided voluntarily by the users and is not checked for accuracy. Only users who have provided some demographic information are included in this data set.

  • Gender is denoted by an "M" for male and an "F" for female

  • Age is chosen from the following ranges:

    • 1: "Under 18"
    • 18: "18-24"
    • 25: "25-34"
    • 35: "35-44"
    • 45: "45-49"
    • 50: "50-55"
    • 56: "56+"
  • Occupation is chosen from the following choices:

    • 0: "other" or not specified
    • 1: "academic/educator"
    • 2: "artist"
    • 3: "clerical/admin"
    • 4: "college/grad student"
    • 5: "customer service"
    • 6: "doctor/health care"
    • 7: "executive/managerial"
    • 8: "farmer"
    • 9: "homemaker"
    • 10: "K-12 student"
    • 11: "lawyer"
    • 12: "programmer"
    • 13: "retired"
    • 14: "sales/marketing"
    • 15: "scientist"
    • 16: "self-employed"
    • 17: "technician/engineer"
    • 18: "tradesman/craftsman"
    • 19: "unemployed"
    • 20: "writer"
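
The age and occupation codes above lend themselves to lookup tables. The following is a minimal sketch that assumes each "users.dat" line holds five fields in the order UserID, Gender, Age, Occupation, Zip-code; the names `AGE_GROUPS`, `OCCUPATIONS`, and `parse_user`, as well as the sample line, are hypothetical.

```python
from typing import Dict, Union

GENDERS = {"M": "male", "F": "female"}

AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

OCCUPATIONS = {
    0: "other or not specified", 1: "academic/educator", 2: "artist",
    3: "clerical/admin", 4: "college/grad student", 5: "customer service",
    6: "doctor/health care", 7: "executive/managerial", 8: "farmer",
    9: "homemaker", 10: "K-12 student", 11: "lawyer", 12: "programmer",
    13: "retired", 14: "sales/marketing", 15: "scientist",
    16: "self-employed", 17: "technician/engineer",
    18: "tradesman/craftsman", 19: "unemployed", 20: "writer",
}


def parse_user(line: str) -> Dict[str, Union[int, str]]:
    """Decode one user line into readable labels (field order is an assumption)."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": GENDERS[gender],
        "age_group": AGE_GROUPS[int(age)],          # the code is the lower bound of the range
        "occupation": OCCUPATIONS[int(occupation)],
        "zip_code": zip_code,                       # kept as text to preserve leading zeros
    }


# Illustrative line, not quoted from the dataset.
user = parse_user("1::F::1::10::48067")
print(user)
```

An unknown code raises `KeyError` here, which makes unexpected values visible early; a more tolerant loader could fall back to a default label.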


Movie information is in the file "movies.dat" and is in the following format:

    MovieID::Title::Genres

  • Titles are identical to titles provided by the IMDB (including year of release)

  • Genres are pipe-separated and are selected from the following genres:

    • Action
    • Adventure
    • Animation
    • Children's
    • Comedy
    • Crime
    • Documentary
    • Drama
    • Fantasy
    • Film-Noir
    • Horror
    • Musical
    • Mystery
    • Romance
    • Sci-Fi
    • Thriller
    • War
    • Western
  • Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries

  • Movies are mostly entered by hand, so errors and inconsistencies may exist
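
The notes above suggest a parser that is tolerant of gaps and hand-entered errors. The sketch below assumes each "movies.dat" line holds three fields in the order MovieID, Title, Genres, with the release year in parentheses at the end of the title. The names `Movie`, `parse_movie`, and `unknown_genres`, and the sample lines, are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

GENRES = {
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
}

# Title followed by a four-digit year in parentheses, e.g. "Toy Story (1995)".
TITLE_WITH_YEAR = re.compile(r"(.*)\s+\((\d{4})\)")


class Movie(NamedTuple):
    movie_id: int
    title: str
    year: Optional[int]
    genres: List[str]


def parse_movie(line: str) -> Movie:
    """Parse one movie line; the year is None when the title has no "(YYYY)" suffix."""
    movie_id, raw_title, raw_genres = line.strip().split("::")
    match = TITLE_WITH_YEAR.fullmatch(raw_title)
    if match:
        title, year = match.group(1), int(match.group(2))
    else:
        title, year = raw_title, None
    return Movie(int(movie_id), title, year, raw_genres.split("|"))


def unknown_genres(movie: Movie) -> List[str]:
    """Return genre labels that are not in the documented list."""
    return [genre for genre in movie.genres if genre not in GENRES]


# Illustrative lines, not quoted from the dataset.
sample = ["1::Toy Story (1995)::Animation|Children's|Comedy", "3::Untitled::Drama|Sci Fi"]
movies = {movie.movie_id: movie for movie in map(parse_movie, sample)}

print(movies[1])
print(movies.get(2))               # None: not every MovieID maps to a movie
print(unknown_genres(movies[3]))   # ['Sci Fi']
```

Keying the movies by ID and looking them up with `dict.get` avoids assuming that every MovieID in the 1 to 3952 range exists, and reporting unknown genres rather than raising keeps the loader usable when entries are inconsistent.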