Video Popularity Prediction Challenge | bitgrit


Develop a machine learning model to predict the number of views that videos are likely to receive.




Deadline: March 31, 2021, 6:30 p.m. UTC

About DataGateway: DataGateway is a Japanese startup with a mission to build a more digital society by applying the power of cutting-edge technology, including AI, blockchain, and decentralized computing. One of DataGateway's clients is looking for an algorithm that predicts the number of views that videos uploaded to its platform are likely to get, so that it can maximize views. This algorithm will also help the company price the ads shown in the videos on its platform.

About this challenge: The goal of this competition is to develop a machine learning model to predict the number of views that videos are likely to receive based on attributes such as duration, location, and time published.

1st Prize


2nd Prize


3rd Prize

  • Jan. 18, 2021 Competition Starts
  • March 31, 2021 Competition Ends
  • April 21, 2021 Winners Announced (Subject to change based on submission results)
In order to build your machine learning model, we have provided the following data sets:

1. Metadata
  • comp_id: Unique ID
  • ad_blocked: Indicates whether or not ads are blocked on the video
  • embed: Indicates whether or not the video can be embedded
  • ratio: The aspect ratio of the video
  • duration: Duration of the video (in seconds)
  • language: Language used in the video (encoded)
  • partner: Indicates whether the video is certified by the partner/sponsor
  • partner_active: Indicates whether the partner/sponsor is still active
  • n_likes: The number of likes the video has
  • n_tags: The number of tags on the video
  • n_formats: The number of streaming formats available for the video
  • dayofweek: The day of the week when the video was published
  • hour: The hour when the video was published (24-hour format)
2. Image data: thumbnail pixel data
3. Title data: vectorized title data
4. Description data: vectorized descriptions

*Training datasets are marked with 'train_' at the beginning of their filenames. Please use these sets to develop the model. Datasets marked with 'public_' at the beginning of their filenames are the sets you can use to make predictions with your model and test how well it performs on unseen data.

The submission file should follow the same format as the example file (solution_format.csv).

Submissions are evaluated on e^(-RMSE/MAX(observed_values)), where the Root Mean Squared Error (RMSE) is calculated between the predicted values and the observed values.

NOTE: You may submit a solution file up to 5 times a day. A few minutes after submitting, you will see your solution's score, computed on a subset of the test data, on the submission page. Final competition results are based on the Private Leaderboard, and the winner will be the user at the top of the Private Leaderboard.
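Because every training file starts with 'train_' and every prediction file starts with 'public_', one convenient way to organize a pipeline is to pair each training file with its public counterpart. The sketch below assumes, hypothetically, that both sets share the same name after the prefix (for example train_metadata.csv and public_metadata.csv); the function name pair_train_public and the example filenames are illustrative only, not the actual files provided.

```python
from pathlib import PurePath


def pair_train_public(filenames):
    """Pair each 'train_<name>' file with its 'public_<name>' counterpart.

    Assumption (hypothetical): both sets use the same suffix after the prefix.
    Files without a 'train_' or 'public_' prefix (e.g. solution_format.csv)
    are ignored, as are suffixes that appear in only one of the two sets.
    """
    train, public = {}, {}
    for path in filenames:
        base = PurePath(path).name  # strip any directory part
        if base.startswith("train_"):
            train[base[len("train_"):]] = path
        elif base.startswith("public_"):
            public[base[len("public_"):]] = path
    # Keep only suffixes present in both sets, in sorted order.
    return {key: (train[key], public[key]) for key in sorted(train.keys() & public.keys())}


# Hypothetical filenames for illustration; the real dataset names may differ.
files = [
    "data/train_metadata.csv",
    "data/public_metadata.csv",
    "data/train_image.csv",
    "data/public_image.csv",
    "data/solution_format.csv",
]
pairs = pair_train_public(files)
for name, (train_path, public_path) in pairs.items():
    print(name, train_path, public_path)
```

Matching on the suffix rather than on list order means a missing or extra file shows up as an absent key instead of silently misaligning training and prediction data.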
Who do I contact if I need help regarding a competition?
For any inquiries, please contact us at
How will I know if I’ve won?
If you are one of the top three winners for this competition, we will email you with the final result and information about how to claim your reward.
How can I report a bug?
Please shoot us an email at with details and a description of the bug you are facing, and if possible, please attach a screenshot of the bug itself.
If I win, how can I receive my reward?
Prizes can be delivered via PayPal, wire transfer, or another suitable payment transfer method. We will do our best to accommodate your payment preference, depending on your location and our ability to do so.
What metric is used for evaluation in this competition?
Submissions are evaluated on e^(-RMSE/MAX(observed_values)), where Root Mean Squared Error is calculated between the predicted values and observed values.
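To check a model locally against this formula, the score can be computed directly from the definition: take the RMSE between predictions and observations, divide it by the largest observed value, and exponentiate the negative. The sketch below is a plain-Python reading of the stated formula under the assumption that max(observed) is positive; the function name competition_score is hypothetical, and the official scorer may differ in details such as rounding.

```python
import math


def competition_score(observed, predicted):
    """Compute e^(-RMSE / max(observed)) for two equal-length sequences.

    Assumes max(observed) > 0; view counts of all zeros would divide by zero.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    rmse = math.sqrt(mse)
    return math.exp(-rmse / max(observed))


print(competition_score([1, 2, 3], [1, 2, 3]))  # 1.0 (perfect predictions)
print(competition_score([2, 4], [0, 4]))        # exp(-sqrt(2)/4), about 0.702
```

Because the RMSE is normalized by the largest observed view count, the score lies in (0, 1] and approaches 1 as the prediction error shrinks, so a higher score is better.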
1. This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms to participate.
2. Users can make a maximum of five submissions per day. If users want to submit new files after making five submissions in a day, they will have to wait until the following day to do so. Please keep this in mind when uploading a submission.csv file.
3. The use of external datasets is not allowed.
4. A competition prize will be awarded after we have received, successfully executed, and confirmed the validity of both the code and the solution. Once winners are announced and our team reaches out to them, the winners must provide the following by April 7, 2021 to qualify as competition winners and receive their prizes:
   a. All source files required to preprocess the data
   b. All source files required to build, train, and make predictions with the model using the processed data
   c. A requirements.txt (or equivalent) file listing all required libraries and their versions
   d. A ReadMe file containing the following:
     • Clear and unambiguous instructions on how to reproduce the predictions from start to finish, including data preprocessing, feature extraction, model training, and prediction generation
     • Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any environment configuration needed to execute the code
     • Clear answers to the following questions:
       - Which data files are being used?
       - How are these files processed?
       - What algorithm is used, and what are its main hyperparameters?
       - Any other comments considered relevant to understanding and using the model
   In the event these items are not provided or do not meet the minimum requirements listed above, we will not be able to award the winner their prize.
5. Payments to winners may be subject to local, state, federal, and foreign tax reporting and withholding requirements.
6. If two or more participants have the same score on the leaderboard, the participant who submitted the winning file first will be considered the winner.
7. All submissions need to be made as an individual; no teams are allowed in this competition. Users who do not comply with this rule will be disqualified immediately if we find the same or very similar scores and/or uploaded solutions.
8. If you have any inquiries about this competition, please don't hesitate to reach out to us at We ask that users not contact DataGateway directly.
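Rule 6 amounts to a two-key ordering: rank by score, and break ties by submission time. The sketch below illustrates that ordering under stated assumptions; the Submission record, its fields, and the participants are hypothetical and do not describe bitgrit's actual leaderboard implementation. It relies on the fact that e^(-RMSE/MAX(observed_values)) grows as RMSE shrinks, so higher scores rank first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    # Hypothetical record: one scoring entry per participant.
    user: str
    score: float            # higher is better for e^(-RMSE/MAX(observed_values))
    submitted_at: datetime  # when the scoring file was uploaded


def rank(submissions):
    """Sort by score (descending); equal scores go to the earlier submission."""
    return sorted(submissions, key=lambda s: (-s.score, s.submitted_at))


# Illustrative entries only.
subs = [
    Submission("alice", 0.91, datetime(2021, 3, 30, 12, 0)),
    Submission("bob", 0.91, datetime(2021, 3, 29, 8, 0)),
    Submission("carol", 0.88, datetime(2021, 3, 1, 0, 0)),
]
print([s.user for s in rank(subs)])  # ['bob', 'alice', 'carol']
```

Encoding both criteria in a single sort key keeps the tie-break explicit rather than depending on the order in which entries happen to be stored.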
New Submission
Step 1: Upload your submission file in .csv format.
Step 2: Briefly describe your submission (400 characters or less).
