Viral Tweets Prediction Challenge | bitgrit

Viral Tweets Prediction Challenge

Can you predict which tweets will go viral?

DataGateway
1024 Participants
3582 Submissions
Brief
The goal of this competition is to develop a machine learning model to predict the virality level of each tweet based on attributes such as tweet content, media attached to the tweet, and date/time published.
About DataGateway: DataGateway is a Japanese startup with a mission to build a more digital society by applying the power of cutting-edge technology including AI, blockchain, and decentralized computing.
Prizes
  • 1st Prize ($ 1500)

  • 2nd Prize ($ 1000)

  • 3rd Prize ($ 500)

Timeline
  • 06 May 2021 Competition Starts
  • 06 Jul 2021 Competition Ends
  • 27 Jul 2021 Winners Announced (Subject to change based on submission results)
Data Breakdown
In order to build your machine learning model, we have provided the following data sets:

1. users.csv: Basic user data.
  • 'user_id' : Account ID of the Twitter user.
  • 'user_like_count' : The number of likes that the account receives.
  • 'user_followers_count' : The number of followers that the account has.
  • 'user_following_count' : The number of accounts that the account is following.
  • 'user_listed_on_count' : The number of lists that the account is a member of.
  • 'user_has_location' : Indicates whether the account has location information or not.
  • 'user_tweet_count' : The number of tweets by the account.
  • 'user_has_url' : Indicates whether the account has a URL or not.
  • 'user_verified' : Indicates whether the account is verified or not.
  • 'user_created_at_year' : The year the account was created.
  • 'user_created_at_month' : The month the account was created.

2. user_vectorized_descriptions.csv: Vectorized user profile bio.
  • 'user_id' : Account ID of the Twitter user.
  • 'feature_0', 'feature_1', 'feature_2', 'feature_3', ..., 'feature_767' : Vectorized features.

3. user_vectorized_profile_images.csv: Vectorized user profile image.
  • 'user_id' : Account ID of the Twitter user.
  • 'feature_0', 'feature_1', 'feature_2', 'feature_3', ..., 'feature_2047' : Vectorized features.

4. tweets.csv
  • 'tweet_id' : Tweet ID.
  • 'tweet_user_id' : Account ID of the Twitter user.
  • 'tweet_created_at_year' : The year the tweet was created.
  • 'tweet_created_at_month' : The month the tweet was created.
  • 'tweet_created_at_day' : The day the tweet was created.
  • 'tweet_created_at_hour' : The hour the tweet was created.
  • 'tweet_hashtag_count' : The number of hashtags in the tweet.
  • 'tweet_url_count' : The number of URLs in the tweet.
  • 'tweet_mention_count' : The number of mentions in the tweet.
  • 'tweet_has_attachment' : Indicates whether the tweet has an attachment or not.
  • 'tweet_attachment_class' : The attachment type. *We won't be able to disclose what each type means.
  • 'tweet_language_id' : The ID of the language the tweet is written in.
  • 'tweet_topic_ids' : Topics of the tweet's entities: topic IDs of the different "keywords" mentioned within the text of the tweet.
  • 'virality' : Virality level. *This is the target variable.

5. tweets_vectorized_text.csv
  • 'tweet_id' : Tweet ID.
  • 'feature_0', 'feature_1', 'feature_2', 'feature_3', ..., 'feature_767' : Vectorized features.

6. tweets_vectorized_media.csv
  • 'tweet_id' : Tweet ID.
  • 'media_id' : Media ID. *Please note that a tweet could be tied to multiple media IDs (for example, one tweet can have multiple images with different media IDs).
  • 'img_feature_0', 'img_feature_1', 'img_feature_2', 'img_feature_3', ..., 'img_feature_767' : Vectorized features.

*Training datasets are marked with 'train_' at the beginning of their filenames. Please use these sets to develop the model. Datasets marked 'test_' at the beginning of their filenames are sets you can use to make predictions with your model and test how well your model performs on unseen data.

The submission file should follow the same format as the example file (solution_format.csv). Submissions are evaluated on accuracy (that is, number of correct predictions / total number of predictions).

NOTE: You may submit a solution file up to 5 times a day. A few minutes after submitting your solution, you will see the accuracy of your solution on the submission page over a subset of the test data. Final competition results are based on the Private Leaderboard results, and the winner will be the user at the top of the Private Leaderboard.
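The sketch below is one possible way to work with the files described above, not an official baseline. It uses only the Python standard library and small in-memory rows in place of the real CSV files. It illustrates two points from the data description: a tweet can have several rows in tweets_vectorized_media.csv, so the media features need to be reduced to one row per tweet before joining (mean pooling is an assumption here, other reductions are possible), and submissions are scored by accuracy. The function names pool_media, build_rows and accuracy are hypothetical, as are the toy values.

```python
from collections import defaultdict
from statistics import fmean


def pool_media(media_rows):
    """Average the img_feature_* values of all media rows that share a tweet_id."""
    grouped = defaultdict(list)
    for row in media_rows:
        grouped[row["tweet_id"]].append(row)
    pooled = {}
    for tweet_id, rows in grouped.items():
        names = [key for key in rows[0] if key.startswith("img_feature_")]
        pooled[tweet_id] = {name: fmean(r[name] for r in rows) for name in names}
    return pooled


def build_rows(tweets, users, media_rows):
    """Join user columns and pooled media features onto each tweet row."""
    users_by_id = {user["user_id"]: user for user in users}
    pooled = pool_media(media_rows)
    table = []
    for tweet in tweets:
        row = dict(tweet)
        # tweets.csv refers to users through 'tweet_user_id'.
        user = users_by_id.get(tweet["tweet_user_id"], {})
        row.update({k: v for k, v in user.items() if k != "user_id"})
        # Tweets without media simply get no img_feature_* keys.
        row.update(pooled.get(tweet["tweet_id"], {}))
        table.append(row)
    return table


def accuracy(y_true, y_pred):
    """Number of correct predictions divided by total number of predictions."""
    y_true, y_pred = list(y_true), list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy rows standing in for the real files (values are made up).
tweets = [
    {"tweet_id": 1, "tweet_user_id": 10, "virality": 1},
    {"tweet_id": 2, "tweet_user_id": 10, "virality": 3},
    {"tweet_id": 3, "tweet_user_id": 20, "virality": 2},
]
users = [
    {"user_id": 10, "user_followers_count": 150},
    {"user_id": 20, "user_followers_count": 4200},
]
media = [
    {"tweet_id": 1, "media_id": "a", "img_feature_0": 0.25, "img_feature_1": 1.0},
    {"tweet_id": 1, "media_id": "b", "img_feature_0": 0.75, "img_feature_1": 3.0},
    {"tweet_id": 2, "media_id": "c", "img_feature_0": 1.0, "img_feature_1": 0.5},
]

table = build_rows(tweets, users, media)
score = accuracy([row["virality"] for row in table], [1, 2, 2])
```

With the real data, the same joins would typically be done with a dataframe library after reading the 'train_' and 'test_' files. Keeping one row per tweet matters because the target 'virality' is defined per tweet, not per media item.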
FAQs
Who do I contact if I need help regarding a competition?
For any inquiries, please contact us at [email protected]
How will I know if I’ve won?
If you are one of the top three winners for this competition, we will email you with the final result and information about how to claim your reward.
How can I report a bug?
Please shoot us an email at [email protected] with details and a description of the bug you are facing, and if possible, please attach a screenshot of the bug itself.
If I win, how can I receive my reward?
Prizes will be paid by bank transfer. If for some reason you are not able to accept payment by bank transfer, please let us know and we will do our best to accommodate your needs as far as possible.
Rules
1. This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms to participate.
2. Users can make a maximum of five submissions per day. If users want to submit new files after making five submissions in a day, they will have to wait until the following day to do so. Please keep this in mind when uploading a submission.csv file. Any attempt to circumvent stated limits will result in disqualification.
3. The use of external datasets is not allowed.
4. Uploading the competition dataset to other websites is not allowed. Users who do not comply with this rule will be disqualified.
5. A competition prize will be awarded after we have received, successfully executed, and confirmed the validity of both the code and the solution. Once winners are announced and our team reaches out to them, the winners must provide the following by July 13, 2021 to be qualified as a competition winner and receive their prize:
  a. All source files required to preprocess the data
  b. All source files required to build, train and make predictions with the model using the processed data
  c. A requirements.txt (or equivalent) file indicating all the required libraries and their versions as needed
  d. A ReadMe file containing the following:
    • Clear and unambiguous instructions on how to reproduce the predictions from start to finish, including data pre-processing, feature extraction, model training and predictions generation
    • Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any environment configurations required to execute the code
    • Clear answers to the following questions:
      - Which data files are being used?
      - How are these files processed?
      - What is the algorithm used and what are its main hyperparameters?
      - Any other comments considered relevant to understanding and using the model
  In the event these items are not provided or do not meet the minimum requirements listed above, we will not be able to award the winner with their respective prize.
6. The submitted solution should be able to generate exactly the same output that gives the corresponding score on the leaderboard. If the score obtained from the code is different from what’s shown on the leaderboard, the new score will be used for the final rankings unless a logical explanation is provided.
7. Any prize awards are subject to verification of eligibility and compliance with these Terms of Participation. All decisions of bitgrit and the Competition Sponsor will be final and binding on all matters relating to this Competition.
8. Payments to winners may be subject to local, state, federal and foreign tax reporting and withholding requirements.
9. If two or more participants have the same score on the leaderboard, the participant who submitted the winning file first will be considered the winner.
10. All submissions need to be made as an individual; no teams are allowed in this competition. Users who do not comply with this rule will be immediately disqualified in the case that we find the same or very similar scores and/or uploaded solutions.
11. Any Participant shall delete the Company-Provided Information immediately after the completion of a Competition.
12. If you have any inquiries about this competition, please don’t hesitate to reach out to us at [email protected]. We ask that users do not contact DataGateway directly.
