Brief
This 2-phase competition is part of the NASA Tournament Lab, hosted by NCATS (The National Center for Advancing Translational Sciences) with contributions from the National Library of Medicine (NLM). These institutions, in collaboration with bitgrit and CrowdPlat, have come together to bring you this challenge, where you can deploy your data-driven technology solutions to accelerate scientific research in medicine and ensure that data from biomedical publications is maximally leveraged and reaches a wide range of biomedical researchers.
Each phase of the competition is designed to spur innovation in the field of natural language processing, asking competitors to design systems that can accurately recognize scientific concepts from the text of scientific articles, connect those concepts into knowledge assertions, and determine if that claim is a novel finding or background information.
Part 1: Given only an abstract text, the goal is to find all the nodes or biomedical entities (position in text and BioLink Model Category).
Part 2: Given the abstract and the nodes annotated from it, the goal is to find all the relationships between them (position in text and BioLink Model Predicate).
*NOTE: The prizes listed will be awarded based on a competitor’s combined, weighted scores from both phases of the competition. Please see the Rules section for more information.*
The National Center for Advancing Translational Sciences (NCATS, a center of the National Institutes of Health):
NCATS is conducting this challenge under the America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science (COMPETES) Reauthorization Act of 2010. This challenge will spur innovation in NLP to advance the field and allow the generation of more accurate and useful data from biomedical publications, which will enhance the ability of data scientists to create tools that foster discovery and generate new hypotheses.
The National Center for Biotechnology Information (NCBI, part of the National Library of Medicine, a division of the National Institutes of Health):
NCBI intramural researchers and their collaborators have provided a corpus of annotated abstracts from published scientific research articles and knowledge assertions between these concepts, which will be provided to participants for training and testing purposes.
CrowdPlat (Project Company):
The LitCoin project was awarded to and is being managed by CrowdPlat under NASA's NOIS2 contract. Located in San Jose, California, CrowdPlat provides crowdsourcing solutions to medium- and large-scale enterprises seeking project execution through a crowdsourced talent pool.
Prizes
1st Prize ($35,000)
2nd Prize ($25,000)
3rd Prize ($20,000)
4th Prize ($5,000)
5th Prize ($5,000)
6th Prize ($5,000)
7th Prize ($5,000)
The prize money displayed is the total prize for both phases of the LitCoin NLP Challenge. Please see the Rules section for more info!
Timeline
- 23 Dec 2021 Competition Phase-1 Ended
- 27 Dec 2021 Competition Phase-2 Starts
- 28 Feb 2022 Competition Phase-2 Ends
- 08 Apr 2022 Winners Announced (Subject to change based on submission results)
Data Breakdown
The goal of the second part of the LitCoin NLP Challenge is to identify all the relations between biomedical entities within a research paper’s title and abstract.
The type of biomedical relation comes from the BioLink Model Predicates, and can be one and only one of the following:
・Association
・Positive Correlation
・Negative Correlation
・Bind
・Cotreatment
・Comparison
・Drug Interaction
・Conversion
To properly understand these predicates it would be helpful to get familiar with the BioLink Model and biomedical ontologies in general.
The BioLink Model is a high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.) and their associations. It can be thought of as a high-level biomedical ontology that helps categorize and associate concepts that might come from different lower-level biomedical ontologies.
Ontologies are controlled vocabularies that allow describing the meaning of data (its semantics) in a human- and machine-readable way. They have been widely used in the biomedical area to help solve the issue of data heterogeneity, enabling advanced data analysis, knowledge organization and reasoning.
To better understand the BioLink Model and learn more about biomedical ontologies it would be helpful to take a look at the following links:
・BioLink Model:
https://biolink.github.io/biolink-model/
http://tree-viz-biolink.herokuapp.com
・Ontologies
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4300097/
https://disease-ontology.org
https://www.omim.org/
https://www.bioontology.org/
The number of relations in each abstract varies depending on the text and all relations should be identified.
The available data is the same as that used for the first part of the competition, but now also includes the entities for the test set:
abstracts_train.csv, entities_train.csv, relations_train.csv, abstracts_test.csv and entities_test.csv, which are described below. All of these are CSV files delimited with tabs instead of commas.
・abstracts_train.csv: CSV file containing research papers that can be used for training.
# abstract_id: PubMed ID of the research paper.
# title: title of the research paper.
# abstract: abstract or summary of the research paper.
・entities_train.csv: CSV file containing all the entities' mentions found in the texts from abstracts_train.csv that can be used for training.
# id: unique ID of the entity's mention
# abstract_id: PubMed ID of the research paper where the entity's mention appears.
# offset_start: position of the character where the entity's mention substring begins in the text (title + abstract).
# offset_finish: position of the character where the entity's mention substring ends in the text (title + abstract).
# type: type of entity as one of the 6 possible categories mentioned in the first part of the competition.
# mention: substring representing the actual entity's mention in the text. Can also be extracted using the offsets and the input text.
# entity_ids*: comma separated external IDs from a biomedical ontology to specifically identify the entity.
*The ontologies used to obtain these external IDs are the following:
Gene: NCBI Gene
Disease: MEDIC (which is MeSH + OMIM)
Chemical: MeSH
Variant: RS# in dbSNP
Species: NCBI Taxonomy
CellLine: NCBI Taxonomy
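Since the mention column can also be reconstructed from the offsets, it is worth sanity-checking the two against each other. The sketch below uses tiny in-memory stand-ins for the tab-delimited files (a real run would read abstracts_train.csv and entities_train.csv with sep="\t"); the example row, the GeneOrGeneProduct type value, and the assumption that offsets index into the title and abstract joined by a single space are all illustrative and should be verified against the actual data.

```python
import io
import pandas as pd

# Tiny in-memory stand-ins for the tab-delimited files (real runs would
# read abstracts_train.csv / entities_train.csv with sep="\t").
abstracts = pd.read_csv(io.StringIO(
    "abstract_id\ttitle\tabstract\n"
    "123\tBRCA1 and cancer\tBRCA1 mutations increase risk.\n"
), sep="\t")
entities = pd.read_csv(io.StringIO(
    "id\tabstract_id\toffset_start\toffset_finish\ttype\tmention\n"
    "1\t123\t0\t5\tGeneOrGeneProduct\tBRCA1\n"
), sep="\t")

# Offsets are described as indexing into "title + abstract"; the exact
# joining (e.g. a separating space) should be checked against the data.
abstracts["text"] = abstracts["title"] + " " + abstracts["abstract"]
merged = entities.merge(abstracts[["abstract_id", "text"]], on="abstract_id")

# Slice each mention out of the text and compare with the mention column.
extracted = merged.apply(
    lambda r: r["text"][r["offset_start"]:r["offset_finish"]], axis=1
)
assert (extracted == merged["mention"]).all()
```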
・relations_train.csv: CSV file containing all the relations found in the abstracts that can be used for training.
# id: unique ID of the relation
# abstract_id: PubMed ID of the research paper where the relation appears.
# type: type or predicate connecting the two entities.
# entity_1_id: external ID of the entity that corresponds to the subject of the relation.
# entity_2_id: external ID of the entity that corresponds to the object of the relation.
# novel: whether the relation found corresponds to a novel discovery or not.
・abstracts_test.csv: CSV file containing research papers whose relations between entities have to be identified with a trained model.
# abstract_id: PubMed ID of the research paper.
# title: title of the research paper.
# abstract: abstract or summary of the research paper.
・entities_test.csv: CSV file containing all the entities' mentions found in the texts from abstracts_test.csv that can be used to identify the test relations and create the submission file.
# id: unique ID of the entity's mention
# abstract_id: PubMed ID of the research paper where the entity's mention appears.
# offset_start: position of the character where the entity's mention substring begins in the text (title + abstract).
# offset_finish: position of the character where the entity's mention substring ends in the text (title + abstract).
# type: type of entity as one of the 6 possible categories mentioned in the first part of the competition.
# mention: substring representing the actual entity's mention in the text. Can also be extracted using the offsets and the input text.
# entity_ids: comma separated external IDs from a biomedical ontology to specifically identify the entity.
・submission_example_2.csv: CSV file containing an example of what a submission file (output) should look like in order to be uploaded and scored on the platform. It is similar to the format of relations_train.csv. Please note that this is just a small example where types and entity_ids have been randomly selected, and the number of relations per abstract might be much smaller than usual. The column order must be respected, as well as using tab as a delimiter and including a header with the column titles. Failure to comply with this format will result in an error or a lower score.
# id: unique ID of the relation
# abstract_id: PubMed ID of the research paper where the relation appears.
# type: type or predicate connecting the two entities.
# entity_1_id: external ID of the entity that corresponds to the subject of the relation.
# entity_2_id: external ID of the entity that corresponds to the object of the relation.
# novel: whether the relation found corresponds to a novel discovery or not.
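As a minimal sketch, a submission with the required tab delimiter, header, and column order could be written as below. The placeholder row is invented for illustration, and the exact encoding of the novel column (e.g. Novel/No) should be checked against relations_train.csv and the provided submission example.

```python
import csv
import io

# Columns in the required order (from the description above).
COLUMNS = ["id", "abstract_id", "type", "entity_1_id", "entity_2_id", "novel"]

# Placeholder predictions -- real values would come from your model.
rows = [
    {"id": 1, "abstract_id": 123, "type": "Association",
     "entity_1_id": "D001943", "entity_2_id": "672", "novel": "Novel"},
]

# io.StringIO is used here for illustration; write to a file for a real run.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t")
writer.writeheader()          # header with the column titles is required
writer.writerows(rows)        # tab-delimited, columns in the required order
print(buf.getvalue(), end="")
```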
The evaluation metric for this problem is a modified version of the Jaccard Similarity Score:
・For each abstract_id A, a set of predicted relations P and a set of correct relations O, the formula is: |P⋂O| / (|P| + |O| - |P⋂O|), where ⋂ means intersection, || means length (amount of relations for that abstract_id).
・Matching relations (for the intersection) are determined in the following way: a match between a predicted relation and a correct relation is represented as an "intersection score" between 0 and 1, under the formula intersection_score = 0.25 x {correct pair of entities (irrespective of order)} + 0.5 x {correct pair of entities and correct type}* + 0.25 x {correct pair of entities and correct novelty}. For example, given a correct relation from an abstract, if there is a predicted relation with the same pair of entities, type and novelty, then the "intersection score" of that match is 1. If there is only a predicted relation that contains the same pair of entities, but not the same type or novelty, then the "intersection score" of that match is 0.25.
*In the particular case of the types Positive_Correlation and Negative_Correlation, if there is a match on the pair of entities but there is a misclassification between these 2 types (e.g. Positive_Correlation is indicated instead of Negative_Correlation or vice versa), the score given for the type component will be 0.175 instead of 0.5.
The Jaccard similarity scores of all abstract_ids are then averaged to return the final score.
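The scoring described above can be sketched as follows. This is a non-official reimplementation from the stated formula: entity pairs are treated as unordered, the 0.175 partial credit applies when Positive_Correlation and Negative_Correlation are swapped, and golds are greedily matched to their best unused prediction (the official matching procedure is not specified here, so the greedy step is an assumption).

```python
def intersection_score(pred, gold):
    """Score one predicted relation against one gold relation (0..1)."""
    if frozenset((pred["e1"], pred["e2"])) != frozenset((gold["e1"], gold["e2"])):
        return 0.0
    score = 0.25  # correct (unordered) pair of entities
    if pred["type"] == gold["type"]:
        score += 0.5
    elif {pred["type"], gold["type"]} == {"Positive_Correlation",
                                          "Negative_Correlation"}:
        score += 0.175  # the two correlation types mixed up
    if pred["novel"] == gold["novel"]:
        score += 0.25
    return score

def abstract_jaccard(preds, golds):
    """Modified Jaccard for one abstract: |P∩O| / (|P| + |O| - |P∩O|)."""
    inter, used = 0.0, set()
    for g in golds:
        # Greedily match each gold relation to its best unused prediction.
        best, best_i = 0.0, None
        for i, p in enumerate(preds):
            if i not in used and intersection_score(p, g) > best:
                best, best_i = intersection_score(p, g), i
        if best_i is not None:
            used.add(best_i)
            inter += best
    denom = len(preds) + len(golds) - inter
    return inter / denom if denom else 0.0

gold = [{"e1": "A", "e2": "B", "type": "Association", "novel": "Novel"}]
print(abstract_jaccard(gold, gold))  # exact match scores 1.0
```

Per the description, these per-abstract scores would then be averaged across all abstract_ids to obtain the final score.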
Final competition results are based on competitors' combined, weighted scores from both phases of the competition: 30% of the total score will be determined by problem statement 1 and 70% of the total score will be determined by problem statement 2.
FAQs
Who do I contact if I need help regarding a competition?
If you have any inquiries about participating in this competition, please don’t hesitate to reach out to us at [email protected]. For questions about eligibility or prize distribution, email NCATS at [email protected]
How will I know if I’ve won?
If you are one of the top seven winners for this competition, we will email you with the final result and information about how to claim your reward.
How can I report a bug?
Please shoot us an email at [email protected] with details and a description of the bug you are facing, and if possible, please attach a screenshot of the bug itself.
If I win, how can I receive my reward?
The money prize will be awarded by NIH/NCATS directly to the winner (if an individual) or Team Lead of the winning team (if a team). Prizes awarded under this Challenge will be paid by electronic funds transfer and may be subject to Federal income taxes. HHS/NIH will comply with the Internal Revenue Service withholding and reporting requirements, where applicable.
How is novelty defined in the dataset?
Novelty tags were generated by curators based entirely on the abstract as written, without doing an exhaustive search into the history of the work. In other words, when the curators were looking over this abstract, the language used in it suggested to them that this finding was novel, so these tags are based purely on context within the abstract.
Rules
1. This competition is governed by the following Terms of Participation (“Participation Rules”). Participants must agree to and comply with the Participation Rules to compete.
2. This competition consists of 2 problem statements, herein considered as competition sub-phases. Winners will be determined by a weighted average of scores from the two competition phases: 30% of the total score will be determined by problem statement 1 and 70% of the total score will be determined by problem statement 2.
3. The competition dates are detailed below:
Phase 1 Start Date: November 9th, 2021
Phase 1 Closing Date: December 23rd, 2021
Phase 2 Start Date: December 27th, 2021
Phase 2 Closing Date: February 28th, 2022
Submission (Final Source Code): March 11th, 2022
Winners Announced: April 8th, 2022
4. Participants are allowed to participate in an individual capacity or as part of a team.
5. Merging teams midway through the competition is not allowed.
6. Each participant may only be a member of a single team and may not participate as an individual and on a team simultaneously.
7. In order to participate in this competition and be eligible for the prize money, participants must be a U.S. citizen or a U.S. permanent resident. Non-U.S. citizens and non-permanent residents can participate as well, as a member of a team that includes a citizen or permanent resident of the U.S, or they can participate on their own. However, such non-U.S. citizens and non-permanent residents are not eligible to win a monetary prize (in whole or in part). Their participation as part of a winning team, if applicable, may be recognized when the results are announced. Similarly, if participating on their own, they may be eligible to win a non-cash recognition prize. Proof of citizenship and permanent residency will be required. For more information on competition eligibility requirements, please see https://ncats.nih.gov/funding/challenges/litcoin
8. In the case of a team participation, all submissions must be made by the team lead.
9. The use of external datasets for the purposes of training is allowed, but submissions must be generated using the test corpus provided.
10. During the competition period, participants will be allowed to make a maximum of 5 submissions per day. If participants exceed the set submission limit, the platform will reset to allow an additional 5 submissions the following day. Please keep this in mind when uploading a submission file. Any attempt to circumvent stated limits will result in disqualification.
11. Participants are not permitted to share or upload the competition dataset to any platform outside of competition. Participants that do not comply with the confidentiality regulations of the competition will be disqualified.
12. The top seven (7) winning participants will be eligible to receive a competition prize (ranked by performance) after we have received, successfully executed, and confirmed the validity of both the code and the solution (See 14.). In order to ensure that at least 7 participants may be awarded prizes, the top fifteen (15) individuals/teams will be asked to submit their source code for evaluation (see 13.).
13. Once potential competition winners are determined and our team reaches out to them, the top scoring participants must provide the following by March 11, 2022 for evaluation to be qualified as competition winner(s) and receive their prize:
a. Winning Model Documentation template filled in (this document is available on the “Resources” tab on the competition page)
b. All source files required to preprocess the data
c. All source files required to build, train and make predictions with the model using the processed data
d. A requirements.txt (or equivalent) file indicating all the required libraries and their versions as needed
e. A ReadMe file containing the following:
• Clear and unambiguous instructions on how to reproduce the predictions from start to finish including data pre-processing, feature extraction, model training and predictions generation
• Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any required environment configurations required to execute the code
• Clear answers to the following questions:
- Which data files are being used?
- How are these files processed?
- What is the algorithm used and what are its main hyperparameters?
- Any other comments considered relevant to understanding and using the model
14. Solution submissions should be able to generate the exact output that gives the corresponding score on the leaderboard. If the score obtained from the code is different from what’s shown on the leaderboard, the new score (which may be lower) will be used for the final rankings unless a logical explanation is provided. Please make sure to set the seed or random_state etc. so we can obtain the same result from your code.
15. Solution submissions will also be used to generate output based on a validation dataset, generated in the same manner with which the provided test and training sets were generated, which will be kept hidden from all participants, in order to verify that code was not customized for the provided dataset. This output will not be used to determine leaderboard position, but could be used to disqualify a participant from receiving a prize if the output is judged to be severely inaccurate by bitgrit, CrowdPlat and NCATS.
16. In order to be eligible for the prize, a competition winner (whether an individual, group of individuals, or entity) must agree to grant to the NIH an irrevocable, paid-up, royalty-free non-exclusive worldwide license to reproduce, publish, post, link to, share, and display publicly the submission on the web or elsewhere, and a non-exclusive, non-transferable, irrevocable, paid-up license to practice or have practiced for or on its behalf, the solution throughout the world. For more detailed information, please visit https://ncats.nih.gov/funding/challenges/litcoin.
17. Any prize awards are subject to verification of eligibility and compliance with these Participation Rules. Novelty and innovation of submissions may also affect the final ranking. All decisions of bitgrit, CrowdPlat and NCATS will be final and binding on all matters relating to this Competition.
18. Cash prizes will be paid directly by NIH/NCATS to the competition winners. In the case of a winning team, the money prize will be paid directly by NIH/NCATS to the Team Lead. Non-U.S. citizens and non-permanent residents are not eligible to receive a cash prize (in whole or in part). Their participation as part of a winning team, if applicable, may be recognized when the results are announced.
Prizes awarded under this Challenge will be paid by electronic funds transfer and may be subject to local, state, federal and foreign tax reporting and withholding requirements. HHS/NIH will comply with the Internal Revenue Service withholding and reporting requirements, where applicable.
19. If two or more participants have the same score on the leaderboard, an earlier submission will take precedence and be ranked higher than a later submission.
20. If you have any inquiries about participating in this competition, please don’t hesitate to reach out to us at [email protected]. For questions about eligibility or prize distribution, email NCATS at [email protected]
Terms of Participation
Agreement regarding confidential information and competition rules
These Terms of Participation (“Agreement”) are hereby entered into on the date of your participation conditional upon your agreement to these terms (“Effective Date”) between you (“Participant”), as a participant in the LitCoin NLP Challenge: Part 2 competition (the “Competition”) hosted at bitgrit.net (the “Competition Site”), and bitgrit Inc. (“bitgrit”).
IMPORTANT, READ CAREFULLY: Your participation in the Competition on the above Competition Site is conditional upon your comprehension of, compliance with, and acceptance of these terms. Please review thoroughly before accepting.
I. General Clauses
1. This competition consists of 2 problem statements, herein considered as competition sub-phases. Winners will be determined by a weighted average of scores from the two competition phases: 30% of the total score will be determined by problem statement 1 and 70% of the total score will be determined by problem statement 2.
2. Participants are allowed to participate in an individual capacity or as part of a team.
3. Merging teams midway through the competition is not allowed.
4. Each participant may only be a member of a single team and may not participate as an individual and on a team simultaneously.
5. In order to be eligible for the prize money, participants must be a U.S. citizen or a U.S. permanent resident. Non-U.S. citizens and non-permanent residents can participate as well, as a member of a team that includes a citizen or permanent resident of the U.S., or they can participate on their own. However, such non-U.S. citizens and non-permanent residents are not eligible to win a monetary prize (in whole or in part). Their participation as part of a winning team, if applicable, may be recognized when the results are announced. Similarly, if participating on their own, they may be eligible to win a non-cash recognition prize. Proof of citizenship and permanent residency will be required. For more information on competition eligibility requirements, please see https://ncats.nih.gov/funding/challenges/litcoin
6. In the case of a team participation, all submissions must be made by the team lead.
7. Participants are not permitted to share or upload the competition dataset to any platform outside of competition. Participants that do not comply with the confidentiality regulations of the competition will be disqualified.
8. All participants who are under the age of 18, or are considered a minor in the country they live in, are required to submit a signed copy of the parent/legal guardian consent form. This form can be found at https://ncats.nih.gov/files/LitCoin-Parental-Consent-Form-508.pdf. Signed forms can be sent to [email protected]
9. The top seven (7) winning participants will be eligible to receive a competition prize (ranked by performance) after we have received, successfully executed, and confirmed the validity of both the code and the solution. In order to ensure that at least 7 participants may be awarded prizes, the top fifteen (15) individuals/teams will be asked to submit their source code for evaluation.
10. Any prize awards are subject to verification of eligibility and compliance with these Terms of Participation. Novelty and innovation of submissions may also affect the final ranking. All decisions of bitgrit, CrowdPlat and NCATS will be final and binding on all matters relating to this Competition.
11. Cash prizes will be paid directly by NIH/NCATS to the competition winners. In the case of a winning team, the money prize will be paid directly by NIH/NCATS to the Team Lead. Non-U.S. citizens and non-permanent residents are not eligible to receive a cash prize (in whole or in part). Their participation as part of a winning team, if applicable, may be recognized when the results are announced.
Payments to winners may be subject to local, state, federal and foreign tax reporting and withholding requirements.
II. Clauses of Non-Disclosure
1. Confidential Information
(1) Confidential Information shall mean any and all information disclosed by bitgrit to the Participant with regard to the entry and participation in the Competition, including (i) metadata, source code, object code, firmware etc. and, in addition to these, (ii) analyses, compilations or any other deliverable produced by the Participant in which such disclosed information is utilized or reflected.
(2) Confidential Information shall not include information which;
(a) is now or hereafter becomes, through no act or omission of the Participant, generally known or available to the public, or enters the public domain;
(b) is acquired by the Participant before receiving such information from bitgrit and such acquisition was without restriction as to the use or disclosure of the same;
(c) is hereafter rightfully furnished to the participant by a third party, without restriction as to use or disclosure of the same.
2. Non-Disclosure Obligation
The Participant agrees:
(a) to hold Confidential Information in strict confidence;
(b) to exercise at least the same care in protecting Confidential Information from disclosure as the party uses with regard to its own confidential information;
(c) not to use any Confidential Information except for as it concerns the Purpose elaborated upon above;
(d) not to disclose such Confidential Information to third parties;
(e) to inform bitgrit if it becomes aware of an unauthorized disclosure of Confidential Information.
3. No Warranty
All Confidential Information is provided “as is.” bitgrit makes no representation, warranty, or assurance of any kind to the Participant regarding the Confidential Information.
4. No Assignment of Rights
The Participant agrees that nothing contained in this Agreement shall be construed as conferring, transferring or granting any rights to the Participant, by license or otherwise, to use any of the Confidential Information.
III Rights to Deliverables
1. Transferable rights / Licenses
In order to be eligible for the prize, a competition winner (whether an individual, group of individuals, or entity) must agree to grant to the NIH an irrevocable, paid-up, royalty-free non-exclusive worldwide license to reproduce, publish, post, link to, share, and display publicly the submission on the web or elsewhere, and a non-transferable, irrevocable, paid-up, royalty-free non-exclusive worldwide license to practice or have practiced for or on its behalf, the solution. For more detailed information, please visit https://ncats.nih.gov/funding/challenges/litcoin.
2. Restrictions on Use
The Participant hereby agrees to not utilize Submitted Algorithms to or for businesses, business endeavors, products, or services in competition with bitgrit or with the Competition co-host.
3. Authorization of Non-compensatory Use
The Participant hereby authorizes and consents to bitgrit and/or relevant third parties utilizing, analyzing, altering, or further reauthorizing the use of the Submitted Algorithm(s) to other third parties and will not make claims or demands for monetary compensation in regard to the above purposes.
4. Representations and Warranties
The Participant hereby declares and warrants that the Participant’s, bitgrit’s, and the related third party’s use of the Submitted Algorithms does not violate or infringe upon the intellectual property rights, business secrets, or other rights of any other third party.
5. Warranty Against Exercising of Moral Rights
The Participant agrees to not exercise moral rights to bitgrit or to related third parties in regard to the Submitted Algorithms.
6. Rights Regarding Modified and Derivative Works
The Participant hereby agrees that Intellectual Property Rights and other rights regarding any modified or derivative works created from the Submitted Algorithms shall belong to the creator of that modified or derivative work.