Brief
Welcome to The NASA Breath Diagnostics Challenge!
The National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) seeks innovative solutions to improve the accuracy of its NASA E-Nose as a potential clinical tool that would measure the molecular composition of human breath to provide diagnostic results. We invite data scientists and AI experts to participate in this challenge, leveraging their expertise to develop a classification model that can accurately discriminate between the breath of COVID-positive and COVID-negative individuals, using data obtained from a recent clinical study. The total prize pool for this competition is $55,000.
The objective of this challenge is to develop a diagnostic model by using NASA E-Nose data gathered from exhaled breath of 63 volunteers in a COVID-19 study. Challenge participants will use advanced data preparation and AI techniques to overcome the limited sample size of subjects in the COVID-19 study. The innovative solutions emerging from this challenge may assist NASA in advancing the technical capability of the NASA E-Nose for a wide range of clinical applications relevant to human space exploration.
Prizes
- 1st Place - $20,000
- 2nd Place - $12,000
- 3rd Place - $8,000
- 4th Place - $4,000
- 5th Place - $4,000
- 6th Place - $3,000
- 7th Place - $2,000
- 8th Place - $2,000
Timeline
Competition Starts – July 5th, 2024
Competition Ends – September 6th, 2024
Winners Announcement – November 8th, 2024
Data Breakdown
IMPORTANT: This competition uses unique criteria to determine eligible models. It is extremely important that participants read the rules carefully before continuing. In particular, please pay special attention to the Rules section before embarking on this challenge.
The objective of this challenge is to develop a classification model that can accurately diagnose patients with COVID-19 based on data that was captured from the exhaled breath of volunteers using the NASA E-Nose device. The total number of patients, and therefore examples, is 63. You will note that this is a limited dataset, so making efficient use of the provided data is of extreme importance in this challenge. We encourage creativity in dealing with the small sample size, since one of the goals of this competition is to address the challenge of limited training data in scenarios such as emerging diseases with few confirmed cases. The limited data also impacts submission testing, so it is very important to understand the rules governing this event and how submissions will ultimately be scored.
The data consists of 63 txt files representing the 63 patients, numbered 1 to 63.
Each file contains the Patient ID, the COVID-19 Diagnosis Result (POSITIVE or NEGATIVE) and numeric measurements for 64 sensors, D1 to D64. These sensors are installed within the E-Nose device, and they each measure different molecular signals in the breath of the patients.
All sensor data is indexed by a timestamp with the format Min:Sec, which represents the minute of the hour, and the second of that minute in which that sensor was sampled. The hour of the timestamp has been left out, but when the minute counter resets, it means that the next hour has begun. Keep this in mind when working with this time axis.
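The rollover rule above can be handled by converting each timestamp to seconds elapsed since the first sample. The following is only a minimal sketch: the function name `to_elapsed_seconds` is hypothetical, and it assumes that samples appear in chronological order, that any drop in the minute value marks a new hour, and that the seconds field may or may not carry decimals.

```python
def to_elapsed_seconds(stamps):
    """Convert 'Min:Sec' strings to seconds elapsed since the first sample.

    Assumption: stamps are in chronological order, so a minute value that
    is lower than the previous one means the (omitted) hour has advanced.
    """
    elapsed = []
    hour_offset = 0      # seconds contributed by completed hours
    prev_minute = None
    start = None
    for stamp in stamps:
        minute_text, second_text = stamp.strip().split(":")
        minute = int(minute_text)
        second = float(second_text)   # float tolerates fractional seconds
        if prev_minute is not None and minute < prev_minute:
            hour_offset += 3600       # minute counter reset: next hour began
        prev_minute = minute
        absolute = hour_offset + minute * 60 + second
        if start is None:
            start = absolute
        elapsed.append(absolute - start)
    return elapsed


example = ["58:30", "59:15", "00:05", "01:00"]
print(to_elapsed_seconds(example))  # [0.0, 45.0, 95.0, 150.0]
```

Working with elapsed seconds rather than the raw Min:Sec strings keeps the time axis monotonic, which simplifies any later windowing of the signal.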
In order to achieve maximum consistency across patients, each breath sample was delivered to the E-Nose device from a pulsation bag that had previously been filled with the patient's breath. The E-Nose measurement procedure also includes flushing the sensors with ambient air, which can be used to calibrate the readings taken when exposed to human breath.
For all patients, the breath sample was exposed to the E-Nose device in windows of exposure, using the following process:
1. 5 min baseline measurement using ambient air
2. 1 min breath sample exposure and measurement, using the filled breath bag
3. 2 min sensor “recovery” using ambient air
4. 1 min breath sample exposure and measurement, using the filled breath bag
5. 2 min sensor “recovery” using ambient air
6. 1 min breath sample exposure and measurement, using the filled breath bag
7. 2 min sensor “recovery” using ambient air
Total time = 14 mins
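The seven windows above can be turned into phase labels on an elapsed-time axis, and the ambient-air baseline can then serve as a simple reference level. This is a sketch under stated assumptions, not the organizers' calibration method: it assumes each recording starts exactly at the beginning of the baseline window and follows the nominal durations (real files may deviate), and the names `PROTOCOL`, `phase_boundaries`, `label_phase` and `baseline_corrected` are hypothetical. Subtracting the baseline mean is only one of several possible calibration choices.

```python
# Protocol windows taken from the list above: (label, duration in minutes).
PROTOCOL = [
    ("baseline", 5),
    ("exposure_1", 1),
    ("recovery_1", 2),
    ("exposure_2", 1),
    ("recovery_2", 2),
    ("exposure_3", 1),
    ("recovery_3", 2),
]


def phase_boundaries(protocol=PROTOCOL):
    """Return (label, start_s, end_s) tuples; end_s is exclusive."""
    bounds, start = [], 0
    for label, minutes in protocol:
        end = start + minutes * 60
        bounds.append((label, start, end))
        start = end
    return bounds


def label_phase(elapsed_s, bounds=None):
    """Map elapsed seconds to a phase label, or None outside the protocol."""
    bounds = phase_boundaries() if bounds is None else bounds
    for label, start, end in bounds:
        if start <= elapsed_s < end:
            return label
    return None


def baseline_corrected(readings, elapsed, bounds=None):
    """Subtract one sensor's mean baseline-phase reading from all its readings."""
    baseline = [r for r, t in zip(readings, elapsed)
                if label_phase(t, bounds) == "baseline"]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    offset = sum(baseline) / len(baseline)
    return [r - offset for r in readings]


print(phase_boundaries()[-1])  # ('recovery_3', 720, 840)
print(label_phase(330))        # exposure_1
```

The last boundary ends at 840 seconds, which matches the stated total of 14 minutes.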
The data is distributed into training and test sets:
Train: 45 patients
Test: 18 patients
Within the dataset there are also 2 other files: submission_example.csv and train_test_split.csv. The first file shows what all submission files should look like. The values should be 0 for NEGATIVE and 1 for POSITIVE. Failing to follow this format will result in an error or a lower score.
The second file (train_test_split.csv) indicates which patient IDs are considered for Train (i.e., labeled) and which ones are for Test (i.e., not labeled).
The order of the predictions in the submission file should match the order of the rows marked TEST in the train_test_split_order.csv file.
The index column of the submission file is NOT the patient ID; however, the order of values in the Result column must follow the order in the train_test_split_order.csv file. This is very important.
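The ordering requirement above can be sketched as follows. Everything about the file layout here is an assumption to be checked against submission_example.csv and the split file: the split rows are represented as hypothetical (patient ID, subset) pairs, the header name `index` and the zero-based index are guesses, and only the `Result` column name and the 0/1 encoding come from the text. The function names `build_submission_rows` and `write_submission` are also hypothetical.

```python
import csv
import io


def build_submission_rows(split_rows, predictions):
    """Return [(index, result)] for the test patients, in split-file order.

    split_rows  : list of (patient_id, subset) pairs in file order (assumed layout)
    predictions : dict mapping patient_id -> "POSITIVE" or "NEGATIVE"
    """
    encode = {"NEGATIVE": 0, "POSITIVE": 1}
    test_ids = [pid for pid, subset in split_rows
                if subset.strip().upper() == "TEST"]
    missing = [pid for pid in test_ids if pid not in predictions]
    if missing:
        raise KeyError(f"no prediction for test patients: {missing}")
    # The index is a running counter, not the patient ID.
    return [(i, encode[predictions[pid]]) for i, pid in enumerate(test_ids)]


def write_submission(rows, handle, header=("index", "Result")):
    """Write the rows as CSV; the header names are assumptions."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Made-up split rows and predictions, for illustration only.
split_rows = [(7, "TRAIN"), (12, "TEST"), (3, "TEST"), (40, "TRAIN")]
predictions = {3: "NEGATIVE", 12: "POSITIVE"}

rows = build_submission_rows(split_rows, predictions)
buffer = io.StringIO()
write_submission(rows, buffer)
print(buffer.getvalue())
```

Building the rows from the split file's own order, rather than sorting by patient ID, is what keeps the Result column aligned with the expected test rows.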
The evaluation metric is Accuracy.
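For reference, accuracy is the fraction of predictions that match the true labels. The short sketch below uses made-up labels and a hypothetical helper named `accuracy`; how the 18 test patients are divided between the Public and Private leaderboards is not stated, so the step size shown applies only to the full test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels (0 or 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: 4 of the 6 predictions are correct.
y_true = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]
print(accuracy(y_true, y_pred))

# With 18 test patients, one changed prediction moves accuracy
# over the full test set by 1/18.
print(round(1 / 18, 3))  # 0.056
```

Because each test patient carries this much weight, small differences between leaderboard scores should be read with caution.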
The leaderboard will be split into Public and Private leaderboards. The preliminary results that determine who advances to the final evaluation stage will come from the Private Leaderboard, which will be revealed at the end of the competition period. Again, please refer to the Rules section, especially rules 7 to 11, in order to understand the particular evaluation criteria for this challenge.
Please note that the Public Leaderboard is mostly for your reference, as it represents only an approximate assessment of a model's performance. The final score may deviate substantially from this score. Any attempt to "game" or artificially inflate the Public Leaderboard score will bring no benefit to the final score and could end in disqualification if the model is found to be purposely overfitting the Public Leaderboard test data.
We wish you good luck in this challenge. If you have questions, please refer to the FAQ or the Rules, or send an email to [email protected].
FAQs
Rules
- This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms to participate.
- Users can make a maximum of 2 submissions per day. Users who want to submit new files after making 2 submissions in a day will have to wait until the following day to do so. Please keep this in mind when uploading a submission file. Any attempt to circumvent stated limits will result in disqualification.
- The use of external datasets is strictly forbidden. However, we encourage the creative use of derivative datasets, such as calculated features, synthetic training data and other data augmentation techniques.
- Uploading the competition dataset to other websites is not allowed. Users who do not comply with this rule will be immediately disqualified.
- The final submission has to be selected manually before the end of the competition (you can select up to 2); otherwise, the final submission will be selected automatically based on your highest public score.
- If at the time of the end of the competition two or more participants have the same score on the private leaderboard, the participant who submitted the winning file first will be considered for the following review stage.
- Once the competition period ends, our team will reach out to top scorers based on the Private Leaderboard score, which will be revealed at this point. Top scorers will be asked to provide the following information by September 16th, 2024, to qualify for the final review stage. Failure to provide this information may result in disqualification.
a. All source files required to preprocess the data
b. All source files required to build, train and make predictions with the model using the processed data
c. A requirements.txt (or equivalent) file indicating all the required libraries and their versions as needed
d. A ReadMe file containing the following:
• Clear and unambiguous instructions on how to reproduce the predictions from start to finish includi
• Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any required environment configurations required to execute the training code.
• Clear answers to the following questions:
- Which data files are being used to train the model?
- How is the training dataset prepared, including derivative data?
- What is the algorithm used and what are its main hyperparameters?
- Any other comments considered relevant to understanding, replicating and using the model.
- The submitted solution should be able to generate exactly the same model and the same inferencing output that gives the corresponding score on the leaderboard. If the score obtained from the code is different from what’s shown on the leaderboard, the new score will be used for the final rankings unless a logical explanation is provided. Please make sure to set the seed or random state appropriately so we can obtain the same result from your code.
- To ensure fairness and integrity in the competition, participants are prohibited from exploiting any non-statistical patterns, anomalies, or other data artifacts that may exist within the dataset. Any attempt to identify and utilize such patterns, which do not derive from legitimate model analysis or generalization but rather from specific quirks or errors in the data collection or labeling process, will result in immediate disqualification. This rule is intended to maintain a level playing field and ensure that model performance is based solely on genuine predictive ability rather than incidental characteristics of the data.
- The submitted models must be capable of performing inference efficiently on standard consumer-grade hardware, such as a tablet or similar mobile device, within a reasonable time frame, typically less than a minute. This requirement ensures that the models are not only accurate but also practical and scalable for real-world applications where resources may be limited.
- Given the particularly small size of the data for this competition, additional measures are required to ensure fairness and integrity. To be considered eligible for winning a prize, participants must meet the following criteria, in order:
a. All the required information must have been provided (see rule 7).
b. The scores on the private and public leaderboards must be reproducible (see rule 8).
c. The model must be usable for inference on modern consumer-grade hardware within a reasonable time frame (see rule 10).
d. The bitgrit and NASA team will calculate an Internal Score by performing Cross Validation and similar experiments, including testing with different random seeds and stratified data splits, using the provided model and the same size of test data. If the Internal Score deviates by more than 10% from the Overall Score (Private + Public Scores), only the Internal Score will be considered the Final Score. If the results are consistent, the Final Score will be calculated as the average of the Overall Score and the Internal Score.
e. The final ranking will be determined based on the Final Score, and winners will be awarded according to this ranking.
f. Any evidence of exploiting non-statistical patterns, data artifacts, or any other anomalies that do not derive from legitimate model generalization will result in immediate disqualification (see rule 9).
- Competition prizes will only be awarded after the receipt, successful execution, and confirmation of the validity, integrity, and consistency of both the code and the solution (see rules 7, 8, 9), along with the final challenge score calculation.
- In order to be eligible for the prize, the competition winner must agree to transfer to NASA and the relevant transferee of rights in such Competition all transferable rights, such as copyrights, rights to obtain patents and know-how, etc. in and to all analysis and prediction results, reports, analysis and prediction model, algorithm, source code and documentation for the model reproducibility, etc., and the Submissions contained in the Final Submissions.
- Any prize awards are subject to eligibility verification and compliance with these Terms of Participation. All decisions of bitgrit will be final and binding on all matters relating to this Competition.
- Payments to winners may be subject to local, state, federal and foreign tax reporting and withholding requirements.
- If you have any inquiries about this competition, please don’t hesitate to reach out to us at [email protected].
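The Final Score rule described above (Internal Score compared with Overall Score) can be illustrated with a short sketch. It rests on two assumptions that the rules do not settle: that the Overall Score is already a single number on the same scale as the Internal Score, and that "deviates by more than 10%" means a relative difference with respect to the Overall Score (it could instead mean 10 percentage points). The function name `final_score` and the example values are hypothetical.

```python
def final_score(overall, internal, tolerance=0.10):
    """Combine Overall and Internal scores under the assumed reading of the rule.

    Assumption: deviation is measured relative to the Overall Score.
    """
    if overall <= 0:
        raise ValueError("overall score must be positive for a relative comparison")
    deviation = abs(internal - overall) / overall
    if deviation > tolerance:
        # Scores are inconsistent: only the Internal Score counts.
        return internal
    # Scores are consistent: average the two.
    return (overall + internal) / 2


# Made-up scores: relative deviation of about 22%, so the Internal Score is used.
print(final_score(0.90, 0.70))

# Made-up scores: relative deviation of 5%, so the two scores are averaged.
print(final_score(0.80, 0.76))
```

Under this reading, a high leaderboard score that does not hold up under the organizers' cross-validation is replaced entirely by the Internal Score.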
Non-Disclosure Agreement (NDA)
An agreement to not reveal the information shared regarding this competition to others.
- This Non-Disclosure Agreement (“Agreement”) is hereby entered into on 26th January 2025 (“Effective Date”) between you (“Participant”), as a participant in The NASA Breath Diagnostics Challenge (the “Competition”) hosted at bitgrit.net (the “Competition Site”), and bitgrit Inc. (“Bitgrit”).
- Purpose: This Agreement aims to protect information disclosed by Bitgrit to Participant (the “Purpose”).
- Confidential Information: (1) Confidential Information shall mean any and all information disclosed by Bitgrit to the Participant with regard to the entry and participation in the Competition, including (i) metadata, source code, object code, firmware etc. and, in addition to these, (ii) analyses, compilations or any other deliverable produced by the Participant in which such disclosed information is utilized or reflected. (2) Confidential Information shall not include information which: (a) is now or hereafter becomes, through no act or omission on the part of the Participant, generally known or available to the public, or, in the present or into the future, enters the public domain through no act or omission by the Participant; (b) is acquired by the Participant before receiving such information from Bitgrit and such acquisition was without restriction as to the use or disclosure of the same; (c) is hereafter rightfully furnished to the Participant by a third party, without restriction as to use or disclosure of the same.
- Non-Disclosure Obligation: The Participant agrees: (a) to hold Confidential Information in strict confidence; (b) to exercise at least the same care in protecting Confidential Information from disclosure as the Participant uses with regard to its own confidential information; (c) not to use any Confidential Information except as it concerns the Purpose elaborated upon above; (d) not to disclose such Confidential Information to third parties; (e) to inform Bitgrit if it becomes aware of an unauthorized disclosure of Confidential Information.
- No Warranty: All Confidential Information is provided “as is.” None of the Confidential Information shall contain any representation, warranty, assurance, or integrity by Bitgrit to the Participant of any kind.
- No Granting of Rights: The Participant agrees that nothing contained in this Agreement shall be construed as conferring, transferring or granting any rights to the Participant, by license or otherwise, to use any of the Confidential Information.
- No Assignment: Participant shall not assign, transfer or otherwise dispose of this Agreement or any of its rights, interest or obligations hereunder without the prior written consent of Bitgrit.
- Injunctive Relief: In the event of a breach or the possibility of breach of this Agreement by the Participant, in addition to any remedies otherwise available, Bitgrit shall be entitled to seek injunctive relief or equitable relief, as well as monetary damages.
- Return/Destruction of the Confidential Information: (1) On the request of Bitgrit, the Participant shall promptly, in a manner specified by Bitgrit, return or destroy the Confidential Information along with any copies of said information. (2) Bitgrit may request the Participant to submit documentation to confirm the destruction of said Confidential Information to Bitgrit in the event that Bitgrit requests the Participant to destroy this Confidential Information, pursuant to the provision of the preceding paragraph.
- Term: The obligations with respect to the Confidential Information under this Agreement shall survive for a period of three (3) years after the effective date. Provided however, if the Confidential Information could be considered to fall under the category of “Trade Secret” of Bitgrit or any related third parties, this Agreement is to remain effective relative to that information for as far as the said information is regarded as Trade Secret under applicable laws and regulations. If the Confidential Information contains personal information, the terms of this Agreement shall remain effective on that information permanently.
- Governing Law: This Agreement shall be governed by and construed and interpreted under the laws of Japan without reference to its principles governing conflicts of laws.