
The NASA Breath Diagnostics Challenge

Enhance NASA's E-Nose for Accurate Medical Diagnostics


Welcome to The NASA Breath Diagnostics Challenge!

The National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) seeks innovative solutions to improve the accuracy of its NASA E-Nose as a potential clinical tool that would measure the molecular composition of human breath to provide diagnostic results. We invite data scientists and AI experts to participate in this challenge, leveraging their expertise to develop a classification model that can accurately discriminate between the breath of COVID-positive and COVID-negative individuals, using data obtained from a recent clinical study. The total prize pool for this competition is $55,000.

The objective of this challenge is to develop a diagnostic model by using NASA E-Nose data gathered from exhaled breath of 63 volunteers in a COVID-19 study.  Challenge participants will use advanced data preparation and AI techniques to overcome the limited sample size of subjects in the COVID-19 study. The innovative solutions emerging from this challenge may assist NASA in advancing the technical capability of the NASA E-Nose for a wide range of clinical applications relevant to human space exploration.

  • 1st Place - $20,000
  • 2nd Place - $12,000
  • 3rd Place - $8,000
  • 4th Place - $4,000
  • 5th Place - $4,000
  • 6th Place - $3,000
  • 7th Place - $2,000
  • 8th Place - $2,000

Competition Starts – July 5th, 2024
Competition Ends – September 6th, 2024
Winners Announcement – November 8th, 2024

Data Breakdown

IMPORTANT: This competition has a unique structure for determining which models are eligible to win. It is extremely important that participants read the rules carefully before continuing. In particular, sections 8-10 should be very well understood before embarking on this challenge.


The objective of this challenge is to develop a classification model that can accurately diagnose patients with COVID-19 based on data that was captured using NASA's own E-Nose device.

The total number of patients, and therefore examples, is 63. This is a very limited dataset, so making efficient use of the provided data is extremely important in this challenge. We encourage creativity in dealing with the data in order to make the best use of it.
For the same reason, it is very important to understand the rules governing this event and how it will ultimately be scored, to ensure that the best models win.
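One common way to squeeze signal out of so few labeled examples is cross-validation. The sketch below is an illustration only, not part of the official materials: it uses scikit-learn's StratifiedKFold on synthetic stand-in features, since the real sensor files are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic stand-in: 45 "patients" with 10 summary features each.
# A real pipeline would extract features from the 64 sensor time series.
X = rng.normal(size=(45, 10))
y = np.array([0, 1] * 22 + [0])  # 45 stand-in labels

# With only 45 labeled patients, stratified k-fold keeps the class
# balance similar in every fold, stabilizing the accuracy estimate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in cv.split(X, y):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print(f"mean CV accuracy over 5 folds: {np.mean(scores):.3f}")
```

On the real data, the fold-to-fold spread of these scores is as informative as the mean: with so few patients, a high mean with high variance is a warning sign of overfitting.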

The data consists of 63 txt files representing the 63 patients, numbered 1 to 63.
Each file contains the Patient ID, the COVID-19 diagnosis result (POSITIVE or NEGATIVE), and numeric measurements for 64 sensors, D1 to D64. These sensors are evenly distributed inside the E-Nose device, and they measure different biochemical signals that can be present in the breath of the patients.

All sensor data is indexed by a timestamp with the format Min:Sec, which represents the minute of the hour, and the second of that minute in which that sensor was sampled. The hour is left out, but when the minute counter resets, it means that the next hour has begun. Keep this in mind when working with this time axis.
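The rollover logic described above can be sketched as a small helper (hypothetical, assuming the stamps arrive in chronological order): whenever the minute counter decreases, a new hour has begun.

```python
def to_elapsed_seconds(stamps):
    """Convert Min:Sec stamps to seconds elapsed since the first sample,
    inferring hour rollovers whenever the minute counter decreases."""
    elapsed, hours, prev_min = [], 0, None
    for stamp in stamps:
        minute, sec = (int(part) for part in stamp.split(":"))
        if prev_min is not None and minute < prev_min:
            hours += 1  # minute counter reset => a new hour began
        prev_min = minute
        elapsed.append(hours * 3600 + minute * 60 + sec)
    start = elapsed[0]
    return [t - start for t in elapsed]

print(to_elapsed_seconds(["59:58", "59:59", "00:01", "00:02"]))
# -> [0, 1, 3, 4]
```

Converting to a monotonically increasing time axis like this avoids subtle bugs when slicing the series into the exposure windows described below.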

To achieve maximum consistency across patients, breath samples were presented to the E-Nose device using a pulsation bag that had previously collected a patient's breath. The E-Nose device also reads an ambient-air signal that can be used to normalize the exposed breaths.
For all patients, the breath was presented to the E-Nose device in windows of exposure, following this process:

1. 5 min baseline measurement using ambient air
2. 1 min breath sample exposure and measurement, using the filled breath bag
3. 2 min sensor “recovery” using ambient air
4. 1 min breath sample exposure and measurement, using the filled breath bag
5. 2 min sensor “recovery” using ambient air
6. 1 min breath sample exposure and measurement, using the filled breath bag
7. 2 min sensor “recovery” using ambient air

Total time = 14 mins
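The schedule above can be expressed as cumulative boundaries, which makes it easy to label each sample with its protocol phase. A minimal sketch, assuming a time axis measured in seconds from the start of the 14 min run:

```python
# Cumulative phase boundaries in seconds, following the protocol above:
# 5 min baseline, then three (1 min exposure + 2 min recovery) cycles.
PHASES = [
    (300, "baseline"),
    (360, "exposure"), (480, "recovery"),
    (540, "exposure"), (660, "recovery"),
    (720, "exposure"), (840, "recovery"),
]

def phase_at(t_seconds):
    """Return the protocol phase for a time offset within the 14 min run."""
    for end, name in PHASES:
        if t_seconds < end:
            return name
    raise ValueError("timestamp beyond the 14 min protocol")

print(phase_at(0), phase_at(310), phase_at(839))
# -> baseline exposure recovery
```

Segmenting each sensor trace by phase (e.g., exposure response relative to the ambient-air baseline) is one natural way to turn the raw time series into features.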

The data is distributed like this:
Train: 45 patients
Test: 18 patients

The dataset also includes 2 other files: submission_example.csv and train_test_split_order.csv. The first file shows the format that all submission files should follow. The values should be 0 for NEGATIVE and 1 for POSITIVE.
Failing to follow this format will result in an error or a lower score.

The second file (train_test_split_order.csv) indicates which patient IDs belong to Train (i.e., are labeled) and which belong to Test (not labeled), and the order in which they should appear in the submission files.

The order of the predictions in the submission file should match the order of the TEST rows in the train_test_split_order.csv file.
The index column of the submission file is NOT the patient ID; rather, the order of values in the Result column should follow the order in the train_test_split_order.csv file. This is very important.
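To illustrate the ordering requirement, here is a hypothetical helper. The column names "Patient_ID" and "Split" are assumptions for the sketch and must be checked against the actual train_test_split_order.csv before use:

```python
import csv
import io

def build_submission(order_csv_text, predictions):
    """Order predictions by the TEST rows of the split-order file.

    `predictions` maps patient ID -> 0 (NEGATIVE) or 1 (POSITIVE).
    Column names ("Patient_ID", "Split") are assumptions; verify them
    against the real train_test_split_order.csv.
    """
    rows = csv.DictReader(io.StringIO(order_csv_text))
    test_ids = [int(r["Patient_ID"]) for r in rows if r["Split"] == "TEST"]
    # The submission index is just the row order, NOT the patient ID.
    return [(i, predictions[pid]) for i, pid in enumerate(test_ids)]

order_text = "Patient_ID,Split\n1,TRAIN\n2,TEST\n3,TRAIN\n4,TEST\n"
print(build_submission(order_text, {2: 1, 4: 0}))
# -> [(0, 1), (1, 0)]
```

The key point is that predictions are emitted in the file's TEST-row order, with a fresh 0-based index, regardless of the patient IDs themselves.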

The evaluation metric is Accuracy.
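For reference, accuracy is simply the fraction of predictions that match the true labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred), "label lists must align"
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))  # -> 0.5
```

Note that with only 18 test patients, each prediction moves the score by roughly 5.6 percentage points, so small leaderboard differences are not very meaningful.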

The leaderboard is split into a Public and a Private leaderboard. The preliminary results used to advance to the final evaluation stage will be determined by the Private Leaderboard, which will be revealed at the end of the competition period.
Again, please refer to the rules, in particular sections 8 to 10, to understand the evaluation criteria specific to this challenge.

Please note that the Public Leaderboard is mostly for reference, as it represents only a very rough assessment of a model's performance. The final score may deviate substantially from this score.
Any attempt to "game" or artificially inflate this score will provide no benefit to the final score and could end in disqualification if the model is found to be purposely overfitting this value.

We wish you good luck in this challenge. If you have questions, please refer to the FAQ or Rules, or send an email to [email protected].

Who do I contact if I need help regarding a competition?
For any inquiries, please contact us at [email protected]
How will I know if I've won?
If you are one of the top three winners for this competition, we will email you with the final result and information about how to claim your reward.
How can I report a bug?
Please shoot us an email at [email protected] with details and a description of the bug you are facing, and if possible, please attach a screenshot of the bug itself.
If I win, how can I receive my reward?
Prizes will be paid by bank transfer. If for some reason you are not able to accept payment by bank transfer, please let us know and we will do our best to accommodate your needs.

1. Note: The challenge is open to individual participants and teams. Prize eligibility is subject to the United States Federal Acquisition Regulation. For more information, visit:

2. This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms to participate.

3. Users can make a maximum number of 2 submissions per day. If users want to submit new files after making 2 submissions in a day, they will have to wait until the following day to do so. Please keep this in mind when uploading a submission.csv file. Any attempt to circumvent stated limits will result in disqualification.

4. The use of external datasets is strictly forbidden.

5. It is not allowed to upload the competition dataset to other websites. Users who do not comply with this rule will be immediately disqualified.

6. The final submission has to be selected manually before the end of the competition (you can select up to 2), or else it will be selected automatically based on your highest public score.

7. If at the time of the end of the competition two or more participants have the same score on the private leaderboard, the participant who submitted the winning file first will be considered for the following review stage.

8. A competition prize will be awarded after we have received, successfully executed, and confirmed the validity of both the code and the solution (see 9), and calculated the final challenge score (see 10).
Once the competition period ends, our team will reach out to top scorers based on the Private Leaderboard score, which will be revealed at this point. Top scorers will be asked to provide the following information by September 16th, 2024 to qualify for the final review stage. Failure to provide this information may result in disqualification.
a. All source files required to preprocess the data 
b. All source files required to build, train and make predictions with the model using the processed data 
c. A requirements.txt (or equivalent) file indicating all the required libraries and their versions as needed 
d. A ReadMe file containing the following: 
     • Clear and unambiguous instructions on how to reproduce the predictions from start to finish including data pre-processing, feature extraction, model training, and predictions generation 
     • Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any required environment configurations required to execute the code 
     • Clear answers to the following questions: 
          - Which data files are being used? 
          - How are these files processed? 
          - What is the algorithm used and what are its main hyperparameters? 
          - Any other comments considered relevant to understanding and using the model

9. The submitted solution should be able to generate exactly the same output that produced the corresponding score on the leaderboard. If the score obtained from the code differs from what is shown on the leaderboard, the new score will be used for the final rankings unless a logical explanation is provided. Please make sure to set the seed or random state appropriately so we can obtain the same result from your code.
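A minimal sketch of seed fixing in Python (covering the standard library and NumPy; frameworks such as PyTorch or TensorFlow require their own seeding calls on top of this):

```python
import os
import random

import numpy as np

def set_global_seed(seed=42):
    """Fix the common sources of randomness so reruns reproduce results.
    Deep-learning frameworks (PyTorch, TensorFlow) need their own
    seed calls and determinism flags in addition to these."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

set_global_seed(42)
a = np.random.rand(3)
set_global_seed(42)
b = np.random.rand(3)
print(np.array_equal(a, b))  # -> True
```

Passing the same seed through to any model constructors (e.g., a `random_state` argument) is equally important for end-to-end reproducibility.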

10. Given the particularly small size of the data for this competition, additional measures have been put in place to determine eligibility for winning a prize.
To be considered a winner in this competition, the following criteria should be met:
    1. All the required information should have been provided (see 8).
    2. The scores on the private and public leaderboards should be reproducible (see 9).
    3. The bitgrit and NASA teams will perform internal scoring using cross-validation and similar strategies with the provided model, using the same test size, to ensure that the model's results are consistent. If the results are inconsistent with the score on the leaderboard (>10% difference between the Overall Score (Private + Public Scores) and the Internal Score), then only the Internal Score will be considered as the Final Score. If the results are consistent, the Final Score will be calculated as the average of the Overall Score and the Internal Score.
    4. The final ranking will be compiled using the Final Score, and the winners will be awarded per this ranking.

11. In order to be eligible for the prize, the competition winner must agree to transfer to NASA and the relevant transferee of rights in such Competition all transferable rights, such as copyrights, rights to obtain patents and know-how, etc. in and to all analysis and prediction results, reports, analysis and prediction model, algorithm, source code and documentation for the model reproducibility, etc., and the Submissions contained in the Final Submissions.

12. Any prize awards are subject to eligibility verification and compliance with these Terms of Participation. All decisions of bitgrit will be final and binding on all matters relating to this Competition.

13. Payments to winners may be subject to local, state, federal and foreign tax reporting and withholding requirements.

14. If you have any inquiries about this competition, please don’t hesitate to reach out to us at [email protected].
