
TD Hospital Exploration

Introduction

Automated decision-making systems are now ubiquitous in our lives, so it is important to understand how the models behind them work. Modern AI and machine learning models are especially hard to interpret: it is often unclear why they produce a given outcome. Addressing this "black box" problem will give people more confidence in using these models.

Description

The TD Hospital has come to you regarding a problem with their patient data. Over its many years of operation, the hospital's data has accrued flaws, but no one working there now knows exactly what those flaws are. They want your help to fix the data and understand what information it contains.

  • You are given a dataset for a specific group of patients which has patient data and whether or not the patient survived.
  • You want to use this dataset to understand what contributes to survival.
  • The dataset has inherent flaws that you may want to change or remove.
  • You are also provided with example code that takes in the data and trains a model.
  • You can use this code for a baseline submission to get up and running.

You will need to create a presentation to give to the hospital explaining:

Data

  • What modifications you made.
  • How you made them.
  • Why you chose to do so.

Predictions

  • What type of model you used, if you chose one other than the model in the starter code (for example random forests, SVMs, regression, or other types of neural networks).
  • Why you chose another model or changed the starter code, if you did so.

The objective is to interpret the data and communicate your understandings in an effective manner.

Data

Your dataset is given to you as a CSV file. The first row contains the column titles, and each subsequent row contains the information for one patient. You are also provided with a document explaining what data the hospital has collected. Be careful with the information given, and decide carefully whether and how it should be used. Make sure to do your research!
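Before changing anything, it can help to audit the file for the kinds of flaws mentioned above. The following is a minimal sketch, assuming pandas is installed. The column names (age, blood_pressure, sex, survived) and the sample values are hypothetical stand-ins, not the hospital's actual columns; with the real data, pass the path of the CSV file to pd.read_csv instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the hospital CSV. Column names and values are
# invented for illustration only; they are not the real dataset.
SAMPLE_CSV = io.StringIO(
    "age,blood_pressure,sex,survived\n"
    "63,120,M,1\n"
    "-5,,F,0\n"       # impossible age and a missing blood pressure
    "47,135,male,1\n"  # inconsistent category spelling ("male" vs "M")
    "63,120,M,1\n"    # exact duplicate of the first patient row
)


def audit(df: pd.DataFrame) -> dict:
    """Collect simple data-quality indicators for a patient table."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


df = pd.read_csv(SAMPLE_CSV)
report = audit(df)
print(report)

# Listing the distinct values of a categorical column exposes
# inconsistent spellings that should probably be merged.
print(df["sex"].value_counts(dropna=False))
```

A report like this does not decide what to fix. It only points to the columns and rows worth checking against the hospital's data description.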

Quick Start

We will provide you with starter code and a quick video on how to set up your environment. We also provide pointers on how to train a simple model using the starter code and how to submit for grading. Be sure to check out our workshop!

  • Set up Python on Mac: Link

  • Set up Python on Windows: Link

  • Look through our starter code

  • Workshop Links and Feature Data Explanation: Link

Train a model using the data:

You will need to train a model to predict patient survival, using either the starter code given to you or another type of model. The predictions from this model will be submitted and scored for accuracy.
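Whichever model you train, it may be worth holding out part of the data and comparing your score with a trivial baseline, so you know whether the model has learned anything. The following is a minimal sketch using only the Python standard library. It is not part of the starter code: the helper names, the 80/20 split, and the label values are all hypothetical.

```python
import random
from collections import Counter


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def majority_label(labels):
    """Return the most common label, used as a constant prediction."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labels only (1 = survived, 0 = died). Real rows would also
# carry the patient features used by the model.
labels = [1] * 70 + [0] * 30
train, validation = train_validation_split(labels)

# The baseline always predicts the majority class seen in training.
baseline = majority_label(train)
baseline_accuracy = sum(1 for y in validation if y == baseline) / len(validation)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

A model whose validation accuracy is not clearly above this baseline is probably not using the patient features in a meaningful way.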

Make a submission:

Once we open up our grading platform, "attorney", be sure to follow the instructions below.

  • Download the .whl file: Link
  • Install it by running the command "pip install [path_to_file]".
  • Open a second terminal so that you have two open.
  • In the first terminal, run your starter submission code using the command "python3 [name_of_python_file]".
  • In the second terminal, run the command "attorney" and fill out the prompts it gives you.
  • The port number for attorney is the port number in your starter submission code. If the port is taken, change it to a different number and try again.

Be sure to submit to Devpost and detail everything that you want the judges to see!

Judging

You will be judged primarily on your handling of the data and your explanations, so your score depends on how well you detail your decisions. Your score will also be affected by things like your model's predictive power on patient survival, its Type I and Type II errors (false positives and false negatives), and its complexity.
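To make the error types concrete, the sketch below counts Type I errors (false positives) and Type II errors (false negatives) for a binary survival prediction. It assumes that "survived" is the positive class, encoded as 1; the challenge does not state which convention the graders use. The labels are hypothetical, not real patient data.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 means the patient survived."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # Type I error: predicted survival, patient died
        else:
            counts["fn"] += 1  # Type II error: predicted death, patient survived
    return counts


# Hypothetical labels for eight patients (assumed encoding: 1 = survived).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])
false_negative_rate = counts["fn"] / (counts["fn"] + counts["tp"])

print(counts)
print(accuracy, false_positive_rate, false_negative_rate)
```

Two models with the same accuracy can differ sharply in these two rates, which is worth discussing in the presentation, since the two kinds of mistake are unlikely to matter equally to a hospital.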

Resources

  • Super helpful for understanding everything machine learning related: Link
  • Principal Component Analysis: Link
  • Data cleaning with pandas and numpy: Link
  • Pandas Library for Python: Link
  • Handling Categorical Data: Link
  • Handling Null Values: Link
  • Getting started with numpy: Link

Prizes

  • Echo Dot
  • Anker Soundcore 2 Portable Bluetooth Speaker
  • $25 Gift Card