YURI MARX GOMES · Jan 9 5m read

Ten real health datasets in just one OEX application

 

 

Hi Community,

After a lot of hard work on data curation and data quality, the Health Dataset application brings ten real health datasets to you.

These datasets can be used in your ML applications/models, AutoML and analytics projects. See more details here:

Installation

  1. Clone or git pull the repo into any local directory:
$ git clone https://github.com/yurimarx/automl-heart.git
  2. Open a Docker terminal in this directory and run:
$ docker-compose build
  3. Run the IRIS container:
$ docker-compose up -d
  4. Run a SELECT on the Heart Disease dataset:
SELECT
age, bp, chestPainType, cholesterol, ekgResults, exerciseAngina, fbsOver120, heartDisease, maxHr, numberOfVesselsFluro, sex, slopeOfSt, stDepression, thallium
FROM dc_data_health.HeartDisease
  5. Run a SELECT on the Kidney Disease dataset:
SELECT
age, al, ane, appet, ba, bgr, bp, bu, cad, classification, dm, hemo, htn, pc, pcc, pcv, pe, pot, rbc, rc, sc, sg, sod, su, wc
FROM dc_data_health.KidneyDisease
  6. Run a SELECT on the Diabetes dataset:
SELECT
Outcome, age, bloodpressure, bmi, diabetespedigree, glucose, insulin, pregnancies, skinthickness
FROM dc_data_health.Diabetes
  7. Run a SELECT on the Breast Cancer dataset:
SELECT
areamean, arease, areaworst, compactnessmean, compactnessse, compactnessworst, concavepointsmean, concavepointsse, concavepointsworst, concavitymean, concavityse, concavityworst, diagnosis, fractaldimensionmean, fractaldimensionse, fractaldimensionworst, perimetermean, perimeterse, perimeterworst, radiusmean, radiusse, radiusworst, smoothnessmean, smoothnessse, smoothnessworst, symmetrymean, symmetryse, symmetryworst, texturemean, texturese, textureworst
FROM dc_data_health.BreastCancer
  8. Run a SELECT on the Maternal Health Risk dataset:
SELECT
BS, BodyTemp, DiastolicBP, HeartRate, RiskLevel, SystolicBP, age
FROM dc_data_health.MaternalHealthRisk
  9. Run a SELECT on the Hospital Mortality dataset:
SELECT
age, aniongap, atrialfibrillation, basophils, bicarbote, bloodcalcium, bloodpotassium, bloodsodium, bmi, chdwithnomi, chloride, copd, creatinekise, creatinine, deficiencyanemias, depression, diabetes, diastolicbloodpressure, ef, gendera, glucose, "group", heartrate, hematocrit, hyperlipemia, hypertensive, inr, lacticaacid, leucocyte, lymphocyte, magnesiumion, mch, mchc, mcv, neutrophils, ntprobnp, outcome, pco2, ph, platelets, pt, rbc, rdw, relfailure, respiratoryrate, spo2, systolicbloodpressure, temperature, ureanitrogen, urineoutput
FROM dc_data_health.HospitalMortality
  10. Run a SELECT on the Life Expectancy dataset:
SELECT
AdultMortality, Alcohol, BMI, Country, Diphtheria, GDP, HIVAIDS, HepatitisB, IncomeCompositionOfResources, InfantDeaths, LifeExpectancy, Measles, PercentageExpenditure, Polio, Population, Schooling, Status, Thinness1To19Years, Thinness5To9Years, TotalExpenditure, UnderFiveDeaths, Year
FROM dc_data_health.LifeExpectancy
  11. Run a SELECT on the Pollution Deaths dataset:
SELECT
Country, CountryCode, DeathYear, ExcessMortality
FROM dc_data_health.PollutionDeaths
  12. Run a SELECT on the Dementia dataset:
SELECT
ASF, Age, CDR, EDUC, Genre, Hand, MMSE, MRDelay, Outcome, SES, Visit, eTIV, nWBV
FROM dc_data_health.Dementia
  13. Run a SELECT on the Hepatitis Death Risk dataset:
SELECT
age, albumin, alkphosphate, anorexia, antivirals, ascites, bilirubin, fatigue, histology, liverbig, liverfirm, malaise, outcome, protime, sex, sgot, spiders, spleenpalpable, steroid, varices
FROM dc_data_health.Hepatitis
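
If you want to pull one of these tables into a Python ML or analytics script, the queries above could be wrapped roughly as follows. This is only a sketch under assumptions: the helper names build_select and fetch_dataset are hypothetical and not part of the application, and conn is assumed to be any Python DB-API 2.0 connection to your IRIS instance (how you open it depends on your driver and is not covered here).

```python
import re

# Accepts plain identifiers, an optional schema prefix (dc_data_health.X),
# and double-quoted names such as "group" from the Hospital Mortality query.
_IDENT = re.compile(r'(?:[A-Za-z_]\w*\.)?(?:[A-Za-z_]\w*|"[A-Za-z_]\w*")')


def build_select(table, columns):
    """Return a SELECT statement for the given table and column names.

    Hypothetical helper: names are validated because they are placed
    directly into the SQL text.
    """
    if not columns:
        raise ValueError("at least one column is required")
    for name in [table, *columns]:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f"SELECT {', '.join(columns)} FROM {table}"


def fetch_dataset(conn, table, columns):
    """Run the query on a DB-API 2.0 connection and return rows as dicts.

    `conn` is an assumption: any object exposing cursor(), execute()
    and fetchall() in the DB-API 2.0 style.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(build_select(table, columns))
        # Pair each returned value with the column name that was requested.
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()
```

For example, fetch_dataset(conn, "dc_data_health.PollutionDeaths", ["Country", "DeathYear", "ExcessMortality"]) would return one dict per row, which is easy to hand to a data frame or a model training step.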

To install with ZPM

The application is packaged with ZPM, so it can be installed with:

zpm "install dataset-health"

Dataset Licenses and sources/credits

  1. MIT License for this Application
  2. CC BY-NC-SA 4.0 License for the Breast Cancer Dataset
  3. CC0: Public Domain for Diabetes Dataset
  4. CC0: Public Domain for Heart Disease
  5. CC0: Public Domain for Maternal Health Risk
  6. CC0: Public Domain for World Life Expectancy
    • Original Source: https://www.kaggle.com/kumarajarshi/life-expectancy-who - The data was collected from the WHO and United Nations websites with the help of Deeksha Russell and Duan Wang.
    • File in the app: /opt/irisapp/data/life_expectancy.csv
    • Persistent Class: dc.data.health.LifeExpectancy
  7. CC0 1.0 Universal (CC0 1.0) Public Domain Dedication for Hospital Mortality
  8. CC0 1.0 Universal (CC0 1.0) Public Domain for Pollution Deaths dataset
  9. Attribution-NonCommercial-ShareAlike 3.0 IGO (CC BY-NC-SA 3.0 IGO) for Dementia dataset
  10. CC0 1.0 Universal (CC0 1.0) Public Domain for Hepatitis Death Risk dataset
  11. CC0: Public Domain for Kidney Disease
    • Original Source: Dua, Dheeru and Graff, Casey (2017). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. University of California, Irvine, School of Information and Computer Sciences.
    • File in the app: /opt/irisapp/data/kidney_disease.csv
    • Persistent Class: dc.data.health.KidneyDisease