Ten real health datasets in just one OEX application



After extensive data curation and data quality work, the Health Dataset application delivers the ten datasets listed below.

You can use these datasets in your ML applications and models, AutoML, and analytics projects. See more details here:


  1. Clone/git pull the repo into any local directory:
     $ git clone
  2. Open a Docker terminal in this directory and run:
     $ docker-compose build
  3. Run the IRIS container:
     $ docker-compose up -d
  4. Run a SELECT on the Heart Disease dataset:
     SELECT age, bp, chestPainType, cholesterol, ekgResults, exerciseAngina, fbsOver120, heartDisease, maxHr, numberOfVesselsFluro, sex, slopeOfSt, stDepression, thallium
     FROM dc_data_health.HeartDisease
  5. Run a SELECT on the Kidney Disease dataset:
     SELECT age, al, ane, appet, ba, bgr, bp, bu, cad, classification, dm, hemo, htn, pc, pcc, pcv, pe, pot, rbc, rc, sc, sg, sod, su, wc
     FROM dc_data_health.KidneyDisease
  6. Run a SELECT on the Diabetes dataset:
     SELECT Outcome, age, bloodpressure, bmi, diabetespedigree, glucose, insulin, pregnancies, skinthickness
     FROM dc_data_health.Diabetes
  7. Run a SELECT on the Breast Cancer dataset:
     SELECT areamean, arease, areaworst, compactnessmean, compactnessse, compactnessworst, concavepointsmean, concavepointsse, concavepointsworst, concavitymean, concavityse, concavityworst, diagnosis, fractaldimensionmean, fractaldimensionse, fractaldimensionworst, perimetermean, perimeterse, perimeterworst, radiusmean, radiusse, radiusworst, smoothnessmean, smoothnessse, smoothnessworst, symmetrymean, symmetryse, symmetryworst, texturemean, texturese, textureworst
     FROM dc_data_health.BreastCancer
  8. Run a SELECT on the Maternal Health Risk dataset:
     SELECT BS, BodyTemp, DiastolicBP, HeartRate, RiskLevel, SystolicBP, age
     FROM dc_data_health.MaternalHealthRisk
  9. Run a SELECT on the Hospital Mortality dataset:
     SELECT age, aniongap, atrialfibrillation, basophils, bicarbote, bloodcalcium, bloodpotassium, bloodsodium, bmi, chdwithnomi, chloride, copd, creatinekise, creatinine, deficiencyanemias, depression, diabetes, diastolicbloodpressure, ef, gendera, glucose, "group", heartrate, hematocrit, hyperlipemia, hypertensive, inr, lacticaacid, leucocyte, lymphocyte, magnesiumion, mch, mchc, mcv, neutrophils, ntprobnp, outcome, pco2, ph, platelets, pt, rbc, rdw, relfailure, respiratoryrate, spo2, systolicbloodpressure, temperature, ureanitrogen, urineoutput
     FROM dc_data_health.HospitalMortality
  10. Run a SELECT on the Life Expectancy dataset:
     SELECT AdultMortality, Alcohol, BMI, Country, Diphtheria, GDP, HIVAIDS, HepatitisB, IncomeCompositionOfResources, InfantDeaths, LifeExpectancy, Measles, PercentageExpenditure, Polio, Population, Schooling, Status, Thinness1To19Years, Thinness5To9Years, TotalExpenditure, UnderFiveDeaths, Year
     FROM dc_data_health.LifeExpectancy
  11. Run a SELECT on the Pollution Deaths dataset:
     SELECT Country, CountryCode, DeathYear, ExcessMortality
     FROM dc_data_health.PollutionDeaths
  12. Run a SELECT on the Dementia dataset:
     SELECT ASF, Age, CDR, EDUC, Genre, Hand, MMSE, MRDelay, Outcome, SES, Visit, eTIV, nWBV
     FROM dc_data_health.Dementia
  13. Run a SELECT on the Hepatitis Death Risk dataset:
     SELECT age, albumin, alkphosphate, anorexia, antivirals, ascites, bilirubin, fatigue, histology, liverbig, liverfirm, malaise, outcome, protime, sex, sgot, spiders, spleenpalpable, steroid, varices
     FROM dc_data_health.Hepatitis
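The SELECT statements above return plain rows. To feed one of these tables into an ML model, you could run the query through any Python DB-API 2.0 connection and split the result into a feature matrix and a label vector. One option for the connection is the InterSystems Python DB-API driver, but connection parameters are not covered here. This is a minimal sketch. `load_xy` and `ASSUMED_LABELS` are hypothetical names. The label column chosen for each table is an assumption inferred from the column names, not something the application documents.

```python
from typing import Any, List, Sequence, Tuple

# ASSUMPTION: label (target) column per table, guessed from the column names
# in the SELECT lists above. The application itself does not define targets.
ASSUMED_LABELS = {
    "dc_data_health.HeartDisease": "heartDisease",
    "dc_data_health.KidneyDisease": "classification",
    "dc_data_health.Diabetes": "Outcome",
    "dc_data_health.BreastCancer": "diagnosis",
    "dc_data_health.MaternalHealthRisk": "RiskLevel",
    "dc_data_health.HospitalMortality": "outcome",
}


def load_xy(conn: Any, table: str, features: Sequence[str], label: str
            ) -> Tuple[List[List[Any]], List[Any]]:
    """Run a SELECT on `table` through a DB-API 2.0 connection and split the
    rows into a feature matrix X and a label vector y.

    Table and column names are interpolated directly into the SQL text, so
    only pass trusted identifiers (such as the ones listed in this README).
    """
    columns = list(features) + [label]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    cur = conn.cursor()
    try:
        cur.execute(sql)
        rows = cur.fetchall()
    finally:
        cur.close()
    X = [list(row[:-1]) for row in rows]  # every selected column except the last
    y = [row[-1] for row in rows]         # the last selected column is the label
    return X, y
```

For example, `X, y = load_xy(conn, "dc_data_health.Diabetes", ["age", "bmi", "glucose", "insulin"], ASSUMED_LABELS["dc_data_health.Diabetes"])`. Pass column names exactly as they appear in the SELECT lists above, including the quoted `"group"` column of the Hospital Mortality table.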

To install with ZPM

The application is packaged with ZPM, so it can also be installed with:

zpm "install dataset-health"

Dataset Licenses and Sources/Credits

  1. MIT License for this application
  2. CC BY-NC-SA 4.0 License for the Breast Cancer dataset
  3. CC0: Public Domain for the Diabetes dataset
  4. CC0: Public Domain for the Heart Disease dataset
  5. CC0: Public Domain for the Maternal Health Risk dataset
  6. CC0: Public Domain for the World Life Expectancy dataset
    • Original source: The data was collected from the WHO and United Nations websites with the help of Deeksha Russell and Duan Wang.
    • File in the app: /opt/irisapp/data/life_expectancy.csv
    • Persistent Class:
  7. CC0 1.0 Universal (CC0 1.0) Public Domain Dedication for the Hospital Mortality dataset
  8. CC0 1.0 Universal (CC0 1.0) Public Domain for the Pollution Deaths dataset
  9. Attribution-NonCommercial-ShareAlike 3.0 IGO (CC BY-NC-SA 3.0 IGO) for the Dementia dataset
  10. CC0 1.0 Universal (CC0 1.0) Public Domain for the Hepatitis Death Risk dataset
  11. CC0: Public Domain for the Kidney Disease dataset
    • Original source:
        @misc{Dua:2019,
          author = "Dua, Dheeru and Graff, Casey",
          year = "2017",
          title = "{UCI} Machine Learning Repository",
          url = "",
          institution = "University of California, Irvine, School of Information and Computer Sciences"
        }
    • File in the app: /opt/irisapp/data/kidney_disease.csv
    • Persistent Class:
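You may prefer to inspect the raw CSV files listed above (/opt/irisapp/data/life_expectancy.csv and /opt/irisapp/data/kidney_disease.csv) rather than querying through SQL. This standard-library sketch reads a file's header and counts its data rows. It assumes the files are comma-separated, UTF-8 encoded, and start with a header row, none of which the application states. `summarize_csv` is a hypothetical helper name.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def summarize_csv(path: str) -> Tuple[List[str], int]:
    """Return (header, number_of_data_rows) for a CSV file.

    ASSUMPTIONS: comma delimiter, UTF-8 encoding, first row is a header.
    Blank lines are not counted as data rows.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])                  # first row, or [] if the file is empty
        n_rows = sum(1 for row in reader if row)   # csv.reader yields [] for blank lines
    return header, n_rows
```

Inside the container, for instance, `header, n = summarize_csv("/opt/irisapp/data/kidney_disease.csv")` lets you compare the CSV header with the columns exposed by the corresponding SQL table.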
