YURI MARX GOMES · Jan 5

InterSystems IRIS Open Datasets to Predict Important Diseases

According to the WHO, the top global causes of death, in order of total number of lives lost, are associated with three broad topics (source: WHO):

  1. Cardiovascular (ischaemic heart disease, stroke),
  2. Respiratory (chronic obstructive pulmonary disease, lower respiratory infections) and
  3. Neonatal conditions – which include birth asphyxia and birth trauma, neonatal sepsis and infections, and preterm birth complications.

I created an application that provides real data (without personal data) for some of the top 10 disease scenarios identified by the WHO. The datasets in this application are:

  • Diabetes dataset: data to predict diabetes diagnosis
  • Heart Disease: data to predict heart disease
  • Kidney Disease: data to predict kidney disease
  • Breast Cancer: data to predict breast cancer
  • Maternal Health Risk: data to predict maternal health risk level

To download and install the application, go to

Follow these instructions:

1. Clone/git pull the repo into any local directory

$ git clone

2. Open a Docker terminal in this directory and run:

$ docker-compose build

3. Run the IRIS container:

$ docker-compose up -d

4. Run a SELECT on the HeartDisease dataset:

SELECT age, bp, chestPainType, cholesterol, ekgResults, exerciseAngina, fbsOver120, heartDisease, maxHr, numberOfVesselsFluro, sex, slopeOfSt, stDepression, thallium
FROM dc_data_health.HeartDisease

5. Run a SELECT on the Kidney Disease dataset:

SELECT age, al, ane, appet, ba, bgr, bp, bu, cad, classification, dm, hemo, htn, pc, pcc, pcv, pe, pot, rbc, rc, sc, sg, sod, su, wc
FROM dc_data_health.KidneyDisease

6. Run a SELECT on the Diabetes dataset:

SELECT Outcome, age, bloodpressure, bmi, diabetespedigree, glucose, insulin, pregnancies, skinthickness
FROM dc_data_health.Diabetes

7. Run a SELECT on the Breast Cancer dataset:

SELECT areamean, arease, areaworst, compactnessmean, compactnessse, compactnessworst, concavepointsmean, concavepointsse, concavepointsworst, concavitymean, concavityse, concavityworst, diagnosis, fractaldimensionmean, fractaldimensionse, fractaldimensionworst, perimetermean, perimeterse, perimeterworst, radiusmean, radiusse, radiusworst, smoothnessmean, smoothnessse, smoothnessworst, symmetrymean, symmetryse, symmetryworst, texturemean, texturese, textureworst
FROM dc_data_health.BreastCancer

8. Run a SELECT on the Maternal Health Risk dataset:

SELECT BS, BodyTemp, DiastolicBP, HeartRate, RiskLevel, SystolicBP, age
FROM dc_data_health.MaternalHealthRisk
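The same queries can also be run from Python. The following is a minimal, hypothetical sketch, not part of the application: it assumes you already have a connection object that follows the Python DB-API 2.0 standard (PEP 249), for example one opened with an InterSystems IRIS Python driver. The driver, its connection parameters, and the `load_dataset` helper name are all assumptions. The helper runs a SELECT against one of the `dc_data_health` tables and returns each row as a dictionary keyed by column name.

```python
"""Hypothetical helper: load one dc_data_health table via any DB-API 2.0 connection."""
from typing import Any, Dict, List, Sequence

# Table names as listed in the steps above (the helper itself is illustrative only).
HEALTH_TABLES = {
    "HeartDisease",
    "KidneyDisease",
    "Diabetes",
    "BreastCancer",
    "MaternalHealthRisk",
}


def load_dataset(conn: Any, table: str, columns: Sequence[str]) -> List[Dict[str, Any]]:
    """Run SELECT <columns> FROM dc_data_health.<table> and return rows as dicts.

    `conn` is assumed to be a PEP 249 (DB-API 2.0) connection. Identifiers cannot
    be bound as query parameters, so the table name is checked against a whitelist
    and each column name must be a plain identifier before it goes into the SQL.
    """
    if table not in HEALTH_TABLES:
        raise ValueError(f"unknown table: {table}")
    for col in columns:
        if not col.isidentifier():
            raise ValueError(f"invalid column name: {col}")
    sql = f"SELECT {', '.join(columns)} FROM dc_data_health.{table}"
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # cursor.description has one entry per result column; item 0 is its name.
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()
```

For example, `load_dataset(conn, "Diabetes", ["Outcome", "age", "glucose", "bmi"])` would return one dictionary per row of the Diabetes table, which is a convenient shape for feeding into most Python machine learning tooling.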

These datasets can be used in AutoML/machine learning applications to support the diagnosis of breast cancer, heart disease, kidney disease and diabetes (as support only: a diagnosis by a human doctor is mandatory).
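Before training an AutoML or machine learning model on one of these tables, a common sanity check is to hold out a test set and measure a trivial baseline. The sketch below illustrates that idea with the Python standard library only. It is not part of the application, and `train_test_split` here is a small stand-in written for this example, not scikit-learn's function of the same name. It assumes rows are dictionaries keyed by column name, and that the label column is one such as `Outcome` (Diabetes) or `diagnosis` (BreastCancer), which is inferred from the column lists above rather than documented.

```python
"""Illustrative baseline: seeded holdout split plus a majority-class predictor."""
import random
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def train_test_split(rows: Sequence[Row], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows with a fixed seed and split off a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    if len(shuffled) < 2:
        raise ValueError("need at least two rows to split")
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row on each side of the split.
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline_accuracy(train: Sequence[Row], test: Sequence[Row],
                               label: str) -> Tuple[Any, float]:
    """Predict the most common training label for every test row.

    Returns (predicted label, accuracy on the test set). A trained model
    should clearly beat this number to be worth using for decision support.
    """
    if not train or not test:
        raise ValueError("train and test sets must be non-empty")
    majority, _ = Counter(row[label] for row in train).most_common(1)[0]
    correct = sum(1 for row in test if row[label] == majority)
    return majority, correct / len(test)
```

On an imbalanced table, the majority-class accuracy can look deceptively high, which is exactly why it is worth computing before comparing any AutoML results.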

