This is my introduction to a series of posts explaining how to create an end-to-end Machine Learning system.

Starting with one problem

Our IRIS Development Community has several posts that are untagged or wrongly tagged. As the number of posts keeps growing, the organization
of each tag and the experience of any community member browsing the subjects tend to degrade.

First solutions in mind

We can think of some usual solutions for this scenario, like:


Last week, we announced the InterSystems IRIS Data Platform, our new and comprehensive platform for all your data endeavours, whether transactional, analytics or both. We've included many of the features our customers know and love from Caché and Ensemble, but in this article we'll shed a little more light on one of the new capabilities of the platform: SQL Sharding, a powerful new feature in our scalability story.


This is the third post of a series explaining how to create an end-to-end Machine Learning system.

Training a Machine Learning Model

When you work with machine learning, it is common to hear this word: training. Do you know what training means in an ML pipeline?
Training can mean the whole development process of a machine learning model OR the specific step in that process
that uses training data and results in a machine learning model.
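
To make the second, narrower meaning concrete, here is a minimal sketch in plain Python. It is not taken from the series: the function names, the toy data, and the choice of a one-variable least-squares fit are all assumptions made for illustration. The point is only that a training step takes training data in and hands a model back.

```python
def train_linear_model(xs, ys):
    """Training step: fit y = w*x + b by ordinary least squares.

    Takes training data (xs, ys) and returns the "model", which here
    is just the pair of learned parameters (w, b).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Use the trained model on a new input."""
    w, b = model
    return w * x + b


# Hypothetical training data that follows y = 2x + 1 exactly
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = train_linear_model(train_x, train_y)
print(model)
print(predict(model, 5.0))
```

Everything around this step (collecting data, cleaning it, evaluating and deploying the model) belongs to the broader meaning of "training" as the whole development process.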

Announcement
Eduard Lebedyuk · Sep 12, 2019
Python Gateway 0.8 release

I'm happy to announce the latest Python Gateway release.

This is not an InterSystems product; it is a community-supported open source project.

Download the new release from GitHub.

Now for the new features.

Fast transfer. Pass globals, classes and tables from InterSystems IRIS to Python with ease and speed (10x faster than old QueryExecute). Documentation.


Hey Community,

We're pleased to invite you to the InterSystems AI+ML Summit 2021, which will be held virtually from January 25 to February 4! Join us for a two-week event that ranges from thought leadership to technical sessions and even 1:1 “Ask the Expert” sessions.

The sessions will be in both German and English. And this summit is free to attend!

See details below:


Keywords: ChatGPT, COS, Lookup Table, IRIS, AI

Purpose


It has been more than a month since I imagined how ChatGPT would automate some daily engineering work. Time seems warped these days. GPT-4 came out last week, able to process 32K tokens (around 25K words on average) and handle image inputs, compared with ChatGPT's 4K token limit. Last month feels like last year, and last year like last century. Before sending ChatGPT away, I just wish to share some quick appreciation of how it has actually helped already, and to recap some possible quick wins or gaps of GPTs.

Task(s)


One of the simplest possible tasks could be a Lookup table, right?

Occasionally you might also have to turn tediously long CSV or Excel code tables into an Ensemble/I4H Lookup table in XML, manually. Then the clinical teams change it, and then change it again and again.

Can ChatGPT help? Can I just tell it what I want and have it give me a tool that turns that messy long CSV into an XML lookup table? Here below is a quick try.

Tests:

The following prompt was sent to ChatGPT:

Prompt:

You are a program to automatically convert a CSV file into a XML lookup table file.

The input is a comma delimited CSV file with 3 data columns , such as:
Blood Science ,XCOM,CANCELLED/REJECTED SAMPLE
Blood Science ,ADX ,ADD ON TEST COMMENT
Blood Transfusion,DARA,ALERT FOR DARA/CAMELLIA
Blood Transfusion, BT1 ,REJECT SAMPLE UNLABELLED

The output should be a XML file in the form of:
<?xml version="1.0"?>
<lookupTable>
<entry table="HIE.ICE.FilterOBR4" key="XCOM">Blood Science</entry>
<entry table="HIE.ICE.FilterOBR4" key="ADX ">Blood Science</entry>
<entry table="HIE.ICE.FilterOBR4" key="DARA">Blood Transfusion</entry>
<entry table="HIE.ICE.FilterOBR4" key="BT1">Blood Transfusion</entry>
</lookupTable>
Please note: the sequence of the XML attribute in each root should be kept as they are


ChatGPT:

Here's a Python script that reads in the CSV file, processes it, and outputs the resulting XML file:

import csv

def convert_csv_to_xml(input_file, output_file):
    """
    Convert a CSV file to an XML lookup table file.

    Parameters:
    input_file (str): Path to the input CSV file.
    output_file (str): Path to the output XML file.

    Returns:
    int: 0 if the conversion was successful, 1 if there was an error.
    """
    try:
        # Create a dictionary to hold the lookup table data
        lookup_table = {}

        # Read in the CSV file and process the data
        with open(input_file, "r") as csvfile:
            csvreader = csv.reader(csvfile)
            for row in csvreader:
                category = row[0].strip()
                key = row[1].strip()
                value = row[2].strip()
                lookup_table[key] = category

        # Generate the XML file
        with open(output_file, "w") as xmlfile:
            xmlfile.write('<?xml version="1.0"?>\n')
            xmlfile.write('<lookupTable>\n')
            for key in lookup_table:
                xmlfile.write(f'<entry table="HIE.ICE.FilterOBR4" key="{key}">{lookup_table[key]}</entry>\n')
            xmlfile.write('</lookupTable>\n')

        return 0
    except Exception as e:
        print("Error:", e)
        return 1

This script first reads in the CSV file using the csv module and creates a dictionary to hold the lookup table data. It then processes each row of the CSV file and adds the relevant data to the dictionary.

Finally, the script generates the XML file by iterating through the dictionary and writing the appropriate XML tags and attributes to the output file. Note that the f string syntax is used to dynamically insert the key and category data into the XML tags.


So it produced Python code by default, without being asked for a specific language. Fine: I ran it briefly in a Jupyter notebook, and it checked out after a tiny tweak and a bit of local testing. It also stripped out the blanks. It gave me a tool that can handle really lengthy CSVs without being limited by token size.

In this specific case, I'd say it increased my productivity by 10 times or more, particularly since the revised CSVs need to be handled quite a few times over the next few weeks.
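
For anyone who wants to try the same idea, here is a hedged variant of the generated script. It is a sketch, not the exact code I ended up with: the function name rows_to_lookup_xml and its table_name parameter are made up for this example, and the escaping via xml.sax.saxutils is an extra precaution I am assuming you would want if a code or category ever contains characters like & or <. It builds the XML from a list of rows, so every CSV line is kept in its original order.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def rows_to_lookup_xml(rows, table_name="HIE.ICE.FilterOBR4"):
    """Build lookup-table XML from (category, key, description) rows.

    The third CSV column (description) is ignored, as in the prompt.
    """
    lines = ['<?xml version="1.0"?>', "<lookupTable>"]
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        category, key = row[0].strip(), row[1].strip()
        # quoteattr() adds the surrounding quotes and escapes the value;
        # escape() protects the element text.
        lines.append(
            f"<entry table={quoteattr(table_name)} key={quoteattr(key)}>"
            f"{escape(category)}</entry>"
        )
    lines.append("</lookupTable>")
    return "\n".join(lines) + "\n"


# The sample rows from the prompt, held in memory instead of a file
sample_csv = """Blood Science ,XCOM,CANCELLED/REJECTED SAMPLE
Blood Science ,ADX ,ADD ON TEST COMMENT
Blood Transfusion,DARA,ALERT FOR DARA/CAMELLIA
Blood Transfusion, BT1 ,REJECT SAMPLE UNLABELLED
"""

xml_text = rows_to_lookup_xml(csv.reader(io.StringIO(sample_csv)))
print(xml_text)
```

To run it against a real file, pass csv.reader(open(path, newline="")) instead of the in-memory sample and write the returned string to the output file.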

Announcement
James Breen · Aug 30, 2018
Machine Learning 101 Presentation

View Machine Learning 101 recording at: https://videos.intersystems.com/detail/video/5827774460001/machine-learning-101?autoStart=true&q=machine%20learning.

In addition to our webinar on machine learning (https://community.intersystems.com/post/rescheduled-webinar-its-machine-learning-not-rocket-science-july-31-1100-am-edt), we are pleased to announce a basic introduction to machine learning presentation that provides an overview of the basic algorithms by @Don Woodlock, InterSystems VP of HealthShare Platforms.


Hey Developers!

This week is a voting week for the InterSystems IRIS AI Programming Contest!

So, it's time to give your vote to the best AI- and ML-enabled solution on InterSystems IRIS!

🔥 You decide: VOTING IS HERE 🔥

How to vote? This is easy: you will have one vote, and your vote goes either in Experts Nomination or in Community Nomination.


Apache Spark has rapidly become one of the most exciting technologies for big data analytics and machine learning. Spark is a general data processing engine created for use in clustered computing environments. At its heart is the Resilient Distributed Dataset (RDD), which represents a distributed, fault-tolerant collection of data that can be operated on in parallel across the nodes of a cluster. Spark is implemented using a combination of Java and Scala, and so comes as a library that can run on any JVM.


Hi Everyone!

Very soon, almost every product and application will include artificial intelligence (AI).

On the afternoon of Wednesday, October 3, at the Global Summit 2018 in San Antonio we’re pulling together experts from InterSystems and from the front lines of the AI industry to discuss the current and future state-of-the-art for AI solutions.

Learn more about our Post-Summit Symposium: Artificial Intelligence and Machine Learning.


Hi Community!

We are pleased to invite all developers to the upcoming InterSystems Analytics Contest Kick-off Webinar! This webinar is dedicated to the Analytics contest.

In this webinar, we’ll demo the iris-analytics-template and answer questions on how to develop, build, and deploy Analytics applications using InterSystems IRIS.

Date & Time: Monday, December 7 — 12:00 PM EDT

Speakers:
🗣 @Carmen Logue, InterSystems Product Manager - Analytics and AI
🗣 @Evgeny Shvarov, InterSystems Developer Ecosystem Manager


In this GitHub repository you can find all the information on how to use a HuggingFace machine learning / AI model on the IRIS Framework using Python.

1. iris-huggingface

Usage of Machine Learning models in IRIS using Python, for text-to-text, text-to-image or image-to-image models.


Last week saw the launch of the InterSystems IRIS Data Platform in sunny California.

For the engaging eXPerience Labs (XP-Labs) training sessions, my first customer and favourite department (Learning Services) was working hard behind the scenes, assisting and supporting us all.
