Article
· Apr 12, 2019 1m read

Python Gateway Part I: Introduction

This series of articles will cover Python Gateway for InterSystems Data Platforms. It lets you leverage modern AI/ML tools and execute Python code from InterSystems IRIS. The project brings the power of Python right into your InterSystems IRIS environment:

  • Execute arbitrary Python code
  • Seamlessly transfer data from InterSystems IRIS into Python
  • Build intelligent Interoperability business processes with the Python Interoperability Adapter
  • Save, examine, modify and restore Python context from InterSystems IRIS

Index

The plan for the series so far (subject to change).

  • Part I: Overview, Landscape and Introduction <-- you're here
  • Part II: Installation and Troubleshooting
  • Part III: Basic functionality
  • Part IV: Interoperability Adapter
  • Part V: Execute function
  • Part VI: Dynamic Gateway
  • Part VII: Proxy Gateway
  • Part VIII: Use cases and ML Toolkit

Overview

Machine learning (ML) is the study of algorithms and statistical models used to perform a specific task effectively without explicit instructions, relying on patterns and inference instead.

Machine learning algorithms and models are becoming more and more commonplace. There are a variety of reasons for that, but it all comes down to affordability, simplicity and actionable results. Is clustering, or even neural network modeling, a new technology? Of course not, but nowadays you do not need to write hundreds of thousands of lines of code to run one, and the costs are much more manageable.

Tools are evolving. We do not yet have completely GUI-based AI/ML tools, but AI/ML tools are following the same path as many other computer technologies, most notably BI tools: from writing code, to utilizing frameworks, to GUI-based configurable solutions. We have already passed the point of writing code and are currently utilizing frameworks to configure and calculate the models.

Other improvements also simplify onboarding, for example distributing pre-trained models, where the end user only has to finish training the model on real-life data. These advances make getting into data science a much easier endeavor for both individuals and companies.

On the other hand, nowadays we collect more data about every transaction a business makes. With a unified data platform such as InterSystems IRIS, all this information can be accessed immediately and used as fuel for predictive models.

With the other big mover, the cloud, running AI/ML workloads becomes easier than ever. Even more important, we can consume only the resources we require. Moreover, the massive parallelization offered by cloud platforms shortens the time to a working solution.

But what about results? Here it gets a little trickier. There are many tools for building a model, which I'll talk about later, and building a good model is not always easy. But what comes after? Extracting business value from a model is also a nontrivial endeavor. The root of the problem is the separation of analytical and transactional data flows and data models. We usually train a model on historical data in a warehouse system, but the best place for the finished model is at the heart of transactional processing. What good is the best fraud detection model if we run it once a day? The criminals would be long gone with the money. We need to train the model on historical data, but we also need to apply it in real time to new incoming data so that our business processes can act on the predictions the model makes.
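To make the train-offline, score-online split concrete, here is a minimal sketch in plain Python. It is not part of Python Gateway or MLToolkit: the `train` and `score` functions, the z-score "model", the threshold and all the numbers are assumptions made up for illustration.

```python
from statistics import mean, stdev


def train(historical_amounts):
    """Fit a toy anomaly model: mean and sample standard deviation of past amounts."""
    return {"mean": mean(historical_amounts), "stdev": stdev(historical_amounts)}


def score(model, amount, threshold=3.0):
    """Return True when the amount is more than `threshold` standard deviations from the mean."""
    z = abs(amount - model["mean"]) / model["stdev"]
    return z > threshold


# Offline step: train once on historical data (made-up transaction amounts).
history = [20.0, 35.5, 18.2, 42.0, 27.9, 31.4, 25.0, 38.6]
model = train(history)

# Online step: score every incoming transaction as it arrives,
# so the business process can react immediately.
incoming = [29.0, 950.0, 33.3]
flags = [score(model, amount) for amount in incoming]
print(flags)  # [False, True, False]
```

A real fraud model would be far richer, but the shape stays the same: the expensive fitting happens on historical data, while the cheap scoring call sits inside the transactional flow.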

MLToolkit

MLToolkit is a comprehensive set of tools that aims to do exactly that: bring predictive models and transactional environments together, so that the models you build can be easily leveraged right inside your business processes. The titular Python Gateway is a part of MLToolkit and provides integration with the Python language.

Landscape

Before we go further, I would like to describe several tools and libraries for Python that we will use later.

Tools

  • Python is an interpreted, high-level, general-purpose programming language. The main advantage of the language is its large collection of mathematical, ML and AI libraries. Like ObjectScript, it is an object-oriented language, but everything is dynamic rather than static. Also, everything is an object. The later articles assume a passing familiarity with the language. If you want to start learning, I recommend starting with the documentation.
  • For our later exercises, install 64-bit Python 3.6.7.
  • IDE: I use PyCharm, but there are a lot of them. If you're using Atelier, Eclipse for Python developers is a thing.
  • Notebook: instead of an IDE, you can write and share your scripts in a web-based notebook. The most popular one is Jupyter.
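Since the later exercises assume a specific interpreter build, a quick check like the one below can confirm which version and bitness you are actually running. This is a minimal sketch using only the standard library, and the helper name `interpreter_info` is made up for illustration.

```python
import struct
import sys


def interpreter_info():
    """Return the (major, minor, micro) version and the pointer width in bits."""
    # The size of a C pointer ("P") is 8 bytes on a 64-bit build and 4 on a 32-bit one.
    bits = struct.calcsize("P") * 8
    return tuple(sys.version_info[:3]), bits


version, bits = interpreter_info()
print("Python %d.%d.%d, %d bit" % (version + (bits,)))
```

Run it with the same interpreter you plan to use for the exercises, since several Python installations can coexist on one machine.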

Libraries

Here's an (incomplete) list of libraries used for machine learning.

  • NumPy is the fundamental package for scientific computing with Python.
  • Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools.
  • Matplotlib is a 2D plotting library which produces figures in a variety of hardcopy formats and interactive environments across platforms.
  • Seaborn is a data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
  • Sklearn (scikit-learn) is a machine learning library.
  • XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.
  • Gensim is a library for unsupervised topic modeling and natural language processing.
  • Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
  • TensorFlow is an end-to-end open source machine learning platform.
  • PyTorch is a deep learning platform similar to TensorFlow but Python-focused.
  • Nyoka produces PMML from Python models.
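To see which of the libraries listed above are already installed in your environment, a small check like the following may help. It is a sketch using only the standard library. The import names are the usual top-level module names (note that scikit-learn imports as `sklearn` and PyTorch as `torch`), and the helper name `available` is made up for illustration.

```python
from importlib.util import find_spec

# Usual top-level import names for the libraries listed above.
LIBRARIES = [
    "numpy", "pandas", "matplotlib", "seaborn", "sklearn", "xgboost",
    "gensim", "keras", "tensorflow", "torch", "nyoka",
]


def available(names):
    """Map each module name to True if it can be located, without importing it."""
    status = {}
    for name in names:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for unusual module states; treat those as missing.
            status[name] = False
    return status


status = available(LIBRARIES)
for name, ok in status.items():
    print("%-12s %s" % (name, "installed" if ok else "missing"))
```

Because `find_spec` only locates a module and does not import it, the check stays fast even for heavy packages.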

Summary

AI/ML technologies allow businesses to be more effective and more adaptable. Moreover, today these technologies are becoming easier to build and deploy. Start investigating AI/ML technologies and how they can help your organization grow and prosper. There are examples, stories and use cases from almost every industry. Do not miss your chance to use future technologies today.

What's next

In the next part we will install Python Gateway. Don't forget to register for the upcoming webinar (details below)!

Links

Webinar

Do you want to reap the benefits of the advances in the fields of artificial intelligence and machine learning? With InterSystems IRIS and the Machine Learning (ML) Toolkit, it’s easier than ever. Join my colleague Sergey Lukyanchikov and me on Tuesday, April 23rd at 11am EDT for the Machine Learning Toolkit for InterSystems IRIS webinar to find out how InterSystems IRIS can be used as both a standalone development platform and an orchestration platform for AI/ML models that brings together InterSystems IRIS, Python and other external tools.

Date: Tuesday, April 23rd at 11am EDT

Recommended Audience: Developers, Solution Architects, Data Scientists, and Data Engineers.

REGISTER NOW!

Discussion

Tried to run docker-compose locally and it failed. Here is part of the trace:

+ python get-pip.py --disable-pip-version-check --no-cache-dir pip==19.0.3
Collecting pip==19.0.3
  Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/c8/b0/cc6b7ba28d5fb790cf0d5946df849233e32b8872b6baca10c9e002ff5b41/setuptools-41.0.0-py2.py3-none-any.whl (575kB)
Collecting wheel
  Downloading https://files.pythonhosted.org/packages/96/ba/a4702cbb6a3a485239fbe9525443446203f00771af9ac000fa3ef2788201/wheel-0.33.1-py2.py3-none-any.whl
Installing collected packages: pip, setuptools, wheel
Successfully installed pip-19.0.3 setuptools-41.0.0 wheel-0.33.1
+ pip --version
pip 19.0.3 from /usr/local/lib/python3.6/site-packages/pip (python 3.6)
+ find /usr/local -depth ( ( -type d -a ( -name test -o -name tests ) ) -o ( -type f -a ( -name *.pyc -o -name *.pyo ) ) ) -exec rm -rf {} +
+ rm -f get-pip.py
Removing intermediate container 21587d33a883
 ---> 5be4246c2aad
Step 13/19 : RUN pip install pandas matplotlib seaborn numpy dill
 ---> Running in ade32bf9dd2c
Collecting pandas
  Downloading https://files.pythonhosted.org/packages/19/74/e50234bc82c553fecdbd566d8650801e3fe2d6d8c8d940638e3d8a7c5522/pandas-0.24.2-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/e9/69/f5e05f578585ed9935247be3788b374f90701296a70c8871bcd6d21edb00/matplotlib-3.0.3-cp36-cp36m-manylinux1_x86_64.whl (13.0MB)
Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 360, in _error_catcher
    yield
  File "/usr/local/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 442, in read
    data = self._fp.read(amt)
  File "/usr/local/lib/python3.6/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "/usr/local/lib/python3.6/http/client.py", line 449, in read
    n = self.readinto(b)
  File "/usr/local/lib/python3.6/http/client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "/usr/local/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 315, in run
    resolver.resolve(requirement_set)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolve.py", line 131, in resolve
    self._resolve_one(requirement_set, req)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolve.py", line 294, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolve.py", line 242, in _get_abstract_dist_for
    self.require_hashes
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 334, in prepare_linked_requirement
    progress_bar=self.progress_bar
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/download.py", line 878, in unpack_url
    progress_bar=progress_bar
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/download.py", line 702, in unpack_http_url
    progress_bar)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/download.py", line 946, in _download_http_url
    _download_url(resp, link, content_file, hashes, progress_bar)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/download.py", line 639, in _download_url
    hashes.check_against_chunks(downloaded_chunks)
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/utils/hashes.py", line 62, in check_against_chunks
    for chunk in chunks:
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/download.py", line 607, in written_chunks
    for chunk in chunks:
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/utils/ui.py", line 159, in iter
    for x in it:
  File "/usr/local/lib/python3.6/site-packages/pip/_internal/download.py", line 596, in resp_read
    decode_content=False):
  File "/usr/local/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 494, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 459, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 365, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
ERROR: Service 'iris' failed to build: The command '/bin/sh -c pip install pandas matplotlib seaborn numpy dill' returned a non-zero code: 2

Thanks, Ed!

I placed the iscpython.so file into the repo folder and this helped: docker-compose build worked smoothly and the container started.

So I managed to build this with IRIS Community Edition, though you need to go directly to the PYTHON namespace to make it work. Class mapping is not available on IRIS CE, unfortunately.

Anyway, Python in the container is callable from IRIS:

Namespace: 
You're in namespace USER
Default directory is /usr/irissys/mgr/user/
USER>zn "PYTHON"

PYTHON>set sc = ##class(isc.py.Callout).Setup()

PYTHON>set sc = ##class(isc.py.Main).SimpleString("x='Hello from Python'", "x", , .x)

PYTHON>write x
Hello from Python
PYTHON>set sc = ##class(isc.py.Callout).Finalize()

PYTHON>set sc = ##class(isc.py.Callout).Unload()

PYTHON>

Check out our webinar today! It will be about Python Gateway/ML.
Also, there's the ML Toolkit user group, a private GitHub repository set up as part of the InterSystems corporate GitHub organization. It is addressed to external users who are installing, learning or already using ML Toolkit components. To join the ML Toolkit user group, please send a short e-mail to MLToolkit@intersystems.com and include the following details (needed for the group members to get to know and identify you during discussions):

  • GitHub username
  • Full Name (your first name followed by your last name in Latin script)
  • Organization (you are working for, or you study at, or your home office)
  • Position (your actual position in your organization, or “Student”, or “Independent”)
  • Country (you are based in)