New post

Find

Article
· Nov 27, 2024 8m read

Using IRIS and Presto for high-performance and scalable SQL queries

The rise of Big Data projects, real-time self-service analytics, online query services, and social networks, among others, have enabled scenarios for massive and high-performance data queries. In response to this challenge, MPP (massively parallel processing database) technology was created, and it quickly established itself. Among the open-source MPP options, Presto (https://prestodb.io/) is the best-known option. It originated in Facebook and was utilized for data analytics, but later became open-sourced. However, since Teradata has joined the Presto community, it offers support now.

Presto connects to transactional data sources (Oracle, DB2, MySQL, PostgreSQL, MongoDB, and other SQL and NoSQL databases) and delivers distributed and in-memory SQL processing, combined with automatic optimizations of execution plans. Its objective, above all, is to execute fast queries regardless of whether you handle gigabytes or terabytes of data, scaling and parallelizing workloads.

Presto did not originally have a native connector for the IRIS Database, but fortunately, this issue was resolved with an InterSystems community project presto-iris (https://openexchange.intersystems.com/package/presto-iris). That is why we can now expose an MPP layer in front of InterSystems IRIS repositories to enable high-performance queries, reports, and dashboards from transactional data in IRIS.

In this article, we will follow a step-by-step guide on configuring Presto, connecting it to IRIS, and establishing an MPP layer for your customers to employ. We will also demonstrate the main features of Presto, its main commands and tools, always with IRIS as the source database.

Presto Features

Presto contains the following features:

  1. Simple but extensible architecture.
  2. Pluggable connectors (Presto supports pluggable connectors to provide metadata and data for queries).
  3. Pipelined executions (it avoids unnecessary I/O latency overhead).
  4. User-defined functions (analysts can create custom user-defined functions to migrate easily).
  5. Vectorized columnar processing.


Benefits of Presto

Below, you can see a list of benefits that Apache Presto offers:

Specialized SQL operations;
Easy installation and debugging;
Simple storage abstraction;
Quickly scalable petabytes of data with low latency.

Presto Architecture


  • Clients: They are PrestoDB consumers. Clients use the JDBC/ODBC/REST protocol to talk to coordinators.
  • Coordinators: They are responsible for managing associated worker nodes, parsing, handling queries, and generating execution plans. They are also in charge of data delivery for processing between workers, which creates logical plans composed of stages, where every stage is executed in a distributed way using tasks across workers.
  • Workers: They are compute nodes for executing tasks and processing data, allowing scale data processing and consumption.
  • Communication: Each Presto worker advertises the coordinator using a discovery server to prepare for work.
  • Connectors: Each possible data source type has a connector employed by Presto to consume data. The project https://openexchange.intersystems.com/package/presto-iris allows Presto to utilize InterSystems IRIS.
  • Catalog: It contains information about the location of data including schemas and the data source. When users run an SQL statement in Presto, they run it against one or more catalogs.

Presto use cases

The InterSystems IRIS and Presto together allow you the following use cases:

  • Ad-hoc queries: You can run ad-hoc queries with high performance against terabytes of data.
  • Reporting and dashboarding: There is an engine to deliver high-performance data queries for reports, self-service BI, and analytics tools, e.g., Apache Superset (check out the sample in this article).
  • Open lakehouse: Presto has the connectors and catalogs to unify required data sources and deliver scalable queries and data using SQL between workers.

InterSystems IRIS is a perfect partner for Presto. Since it is a high-performance data repository that supports distributed processing with the help of shards and associated with Presto workers, any volume of data can be queried in just a few milliseconds.

Installing and running the PrestoDB

There are various options (Docker and Java JAR) of how to install Presto. You can find more details related to it on https://prestodb.io/docs/current/installation/deployment.html. In this article we will utilize Docker. To facilitate understanding and enable a quick start, we have made a sample application available on Open Exchange (it was derived from another package https://openexchange.intersystems.com/package/presto-iris). Take the following steps to see for yourself:

  1. Go to https://openexchange.intersystems.com/package/iris-presto-sample and download the sample used in this tutorial.
  2. Start the demo environment with docker-compose:
    docker-compose up -d --build
    Note: For the demo purpose, it operates Apache Superset with superset-iris and examples that come with it. For that reason, it takes a while for it to load.
  3. Presto UI will become available via this link: http://localhost:8080/ui/#.
  4. Wait 15 to 20 minutes (there is a lot of sample data to load). When SuperSet finishes loading examples after 10-15 minutes, it should become available at the link http://localhost:8088/databaseview/list (type admin/admin as username/password on the login page).
  5. Now go to Dashboards:
  6. If we visit http://localhost:8080/ui, we can notice that Presto has executed queries and is displaying some statistics:

Above you can see the Presto web interface for monitoring and managing queries. It can be accessed from the port number specified in the coordinator Config Properties (for this article it is 8080).

Details about the sample code

Dockerfile

The Dockerfile is employed to create a PrestoDB Docker image with the presto-iris plugin and the InterSystems IRIS JDBC file included:

# Official PrestoDB image on Docker Hub
FROM prestodb/presto

# From https://github.com/caretdev/presto-iris/releases
# Adding presto-iris plugin into the Docker image
ADD https://github.com/caretdev/presto-iris/releases/download/0.1/presto-iris-0.1-plugin.tar.gz /tmp/presto-iris/presto-iris-0.1-plugin.tar.gz

# From https://github.com/intersystems-community/iris-driver-distribution
# Adding IRIS JDBC driver into the Docker image
ADD https://raw.githubusercontent.com/intersystems-community/iris-driver-distribution/refs/heads/main/JDBC/JDK18/com/intersystems/intersystems-jdbc/3.8.4/intersystems-jdbc-3.8.4.jar /opt/presto-server/plugin/iris/intersystems-jdbc-3.8.4.jar

RUN --mount=type=bind,src=.,dst=/tmp/presto-iris \
    tar -zxvf /tmp/presto-iris/presto-iris-0.1-plugin.tar.gz -C /opt/presto-server/plugin/iris/ --strip-components=1

 

Docker-compose.yml file

This file creates 3 container instances: one for InterSystems IRIS (IRIS service), one for PrestoDB (Presto service), and one for the Superset (Superset service). The Superset is an Analytics visualization tool utilized for seeing data in Dashboards.

# from the project https://github.com/caretdev/presto-iris
services:
  # create an InterSystems IRIS container instance
  iris:
    image: intersystemsdc/iris-community
    ports:
      - 1972
      - 52773
    environment:
      IRIS_USERNAME: _SYSTEM
      IRIS_PASSWORD: SYS
  # create a PrestoDB container instance consuming the IRIS database
  presto:
    build: .
    volumes:
      # PrestoDB will use iris.properties to get connection information
       - ./iris.properties:/opt/presto-server/etc/catalog/iris.properties
    ports:
      - 8080:8080
  # create a Superset (Dashboard analytics tool) container instance
  superset:
    image: apache/superset:3.0.2
    platform: linux/amd64
    environment:
      SUPERSET_SECRET_KEY: supersecret
      # create an InterSystems IRIS connection to load sample data
      SUPERSET_SQLALCHEMY_EXAMPLES_URI: iris://_SYSTEM:SYS@iris:1972/USER
    volumes:
      - ./superset_entrypoint.sh:/superset_entrypoint.sh
      - ./superset_config.py:/app/pythonpath/superset_config.py
    ports:
      - 8088:8088
    entrypoint: /superset_entrypoint.sh

The iris.properties file

This file has the information required to connect PrestoDB to InterSystems IRIS DB and create an MPP layer for high-performance and scalable queries from Superset dashboards.

# from the project https://github.com/caretdev/presto-iris
connector.name=iris
connection-url=jdbc:IRIS://iris:1972/USER
connection-user=_SYSTEM
connection-password=SYS

The superset_entrypoint.sh file

This script installs the superset-iris library (for Superset support of IRIS), inits the Superset instance, and loads sample data into InterSystems IRIS DB. At runtime the data consumed by Superset will come from PrestoDB, which will be an MPP layer for IRIS DB.

#!/bin/bash

# Install the InterSystems IRIS Superset extension
pip install superset-iris

superset db upgrade

superset fab create-admin \
            --username admin \
            --firstname Superset \
            --lastname Admin \
            --email admin@superset.com \
            --password ${ADMIN_PASSWORD:-admin}
superset init

# Load examples to IRIS
superset load-examples

# Change examples database URI to Presto
superset set-database-uri -d examples -u presto://presto:8080/iris

/usr/bin/run-server.sh

About Superset

This article used Superset for dashboards.It is a modern data exploration and visualization platform that can replace or augment proprietary business intelligence tools for many teams.Superset integrates well with a variety of data sources.
Superset provides the following:

  • A no-code interface for building charts quickly
  • A powerful web-based SQL Editor for advanced querying
  • A lightweight semantic layer for defining custom dimensions and metrics fast
  • Out-of-the-box support for nearly any SQL database or data engine
  • A wide array of beautiful visualizations to showcase your data, ranging from simple bar charts to geospatial visualizations
  • Lightweight configurable caching layer to help ease database load
  • Highly extensible security roles and authentication options
  • An API for programmatic customization
  • A cloud-native architecture designed from the ground up for scale

 

Sources and additional learning materials

  1. Extensive tutorial about PrestoDB: https://www.tutorialspoint.com/apache_presto/apache_presto_quick_guide.htm
  2. PrestoDB documentation: https://prestodb.io/docs/current/overview.html
  3. Presto-iris plugin: https://openexchange.intersystems.com/package/presto-iris
  4. Iris-presto sample: https://openexchange.intersystems.com/package/iris-presto-sample
  5. About Superset: https://github.com/apache/superset
  6. Superset with InterSystems IRIS: https://openexchange.intersystems.com/package/superset-iris  
Discussion (0)1
Log in or sign up to continue
Announcement
· Nov 27, 2024

¡Notificaciones y Co-Desarrolladores: Novedades en Open Exchange!

¡Manteneos al Día con las Notificaciones en Open Exchange!

Ahora podéis ver todas vuestras notificaciones directamente en el sitio web y elegir cómo queréis recibirlas: en la web, por correo electrónico o ambas opciones.

Esto es sobre lo que recibiréis notificaciones:

  • Lanzamientos de Apps: Mantened vuestra información actualizada sobre los nuevos lanzamientos de las apps a las que estáis suscritos (haced clic en el botón de campana en la página de una app).
  • Nuevas Apps de Miembros que Seguís: Recibid actualizaciones sobre nuevas apps publicadas por los miembros a los que seguís (usad el botón de seguir en la página de un miembro).
  • Anuncios de Concursos: No os perdáis la oportunidad de participar en concursos emocionantes.
  • Reseñas y Respuestas: Recibid actualizaciones sobre reseñas y respuestas relacionadas con vuestras apps.
  • Respuestas a Vuestras Reseñas: Recibiréis notificaciones cuando alguien responda a vuestras reseñas.
  • Invitaciones para Ser Co-Desarrolladores: Enteraos al instante cuando alguien os invite a co-desarrollar una app.
 
Cómo personalizar vuestras preferencias de notificación en la página de Configuración de Notificaciones:

¡Invitad a Co-Desarrolladores a Vuestra App

Esta función os permite invitar a co-desarrolladores a uniros a vuestro proyecto de app. Sus nombres se mostrarán con orgullo en la página de la app y en el tile de la app en el catálogo, dando el crédito que se merece y mostrando la colaboración de vuestro equipo.

Información Importante sobre la Invitación de Co-Desarrolladore

Cuando invitéis a co-desarrolladores a vuestra app, tened en cuenta lo siguiente:

  • La aplicación debe estar publicada en Open Exchange.
  • Los co-desarrolladores deben ser miembros de Open Exchange antes de que podáis invitarles.
  • Los co-desarrolladores no podrán editar ni gestionar vuestra aplicación. Seguís siendo el único gestor y propietario de la app en Open Exchange.
  • Los co-desarrolladores verán la app listada en su página de "Aplicaciones" en su portal de desarrollador, pero sin permisos de gestión.
  • Los avatares y nombres de los co-desarrolladores se mostrarán en la página de la app y en los tiles de la app, destacando su contribución.
 
Cómo invitar a co-desarrolladores a vuestra aplicación:
Discussion (0)1
Log in or sign up to continue
Article
· Nov 27, 2024 1m read

Novedad en la interoperabilidad 2024.3 - Nueva opción para reenviar mensajes desde Visual Trace

La capacidad de reenviar mensajes fácilmente siempre ha sido una de las características más destacadas de nuestras capacidades de interoperabilidad.

Con la versión 2024.3 que se lanzará pronto (ya disponible como Developer Preview), ¡lo hemos hecho aún más sencillo!

Ahora, desde el propio Visual Trace (Traza de Mensaje), podéis encontrar junto al encabezado del mensaje un botón de Reenviar. Podéis hacer clic en él y acceder directamente al editor habitual de Reenvío de Mensajes (sin necesidad de pasar por el visor de mensajes y buscar vuestro mensaje específico).

Aquí tenéis un ejemplo de cómo se ve esto:

Consultad también las Notas de la Versión relacionadas.

Discussion (0)1
Log in or sign up to continue
Article
· Nov 26, 2024 5m read

d[IA]gnosis_ IRIS for Health を使用した RAG アプリケーションの開発

ベクトルデータ型と Vector Search 機能が IRIS に導入されたことにより、アプリケーションの開発に多数の可能性が開かれました。こういったアプリケーションの例として、バレンシア保健省が AI モデルを使用した ICD-10 コーディング支援ツールを要求した公募で出品されたアプリケーションが最近私の目に留まりました。

要求されたツールのようなアプリケーションをどのように実装できるでしょうか? 必要なものを確認しましょう。

  1. ICD-10 コードのリスト。自由テキスト内で診断を検索するための RAG アプリケーションのコンテキストとして使用します。
  2. ICD-10 コード内で相当するものを検索するためにテキストをベクトル化するトレーニング済みモデル。
  3. ICD-10 コードとテキストの取り込みとベクトル化を行うために必要な Python ライブラリ。
  4. 可能性のある診断を見つけるためのテキストを受け入れる使いやすいフロントエンド。
  5. フロントエンドから受信するリクエストのオーケストレーション。

これらのニーズに対応するために、IRIS は何を提供できるでしょうか?

Discussion (0)1
Log in or sign up to continue
Article
· Nov 26, 2024 2m read

第三十三章 UDP 客户端 服务器通信 - IPv4 和 IPv6

第三十三章 UDP 客户端 服务器通信 - IPv4 和 IPv6

UDP 支持 IPv4IPv6 互联网协议。由于这些协议不兼容,服务器和客户端都必须使用相同的Internet协议,否则传输将失败。

IPv4 地址具有以下格式。 n0255 范围内的十进制整数:

n.n.n.n

可以将 IPv4 协议指定为"0.0.0.0" 。

IPv6 地址具有以下完整格式。 h是一个具有四个十六进制数字的十六进制数:

Discussion (0)1
Log in or sign up to continue