Article
Maxim Yerokhin · Sep 21, 2016
Imagine that your .NET project uses the Caché DBMS and you need a fully functional and reliable authorization system. Writing such a system from scratch would not make much sense, so you will naturally want to use something that already exists in .NET, e.g. ASP.NET Identity. By default, however, this framework supports only its native DBMS – MS SQL. Our task was to create an adaptor that would let us quickly and easily port Identity to the InterSystems Caché DBMS. This work resulted in the creation of the ASP.NET Identity Caché Provider.
MSSQL is the default data provider for ASP.NET Identity, but since Identity’s authorization system can interact with any other relational DBMS, we implemented this functionality for InterSystems Caché.
The goal of the ASP.NET Identity Caché Provider project was to implement a Caché data provider that would work with ASP.NET Identity. The main task was to store and provide access to such tables as AspNetRoles, AspNetUserClaims, AspNetUserLogins, AspNetUserRoles and AspNetUsers without breaking the standard workflows involving these tables.
Let’s take a look at the implementation of the Caché data provider for ASP.NET Identity. It had two phases:
1. Implementation of data storage classes (responsible for storing state data) and the IdentityDbContext class that encapsulates all low-level logic for interaction with the data storage. We also implemented the IdentityDbInitializer class that adapts the Caché database for working with Identity.
2. Implementation of the UserStore and RoleStore classes (along with integration tests), plus a demo project.
During the first stage, the following classes were implemented:
IdentityUser — implementation of the IUser interface.
IdentityUserRole — an associative entity for the User–Role pair.
IdentityUserLogin — user login data; an extendable version of the UserLoginInfo class.
IdentityUserClaim — information about the user’s claims.
IdentityDbContext<TUser, TRole, TKey, TUserLogin, TUserRole, TUserClaim> — the Entity Framework database context.
Let’s take a look at the IdentityUser entity in more detail. It serves as storage for users, roles, logins, claims and user–role relations. Below is an example of the non-generic and generic variants of IdentityUser.
namespace InterSystems.AspNet.Identity.Cache
{
    /// <summary>
    /// IUser implementation
    /// </summary>
    public class IdentityUser : IdentityUser<string, IdentityUserLogin, IdentityUserRole, IdentityUserClaim>, IUser
    {
        /// <summary>
        /// Constructor which creates a new Guid for the Id
        /// </summary>
        public IdentityUser()
        {
            Id = Guid.NewGuid().ToString();
        }

        /// <summary>
        /// Constructor that takes a userName
        /// </summary>
        /// <param name="userName"></param>
        public IdentityUser(string userName)
            : this()
        {
            UserName = userName;
        }
    }

    /// <summary>
    /// IUser implementation
    /// </summary>
    /// <typeparam name="TKey"></typeparam>
    /// <typeparam name="TLogin"></typeparam>
    /// <typeparam name="TRole"></typeparam>
    /// <typeparam name="TClaim"></typeparam>
    public class IdentityUser<TKey, TLogin, TRole, TClaim> : IUser<TKey>
        where TLogin : IdentityUserLogin<TKey>
        where TRole : IdentityUserRole<TKey>
        where TClaim : IdentityUserClaim<TKey>
    {
        /// <summary>
        /// Constructor
        /// </summary>
        public IdentityUser()
        {
            Claims = new List<TClaim>();
            Roles = new List<TRole>();
            Logins = new List<TLogin>();
        }

        /// <summary>
        /// Email
        /// </summary>
        public virtual string Email { get; set; }

        // ... additional Identity properties omitted for brevity
    }
}
In Identity, access rights are restricted by means of special objects called Roles. A role in the configuration can correspond to a job position or a type of activity of various user groups.
namespace InterSystems.AspNet.Identity.Cache
{
    /// <summary>
    /// EntityType that represents a user belonging to a role
    /// </summary>
    public class IdentityUserRole : IdentityUserRole<string>
    {
    }

    /// <summary>
    /// EntityType that represents a user belonging to a role
    /// </summary>
    /// <typeparam name="TKey"></typeparam>
    public class IdentityUserRole<TKey>
    {
        /// <summary>
        /// UserId for the user that is in the role
        /// </summary>
        public virtual TKey UserId { get; set; }

        /// <summary>
        /// RoleId for the role
        /// </summary>
        public virtual TKey RoleId { get; set; }
    }
}
IdentityDbContext is the class that encapsulates creating a connection, loading entities from the database, and validating that user objects conform to the structure and field values of the associated tables. Let’s use OnModelCreating as an example – this method configures and validates the tables according to Identity requirements.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Mapping and configuring identity entities according to the Cache tables
    var user = modelBuilder.Entity<TUser>()
        .ToTable("AspNetUsers");
    user.HasMany(u => u.Roles).WithRequired().HasForeignKey(ur => ur.UserId);
    user.HasMany(u => u.Claims).WithRequired().HasForeignKey(uc => uc.UserId);
    user.HasMany(u => u.Logins).WithRequired().HasForeignKey(ul => ul.UserId);
    user.Property(u => u.UserName)
        .IsRequired()
        .HasMaxLength(256)
        .HasColumnAnnotation("Index", new IndexAnnotation(new IndexAttribute("UserNameIndex") { IsUnique = true }));
    user.Property(u => u.Email).HasMaxLength(256);

    modelBuilder.Entity<TUserRole>()
        .HasKey(r => new { r.UserId, r.RoleId })
        .ToTable("AspNetUserRoles");

    modelBuilder.Entity<TUserLogin>()
        .HasKey(l => new { l.LoginProvider, l.ProviderKey, l.UserId })
        .ToTable("AspNetUserLogins");

    modelBuilder.Entity<TUserClaim>()
        .ToTable("AspNetUserClaims");

    var role = modelBuilder.Entity<TRole>()
        .ToTable("AspNetRoles");
    role.Property(r => r.Name)
        .IsRequired()
        .HasMaxLength(256)
        .HasColumnAnnotation("Index", new IndexAnnotation(new IndexAttribute("RoleNameIndex") { IsUnique = true }));
    role.HasMany(r => r.Users).WithRequired().HasForeignKey(ur => ur.RoleId);
}
DbModelBuilder maps CLR classes to the database schema. This code-centric approach to building an EDM model is called Code First. DbModelBuilder is typically used to configure the model by overriding OnModelCreating(DbModelBuilder). However, it can also be used independently of DbContext to create a model and then construct a DbContext or ObjectContext from it.
The IdentityDbInitializer class prepares the Caché database for using Identity.
public void InitializeDatabase(DbContext context)
{
    using (var connection = BuildConnection(context))
    {
        var tables = GetExistingTables(connection);
        CreateTableIfNotExists(tables, AspNetUsers, connection);
        CreateTableIfNotExists(tables, AspNetRoles, connection);
        CreateTableIfNotExists(tables, AspNetUserRoles, connection);
        CreateTableIfNotExists(tables, AspNetUserClaims, connection);
        CreateTableIfNotExists(tables, AspNetUserLogins, connection);
        CreateIndexesIfNotExist(connection);
    }
}
The CreateTableIfNotExists method creates the necessary tables if they don't exist. Table existence is checked by querying the Caché %Dictionary.CompiledClass system table, which stores information about existing classes and their tables. If a table doesn't exist, it is created.
During the second stage, the IdentityUserStore and IdentityRoleStore classes were created. They encapsulate the logic of adding, editing and removing users and roles. These entities required 100% unit-test coverage.
To sum up: we created a data provider that allows the Caché DBMS to work with Entity Framework in the context of ASP.NET Identity. The provider is packaged as a separate NuGet package, so if you need to work with the Caché DBMS and use standard Microsoft authorization, all you have to do is add the Identity Caché Provider build to your project via the NuGet Package Manager.
The source code of the project, along with samples and documentation, is available on GitHub.

Regarding Identity.Test on GitHub: xUnit requires some extra installation steps and an external engine to run. What are its benefits compared to Visual Studio's default test framework?
Article
Jonathan Levinson · Nov 1, 2016
Don’t use the Python built into the Mac. Because of System Integrity Protection (SIP) you will not be able to access the libraries that the InterSystems Python binding requires; the Python built into the Mac has a baked-in set of libraries it can use.
Install another Python, but don’t put this other Python ahead of the Mac’s Python on the path, since this could break things. Apple regards its Python as part of its OS and may use that Python in its system operations. This other Python will not be installed into the protected area; the Mac does not want you messing with its Python.
Use an explicit path to invoke this other Python. You can use shell scripts and aliases to simplify invoking Python with its full path.
Article
Dmitry Pavlov · Jan 20, 2017
About the Ontodia library
First of all, I think we should provide some background information about Ontodia and Caché DBMS. Let’s start with a less known product, Ontodia. Ontodia is the result of a joint project of the ISST lab of the ITMO University and VISmart, a software development company specializing in the semantic web domain. The Ontodia service was created as a web application for visualizing linked data and ontologies. We created this service because we couldn’t find simple, accessible and efficient tools for convenient visualization of linked data.
A typical use case for Ontodia could be described as a sequence of 3 steps:
The user provides a set of data for visualization: RDF or OWL file, or enters access parameters for the data endpoint.
The user navigates the data and builds the desired diagram.
The user saves the diagram, shares it with colleagues, publishes it with a permanent link or embeds the diagram into a webpage.
Ontodia is up and running 24/7 at www.ontodia.org.
An overview of its basic capabilities in the screencast format can be found here.
As time went by and we collected some feedback, we came to realize that there was huge demand for graph-based visualization of data stored in relational and object databases. The Russian office of InterSystems was particularly prominent in expressing interest in Ontodia.
The service started to grow and gain new features, eventually going beyond semantic data. First and foremost, we added the support of object DBMS’s (Caché DBMS is essentially an object database) via a proprietary REST API used for receiving data.
The next big step was porting the Ontodia service into a standalone JavaScript library. The code of the library is currently published on GitHub. All server-side functions, such as access control, user management, data storage and so on, are now performed by the platforms that use Ontodia as a library. In this particular case, InterSystems Caché became such a platform.
Creation of a standalone library made it possible to implement simplified integration with platforms. The scenario of interaction between a platform, such as Caché DBMS, and the library is now as follows:
When a user sends a request to a particular URL, the Caché DBMS server invokes the library code
The library requests the necessary set of data (to be shown on a graph) from Caché DBMS. The server replies with requested data.
Ontodia reconstructs the data schema in graph terms (connections, vertices, connection and vertex properties)
The user initiates graph saving and Ontodia sends a JSON to the server for saving and storing the graph
About InterSystems Caché
We should now say a few words about the system that made Ontodia one of its standard components. Broadly speaking, InterSystems Caché is a database management system, but if you step back and look at it from an even broader angle, you’ll find that it’s a platform for developing data processing applications. Caché is also a multi-model DBMS, which means that it offers a number of different ways to store, display and provide access to data based on the corresponding representations.
In essence, Ontodia is yet another method of representing data in Caché, which allows the user to better understand the structure of data organization in terms of stored classes and connections between them (including ranks and connection semantics), as well as to visualize the properties of stored class instances, their connections with other instances in the database, and, finally, to navigate instances in the database.
Ontodia’s capabilities relevant to Caché
Ontodia can display the following data elements on an interactive graph:
Classes and their properties;
Instances and their properties;
Connections between classes;
Connections between classes and instances;
Connections between instances.
Graph rendering features
Each vertex of the graph is displayed as a rectangle with rounded corners. Inside each rectangle is a label with the name of the object, and a small text on a colored bar specifying the class of the object. The bar above helps distinguish classes by color.
A single click on a graph vertex places the focus on this vertex and displays all related objects on the Instances panel.
A double-click on a vertex helps unfold a list of object properties.
Connections are rendered as lines with arrows corresponding to their direction and their names shown above.
After this brief familiarization with Ontodia’s visual language, it’s time to show how data from Caché is displayed and how Ontodia helps interact with graphs.
User interaction with graph
At the start, Ontodia displays a tree of classes of the received data set on the Class tree panel (see illustration) and allows you to drag classes into the graph’s workspace (in the middle). It allows you to search the class tree and filter results on the go as characters are typed in.
The user can view all instances of any class. Instances are filtered on the Instances tab by clicking on any class in the tree. If the number of results is too high, you can use simple string search to find what you need.
The user can also drag one or several instances from the Instances panel to the graph’s workspace.
Users can manage the display of connections by going to the Connections panel and selecting the types of connections to be shown on the graph. You can also enable or disable connection labels. This feature lets you control the “saturation” of the graph with elements and connections.
Users can remove any vertices and connections, thus forming the resulting set of data to be displayed.
The user can move vertices and connections across the work area to achieve the most visually efficient placement of elements.
The user can zoom in and out from the graph, move the graph within the boundaries of the workspace and fit the graph size to that of the screen.
Double-clicking any vertex will open a list of object properties.
Below is an example of a graph that took around 30 seconds to create.
Navigation capabilities
Not only does Ontodia allow you to create any number of data views (diagrams), but also allows you to analyze data by looking at its visual representation:
Click any graph vertex and get a list of all directly connected objects on the Instances panel. After that, users can drag the objects, thus plotting the trajectory of their navigation through the data.
Get all connections forming the selected vertex on the graph. Connections will be listed in the top part of the Connections panel. The number of objects on the other side of the connection will be shown to the right of each identified connection on the Connections panel. By clicking the filter icon next to the necessary connection, the user will populate the corresponding area of the Instances panel with objects that are connected to the object selected on the graph.
How to see a demo and get more information:
Link to a demo.
Link to an Ontodia screencast.
The projects repository is publicly accessible here.
How to install
To install the latest version of the library capable of working with Caché via a REST API, you need to import the OntodiaCache package. To do this, download the archive here, then import the OntodiaCache.xml file (found in the ontodia-cache-release.zip archive) into Caché Studio (tutorial). As a result, you will get all the necessary resources, and compilation will produce a new web application.
How to start
After installation, go to the Caché server at [server URL]/csp/ontodia-cache/index.html.
In order to specify which namespace to use for data visualization, add a “namespace” parameter with the required value to the URL.
Example: localhost:57772/csp/ontodia-cache/index.html?namespace=Samples
Building a new version of OntodiaCache.xml
To build the project, you will need Node.js installed. Clone the source code of the repository and run the npm install command in the root folder of the project. This will install all the necessary modules via NPM.
Once done, you will need an installed OntodiaCache.xml (see the “How to install” section above).
From this point on, execute the npm run webpack command in the root folder to build the project. Running this command will generate the source code, which must be moved to the InterSystems Caché server's CSP folder ({Server folder}\Cache\CSP\{namespace}, for example C:\InterSystems\Cache\CSP\ontodia-cache). This setup can be used for developing a new version of the library.
To complete the process, go to InterSystems Caché Studio, click the root element of the Workspace panel and select the “export to xml” command from the context menu, then specify the destination folder for the new version of the .xml file.
What’s next
We plan to extend the functionality of the library, and specifically do the following:
Create configurable templates for displaying instances of particular classes (we’d like to, for example, allow users to show an instance of the “person” class as a photo with a name, last name and contact details)
Implement a possibility to edit data right on the graph
Implement the support of popular notations: UML, IDEF, Archimate, etc.
It is clear that creating a new version with the listed features will take a lot of time. We don’t want to wait for too long and will appreciate any help or contributions that will help make Ontodia better and more functional.
Feedback
The latest news about the project is available in our blog.
Contact us by email.

I haven't read the article yet but, just to let you know, the first two images are missing.
Images are fine for me. Looks like they are hosted on Google Drive; perhaps you're behind an HTTP proxy that blocks that domain?
I've relocated these two images from Google Drive to the same storage as the other pictures in this article, hope it helps.
Question
Evgeny Shvarov · Apr 9, 2017
Hi!
Sometimes I need to filter a widget on a dashboard from a different cube, and I face the following problem: Widget A refers to a query from Cube A, and I want to filter Widget B from Widget A. Widget B's pivot refers to Cube B, which has different dimensions for the same data. E.g. Cube A has the dimension Author and Cube B has the dimension Member for the same data, so there is no way to filter such a Widget B from Widget A.
Actually, once we filter a given Widget B with another Widget A, we add a filter expression to the MDX query which looks like a member expression from Cube A, e.g.:
[Outlet].[H1].[Region].&[Asia]
Is there any way to alter the filter expression for Widget B, changing just the value of the last part (Asia in this case) of the filter expression?

One way to do this is by using a pivot variable. Create the same pivot variable "Region" in both pivots on which your widgets are based. These pivot variables should return the members, in your example Asia, Europe, N. America, S. America. You can define them manually, in a term list, or use a KPI to retrieve them. For the example in the screenshot below I created a HoleFoods2 cube with an Outlet2.H1.Region2 level. This level is "incompatible" with the Outlet.H1.Region level in HoleFoods. In my manual Region pivot variable I simply defined two regions, which can be selected manually.
Once you have these two pivot variables, create a calculated dimension on each pivot using the pivot variable. In your example in HoleFoods the expression should be Outlet.[$variable.Region]. Place the calculated dimension on Filters.
This is how I did it in HoleFoods:
and this is how I did it in HoleFoods2:
Finally, add an ApplyVariable control on one of your widgets with "*" as the target. Selecting a region will filter both widgets.
Article
Semion Makarov · Sep 10, 2017
System Monitor is a flexible and highly configurable tool supplied with Caché (Ensemble, HealthShare), which collects the essential metrics of the operating system and Caché itself. System Monitor also notifies administrators about issues with Caché and the operating system, when one or several parameters reach the admin-defined thresholds.
Notifications are sent by email or delivered via a custom notifications class. Notifications can be configured with the help of the ^%SYSMONMGR tool. For email delivery, the admin needs to specify the parameters of the email server, the recipient’s email address and the authentication settings. After that, the user can add the required addresses to the delivery list and test the settings by sending a test message. Once done, the tool will send email notifications about the remaining hard drive space, license expiry information and more. Additional information about notifications can be found here.
Immediately after startup (by default, the tool is launched along with a Caché instance), System Monitor starts collecting metrics and recording them to system tables. Collected data is available via SQL. Besides, you can use SYSMON Dashboards to view and analyze these metrics starting from version 2015.1.
SYSMON Dashboards is an open-source project for viewing and analyzing metrics. The project is supplied with a set of analytical dashboards featuring graphs for OS and Caché parameters. SYSMON Dashboards uses the DeepSee technology for analytics and building analytical dashboards. The installation process is fairly simple. Here’s what you need to do:
Download the latest release,
Import the class into any namespace (for example, USER),
Start the installation using the following command:
do ##class(kutac.monitor.utils.Installer).setup()
All other settings will be configured automatically. After installation, the DeepSee portal will get a set of DeepSee toolbars for viewing and analyzing performance metrics.
In order to view DeepSee toolbars, I use DeepSeeWeb, an open-source project that uses an extended set of components for visualizing DeepSee analytical panels.
SYSMON Dashboards also includes a web interface for configuring the monitor and notifications. For detailed configuration, I recommend using the ^%SYSMONMGR utility. The SYSMON Dashboards settings page helps monitor a set of metrics, as well as to start/stop the monitor.
Configuration of Email notification settings via the web interface is no different than the standard process: you need to specify the server parameters, address and authentication details.
Example of email settings configuration:
This way, by using the standard Caché tool, SYSMON Dashboards and DeepSeeWeb, you can considerably simplify the task of monitoring InterSystems platforms.

Thanks Semion. I downloaded and installed SYSMON Dashboards on a development server, but I can't find the SYSMON Configurator tool you're mentioning and showing. Where is it located?
Sorry for the wait. Go to the URL: "{yourserver:port}/csp/sysmon/index.html".
Article
Kimberly Dunn · Sep 26, 2017
This summer the Database Platforms department here at InterSystems tried out a new approach to our internship program. We hired 10 bright students from some of the top colleges in the US and gave them the autonomy to create their own projects which would show off some of the new features of the InterSystems IRIS Data Platform. The team consisting of Ruchi Asthana, Nathaniel Brennan, and Zhe “Lily” Wang used this opportunity to develop a smart review analysis engine, which they named Lumière. As they explain:
A rapid increase in Internet users along with the growing power of online reviews has given birth to fields like opinion mining and sentiment analysis. Today, most people seek positive and negative opinions of a product before making a purchase. Customers find information from reviews extremely useful because they want to know what people are saying about the product they want to buy. Information from reviews is also crucial to marketing teams, who are constantly seeking customer feedback to improve the quality of their products. While it is universal that people want feedback about online products, they are often not willing to read through all the hundreds or even thousands of customer reviews that are available. Therefore our tool extracts the information both vendors and customers need so they can make the best decision without having to read through any reviews.
That sounds really great, doesn’t it? Check out the rest of their whitepaper to get more details about what they were able to accomplish and how InterSystems IRIS enabled them to do it!
Announcement
Celeste Canzano · Jan 22
Hi Everyone!
The Certification Team of InterSystems Learning Services is currently developing an InterSystems ObjectScript Specialist certification exam, and we are reaching out to our community for feedback that will help us evaluate and establish the contents of this exam.
Please note that this is one of two exams being developed to replace our InterSystems IRIS Core Solutions Developer exam. You can find more details about the other exam, InterSystems IRIS Developer Professional exam, here.
How do I provide my input? Complete our Job Task Analysis (JTA) survey! We will present you with a list of job tasks, and you will rate them on their importance as well as other factors.
How much effort is involved? It takes about 20-30 minutes to fill out the survey. You can be anonymous or identify yourself and ask us to get back to you.
How can I access the survey? You can access the survey here.
The survey does not work well on mobile devices - you can access it, but it will involve a lot of scrolling.
The survey can be resumed if you return to it on the same device in the same browser – use the Next button at the bottom of the page to save your answers.
The survey will close on March 15, 2025.
What’s in it for me? You get to weigh in on the exam topics for our new exam, and as a token of our appreciation, all participants will be entered into a raffle where 10 lucky winners will receive a $50 Tango gift card, Global Masters points, and potentially other swag. Note: Tango gift cards are only available for US-based participants. InterSystems and VA employees are not eligible.
Here are the exam title and the definition of the target role:
InterSystems ObjectScript Specialist
An IT professional who is familiar with object-oriented programming concepts and:
creates InterSystems IRIS classes,
writes and reviews InterSystems ObjectScript and SQL code, and
uses both objects and SQL for data access.
At least six months to one year of experience with the tasks listed above is recommended.

Caution: for those who have a dark-themed operating system, you might not be able to read the survey. The solution I found was to switch to a light-colored theme.
Thank you for bringing this to my attention! I have updated the instructions in the post accordingly. The survey has been updated and is now readable with a dark theme!
Article
Rahul Singhal · Mar 1
**Introduction**
To achieve optimized AI performance, robust explainability, adaptability, and efficiency in healthcare solutions, InterSystems IRIS serves as the core foundation for a project within the x-rAI multi-agentic framework. This article provides an in-depth look at how InterSystems IRIS empowers the development of a real-time health data analytics platform, enabling advanced analytics and actionable insights. The solution leverages the strengths of InterSystems IRIS, including dynamic SQL, native vector search capabilities, distributed caching (ECP), and FHIR interoperability. This innovative approach directly aligns with the contest themes of "Using Dynamic SQL & Embedded SQL," "GenAI, Vector Search," and "FHIR, EHR," showcasing a practical application of InterSystems IRIS in a critical healthcare context.
**System Architecture**
The Health Agent in x-rAI is built on a modular architecture that integrates multiple components:
Data Ingestion Layer: Fetches real-time health data from wearable devices using the Terra API.
Data Storage Layer: Utilizes InterSystems IRIS for storing and managing structured health data.
Analytics Engine: Leverages InterSystems IRIS's vector search capabilities for similarity analysis and insights generation.
Caching Layer: Implements distributed caching via InterSystems IRIS Enterprise Cache Protocol (ECP) to enhance scalability.
Interoperability Layer: Uses FHIR standards to integrate with external healthcare systems like EHRs.
Below is a high-level architecture diagram:
[Wearable Devices] --> [Terra API] --> [Data Ingestion] --> [InterSystems IRIS] --> [Analytics Engine]
------[Caching Layer]------
----[FHIR Integration]-----
**Technical Implementation**
**1. Real-Time Data Integration Using Dynamic SQL**
The Health Agent ingests real-time health metrics (e.g., heart rate, steps, sleep hours) from wearable devices via the Terra API. This data is stored in InterSystems IRIS using dynamic SQL for flexibility in query generation.
**Dynamic SQL Implementation**
Dynamic SQL allows the system to adaptively construct queries based on incoming data structures.
def index_health_data_to_iris(data):
    conn = iris_connect()
    if conn is None:
        raise ConnectionError("Failed to connect to InterSystems IRIS.")
    try:
        with conn.cursor() as cursor:
            query = """
                INSERT INTO HealthData (user_id, heart_rate, steps, sleep_hours)
                VALUES (?, ?, ?, ?)
            """
            cursor.execute(query, (
                data['user_id'],
                data['heart_rate'],
                data['steps'],
                data['sleep_hours']
            ))
        conn.commit()
        print("Data successfully indexed into IRIS.")
    except Exception as e:
        print(f"Error indexing health data: {e}")
    finally:
        conn.close()
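The snippets in this article call an iris_connect() helper that is not shown. A minimal sketch of such a helper, assuming the InterSystems IRIS Python DB-API driver (the intersystems-irispython package) with placeholder host, port, namespace, and credentials, might look like this:

import iris  # assumed DB-API driver; adjust to whatever the project actually uses

def iris_connect():
    """Return a DB-API connection to IRIS, or None if the connection fails."""
    try:
        # Positional arguments: hostname, port, namespace, username, password (all placeholders).
        return iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")
    except Exception as e:
        print(f"Could not connect to InterSystems IRIS: {e}")
        return None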
*Benefits of Dynamic SQL*
Enables flexible query construction based on incoming data schemas.
Reduces development overhead by avoiding hardcoded queries.
Supports seamless integration of new health metrics without modifying the database schema.
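To make the "flexible query construction" point concrete, here is a hedged sketch (not the project's actual code) that builds the INSERT statement from whatever metric keys arrive in the payload, assuming the keys are trusted and matching columns already exist in the HealthData table:

def index_health_data_dynamic(data):
    """Illustrative only: construct the column list and placeholders dynamically."""
    conn = iris_connect()
    if conn is None:
        raise ConnectionError("Failed to connect to InterSystems IRIS.")
    try:
        with conn.cursor() as cursor:
            columns = list(data.keys())                      # e.g. user_id, heart_rate, steps, ...
            placeholders = ", ".join("?" for _ in columns)   # one parameter marker per value
            # Column names come from trusted metric keys; values go through parameters.
            query = f"INSERT INTO HealthData ({', '.join(columns)}) VALUES ({placeholders})"
            cursor.execute(query, tuple(data[c] for c in columns))
        conn.commit()
    finally:
        conn.close()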
**2. Advanced Analytics with Vector Search**
InterSystems IRIS’s native vector datatype and similarity functions were utilized to perform vector search on health data. This allowed the system to identify historical records similar to a user’s current health metrics.
**Vector Search Workflow**
1. Convert health metrics (e.g., heart rate, steps, sleep hours) into a vector representation.
2. Store vectors in a dedicated column in the HealthData table.
3. Perform similarity searches using VECTOR_SIMILARITY().
**SQL Query for Vector Search**
SELECT TOP 3 user_id, heart_rate, steps, sleep_hours,
VECTOR_SIMILARITY(vec_data, ?) AS similarity
FROM HealthData
ORDER BY similarity DESC;
**Python Integration**
def iris_vector_search(query_vector):
    conn = iris_connect()
    if conn is None:
        raise ConnectionError("Failed to connect to InterSystems IRIS.")
    try:
        with conn.cursor() as cursor:
            query_vector_str = ",".join(map(str, query_vector))
            sql = """
                SELECT TOP 3 user_id, heart_rate, steps, sleep_hours,
                       VECTOR_SIMILARITY(vec_data, ?) AS similarity
                FROM HealthData
                ORDER BY similarity DESC;
            """
            cursor.execute(sql, (query_vector_str,))
            results = cursor.fetchall()
            return results
    except Exception as e:
        print(f"Error performing vector search: {e}")
        return []
    finally:
        conn.close()
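Tying the workflow together, a usage sketch might convert a user's current metrics into the simple numeric vector layout assumed above and print the three most similar historical records (the metric values below are samples only):

# Sample metrics; in the real system these come from the Terra API ingestion step.
current_metrics = {"heart_rate": 72, "steps": 10250, "sleep_hours": 6.5}

query_vector = [
    current_metrics["heart_rate"],
    current_metrics["steps"],
    current_metrics["sleep_hours"],
]

# Each row follows the SELECT column order: user_id, heart_rate, steps, sleep_hours, similarity.
for user_id, heart_rate, steps, sleep_hours, similarity in iris_vector_search(query_vector):
    print(f"User {user_id}: HR={heart_rate}, steps={steps}, sleep={sleep_hours}h "
          f"(similarity {similarity:.3f})")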
*Benefits of Vector Search*
Enables personalized recommendations by identifying historical patterns.
Enhances explainability by linking current metrics to similar past cases.
Optimized for high-speed analytics through SIMD (Single Instruction Multiple Data) operations.
**3. Distributed Caching for Scalability**
To handle increasing volumes of health data efficiently, the Health Agent leverages InterSystems IRIS’s Enterprise Cache Protocol (ECP). This distributed caching mechanism reduces latency and enhances scalability.
*Key Features of ECP*
Local caching on application servers minimizes central database queries.
Automatic synchronization ensures consistency across all cache nodes.
Horizontal scaling enables dynamic addition of application servers.
**Caching Workflow**
Frequently accessed health records are cached locally on application servers.
Subsequent queries for the same records are served directly from the cache.
Updates to cached records trigger automatic synchronization with the central database.
*Benefits of Caching*
Reduces query response times by serving requests from local caches.
Improves system scalability by distributing workload across multiple nodes.
Minimizes infrastructure costs by reducing central server load.
**4. FHIR Integration for Interoperability**
InterSystems IRIS’s support for FHIR (Fast Healthcare Interoperability Resources) ensured seamless integration with external healthcare systems like EHRs.
**FHIR Workflow**
1. Wearable device data is transformed into FHIR-compatible resources (e.g., Observation, Patient); see the sketch after these steps.
2. These resources are stored in InterSystems IRIS and made accessible via RESTful APIs.
3. External systems can query or update these resources using standard FHIR endpoints.
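As an illustration of step 1, a single wearable heart-rate sample could be mapped to a FHIR Observation and posted to the IRIS FHIR endpoint roughly as follows. This is a hedged sketch: the endpoint path, patient id, and credentials are placeholders, and the exact resource profile the project uses is an assumption (LOINC code 8867-4 and the vital-signs category are the standard FHIR conventions for heart rate):

import requests

# Placeholder FHIR endpoint; the real path and credentials depend on the IRIS for Health setup.
FHIR_BASE = "http://localhost:52773/csp/healthshare/demo/fhir/r4"

def post_heart_rate_observation(patient_id, bpm):
    """Map one heart-rate sample to a FHIR Observation and POST it (illustrative only)."""
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs"}]}],
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": "8867-4",
                             "display": "Heart rate"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {"value": bpm, "unit": "beats/minute",
                          "system": "http://unitsofmeasure.org", "code": "/min"},
    }
    return requests.post(f"{FHIR_BASE}/Observation",
                         json=observation,
                         headers={"Content-Type": "application/fhir+json"},
                         auth=("_SYSTEM", "SYS"))  # placeholder credentials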
*Benefits of FHIR Integration*
Ensures compliance with healthcare interoperability standards.
Facilitates secure exchange of health data between systems.
Enables integration with existing healthcare workflows and applications.
**Explainable AI Through Real-Time Insights**
By combining InterSystems IRIS’s analytics capabilities with x-rAI’s multi-agentic reasoning framework, the Health Agent generates actionable and explainable insights. For example:
> "User 123 had similar metrics (Heart Rate: 70 bpm; Steps: 9,800; Sleep: 7 hrs). Based on historical trends, maintaining your current activity levels is recommended."
This transparency builds trust in AI-driven healthcare applications by providing clear reasoning behind recommendations.
**Conclusion**
The integration of InterSystems IRIS into x-rAI’s Health Agent showcases its potential as a robust platform for building intelligent and explainable AI systems in healthcare. By leveraging features like dynamic SQL, vector search, distributed caching, and FHIR interoperability, this project delivers real-time insights that are both actionable and transparent, paving the way for more reliable AI applications in critical domains like healthcare.

Thanks for giving it a read! Please reach out to me if you have any suggestions or thoughts. Do give it a like if you resonate with it :)
Article
Evgeny Shvarov · Mar 20, 2018
Hi, Community!
I’m sure you are using the Developer Community analytics built with InterSystems analytics technology DeepSee: you can find DC Analytics in the InterSystems->Analytics menu. DC Analytics shows interactive dashboards on key figures of DC entities: Posts, Comments, and Members.
Since last week, this analytics project is available for everyone, with source code and data, on DC GitHub! Everyone who is familiar with the InterSystems Data Platform can download and install it on Caché/Ensemble/IRIS, load the data and have the same analytics locally. And, what is more interesting IMHO, you can improve the analytics and send a pull request on GitHub!
So! What is the data?
The schema of persistent classes is quite simple and consists of 3 classes: Posts, Comments, and Members. See the diagram built in ClassExplorer, courtesy of @Nikita.Savchenko.
Another portion of persistent data is daily data on views and achievements of members. This data can be imported and is available in the Releases section as a gz archive with globals in XML format.
Installation
How do you get this on your InterSystems Data Platform instance? You will need an instance of version 2014.1 or newer.
1. Install MDX2JSON (30 sec).
2. Install DeepSee Web (DSW) (1 minute).
3. Create a new namespace (name it e.g. DCANALYTICS), enable DeepSee and iKnow.
4. Go to Releases and download the latest DCAnalytics_classes*.xml file. Import it into the DCANALYTICS namespace, e.g. with $System.OBJ.Load(), Atelier, or the Class Import UI in the Management Portal.
5. Start the data import and installation. Call the setup method and provide the path to the DCAnalytics_globals.gz file:
DCANALYTICS> do ##class(Community.Utils).setup("DCAnalytics_globals.gz")
The setup does the following:
1. Imports globals for persistent classes (without indices)
2. Builds indices
3. Builds cubes
6. Set up tiles for DSW. Download the latest DSW.config.and.iKnow.files.zip from Releases, unpack it, and move the file dcanalytics.json from the archive to <your_instance>/CSP/dcanalytics/configs/. The name of dcanalytics.json should match the name of the namespace.
DONE!
Open the url <server:port>/dsw/index.html?ns=DCANALYTICS and get your own DC Analytics.
There are also dashboards, which use iKnow to show links between terms and articles.
To setup iKnow part of the solution do the following:
Download DSW.config.and.iKnow.files.zip from Releases and move the files sets.txt and backlist.txt from the archive to <your_instance>/Mgr/DCANALYTICS/. Then run the following in the terminal:
DCANALYTICS> do ##class(Community.iKnow.Utils).setup()
DCANALYTICS> do ##class(Community.iKnow.Utils).update()
DCANALYTICS> do ##class(Community.Utils).UpdateCubes()
Open iKnow dashboard on: <server:port>/dsw/index.html#!/d/iKnow.dashboard?ns=DCANALYTICS
And you will see something like this:
What’s Next? – Make it better!
I hope that, like any developer, you don’t like the implementation, don’t like the style, or find the dashboards awful or insufficient - this is great! You can fix it! Let the collaboration begin!
So, fork it, make it better and send a pull request. We’ll review it, merge it and introduce it into the solution.
Make your own, better Developer Community analytics with the InterSystems Data Platform!

Hi, guys! There is a community project of DSW reports which provides a way to prepare and send DeepSee Web reports in PDF on a schedule. So we introduced this feature to DC online analytics to have a weekly report in PDF - like this one. If you want to receive this report on Mondays too, please leave a comment here and we'll put you on the list. And you are very welcome to report issues on how we can improve it, or provide your pull requests.
Announcement
Anastasia Dyubaylo · Jul 6, 2018
Hi Everybody!
We are pleased to invite you to the upcoming webinar "Sharding as the basis of Scaling in InterSystems IRIS Data Platform" on 24th of July at 10:00 (Moscow time)!
The webinar focuses on the sharding technology that offers new capabilities for horizontal scalability of data in the InterSystems IRIS platform. Parallelization of data storage and processing power allows you to dynamically scale your applications.
In particular, the following topics would be discussed in detail:
sharding basics;
use cases where it's advisable to use sharding;
rapid creation of a sharding cluster with ICM;
creating a sharded cluster live;
advantages of using sharding with Apache Spark, JDBC.
Presenter: @Vladimir.Prushkovskiy, InterSystems Sales Engineer.
Audience: The webinar is designed for developers.
Note: The language of the webinar is Russian.
We are waiting for you at our webinar! Register now!
Announcement
Murray Oldfield · Aug 17, 2018
Hi Everyone!
If you are attending VMworld Las Vegas, remember to look for two sessions on InterSystems IRIS in the content catalog!
InterSystems IRIS sessions at VMworld:
1. "Accelerating Digital Transformation with InterSystems IRIS and vSAN" is a '100' level session. This session will show examples from multiple industries, including financial services, logistics, transportation, and manufacturing, and describe benefits and use cases of a unified data platform running on vSAN. I will also review recent benchmark results and scalability choices for InterSystems IRIS. Time: Wednesday, Aug 29, 2:00 p.m. - 3:00 p.m.
2. "Best Practices for Deploying the InterSystems IRIS Data Platform on vSAN" is a more technically focused '200' level session. If you are considering moving your InterSystems IRIS or Caché application to hyper-converged infrastructure, this session offers practical guidance and explains key architecture and configuration decisions and best practices for a successful deployment. Time: Wednesday, Aug 29, 3:30 p.m. - 4:30 p.m.
Join us on the 29th of August!
Announcement
Olga Zavrazhnova · Sep 27, 2019
Hey Developers,
Have you already reviewed InterSystems Caché or IRIS on Gartner Peer Insights? If not, here is your chance to get two $25 VISA cards - one from Gartner Peer Insights and one from InterSystems!
This promotion ends on December 5, 2019.
See the rules below.
✅ #1: Follow this unique link and submit a review for IRIS or Caché.
✅ #2: Make a screenshot of the headline and text of your review.
✅ #3: Upload a screenshot in this challenge on Global Masters. Done? Great! After your review is published you will get two $25 VISA Cards!
Note:
• InterSystems IRIS and Caché reviews only.
• Use the unique link mentioned above in order to qualify for the gift cards.
• The survey takes about 10-15 minutes. Gartner will authenticate the identity of the reviewer, but the published reviews are anonymous. You can check the status of your review and gift card in your Gartner Peer Insights reviewer profile at any time.
Done? Awesome! Your cards are on the way!
Article
Evgeny Shvarov · Sep 29, 2019
Hi Developers!
When you prepare your modules for ZPM (InterSystems Package Manager), it expects a certain directory structure for ObjectScript source files.
ObjectScript in your source folder needs to be stored by type in the following subfolders. E.g. if your source folder is named /src, the structure should be as follows:
/src
/cls - for classes
/inc - for include files
/mac - for mac files
/int - for intermediate (.int) files
/gbl - for globals
And the ObjectScript should be in CLS (a.k.a UDL) and not in XML.
Example.
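For instance, a hypothetical package Sample containing one class, one include file, and one mac routine would be laid out like this (all names are purely illustrative):

/src
  /cls
    /Sample
      Person.cls
  /inc
    Sample.inc
  /mac
    SampleUtil.mac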
OK! But how do you set this up for your existing ObjectScript packages? Manually?
Never!
There are at least 2 ways to manage this.
1. Use isc-dev module
Import isc-dev module via ZPM as:
zpm:USER>install isc-dev
Or import the release into the namespace.
Set up the working directory for isc-dev:
USER>Do ##class(dev.code).workdir("/yoursourcedir")
Then call the export:
USER>Do ##class(dev.code).export()
It will export all the source code in CLS into the "/yoursourcedir" folder and create the proper folder structure with file types and packages.
2. Another way to do the proper export is to use VSCode ObjectScript
1. Open the folder with your project where your want to export sources.
2. Create the source folder, e.g. /src
3. Put the src folder in VSCode settings.json and add
"objectscript.export.addCategory": true,
and
"objectscript.export.folder": "src"
into the settings. E.g. here is an example settings.json file:
{
    "objectscript.conn.active": true,
    "objectscript.conn.version": 3,
    "objectscript.conn.ns": "USER",
    "objectscript.conn.port": 52773,
    "objectscript.export.addCategory": true,
    "objectscript.export.folder": "src"
}
After that open InterSystems ObjectScript section in VSCode, find the package you want to export or all of them and click Export:
Article
Evgeny Shvarov · Apr 14, 2019
Hi guys!
Portrait of Madame X, Gustave Caillebotte.
One of the features I like in InterSystems ObjectScript is how you can process array transformations in a specific method or function. Usually when we say "process an array" we assume a very straightforward algorithm which loops through the array and does something with its entries according to a certain rule. The trick is how you pass the array into a function. One of the nice approaches is to pass the name of the array using $Name and the indirection operator. Below you can find a very simple example which illustrates the idea.
Suppose we have a method which accepts an array and multiplies by two all the values for all the keys on level one. The array passed could be of any type: global, local, or PPG. Here is the code of the method:
ClassMethod ArraySample(list) As %Status
{
    set key=$Order(@list@("")) // get the initial key
    while key'="" {
        set @list@(key)=@list@(key)*2 // multiply each value on level one by 2
        set key=$Order(@list@(key))
    }
    quit $$$OK
}
How can this be used?
E.g. we have a global ^A and we want all the entries of the first level in global ^A to be multiplied by 2. Here is the code:
ClassMethod ArraySampleTest()
{
    set arr=$Name(^A)
    kill @arr
    for i=1:1:10 {
        set @arr@(i)=i
    }
    w "Initial array:",!
    zw @arr
    do ..ArraySample(arr)
    w !,"After multiplication:",!
    zw @arr
}
if you run this it will produce the following:
USER>d ##class(Ideal.ObjectScript).ArraySampleTest()
Initial array:
^A(1)=1
^A(2)=2
^A(3)=3
^A(4)=4
^A(5)=5
^A(6)=6
^A(7)=7
^A(8)=8
^A(9)=9
^A(10)=10
After multiplication:
^A(1)=2
^A(2)=4
^A(3)=6
^A(4)=8
^A(5)=10
^A(6)=12
^A(7)=14
^A(8)=16
^A(9)=18
^A(10)=20
Notice the @arr@(index) construct. This is called double indirection or subscript indirection.
If arr contains the path to an array, subscript indirection lets you refer to subscripts of that initial array. This can be useful in many cases, e.g. if you want to shortcut a long global path or, as in this case, work with the array inside methods.
Notice $Name function:
set arr=$Name(^A)
in this case $Name puts into arr the name of the array, which lets you work with subscripts of ^A using @arr@(subscript1,subscript2,...) constructions.
This is similar to
set arr="^A"
But $Name can be used not only for the global name itself, but also to get a path to any subscript level of the global. E.g. it's very handy when there are string subscripts on the path, because it saves us from doubling the quotes:
set arr=$Name(^A("level one",3,"level three"))
is much handier than:
set arr ="^A(""level one"",3,""level three"")"
So, $Name helps you get a fully qualified name for any global, local, or PPG path when you want to process its subscripts in a function/procedure/method and you don't want to transfer the data, only the name of the array. Using $Name plus double indirection @list@() does the job.
This ObjectScript flexibility can be really helpful if you build APIs or libraries which work with large amounts of persistent data.
I placed the code above into this simple class, which you can also find on Open Exchange.
So!
I'm not telling you "Don't try this at home!". On the contrary: "Try it on your laptop"!
The fastest and coolest way is to check out the repo and, since the repo is dockerized, run the example with IRIS Community Edition on your laptop using the three following commands in the repo directory:
build container:
$ docker-compose build
run container:
$ docker-compose up -d
start IRIS terminal and call the sample method:
$ docker-compose exec iris iris session iris
USER>zn "OSCRIPT"
OSCRIPT> do ##class(Ideal.ObjectScript).ArraySampleTest()
I added VSCode settings with the last commit, so you are able to code and compile immediately after you open the project in VSCode. The VSCode settings which provide this "instant coding" effect are:
{
"objectscript.conn.version": 3,
"objectscript.conn.ns": "OSCRIPT",
"objectscript.conn.port": 52773,
"objectscript.conn.active": true
}
The file.
Article
David E Nelson · Apr 26, 2019
The last time that I created a playground for experimenting with machine learning using Apache Spark and an InterSystems data platform, see Machine Learning with Spark and Caché, I installed and configured everything directly on my laptop: Caché, Python, Apache Spark, Java, some Hadoop libraries, to name a few. It required some effort, but eventually it worked. Paradise. But, I worried. Would I ever be able to reproduce all those steps? Maybe. Would it be possible for a random Windows or Java update to wreck the whole thing in an instant? Almost certainly.
Now, thanks to the increasingly widespread availability of containers and the increasingly usable Docker for Windows, I have my choice of pre-configured machine learning and data science environments. See, for example, Jupyter Docker Stacks and Zeppelin on Docker Hub. With InterSystems making the community edition of the IRIS Data Platform available via container (InterSystems IRIS now Available on the Docker Store), I have easy access to a data platform supporting both machine learning and analytics among a host of other features. By using containers, I do not need to worry about any automatic updates wrecking my playground. If my office floods and my laptop is destroyed, I can easily recreate the playground with a single text file, which I have of course placed in source control ;-)
In the following, I will share a Docker compose file that I used to create a container-based machine learning and data science playground. The playground involves two containers: one with a Zeppelin and Spark environment, the other with the InterSystems IRIS Data Platform community edition. Both use images available on Docker Hub. I’ll then show how to configure the InterSystems Spark Connector to connect the two. I will end by loading some data into InterSystems IRIS and using Spark to do some data exploration, visualization, and some very basic machine learning. Of course, my example will barely scratch the surface of the capabilities of both Spark and InterSystems IRIS. However, I hope the article will be useful to help others get started doing more complex and useful work.
Note: I created and tested everything that follows on my Windows 10 laptop, using Docker for Windows. For information on configuring Docker for Windows for use with InterSystems IRIS please see the following. The second of the two articles also discusses the basics of using compose files to configure Docker containers.
Using InterSystems IRIS Containers with Docker for Windows
Docker for Windows and the InterSystems IRIS Data Platform
Compose File for the Two-Container Playground
Hopefully, the comments in the following compose file do a reasonably adequate job of explaining the environment, but in case they do not, here are the highlights. The compose file defines:
Two containers: One containing the InterSystems IRIS Community Edition and the other containing both the Zeppelin notebook environment and Apache Spark. Both containers are based on images pulled from the Docker store.
A network for communication between the two containers. With this technique, we can use the container names as host names when setting up communication between the containers.
Local directories mounted in each container. We can use these directories to make jar files available to the Spark environment and some data files available to the IRIS environment.
A named volume for the durable %SYS feature needed by InterSystems IRIS. Named volumes are necessary for InterSystems IRIS when running in containers on Docker for Windows. For more about this see below for links to other community articles.
Map some networking ports inside the containers to ports available outside the containers to provide easy access.
version: '3.2'
services:
  #container 1 with InterSystems IRIS
  iris:
    # iris community edition image to pull from docker store.
    image: store/intersystems/iris:2019.1.0.510.0-community
    container_name: iris-community
    ports:
      # 51773 is the superserver default port
      - "51773:51773"
      # 52773 is the webserver/management portal default port
      - "52773:52773"
    volumes:
      # Sets up a named volume durable_data that will keep the durable %SYS data
      - durable:/durable
      # Maps a /local directory into the container to allow for easily passing files and test scripts
      - ./local/samples:/samples/local
    environment:
      # Set the variable ISC_DATA_DIRECTORY to the durable_data volume that we defined above to use durable %SYS
      - ISC_DATA_DIRECTORY=/durable/irissys
    # Adds the IRIS container to the network defined below.
    networks:
      - mynet
  #container 2 with Zeppelin and Spark
  zeppelin:
    # zeppelin notebook with spark image to pull from docker store.
    image: apache/zeppelin:0.8.1
    container_name: spark-zeppelin
    #Ports for accessing Zeppelin environment
    ports:
      #Port for Zeppelin notebook
      - "8080:8080"
      #Port for Spark jobs page
      - "4040:4040"
    #Maps /local directories for saving notebooks and accessing jar files.
    volumes:
      - ./local/notebooks:/zeppelin/notebook
      - ./local/jars:/home/zeppelin/jars
    #Adds the Spark and Zeppelin container to the network defined below.
    networks:
      - mynet
#Declares the named volume for the IRIS durable %SYS
volumes:
  durable:
# Defines a network for communication between the two containers.
networks:
  mynet:
    ipam:
      config:
        - subnet: 172.179.0.0/16
Launching the Containers
Place the compose file in a directory on your system. Note that the directory name becomes the Docker project name. You will need to create sub-directories matching those mentioned in the compose file. So, my directory structure looks like this:
iris_spark_zeppelin
  local
    jars
    notebooks
    samples
  docker-compose.yml
To launch the containers, execute the following Docker command from inside your project directory:
C:\iris_spark_zeppelin>docker-compose up -d
Note that the -d flag runs the containers in detached mode. You will not see them logging any information to the command line.
You can inspect the log files for the containers using the docker logs command. For example, to see the log file for the iris-community container, execute the following:
C:\>docker logs iris-community
To inspect the status of the containers, execute the following command:
C:\>docker container ls
When the iris-community container is ready, you can access the IRIS Management Portal with this url:
http://localhost:52773/csp/sys/UtilHome.csp
Note: The first time you login to IRIS use the username/password: SuperUser/SYS. You will be re-directed to a password change page.
You can access the Zeppelin notebook with this url:
http://localhost:8080
Copying Some Jar FilesIn order to use the InterSystems Spark Connector, the Spark environment needs access to two jar files:
1. intersystems-jdbc-3.0.0.jar
2. intersystems-spark-1.0.0.jar
Currently, these jar files are with IRIS in the iris-community container. We need to copy them out into the locally mapped directory so that the spark-zeppelin container can access them.
To do this, we can use the docker cp command to copy all the JDK 1.8 version files from inside the iris-community container into one of the local directories visible to the spark-zeppelin container. Open a CMD prompt in the project directory and execute the following command:
C:\iris_spark_zeppelin>docker cp iris-community:/usr/irissys/dev/java/lib/JDK18 local/jars
This will add a JDK18 directory containing the above jar files, along with a few others, to <project-directory>/local/jars.
Adding Some Data
No data, no machine learning. We can use the local directories mounted by the iris-community container to add some data to the data platform. I used the Iris data set (no relation to InterSystems IRIS Data Platform). The Iris data set contains data about flowers. It has long served as the “hello world” example for machine learning (Iris flower data set). You can download or pull an InterSystems class definition for generating the data, along with code for several related examples, from GitHub (Samples-Data-Mining). We are interested in only one file from this set: DataMining.IrisDataset.cls.
Copy DataMining.IrisDataset.cls into your <project-directory>/local/samples directory. Next, open a bash shell inside the iris-community container by executing the following from a command prompt on your local system:
C:\>docker exec -it iris-community bash
From the bash shell, launch an IRIS terminal session:
/# iris session iris
IRIS asks for a username/password. If this is the first time that you are logging into IRIS in this container, use SuperUser/SYS. You will then be asked to change the password. If you have logged in before, for example through the Management Portal, then you changed the password already. Use your updated password now.
Execute the following command to load the file into IRIS:
USER>Do $System.OBJ.Load("/samples/local/DataMining.IrisDataset.cls","ck")
You should see output about the above class file compiling and loading successfully. Once this code is loaded, execute the following commands to generate the data for the Iris dataset
USER>Set status = ##class(DataMining.IrisDataset).load()
USER>Write status
The output from the second command should be 1. The database now contains data for 150 examples of Iris flowers.
Launching Zeppelin and Configuring Our Notebook
First, download the Zeppelin notebook note available here: https://github.com/denelson/DevCommunity. The name of the note is “Machine Learning Hello World”.
You can open the Zeppelin notebook in your web browser using the following url:
http://localhost:8080
It looks something like this.
Click the “Import note” link and import “Machine Learning Hello World.json”.
The first code paragraph contains code that will load the InterSystems JDBC driver and Spark Connector. By default, Zeppelin notebooks provide the z variable for accessing Zeppelin context. See Zeppelin Context in the Zeppelin documentation.
%spark.dep
//z supplies Zeppelin context
z.reset()
z.load("/home/zeppelin/jars/JDK18/intersystems-jdbc-3.0.0.jar")
z.load("/home/zeppelin/jars/JDK18/intersystems-spark-1.0.0.jar")
Before running the paragraph, click the down arrow next to the word “anonymous” and then select “Interpreter”.
On the Interpreters page, search for spark, then click the restart button on the right-hand side and then OK on the ensuing pop-up.
Now return to the Machine Learning Hello World notebook and run the paragraph by clicking the little arrow all the way at the right. You should see output similar to that in the following screen capture:
Connecting to IRIS and Exploring the Data
Everything is now configured. We can connect code running in the spark-zeppelin container to InterSystems IRIS, running in our iris-community container, and begin exploring the data we added earlier. The following Python code connects to InterSystems IRIS, reads the table we loaded in an earlier step (DataMining.IrisDataset), and displays the first ten rows.
Here are a couple of notes about the following code:
We need to supply a username and password to IRIS. Use the password that you provided in an earlier step when you logged into IRIS and were forced to change your password. I used SuperUser/SYS1.
"iris" in the spark.read.format("iris") snippet is an alias for the com.intersystems.spark class, the Spark Connector.
The connection URL, including "IRIS" at the start, specifies the location of the InterSystems IRIS default Spark master server.
The spark variable points to the Spark session supplied by the Zeppelin Spark interpreter.
%pyspark
uname = "SuperUser"
pwd = "SYS1"
#spark session available by default through spark variable.
#URL uses the name of the container, iris-community, as the host name.
iris = spark.read.format("iris").option("url","IRIS://iris-community:51773/USER").option("dbtable","DataMining.IrisDataset").option("user",uname).option("password",pwd).load()
iris.show(10)
Note: For more information on configuring the Spark connection to InterSystems IRIS, see Using the InterSystems Spark Connector in the InterSystems IRIS documentation. For more information on the spark session and other context variables provided by Zeppelin, see SparkContext, SQLContext, SparkSession, ZeppelinContext in the Zeppelin documentation.
Running the above paragraph results in the following output:
Each row represents an individual flower and records its petal length and width, its sepal length and width, and the Iris species it belongs to.
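If you want to double-check the column names and types before going any further, a quick way is the DataFrame's printSchema method:
%pyspark
# Print the column names and data types of the DataFrame we loaded above.
iris.printSchema()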
Here is some SQL-esque code for further exploration:
%pyspark
iris.groupBy("Species").count().show()
Running the paragraph produces the following output:
So there are three different Iris species represented in the data, and each species is represented equally (50 rows of each).
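For a few quick summary statistics (count, mean, standard deviation, min, max) on the measurement columns, the DataFrame's describe method works well:
%pyspark
# Summary statistics for the petal measurement columns.
iris.describe("PetalLength", "PetalWidth").show()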
Using Python’s matplotlib library, we can even draw some graphs. Here is code to plot Petal Length vs. Petal Width:
%pyspark
%matplotlib inline
import matplotlib.pyplot as plt
#Retrieve an array of row objects from the DataFrame
items = iris.collect()
petal_length = []
petal_width = []
for item in items:
petal_length.append(item['PetalLength'])
petal_width.append(item['PetalWidth'])
plt.scatter(petal_width,petal_length)
plt.xlabel("Petal Width")
plt.ylabel("Petal Length")
plt.show()
Running the paragraph creates the following scatter plot:
Even to the untrained eye, it looks like there is a pretty strong correlation between Petal Width and Petal Length. We should be able to reliably predict petal length based on petal width.
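We can put a number on that impression. Spark can compute the Pearson correlation coefficient between the two columns directly:
%pyspark
# Pearson correlation between petal width and petal length (1.0 would be a perfectly linear relationship).
print(iris.stat.corr("PetalWidth", "PetalLength"))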
A Little Machine Learning
Note: I copied the following code from my earlier playground article, cited above.
In order to predict petal length based on petal width, we need a model of the relationship between the two. We can create such a model very easily using Spark. Here is some code that uses Spark's linear regression API to train a regression model. The code does the following:
Creates a new Spark DataFrame containing the petal length and petal width columns. The petal width column represents the "features" and the petal length column represents the "labels". We use the features to predict the labels.
Randomly divides the data into training (70%) and test (30%) sets.
Uses the training data to fit the linear regression model.
Runs the test data through the model and then displays the petal length, petal width, features, and predictions.
%pyspark
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
# Transform the "Features" column(s) into the correct vector format
df = iris.select('PetalLength','PetalWidth')
vectorAssembler = VectorAssembler(inputCols=["PetalWidth"],
outputCol="features")
data=vectorAssembler.transform(df)
# Split the data into training and test sets.
trainingData,testData = data.randomSplit([0.7, 0.3], 0.0)
# Configure the model.
lr = LinearRegression().setFeaturesCol("features").setLabelCol("PetalLength").setMaxIter(10)
# Train the model using the training data.
lrm = lr.fit(trainingData)
# Run the test data through the model and display its predictions for PetalLength.
predictions = lrm.transform(testData)
predictions.show(10)
Running the paragraph results in the following output:
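The screen capture shows individual predictions, but it is also useful to summarize how well the model fits the test data as a whole. Here is a minimal sketch using Spark's RegressionEvaluator, assuming the predictions DataFrame from the paragraph above:
%pyspark
from pyspark.ml.evaluation import RegressionEvaluator
# Root mean squared error of the predicted petal lengths against the actual values.
evaluator = RegressionEvaluator(labelCol="PetalLength", predictionCol="prediction", metricName="rmse")
print("RMSE: %s" % evaluator.evaluate(predictions))
# R-squared: the proportion of the variance in petal length that the model explains.
evaluator.setMetricName("r2")
print("r2: %s" % evaluator.evaluate(predictions))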
The Regression Line
The “model” is really just a regression line through the data. It would be nice to have the slope and y-intercept of that line. It would also be nice to be able to visualize that line superimposed on our scatter plot. The following code retrieves the slope and y-intercept from the trained model and then uses them to add a regression line to the scatter plot of the petal length and width data.
%pyspark
%matplotlib inline
import matplotlib.pyplot as plt
# Retrieve the slope and y-intercept of the regression line from the model.
slope = lrm.coefficients[0]
intercept = lrm.intercept
print("slope of regression line: %s" % str(slope))
print("y-intercept of regression line: %s" % str(intercept))
items = iris.collect()
petal_length = []
petal_width = []
for item in items:
petal_length.append(item['PetalLength'])
petal_width.append(item['PetalWidth'])
fig, ax = plt.subplots()
ax.scatter(petal_width,petal_length)
plt.xlabel("Petal Width")
plt.ylabel("Petal Length")
y = [slope*x+intercept for x in petal_width]
ax.plot(petal_width, y, color='red')
plt.show()
Running the paragraph results in the following output:
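As one last check, we can ask the trained model to predict the petal length for a new, hypothetical measurement. The petal width of 1.5 below is just an example value I made up; the only requirement is that the input DataFrame has a features vector column like the one the model was trained on:
%pyspark
from pyspark.ml.linalg import Vectors
# A single hypothetical flower with a petal width of 1.5.
new_flower = spark.createDataFrame([(1.5, Vectors.dense([1.5]))], ["PetalWidth", "features"])
# The prediction column holds the model's estimate of PetalLength.
lrm.transform(new_flower).show()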
What’s Next?
There is much, much more we can do. Obviously, we can load much larger and much more interesting datasets into IRIS. See, for example, the Kaggle datasets (https://www.kaggle.com/datasets). With a fully licensed IRIS we could configure sharding and see how Spark, running through the InterSystems Spark Connector, takes advantage of the parallelism that sharding offers. Spark, of course, provides many more machine learning and data analysis algorithms, and it supports several different languages, including Scala and R.
This article is considered an InterSystems Data Platform Best Practice.
Hi, your article helped me a lot. One more question: how can we get a fully licensed IRIS? There seems to be no download page on the official site.
Hi! You can request a fully licensed IRIS on this page. If you want to try or use IRIS features with IRIS Community Edition:
Try IRIS online
Use IRIS Community from DockerHub on your laptop as is, or with different samples from Open Exchange. Check how to use the IRIS Docker image on the InterSystems Developers video channel.
Or run Community on Express IRIS images on AWS, GCP, or Azure.
HTH