Replies by Benjamin De Boe for InterSystems Developer Community

Benjamin De Boe · Jul 30, 2018

Note that in InterSystems IRIS 2018.2, you'll be able to save a PMML model straight into InterSystems IRIS from SparkML, through a simple iscSave() method we added to the PipelineModel interface. You can already try it for yourself in the InterSystems IRIS Experience using Spark.

Also, besides this point-and-click batch test page, you can invoke PMML models stored in IRIS programmatically from your applications and workflows as explained in the documentation. We have a number of customers using it in production, for example to score patient risk models for current inpatient lists at HBI Solutions.

Benjamin De Boe · Jul 23, 2018

or leveraging pyspark, as explained in this article

Benjamin De Boe · Jul 6, 2018

Yes, we maintain an adoption guide that covers exactly that purpose. In order to be able to properly follow up on questions you'd have, we're making it available through your technical account team (sales engineer or TAM) rather than ship it with the product.

Benjamin De Boe · Jun 20, 2018

Hi Eduard,

I believe this article describes what you're looking for: https://community.intersystems.com/post/keeping-your-iknow-domain-synchronized

Thanks,

Benjamin

Benjamin De Boe · Jun 18, 2018

Hi Eduard,

for this sort of querying (and many other uses outside straight API calls), you can use the SQL projections generated for your domain, as documented in %iKnow.Tables.Utils. That'll generate a column for your Views metadata field on the table containing Source information, which you can then join to the Part (entity occurrence) table to filter the ones containing the requested entity.

Hope this helps,
benjamin

Benjamin De Boe · Jun 14, 2018

Hi Eduard,

looking at your code, there seem to be a few small things that may each contribute to not seeing the results you were expecting:

the MaintenanceAPI:GetBlackListElements() call returns its results as result(n) = $lb(id, string) with n just an incrementing integer representing the row number. At the other end, the ContainsEntityFilter expects array(string) or a $listbuild(string1, string2, ...). So your filter might be selecting sources containing the strings "1", "2", etc
SourceAPI:GetByDomain() returns result(n) = $lb(sourceID, externalID). That source ID is an internally generated integer ID that has no links to your source table Text.Data. The external ID is typically composed of what you selected as group field and identifier field when loading from a SQL table. So depending on how you set up your domain, that may indeed be the ID field of your Text.Data table. It looks like you have the "simple external IDs" feature switched on, which is why your external IDs only consist of the identifier field, making things indeed easier (but usually only useful/safe when loading from a single table!). Note that this is slightly different for DeepSee-managed domains, where the source ID equals the external ID and corresponds to DeepSee's fact ID, but ignore this confusing comment when not using DeepSee
Finally, and likely irrelevant, you're passing in $$$YES when initializing filterNot. I'm not sure where you're loading that macro from, but that should be a %Boolean with a value of 1 to work as expected, where a string value would translate to a %Boolean with value 0.

Hope this helps,

benjamin

Benjamin De Boe · May 4, 2018

OK, thanks for the feedback. We're indeed looking into those additional windowing functions to go beyond our %FOREACH SQL extension, but it's not (yet) on the short-term agenda. Customer demand like yours of course helps us properly prioritize what should go on there.

Benjamin De Boe · May 4, 2018

We currently don't support analytic windowing functions (PARTITION BY syntax), but have been looking into it for a future release. MATCH_RECOGNIZE is certainly one of the more advanced ones in that bucket. Is this the very one you would need or do you have scenarios that would be served by core windowing functionality, excluding the pattern matching piece?

Or is it the pattern matching and not as much the windowing you're looking for?

Benjamin De Boe · May 4, 2018

I'm afraid we don't support the SQL PIVOT command, so unless you can enumerate the response codes as columns explicitly, you can only organise them as rows. If you control the application code, you could of course first have a query selecting all response codes and then generating the lengthy SQL call that includes separate columns for each response code. Something like SUM(CASE bRecord.ResponseCode WHEN 'response code 1' THEN 1 ELSE 0 END) AS ResponseCode1Count should work fairly well.

Benjamin De Boe · Mar 27, 2018

Horita-san,

The SetParameter() method requires your domain has an ID assigned, which it gets automatically as soon as you call %Save() a first time. Note that it should be returning an error in the sequence of commands you pasted, but it went unnoticed because of the do syntax.

In general, it should be more convenient to work with Domain Definitions rather than the %iKnow.Domain API directly.

Thanks,
benjamin

Benjamin De Boe · Feb 1, 2018

Hi Dmitry,

Zen is indeed no longer a central piece of our application development strategy. We'll support it for some time to come (your Zen app still works on IRIS), but our focus is on providing a fast and scalable data management platform rather than GUI libraries. In that sense, you may already have noticed that recent courses we published on the topic of application development focus on leveraging the right technologies to connect to the backend (i.e. REST) and suggest using best-of-breed third-party technologies (i.e. Angular) for web development.

InterSystems IRIS is a new product where we're taking advantage of our Caché & Ensemble heritage. It's meant to address today's challenges when building critical applications and we've indeed leveraged a number of capabilities from those products, but also added a few significant new ones like containers, cloud & horizontal scalability. We'll be providing an overview of elements to check for Caché & Ensemble customers that would like to migrate to InterSystems IRIS shortly (i.e. difference in supported platforms), but please don't consider this as merely an upgrade. You may already have noticed the installer doesn't support upgrading anyhow.

Thanks,
benjamin

Benjamin De Boe · Feb 1, 2018

Hi Robert,

DocBook has now moved fully online, which is what the mgmt portal will link to: http://docs.intersystems.com/iris

SAMPLES included quite a few outdated examples and was also not appropriate for many non-dev deployments, so we've also moved to a different model there, posting the most relevant ones on GitHub, giving us more flexibility to provide updates and new ones: https://github.com/intersystems?q=samples

JDBC driver: to what extent is this different from the past? It's always just been available as a jarfile, as is customary for JDBC drivers. We do hope to be able to post it through Maven repositories in the near future though.

Small icons: yeah, to make our installer and (more importantly) the container images more lightweight, we had to economize on space. Next to the removal of DocBook and Samples, using smaller icons also reduces the size in bytes ;) ;)

InterSystems IRIS is giving us the opportunity to adopt a contemporary deployment model, where we were somewhat restricted by long-term backwards compatibility commitments with Caché & Ensemble. Some of these will indeed catch your eye and might even feel a little strange at first, but we really believe the new model makes developing and deploying applications easier and faster. Of course, we're open to feedback on all of these evolution and this is a good channel to hear from you.

Thanks!
benjamin

Benjamin De Boe · Jan 31, 2018

With the release of InterSystems IRIS, we're also publishing a few new great online courses on the technologies that come with it. That includes two on sharding, so don't hesitate to check them out!

Benjamin De Boe · Jan 12, 2018

If you have a global structure that you mapped a class to afterwards, that data is already in one physical database and therefore not sharded or shardable. Sharding really is a layer in between your SQL accesses and the physical storage and it expects you not to touch that physical storage directly. So yes you can still picture how that global structure looks like and under certain circumstances (and when we're not looking ;-) ) read from those globals, but new records have to go through INSERT statements (or %New in a future version), but can never go against the global directly.

We currently only support sharding for %CacheStorage. There's been so many improvements in that model over the past 5-10 years that there aren't many reasons left to choose %CacheSQLStorage for new SQL/Object development. The only likely reason would be that you still have legacy global structures to start from, but as explained above, that's not a scenario we can support with sharding. Maybe a nice reference in this context is that of one of our early adopters who was able to migrate their existing SQL-based application to InterSystems IRIS in less than a day without any code changes, so they could use the rest of the day to start sharding a few of their tables and were ready to scale before dinner, so to speak.

Benjamin De Boe · Jan 12, 2018

Hi Warlin,

I'm not sure whether you have something specific in mind, but it sort of works the other way around. You shard a table and, under the hood, invisible to application code, the table's data gets distributed to globals in the data shards. You cannot shard globals.

thanks,
benjamin

Benjamin De Boe · Jan 11, 2018

Hi Herman,

We're supporting SQL only in this first release, but are working hard to add Object and other data models in the future. Sharding any globals is unfortunately not possible as we need some level of abstraction (such as SQL tables or Objects) to hook into in order to automate the distribution of data and work to shards. This said, if your SQL (or soon Object) based application has the odd direct global reference to a "custom" global (not related to a sharded table), we'll still support that by just mapping those to the shard master database.

Thanks,
benjamin

Benjamin De Boe · Sep 29, 2017

iKnow was written to analyze English rather than ObjectScript, so you may see a few odd results coming out of code blocks. I believe you can add a where clause excluding those records from the block table to avoid them.

Benjamin De Boe · Sep 21, 2017

I've had something simple running on my laptop already a long time ago, but the internal discussion on how to package it proved a little more complicated. Among other things, an iFind index requires an iKnow-enabled license (and more space!), which meant you couldn't simply include it in every kit.

Also, for the ranking of docbook results, applying proper weights based on the type of content (title / paragraph / sample / ...) was at least as important as the text search capabilities themselves. That latter piece has been well-addressed in 2017.1, so docbook search is in pretty good shape now. Blending in an easily-deployable iFind option as Konstantin published can only add to this!

Thanks,
benjamin

Benjamin De Boe · Sep 19, 2017

Hi Steve,

hadn't seen this question until just now, but I have to admit we're a bit storage-hungry with iKnow. If you generate the default full set of indices, for a moderately-sized domain you'll need up to 25x the original dataset size measured as raw text to fit everything. This can drop to half that size (12x) if you forsake all non-essential indices, but that will prevent a number of queries from running smoothly or, in some cases, disable them completely.

For iFind, the numbers are dependent on the type of index. Count on factors 2x, 7x and 15x for Basic, Semantic and Analytic indices, respectively. Of course there's a difference in functionality between all these options and it's best to start from a set of functional requirements and then look at which particular approach covers those.

These numbers are somewhat conservative maximums and, as Eduard already suggested, you may see different (lower) numbers depending on the nature of your data. A more detailed sizing guide is available on request.

Thanks,
benjamin

Benjamin De Boe · Sep 18, 2017

Hi Konstantin,

thanks for sharing your work, a nice application of iFind technology! If I can add a few ideas to make this more lightweight:

Rather than creating a domain programmatically, the recommended approach for a few versions now has been to use Domain Definitions. They allow you to declare a domain in an XML format (not much unlike the %Installer approach) and avoid a number of inconveniences in managing your domain in a reproducible way.
From reading the article, I believe you're just using the iKnow domain for that one EntityAPI:GetSimilar() call to generate search suggestions. iFind has a similar feature, also exposed through SQL, through %iFind.FindEntities() and %iFind.FindWords(), depending on what kind of results you're looking for. See also this iFind demo. With that in place, you may even be able to skip those domains altogether :-)

thanks,
benjamin