or leveraging pyspark, as explained in this article
Yes, we maintain an adoption guide that covers exactly that purpose. To make sure we can properly follow up on any questions you might have, we're making it available through your technical account team (sales engineer or TAM) rather than shipping it with the product.
Hi Eduard,
I believe this article describes what you're looking for: https://community.intersystems.com/post/keeping-your-iknow-domain-synchronized
Thanks,
Benjamin
Hi Eduard,
for this sort of querying (and many other uses outside straight API calls), you can use the SQL projections generated for your domain, as documented in %iKnow.Tables.Utils. That'll generate a column for your Views metadata field on the table containing Source information, which you can then join to the Part (entity occurrence) table to filter the ones containing the requested entity.
Hope this helps,
benjamin
Hi Eduard,
looking at your code, there seem to be a few small things that may each contribute to not seeing the results you were expecting:
Hope this helps,
benjamin
OK, thanks for the feedback. We're indeed looking into those additional windowing functions to go beyond our %FOREACH SQL extension, but it's not (yet) on the short-term agenda. Customer demand like yours of course helps us properly prioritize what should go on there.
We currently don't support analytic windowing functions (PARTITION BY syntax), but have been looking into it for a future release. MATCH_RECOGNIZE is certainly one of the more advanced ones in that bucket. Is this the very one you would need or do you have scenarios that would be served by core windowing functionality, excluding the pattern matching piece?
Or is it the pattern matching and not as much the windowing you're looking for?
I'm afraid we don't support the SQL PIVOT command, so unless you can enumerate the response codes as columns explicitly, you can only organise them as rows. If you control the application code, you could of course first have a query selecting all response codes and then generating the lengthy SQL call that includes separate columns for each response code. Something like SUM(CASE bRecord.ResponseCode WHEN 'response code 1' THEN 1 ELSE 0 END) AS ResponseCode1Count should work fairly well.
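A minimal sketch of that two-step approach, in Python: first select the distinct response codes, then generate one SUM(CASE ...) column per code. The table name bRecord and column ResponseCode are taken from the expression above; the single-table layout, the helper names and the use of an in-memory SQLite database as a stand-in for your actual connection are assumptions for illustration only.

```python
import sqlite3


def build_pivot_sql(response_codes):
    """Build one SUM(CASE ...) column per response code (hypothetical helper)."""
    columns = []
    for index, code in enumerate(response_codes, start=1):
        escaped = code.replace("'", "''")  # double single quotes inside SQL literals
        columns.append(
            f"SUM(CASE bRecord.ResponseCode WHEN '{escaped}' THEN 1 ELSE 0 END) "
            f"AS ResponseCode{index}Count"
        )
    return "SELECT " + ", ".join(columns) + " FROM bRecord"


def pivot_counts(conn):
    """Step 1: fetch the distinct codes. Step 2: run the generated query."""
    codes = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT ResponseCode FROM bRecord ORDER BY ResponseCode"
        )
    ]
    return codes, conn.execute(build_pivot_sql(codes)).fetchone()


if __name__ == "__main__":
    # SQLite is only a stand-in here; the table layout is assumed.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bRecord (ResponseCode TEXT)")
    conn.executemany(
        "INSERT INTO bRecord VALUES (?)", [("OK",), ("OK",), ("FAIL",)]
    )
    print(pivot_counts(conn))
```

The generated statement grows with the number of distinct response codes, so this is only practical when that list stays reasonably short.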
Horita-san,
The SetParameter() method requires that your domain has an ID assigned, which it gets automatically as soon as you call %Save() for the first time. Note that the sequence of commands you pasted should be returning an error, but it went unnoticed because the do syntax discards the returned status.
In general, it should be more convenient to work with Domain Definitions rather than the %iKnow.Domain API directly.
Thanks,
benjamin
Hi Dmitry,
Zen is indeed no longer a central piece of our application development strategy. We'll support it for some time to come (your Zen app still works on IRIS), but our focus is on providing a fast and scalable data management platform rather than GUI libraries. In that sense, you may already have noticed that recent courses we published on the topic of application development focus on leveraging the right technologies to connect to the backend (e.g. REST) and suggest using best-of-breed third-party technologies (e.g. Angular) for web development.
InterSystems IRIS is a new product where we're taking advantage of our Caché & Ensemble heritage. It's meant to address today's challenges when building critical applications and we've indeed leveraged a number of capabilities from those products, but also added a few significant new ones like containers, cloud & horizontal scalability. We'll shortly be providing an overview of elements to check (e.g. differences in supported platforms) for Caché & Ensemble customers who would like to migrate to InterSystems IRIS, but please don't consider this as merely an upgrade. You may already have noticed the installer doesn't support upgrading anyhow.
Thanks,
benjamin
Hi Robert,
DocBook has now moved fully online, which is what the Management Portal will link to: http://docs.intersystems.com/iris
SAMPLES included quite a few outdated examples and was also not appropriate for many non-dev deployments, so we've also moved to a different model there, posting the most relevant ones on GitHub, giving us more flexibility to provide updates and new ones: https://github.com/intersystems?q=samples
JDBC driver: to what extent is this different from the past? It's always just been available as a jarfile, as is customary for JDBC drivers. We do hope to be able to post it through Maven repositories in the near future though.
Small icons: yeah, to make our installer and (more importantly) the container images more lightweight, we had to economize on space. Next to the removal of DocBook and Samples, using smaller icons also reduces the size in bytes ;)
InterSystems IRIS is giving us the opportunity to adopt a contemporary deployment model, where we were somewhat restricted by long-term backwards compatibility commitments with Caché & Ensemble. Some of these changes will indeed catch your eye and might even feel a little strange at first, but we really believe the new model makes developing and deploying applications easier and faster. Of course, we're open to feedback on all of them and this is a good channel to hear from you.
Thanks!
benjamin
With the release of InterSystems IRIS, we're also publishing a few new great online courses on the technologies that come with it. That includes two on sharding, so don't hesitate to check them out!
If you have a global structure that you mapped a class to afterwards, that data is already in one physical database and therefore neither sharded nor shardable. Sharding really is a layer in between your SQL accesses and the physical storage, and it expects you not to touch that physical storage directly. So yes, you can still picture what that global structure looks like and, under certain circumstances (and when we're not looking ;-) ), read from those globals, but new records have to go through INSERT statements (or %New in a future version) and can never be written to the global directly.
We currently only support sharding for %CacheStorage. There have been so many improvements in that model over the past 5-10 years that there aren't many reasons left to choose %CacheSQLStorage for new SQL/Object development. The only likely reason would be that you still have legacy global structures to start from but, as explained above, that's not a scenario we can support with sharding. Maybe a nice reference in this context is one of our early adopters, who was able to migrate their existing SQL-based application to InterSystems IRIS in less than a day without any code changes. They could use the rest of the day to start sharding a few of their tables and were ready to scale before dinner, so to speak.
Hi Warlin,
I'm not sure whether you have something specific in mind, but it sort of works the other way around. You shard a table and, under the hood, invisible to application code, the table's data gets distributed to globals in the data shards. You cannot shard globals.
thanks,
benjamin
Hi Herman,
We're supporting SQL only in this first release, but are working hard to add Object and other data models in the future. Sharding arbitrary globals is unfortunately not possible, as we need some level of abstraction (such as SQL tables or Objects) to hook into in order to automate the distribution of data and work to shards. That said, if your SQL (or soon Object) based application has the odd direct global reference to a "custom" global (not related to a sharded table), we'll still support that by just mapping those to the shard master database.
Thanks,
benjamin
iKnow was written to analyze English rather than ObjectScript, so you may see a few odd results coming out of code blocks. I believe you can add a where clause excluding those records from the block table to avoid them.
I had something simple running on my laptop a long time ago, but the internal discussion on how to package it proved a little more complicated. Among other things, an iFind index requires an iKnow-enabled license (and more space!), which meant you couldn't simply include it in every kit.
Also, for the ranking of docbook results, applying proper weights based on the type of content (title / paragraph / sample / ...) was at least as important as the text search capabilities themselves. That latter piece has been well-addressed in 2017.1, so docbook search is in pretty good shape now. Blending in an easily-deployable iFind option as Konstantin published can only add to this!
Thanks,
benjamin
Hi Steve,
hadn't seen this question until just now, but I have to admit we're a bit storage-hungry with iKnow. If you generate the default full set of indices, for a moderately-sized domain you'll need up to 25x the original dataset size measured as raw text to fit everything. This can drop to half that size (12x) if you forsake all non-essential indices, but that will prevent a number of queries from running smoothly or, in some cases, disable them completely.
For iFind, the numbers are dependent on the type of index. Count on factors 2x, 7x and 15x for Basic, Semantic and Analytic indices, respectively. Of course there's a difference in functionality between all these options and it's best to start from a set of functional requirements and then look at which particular approach covers those.
These numbers are somewhat conservative maximums and, as Eduard already suggested, you may see different (lower) numbers depending on the nature of your data. A more detailed sizing guide is available on request.
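As a back-of-the-envelope illustration of the factors quoted above, here is a small Python sketch. The option names and the function are hypothetical; only the multipliers (25x, 12x, 2x, 7x, 15x) come from this comment, and they are conservative upper bounds rather than exact predictions.

```python
# Multipliers relative to raw text size, as quoted in the comment above.
# The dictionary keys are made-up labels for illustration.
SIZE_FACTORS = {
    "iknow_full": 25,      # iKnow domain with the default full set of indices
    "iknow_minimal": 12,   # iKnow domain without the non-essential indices
    "ifind_basic": 2,
    "ifind_semantic": 7,
    "ifind_analytic": 15,
}


def estimate_storage_gb(raw_text_gb, option):
    """Return a conservative upper estimate of the storage needed, in GB."""
    return raw_text_gb * SIZE_FACTORS[option]


if __name__ == "__main__":
    for option in SIZE_FACTORS:
        print(option, estimate_storage_gb(4, option), "GB for 4 GB of raw text")
```

Actual numbers depend on the nature of your data and may well be lower.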
Thanks,
benjamin
Hi Konstantin,
thanks for sharing your work, a nice application of iFind technology! If I can add a few ideas to make this more lightweight:
thanks,
benjamin
Thanks John,
indeed, you'd need a proper license in order to work with iKnow. If the method referred to above returns 0, please contact your sales representative to request a temporary trial license and appropriate assistance for implementing your use case.
Also, iKnow doesn't come as a separate namespace. You can create (regular) namespaces as you prefer and use them to store iKnow domain data. You may need to enable your web application for iKnow, which is disabled by default for security reasons in the same way DeepSee is. See this paragraph here for more details.
Hi Eduard,
you can define iFind indices for calculated fields, so if you point your field calculation to a function that strips out the HTML, you should be fine. The HTML converter in iKnow was built for a slightly different purpose, but can be used here:
Property HtmlText As %String(MAXLEN="");
Property PlainText As %String(MAXLEN="") [ Calculated, ReadOnly, SqlComputed, SqlComputeCode = { set {PlainText} = ##class(%iKnow.Source.Converter.Html).StripTags({HtmlText}) } ];
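To illustrate the effect such a computed field is after (indexing the text without the markup), here is a rough Python sketch. It is not how the iKnow HTML converter is implemented; the class and function names are hypothetical, and it only uses Python's standard html.parser module.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html_text):
    """Return the plain text of html_text with whitespace runs collapsed."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    return " ".join("".join(collector.chunks).split())


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b>!</p>"))
```

Note that this naive version also keeps the contents of script and style elements, which a real converter would likely want to drop.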
Regards,
benjamin
Hi Evgeny,
nice work!
Maybe you can enhance the interface by also adding an iKnow-based KPI to the dashboard, exposing the similar or related entities for the concept clicked in the heat map. You can subclass this generic KPI and specify the query you want it to invoke, and then use it as the data source for a table widget. Let me know if I can help.
thanks,
benjamin
After posting the initial article, I realized the sample code's use of ^CacheTemp.* globals carried a risk of iKnow.SyncedDefinition subclasses with the same name in different namespaces overwriting one another's data. The revised code now uses the namespace and domain ID as a subscript in ^CacheTemp, which should be safe.
The update also fixes the sample table's CreateTime column to be of type %DeepSee.Datatype.dateTime rather than %Date.
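The reason for the subscript change can be sketched with plain Python dictionaries standing in for the temporary global. The key layout, the class name Demo.Sync and the namespace names are assumptions for illustration; the actual subscript structure in the revised sample code may differ.

```python
def demo():
    """Compare keying shared temp data by class name vs. by namespace and domain ID."""
    # Keyed by class name only: same-named classes in two namespaces collide.
    by_name = {}
    by_name["Demo.Sync"] = "data written from namespace USER"
    by_name["Demo.Sync"] = "data written from namespace SAMPLES"  # overwrites

    # Keyed by (namespace, domain ID): each domain keeps its own entry.
    scoped = {}
    scoped[("USER", 1)] = "data written from namespace USER"
    scoped[("SAMPLES", 1)] = "data written from namespace SAMPLES"

    return len(by_name), len(scoped)


if __name__ == "__main__":
    print(demo())
```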
Cool stuff!
I believe you're using matching dictionaries for identifying those sentiment markers, which is indeed convenient from an API perspective. However, you might want to take advantage of sentiment attributes, which will allow you to not just detect occurrences of your marker terms, but also which parts of the sentence they apply to. I'm not sure how that is covered in your current app (didn't dig that deep into the code), but especially in the recent versions that improved our attribute expansion accuracy, it may improve the precision of your application too. See this article for more details.
Separately, leveraging domain definitions may also simplify the methods you're using to set up your domain. There's an option to load dictionary content from a table or file, leveraging <external> tags inside the <matching> section. It's not (yet) supported through the Architect, but you can add it when updating the class through Studio.
Thanks for sharing this!
benjamin
Hi Max,
the connector we're building is meant to be a smarter alternative to regular JDBC, pushing down filtering work from the Spark side to Caché SQL and leveraging parallelism where possible. So that means you can still use any Spark programming language (Scala, Java, Python or R) while enjoying the optimized connection. However, as it's an implementation of Spark's DataSource API, it's meant to go from Spark to "a data source" and not the other way round, i.e. submit a Spark job from Caché. On the other hand, that'd be something you could probably build without much effort through the Java Gateway. Do you have a particular example or use case in mind? Perhaps that would make an interesting code sample to post on the Developer Community.
Thanks,
benjamin
Hi Andreas,
we don't have a release date yet, but we'll certainly be demonstrating it at the Global Summit in September. If you are already using Spark in your organisation today and would be interested in seeing how it may help you make better use of the underlying Caché database, please drop me an email.
Thanks,
benjamin
yes, the two-word feature called "executing COS" would probably be quite a step up. It was more a loose idea than something I've researched thoroughly, but maybe the authors of the Caché Web Terminal have some clues on how the connectivity should work (JDBC won't cut it).
Nice article Andreas!
Have you perhaps also looked into creating a more advanced interpreter, rather than just leveraging the JDBC one? I know that's probably a significantly more elaborate thing to do, but a notebook-style interface for well-documented scripting would nicely complement the application development focus of Atelier / Studio.
Thanks,
benjamin
The $$$SIMSRCDOMENTS option is much more restrictive and may not yield any results if your domain is small and sources are too far apart. I do see results when trying it on the Aviation demo dataset. Note that you can loosen it by setting the "strict" parameter to 0, as described in the class reference.
That third alternative you quoted has been deprecated and does not add anything to the regular $$$SIMSRCSIMPLE option. You dug too deep in the code ;o)
Regards,
benjamin
you are entirely correct.
The separate MatchScore column is there to accommodate methods where the score is more refined than the pure count-based one used with $$$SIMSRCSIMPLE. With $$$SIMSRCDOMENTS, dominance is accounted for in this metric and you'll see that it differs from percentageMatched.