Benjamin De Boe · Sep 12, 2017
Hi Robert,

Glad you liked Paul's announcement. Global Summit attendees can pre-register for the limited release program in the Tech Exchange area, at the central booths. After the summit, we'll gradually broaden that program and publish a Field Test closer to the end of this calendar year.

You can find a lot more about the InterSystems IRIS Data Platform on our new website and through this resource guide at learning.intersystems.com. Stay tuned for more articles on the various new features here too.

Thanks,
benjamin
Benjamin De Boe · Jul 3, 2017
Thanks John,

Indeed, you'd need a proper license in order to work with iKnow. If the method referred to above returns 0, please contact your sales representative to request a temporary trial license and appropriate assistance for implementing your use case.

Also, iKnow doesn't come as a separate namespace. You can create (regular) namespaces as you prefer and use them to store iKnow domain data. You may also need to enable your web application for iKnow; like DeepSee, it is disabled by default for security reasons. See this paragraph for more details, and the sketch below for a starting point.
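For reference, a minimal sketch of enabling iKnow for an existing web application from the %SYS namespace. The iKnowEnabled property name is an assumption from memory (by analogy with DeepSeeEnabled), and the application path is hypothetical, so verify both against the Security.Applications class reference:

  zn "%SYS"
  set app = ##class(Security.Applications).%OpenId("/csp/myapp") // hypothetical application name
  set app.iKnowEnabled = 1 // assumed property name, analogous to DeepSeeEnabled
  write app.%Save()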
Benjamin De Boe · Jun 26, 2017
Hi Eduard,

You can define iFind indices on calculated fields, so if you point your field calculation to a function that strips out the HTML, you should be fine (a sample index definition follows below). The HTML converter in iKnow was built for a slightly different purpose, but can be used here:

  Property HtmlText As %String(MAXLEN = "");

  Property PlainText As %String(MAXLEN = "") [ Calculated, ReadOnly, SqlComputed, SqlComputeCode = { set {PlainText} = ##class(%iKnow.Source.Converter.Html).StripTags({HtmlText}) } ];

Regards,
benjamin
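To complete the picture, an iFind index on that calculated property could then look like this; %iFind.Index.Basic is the simplest of the iFind index classes, and the index name is arbitrary:

  Index PlainTextIdx On (PlainText) As %iFind.Index.Basic;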
Benjamin De Boe · May 8, 2017
Hi Evgeny,

Nice work! Maybe you can enhance the interface by adding an iKnow-based KPI to the dashboard, exposing the similar or related entities for the concept clicked in the heat map. You can subclass this generic KPI, specify the query you want it to invoke, and then use it as the data source for a table widget (see the sketch below). Let me know if I can help.

Thanks,
benjamin
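A rough sketch of what such a subclass could look like, assuming the generic KPI from the linked article is %iKnow.DeepSee.GenericKPI; the parameter names below are assumptions from memory rather than verified API, so check that class's reference documentation:

  Class Demo.RelatedEntitiesKPI Extends %iKnow.DeepSee.GenericKPI
  {
  // hypothetical parameter names pointing the KPI at a domain and an iKnow query
  Parameter DOMAINID = 1;
  Parameter QUERYCLASS = "%iKnow.Queries.EntityAPI";
  Parameter QUERYNAME = "GetRelated";
  }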
Benjamin De Boe · Apr 19, 2017
After posting the initial article, I realized the sample code's use of ^CacheTemp.* globals implied a risk of iKnow.SyncedDefinition subclasses with the same name in different namespaces overwriting one another's data. The revised code now uses the namespace and domain ID as subscripts in ^CacheTemp, which should be safe.

The update also fixes the sample table's CreateTime column to be of type %DeepSee.Datatype.dateTime rather than %Date.
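For illustration, the revised subscripting scheme boils down to something like this; the global and subscript names here are illustrative, not the actual sample code:

  // keyed per namespace and domain, so same-named subclasses no longer collide
  set ^CacheTemp.IKSync($namespace, domainId, "lastSync") = $horolog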
Benjamin De Boe · Apr 10, 2017
Cool stuff!

I believe you're using matching dictionaries for identifying those sentiment markers, which is indeed convenient from an API perspective. However, you might want to take advantage of sentiment attributes, which allow you to detect not just occurrences of your marker terms, but also which parts of the sentence they apply to. I'm not sure how that is covered in your current app (I didn't dig that deep into the code), but especially on recent versions, which improved our attribute expansion accuracy, it may improve the precision of your application too. See this article for more details, and the sketch below for a starting point.

Separately, leveraging domain definitions may also simplify the methods you're using to set up your domain. There's an option to load dictionary content from a table or file, leveraging <external> tags inside the <matching> section. It's not (yet) supported through the Architect, but you can add it when updating the class through Studio.

Thanks for sharing this!
benjamin
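As that starting point: sentiment markers can be registered through a user dictionary, so the engine emits sentiment attributes at indexing time. This is a hedged sketch with constructor and method names quoted from memory, so verify them against the %iKnow.UserDictionary class reference:

  set ud = ##class(%iKnow.UserDictionary).%New("MySentiment")
  do ud.AddPositiveSentimentTerm("awesome")
  do ud.AddNegativeSentimentTerm("horrible")
  write ud.%Save() // then refer to this user dictionary from your domain's configuration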
Benjamin De Boe · Apr 3, 2017
Hi Max,

The connector we're building is meant to be a smarter alternative to regular JDBC, pushing down filtering work from the Spark side to Caché SQL and leveraging parallelism where possible. That means you can still use any Spark programming language (Scala, Java, Python or R) while enjoying the optimized connection. However, as it's an implementation of Spark's DataSource API, it's meant to go from Spark to "a data source" and not the other way round, i.e. it won't let you submit a Spark job from Caché. On the other hand, that's something you could probably build without much effort through the Java Gateway (see the sketch below). Do you have a particular example or use case in mind? Perhaps that would make an interesting code sample to post on the Developer Community.

Thanks,
benjamin
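A rough sketch of that Java Gateway route, assuming a gateway is running on port 55555, your instance supports the dynamic %Net.Remote.Object proxy, and a hypothetical com.example.SparkJobRunner class sits on the gateway's classpath:

  set gw = ##class(%Net.Remote.Gateway).%New()
  set sc = gw.%Connect("127.0.0.1", 55555, $namespace)
  // proxy for the (hypothetical) Java class wrapping spark-submit or SparkLauncher
  set runner = ##class(%Net.Remote.Object).%New(gw, "com.example.SparkJobRunner")
  write runner.submit("my-spark-job")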
Benjamin De Boe · Mar 10, 2017
Hi Andreas,

We don't have a release date yet, but we'll certainly be demonstrating it at the Global Summit in September. If you are already using Spark in your organisation today and would be interested in seeing how it may help you make better use of the underlying Caché database, please drop me an email.

Thanks,
benjamin
Benjamin De Boe · Mar 1, 2017
Yes, the two-word feature called "executing COS" would probably be quite a step up. It was more a loose idea than something I've researched thoroughly, but maybe the authors of the Caché Web Terminal have some clues on how the connectivity should work (JDBC alone won't cut it).
Benjamin De Boe · Feb 27, 2017
Nice article, Andreas!

Have you perhaps also looked into creating a more advanced interpreter, rather than just leveraging the JDBC one? I know that's probably a significantly more elaborate thing to do, but a notebook-style interface for well-documented scripting would nicely complement the application development focus of Atelier / Studio.

Thanks,
benjamin
Benjamin De Boe · Jan 10, 2017
The $$$SIMSRCDOMENTS option is much more restrictive and may not yield any results if your domain is small and sources are too far apart. I do see results when trying it on the Aviation demo dataset. Note that you can loosen it by setting the "strict" parameter to 0, as described in the class reference (see the sketch below).

That third alternative you quoted has been deprecated and doesn't add anything over the regular $$$SIMSRCSIMPLE option. You dug too deep in the code ;o)

Regards,
benjamin
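For reference, a hedged sketch of invoking the query with that algorithm; the macros come from the %IKPublic include file, and the exact position of the algorithm and its parameter list in the argument list is quoted from memory, so check the %iKnow.Queries.SourceAPI class reference:

  #include %IKPublic
  // dominance-based similarity, with the "strict" parameter loosened to 0
  do ##class(%iKnow.Queries.SourceAPI).GetSimilar(.result, domainId, srcId, 1, 10, , $$$SIMSRCDOMENTS, $lb(0))
  zwrite result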
Benjamin De Boe · Jan 9, 2017
You are entirely correct. The separate MatchScore column is there to accommodate methods where the score is more refined than the pure count-based one used by $$$SIMSRCSIMPLE. With $$$SIMSRCDOMENTS, dominance is accounted for in this metric, and you'll see it differ from percentageMatched.
Benjamin De Boe · Jan 5, 2017
Hi Benjamin,

The (patented) magic of iKnow is the way it identifies concepts in sentences, which happens in a library shipped as a binary; we refer to it as the iKnow engine, and it is used by both the iKnow APIs and iFind indices. Most of what happens with that engine's output is not nearly as much rocket science and, as Eduard indicated, its COS source code can usually be consulted for clues on how it works if you're adventurous. The two options of the GetSimilar() query both work by looking at the top concepts of the reference source and looking for other sources that contain them as well, using frequency and dominance respectively to weight in-source relevance. So not much rocket science, and only support for full matches at this point.

This said, iKnow offers you the building blocks to build much more advanced things, quite possibly inspired by your academic research, leveraging the concept level that is unique to iKnow in identifying what a text is really about. For example, you can build vectors containing entity frequency or dominance and look for cosine similarity in this vector space (see the sketch below), or you can leverage topic modelling. Many of these will require quite a bit of computation, though, and actual result quality may depend a bit on the nature of the texts you're dealing with, which is why we chose to stick to very simple things in the kit for now.

However, you can find two (slightly) more advanced options in demos we have published online:

- The iKnow Investigator demo is part of your SAMPLES namespace and offers, next to the SourceAPI:GetSimilar() option, an implementation that builds a dictionary from your reference source, matches it against your target sources and looks for the ones with the highest aggregated match score, which accounts for partial matches as well.
- In the iFind Search Portal demo, the bottom of the record viewer popup displays a list of similar records. This one is populated based on a bit of iFind juggling implemented in the Demo.SearchPortal.Utils class, leveraging full entity matches only, but it's easy to extend to weigh in words as well.

In both cases, there's a myriad of options to refine these algorithms, but all at a certain compute cost, given the high dimensionality introduced by the iKnow entity (and actually even word) level. If you have further ideas or, better yet, sample code to achieve better similar-document lists, we'd be thrilled to read about it here on the community ;o)

Thanks,
benjamin
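To make the vector-space idea concrete, here's a rough sketch that builds entity frequency vectors for two sources and computes their cosine similarity. The GetBySource result-row layout ($lb of entity ID, entity value, frequency, ...) and the use of a large page size to fetch everything are assumptions from memory, so verify them against the %iKnow.Queries.EntityAPI class reference:

  ClassMethod EntityCosine(pDomainId As %Integer, pSrc1 As %Integer, pSrc2 As %Integer) As %Double
  {
      kill vec
      for src = pSrc1, pSrc2 {
          kill result
          do ##class(%iKnow.Queries.EntityAPI).GetBySource(.result, pDomainId, src, 1, 1000000)
          set i = ""
          for {
              set i = $order(result(i), 1, row)
              quit:i=""
              set vec(src, $listget(row, 1)) = $listget(row, 3) // entity ID -> frequency
          }
      }
      // cosine = dot product / (norm1 * norm2)
      set (dot, norm1, norm2) = 0
      set id = ""
      for {
          set id = $order(vec(pSrc1, id), 1, f1)
          quit:id=""
          set norm1 = norm1 + (f1 * f1)
          set:$data(vec(pSrc2,id),f2) dot = dot + (f1 * f2)
      }
      set id = ""
      for {
          set id = $order(vec(pSrc2, id), 1, f2)
          quit:id=""
          set norm2 = norm2 + (f2 * f2)
      }
      quit:(norm1=0)||(norm2=0) 0
      quit dot / $zsqr(norm1 * norm2)
  }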
Benjamin De Boe · Dec 22, 2016
Hi Edward,

The thing that comes closest here would be the %iKnow.Queries.SourceAPI:GetSimilar() query, which, for a certain seed document, looks for the most similar ones in a domain, optionally constrained by a filter object. The results of that query include a figure like the one you're looking for, expressing how many entities were new in the seed document vs the corpus it's comparing against. Although that particular calculation isn't available as an atomic function, a simple way to get to what you want would be to use the %iKnow.Filters.SourceIdFilter and just compare against an individual document (see the sketch below).

If you prefer to write more code :o), you can also just look up the entities in the one document and compare them against those in the others through the %iKnow.Objects.EntityInSourceDetails SQL projection.

Regards,
benjamin
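A minimal sketch of that filter-based route, assuming domain ID 1; the argument order for paging and the filter in GetSimilar is quoted from memory, so check the class reference:

  set filter = ##class(%iKnow.Filters.SourceIdFilter).%New(1, targetDocId)
  do ##class(%iKnow.Queries.SourceAPI).GetSimilar(.result, 1, seedDocId, 1, 10, filter)
  zwrite result // rows include the new-vs-overlapping entity counts mentioned above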
Benjamin De Boe · Nov 3, 2016
I sometimes like to call myself Kyle on Thursdays that feel like Mondays ;o)

Like any operator whose name starts with a %, both %CONTAINS and %FIND are InterSystems-specific additions to the SQL standard, each for a specific purpose. %CONTAINS has been part of Caché for a long while and indeed offers a simple means to directly refer to your %Text field. So you don't need to know the name of your index, but you do need to make sure your text field is of type %Text, so there's still a minimum of table metadata you need to be aware of.

In the case of %FIND, we're actually leveraging the %SQL.AbstractFind infrastructure, a much more recent addition that allows leveraging code-based filters in SQL. In this case, the code-based filter implementation is provided by the %iFind infrastructure. Because this AbstractFind interface is, what's in a name, abstract, it needs a bit of direction to wire it to the right implementation, which in our case comes in the form of the index name. As the AbstractFind interface is expected to return a bitmap (for fast filtering), it also needs to be applied to an integer field, typically the IDKey of your class. So, while it offers a lot of programmatic flexibility to implement optimized filtering code (cf. this article on spatial indexes), it does require this possibly odd-looking construct on the SQL side. But that's easily hidden behind a user interface, of course.
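Side by side, the two constructs look like this; table, field and index names are made up for the example:

  // %CONTAINS works directly on the %Text field:
  set rs = ##class(%SQL.Statement).%ExecDirect(, "SELECT ID FROM MyApp.Document WHERE Body %CONTAINS ('interoperability')")
  // %FIND is wired to the iFind index through search_index(), applied to the IDKey:
  set rs = ##class(%SQL.Statement).%ExecDirect(, "SELECT ID FROM MyApp.Document WHERE %ID %FIND search_index(BodyIdx, 'interoperability')")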
Benjamin De Boe · Nov 3, 2016
A bit of a long read, but a very nice illustration of how iKnow's bottom-up approach allows you to work with the full concepts as coined by the author, rather than a top-down approach relying on predefined lists of terms. If this is only their bachelor's thesis, I'm looking forward to seeing their master's thesis :-)

Thanks for sharing, Otto!
Benjamin De Boe · Nov 3, 2016
What is the attribute of your texts that you want to categorize by? When building TC models on top of iKnow domains, as Chip indicated, the easiest way is to do this based on a metadata field, using the %LoadMetadataCategories method on your builder object. The more manual method is using these %AddCategory methods you're using, but they require a filter specification as the second argument (you passed "") to identify which records correspond to the category you're adding. That's what makes it a training set: a set of texts for which the outcome (category) is known, so the TC infrastructure can try to find correlations to exploit (see the sketch below).

Separately, 11 records is a very small training set. I would not expect to find much statistically relevant information to exploit. Even when you're building a rule-based model manually, it'll be barely enough to validate your model. So it's probably worth trying to get hold of (much) more data to start with.

I've uploaded a tutorial we prepared for a Global Summit academy session on the subject in a separate thread (could not attach it to an existing one). That may give you a better idea of what the infrastructure can be used for, and it also takes you through the GUI, which may be more practical than the code-based approach you're referring to.
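For reference, a hedged sketch of both routes on a builder object; the constructor and method signatures are quoted from memory, so double-check them against the %iKnow.Classification.IKnowBuilder class reference:

  set builder = ##class(%iKnow.Classification.IKnowBuilder).%New("MyDomain")
  // easiest route: derive categories from a metadata field holding the known outcome
  do builder.%LoadMetadataCategories("Outcome")
  // manual route: one category plus a filter identifying its training records
  set filter = ##class(%iKnow.Filters.SimpleMetadataFilter).%New(domainId, "Outcome", "=", "positive")
  do builder.%AddCategory("positive", filter)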
Benjamin De Boe · Aug 10, 2016
This is due to a logging issue that has been fixed in 2016.2 and should also be included in a future maintenance release of 2016.1.
Benjamin De Boe · Aug 10, 2016
Indeed. A fix has just been pushed to the GitHub repo. Thanks for reporting!