A bit of a long read, but a very nice illustration of how iKnow's bottom-up approach allows you to work with the full concepts as coined by the author, rather than a top-down approach relying on predefined lists of terms. If this is only their bachelor's thesis, I'm looking forward to seeing their master's thesis :-)

Thanks for sharing Otto!

What is the attribute of your texts that you want to categorize? When building TC models on top of iKnow domains, as Chip indicated, the easiest way is to do this based on a metadata field using the %LoadMetadataCategories method on your builder object. The more manual method is to use the %AddCategory methods you're calling, but they require a filter specification as the second argument (you passed "") to identify which records correspond to the category you're adding. That's what makes it a training set: a set of texts for which the outcome (category) is known, so the TC infrastructure can try to find correlations to exploit.
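To make the "training set" idea a bit more concrete, here's a small conceptual sketch in Python. It is not the iKnow API: the record layout, the field name "topic" and both helper functions are made up for illustration. It only shows the two ways of telling which records belong to which category, either by grouping on a metadata field or by supplying an explicit filter per category.

```python
from collections import defaultdict

# Hypothetical records; in iKnow these would be the sources in your domain.
records = [
    {"id": 1, "topic": "sports", "text": "The match ended in a draw."},
    {"id": 2, "topic": "finance", "text": "Shares dropped sharply today."},
    {"id": 3, "topic": "sports", "text": "She won the final set."},
]

def categories_from_metadata(records, field):
    """One category per distinct value of a metadata field."""
    categories = defaultdict(list)
    for rec in records:
        categories[rec[field]].append(rec["id"])
    return dict(categories)

def category_from_filter(records, predicate):
    """One category defined by an explicit filter over the records."""
    return [rec["id"] for rec in records if predicate(rec)]

by_field = categories_from_metadata(records, "topic")
sports_ids = category_from_filter(records, lambda rec: rec["topic"] == "sports")
```

An empty filter selects nothing meaningful, which is why a category added without one gives the model no examples to learn from.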

Separately, 11 records is a very small training set. I would not expect to find much statistically relevant information to exploit. Even when you're building a rule-based model manually, it'll be barely enough to validate your model. So probably it's worth trying to get hold of (much) more data to start with.

I've uploaded a tutorial we prepared for a Global Summit academy session on the subject in a separate thread (could not attach it to an existing one). That may give you a better idea of what the infrastructure can be used for and also takes you through the GUI, which may be more practical than the code-based approach you're referring to.

It's indeed tempting to just stuff interfaces like this in the kit, but it goes a bit beyond the objectives of pure system management interfaces that we'd typically pack with Caché. Also, in the specific case of this Dictionary Builder demo, it uses the programmatic APIs to create dictionaries (requiring allowCustomUpdates=true) and does not update the domain definition itself. We're actually working on making that a smoother process, so when that gets to a point where it can support the interactions implemented in this GUI (and when AngularJS becomes part of our kit), we can reconsider it.

Hi Terri,

[a little late perhaps (and we spoke briefly in Phoenix as well), but for the sake of completeness, here's my response]

The iKnow Architect page only shows domains based on domain definitions, subclasses of %iKnow.DomainDefinition. The ones created "the old way" (programmatically) through the %iKnow.Domain class do not carry the declarative information about where to load data from, hence the majority of the Architect GUI wouldn't be available on those.

As to the metadata loading question: I'm not sure which part of the script you're referring to was taking care of that metadata loading, but maybe this comes back to your question in a separate thread.

regards,

benjamin

Hi Orion,

I haven't heard about index data simply disappearing. There should be no reason for that other than calls to %PurgeIndices() or manually dropping the globals containing the data (including ^ISC.IF* ones containing entities and words shared across the namespace). Another potential issue may arise when importing (at a global level) only either the index or the shared data, resulting in them no longer being in sync. Any chance any of that could have happened?

For the missing class, that might be due to a class import as well, after which not all related classes were recompiled to reflect the changes.

Which version are you working on? Some of those recompiling issues may have been addressed in recent versions.

Perhaps the WRC is a better place to get the appropriate follow-up for specific issues like this one. 

regards,
benjamin

Hi Benjamin,

the default algorithm indeed won't return scores for each record, but will only make the calculation for all records that contain at least a decent number of entities that are relevant in the source document. You can indeed simply approximate the other documents' score by taking 0.
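As a minimal sketch of that approximation (plain Python, with made-up IDs and scores, and a hypothetical helper name), you can fill in 0 for every record the similarity query did not return:

```python
def fill_missing_scores(all_ids, returned_scores, default=0.0):
    """Return a score for every ID, using `default` where the query returned none."""
    return {doc_id: returned_scores.get(doc_id, default) for doc_id in all_ids}

all_ids = [1, 2, 3, 4]
returned = {2: 0.81, 4: 0.37}  # hypothetical scores for the records that were returned
scores = fill_missing_scores(all_ids, returned)
```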

For your specific use case, you may want to take a look at the text categorization infrastructure. I've posted a tutorial on the topic here.

regards,
benjamin

Hi Julie,

For XEP, the XEP guide in the product documentation is probably the best starting point. For iKnow, you can take a look at this video playlist introducing the technology. 

As you may know, InterSystems is also developing a new platform specifically aimed at big data use cases. Part of this new platform will be support for the UIMA standard, as a broader framework for dealing with unstructured data than iKnow's natural language processing alone, allowing you to combine it with third-party or custom utilities. Please send me an email if you'd like to discuss your big data project in more detail.

 

thanks,
benjamin

Hi Jack,

there's no need to normalize your search strings, as it's taken care of automatically as part of executing your search when appropriate.

When you use DELETE FROM in SQL, or ##class(Your.Table).%DeleteExtent() in COS, the associated iFind indices' data will be erased as well. To drop just the indices data, use ##class(Your.Table).%PurgeIndices() (cf class ref for refinements). Note that, unless you are using index-local storage (new feature in 2016.1), the words and entities tables will not be wiped as they are shared between all iFind indices in your namespace (somewhat conserving space and indexing efficiency).

iFind can calculate a score representing how well a record satisfies a search string, largely based on TFIDF (although it'll leverage the more refined dominance scores for entities when it can). This is also new in 2016.1. See https://community.intersystems.com/code/ifind-search-portal for an example.
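For readers unfamiliar with TFIDF, here is a generic textbook-style sketch in Python. It is an assumption-laden illustration and not iFind's actual scoring: tokenization is a simple whitespace split, the function name is hypothetical, and the dominance-based refinements mentioned above are left out entirely.

```python
import math
from collections import Counter

def tfidf_scores(docs, query_terms):
    """Score each document for the query terms with a plain TF-IDF sum."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores[doc_id] = score
    return scores

docs = {
    1: "cocktail on the beach",
    2: "hammock on the beach with a cocktail and another cocktail",
    3: "bees in the garden",
}
scores = tfidf_scores(docs, ["cocktail", "hammock"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms that are rare across the collection ("hammock") weigh more than common ones, so the second document ranks first here.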

 

regards,
benjamin

Hi Evgeny, Jack,

 

Ranking is new in 2016.1, and will indeed allow you to retrieve a score expressing how well a record matches a search string. A packagename.tablename_indexnameRank function gets automatically generated when you compile your class with an iFind index and can be invoked as follows:

SELECT %ID, 
Title,
FullText,
SomePackage.TheTable_MyIndexRank(%ID, 'cocktail* OR (hammock AND NOT bees)')
FROM SomePackage.TheTable
WHERE %ID %FIND search_index(MyIndex, 'cocktail* OR (hammock AND NOT bees)')
ORDER BY 4 DESC

There are no public demo servers exposing this functionality at this time.

 

regards,
benjamin

Hi Benjamin,

If you just want a SQL prompt, you can open one from the COS prompt by calling "do $system.SQL.Shell()", or use the SQL page in the system management portal which you can find under the "system exploration" menu.

The SQL lister and loader functionality is meant to populate your domain (rather than query it), but should no longer be invoked directly. Managing a domain can be taken care of much more easily through domain definitions, which can be configured through the iKnow Architect as of 2016.1. But I see you're already using one, otherwise you wouldn't have seen that error message (which BTW informs you that this domain definition is configured not to allow any build/config operation other than through the domain definition itself, which is the default setting for domain defs).

 

If you want to achieve the same result through COS:

write ##class(%iKnow.Queries.EntityAPI).GetTop(.result, domainID)

zwrite result

 

regards,

benjamin

Hi Benjamin (sounds like a conversation amongst just Benjamins now!),

 

The knowledge portal demonstration interface you find in the %iKnow.UI package (which gets a significant visual overhaul in 2016.3) is written using InterSystems' Zen technology, a web development framework that helps you combine client-side JavaScript and server-side Caché ObjectScript to build web applications. If you're good with PHP and/or JavaScript, there's no strict need to dig into Zen to build an iKnow-powered application. You can either use ODBC to connect to Caché and use SQL as in the above examples to query an iKnow domain, or you can build a simple REST service on top of iKnow (in Caché ObjectScript) and query that from your PHP/JavaScript code. We'll be releasing an out-of-the-box REST interface with 2016.3, but it's not rocket science to build one that fits your needs on earlier versions. If you already have an ISC sales engineering contact (none of them called Benjamin, unfortunately ;o) ), we can work together to get you up and running.

 

FYI, this github repo contains a simple iKnow demo application written with AngularJS and a REST interface. It's technically speaking a CSP page (yet another ISC web technology at a lower level than Zen), but could have been a straight HTML page.

 

 

Regards,

Benjamin

Hi Benjamin,

 

if you're familiar with Caché ObjectScript, that's the easiest way to work with iKnow. For example, the script below will add two short "sources" (documents) to your domain and then query the top concepts:

set domainID = 1, domainName = $system.iKnow.GetDomainName(domainID)

write $system.iKnow.IndexString(domainName, "123", "This is a first piece of text to be added to your iKnow domain!")

write $system.iKnow.IndexString(domainName, "234", "This is the second piece of text to be added to your iKnow domain! And guess what, it's an even more inspirational one!")

write ##class(%iKnow.Queries.EntityAPI).GetTop(.result, domainID)

zwrite result

 

For a good start with iKnow, take a look at this iKnow video and the next ones in the iKnow playlist.

 

If you prefer to work with SQL and loaded your domain through the iKnow Architect in the management portal, you can invoke those same query APIs through either of the following calls (for domain ID = 1):

CALL %iKnow_Queries.EntityQAPI_GetTop(1)

SELECT * FROM %iKnow_Queries.EntityQAPI_GetTop(1)

Hi Benjamin,

in order to enable a web application to use iKnow, you need to check the "iKnow" box in the SMP's Web Application management page (System Administration > Security > Applications > Web Applications). This was mentioned in the release notes of the first version introducing the stricter security policies (or at least the routine behind the checkbox is), but isn't mentioned prominently enough in the iKnow guide. We'll look into that.

This is actually only related to the web interfaces, so Atelier is not involved here. To create iKnow domain definitions through the management portal, look for the "iKnow Architect" in the SMP menu for iKnow.

 

regards,

benjamin

Hi Jack,

this is not an out-of-the-box feature of the iKnow technology. iKnow's semantic analysis is targeted at identifying the semantic entities of a sentence, but not at interpreting them, which is typically an application-specific activity. However, we do have some building blocks that will help you create such applications, combining the iKnow analysis of a sentence with domain knowledge you already have. If you look at the indexing results for such a sentence, you'll see that the entities iKnow identifies will usually already present a good structure for your sentence, and human questions are often not that complicated. However, if the database you'll be querying is just un-interpreted free text as well, you'll need much more magic. If you're looking at querying a well-known data structure, it's much more feasible. I once wrote a crude text-to-MDX query tool that translated natural language questions into MDX by matching the concepts in the question to the labels on the dimensions and measures of a DeepSee cube definition. In this case, iKnow played its part in decomposing the question into concepts and relationships, which were then easily "interpreted" by custom code as cube elements and MDX constructs. 
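To give an idea of what that "interpretation" step can look like, here's a very rough Python sketch. It is not the tool I wrote: the cube name, the label-to-member mappings and the function are all hypothetical, and the list of concepts is assumed to have been extracted from the question by iKnow beforehand.

```python
# Hypothetical cube metadata: display label -> MDX expression
DIMENSIONS = {"city": "[Outlet].[H1].[City]"}
MEASURES = {"revenue": "[Measures].[Revenue]"}

def concepts_to_mdx(concepts, cube_name):
    """Map concepts onto cube labels and assemble a simple MDX query."""
    measure = None
    dimension = None
    for concept in concepts:
        key = concept.lower()
        if measure is None and key in MEASURES:
            measure = MEASURES[key]
        elif dimension is None and key in DIMENSIONS:
            dimension = DIMENSIONS[key]
    if measure is None:
        return None  # nothing recognizable to put on the columns axis
    if dimension is None:
        return f"SELECT {measure} ON 0 FROM [{cube_name}]"
    return f"SELECT {measure} ON 0, {dimension}.Members ON 1 FROM [{cube_name}]"

# Concepts as they might come out of "What is the revenue per city?"
mdx = concepts_to_mdx(["revenue", "city"], "Sales")
```

Exact label matching is the crudest possible strategy; anything real would need synonyms, fuzzy matching and handling of the relationships between the concepts.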

So, in short, iKnow will help you in the semantic analysis of natural language text, but depending on the complexity of the domain, more dedicated (and expensive) tools are usually needed for the subsequent interpretation and inference of results.

 

benjamin

Hi Ben,

thanks for your reply. That's what I tested first, but it didn't seem to work, maybe because it somehow still needs the CSP file to be in the install/CSP/xyz/ folder, whereas it's still only in install/CSP/abc/. I also tried adding a web app /csp/xyz/test/ that referred to the abc folder and xyz namespace, but that was probably too optimistic (or messy).