iKnow demo apps (part 3) - Dictionary Builder Demo

Benjamin De Boe · Jun 14, 2016 5m read

#InterSystems Natural Language Processing (NLP, iKnow)

This is the third article in a series on iKnow demo applications, showcasing how the concepts and context provided through iKnow's unique bottom-up approach can be used to implement relevant use cases and help users be more productive in their daily tasks. Previous articles discussed the Knowledge Portal, a straightforward tool to browse iKnow indexing results, and the Set Analysis Demo, in which you can use the output of iKnow indexing to organize your texts according to their content, such as in patient cohort selection.

This week, we'll look into another demo application, the Dictionary Builder demo, in which we'll marry iKnow's bottom-up insights with top-down expertise, organizing our domain knowledge into dictionaries that are composed of the actual terms used in the data itself. Sticking to a top-down approach only, you'd risk missing out on some terminology used in the field that a domain expert sitting in his office wouldn't be aware of.

Installation and setup

This demo is available for download from a GitHub repository, where you can either check it out using your preferred Git client, or download it as a zip outright. You should be able to import and compile all classes in any release of Caché starting with 2015.2.

Like the Set Analysis demo, the application consists of a single (CSP) page with AngularJS-infused HTML and associated JavaScript code that calls a REST interface. To make the application work, you'll need to set up a web application that redirects to the Demo.DictionaryBuilder.RestHandler class, either manually or through the Demo.DictionaryBuilder.Utils:CreateRestWebApp() method included in the package.

While this demo is meant to work with any domain, that domain needs to be editable in order to save your work. By default, a domain managed by a domain definition class can only be updated through that class and will refuse changes attempted through the demo GUI, as described in this article. You can enable editing by setting the top-level allowCustomUpdates attribute to true in the domain definition XML. For the screenshots below, we used (somewhat unpolished) data from electronic health records (EHRs), starting from a simple domain definition such as this one:

<domain name="NETS data" allowCustomUpdates="true">
 <data>
  <table tableName="nets.data" idField="ID" groupField="mrn" dataFields="txt" />
 </data>
 <matching dropBeforeBuild="false">
  <dictionary name="Tumor types">
   <item name="SCC" uri=":tumor:scc">
    <term string="squamous cell carcinoma" />
    <term string="SCC" />
   </item>
   <item name="sarcoma" uri=":tumor:sarcoma">
    <term string="sarcoma" />
   </item>
  </dictionary>
  <dictionary name="Procedures">
  </dictionary>
 </matching>
 <blacklist name="Patient Noise">
 </blacklist>
</domain>
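
To make the dictionary / item / term nesting in this definition explicit, here is a minimal Python sketch that reads the same XML and lists what each dictionary contains. It is only an illustration that runs outside Caché: the helper names list_dictionaries and is_editable are hypothetical and not part of the demo or of iKnow.

```python
import xml.etree.ElementTree as ET

# The domain definition from the article, copied verbatim.
DOMAIN_XML = """
<domain name="NETS data" allowCustomUpdates="true">
 <data>
  <table tableName="nets.data" idField="ID" groupField="mrn" dataFields="txt" />
 </data>
 <matching dropBeforeBuild="false">
  <dictionary name="Tumor types">
   <item name="SCC" uri=":tumor:scc">
    <term string="squamous cell carcinoma" />
    <term string="SCC" />
   </item>
   <item name="sarcoma" uri=":tumor:sarcoma">
    <term string="sarcoma" />
   </item>
  </dictionary>
  <dictionary name="Procedures">
  </dictionary>
 </matching>
 <blacklist name="Patient Noise">
 </blacklist>
</domain>
"""


def list_dictionaries(xml_text):
    """Return {dictionary name: {item name: [term strings]}} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for dictionary in root.iter("dictionary"):
        items = {}
        for item in dictionary.findall("item"):
            items[item.get("name")] = [t.get("string") for t in item.findall("term")]
        result[dictionary.get("name")] = items
    return result


def is_editable(xml_text):
    """True when the top-level allowCustomUpdates attribute is set to true."""
    root = ET.fromstring(xml_text)
    return root.get("allowCustomUpdates", "false").lower() == "true"


if __name__ == "__main__":
    print("editable:", is_editable(DOMAIN_XML))
    for name, items in list_dictionaries(DOMAIN_XML).items():
        print(name, "->", items)
```

The "Procedures" dictionary comes back empty, which is exactly the kind of gap the demo is meant to fill from the data.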

The idea of the app is to build a dictionary of real-world terminology that helps us organize and highlight the EHRs according to the properties of the tumors the patients were suffering from. A few terms are already in the dictionary, but we'll add more, based on the real data rather than starting from textbook knowledge only.

Exploring the data

So let's open our demo interface by accessing the following URL, where the number at the end should correspond to the domain ID of the domain you wish to work with: http://localhost:57772/csp/user/DictionaryBuilder.csp?1. On the left, you'll see a list of the top concepts for this domain, similar to what we saw in the Knowledge Portal earlier on. On the right, you'll see the list of dictionaries currently defined for your domain, one of which is expanded to the item level in the central column.

Now, as in the earlier demos, start by typing something in the top left box and pressing the "Go!" button. That will show you the entities similar to the fragment you just entered. For example, if you type "squa" and press the button, you'll see all the similar concepts iKnow identified, including many subtle variations of squamous cell carcinoma (SCC) that would have been hard to enumerate upfront, as well as combinations with the qualifiers added by the original text author ("moderately differentiated scc").
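
As a rough intuition for what such a lookup does, the sketch below returns every entity in which some word starts with the typed fragment. This is a simplified stand-in written for illustration, not iKnow's actual similarity logic, and the entity list is sample data made up for the example.

```python
def similar_entities(entities, fragment):
    """Return entities in which any word starts with the fragment (case-insensitive).

    Simplified stand-in for the demo's "similar entities" lookup; an assumption,
    not the real iKnow implementation.
    """
    needle = fragment.lower()
    return [
        entity
        for entity in entities
        if any(word.startswith(needle) for word in entity.lower().split())
    ]


# Illustrative sample entities (hypothetical, not taken from the demo data set).
entities = [
    "squamous cell carcinoma",
    "moderately differentiated scc",
    "invasive squamous cell carcinoma",
    "sarcoma",
]

matches = similar_entities(entities, "squa")
```

Note that an abbreviation such as "scc" is not found by the "squa" fragment in this sketch, which is one reason to collect such variants as separate terms of the same dictionary item.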

Creating dictionaries

So we have our real-world bottom-up data on the left, and our top-down dictionary-organized knowledge on the right. Now you can enrich your dictionaries by clicking an entity on the left and dragging it over to the dictionary item for which you want to add the entity as a term.

If you drag an entity to the "new item" button at the top of the expanded dictionary, or straight onto a collapsed dictionary in the list on the right, it will be added to that dictionary as a new item. If you drop it on the cog icon, you'll see a dialog allowing you to edit the terms for that item.
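
The drag-and-drop actions above boil down to two operations on a dictionary: adding a term to an existing item, and creating a new item. Here is a minimal Python sketch of that data model. The class and method names are hypothetical, and it assumes that a new item is named after the dropped entity and gets that entity as its first term; the demo's actual behaviour may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    terms: list = field(default_factory=list)


@dataclass
class Dictionary:
    name: str
    items: dict = field(default_factory=dict)  # item name -> Item

    def add_term(self, item_name, entity):
        """Entity dropped on an existing item: add it as a term, skipping duplicates."""
        item = self.items[item_name]
        if entity not in item.terms:
            item.terms.append(entity)

    def add_item(self, entity):
        """Entity dropped on "new item" (assumed behaviour): create an item named
        after the entity, with the entity as its first term."""
        if entity not in self.items:
            self.items[entity] = Item(entity, [entity])
        return self.items[entity]


# Start from the "Tumor types" dictionary in the domain definition.
tumors = Dictionary(
    "Tumor types",
    {"SCC": Item("SCC", ["squamous cell carcinoma", "SCC"])},
)
tumors.add_term("SCC", "moderately differentiated scc")
tumors.add_term("SCC", "moderately differentiated scc")  # dropped twice: no duplicate
new_item = tumors.add_item("sarcoma")
```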

You can also create new dictionaries using the text box in the top center of the screen, for example to create an additional dictionary for topologies, another aspect we wish to know more about and would like to learn the real-world terminology for. After adding more dictionaries, items and terms, your screen may look like this:

Clearing things up

While this approach clearly helps you validate whether your top-down insights correspond to the terminology used in the field, and lets you enrich your own knowledge with that terminology, it also brings up a certain amount of noise that obstructs the view somewhat. For this, you can use blacklists, which are simple lists of terms you wish to exclude from query results (as we saw in the Knowledge Portal demo). You can populate these (orange bar in the lower right corner) in the same drag-and-drop fashion as your dictionaries, and the blacklisted terms will be removed from the list of top/similar terms on the left straight away.

Separately, to filter the list of entities on the left to exclude the ones you have already mapped to a dictionary, use the "filter" dropdown in the entities' header bar.
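
The two filters described above (blacklists and hiding already-mapped entities) can be sketched as a single function over the entity list. This is a simplified illustration with hypothetical names and sample data, not the demo's actual client-side code.

```python
def visible_entities(entities, blacklist, mapped_terms, hide_mapped=False):
    """Return the entities that should still be shown in the left-hand list.

    - Blacklisted entities are always removed.
    - Entities already used as dictionary terms are removed only when
      hide_mapped is True (the "filter" dropdown in the sketch's terms).
    Comparison is case-insensitive; this is an assumption of the sketch.
    """
    blocked = {entry.lower() for entry in blacklist}
    mapped = {term.lower() for term in mapped_terms}
    visible = []
    for entity in entities:
        key = entity.lower()
        if key in blocked:
            continue
        if hide_mapped and key in mapped:
            continue
        visible.append(entity)
    return visible


# Illustrative sample data (hypothetical).
entities = ["squamous cell carcinoma", "patient", "sarcoma", "biopsy"]
blacklist = ["patient"]      # e.g. an entry in the "Patient Noise" blacklist
mapped_terms = ["sarcoma"]   # already a term in "Tumor types"

all_but_noise = visible_entities(entities, blacklist, mapped_terms)
unmapped_only = visible_entities(entities, blacklist, mapped_terms, hide_mapped=True)
```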

Note: Remember to press the "Save all" button when you have finished modifying your dictionaries and blacklists, as all changes are only retained on the client side until you click that button. That will also fully refresh the star icons identifying matched terms.

Similar to the Set Analysis demo last week, this generic demo shows you how you can work with iKnow output to explore and organize your data. Gradually moving to more high-level use cases, we'll build upon this demo in next week's article, where we'll combine the dictionaries created here into rules we'd like to match against our data.

Jose-Tomas Salvador · Aug 9, 2016

Aren't these "demo tools" useful for general configuration for whatever project? I mean, are they going to appear in future releases or are not so useful in real projects?

Benjamin De Boe · Aug 10, 2016

It's indeed tempting to just stuff interfaces like this in the kit, but it goes a bit beyond the objectives of pure system management interfaces that we'd typically pack with Caché. Also, in the specific case of this Dictionary Builder demo, it uses the programmatic APIs to create dictionaries (requiring allowCustomUpdates=true) and does not update the domain definition itself. We're actually working on making that a smoother process, so when that gets to a point where it can support the interactions implemented in this GUI (and when AngularJS becomes part of our kit), we can reconsider it.