iKnow Review Analyzer (iKRA)

Alex Litkovets · Apr 10, 2017 5m read

#Databases #InterSystems IRIS BI (DeepSee) #Unstructured Data #InterSystems Natural Language Processing (NLP, iKnow)

Introduction

We used InterSystems iKnow technology to build a review assessment system called iKnow Reviews Analyzer (iKRA). Some information about the prototype of the system can be found here. iKRA analyzes users’ text reviews and automatically rates the object being reviewed. This can come in handy on e-commerce sites, forums or collections of media content: in other words, anywhere people discuss products, places or services.

What does the solution do?

iKnow Reviews Analyzer analyzes reviews within a chosen domain area, be it online sales of home appliances or hotel booking services for travelers. To get the results of this analysis, you need to perform the following steps:

collect reviews in the designated domain area;
create dictionaries – databases of words for calculations;
create an area for loading and analyzing data;
start model calculations;
drink coffee / wait;
look at the results.

Use case

Here’s how it looks in reality. Let’s use smartphone reviews as an example and pick 5 manufacturers:

Apple;
Huawei;
LG;
Meizu;
Samsung.

Let’s assume that we are interested in two models from each of these vendors. Let’s download 50 reviews for each of the selected models – the total will be 500. Reviews will come from kimovil.

We’ll save each review to a separate file and use the following file organization scheme (Figure 1):

Figure 1. File location hierarchy

The number in brackets is the overall smartphone rating that the user gave in the review. It is written to metadata and later used to optimize the calculation algorithm. Source reviews can be found here.
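As a rough illustration of how a bracketed rating could be pulled out of a file name before it is stored as metadata, here is a minimal Python sketch. The naming pattern (`review_07 [8.5].txt`) and the function name are assumptions made for the example; the actual scheme is the one shown in Figure 1, and iKRA itself may read the rating differently.

```python
import re
from pathlib import PurePath
from typing import Optional

# Hypothetical naming scheme: "<anything> [<rating>].txt",
# for example "review_07 [8.5].txt". Adjust the pattern to the real layout.
_RATING_IN_BRACKETS = re.compile(r"\[(\d+(?:\.\d+)?)\]")


def rating_from_filename(path: str) -> Optional[float]:
    """Return the bracketed user rating found in the file name, or None."""
    match = _RATING_IN_BRACKETS.search(PurePath(path).name)
    return float(match.group(1)) if match else None
```

Only the last path component is inspected, so brackets in a folder name do not affect the result.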

To perform the analysis, you need to create an iKnow domain, which is a store for unstructured data. We will not go into it here, since this topic is described in detail here.

Once we’ve created a domain and filled it with reviews, let’s proceed to analyzing its content. When I choose a smartphone, the following parameters are crucial for me:

performance;
quality of communications;
comfort/ergonomics.

For ease of narration, let me introduce the following notions:

Category – a rateable parameter;
Functional (f) marker – a term that characterizes a parameter/category being rated;
Functional dictionary – an array of f-Markers;
Emotional (e) marker – a word characterizing the attitude of the reviewer to the object of review;
Emotional dictionary – an array of e-Markers.

Let’s use the selected characteristics to create a functional dictionary, where f-Markers (determiners) are assigned to each of the specified categories. For example, the “Performance” category is likely to contain markers like “speed”, “processor”, “memory”, “performance”, “core” and so on. All f-Markers are saved to a special file. Figure 2 shows an example for the “Performance” category:

Figure 2. f-Markers

After that, we will create a dictionary of emotions by filling it with corresponding e-Markers. It’s impossible to provide a complete list here, but those would be words like “good”, “comfortable”, “liked”, “issues”, “problems”. e-Markers define a positive or negative context of every sentence in a review. Each e-Marker will have a numeric value assigned to it. For convenience, let’s use +1 for positive markers and -1 for negative ones. All e-Markers are also saved to a special file. Figure 3 shows an example of a set of e-Markers:

Figure 3. e-Markers
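The article does not spell out the rating formula, so the following Python sketch is only one plausible way the two dictionaries could be combined: each sentence's e-Marker values are summed, and that sum is credited to every category whose f-Markers occur in the same sentence. The dictionary contents are the sample words quoted above; the function name and the sentence-level rule are assumptions, not iKRA's actual algorithm (which relies on iKnow rather than plain word matching).

```python
import re
from collections import defaultdict

# Sample dictionaries built from the words quoted in the text.
F_MARKERS = {
    "performance": {"speed", "processor", "memory", "performance", "core"},
}
E_MARKERS = {"good": 1, "comfortable": 1, "liked": 1, "issues": -1, "problems": -1}


def score_review(text, f_markers=F_MARKERS, e_markers=E_MARKERS):
    """Sum e-Marker values per category, sentence by sentence (assumed rule)."""
    totals = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z']+", sentence))
        # Net sentiment of the sentence: +1 / -1 per distinct e-Marker found.
        sentiment = sum(e_markers.get(word, 0) for word in words)
        for category, markers in f_markers.items():
            if words & markers:  # the sentence mentions this category
                totals[category] += sentiment
    return dict(totals)
```

For instance, `score_review("The processor is good. The memory has problems. I liked the speed.")` credits +1, -1 and +1 to “performance”, for a net score of 1. A sentence with no f-Marker contributes nothing, whatever its sentiment.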

Once the dictionaries are ready, we can calculate the ratings. To do this, select the necessary domain on the “Domains” tab and click the “Calculate” button (Figure 4):

Figure 4. Rating calculation

To view the result, open the ikra.Dictionary.MarksUnit class table, which contains ratings for each smartphone model, or the ikra.Dictionary.MarksReview class table, which contains ratings for each review. Both can be viewed in the Management Portal: go to the SQL section and open the required table. Figure 5 shows the ikra.Dictionary.MarksUnit class table.

Figure 5. Viewing the ikra.Dictionary.MarksUnit table

Let’s use DeepSee to check out the result. We’ve created a cube that uses the results of rating calculation by category and have built a chart for each analyzed smartphone model (Figure 6):

Figure 6. Ratings chart by category

What if we add another category?

In the past, if you wanted to rate categories, you’d have to specify the corresponding class property manually. It was inconvenient, since when categories or their count changed during the analysis of new domain areas, you had to make corrections in the code, which wasn’t the most fun and productive use of time. To avoid this, we have considered two solutions:

Reservation of a large number of class properties;
Use of a database.

The first option allows us to forget about the constantly changing number of categories and not care about the database structure. However, it’s not very convenient to store such a large number of properties, and there is no guarantee that the number of rateable parameters will always stay within the reserved amount. Therefore, we decided to go in a different direction.

The second option solves the problem with an undefined number of categories and does not require a fixed amount of memory for storing each class instance. When using a database, the system easily adapts to analyzing any domain area with any number of categories.

The advantages of the second approach convinced us to use it in iKRA.
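To make the difference concrete, here is a small Python sketch of the "database" idea using the standard sqlite3 module: each rating is a row keyed by unit and category, so a new category is just new rows rather than a new class property. The table and function names are hypothetical and are not the real ikra.Dictionary.* schema.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE category_mark (
        unit     TEXT NOT NULL,   -- e.g. a smartphone model
        category TEXT NOT NULL,   -- e.g. 'performance', 'camera'
        mark     REAL NOT NULL,
        PRIMARY KEY (unit, category)
    )
    """
)


def save_mark(unit, category, mark):
    """Insert or overwrite the mark of one category for one unit."""
    conn.execute(
        "INSERT OR REPLACE INTO category_mark VALUES (?, ?, ?)",
        (unit, category, mark),
    )


def marks_for(unit):
    """Return {category: mark} for a unit, however many categories exist."""
    rows = conn.execute(
        "SELECT category, mark FROM category_mark WHERE unit = ? ORDER BY category",
        (unit,),
    )
    return dict(rows.fetchall())
```

Calling `save_mark("Phone A", "camera", 0.8)` after the table already holds other categories requires no schema change, which is the property the second option is chosen for.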

Adding a new category

“And then I realized that I needed to rate another parameter of my smartphone: the camera! (If you are into catching Pokémon, do it in style.)”

Adding a new category is easy: all you need to do is change the content of the functional dictionary by adding a new category name (Figure 7).

Figure 7. Adding a “Camera” category

Let’s define the category by adding f-Markers on the corresponding tab (Figure 2).

Then select the required domain on the “Domains” tab and start the analysis (Figure 4).

Let’s wait for it to finish and then proceed to viewing (Figure 8):

Figure 8. The updated category rating chart

Hooray! We’ve easily added a new category and rated it.

To be continued

We can now rate any product category quickly and without rewriting a line of code. All we need to do is set up a dictionary and start the analysis. The complex part is loading reviews into the database, but we will cover this topic in a separate article.

GitHub link

Benjamin De Boe · Apr 10, 2017

Cool stuff!

I believe you're using matching dictionaries for identifying those sentiment markers, which is indeed convenient from an API perspective. However, you might want to take advantage of sentiment attributes, which will allow you to not just detect occurrences of your marker terms, but also which parts of the sentence they apply to. I'm not sure how that is covered in your current app (didn't dig that deep into the code), but especially in the recent versions that improved our attribute expansion accuracy, it may improve the precision of your application too. See this article for more details.

Separately, leveraging domain definitions may also simplify the methods you're using to set up your domain. There's an option to load dictionary content from a table or file, leveraging <external> tags inside the <matching> section. It's not (yet) supported through the Architect, but you can add it when updating the class through Studio.

Thanks for sharing this!

benjamin

Alex Litkovets · Apr 10, 2017

Very good comment, Benjamin! I fully agree with you, this will greatly improve the application. Now I know what I'll pay attention to first.

Many thanks for the helpful feedback and useful links!

Aleksei

retolik frolki · Aug 26, 2019

I have a question: are these applications only for the browser, or is there a mobile app for convenient use of the latter?

Evgeny Shvarov · Aug 26, 2019

Dear @retolik frolki !

If you ask your question in the Spanish Community, you will have a better chance of getting an answer.

Sergey Kozhevnikov · Nov 22, 2019

Very nice name!