Question
Eduard Lebedyuk · Jan 2, 2021

SQL Search/iFind to find out how many matches are there

I have an SQL text index defined like this

Index TextIndex On (Text) As %iFind.Index.Basic(IGNOREPUNCTUATION = 1, INDEXOPTION = 0, LANGUAGE = "en", LOWER = 1);

If I run a query like this:

SELECT
Text
FROM Post
WHERE %ID %FIND search_index("TextIndex",'ABC')

I get 20 results with posts containing the term ABC.

But each post can contain the term ABC several times.

Is there a way to get a total count of ABC term?

Is there a way to get a count of ABC term in each Post?

Product version:
IRIS 2020.4
00
1 0 13 139
Log in or sign up to continue

Replies

try to extend your SQL statement like this

SELECT SUM(count) 
   (SELECT
      $LENGTH(TEXT,'ABC')-1 as count,
      Text
   FROM Post
   WHERE %ID %FIND search_index("TextIndex",'ABC')
 )

That won't return accurate count unfortunately.

Consider a Text value like: ABCD ABC. It would be returned by iFind since ABC is in this text, however $Find would return 2 - incorrect number of ABC instances. More advanced iFind variants do additional text/term processing so I'd like to use it.

Additionally I know it's possible as there is highlight function so iFind can determine a number of search term occurrences in a string and highlight them, but I only need to return the number of occurrences.

I can highlight and count <b>,  but I wonder if a better solution exists.

Ah! you look for isolated "ABC" and exclude it in context.
So you might need a ClassMethod as SQL Procedure to get the exact & isolated occurrence 
instead of $LENGTH().
Probably requiring some Regular Expression 
or plain looping on $FIND() and checking the borders 

Went with:

SELECT
    SUM($LENGTH(%iFind.Highlight(Text, :name), '<b>')-1)
FROM Post
WHERE %ID %FIND search_index("TextIndex",:name)

Assumes plaintext.

where does '<b>' come from? Is it HTML text? 

Ahh! never seen before!  yes  seems to be in $$$IFDEFAULTHLTAGS

in %IFinclude.inc:
/// BDB666: default highlighting tags
#define IFDEFAULTHLTAGS "<b>,<font color=""red"">"

You can expose this information through setting the IFINDMAPPINGS parameter to 1:

  • [class_name]_[index_name]_WordRec: stores which words appear in each record in this index. See also %iFind.Index.AbstractWordRec.
  • [class_name]_[index_name]_WordSpread: stores the total number of records in which this word appears in this index. See also %iFind.Index.AbstractWordSpread.
  • [class_name]_[index_name]_WordPos stores which word occurs at which position in a record, so it can be joined to the AttributePos table. See also

So by doing a COUNT() on the WordPos table, you should find what you're looking for IFF it corresponds to an actual word. If you're using wildcards, you might combine with %iFind.FindWords() as a TVF, but that'd still be looking for individual words only:

SELECT COUNT(*) 
FROM Test_IFind.IFind_IF_WordPos 
WHERE WordId IN (SELECT WordId FROM %iFind.FindWords('ab*'))

If you want to count any kind of match, your highlight trick is probably the nicest way to get at it.

(as an aside: with the introduction of the %iFind.Index.Minimal index type in 2020.1, which BTW was Eduard's suggestion ;-), it seems the class reference for the IFINDMAPPINGS projections added by the .Basic class but not in .Minimal got lost. We'll fix that shortly!)

Thank you!

That's exactly what I need. I knew about iKnow relational mappings, but not iFind ones.

If you want to count any kind of match, your highlight trick is probably the nicest way to get at it.

The problem with this approach is:

  • It actually reindexes the source again which takes time
  • It does not seem to be aware of the index so index parameters would be missed as only INDEXOPTION can be passed.
  • It does not seem to be aware of the index so index parameters would be missed as only INDEXOPTION can be passed.

hmm, that shouldn't happen. if you could file a reproducible test case through our internal systems, we'll take a look. Or perhaps you didn't use the implicitly-generated [package name].[table name]_[index name]Highlight() procedure?

I used %iFind.Highlight as shown in the docs for highlight function.

Even went to sources and checked if it was a generator maybe - it was not, so I stopped my search for more index-specific option.

Switching to [package name].[table name]_[index name]Highlight() now.

Also how can[package name].[table name]_[index name]Find()  and  ...Rank() be used?

the Find() one implements the %SQL.AbstractFind interface is the longhand (opposite of shorthand :-) ) for what search_index() does in your %FIND predicate. The Rank() function is documented in the class ref