SQL Search/iFind to find out how many matches are there
I have an SQL text index defined like this
Index TextIndex On (Text) As %iFind.Index.Basic(IGNOREPUNCTUATION = 1, INDEXOPTION = 0, LANGUAGE = "en", LOWER = 1);
If I run a query like this:
SELECT
Text
FROM Post
WHERE %ID %FIND search_index("TextIndex",'ABC')
I get 20 results with posts containing the term ABC
.
But each post can contain the term ABC
several times.
Is there a way to get a total count of ABC
term?
Is there a way to get a count of ABC
term in each Post?
Product version: IRIS 2020.4
try to extend your SQL statement like this
SELECT SUM(count) (SELECT $LENGTH(TEXT,'ABC')-1 as count, Text FROM Post WHERE %ID %FIND search_index("TextIndex",'ABC') )
That won't return accurate count unfortunately.
Consider a Text value like:
ABCD ABC
. It would be returned by iFind sinceABC
is in this text, however $Find would return 2 - incorrect number ofABC
instances. More advanced iFind variants do additional text/term processing so I'd like to use it.Additionally I know it's possible as there is highlight function so iFind can determine a number of search term occurrences in a string and highlight them, but I only need to return the number of occurrences.
I can highlight and count
<b>
, but I wonder if a better solution exists.Ah! you look for isolated "ABC" and exclude it in context.
So you might need a ClassMethod as SQL Procedure to get the exact & isolated occurrence
instead of $LENGTH().
Probably requiring some Regular Expression
or plain looping on $FIND() and checking the borders
Went with:
SELECT SUM($LENGTH(%iFind.Highlight(Text, :name), '<b>')-1) FROM Post WHERE %ID %FIND search_index("TextIndex",:name)
Assumes plaintext.
where does '<b>' come from? Is it HTML text?
No, the text is plaintext and does not contain any HTML.
highlight function by default highlights with
<b>text</b>
.Ahh! never seen before!
seems to be in $$$IFDEFAULTHLTAGSin %IFinclude.inc:
/// BDB666: default highlighting tags
#define IFDEFAULTHLTAGS "<b>,<font color=""red"">"
Calling @Benjamin De Boe.
You can expose this information through setting the IFINDMAPPINGS parameter to 1:
So by doing a COUNT() on the WordPos table, you should find what you're looking for IFF it corresponds to an actual word. If you're using wildcards, you might combine with %iFind.FindWords() as a TVF, but that'd still be looking for individual words only:
If you want to count any kind of match, your highlight trick is probably the nicest way to get at it.
(as an aside: with the introduction of the %iFind.Index.Minimal index type in 2020.1, which BTW was Eduard's suggestion ;-), it seems the class reference for the IFINDMAPPINGS projections added by the .Basic class but not in .Minimal got lost. We'll fix that shortly!)
Thank you!
That's exactly what I need. I knew about iKnow relational mappings, but not iFind ones.
The problem with this approach is:
hmm, that shouldn't happen. if you could file a reproducible test case through our internal systems, we'll take a look. Or perhaps you didn't use the implicitly-generated [package name].[table name]_[index name]Highlight() procedure?
I used
%iFind.Highlight
as shown in the docs for highlight function.Even went to sources and checked if it was a generator maybe - it was not, so I stopped my search for more index-specific option.
Switching to
[package name].[table name]_[index name]Highlight()
now.Also how can
[package name].[table name]_[index name]Find()
and...Rank()
be used?the Find() one implements the %SQL.AbstractFind interface is the longhand (opposite of shorthand :-) ) for what search_index() does in your %FIND predicate. The Rank() function is documented in the class ref