The iFind search portal demo includes a simple class query to find similar documents within a single iFind index. It's only pretty basic and somewhat picky (assuming the demo setup), building on the dominance score for each entity, and may not guard against that difference in length issue you're seeing with BM25. There is a similar method in iKnow when your data would already be in an iKnow domain.

There would indeed be value in providing %SIMILARITY support for iFind indexed fields, leveraging the standard/enhanced algorithm on top of word tokens. I'll log that as an enhancement request and we can follow up internally. Obviously, I'm interested in experiences or advice of other DC members here 

IRIS will automatically create an auto-incrementing RowID and leverage it as the IdKey for you, so unless you want anything other than the default, you shouldn't define such a field or index explicitly. 

If all you want is control the name, take a look at the SqlRowIdName class parameter. If you need control over its behaviour, what you've set up is appropriate and you can leverage SqlRowIdPrivate to get rid of the default additional projection of the RowID. However, unless there's a good reason for controlling the IdKey, you should try to avoid overriding it as it may deprive you of certain practical features and efficiencies such as bitmap indices and an extent index.

You can expose this information through setting the IFINDMAPPINGS parameter to 1:

  • [class_name]_[index_name]_WordRec: stores which words appear in each record in this index. See also %iFind.Index.AbstractWordRec.
  • [class_name]_[index_name]_WordSpread: stores the total number of records in which this word appears in this index. See also %iFind.Index.AbstractWordSpread.
  • [class_name]_[index_name]_WordPos stores which word occurs at which position in a record, so it can be joined to the AttributePos table. See also

So by doing a COUNT() on the WordPos table, you should find what you're looking for IFF it corresponds to an actual word. If you're using wildcards, you might combine with %iFind.FindWords() as a TVF, but that'd still be looking for individual words only:

SELECT COUNT(*) 
FROM Test_IFind.IFind_IF_WordPos 
WHERE WordId IN (SELECT WordId FROM %iFind.FindWords('ab*'))

If you want to count any kind of match, your highlight trick is probably the nicest way to get at it.

(as an aside: with the introduction of the %iFind.Index.Minimal index type in 2020.1, which BTW was Eduard's suggestion ;-), it seems the class reference for the IFINDMAPPINGS projections added by the .Basic class but not in .Minimal got lost. We'll fix that shortly!)

Hi Mark,

can you be more specific on the nature of the data and possibly application that's going to be migrated? HealthShare is more a suite of applications with standardized data models underneath, running on top of InterSystems IRIS for Health. To ingest data into HealthShare means transforming it through its APIs, whereas IRIS for Health is more comparable to a relational database like DB2.

SQL Gateways allow mapping tables in remote non-InterSystems databases (such as DB2) to IRIS so they can be accessed as if they were local. That can be helpful during migrations as well as heterogeneous production environments. TSQL is a specific set of extensions of the SQL language used by Sybase and MS SQL Server that we support to ease migration scenarios for new customers with applications written in TSQL.

hope this helps,
benjamin

This is not possible today, but happens to be something due for 2020.4 or 2021.1 in the form of expression indices.

The documentation may be slightly confusing here as the BuildValueArray() mechanism applies to non-collection properties and not "array of" or "list of" properties. The section that describes it just happens to be placed right after the section describing how to use the ELEMENTS trick to index them in their entirety. Note that you can use BuildValueArray() for %ArrayOfObjects properties, but those aren't projected as a child table.

A possible workaround would be through a separate property that's just there as an intermediary for your index:

Property AR As array Of Sample.Address;
Property AllStates As %String [ Calculated, ReadOnly, SqlComputeCode = {set {*} = {%%ID}}, SqlComputed ];
Index AllStatesIDX On AllStates(ELEMENTS);
ClassMethod AllStatesBuildValueArray(value, ByRef valueArray) As %Status
{
  kill valueArray
  set tObj = ..%OpenId(value), tKey = ""
  for {
    set tAddress = tObj.AR.GetNext(.tKey)
    quit:tKey=""
    set valueArray(tKey) = tAddress.State
  }
  quit $$$OK
}

Then you can include it in queries:

select * from sample.person where FOR SOME %ELEMENT(AllStates) (%VALUE = 'KY')

Hi David, maybe you can elaborate a little more on the particular latency challenge you faced?

As Dmitriy mentioned, sharding will spread your data and corresponding query workload across multiple nodes in order to achieve higher efficiencies on very large datasets and is especially fit for read-mostly workloads. ECP, when used in a typical application server setup, is meant to distribute user-bound workload across multiple servers, so serving a slightly different goal. Depending on your use case, either of those (or a combination of them) can be more appropriate. See also this overview and introductory video for more info.

if that column is currently in a string format (you can tell from the column/resultset metadata), you can use a different pattern to check if it meets the raw format:

SELECT CASE WHEN '01/01/2020' %PATTERN '1.2N1"/"1.2N1"/"4N' THEN 'valid' ELSE 'invalid' END

if you also want to check if that resolves to a valid date, you can use CAST or TO_DATE, but those will throw errors if they don't resolve to a valid date, so a small custom function in ObjectScript is perhaps not a bad idea

Depends on how you're set up. There is a setting to allow specifying those values that is meant for this type of bulk load scenario.

@Daniel Buxton : the multi-row insert syntax you're asking about is currently not supported by IRIS SQL (as you figured by now :-)). Robert's approach using UNION all is a creative workaround, but depending on where your big INSERT is coming from you might also just be able to generate / rewrite it into separate INSERTs? Once you have that in place, you can use the SQL Shell's RUN command or the ObjectScript $SYSTEM.SQL.DDLImport() method to execute your script file.

Hi Darko,

LONGVARCHAR is actually storing the text as a stream, so to make LIKE work, you'll need to convert it to a string, e.g. using SUBSTRING(), in the expression you're serving to the LIKE operator. The following works fine for me:

SELECT SUBSTRING(text, 1, 999999) FROM table WHERE SUBSTRING(text, 1, 9999999) LIKE '%abc%'

This looks a little clumsy, but streams are really meant for huge data that wouldn't fit in a VARCHAR. Note that you can get a massive amount of text in VARCHAR (%String) columns these days, so maybe that's worth considering.

Separately, when working with iFind, that'll provide faster search results and it also transparently takes care of the stream aspect.