Of course, it's just one approach to the problem, but I hope it's helpful.

Stay tuned for a dedicated LOAD DATA command in IRIS SQL coming very soon :-)

The iFind search portal demo includes a simple class query to find similar documents within a single iFind index. It's fairly basic and somewhat picky (it assumes the demo setup). It builds on the dominance score for each entity, and may not guard against the difference-in-length issue you're seeing with BM25. There is a similar method in iKnow if your data is already in an iKnow domain.
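
To give a rough idea of what "building on the dominance score for each entity" can look like, here is a minimal Python sketch. It is not the demo's actual class query: the function names, the `{entity: dominance}` dictionaries and the weighted-Jaccard scoring are all assumptions made for illustration, and it presumes you have already pulled the entity dominance values out of your index yourself.

```python
def dominance_similarity(doc_a, doc_b):
    """Weighted Jaccard overlap of two {entity: dominance} dicts (hypothetical scoring)."""
    entities = set(doc_a) | set(doc_b)
    # Shared weight: for each entity, the dominance both documents have in common.
    shared = sum(min(doc_a.get(e, 0.0), doc_b.get(e, 0.0)) for e in entities)
    # Total weight: for each entity, the larger of the two dominance values.
    total = sum(max(doc_a.get(e, 0.0), doc_b.get(e, 0.0)) for e in entities)
    return shared / total if total else 0.0


def most_similar(source_id, docs, top=3):
    """Rank all other documents by similarity to docs[source_id], best first."""
    source = docs[source_id]
    scored = [
        (doc_id, dominance_similarity(source, entities))
        for doc_id, entities in docs.items()
        if doc_id != source_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up dominance values, purely for illustration.
docs = {
    1: {"engine failure": 2.0, "runway": 1.0},
    2: {"engine failure": 2.0},
    3: {"bird strike": 5.0},
}
ranking = most_similar(1, docs, top=2)
```

Dividing by the total weight means a document with many extra entities scores lower, which is one simple way to dampen length differences, but I wouldn't claim it solves the issue you're seeing.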

There would indeed be value in providing %SIMILARITY support for iFind indexed fields, leveraging the standard/enhanced algorithm on top of word tokens. I'll log that as an enhancement request and we can follow up internally. Obviously, I'm interested in the experiences or advice of other DC members here.

Yes, they are the same. Like Studio and ODBC, it's an install-time option to right-size your footprint (and therefore highly relevant for container images). I'm not sure if there's a handy utility method to check whether it's been installed, but @Thomas Dyar would know.

