The iFind search portal demo includes a simple class query to find similar documents within a single iFind index. It's only pretty basic and somewhat picky (assuming the demo setup), building on the dominance score for each entity, and may not guard against that difference in length issue you're seeing with BM25. There is a similar method in iKnow when your data would already be in an iKnow domain.

There would indeed be value in providing %SIMILARITY support for iFind indexed fields, leveraging the standard/enhanced algorithm on top of word tokens. I'll log that as an enhancement request and we can follow up internally. Obviously, I'm interested in experiences or advice of other DC members here 

You're spot on. The similarity between Python and ObjectScript plus its popularity (Python's, that is ;-) ) are exactly what drove us to build Embedded Python. We're not compiling it to .obj code though, but running it "as Python" in the kernel. @Bob Kuszewski and @David McCaldon are much better at explaining that nuance (and actually do in the intro webinar to our early access program for this upcoming feature.

Hi Nigel, 

glad you like the Python Gateway work. Perhaps you're also interested in participating in the embedded python EAP?

As for APL, we're typically looking for some critical mass of customer demand before embarking on such a development project and only recall it being mentioned once before. But thanks for bringing it up, as each critical mass starts somewhere :-). 

This said, we've had some really great work pioneered in the community here. The Python Gateway project is one example that originated here and its popularity (thanks @Eduard Lebedyuk and @Sergey Lukyanchikov 
!) inspired the one now released as part of the core InterSystems IRIS platform. Similar concepts, such as the Julia Gateway are still "incubating" as a community-driven project. Maybe APL could fit that same path?

Yes to both questions: For a full list of the available images, please refer to the ICR documentation.

Our doc team is updating that page today, but generally anything for which there was a 2021.1.0.205.0 tag, there should also be a 2021.1.0.215.0. There were a few open questions on the new permutations that we were looking through this morning.

Thanks for prodding me on this. It's a perfect opportunity to advertise an upcoming feature that addresses this exact problem.

SQL collations have existed for a long while, but as Robert pointed out, were restricted to playing with upper/lower case, truncation, whitespace and the odd numeric ploy through a limited set of built-in collations. Later this year, we'll release a more generic %COLLATE() option that allows you to combine these properties any way you like, including a translation that removes all accents (both as a SQL collation option and a new $zconvert() option).

In the meantime, you can use the somewhat impractical workarounds described above, or possibly use an iFind index and specify a TRANSFORMATIONSPEC that removes the accents.

Yes it does. I don't have time to turn this into an easy zpm-compatible module right away, but the following commands worked fine for me on IRIS, after cloning the github repo:

do $system.OBJ.ImportDir("/path/to/downloaded/isc-iknow-setanalysis","*.xml","c",,1)

do ##class(Demo.SetAnalysis.Utils).CreateRestWebApp()

The first command may throw a few errors for some BI-related things that no longer seem to work out-of-the-box, but the core app works just fine after running the above two lines. You can access it the URL described in the article above.