I'm working in an application that uses %SIMILARITY to find matches among a set of documents that vary greatly in length. It's generally good but I've noticed issues with ranking short partially-matching documents over longer documents that match the search string entirely.
Reading up on the Okapi BM25 ranking function (which is what %SIMILARITY / the %Text package use) at https://en.wikipedia.org/wiki/Okapi_BM25 I see mention of the BM25+ modification, which "was developed to address one deficiency of the standard BM25 in which the component of term frequency normalization by document length


.png)
.png)




