Using vector search to compare the similarity between articles

xuanyou du · May 11, 2024

#Embedded Python #Python #Vector Search #InterSystems IRIS for Health

Principle: The article uploaded by the user is split into sentences with Python, an embedding is computed for each sentence, and the embeddings are stored in the IRIS database. IRIS vector search then compares the similarity between sentences, and the results are displayed on the front-end page.
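
The flow above can be sketched in plain Python. This is only an illustration under stated assumptions, not the application's code: `split_sentences`, `toy_embed`, and `compare_articles` are hypothetical names, `toy_embed` is a hashed bag-of-words stand-in for a real model such as BERT or MiniLM, and the cosine comparison is done in Python here, whereas the application delegates it to IRIS vector search.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_embed(sentence, dim=16):
    """Stand-in for a real embedding model: hashed bag-of-words vector."""
    vec = [0.0] * dim
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    for word, count in Counter(words).items():
        # Deterministic bucket so results are reproducible across runs.
        bucket = sum(ord(ch) for ch in word) % dim
        vec[bucket] += count
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def compare_articles(article_a, article_b):
    """Return (sentence_a, sentence_b, score) for every sentence pair."""
    sents_a = split_sentences(article_a)
    sents_b = split_sentences(article_b)
    emb_a = [toy_embed(s) for s in sents_a]
    emb_b = [toy_embed(s) for s in sents_b]
    return [
        (sa, sb, cosine_similarity(ea, eb))
        for sa, ea in zip(sents_a, emb_a)
        for sb, eb in zip(sents_b, emb_b)
    ]
```

Swapping `toy_embed` for a real sentence-embedding model would leave the rest of the sketch unchanged, since only the vector length and values differ.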

The installation steps are described in the readme file. Note that the BERT model used in the example needs a fair amount of memory. If the application hangs for a long time during testing, consider another model such as MiniLM (which is used in the online demo). When switching models, you must also change the LEN of the Embedding property in ArticleSimilarity.SentenceVector and ArticleSimilarity.Vector to match the model's embedding size; for MiniLM it is 384.
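
Because the stored vector length has to match the model output, a small guard can catch a mismatch before anything is written to the database. The following is a minimal sketch, assuming the embedding arrives as a Python sequence of numbers; `EMBEDDING_LEN` and `validate_embedding` are hypothetical names that do not come from the project.

```python
# Length the storage classes are assumed to declare.
# 384 is the value the article gives for MiniLM.
EMBEDDING_LEN = 384


def validate_embedding(embedding, expected_len=EMBEDDING_LEN):
    """Return the embedding as a list of floats, or raise on a size mismatch."""
    if len(embedding) != expected_len:
        raise ValueError(
            f"embedding has {len(embedding)} values, "
            f"but the storage is declared with LEN {expected_len}"
        )
    return [float(x) for x in embedding]
```

If the embeddings happen to be produced with the sentence-transformers library (an assumption, the article does not name the library), `model.get_sentence_embedding_dimension()` reports the size to use for `expected_len`.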

By default, the application displays sentences with a similarity of 0.7 or higher. This threshold can be modified in the GetSenSimiEmbedding method of ArticleSimilarity.GetSimilarityBussinessOperation (the online demo currently uses 0.5).
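
The effect of the threshold can be illustrated with a short filter. This is a sketch only: `filter_similar` is a hypothetical helper, not the project's GetSenSimiEmbedding method, and it assumes each result is a `(sentence_a, sentence_b, score)` tuple.

```python
DEFAULT_THRESHOLD = 0.7  # default stated in the article; the online demo uses 0.5


def filter_similar(pairs, threshold=DEFAULT_THRESHOLD):
    """Keep pairs scoring at or above the threshold, best match first."""
    kept = [pair for pair in pairs if pair[2] >= threshold]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)
```

Lowering the threshold, for example `filter_similar(pairs, threshold=0.5)`, shows more sentence pairs at the cost of including weaker matches.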