
Article
· Oct 9, 2024 3m read

My personal approach to Vectors

Motivated by personal feedback from @Edilson Eberle Carvalho and
an excellent presentation by @Michael Braam on Vector Search, I'd like to share
my personal approach to vectors.

When I started out and met vectors with 256, 384, and over 1200 dimensions, I felt lost.
However, my example
Vector-inside-IRIS, a simplification of iris-vector-search, worked fine.
 
To understand the mechanics behind it, I decided to start in small steps.
Our familiar 3 dimensions describe our physical world quite well.
Even the half 4th dimension (no negatives) added by Einstein is not too hard to follow.
String theory in cosmology, with 0 to 11+ dimensions, was a real borderline for me.

So, back to the start: 2 dimensions are enough for a beginner.
Geographic coordinates provide a nice starting point with sufficient test data.
The VECTOR_COSINE() function was my primary target in my example
geo-vector-search

The range of results from -1 to +1 is easy to interpret on a quasi-flat projected map.
That's basic mathematics, and extending it to an additional dimension is no miracle.
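
To make the -1 to +1 range concrete, here is a minimal sketch in plain Python of the cosine calculation on 2-dimensional vectors. It is not the IRIS implementation of VECTOR_COSINE(); the function name cosine_similarity and the compass-direction vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical direction vectors on a flat map: (east, north)
east = (1.0, 0.0)
north = (0.0, 1.0)
west = (-1.0, 0.0)
north_east = (1.0, 1.0)

print(cosine_similarity(east, east))        # 1.0: same direction
print(cosine_similarity(east, north))       # 0.0: right angle
print(cosine_similarity(east, west))        # -1.0: opposite direction
print(cosine_similarity(east, north_east))  # about 0.707: 45 degrees apart
```

Only the direction of the vectors matters here, not their length, which is why north_east with length of about 1.41 still gives a value inside the -1 to +1 range.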

But what about some hundred dimensions?

;#1
-.0104943, .01472898, .07107521, .07168121, -.0937807, .05828459, .04451195, -.1045385, -.0110124, -.0240547, -.0032111, -.0030188, -.0414225, -.1092015, .02203945, -.0129255, .14087346, .04734043, -.0181046, -.0458297, .02323769, .02859951, .01124321, .00857456, -.0049756, -.0144282, -.0846236, -.0284645, -.0147692, -.0989931, .04880870, .01899284, -.0176833, .04763242, -.0808972, -.0604988, .05757499, -.0638228, .04217084, .03707900, .03757081, .03086806, .02773610, .02082979, -.0495735, -.0337784, -.0438372, -.0827000, -.0018084, -.0072785, -.0797550, -.0055747, -.0551242, -.0918905, .01140710, -.0115834, -.0088469, -.0445509, .02972822, .04385065, .04125113, .01189815, .01809763
;#2
-.0340279, -.0930349, -.0356242, .03200291, .07393958, -.0164658, -.0218968, .01392244, -.0069597, .02677908, -.0800164, .07227557, -.0430033, -.1134698, -.0561500, -.0520939, -.0306403, .00750979, -.0345837, .03335380, -.0438071, -.0088005, .03423582, .00794844, .01172804, .05204785, .04179215, .01768089, -.0489745, -.0031708, -.0349655, -.0482467, .08090461, -.0596610, -.0565769, -.0043313, .01015284, .07152537, .04189436, -.0475862, -.0171517, .03899634, -.0705699, -.1133416, .08019342, .02138555, .01466019, .00184080, -.0905641, -.1039420, -.0290395, .02753796, .01674868, -.0259464, -.0107869, -.0407411, -.0120343, -.0636389, .00047146, .01514394, -.0694578, -.0204190, -.0024446
;#3
.00350692, .09432639, .01641871, .09951058, .10459023, .00019239, -.0823584, -.0022799, -.0227801, -.0023362, -.0397562, .07449327, -.1137044, .09173037, .08620572, -.0881805, -.0111093, -.0316556, -.0044012, -.1248759, -.0897788, .03191807, -.0147239, -.0198379, -.0849955, -.0026861, .02628867, -.0523788, -.0398543, -.0080245, .06736382, .01456158, .04700677, -.0171667, -.0217174, .06761254, -.0070750, .02879706, .01109632, .02541129, -.0384420, .00410159, .05145533, .06493697, -.0924961, -.0422163, -.0739539, .06107471, .06070494, -.0044191, .00238501, -.0182966, .03546700, .05925614, -.0361021, .09686610, .02930910, .01282224, .02953721, -.0526526, .03977891, .00501585, .00717564

The example here is shortened for readability.

After thinking it over for some time I found a personal image:

  • When I search Google Maps for a destination that is not just around the corner, I am offered a choice of routes:
  • the shortest one, the fastest one, the one with the least fuel consumption, public transport, ...
  • And I make a choice according to my needs.
  • If I interpret the dimensions of my vectors as numbered intermediate steps on the way to my target, I get a similar picture.
  • And with VECTOR_COSINE I get the best proposal.
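
The "best proposal" step can be sketched as follows: compare one query vector against several stored vectors and keep the one with the highest cosine. This is a plain-Python illustration under stated assumptions, not how IRIS does it internally. The 4-dimensional vectors, the route names, and the helper best_match are invented for the example; real embeddings have hundreds of dimensions, but the calculation is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query, candidates):
    """Return (name, score) of the candidate most similar to the query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return max(scored, key=lambda pair: pair[1])

# Made-up 4-dimensional vectors, not real embeddings
routes = {
    "shortest": [0.9, 0.1, 0.0, 0.2],
    "fastest": [0.1, 0.8, 0.3, 0.0],
    "public_transport": [-0.2, 0.1, 0.9, 0.4],
}
query = [0.2, 0.7, 0.4, 0.1]

name, score = best_match(query, routes)
print(name, round(score, 3))  # "fastest" has the highest cosine here
```

Adding more dimensions only makes the sums longer; the comparison and the choice of the highest score stay unchanged.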

LLM experts may laugh at my simplification.
But to me, it's a picture beyond abstract mathematics and confusing language theories.
And I believe my simple picture, based on an everyday process in car driving,
helps to understand how results are found.

NOTE: I still have no idea HOW those vectors are calculated, and that is fine as long as it's consistent.
What was important for me was to understand how the matching works.

 

Announcement
· Oct 9, 2024

Server Manager in VS Code - Better handling of changed passwords

We've just made a change to Server Manager with the aim of it coping better when a stored password is no longer valid, for example because it has been changed.

We plan to include this in the next published version (no release date yet set), but if you'd like early access please download the v3.6.3-beta.3 VSIX and install it, for example by dragging it from your file explorer onto the Extensions view in VS Code.

If you encounter problems with this beta you can easily revert to the most recent published version (3.6.2) by means of the "Install Specific Version..." option on the gearwheel of the extension in Extensions view.

Feedback on this change is welcome here or at https://github.com/intersystems-community/intersystems-servermanager/issues

Question
· Oct 9, 2024

SQL Query Help

Hello all,

I need help coming up with a SQL query that returns only one value. I have a case where two providers share the exact same name. Each has a different NPI number and IdentityTypeId. I tried the query below; its output follows.

SELECT *                        
FROM PhysTable                        
WHERE ProviderName = 'DOE, JOE' AND Type = 'NPI'                        
                        
UNION                        
                        
SELECT *                        
FROM PhysTable                        
WHERE IdentityId = '345678'                        
 

Output

ID         Type  IDType           IdentityId  IdentityTypeId  ProvId  ProviderName  ProviderType
292||1001  NPI   NPI              12345       1001            9242    DOE, JOE      Nurse
252||1001  NPI   NPI              56785       1001            8252    DOE, JOE      Doctor
252||61    NUM   EMPLOYEE NUMBER  345678      61              8252    DOE, JOE      Doctor

What I want is the IdentityId "56785" of Type NPI (highlighted in red in the original post). The ProvId and ProviderName both come in the HL7 message, but IdentityId etc. do not. Any ideas on how I can do this? Thanks.

Editing to add: we do get the IdentityId of "345678" of Type NUM, but we do not get the IdentityId of "12345" or "56785" of Type NPI.
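
One possible approach, sketched with Python's built-in sqlite3 module as a stand-in for the real table: because ProvId differs between the two providers, filtering on ProvId together with Type = 'NPI' returns a single row. A second variant starts from the employee number and self-joins on ProvId. The column types, the in-memory database, and the helper names npi_for_provider and npi_via_employee_number are assumptions for illustration; the actual PhysTable definition is not shown in the post.

```python
import sqlite3

# In-memory copy of the three rows from the output above (column types assumed)
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE PhysTable (
        ID TEXT, Type TEXT, IDType TEXT, IdentityId TEXT,
        IdentityTypeId INTEGER, ProvId INTEGER,
        ProviderName TEXT, ProviderType TEXT)"""
)
rows = [
    ("292||1001", "NPI", "NPI", "12345", 1001, 9242, "DOE, JOE", "Nurse"),
    ("252||1001", "NPI", "NPI", "56785", 1001, 8252, "DOE, JOE", "Doctor"),
    ("252||61", "NUM", "EMPLOYEE NUMBER", "345678", 61, 8252, "DOE, JOE", "Doctor"),
]
conn.executemany("INSERT INTO PhysTable VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

def npi_for_provider(conn, prov_id, provider_name):
    """NPI values for one provider, using ProvId to tell namesakes apart."""
    cur = conn.execute(
        "SELECT IdentityId FROM PhysTable "
        "WHERE ProvId = ? AND ProviderName = ? AND Type = 'NPI'",
        (prov_id, provider_name),
    )
    return [r[0] for r in cur.fetchall()]

def npi_via_employee_number(conn, employee_number):
    """NPI values for the provider who owns the given employee number."""
    cur = conn.execute(
        "SELECT npi.IdentityId "
        "FROM PhysTable npi "
        "JOIN PhysTable num ON npi.ProvId = num.ProvId "
        "WHERE num.Type = 'NUM' AND num.IdentityId = ? AND npi.Type = 'NPI'",
        (employee_number,),
    )
    return [r[0] for r in cur.fetchall()]

print(npi_for_provider(conn, 8252, "DOE, JOE"))  # ['56785']
print(npi_via_employee_number(conn, "345678"))   # ['56785']
```

Both SELECT statements use only standard SQL, so they should carry over to other SQL engines with little change, but that has not been verified against the real table.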

Digest
· Oct 9, 2024

InterSystems Community Q&A Monthly Newsletter #41

Top new questions
Can you answer these questions?
#InterSystems IRIS
#InterSystems IRIS for Health
#Caché
#Ensemble
#Health Connect
#Open Exchange
#Other
#41 Monthly Q&A from InterSystems Developers
InterSystems Official
· Oct 9, 2024

Faster vector searches with the ANN (Approximate Nearest Neighbor) index -- available in the Vector Search Early Access Program

We have recently made a new version of InterSystems IRIS available in the Vector Search Early Access Program. It features an ANN (Approximate Nearest Neighbor) index based on the Hierarchical Navigable Small World (HNSW) indexing algorithm. This addition makes approximate nearest-neighbor searches over large vector datasets much more efficient, drastically improving query performance and scalability.

The HNSW algorithm is designed to optimize vector search over high-dimensional data by building a graph-based structure that makes it easier to find approximate neighbors in large collections of vectors. Whether you are working on recommendation systems, natural language processing, or other machine learning applications, HNSW can significantly reduce search times. It also lets you tune the level of accuracy, bearing in mind that higher accuracy results in slower queries.

The key benefits of HNSW include:

  • Faster searches even as the size of the dataset grows
  • Reduced memory usage while maintaining high accuracy
  • Seamless integration with the existing vector search capabilities in IRIS

How to get started

The latest version is now available through our Vector Search Early Access Program. To take part, simply register, download the new version, and start testing it. Your feedback is essential as we continue to improve Vector Search!

We encourage you to explore the performance improvements and share your views with the community. Do not hesitate to contact us if you have any questions or comments during the early access phase.

Happy coding!
