#Unstructured Data

5 Followers · 29 Posts

Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy but may contain data such as dates, numbers, and facts as well.

New
Article Alyssa Ross · Feb 20 6m read

One objective of vectorization is to render unstructured text more machine-usable. Vector embeddings accomplish this by encoding the semantics of text as high-dimensional numeric vectors, which can be employed by advanced search algorithms (normally an approximate nearest neighbor algorithm like Hierarchical Navigable Small World). This not only improves our ability to interact with unstructured text programmatically but makes it searchable by context and by meaning beyond what is captured literally by keyword.

In this article I will walk through a simple vector search implementation that Kwabena Ayim-Aboagye and I fleshed out using embedded python in InterSystems IRIS for Health. I'll also dive a bit into how to use embedded python and dynamic SQL generally, and how to take advantage of vector search features offered natively through IRIS.

0
0 134
Question Kanishk Mittal · Jul 28, 2025

We’re building out a data lake in IRIS 2025.1 that aggregates data across multiple business systems and departments. I’m trying to establish best practices for schema design and separation.

Right now, I’m thinking of using a separate schema for each distinct system of record feeding into the data lake - for example, one schema per upstream source system, rather than splitting based on function (e.g. staging, raw, curated). The idea is that this would make it easier to manage source ownership, auditing, and pipeline logic, especially when multiple domains are contributing data.

But I’d love to

0
0 101