Indexing

Syndicate content 5 

In the previous parts (1, 2) we talked about globals as trees. In this article, we will look at them as sparse arrays.

A sparse array - is a type of array where most values assume an identical value.

In practice, you will often see sparse arrays so huge that there is no point in occupying memory with identical elements. Therefore, it makes sense to organize sparse arrays in such a way that memory is not wasted on storing duplicate values.

In some programming languages, sparse arrays are part of the language - for example, in J, MATLAB. In other languages, there are special libraries that let you use them. For C++, those would be Eigen and the like.

Globals are good candidates for implementing sparse arrays for the following reasons:

Last comment 3 days ago
+ 6   0 5
587

views

+ 6

rating

Have some free text fields in your application that you wish you could search efficiently?  Tried using some methods before but found out that they just cannot match the performance needs of your customers?  Do I have one weird trick that will solve all your problems?  Don’t you already know!?  All I do is bring great solutions to your performance pitfalls!

As usual, if you want the TL;DR (too long; didn’t read) version, skip to the end.  Just know you are hurting my feelings.

If you open up your version of Sample.Company in the SAMPLES namespace of a recent (2015.1 or later) Caché/Ensemble/HealthShare version you will see a Mission field that is pseudo-randomly generated text.  Suppose we want to search this text field.  For the purpose of this exercise, I generated about 256,246 companies – feel free to populate some on your own and follow along.  Well you might run the following query

Last comment 10 May 2019
+ 19   1 3
1226

views

+ 19

rating

We have been storing raw messages in a MySQL database for DR and ad hoc purposes. We are thinking of using an Ensemble instance as our data lake instead. We could segregate the source data by namespace or by global. But either way we'll want a custom global to index the data for data retrieval performance purposes.

Anyone else taking this approach? Any feedback?

Last answer 29 March 2019 Last comment 29 March 2019
0   0 2
63

views

0

rating

Just wondering an Insight in the difference between these two indexes

IdKey / PrimaryKey
================= 

Property Identifier As %Integer

Index Index1 on Identifier [Idkey]

Index Index2 on Identifier [PrimaryKey]

What's the difference?

1. If I don't have Index1 and only have Index2,  then cache does still make its own id.
So how and why  do I ever use the PrimaryKey.  In Joins ??

Table1.Identifier = Table2.Identifier instead of Table1.Id = Table2.id ??
But I can still use Table1.Id = Table2.Id as cache still made one ID field

So where is PrimaryKey useful in cache?

2. If I declare Index1 , I am not able to have any Bitmap indexes [Cache throws an error on compilation saying I have an Idkey index]

 

Last answer 5 March 2019 Last comment 6 March 2019
+ 1   0 6
158

views

+ 1

rating

Hi,

I know of the existance of (ELEMENTS) to create an index from a list, but I actually would like to index the content of an element of a list. Is it possible?

 

My scenario:

Class:
Property Test As list of TestList;
 

Test.List:
Property Name As %String;
Property Surname As %String;

I would like to have an  index based on the TestList.Name. If I try using

Index NewIndex On Test(ELEMENTS)

it will create an index with Name and Surname in it, but I just want to have an index with the name. Is it possible? 

Last answer 26 February 2019 Last comment 26 February 2019
0   1 2
86

views

0

rating

I have a persistent class that represents cities across the United States.  It is below, but basically has a City Id, Name, Lat, Lon and a few other unimportant fields for this issue.  Anytime I attempt to query on the Latitude or Longitude it immediately returns no results.  My first thought was that it was a casting issue so I tried casting both sides to floats, ints, even strings and in all cases it immediately comes back with no results.  I then decided to cast it to a string and attempt a like statement thinking it might be something about how floats are handled, but still no joy.  Any ideas on why I cannot query on these fields?  More details are below

Last answer 8 December 2018 Last comment 8 December 2018
0   0 2
113

views

0

rating

Hi-

I have the following objects

Class A

   Property P1 As B

   Property P2 As %String

   Property P3 As %String

Class B

   Property P1 As %String

Can I create an index in Class A based on P1.P1.  Basically I want an index of class A by property P1 in class B

I tried creating the following but got a compile error

Index I1 On P1.P1

Thanks

Last answer 16 April 2018
0   0 3
0

comments

200

views

0

rating

No doubt bitmap indexing, if used with a suitable property, performs just impressive!
But there is a major limit: ID key has to be a positive integer.
For modern class definitions working with CacheStorage this is a default.

BUT: There are hundreds (thousands ?) old applications out in the field that
are still using composite ID keys.
Or - to name the origin - work on Globals with 2 subscript levels (or more).
They are by construction excluded from our "Bitmap Wonderland".

Of course, using the feature of multiple storage maps could possibly allow to escape - somehow.
But in those cases where you have many many GB of data stored in that way and
where you work with millions of lines of old  - often poorly documented - code.
You wouldn't think seriously more than 5 min. if you change the storage structure or not.
So the bitmap is not with you

Last comment 17 February 2018
+ 7   0 3
366

views

+ 7

rating

Hi, folks!

Suppose you have a Caché class with %String property which contains relatively large text (from 10 to 2000 symbols).

The class:

Class Test.Duplicates Extends %Persistent 

{

Property Text As %String (MAXLEN = 2000);

}

And you have thousands of entries.

What are the best options to find entries which are duplicates on this property?

Last answer 1 December 2017 Last comment 9 November 2017
0   1 2
586

views

0

rating

Hi,

I have a class with around 400k lines and 60 columns. Class storage is Cache SQL storage (Mapped from a global).

 I want to create multiple indices on certain fields.

I am familiar with two approaches:

1. Create a new map (Index type) on a pointer global.

2. Create a bitmap index

Which approach is more recommended to be used in the case I described? If there are any other approaches, I will be happy to hear.

Thanks :)

 

Last answer 11 October 2017 Last comment 19 October 2017
0   0 2
366

views

0

rating

Hi,

I am new to Cache' MV but have extensive experience with other Pick flavors especially Unidata. 

I need to determine the impact of adding several indexes to a large file with over 51,000,000 records. 

On other systems, I could use FILE.STAT, ANALYZE.FILE and shell to the OS to determine how large the index file was. 

None of those seem to be available in Cache' MV. Shelling to the OS just tells me the size of CACHE.DAT.

What is the best way to determine what the disk impact would be if I added an index (CREATE-INDEX) to a file?

TIA,

Steve

Last answer 9 May 2017 Last comment 14 August 2017
0   0 2
246

views

0

rating

Hi

1. Is it possible do define an index like that :

create index UIX on MyTable (Column1) where Column1 is not null

2. What happens if we add an index on a property that is NOT required, meanning that not all records will be indexed because we do not allow null subscripts ?

Last answer 10 August 2017 Last comment 13 August 2017
0   0 2
251

views

0

rating

Hi 

I have two persistent classes defined. Lets call it Parent and Child.

Child class is one of the property of Parent Class.

I would like to define a index on Child class.

So what is the default behaviour I defined a index on a non simple data type member?

Any possibility that I could customized the behaviour ? For example. Child class has three properties.

Could I configure the index to index any combinations of these three properties?

Thanks for your help.

Last answer 16 July 2017
0   0 1
0

comments

285

views

0

rating

Quotes (1NF/2NF/3NF)ru:

Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
The same value can be atomic or non-atomic depending on the purpose of this value. For example, “4286” can be
  • atomic, if its denotes “a credit card’s PIN code” (if it’s broken down or reshuffled, it is of no use any longer)
  • non-atomic, if it’s just a “sequence of numbers” (the value still makes sense if broken down into several parts or reshuffled)

This article explores the standard methods of increasing the performance of SQL queries involving the following types of fields: string, date, simple list (in the $LB format), "list of <...>" and "array of <...>".

+ 6   0 3
0

comments

378

views

+ 6

rating

Overview

Encryption of sensitive data becomes more and more important for applications. For example patient names, SSN, address-data or credit card-numbers etc..

Cache supports different flavors of encryption. Block-level database encryption and data-element encryption. The block-level database encryption protects an entire database.  The decryption/encryption is done when a block is written/read to or from the database and has very little impact on the performance.

With data-element encryption only certain data-fields are encrypted.  Fields that contain sensitive data like patient data or credit-card numbers. Data-element encryption is also useful if a re-encryption is required periodically. With data-element encryption it is the responsibility of the application to encrypt/decrypt the data.

Both encryption methods leverage the managed key encryption infrastructure of Caché.

The following article describes a sample use-case where data-element encryption is used to encrypt person data.  

But what if you have hundreds of thousands of records with an encrypted datafield and you have the need to search that field? Decryption of the field-values prior to the search is not an option. What about indices?

This article describes a possible solution and develops step-by-step a small example how you can use SQL and indices to search encrypted fields. 

Last comment 16 March 2017
+ 4   0 1
759

views

+ 4

rating

Hi!

Consider I have a class Package.Data with Property UniqueStringValue as %String.

I introduced the Index for this property:

 Index ValueIndex on UniqueStringValue [Unique];

It works well.  But if I try to check if there is an object with the certain value in code like this:

if ##class(Package.Data).ValueIndexExists(value)  

this expression fails, if value="value", even if there is an instance with instance.UniqueStingValue="Value"

How can I set the index to prevent saving case sensitive values in this class?

Last answer 27 February 2017 Last comment 28 February 2017
0   0 3
228

views

0

rating

I have a class which defines a property as array of %String. Is it possible to index values of this property and use this property in SQL?

I have tried 'Index idx On prop(ELEMENTS)' and then a select from the generated collection table, but this is still orders of magnitude slower than queries to the containing class.

Last answer 29 November 2016 Last comment 30 November 2016
+ 1   0 2
240

views

+ 1

rating

Hello Fellow Cache Developers:  

Has anyone ever created an index on values of a list property?   If so, would you be willing to share an example?

Also, feel free to offer input and suggestions regarding use of indexes on List values.

Here is my database scenario:

Parent Class:

PropertyA - %String  

PropertyB - %Integer 

Child Class:

PropertyC - %Integer

PropertyD - list of %Integer

Data illustration:

PropertyA + PropertyB   maps to one or more occurrences of [PropertyC + list of values for PropertyD]

ABC + 90001000    maps to  6600012113 + list (86001000, 86982277, 86982271)

ABC + 90001000    maps to  6600087886 + list (84002300, 82911287)

ABC + 90001000    maps to  6410203911 + list (53010002, 53028878, 10099210, 90098876, 19029993)

ABC + 90001000    maps to  6922300119 + list (90998900

Last answer 31 August 2016 Last comment 31 August 2016
0   0 1
310

views

0

rating

Earlier in this series, we've presented four different demo applications for iKnow, illustrating how its unique bottom-up approach allows users to explore the concepts and context of their unstructured data and then leverage these insights to implement real-world use cases. We started small and simple with core exploration through the Knowledge Portal, then organized our records according to content with the Set Analysis Demoorganized our domain knowledge using the Dictionary Builder Demo and finally build complex rules to extract nontrivial patterns from text with the Rules Builder Demo.

This time, we'll dive into a different area of the iKnow feature set: iFind. Where iKnow's core APIs are all about exploration and leveraging those results programmatically in applications and analytics, iFind is focused specifically on search scenarios in a pure SQL context. We'll be presenting a simple search portal implemented in Zen that showcases iFind's main features.

Last comment 28 June 2016
+ 7   0 6
552

views

+ 7

rating