Article
· Feb 20, 2017 14m read

Making encrypted datafields SQL-searchable

Overview

Encryption of sensitive data becomes more and more important for applications. For example patient names, SSN, address-data or credit card-numbers etc..

Cache supports different flavors of encryption. Block-level database encryption and data-element encryption. The block-level database encryption protects an entire database.  The decryption/encryption is done when a block is written/read to or from the database and has very little impact on the performance.

With data-element encryption only certain data-fields are encrypted.  Fields that contain sensitive data like patient data or credit-card numbers. Data-element encryption is also useful if a re-encryption is required periodically. With data-element encryption it is the responsibility of the application to encrypt/decrypt the data.

Both encryption methods leverage the managed key encryption infrastructure of Caché.

The following article describes a sample use-case where data-element encryption is used to encrypt person data.  

But what if you have hundreds of thousands of records with an encrypted datafield and you have the need to search that field? Decryption of the field-values prior to the search is not an option. What about indices?

This article describes a possible solution and develops step-by-step a small example how you can use SQL and indices to search encrypted fields. 

Introduction

Cache provides a variety of methods to perform data-element encryption, hashing or message authentication code generation. These are all defined in the %SYSTEM.Encryption class. 

For example if you want to encrypt a datafield, you can either use the managed key infrastructure of Caché and use %SYSTEM.Encryption.AESCBCManagedKeyEncrypt(Plaintext As %String, KeyID As %String) or you can use

%SYSTEM.Encryption.AESCBCEncrypt(plaintext As %String, key As %String, IV As %String) . In the former case you only have to provide the plain text and the Key-Id of the encryption key you want to use. In the latter you have to provide the plain text, the key itself and a so-called initialization vector.

Both methods implement a Cipher Block Chained (CBC) encryption. That means that prior to the encryption of a plain text block this plain text is XORed with the cipher text of the previous block. If there is no previous block the initialization vector comes into play. 

One advantage of those CBC encryption methods is that two subsequent encryption-method calls for the same plain-text result in different cipher texts. In this sense the encryption is not deterministic. That also means that the cipher texts are not suitable for indexing. How can we resolve this?

Building a simple class

Start Atelier or Studio and create a class DC.Person. The class inherits from %Persistent.

To keep it simple the class only contains properties LastName, FirstName, DOB. The values of the property LastName will be encrypted. 

It will also contain an implementation of the %OnNew()-callback and a helper method GenerateData(pRowCount). GenerateData will generate test data. By default if not specified otherwise it will create 100 rows.

Below you find the class-code:

Class DC.Person extends %Persistent {

Property LastName as %String(MAXLEN = "");

Property FirstName as %String;

Property DOB as %Date;

Method %OnNew(pLastName as %String = "", pFirstName as %String = "", pDOB as %Date = "") as %Status [ Private, ServerOnly = 1 ] 
{
    try {
        if pLastName = "" {
            set ..LastName = ##class(%PopulateUtils).LastName()
        }
        else {
            set ..LastName = pLastName
        }
        if pFirstName = "" {
            set ..FirstName = ##class(%PopulateUtils).FirstName()
        }
        else {
            set ..FirstName = pFirstName
        }
        if pDOB = "" {
            set ..DOB = ##class(%PopulateUtils).Date()
        }
        else {
            set ..DOB = pDOB
        }
    }
    catch tEx {
        write !,tEx.DisplayString()
    }
    
    return $$$OK
}

ClassMethod GenerateData(pRowCount as %Integer = 100) as %Integer 
{
#dim tCounter as %Integer = 0
    
    for i = 1:1:pRowCount {
        set tSc = ..%New().%Save()
        if tSc set tCounter = tCounter + 1
    }
    
    return tCounter
}
 
}

Please note that the property LastName sets the MAXLEN = "". The default MAXLEN for %String properties is 50. This will not be sufficient for an encrypted string value here. 

Compile the class and test the GenerateData method by starting a terminal and entering the following code at the command-prompt:

write ##class(DC.Person).GenerateData()

This should generate 100 test records. You can verify this in the System Management Portal. Make sure you are connected to the correct namespace. Then start:
 "System Explorer" -> "SQL" and type in the following SELECT-statements:

select count(*) from dc.person

and

select * from dc.person

You should see your 100 test records. A sample result should look like this:

In the next step we will modify our class to store the values of LastName as encrypted strings.

Adding encryption

In our example we will use the manged key infrastructure of Caché. So please use the System Management Portal to create and activate an Encryption Key .  In the System management Portal go to "System Administration" -> "Encryption" -> "Create New Encryption Key File".

Fill out the form in the upcoming page and save the key file.

Once you have created the key file, please activate it via the System Management Portal. Please go to "System Administration" -> "Encryption" -> "Data Element Encryption"

Click "Activate Key" and then "Browse" in the upcoming form. Select your key file enter the Key Adminstrator name and the password you had entered during the key creation. In case of a successful activation the activated key is shown as below:

Next we will use the encryption key and the available encryption API to encrypt the lastnames in our person class.

In our example we will use the %SYSTEM.Encryption.AESCBCManagedKeyEncrypt(Plaintext As %String, KeyID As %String) method.

This method expects two parameters. The plaintext and the ID of the encryption key we want to use. The %SYSTEM.Encryption class also provides a method which returns the IDs of the activated encryption key. This is done by the method ListEncryptionKeys. This method returns a comma-separated list of the activated Key-IDs.

So we have everything we need available to develop a method Encypt which receives a plain-text string as a parameter and returns a cipher-text. Our method code looks like this:

Method Encrypt(pVal As %String) As %String [ Private ]
{
#dim tCipher as %String = ""

    try {
        set tKeyId = $system.Encryption.ListEncryptionKeys()
        set tCipher = $system.Encryption.AESCBCManagedKeyEncrypt(pVal,tKeyId)
    }
    catch tEx {
        write !,tEx.DisplayString()
    }
    
    return tCipher
}

In the code above we assume that only one encryption key is active.
We add this method code to our class and modify the %OnNew method so that it uses the Encrypt method (see the changes in blue color) :

Method %OnNew(pLastName as %String = "", pFirstName as %String = "", pDOB as %Date = "") as %Status [ Private, ServerOnly = 1 ] 
{
#dim tName as %String

    try {
        if pLastName = "" {
            set tName = ##class(%PopulateUtils).LastName()
        }
        else {
            set tName = pLastName
        }
        if pFirstName = "" {
            set ..FirstName = ##class(%PopulateUtils).FirstName()
        }
        else {
            set ..FirstName = pFirstName
        }
        if pDOB = "" {
            set ..DOB = ##class(%PopulateUtils).Date()
        }
        else {
            set ..DOB = pDOB
        }
        set ..LastName = ..Encrypt(tName)
    }
    catch tEx {
        write !,tEx.DisplayString()
    }
    
    return $$$OK
}

Now let's delete the previously generated test data and run the GenerateData method again. To delete the test-data simply delete either the global ^DC.PersonD or run a SQL delete statement

delete from dc.person

Then run the GenerateData method again and check the results as above. You should see something like this:

You don't see the lastnames in plain text any longer. If you look into the global ^DC.PersonD you will also see only the encrypted values.

To retrieve the plain-text from the encrypted data you have to use the corresponding decryption-method

%SYSTEM.Encryption.AESCBCManagedKeyDecrypt(Ciphertext As %String).

But how can I query the encrypted data without decrypting the data first? We will implement this in the next section.

Adding an index for SQL

As stated above the cipher-texts are not suitable for indexing. This is due to the non determistic results of the encryption. I.E. if you encrypt the same plain-text string twice you will get different cipher-texts.

So we need something that is deterministic. You can either use a hash or a MAC (Message Authentication Code) for this purpose. Caché supports both types of hashing. If you hash the same value twice you 'll get the same result. So you could ask, why don't we use the hash or MAC for the data field as well. Hashes or MACs are not reversible. That means you can't create the plain text from the hash or MAC.

So the idea is to use hash values or MACs to build the index.  This is easy. Add an redundant field hashname to your class which is used to store the hash or MAC and build an index on that field.
This helps in situations in which the decrypted values of a field are either unique or you don't use wildcards in your queries.

Something like this will work:

Select id, DOB from DC.Person where hashname = '<hashvalue for Smith>'

This approach has some disadvantages. You can't run queries similiar to the one below:

Select id, DOB from DC.Person where name like 'Sm%'

And you have to add redundancy (the hashname field). So let's look for a different approach.

A possible solution here is to consider the name field as a collection with the collection values 'S',  'Sm',  'Smi', 'Smit', 'Smith' and to define an index on that collection. But LastName really is just a normal %String property and not a collection. How can I build this index?
The answer is easy. If you define an index on a collection field you can have control on how this index is built. To do this you have to implement a method with the name <propertyname>BuildValueArray(pName As %String, ByRef pNameArray As %String) . In our case it is LastNameBuildValueArray(pName As %String, ByRef pNameArray As %String).

ClassMethod LastNameBuildValueArray(pName As %String, ByRef pNameArray As %String) As %Status 
{

}

The parameter pName contains the atcual value of LastName. The ref parameter pNameArray returns an array with the "collection" values of LastName.

So let's define a collection index on LastName. This index definition looks like this:

Index idxLastName On LastName(elements);

This looks similar to "normal" index definitions. The key-word elements within the parenthesis specifies that we want to index the collection elements. Plese see the documentation for more details.

Next let's implement the LastNameBuildValueArray() method. It could look like this:

ClassMethod LastNameBuildValueArray(pName As %String, ByRef pNameArray As %String) As %Status 
{
#dim tName as %String
#dim tRet as %Status = $$$OK

    try {
    set tName = $system.Encryption.AESCBCManagedKeyDecrypt(pName)
    set tKey = $system.Encryption.ListEncryptionKeys()
    for i=$length(tName):-1:1 {
        set pNameArray(i) = $system.Encryption.Base64Encode($system.Encryption.HMACSHA1($extract(tName,1,i),tKey))
        }
    }
    catch tEx {
        set tRet = tEx.AsStatus()
    }
    
    return tRet
}

Let's take a quick look. The parameter pName contains the actual value of LastName. This means it contains the encrypted value. This is not suitable here. So we have to decrypt the value first:

set tName = $system.Encryption.AESCBCManagedKeyDecrypt(pName)

We then can use the plain-text value in tName to "construct" our collection values "S", "Sm", "Smi", "Smit", "Smith", create the MAC and add an entry to the output array pNameArray.

for i=$length(tName):-1:1 {
    set pNameArray(i) = $system.Encryption.Base64Encode($system.Encryption.HMACSHA1($extract(tName,1,i),tKey))
}

Delete your test-data as described above and regenerate it. This will also create the index.

Searching 

The last step missing is how to run SQL queries against that table and use LastName as a search condition.
Since we have indexed the MACs we have to convert our search value to a MAC. To do this we define a stored procedure which returns the MAC for a plain-text value:

ClassMethod MakeSearchKey(pVal) As %String [ SqlName = MakeSearchKey, SqlProc ]
{
    set tKey = $system.Encryption.ListEncryptionKeys()
    return $system.Encryption.Base64Encode($system.Encryption.HMACSHA1(pVal,tKey))
}

It's important that you use the same secrect-key in this method as you did for indexing.

Important: To prevent unauthorized users from using this stored procedure you should use the SQL security/privilege system. This is also true for access to the LastName property. If a user is not allowed to decrypt it, its probably not neccessary that he can see it at all. 

Before we now run our searches against our data let's add a calculated property decLastName to our class-definition. This property returns the decrypted value of LastName and is only for debugging purposes to verify that our query returns the correct results. 

Please find below the definition of decLastName

Property decLastName As %String [ Calculated, SqlComputeCode = {set {*} = $system.Encryption.AESCBCManagedKeyDecrypt({LastName})}, SqlComputed ];

So let's query our table and look for Persons with the LastName like 'S%':

SELECT * 
FROM   dc.person 
WHERE  for some %element(lastname)(%value= 
       ( 
              SELECT dc.makesearchkey('S')))

If you run this query in the System-Management-Portal you should see an output similiar to the one below:

Note that we use a Caché SQL extension (for some %element()...) in the query. This is a predicate condition especially for searches on collection properties/fields. You can find details on this in the documentation.

Please keep in mind that the decLastName column is just there to verify that the query returns the correct results. Let's search for LastName like 'Si%' and modify the statement accordingly:

SELECT * 
FROM   dc.person 
WHERE  for some %element(lastname)(%value= 
       ( 
              SELECT dc.makesearchkey('Si')))

This is the output:

Please take a look at the query-plan and you will see that our index idxLastName is used during the query execution.

Below you find the full class listing:

Class DC.Person extends %Persistent 
{

Property LastName as %String(MAXLEN = "");

Property FirstName as %String;

Property DOB as %Date;

Property decLastName As %String [ Calculated, SqlComputeCode = {set {*} = $system.Encryption.AESCBCManagedKeyDecrypt({LastName})}, SqlComputed ];

Index idxLastName on LastName(elements);

Method %OnNew(pLastName as %String = "", pFirstName as %String = "", pDOB as %Date = "") as %Status [ Private, ServerOnly = 1 ] 
{
#dim tName as %String

    try {
        if pLastName = "" {
            set tName = ##class(%PopulateUtils).LastName()
        }
        else {
            set tName = pLastName
        }
        if pFirstName = "" {
            set ..FirstName = ##class(%PopulateUtils).FirstName()
        }
        else {
            set ..FirstName = pFirstName
        }
        if pDOB = "" {
            set ..DOB = ##class(%PopulateUtils).Date()
        }
        else {
            set ..DOB = pDOB
        }
        set ..LastName = ..Encrypt(tName)
    }
    catch tEx {
        write !,tEx.DisplayString()
    }
    
    return $$$OK
}

Method Encrypt(pVal As %String) As %String [ Private ]
{
#dim tCipher as %String = ""

    try {
        set tKeyId = $system.Encryption.ListEncryptionKeys()
        set tCipher = $system.Encryption.AESCBCManagedKeyEncrypt(pVal,tKeyId)
    }
    catch tEx {
        write !,tEx.DisplayString()
    }
    
    return tCipher
}

ClassMethod GenerateData(pRowCount as %Integer = 100) as %Integer 
{
#dim tCounter as %Integer = 0
    
    for i = 1:1:pRowCount {
        set tSc = ..%New().%Save()
        if tSc set tCounter = tCounter + 1
    }
    
    return tCounter
}
 
ClassMethod MakeSearchKey(pVal) As %String [ SqlName = MakeSearchKey, SqlProc ]
{
    set tKey = $system.Encryption.ListEncryptionKeys()
    return $system.Encryption.Base64Encode($system.Encryption.HMACSHA1(pVal,tKey))
}
 
ClassMethod LastNameBuildValueArray(pName As %String, ByRef pNameArray As %String) As %Status 
{
#dim tName as %String
#dim tRet as %Status = $$$OK

    try {
    set tName = $system.Encryption.AESCBCManagedKeyDecrypt(pName)
    set tKey = $system.Encryption.ListEncryptionKeys()
    for i=$length(tName):-1:1 {
        set pNameArray(i) = $system.Encryption.Base64Encode($system.Encryption.HMACSHA1($extract(tName,1,i),tKey))
        }
    }
    catch tEx {
        set tRet = tEx.AsStatus()
    }
    
    return tRet
}

}

Summary

We have created a class DC.Person which contains a property LastName. The property values are encrypted and only the encrypted values are stored in the database. To make this property SQL-searchable without the need to decrypt it prior to the search we have defined an index which contains hashed values. Nowhere neither in the data-global nor in the index-global we store the plain-text values.
It is not necessary  to decrypt the field values prior to the search even if you use the encrypted field in your where-clause.

Granting access to the column LastName and the stored procedure MakeSearchKey only to users who are allowed to decrypt the names might be an option to make the data more secure. 

Discussion (9)0
Log in or sign up to continue

After Michael's post originally appeared on the intersystems-public-cache Google Group (as a result of the automatic crossposting being done by a username intersystems.dc) my colleague George James responded in the Google Group (GG). However, whatever mechanism the user intersystems.dc has set up for crossposting only handles the initial DC post, and nor does it feed GG responses back to DC. So I'm re-posting George's response here where I think it will get a wider audience.

George wrote:

It seems to me that your ixdLastName index might be usable as some kind of rainbow table to attack the data contained in the AES encrypted field.
 

If I were able to perform a chosen-plaintext attack then querying with like 'J%', then 'Ja%', then 'Jam%' would trivially discover where my name was in the database. 

 
Have you carried out a cryptographic analysis of the strength of this approach?  Logically it must be weaker than just AES on its own.  My question is how much weaker?

You should use Caché security features to prevent unauthorized access to the index global and the code etc. I didn't use a standard hash like SHA-1 or SHA-256 and used a "salted" hash to make it harder to use standard rainbow tables.

I didn't claim that it is 100% safe. As long as you get access to the hashes and the MakeSearchField function there is a potential risk.

Maybe is obvious to people but the downside to encryption is that if you lose your encryption keys, you lose all of your encrypted data. That would be ironic if the reason you encrypted your data was to prevent data loss and theft. This fear of data loss has been the key driver for not using key-based encryption within our applications - even though there are good business reasons for doing so.