Article
· Dec 1, 2021 5m read

Data anonymization, introducing iris-Disguise

First of all, what is data anonymization?

According to Wikipedia:

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.

In other words, data anonymization is a process that retains the data but keeps its source anonymous.
Depending on the anonymization technique adopted, the data is redacted, masked, or substituted.

That is the purpose of iris-Disguise: to provide a set of anonymization tools.

You can use it in two different ways: by calling a method, or by specifying your anonymization strategy inside the persistent class definition itself.

The current version of iris-Disguise offers 6 strategies to anonymize data:

  • Destruction
  • Scramble
  • Shuffling
  • Partial Masking
  • Randomization
  • Faking

Let me explain each strategy. For each one, I will show an example of the method call and, as mentioned, how to apply it inside the persistent class definition.
To use iris-Disguise this way, you need to "wear disguise glasses".
In the persistent class, extend the dc.Disguise.Glasses class and change the data type of any property to one that carries the strategy of your choice.
After that, at any moment, just call the DisguiseProcess method on the class. All values will be replaced using the strategy of each property's data type.
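To make the class-based approach concrete, here is a minimal Python sketch of the general idea: each property is declared once with a strategy, and a single call applies all of them. This is not how iris-Disguise is implemented. The names `disguise_process` and `strategies` are hypothetical and exist only for illustration.

```python
def disguise_process(rows, strategies):
    """Apply each property's strategy to every row.

    Properties without a declared strategy are copied unchanged.
    (Hypothetical helper, not part of iris-Disguise.)
    """
    disguised = []
    for row in rows:
        new_row = {}
        for prop, value in row.items():
            strategy = strategies.get(prop)
            new_row[prop] = strategy(value) if strategy else value
        disguised.append(new_row)
    return disguised


# Declare the strategy once per property, then process everything in one call.
strategies = {"Name": lambda value: "CONFIDENTIAL"}
characters = [{"Name": "Frodo", "Age": 50}, {"Name": "Gandalf", "Age": 2019}]
disguised = disguise_process(characters, strategies)
print(disguised)
```

Declaring the strategy next to the property keeps the anonymization rules in one place, so the same call can be repeated whenever the data needs to be disguised again.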

So buckle up and let's go.

Destruction

This strategy replaces an entire column with a word ('CONFIDENTIAL' is the default).

Do ##class(dc.Disguise.Strategy).Destruction("classname", "propertyname", "Word to replace")

The third parameter is optional. If not provided, the word 'CONFIDENTIAL' will be used.

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.String(FieldStrategy = "DESTRUCTION");
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()

Scramble

This strategy scrambles all characters in a property.

Do ##class(dc.Disguise.Strategy).Scramble("classname", "propertyname")

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.String(FieldStrategy = "SCRAMBLE");
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
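As a rough illustration of what scrambling means, the Python sketch below reorders the characters of a single value at random. It is a conceptual example only, not the iris-Disguise implementation; the function name `scramble` and the `rng` parameter are assumptions made for this sketch.

```python
import random


def scramble(value, rng=random):
    """Return the characters of `value` in a random order."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


rng = random.Random(42)  # seeded only so the example is repeatable
scrambled = scramble("Gandalf", rng)
print(scrambled)
```

The result keeps the same characters and the same length as the original, so short or highly distinctive values may still be guessable.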

Shuffling

Shuffling rearranges all values of a given property across rows. It is not a masking strategy, because it works "vertically".
This strategy is useful for relationships, because referential integrity is kept.
In the current version, this method only works on one-to-many relationships.

Do ##class(dc.Disguise.Strategy).Shuffling("classname", "propertyname")

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property Weapon As dc.Disguise.DataTypes.String(FieldStrategy = "SHUFFLING");
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
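The "vertical" nature of shuffling can be sketched in a few lines of Python: the values of one property are collected, reordered, and handed back to the rows, while every other property stays where it was. This is an illustration under assumptions, not the library's code; `shuffling` and `rng` are hypothetical names.

```python
import random


def shuffling(rows, prop, rng=random):
    """Redistribute the values of `prop` among the rows at random."""
    values = [row[prop] for row in rows]
    rng.shuffle(values)
    return [{**row, prop: value} for row, value in zip(rows, values)]


characters = [
    {"Name": "Frodo", "Weapon": "Sting"},
    {"Name": "Aragorn", "Weapon": "Anduril"},
    {"Name": "Legolas", "Weapon": "Bow"},
]
shuffled = shuffling(characters, "Weapon", random.Random(7))
print(shuffled)
```

Because every value that comes out already existed in the column, a shuffled reference still points at a real row, which is why this kind of strategy preserves referential integrity.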

Partial Masking

This strategy obfuscates part of the data. A credit card number, for example, can be replaced by 456X XXXX XXXX X783.

Do ##class(dc.Disguise.Strategy).PartialMasking("classname", "propertyname", prefixLength, suffixLength, "mask")

prefixLength, suffixLength and mask are optional. If they are not provided, the default values are used.

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property SSN As dc.Disguise.DataTypes.PartialMaskString(prefixLength = 2, suffixLength = 2);
Property Weapon As %String;
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
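The following Python sketch shows one way partial masking can work: keep a prefix and a suffix, and replace the letters and digits in between with a mask character. It is not the iris-Disguise implementation. The default values, the choice to leave separators such as spaces untouched, and the choice to mask the whole value when it is too short are all assumptions of this sketch.

```python
def partial_masking(value, prefix_length=1, suffix_length=1, mask="X"):
    """Mask the letters and digits between a kept prefix and a kept suffix.

    Defaults are assumptions for this sketch, not the library's defaults.
    """
    if prefix_length + suffix_length >= len(value):
        # Assumption: a value too short to keep both ends is masked entirely.
        start, end = 0, len(value)
    else:
        start, end = prefix_length, len(value) - suffix_length
    middle = "".join(mask if ch.isalnum() else ch for ch in value[start:end])
    return value[:start] + middle + value[end:]


print(partial_masking("4567 1234 5678 9783", 3, 3))  # 456X XXXX XXXX X783
print(partial_masking("123-45-6789", 2, 2))          # 12X-XX-XX89
```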

Randomization

This strategy will generate purely random data. There are three types of randomization: integer, numeric and date.

Do ##class(dc.Disguise.Strategy).Randomization("classname", "propertyname", "type", from, to)

type: "integer", "numeric" or "date". "integer" is the default.

from and to are optional. They define the range of randomization.
For integer type the default range is 1 to 100. For numeric type the default range is 1.00 to 100.00.

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property Age As dc.Disguise.DataTypes.RandomInteger(MINVAL = 10, MAXVAL = 25);
Property SSN As %String;
Property Weapon As %String;
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
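To illustrate the three randomization types, here is a small Python sketch that draws a random integer, a two-decimal number, or a date from a range. It is only a conceptual example: the function name `randomization` and its parameters are hypothetical, and the default date range is an assumption, since none is stated above.

```python
import random
from datetime import date, timedelta


def randomization(kind="integer", low=None, high=None, rng=random):
    """Return a random value of the requested kind within [low, high]."""
    if kind == "integer":
        return rng.randint(1 if low is None else low, 100 if high is None else high)
    if kind == "numeric":
        low = 1.0 if low is None else low
        high = 100.0 if high is None else high
        return round(rng.uniform(low, high), 2)
    if kind == "date":
        # Assumption: this default date range is chosen for the sketch only.
        low = date(2000, 1, 1) if low is None else low
        high = date.today() if high is None else high
        return low + timedelta(days=rng.randint(0, (high - low).days))
    raise ValueError(f"unknown randomization type: {kind}")


rng = random.Random(7)  # seeded only so the example is repeatable
age = randomization("integer", 10, 25, rng)
price = randomization("numeric", rng=rng)
birthday = randomization("date", date(2020, 1, 1), date(2020, 12, 31), rng)
print(age, price, birthday)
```

Unlike scrambling or shuffling, the output here has no relationship to the original value, so nothing about the source data can be recovered from it.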

Fake Data

The idea of Faking is to replace data with random but plausible values.
iris-Disguise provides a small set of methods to generate fake data.

Do ##class(dc.Disguise.Strategy).Fake("classname", "propertyname", "type")

type: "firstname", "lastname", "fullname", "company", "country", "city" and "email"

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.FakeString(FieldStrategy = "FIRSTNAME");
Property Age As %Integer;
Property SSN As %String;
Property Weapon As %String;
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
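The idea of "random but plausible" can be sketched in Python by picking replacement values from small sample lists. This is an illustration only, not how iris-Disguise generates its data; the `SAMPLES` lists and the `fake` function are made up for this example, and only a few of the types listed above are covered.

```python
import random

# Hypothetical sample data, invented for this sketch.
SAMPLES = {
    "firstname": ["Alice", "Bruno", "Carla", "Daniel"],
    "lastname": ["Almeida", "Baker", "Costa", "Dunn"],
    "city": ["Lisbon", "Osaka", "Recife", "Turin"],
}


def fake(kind, rng=random):
    """Return a plausible replacement value for the given kind."""
    if kind == "fullname":
        return f"{fake('firstname', rng)} {fake('lastname', rng)}"
    return rng.choice(SAMPLES[kind])


rng = random.Random(3)  # seeded only so the example is repeatable
print(fake("firstname", rng), "|", fake("fullname", rng), "|", fake("city", rng))
```

The replacement values look realistic in screens and reports, which makes faked data convenient for demos and test environments.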

I want to hear from you!

Feedback and ideas are welcome!

Let me know what you think of this tool, how it fits your needs and what features are missing.

And I want to say a very special thanks to @Henrique Dias, @Oliver Wilms, @Robert Cemper, @Yuri Marx and @Evgeny Shvarov, who commented, reviewed, made suggestions, and held rich discussions that inspired me to create and improve iris-Disguise.

Discussion (12)

Hi Henry,

Your video is now on InterSystems Developers YouTube:

⏯ Iris-Disguise

https://www.youtube.com/embed/Z4pxPIHVmBU

Thanks for your contribution!

Nice app, @Henry Pereira !

If I have the property Name that contains full names, and I need to Fake it, I should:

0. Install ZPM module iris-disguise

1. Change the datatype to dc.Disguise.DataTypes.FakeString("FULLNAME")

2. Run ##class(dc.Disguise.Strategy).Fake("myclass", "Name", "fullname")

Right?

I'm curious, why should I repeat fullname 2 times? Maybe I could omit the strategy at least in the Fake method assuming that strategy can be taken from the property?

One of the best features I've run across in these types of tools is the ability to remember the anonymizations that were applied to a message. For example, you would define the MRN and Encounter Number fields as key fields for an HL7 message type, and whatever transformations were applied to the first message with those keys would be applied to any messages that follow. It is certainly a non-trivial feature to implement, but it gives you the ability to keep a patient encounter intact as the visit goes from admit to discharge. This shakes out a lot of problems before go-live.

Regardless, nice work and thank you for your efforts.