Henry Pereira · Dec 1, 2021 5m read

Data anonymization, introducing iris-Disguise

First of all, what is data anonymization?

According to Wikipedia:

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.

In other words, data anonymization is a process that retains the data but keeps its source anonymous.
Depending on the anonymization technique adopted, the data is redacted, masked or substituted.

And that is the purpose of iris-Disguise: to provide a set of anonymization tools.

You can use it in two different ways: by calling a method, or by specifying your anonymization strategy inside the persistent class definition itself.

The current version of iris-Disguise offers 6 strategies to anonymize data:

  • Destruction
  • Scramble
  • Shuffling
  • Partial Masking
  • Randomization
  • Faking

Let me explain each strategy. For each one I will show a method call with an example and, as mentioned, how to apply it inside the persistent class definition.
To use iris-Disguise in this second way, your class needs to "wear disguise glasses".
In the persistent class, extend the dc.Disguise.Glasses class and change the data type of any property to the one that matches the strategy of your choice.
After that, at any moment, just call the DisguiseProcess method on the class. All the values will be replaced using the strategy of each property's data type.
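The "declare once, apply everywhere" idea can be pictured with a small Python sketch. This is only an illustration of the concept, not how iris-Disguise is implemented: the names `STRATEGIES`, `destroy`, `mask_middle` and `disguise_process`, as well as the sample row, are made up for the example.

```python
def destroy(value):
    """Replace any value with a fixed word."""
    return "CONFIDENTIAL"


def mask_middle(value):
    """Keep the first and last two characters and mask the rest."""
    if len(value) <= 4:
        return "X" * len(value)
    return value[:2] + "X" * (len(value) - 4) + value[-2:]


# Hypothetical declaration: one strategy per sensitive field.
STRATEGIES = {"Name": destroy, "SSN": mask_middle}


def disguise_process(rows, strategies):
    """Apply each field's declared strategy; undeclared fields pass through."""
    return [
        {
            field: strategies[field](value) if field in strategies else value
            for field, value in row.items()
        }
        for row in rows
    ]


rows = [{"Name": "Alice", "SSN": "123456789", "Weapon": "Bow"}]
disguised = disguise_process(rows, STRATEGIES)
```

The point of the design is that the strategy lives next to the field it protects, so a single call can anonymize every declared field at once.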

So buckle up and let's go.


Destruction

This strategy replaces an entire column with a word ('CONFIDENTIAL' is the default).

Do ##class(dc.Disguise.Strategy).Destruction("classname", "propertyname", "Word to replace")

The third parameter is optional. If not provided, the word 'CONFIDENTIAL' will be used.

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.String(FieldStrategy = "DESTRUCTION");
}

Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
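As a language-neutral picture of what destruction does to a data set, here is a minimal Python sketch. It is an illustration only, not the iris-Disguise code; the function name `destruction` and the sample rows are assumptions made for the example.

```python
def destruction(rows, field, word="CONFIDENTIAL"):
    """Return copies of the rows with `field` overwritten by a fixed word."""
    return [{**row, field: word} for row in rows]


# Made-up sample data.
characters = [
    {"Name": "Alice", "Weapon": "Bow"},
    {"Name": "Bob", "Weapon": "Axe"},
]

masked = destruction(characters, "Name")
redacted = destruction(characters, "Name", "REDACTED")
```

Every row ends up with the same value in the chosen field, so nothing about the original content survives; the other fields are left alone.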



Scramble

This strategy scrambles all the characters in a property.

Do ##class(dc.Disguise.Strategy).Scramble("classname", "propertyname")

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.String(FieldStrategy = "SCRAMBLE");
}

Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
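Scrambling a single value can be sketched in a few lines of Python. This is an illustration of the technique under the assumption that "scramble" means a random permutation of the characters; it is not taken from the iris-Disguise source, and the function name `scramble` is made up.

```python
import random


def scramble(value, rng=random):
    """Return the characters of `value` in a random order."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


rng = random.Random(42)  # seeded only to make the example repeatable
original = "Bilbo Baggins"
scrambled = scramble(original, rng)
```

Note that a scrambled value keeps the same length and the same set of characters as the original, so very short or very distinctive values may still be guessable.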



Shuffling

Shuffling rearranges all the values in a given property. It is not a masking strategy, because it works "vertically", across rows.
This strategy is useful for relationships because referential integrity is kept.
In the current version, this method only works on one-to-many relationships.

Do ##class(dc.Disguise.Strategy).Shuffling("classname", "propertyname")

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property Weapon As dc.Disguise.DataTypes.String(FieldStrategy = "SHUFFLING");
}

Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
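To show what working "vertically" means, here is a minimal Python sketch that permutes one column across rows. It is an illustration only, not the iris-Disguise implementation; the function name `shuffling` and the sample rows are assumptions.

```python
import random


def shuffling(rows, field, rng=random):
    """Redistribute the values of `field` across rows; other fields stay put."""
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


# Made-up sample data.
characters = [
    {"Name": "Alice", "Weapon": "Bow"},
    {"Name": "Bob", "Weapon": "Axe"},
    {"Name": "Carol", "Weapon": "Staff"},
]

shuffled = shuffling(characters, "Weapon", random.Random(7))
```

Because every value in the column still exists after the shuffle, any value that pointed at a valid record before still points at a valid record afterwards; only the pairing between rows and values changes.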


Partial Masking

This strategy obfuscates part of the data. A credit card number, for example, can be replaced by 456X XXXX XXXX X783.

Do ##class(dc.Disguise.Strategy).PartialMasking("classname", "propertyname", prefixLength, suffixLength, "mask")

prefixLength, suffixLength and mask are optional. If not provided, the default values will be used.

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property SSN As dc.Disguise.DataTypes.PartialMaskString(prefixLength = 2, suffixLength = 2);
Property Weapon As %String;
}

Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
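The roles of the prefix length, suffix length and mask character can be illustrated with a short Python sketch. It is not the iris-Disguise code: the function name `partial_mask`, its default values, the choice to leave separators such as spaces unmasked, and the choice to mask everything when the value is too short are all assumptions made for the example.

```python
def partial_mask(value, prefix_length=1, suffix_length=1, mask="X"):
    """Keep a prefix and a suffix, masking letters and digits in between."""

    def hide(text):
        # Separators (spaces, dashes) are kept so the format stays readable.
        return "".join(mask if ch.isalnum() else ch for ch in text)

    n = len(value)
    if prefix_length + suffix_length >= n:
        # Too short to keep both ends: mask the whole value (assumed behavior).
        return hide(value)
    middle = value[prefix_length:n - suffix_length]
    return value[:prefix_length] + hide(middle) + value[n - suffix_length:]


card = partial_mask("4567 1234 5678 9783", 3, 3)  # "456X XXXX XXXX X783"
ssn = partial_mask("123-45-6789", 2, 2)
```

The kept prefix and suffix make the value recognizable to someone who already knows it, while the masked middle hides most of the information.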



Randomization

This strategy generates purely random data. There are three types of randomization: integer, numeric and date.

Do ##class(dc.Disguise.Strategy).Randomization("classname", "propertyname", "type", from, to)

type: "integer", "numeric" or "date". "integer" is the default.

from and to are optional. They define the range of randomization.
For the integer type the default range is 1 to 100. For the numeric type the default range is 1.00 to 100.00.

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property Age As dc.Disguise.DataTypes.RandomInteger(MINVAL = 10, MAXVAL = 25);
Property SSN As %String;
Property Weapon As %String;
}

Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
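The three randomization types and their ranges can be sketched in Python as follows. This is an illustration, not the iris-Disguise implementation: the function name `randomize` is made up, the integer and numeric defaults mirror the ranges stated above, and requiring an explicit range for dates is an assumption, since no default date range is given here.

```python
import random
from datetime import date, timedelta


def randomize(kind="integer", start=None, end=None, rng=random):
    """Return a random value of the requested kind within [start, end]."""
    if kind == "integer":
        low = 1 if start is None else start
        high = 100 if end is None else end
        return rng.randint(low, high)
    if kind == "numeric":
        low = 1.0 if start is None else start
        high = 100.0 if end is None else end
        return round(rng.uniform(low, high), 2)
    if kind == "date":
        if start is None or end is None:
            raise ValueError("a date range needs both start and end")
        span_days = (end - start).days
        return start + timedelta(days=rng.randint(0, span_days))
    raise ValueError(f"unsupported type: {kind!r}")


rng = random.Random(0)  # seeded only to make the example repeatable
age = randomize("integer", 10, 25, rng)
price = randomize("numeric", rng=rng)
birthday = randomize("date", date(1990, 1, 1), date(2000, 12, 31), rng)
```

Unlike shuffling, the generated values have no connection to the original data, so statistical properties of the column are lost along with the sensitive content.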


Fake Data

The idea of Faking is to replace data with random but plausible values.
iris-Disguise provides a small set of methods to generate fake data.

Do ##class(dc.Disguise.Strategy).Fake("classname", "propertyname", "type")

type: "firstname", "lastname", "fullname", "company", "country", "city" and "email"

Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.FakeString(FieldStrategy = "FIRSTNAME");
Property Age As %Integer;
Property SSN As %String;
Property Weapon As %String;
}

Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
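"Random but plausible" can be illustrated with a tiny Python generator that picks from word lists. It is a sketch of the idea only, not the iris-Disguise generator: the function name `fake`, the name lists, and the use of the reserved `example.com` domain are assumptions, and only four of the types listed above are covered.

```python
import random

# Made-up word lists; a real generator would use much larger ones.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Torvalds"]


def fake(kind, rng=random):
    """Return a plausible-looking value for the requested type."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    if kind == "firstname":
        return first
    if kind == "lastname":
        return last
    if kind == "fullname":
        return f"{first} {last}"
    if kind == "email":
        return f"{first}.{last}@example.com".lower()
    raise ValueError(f"unsupported fake type: {kind!r}")


rng = random.Random(3)  # seeded only to make the example repeatable
name = fake("fullname", rng)
email = fake("email", rng)
```

The output looks like real data, which keeps screens, reports and tests realistic, but it is drawn from a fixed pool and carries nothing from the original record.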


 I want to hear from you!

Feedback and ideas are welcome!

Let me know what you think of this tool, how it fits your needs and what features are missing.

And I want to say a very special thanks to @Henrique Dias, @Oliver Wilms, @Robert Cemper, @Yuri Marx and @Evgeny Shvarov, who commented, reviewed, made suggestions and contributed rich discussions that inspired me to create and improve iris-Disguise.

Discussion

Nice app, @Henry Pereira !

If I have the property Name that contains full names, and I need to Fake it, I should:

0. Install the ZPM module iris-disguise

1. Change the datatype to dc.Disguise.DataTypes.FakeString("FULLNAME")

2. Run ##class(dc.Disguise.Strategy).Fake("myclass", "Name", "fullname")


I'm curious, why should I repeat fullname 2 times? Maybe I could omit the strategy at least in the Fake method assuming that strategy can be taken from the property?

If you change the property datatype and extend your class from dc.Disguise.Glasses, you only need to run the DisguiseProcess method on your class.

The strategy will be taken from property datatype.

An alternative is to run only ##class(dc.Disguise.Strategy).Fake; this way you don't need to change the property's datatype.
Thanks @Evgeny Shvarov 

Great article. thank you!!

I moved my reply / question because I think it was in the wrong spot.

One of the best features I've run across in these types of tools is the ability to remember the anonymizations that were applied to a message. For example, you would define the MRN and Encounter Number fields as key fields for an HL7 message type and whatever transformations were applied to the first message with those keys will be applied to any of them that follow. It is certainly a non-trivial feature to implement, but it gives you the ability to keep a patient encounter intact as the visit goes from admit to discharge. This scares out a lot of problems before go-live.

Regardless, nice work and thank you for your efforts.

I understand how to extend the property on a dataset class / property.  Would there be a way you could use this in the Ens.DTL?  So that as HL7 messages are coming in I can apply these method fake, destruct, and random on EnsLib.HL7.Message?

I'm not sure if it will work in the Ens.DTL... But I love the idea.
I'll try to use it in DTL (I don't know Ens.DTL very well). If it doesn't work, it sounds like a great feature to implement.

Thanks for the feedback.  When I get some free time I will also try to see what can be done with the HL7 implementation.