Data anonymization, introducing iris-Disguise
First of all, what is data anonymization?
According to Wikipedia:
Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.
In other words, data anonymization is a process that retains the data while keeping its source anonymous.
Depending on the anonymization technique adopted, the data is redacted, masked or substituted.
That is the purpose of iris-Disguise: to provide a set of anonymization tools.
You can use it in two different ways: by executing a method, or by specifying your anonymization strategy inside the persistent class definition itself.
The current version of iris-Disguise offers 6 strategies to anonymize data:
- Destruction
- Scramble
- Shuffling
- Partial Masking
- Randomization
- Faking
Let me explain each strategy. For each one I will show the method execution with an example and, as mentioned, how to apply it inside the persistent class definition.
To use iris-Disguise inside a class definition, you need to "wear disguise glasses".
In the persistent class, extend the dc.Disguise.Glasses class and change the data type of any property to one that carries the strategy of your choice.
After that, at any moment, just call the DisguiseProcess method on the class. All the values will be replaced using the strategy of each property's data type.
So buckle up and let's go.
Destruction
This strategy replaces an entire column with a single word ('CONFIDENTIAL' is the default).
Do ##class(dc.Disguise.Strategy).Destruction("classname", "propertyname", "Word to replace")
The third parameter is optional. If not provided, the word 'CONFIDENTIAL' will be used.
Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.String(FieldStrategy = "DESTRUCTION");
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
Scramble
This strategy scrambles all the characters in a property.
Do ##class(dc.Disguise.Strategy).Scramble("classname", "propertyname")
Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.String(FieldStrategy = "SCRAMBLE");
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
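To make the idea concrete, here is a rough, language-neutral sketch in Python of what scrambling a value means. It is only my illustration of the technique, not the code iris-Disguise runs, and the scramble function and its seeded generator are names I made up for the example.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Return the same characters as `value`, in a random order."""
    chars = list(value)
    rng.shuffle(chars)  # in-place random permutation of the characters
    return "".join(chars)


# A seeded generator keeps the illustration repeatable.
rng = random.Random(42)
original = "Frodo Baggins"
scrambled = scramble(original, rng)
print(scrambled)
```

The result has the same length and the same characters as the original, so a short value can sometimes be guessed back. Keep that in mind when picking this strategy.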
Shuffling
Shuffling rearranges all the values in a given property. It is not a masking strategy, because it works "vertically": values move between rows instead of being altered in place.
This strategy is useful for relationships, because referential integrity is preserved.
In the current version, this method only works on one-to-many relationships.
Do ##class(dc.Disguise.Strategy).Shuffling("classname", "propertyname")
Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property Weapon As dc.Disguise.DataTypes.String(FieldStrategy = "SHUFFLING");
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
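The "vertical" nature of shuffling can be sketched in a few lines of Python. This is only an illustration under my own assumptions (rows as dictionaries, a hypothetical shuffle_column helper), not how iris-Disguise is implemented.

```python
import random


def shuffle_column(rows, column, rng):
    """Return copies of `rows` with the values of `column` redistributed at random."""
    values = [row[column] for row in rows]
    rng.shuffle(values)  # permute the column, leaving every other field alone
    return [{**row, column: value} for row, value in zip(rows, values)]


characters = [
    {"Name": "Frodo", "Weapon": "Sting"},
    {"Name": "Aragorn", "Weapon": "Anduril"},
    {"Name": "Legolas", "Weapon": "Bow"},
]
shuffled = shuffle_column(characters, "Weapon", random.Random(7))
for row in shuffled:
    print(row["Name"], row["Weapon"])
```

Every value in the shuffled column already existed in that column, which is why a reference to another record stays valid after shuffling.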
Partial Masking
This strategy obfuscates part of the data. A credit card number, for example, can be replaced by 456X XXXX XXXX X783.
Do ##class(dc.Disguise.Strategy).PartialMasking("classname", "propertyname", prefixLength, suffixLength, "mask")
prefixLength, suffixLength and mask are optional. If not provided, the default values will be used.
Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property SSN As dc.Disguise.DataTypes.PartialMaskString(prefixLength = 2, suffixLength = 2);
Property Weapon As %String;
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
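As a rough illustration of how the prefix length, suffix length and mask interact, here is a small Python sketch. It is not the iris-Disguise implementation. The default values (1, 1 and "X"), the choice to leave separators such as spaces and dashes visible, and the partial_mask name are all my assumptions for the example.

```python
def partial_mask(value: str, prefix_length: int = 1, suffix_length: int = 1,
                 mask: str = "X") -> str:
    """Keep the first and last characters visible and mask what lies between."""
    n = len(value)
    # Assumption: a value too short to have a middle is masked completely.
    if prefix_length + suffix_length >= n:
        prefix_length = suffix_length = 0
    end = n - suffix_length
    # Assumption: only letters and digits are masked; separators stay readable.
    middle = "".join(mask if ch.isalnum() else ch
                     for ch in value[prefix_length:end])
    return value[:prefix_length] + middle + value[end:]


print(partial_mask("4561 2345 6789 0783", 3, 3))  # 456X XXXX XXXX X783
print(partial_mask("123-45-6789", 2, 2))          # 12X-XX-XX89
```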
Randomization
This strategy will generate purely random data. There are three types of randomization: integer, numeric and date.
Do ##class(dc.Disguise.Strategy).Randomization("classname", "propertyname", "type", from, to)
type: "integer", "numeric" or "date". "integer" is the default.
from and to are optional; they define the range of randomization.
For the integer type, the default range is 1 to 100. For the numeric type, the default range is 1.00 to 100.00.
Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As %String;
Property Age As dc.Disguise.DataTypes.RandomInteger(MINVAL = 10, MAXVAL = 25);
Property SSN As %String;
Property Weapon As %String;
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
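The three randomization types can be sketched in Python as follows. This is only an illustration of the idea, not the iris-Disguise code. The integer and numeric defaults follow the ranges stated above. The article gives no default range for dates, so this sketch requires both bounds for them, and the randomize name is hypothetical.

```python
import random
from datetime import date, timedelta


def randomize(kind="integer", low=None, high=None, rng=None):
    """Return a random integer, two-decimal number, or date within [low, high]."""
    rng = rng or random.Random()
    if kind == "integer":
        return rng.randint(1 if low is None else low,
                           100 if high is None else high)
    if kind == "numeric":
        value = rng.uniform(1.0 if low is None else low,
                            100.0 if high is None else high)
        return round(value, 2)
    if kind == "date":
        # Assumption: no default range for dates, so both bounds are required.
        if low is None or high is None:
            raise ValueError("date randomization needs both bounds in this sketch")
        return low + timedelta(days=rng.randint(0, (high - low).days))
    raise ValueError(f"unknown type: {kind}")


rng = random.Random(2024)
print(randomize("integer", 10, 25, rng))
print(randomize("numeric", rng=rng))
print(randomize("date", date(1990, 1, 1), date(2000, 12, 31), rng))
```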
Fake Data
The idea of Faking is to replace data with random but plausible values.
iris-Disguise provides a small set of methods to generate fake data.
Do ##class(dc.Disguise.Strategy).Fake("classname", "propertyname", "type")
type: "firstname", "lastname", "fullname", "company", "country", "city" or "email".
Class packageSample.FictionalCharacter Extends (%Persistent, dc.Disguise.Glasses)
{
Property Name As dc.Disguise.DataTypes.FakeString(FieldStrategy = "FIRSTNAME");
Property Age As %Integer;
Property SSN As %String;
Property Weapon As %String;
}
Do ##class(packageSample.FictionalCharacter).DisguiseProcess()
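To show what "random but plausible" means in practice, here is a minimal Python sketch that picks values from small sample lists. It is only an illustration. The sample values, the example.com email format and the fake function are my own assumptions and have nothing to do with the data iris-Disguise ships with.

```python
import random

# Hypothetical sample data, for illustration only.
FAKE_VALUES = {
    "firstname": ["Alice", "Bruno", "Chen", "Dana"],
    "lastname": ["Silva", "Miller", "Tanaka", "Okafor"],
    "company": ["Acme Corp", "Globex", "Initech"],
    "country": ["Brazil", "Japan", "Canada"],
    "city": ["Lisbon", "Osaka", "Denver"],
}


def fake(kind: str, rng: random.Random) -> str:
    """Return a plausible-looking value of the requested type."""
    if kind == "fullname":
        return f"{fake('firstname', rng)} {fake('lastname', rng)}"
    if kind == "email":
        first = fake("firstname", rng).lower()
        last = fake("lastname", rng).lower()
        return f"{first}.{last}@example.com"
    return rng.choice(FAKE_VALUES[kind])


rng = random.Random(3)
for kind in ("firstname", "fullname", "company", "email"):
    print(kind, "->", fake(kind, rng))
```

Unlike scrambling, the output carries no trace of the original value, yet it still looks realistic enough for demos and test environments.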
I want to hear from you!
Feedback and ideas are welcome!
Let me know what you think of this tool, how it fits your needs and what features are missing.
And I want to say a very special thanks to @Henrique Dias, @Oliver Wilms, @Robert Cemper, @Yuri Marx and @Evgeny Shvarov, who commented, reviewed, made suggestions and sparked the rich discussions that inspired me to create and improve iris-Disguise.