Data Classification Phase

Now that the table structures containing technical information have been loaded, you can add classes containing the application information that drives the data masking and data reduction processes.

You assign classes as described in detail in your Data Manager documentation. In short, you can assign classes using:
  • The graphical interface of Data Manager to multi-select and assign classes
  • An external data dictionary
  • Physical or logical referential integrity, assuming that columns referenced in the same relationship share the same class
  • The statistical distribution of column contents, assuming that columns whose contents have the same statistical distribution share the same class

The following examples use the data dictionary approach to define classes for data masking and the referential integrity approach to define classes for data reduction. They show both interfaces and the steps needed to run the classification phase in each case: loading masking classes from an external dictionary, AA201.DMANAGER.CLALIST, and loading subsetting classes from a referential integrity interface, AA201.DMANAGER.REFINTEG.

Data Masking Class Assignment Example (using Data Dictionary)

The data masking class specifies that the NAME, SURNAME, and ADDRESS columns of the CUSTOMER table are to be masked.
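As a rough illustration of what this class assignment expresses, the following Python sketch models dictionary entries as (table, column, class) records. The record layout and the class names (CLS_NAME and so on) are hypothetical and chosen only for illustration; the actual layout of AA201.DMANAGER.CLALIST is defined in your Data Manager documentation.

```python
from typing import NamedTuple


class DictionaryEntry(NamedTuple):
    table: str
    column: str
    masking_class: str  # hypothetical class name, for illustration only


# Hypothetical entries; real class names come from your own data dictionary.
ENTRIES = [
    DictionaryEntry("CUSTOMER", "NAME", "CLS_NAME"),
    DictionaryEntry("CUSTOMER", "SURNAME", "CLS_SURNAME"),
    DictionaryEntry("CUSTOMER", "ADDRESS", "CLS_ADDRESS"),
]


def columns_to_mask(entries, table):
    """Return the columns of `table` that carry a masking class."""
    return [entry.column for entry in entries if entry.table == table]


print(columns_to_mask(ENTRIES, "CUSTOMER"))
```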

Data Reduction Example (using Referential Integrity)

This example shows how a reduction rule, also known as a method, is acknowledged. The method is based on the referential integrity rules between the tables CUSTOMER and ACCOUNT, ACCOUNT and CCARD, and CCARD and OPERAT, as shown in this simple data model:

Data Model

The CUSTOMER and ACCOUNT tables are linked by a relationship based on the COD_CUS column. ACCOUNT is related to CCARD through the two key columns OFF_NUM and ACC_NUM, and CCARD and OPERAT are linked through the key columns CARD_TYPE and CARD_NUM.
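The relationships in this data model, together with the assumption that columns referenced in the same relationship share the same class, can be sketched in Python as follows. This is only an illustration and not how Data Manager implements classification. It assumes that each key column has the same name in both related tables, and the class names CUSTOMER_CODE and CARD_NUMBER are hypothetical.

```python
# Relationships from the data model: (parent table, child table, key columns).
RELATIONSHIPS = [
    ("CUSTOMER", "ACCOUNT", ("COD_CUS",)),
    ("ACCOUNT", "CCARD", ("OFF_NUM", "ACC_NUM")),
    ("CCARD", "OPERAT", ("CARD_TYPE", "CARD_NUM")),
]


def propagate_classes(relationships, known):
    """Copy each known class to the column on the other side of a relationship.

    `known` maps (table, column) to a class name. The loop repeats until no
    new column receives a class, so classes also travel along chains of tables.
    """
    classes = dict(known)
    changed = True
    while changed:
        changed = False
        for parent, child, columns in relationships:
            for column in columns:
                parent_col, child_col = (parent, column), (child, column)
                for src, dst in ((parent_col, child_col), (child_col, parent_col)):
                    if src in classes and dst not in classes:
                        classes[dst] = classes[src]
                        changed = True
    return classes


# Hypothetical starting point: two columns already carry a class.
known = {
    ("CUSTOMER", "COD_CUS"): "CUSTOMER_CODE",
    ("OPERAT", "CARD_NUM"): "CARD_NUMBER",
}
result = propagate_classes(RELATIONSHIPS, known)
for (table, column), class_name in sorted(result.items()):
    print(f"{table}.{column}: {class_name}")
```

In this sketch, ACCOUNT.COD_CUS inherits the class of CUSTOMER.COD_CUS, and CCARD.CARD_NUM inherits the class of OPERAT.CARD_NUM, because each pair takes part in the same relationship.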

You can generate both interfaces from an Excel spreadsheet by exporting them with the proper separator. Alternatively, you can generate the referential integrity interface from the DB2 catalog itself, or you can use a tool provided with Data Manager to generate it from DDL containing the SQL statements that establish the referential integrity. You can produce this type of DDL with a data modeling tool such as the Erwin Data Modeler.
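To give an idea of what deriving referential integrity from DDL involves, the following Python sketch extracts parent/child relationships from ALTER TABLE ... FOREIGN KEY statements. It is not the tool provided with Data Manager, and it handles only one common form of the statement. The sample DDL and the constraint name FK_ACC_CUS are hypothetical.

```python
import re

# Matches: ALTER TABLE child ADD [CONSTRAINT name] FOREIGN KEY (cols)
#          REFERENCES parent [(cols)]
FK_PATTERN = re.compile(
    r"ALTER\s+TABLE\s+(\w+)\s+ADD\s+(?:CONSTRAINT\s+\w+\s+)?"
    r"FOREIGN\s+KEY\s*\(([^)]*)\)\s*REFERENCES\s+(\w+)\s*(?:\(([^)]*)\))?",
    re.IGNORECASE,
)


def split_columns(text):
    """Turn 'A, B' into ('A', 'B'); an empty string gives an empty tuple."""
    return tuple(part.strip().upper() for part in text.split(",") if part.strip())


def extract_relationships(ddl):
    """Return (parent, child, child columns, parent columns) for each foreign key."""
    relationships = []
    for child, child_cols, parent, parent_cols in FK_PATTERN.findall(ddl):
        relationships.append(
            (parent.upper(), child.upper(),
             split_columns(child_cols), split_columns(parent_cols))
        )
    return relationships


# Hypothetical DDL for two of the relationships in the data model.
SAMPLE_DDL = """
ALTER TABLE ACCOUNT ADD CONSTRAINT FK_ACC_CUS
  FOREIGN KEY (COD_CUS) REFERENCES CUSTOMER (COD_CUS);
ALTER TABLE CCARD ADD FOREIGN KEY (OFF_NUM, ACC_NUM)
  REFERENCES ACCOUNT (OFF_NUM, ACC_NUM);
"""

for relationship in extract_relationships(SAMPLE_DDL):
    print(relationship)
```

A regular expression is enough for a sketch like this, but real DDL (quoted identifiers, schema-qualified names, foreign keys declared inside CREATE TABLE) needs a proper SQL parser.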

From the Work with Jobs window, you can run the load classification from the dictionary by specifying the BURECLR job:

Data Masking BURECLR Job Submission from Work with Jobs
Work with Jobs

Then load the corresponding input file necessary for BURECLR processing:

Secondary Options

In the same way, you can submit the load classification from referential integrity by specifying the BURDDUR job:

Data Reduction BURDDUR Job Submission from Work with Jobs
Work with Jobs

Then load the corresponding input file:

Secondary Options

Data Builder - Work with Data Elements

After both processes have completed, Data Manager proposes an "estimated class" for confirmation. Once confirmed, an estimated class becomes an "assigned class" and is then available for subsetting and masking. To create an assigned class, use the Data Builder Work with Data Elements dialog box, which supports column-level operations, including class confirmation, as shown here:

Work with Data Elements

The List of Data Elements shows the data elements contained in the tables that have been processed for the Data Masking and Data Reduction classes. At this point, the classes have not yet been confirmed. To confirm the classes, click Confirm Class. Confirmation populates the Assigned Class column with the relevant class.
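The two-step life cycle of a class, first estimated and then assigned, can be pictured with the following Python sketch. The DataElement type, its field names, and the class name CLS_NAME are hypothetical, and keeping the estimated class after confirmation is an assumption; the sketch only mirrors the behavior described above and is not part of Data Manager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    table: str
    column: str
    estimated_class: Optional[str] = None  # proposed by the load jobs
    assigned_class: Optional[str] = None   # set only after confirmation

    def confirm(self):
        """Promote the estimated class to the assigned class."""
        if self.estimated_class is None:
            raise ValueError("no estimated class to confirm")
        self.assigned_class = self.estimated_class


def available_for_processing(elements):
    """Only elements with an assigned class can be used for subsetting and masking."""
    return [e for e in elements if e.assigned_class is not None]


elements = [
    DataElement("CUSTOMER", "NAME", estimated_class="CLS_NAME"),  # hypothetical class
    DataElement("CUSTOMER", "COD_CUS"),                           # nothing proposed
]

print(len(available_for_processing(elements)))  # nothing is confirmed yet
elements[0].confirm()
print([e.column for e in available_for_processing(elements)])
```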