One of the most common questions we receive about the Waterline Data Catalog is “What is data fingerprinting and how does it work?” Data fingerprinting is the idea that a column of data has a signature, or fingerprint, and that by examining the values in a column we can identify what the data is and answer two questions: 1) Which other columns share the same fingerprint? and 2) Which business term or label can be connected to this data?

Take the second question first: connecting a business term to an unlabeled or mislabeled column of data. Waterline Data Fingerprinting does this very well for many business terms, but to improve match accuracy for new terms, the fingerprinting system has to be trained. For example, it knows what a first name, a last name, or a credit card number is, but it doesn’t know what a “Claim Number” is for ACME Insurance, because the format of a claim number is unique to ACME. However, once a knowledgeable business user or data steward tags just one column as a “Claim Number,” the system learns what a claim number is, and that tag or business term is propagated automatically to all of the other unlabeled columns that have the same characteristics, or fingerprint.
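To make the idea of tag propagation concrete, here is a minimal Python sketch. It is not Waterline's actual algorithm, which this post does not describe. It assumes a deliberately simple fingerprint (the distribution of value "shapes," where digits become `9` and letters become `A`), and the function names, the 0.8 threshold, the column names, and the ACME claim number format are all hypothetical.

```python
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its format: digits -> '9', letters -> 'A', rest kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def fingerprint(values):
    """Hypothetical fingerprint: relative frequency of each value shape."""
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}


def similarity(fp_a, fp_b):
    """Overlap of two shape distributions: 0.0 (disjoint) to 1.0 (identical)."""
    shapes = set(fp_a) | set(fp_b)
    return sum(min(fp_a.get(s, 0.0), fp_b.get(s, 0.0)) for s in shapes)


def propagate_tag(tag, tagged_values, untagged_columns, threshold=0.8):
    """Suggest `tag` for every untagged column whose fingerprint is close
    enough to the one column a data steward tagged by hand."""
    reference = fingerprint(tagged_values)
    return {
        name: tag
        for name, values in untagged_columns.items()
        if values and similarity(reference, fingerprint(values)) >= threshold
    }


# A steward tags one column as "Claim Number" (made-up ACME format).
claims = ["CLM-2019-00417", "CLM-2020-11302", "CLM-2018-90021"]

# Unlabeled columns found elsewhere in the data lake (made-up names and data).
untagged = {
    "col_7": ["CLM-2021-00001", "CLM-2017-55555"],
    "col_9": ["4111 1111 1111 1111", "5500 0000 0000 0004"],
}

suggested = propagate_tag("Claim Number", claims, untagged)
print(suggested)  # {'col_7': 'Claim Number'}
```

Here `col_7` shares the tagged column's shape and picks up the tag, while `col_9`, which looks like credit card numbers, does not. A production system would presumably combine many more signals than format alone, such as value overlap, length, and cardinality.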

