Solutions Review’s Expert Insights Series is a collection of contributed articles written by industry experts in enterprise software categories. Billy VanCannon of Baffle breaks down all current data security methods, helping remove much of the confusion.
Once isolated to the IT department, data privacy and security are now top priorities across the organization and the boardroom. Executives understand that protecting data has profound business implications, from maintaining compliance to securely analyzing data for market differentiation. However, there needs to be more clarity around which protection methods a company might implement based on their business needs.
Data Security: Removing the Confusion
For organizations to unravel the data privacy and security mystery, they must first understand the differences between various data protection methods and how to implement them. Those methods include data masking, data tokenization, and traditional data encryption. Let’s examine each method, its strengths and weaknesses, and appropriate use cases.
Data Masking
Definition: Data masking is a non-reversible obfuscation method that replaces real data with similar-looking characters that carry no actual value. Masking is very flexible. For example, every Social Security number (SSN) could be entirely replaced with a fixed value, like “**confidential**,” or each SSN could be replaced with a random 9-digit number. Partial masking is also possible; for example, the first five digits could each be replaced with an “X” and the last four left in the clear.
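The three masking styles described above can be sketched in a few lines of Python. This is an illustration only: the function names and the sample SSN are made up for the example, and a production masking tool would also handle validation and format variations.

```python
import secrets

def mask_fixed(ssn: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "**confidential**"

def mask_random(ssn: str) -> str:
    """Replace every digit with a random digit, keeping separators such as '-'."""
    return "".join(
        secrets.choice("0123456789") if ch.isdigit() else ch for ch in ssn
    )

def mask_partial(ssn: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    to_hide = max(total_digits - visible, 0)
    out = []
    for ch in ssn:
        if ch.isdigit() and to_hide > 0:
            out.append("X")
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)

ssn = "123-45-6789"  # dummy value for illustration
print(mask_fixed(ssn))    # **confidential**
print(mask_random(ssn))   # random digits in the same layout
print(mask_partial(ssn))  # XXX-XX-6789
```

None of these functions keeps the original value, which is what makes masking non-reversible.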
Companies use two types of masking: static data masking and dynamic data masking.
- Static data masking hides data at rest (i.e., in the data store) and data in use. It is typically used in non-production / lower environments, such as application or software development, testing, and training. The non-reversible nature of masking means that the original data is effectively destroyed in this environment. This is ideal when the data never needs to be restored, as with a replica of a production database.
- Dynamic data masking combines masking with role-based access control (RBAC). The masking does not protect the data at rest; it protects only the data in use, based on RBAC. This lets the masking vary depending on the application or user accessing the data. In the case of an SSN, the marketing department gets “**confidential**”; customer support gets “XXX-XX-1234” to identify the customer; and finance gets the whole SSN in the clear so they can invoice the customer. Because the original data must not be destroyed, the data at rest has to be either plain text or protected using a reversible obfuscation.
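The SSN example for dynamic data masking might look like the sketch below. The role names and the `view_ssn` function are hypothetical; a real deployment would take roles from its RBAC policy and would typically apply the masking in a database or proxy layer.

```python
def view_ssn(stored_ssn: str, role: str) -> str:
    """Return the SSN as a given role is allowed to see it.

    The stored value is never modified; only the view changes.
    Role names are assumptions made for this illustration.
    """
    if role == "finance":
        return stored_ssn                    # full value in the clear
    if role == "customer_support":
        return "XXX-XX-" + stored_ssn[-4:]   # last four digits only
    return "**confidential**"                # default for every other role

stored = "987-65-1234"  # dummy value; this is what sits in the data store
for role in ("marketing", "customer_support", "finance"):
    print(role, "->", view_ssn(stored, role))
```

Defaulting to the fully masked value means an unknown role sees nothing sensitive.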
Strengths of data masking: Since data masking is irreversible, the data’s actual value cannot be revealed intentionally or accidentally. When data is stored until the end of its life cycle and does not need to be analyzed or shared, masking is an ideal data protection method. The flexibility of data masking helps organizations maintain compliance with various data privacy regulations and standards, such as the California Privacy Rights Act (CPRA), PCI DSS, GDPR, and HIPAA. Finally, data masking does not require extensive computing or additional storage because the data does not need to be unmasked.
Weaknesses of data masking: Static data masking runs the risk that the original data can’t be restored. With dynamic data masking, if someone gets access to the database, they can see the data in clear text, as the data is not kept in an anonymized format in the database.
Data masking use case: When dynamic data masking (DDM) is combined with encryption or tokenization of the data store to protect against direct attacks on data at rest, the overall system is very secure [See Exhibit 1 below]. Companies can use DDM to meet compliance requirements such as PCI DSS, which mandates that credit card numbers be encrypted at rest and at least partially masked when displayed (i.e., show the first six or seven and/or last four digits of a credit card number).
Data Tokenization
Definition: Tokenization is a reversible data obfuscation method that replaces sensitive data with random data, called a “token.” One way to implement tokenization is to store the mapping between each token and the actual data in a secure, centralized look-up table, or “vault.” Authorized users can reference the look-up table to revert to the sensitive data. Format-preserving encryption (FPE), which is “vaultless,” is a relatively new data protection method combining tokenization and encryption attributes. Using FPE, data is encrypted and decrypted using an encryption algorithm, like AES, but the cipher text is the same length and format as the original plain text.
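A minimal sketch of vaulted tokenization, assuming digit-only values and an in-memory dictionary standing in for the vault, could look like this. The `TokenVault` class is hypothetical; a real vault would be a hardened, access-controlled data store. FPE is not shown because a faithful implementation needs a vetted cryptographic library.

```python
import secrets

class TokenVault:
    """Toy in-memory vault mapping random tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return a random digit string of the same length as `value`."""
        if not value.isdigit():
            raise ValueError("this sketch only handles digit-only values")
        if value in self._value_to_token:
            return self._value_to_token[value]  # reuse the existing token
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            # Retry until the token is unused and differs from the original.
            if token not in self._token_to_value and token != value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._token_to_value[token]

vault = TokenVault()
card = "4111111111111111"  # well-known test card number, not real data
token = vault.tokenize(card)
print(token)                            # 16 random digits
print(vault.detokenize(token) == card)  # True
```

Tokens are created only when a value is first seen, and the dictionary lookup plays the role of the uniqueness check.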
Strengths of tokenization: Tokenization’s greatest strength is that the resulting tokens can be made to have the same data type and length as the original data. This way, applications can still display the data without error. Databases designed for the original data can use the tokens without disrupting the system.
Weaknesses of tokenization: With vaulted tokenization, there is a direct trade-off between security and processing power and storage. When tokenizing a 16-digit number, such as a credit card number, 10^16 numbers are possible. Creating a vault for every possible combination would require more than 30 petabytes of storage. It is also important to note that the processing time to search a table that large can be restrictive. It is possible to reduce the vault size by only creating tokens as needed. However, conducting an extensive search is necessary to ensure each token is unique. Therefore, at some point, it will take non-trivial amounts of processing power to do this every time a token is created. It is also important to note that the vault itself now must be protected, effectively doubling the work of managing and ensuring compliance with the system. With FPE, any numerical data with six or more digits is highly secure, and performance will not vary with the quantity of data. However, vault management is replaced by the need to store and manage keys.
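As a rough check on the storage estimate, the arithmetic can be worked through directly. The bytes-per-entry figures below are assumptions chosen for illustration, not numbers from this article; the point is only that a full vault for 16-digit values lands in the tens of petabytes or more.

```python
POSSIBLE_VALUES = 10 ** 16  # every 16-digit number
PETABYTE = 10 ** 15         # decimal petabyte, in bytes

def vault_size_pb(bytes_per_entry: int) -> float:
    """Size of a vault holding one entry per possible 16-digit value."""
    return POSSIBLE_VALUES * bytes_per_entry / PETABYTE

# Assumed entry sizes: 3 bytes is a bare minimum, 8 bytes holds a
# 16-digit token packed two digits per byte, 16 bytes stores it as text.
for size in (3, 8, 16):
    print(f"{size} bytes per entry -> {vault_size_pb(size):,.0f} PB")
```

Even at 3 bytes per entry the table reaches 30 PB, so any realistic entry size pushes the total well beyond that.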
Tokenization use cases: Tokenization is ideal when the obfuscated data are required to have the same data type and length as the original data. This, combined with reversibility, means lower environments are often protected with tokenization instead of static masking.
Traditional Data Encryption
Definition: Traditional encryption uses a mathematical algorithm and a secret key to encrypt sensitive data from plain text into seemingly random data, referred to as cipher text. The same key and algorithm are used in reverse to revert to plain text.
Strengths of traditional encryption: Traditional encryption is a highly effective obfuscation method that encodes data using a mathematical algorithm for maximum security. This makes it difficult for anyone who does not have the necessary decryption key to access the data. Traditional encryption also effectively handles low-cardinality data, meaning data sets with only a small number of distinct values, where tokenization weakens. The cipher text can even be “randomized,” so that the same plain text produces a different cipher text every time. Finally, the cipher text can have metadata added to it, such as a hash-based message authentication code (HMAC), which adds integrity-checking capability. The metadata can even include a key ID for mapping, which can be used directly or for backup. The processing overhead required is often minimal in modern systems that pass the additional processing off to hardware specifically designed for that purpose, known as hardware acceleration.
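The metadata idea can be illustrated with Python's standard `hmac` module. The sketch below covers only the integrity check and key ID; the cipher text is a random stand-in for the output of a real cipher, and the `wrap` and `verify` helpers and the envelope layout are assumptions made for the example.

```python
import hashlib
import hmac
import os

def _mac(key_id: str, ciphertext: bytes, mac_key: bytes) -> str:
    """HMAC-SHA256 over the length-prefixed key ID plus the cipher text."""
    header = key_id.encode()
    message = len(header).to_bytes(2, "big") + header + ciphertext
    return hmac.new(mac_key, message, hashlib.sha256).hexdigest()

def wrap(ciphertext: bytes, key_id: str, mac_key: bytes) -> dict:
    """Attach a key ID and an HMAC tag to an already encrypted value."""
    return {
        "key_id": key_id,
        "ciphertext": ciphertext,
        "hmac": _mac(key_id, ciphertext, mac_key),
    }

def verify(envelope: dict, mac_key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = _mac(envelope["key_id"], envelope["ciphertext"], mac_key)
    return hmac.compare_digest(expected, envelope["hmac"])

mac_key = os.urandom(32)
ciphertext = os.urandom(32)  # stand-in for real cipher output
envelope = wrap(ciphertext, "key-2023-05", mac_key)
print(verify(envelope, mac_key))   # True

tampered = dict(envelope, ciphertext=ciphertext[:-1] + b"\x00\x01")
print(verify(tampered, mac_key))   # False
```

Covering the key ID with the tag means an attacker cannot swap the key reference without detection.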
Weaknesses of traditional encryption: A downside to traditional encryption is that the resulting cipher text length and type may be incompatible with software and databases designed for the plain text format. Another challenge is encryption key management, which is the same as with FPE and arguably less burdensome than managing the vault in vaulted tokenization.
Traditional encryption use cases: When maximum security is required, and the cipher text data type and length do not have to match the plain text, traditional encryption is the way to go.
Final Thoughts on Data Security
As organizations gain a greater understanding of how to protect data to support today’s business priorities and the growing number of data privacy regulations, decision-makers will be better prepared to drastically reduce data risk and enjoy the ROI data can yield.
- Removing the Confusion Around Methods of Data Security - May 23, 2023