
The Key to Enhanced Security? Fixing Data Upstream



Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise software categories. Nicole Bucala of Comcast Technology Solutions examines how fixing your data lake upstream is the key to enhanced security.

Security organizations today are grappling with growing volumes of data but don’t have a cohesive global strategy to store, analyze, and manage it all cost-effectively. In addition, risk and detection analytics layered on current security datasets often yield false positives, so their end result cannot be trusted.

Over the past few years, organizations have deployed security orchestration, automation, and response (SOAR) solutions or AI/ML on top of their data to try to answer key security or compliance questions faster. Still, the output of AI/ML algorithms or SOAR playbooks is only as good as the underlying data sets. So, what happens when the upstream data isn’t holistic, correct, or formatted in a way that SOAR or AI/ML can properly interpret? Organizations can’t get a proper picture of what’s actually going on, and confusion ensues.

That’s leaving them behind in terms of cybersecurity readiness.


Fixing the Data Lake Upstream

Six Steps to Improving Your Data

The good news? You can do several things to improve the quality of your organization’s underlying data set and develop a better strategy for managing security data at scale.

  1. Move to a security data lake architecture. Data lakes are increasingly used for all manner of enterprise data. Essentially, they are centralized repositories for storing, processing, and securing massive volumes of data in any form. This same approach can be applied to security. When selecting a security data lake, choose one with compression built in to keep costs down, and you’ll be able to store petabytes of data for as long as your organization requires. The end result: your security data can be stored and used at a much lower cost than in a security information and event management (SIEM) solution.
  2. Develop an analytics strategy that includes partners. A security data lake doesn’t replace a SIEM; rather, it lives in harmony with a SIEM. You’ll want to develop an analytics strategy to determine where and how data flows to your security data lake or SIEM, and you’ll want to supplement this data with feeds from a threat intelligence partner to ensure your security data is always enriched with the latest intel. You can send long-term forensics copies of many data sets to your data lake while sending only specific data sources and fields needed for more accurate SIEM detections to the SIEM. You can also send to your data lake large quantities of data sources that aren’t cost-effective to process in a SIEM at all.
  3. Use an ETL process to optimize your security data. To effectively use a data lake, you must first get the security data into a usable state. Most security data is not normalized at the source and requires transformation to be used effectively. Much of it also depends on context, so it must be correlated to uncover actual incidents or anomalies. You’ll want to purchase or develop an extract, transform, and load (ETL) process and ensure that it includes automatic parsing, normalization, enrichment, and correlation logic.
  4. Identify business value context that you’ve always wanted to combine with security insights but have been unable to. Maybe it’s in the form of organization chart data, business revenue, or some other business data (i.e., a non-security data source) that can make your security insights more meaningful. Determining the types of business context most important to your organization’s desired outcomes can help guide you on the right path and ensure you get the most impact out of your efforts. It also helps to provide a one-stop shop of data for multiple business units, provided the right governance processes are in place to limit sensitive data to those who need it.
  5. Align on a common schema for the data that will reside in the data lake. It is useful to adopt a common schema and to explore more modern ones. Some experts predict that older schemas like the Elastic Common Schema (ECS) and the MITRE Cyber Analytics Repository (MITRE CAR) will be eclipsed by the Open Cybersecurity Schema Framework (OCSF), which natively includes logic to deliver user and entity relationships, making the data inherently more usable to a security analyst.
  6. Develop a governance program around your security data lake. You’ll want the buy-in of data source owners because it is essential that each data source owner feels their data is being rendered accurately in the dataset living in the data lake; in fact, it’s vital that owners still have control over their data and the teams who utilize it. You’ll want to develop a process so each data source owner can verify their data is accessible, integrated, and transparent.
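
To make steps 2 through 5 more concrete, here is a minimal Python sketch of an upstream pipeline that parses raw events, normalizes them to one schema, enriches them with threat intelligence and business context, correlates them, and routes them to a data lake and a SIEM. Everything in it is an assumption made for illustration: the field names, the tiny in-memory lookup tables, the routing rule, and the failure threshold are all hypothetical, and the target schema is a made-up stand-in rather than OCSF or any other real schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical threat-intel feed; in practice this would come from a partner.
THREAT_INTEL = {"203.0.113.7": "known-scanner"}

# Hypothetical business context (step 4): user -> department.
ORG_CHART = {"alice": "finance"}

# Hypothetical routing rule (step 2): only these event types go to the SIEM.
SIEM_EVENT_TYPES = {"auth_failure"}


def parse(raw_line):
    """Extract: parse one raw JSON log line into a dict."""
    return json.loads(raw_line)


def normalize(event):
    """Transform: map source-specific field names onto one made-up schema."""
    ts = event.get("ts") or event.get("timestamp")
    return {
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
        "user": (event.get("user") or event.get("username") or "").lower(),
        "src_ip": event.get("src") or event.get("source_ip"),
        "event_type": event.get("type") or event.get("event"),
    }


def enrich(event):
    """Add threat-intel and business context to a normalized event."""
    enriched = dict(event)
    enriched["threat_tag"] = THREAT_INTEL.get(event["src_ip"])
    enriched["department"] = ORG_CHART.get(event["user"])
    return enriched


def correlate(events, threshold=3):
    """Return the users with at least `threshold` authentication failures."""
    counts = {}
    for e in events:
        if e["event_type"] == "auth_failure":
            counts[e["user"]] = counts.get(e["user"], 0) + 1
    return {user for user, n in counts.items() if n >= threshold}


def route(events):
    """Load: every event goes to the lake; only selected types go to the SIEM."""
    lake = list(events)
    siem = [e for e in events if e["event_type"] in SIEM_EVENT_TYPES]
    return lake, siem


def run_pipeline(raw_lines):
    events = [enrich(normalize(parse(line))) for line in raw_lines]
    lake, siem = route(events)
    return lake, siem, correlate(events)


# Sample events from two hypothetical sources that name their fields differently.
RAW = [
    '{"ts": 1700000000, "user": "Alice", "src": "203.0.113.7", "type": "auth_failure"}',
    '{"timestamp": 1700000060, "username": "alice", "source_ip": "203.0.113.7", "event": "auth_failure"}',
    '{"ts": 1700000120, "user": "alice", "src": "203.0.113.7", "type": "auth_failure"}',
    '{"ts": 1700000180, "user": "bob", "src": "198.51.100.2", "type": "file_read"}',
]

if __name__ == "__main__":
    lake, siem, flagged = run_pipeline(RAW)
    print(len(lake), "events to lake;", len(siem), "to SIEM; flagged:", flagged)
```

The point of the sketch is the ordering: because normalization happens before correlation, the three failures for the same user are counted together even though two sources spell the user and field names differently. A production pipeline would replace the in-memory tables with real feeds and write to actual storage.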

Making Security Data Management More Feasible for the Future

By following the six steps above, you’ll likely find more cohesion across all the data. Your data should be cleaner, so you’ll have fewer false positives. Your AI/ML algorithms should work better, yielding results you have more confidence in. Data silos will be reduced, and team collaboration will be more fluid because analysis will be done on the same dataset. Importantly, you’ll find that as you switch to a security data lake architecture, you can de-duplicate storage across other toolsets and have a single source of truth. That said, truly bringing this to fruition requires a team of data scientists. That’s a major reason why, traditionally, only the most sophisticated companies have been able to develop approaches along these lines.

The good news is that innovation is springing up in pockets, and platforms and tools are being developed to tackle these tasks automatically, with less manual intervention. The ultimate goal of these forthcoming technologies is to make data more accessible and usable by analysts who do not have advanced data science skillsets. Of course, change isn’t going to happen overnight, but the key is to understand that a new approach with the potential to enable a better way to manage all the data is imminent.



Nicole Bucala