The Dangers of Data Scraping: Do You Know What’s Out There?

What is data scraping? How can it pose a threat to your enterprise and your employees? What information may exist outside your enterprise’s databases, and can hackers use that information to conduct cyberattacks? 

Data scraping is the use of a computer program or bot to extract human-readable data from another program, site, or platform. In other words, it produces feeds of information that are easy for people to parse and analyze. The extracted data often includes personal details such as email addresses, phone numbers, and shopping behaviors. The process is often conflated with web scraping, a subset of data scraping that acquires data specifically from websites. Web harvesting is another common term for it.

In the end, the tool gathers data that scrapers repurpose for their own use. Why can this prove such a problem?

 

Data Scraping and Cybersecurity 

Not all businesses that use these tools possess malevolent motives. Marketing companies, content creators, and UI designers often utilize these tools in their line of work. After all, the data collected via data scraping can facilitate processes such as web content creation, business intelligence, finding sales leads, conducting marketing or advertising research, and developing personalization. 

So, like all tools, data scraping offers both profound benefits and serious challenges. However, these programs have acquired a negative reputation in recent years, and that reputation is far from unfounded. As recently as this week, several social media websites suffered a data breach due to data scraping.

The problem stems from challenges on both sides of the data transfer. Scraped users often don’t know what information is being collected, or that someone is aggregating their data in the first place. Meanwhile, scrapers may misconfigure the databases holding the collected information, or fail to secure them at all. The latter allows hackers of all calibers to access critical consumer and employee data.

How to Secure Against Data Scraping

First, enterprises can take a legal stance against data scrapers by warning them against the practice (you can include the language in your terms of service). Other security procedures include blacklisting and whitelisting IP addresses, configuring access controls against scraping, and preventing hotlinking.
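As a rough illustration of the IP list and hotlinking checks mentioned above, here is a minimal Python sketch. The network ranges, host names, and function names are hypothetical placeholders rather than a recommended configuration, and in practice these checks usually live in a web server, firewall, or CDN rather than in application code.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse

# Hypothetical example values; real lists would come from your own logs and policy.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
ALLOWED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
OWN_HOSTS = {"www.example.com", "example.com"}


def ip_is_permitted(client_ip: str) -> bool:
    """Allowlisted networks always pass; blocklisted networks are refused."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in ALLOWED_NETWORKS):
        return True
    return not any(address in network for network in BLOCKED_NETWORKS)


def is_hotlink(referer: Optional[str]) -> bool:
    """Treat an asset request as a hotlink when its Referer names another site."""
    if not referer:
        # Many legitimate clients send no Referer, so this sketch lets them through.
        return False
    return urlparse(referer).hostname not in OWN_HOSTS
```

A request handler could call ip_is_permitted on the client address before serving any page, and is_hotlink on the Referer header before serving images or other embedded assets. Determined scrapers can rotate addresses and forge headers, so checks like these raise the cost of scraping rather than eliminate it.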

Endpoint security offers several other tools against scraping, such as application control and data loss prevention. However, enterprises should also use data monitoring to evaluate what information could easily be scraped. Securing against scraping also requires evaluating third parties, including their access and their data interactions.
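One simple way to start evaluating what could easily be scraped is to scan the text of your own public pages for contact details. The sketch below is only an assumption about how such a check might look: the function name is hypothetical, the patterns are deliberately loose, and the phone pattern only covers ten-digit, US-style numbers.

```python
import re

# Loose patterns for illustration; they will miss some formats and match some noise.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}")


def find_exposed_contacts(page_text: str) -> dict:
    """Return the unique email addresses and phone numbers found in page text."""
    return {
        "emails": sorted(set(EMAIL_PATTERN.findall(page_text))),
        "phones": sorted(set(PHONE_PATTERN.findall(page_text))),
    }


sample = "Contact Jane Doe at jane.doe@example.com or 555-010-0199. Press: press@example.com."
report = find_exposed_contacts(sample)
print(report["emails"])  # ['jane.doe@example.com', 'press@example.com']
print(report["phones"])  # ['555-010-0199']
```

Running a check like this across staff pages and press releases gives a first inventory of the details any scraper could collect just as easily.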

For those using these tools, it all comes down to securing your databases. Avoid public cloud databases, and properly configure the ones you do use. Password-protect all databases, especially those containing aggregated information. Above all, securing scraped data requires monitoring and security awareness.

What is at Stake? 

Phishing Attacks

If hackers get their hands on the accumulated information created by web scraping, the potential for damage is considerable. For example, hackers could use this information to perfect their phishing attacks. First, phishers can learn which employees might be more susceptible to phishing attacks or who has the job titles they need to target.

Further, data scraping can open the door to spear phishing attacks; hackers can learn the names of superiors, ongoing projects, trusted third parties, etc. In essence, scraped data gives hackers everything they need to craft a plausible message and provoke the response they want (rash and ill-informed) from their victims.

Password Cracking

Even if a password isn’t leaked directly, scraped data can still be enough for hackers to crack credentials and break through single-factor (or, in some cases, multifactor) authentication protocols. Remember, employees often create passwords based on their interests, personal lives, and similar traits. Much of this is available on social media and sometimes in employee biographies on your site. A savvy hacker can use this information to guess passwords, making a cyberattack that much easier.
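On the defensive side, a password policy can reject passwords built from details that are publicly visible about the user. The following Python sketch shows the idea under simple assumptions: the function names and sample details are hypothetical, and a substring match on words of four or more characters is a crude stand-in for a real password-strength check.

```python
import re


def profile_tokens(public_details):
    """Split public details into lowercase words of four or more characters."""
    tokens = set()
    for detail in public_details:
        for word in re.findall(r"[a-z0-9]+", detail.lower()):
            if len(word) >= 4:
                tokens.add(word)
    return tokens


def password_reuses_public_info(password, public_details):
    """Return True when the password contains any token from the public details."""
    lowered = password.lower()
    return any(token in lowered for token in profile_tokens(public_details))


# Hypothetical details of the kind found in a staff biography or social profile.
details = ["Jane Doe", "Example University", "golden retriever Biscuit", "1987"]
print(password_reuses_public_info("Biscuit1987!", details))        # True
print(password_reuses_public_info("Tr4verse-Quartz-09", details))  # False
```

A check like this could run when a password is created or changed, alongside the usual length and breach-list checks.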

Of course, unscrupulous data collectors may also collect the credentials of their targets, which means hackers simply need to access the database. This could cause obvious, serious problems for both the victims and the company itself; it could seriously harm your reputation given the negativity attached to data scraping. Needless to say, do not collect credentials or payment information.

You can learn more about defending against data scraping in our Endpoint Security Buyer’s Guide. Also, be sure to check out our Identity Management Buyer’s Guide for more on securing databases. 

 

Ben Canner

Editor, Cybersecurity at Solutions Review
Ben Canner is an enterprise technology writer and analyst covering Identity Management, SIEM, Endpoint Protection, and Cybersecurity writ large. He holds a Bachelor of Arts Degree in English from Clark University in Worcester, MA. He previously worked as a corporate blogger and ghost writer. You can reach him via Twitter and LinkedIn.