Data Scraping Attacks: Is Something Big on the Horizon?

Posted by Joe Hannan on May 17, 2021 12:15:00 PM

Don’t call it a leak. At least that’s what LinkedIn, Facebook, and most recently, Clubhouse would prefer. 

Caught in the middle of this semantics game are the data from millions of social media users, who are now at increased risk of falling victim to cybercrime. No matter what you call this emergent threat, it matters little to cybercriminals, for whom pilfering user data is a lucrative business. In each of these attacks, the criminals got what they came for, and they will most certainly exploit it. 


These recent attacks also raise some bigger questions: Are social media platforms obligated to better protect their users? And what should the millions of people affected by data scraping do to prepare for the coming wave of attacks? 


A leak by any other name

 The aforementioned social media companies don’t want us to use the L-word. This is a linguistic dodge. User data was lost, and threat actors obtained that data by means that are illegal and/or violate acceptable use agreements. 


Nevertheless, it’s useful to distinguish among the methods of cybercriminals. In these three leaks, they used a technique called data scraping. In a data scraping attack, bots typically trawl a website for content. This content can include visible data, or information stripped from HTML code. In a data breach, by contrast, an unauthorized person or persons access private information.
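To make the distinction concrete, here is a minimal sketch of what "visible data" versus "information stripped from HTML code" can look like in practice. The sample markup, the `data-email` attribute, and the `ProfileScraper` and `scrape_profile` names are all hypothetical illustrations, not taken from any real platform. The sketch uses only Python's standard `html.parser` module and runs against a hard-coded string rather than a live site.

```python
from html.parser import HTMLParser

# Hypothetical public profile markup; not taken from any real platform.
SAMPLE_PAGE = """
<div class="profile" data-email="jane@example.com">
  <h1>Jane Doe</h1>
  <a href="tel:+15550100">Call</a>
</div>
"""


class ProfileScraper(HTMLParser):
    """Collects visible text plus values tucked into tag attributes."""

    def __init__(self):
        super().__init__()
        self.visible_text = []    # what a human sees on the page
        self.attribute_data = {}  # what sits in the HTML code itself

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-email":
                self.attribute_data["email"] = value
            elif name == "href" and value and value.startswith("tel:"):
                self.attribute_data["phone"] = value[len("tel:"):]

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.visible_text.append(text)


def scrape_profile(html):
    """Return (visible text, attribute-level data) for one page."""
    parser = ProfileScraper()
    parser.feed(html)
    return parser.visible_text, parser.attribute_data


visible, hidden = scrape_profile(SAMPLE_PAGE)
print(visible)
print(hidden)
```

The point of the sketch is that nothing here requires breaking in: a bot reading a public page can pick up contact details that a casual visitor never notices on screen.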


LinkedIn, Facebook, and Clubhouse have all made the case that they weren’t breached because their users voluntarily shared the pilfered information. In other words, if someone uses the data you supplied with nefarious intent, that’s on you, pal. Let’s explore the specifics of what happened on each platform. 


Giving up the goods 



Cybernews reported that in early April, the names, phone numbers, and email addresses of Facebook users appeared on a hacking forum. The user data came from 106 different countries and included 32 million Americans and 11 million Britons. 


Facebook took issue with Cybernews calling the incident a “leak.” In a blog post, Facebook defended itself, saying that the data in question had been scraped, not leaked, in September 2019. In other words, the threat actors exploited a known loophole then and only publicized the data now. The data loss stemmed from a Facebook feature that formerly allowed users to find friends using their phone numbers. CNET first reported on this leak in 2019.


While that may satisfy Facebook leadership, Ireland isn’t impressed with the defense. Its Data Protection Commission is investigating. 




LinkedIn suffered a similar fate in April and mounted a similar defense. Cybernews reported that data from a half-billion users is for sale on a hacking forum. For context, LinkedIn claims 756 million users. To prove their mettle, the cybercriminal behind the attack released 2 million pieces of user data as a teaser, hoping to entice another bad actor to pay the four-digit sum for complete access.  


Like Facebook, the LinkedIn data includes full names, emails, and phone numbers. Similarly, LinkedIn is electing to play the semantics game. The company made the case in a statement that this wasn’t a breach or a leak, since no private data got out. In other words, you supplied it, we’re denying it. 




Clubhouse, a synchronous audio chat platform, is the new kid on the block. Cybersecurity experts were already concerned about a feature that gives the app access to a user’s contacts. Then, 1.3 million users had their data scraped, Cybernews reported. As the company attempted to downplay the leak, a Cybernews investigation revealed that the company’s API lets just about anyone scrape user data. 


Clubhouse, in a tweet, even seemed to signal that they support scraping:  


“Clubhouse has not been breached or hacked. The data referred to is all public profile information from our app, which anyone can access via the app or our API.” 


So what’s the big deal? 

On the surface, knowing a user’s full name, email address, or phone number seems innocuous. However, consider that this information constitutes some of the details a cybercriminal would use to impersonate you, or trick you into revealing more information, such as bank account details or a social security number. Or think of when you’re locked out of an account. Typically, this is the type of information an app or customer support representative will request to restore access.


With this information in hand, it’s easier for cybercriminals to target you with social engineering and phishing attacks. In the case of a social engineering attack, they could use baiting or scareware to trick you into installing malware. In the case of a phishing attack, you might receive a phony, frightening email or text, allegedly from the IRS or a local court. Finally, a cybercriminal might attempt a spear phishing attack, such as impersonating your IT department. 


Password spraying, in which an attacker tries a handful of common passwords against a large number of accounts, is another tactic that a cybercriminal might employ, made easier with your basic information in hand. A scraped list of names and email addresses simplifies the brute-force work of guessing or resetting a password. Furthermore, a cybercriminal could cross-check your contact information against published lists of breached passwords, gaining access to your social media and financial accounts, for example.
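For businesses watching their own login logs, one rough way to spot password spraying is to look for a single source that fails to log in to many different accounts. The sketch below assumes a simplified log of `(source_ip, username)` pairs for failed attempts; the function name, the log format, and the threshold of five accounts are assumptions chosen for illustration, not a recommended production setting.

```python
from collections import defaultdict


def flag_spraying_sources(failed_logins, min_accounts=5):
    """Return sources whose failed logins span many distinct accounts.

    failed_logins: iterable of (source_ip, username) tuples, one per
    failed attempt (a hypothetical, simplified log format).
    min_accounts: assumed threshold; tune it to your own traffic.
    """
    accounts_by_source = defaultdict(set)
    for source_ip, username in failed_logins:
        accounts_by_source[source_ip].add(username)
    return sorted(
        ip for ip, users in accounts_by_source.items()
        if len(users) >= min_accounts
    )


# One source touching six accounts looks like spraying; one user
# mistyping their own password three times does not.
attempts = [("203.0.113.7", f"user{i}") for i in range(6)]
attempts += [("198.51.100.2", "alice")] * 3
print(flag_spraying_sources(attempts))
```

A real detector would also need time windows and would have to account for attackers who rotate IP addresses, so treat this as a starting point only.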


Scraped data also comes in handy for identity theft. For example, let’s say John Smith loses his job and changes his employment status on LinkedIn. With scraped data in hand, including basic information such as an email address and phone number, a cybercriminal could then fraudulently claim John Smith’s unemployment benefits. 


Furthermore, the LinkedIn leak provides a fresh avenue of attack for bad actors. It’s now easier for cybercriminals to target companies, using individual employees as entry points. They might single out an employee and blackmail them into compromising a company server, for example. With most employees working from home – away from direct supervision – this attack angle might prove more fruitful.


How to protect yourself 

The simplest solution is to make all your profiles private. And while you’re at it, be on high alert for any suspicious emails, texts, or direct messages on your social media accounts. If you run a business, you must train your employees on best cybersecurity practices, spotting phishing attempts, and maintaining password security. Considering the LinkedIn breach, mid- and upper-level management, as well as executive management, may become prime targets. 


When creating new social media accounts, or managing your existing accounts, think carefully about the information you allow the app to access. Err on the side of restricting access. 


Aside from these basic tactics, the leaks raise some larger questions. Chiefly, do social media platforms have an obligation to prevent data scraping? Ideally, yes. But legally, it’s a gray area. Scraping isn’t exactly legal, and many platforms expressly prohibit it. However, few platforms – as these leaks revealed – take meaningful steps to prevent it.
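What might a meaningful step look like? One common building block is rate limiting: capping how many profile or API requests a single client can make in a given period, which slows bulk scraping without affecting ordinary users. The following is a minimal sketch of a sliding-window limiter. The class name, the client identifier, and the limits are hypothetical, and nothing here describes how any of the platforms above actually work.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> recent timestamps

    def allow(self, client_id, now):
        """Return True if the request at time `now` (seconds) may proceed."""
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Assumed limit: 3 requests per 60 seconds per client.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("client-a", t) for t in (0, 1, 2, 3)])
```

Rate limiting alone will not stop a determined scraper who spreads requests across many accounts or addresses, but it raises the cost of harvesting millions of profiles.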


This is an area in which we may see Congress intervene in the future. Until then, individuals must be conscious of the risks associated with venturing into social media. It’s a massive avenue of exposure – and one that’s only loosely regulated. 


If you’re serious about guarding your business from emergent threats such as these, a strong defense is your best offense. Employee training on best cybersecurity practices, as well as penetration testing, close potential avenues of attack before they can be exploited. Contact us today to get started. 


