Last month, Vinny Troia, the founder of Data Viper, and Bob Diachenko, an independent cybersecurity consultant, discovered a “wide-open” Elasticsearch server. The server exposed the personal information of about 1.2 billion unique users, including their names, email addresses, phone numbers, and LinkedIn and Facebook profile information.
The Elasticsearch server did not have any kind of authentication whatsoever and was accessible via a web browser. “No password or authentication of any kind was needed to access or download all of the data,” the report adds.
What the investigation of the Elasticsearch server revealed
Troia and Diachenko came across the Elasticsearch server while looking for exposures on the web scanning services BinaryEdge and Shodan. Upon further investigation, the researchers speculated that the data originated from two different data enrichment companies: People Data Labs and OxyData.io.
Data enrichment, as the name suggests, is a process of enhancing the existing raw data to make it useful for businesses. Data enrichment companies can provide access to large stores of data merged from multiple third-party sources, which enables businesses to gain deeper insights into their current and potential customers.
Elasticsearch stores its data in indexes, each of which is roughly analogous to a database in a relational system. The researchers found that the majority of the data spanned four separate indexes, labeled “PDL” and “OXY”. Each user record also carried a “source” field that matched either PDL or Oxy.
After the researchers de-duplicated the nearly 3 billion user records in the PDL indexes, they found roughly 1.2 billion unique people and 650 million unique email addresses. These numbers matched the statistics the company provides on its website.
The data within the three PDL indexes included slightly varied information. While some focused on scraped LinkedIn information, email addresses and phone numbers, others included information on individual social media profiles such as a person’s Facebook, Twitter, and GitHub URLs. After analyzing the data under the OXY index, the researchers found a scrape of LinkedIn data, including recruiter information.
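To illustrate what de-duplication of this kind involves, here is a minimal Python sketch. The report does not describe how the researchers actually did it, so this is not their method. The field names `person_id` and `emails` and the sample records are hypothetical and are not taken from the exposed data.

```python
from typing import Iterable, Tuple


def count_unique(records: Iterable[dict]) -> Tuple[int, int]:
    """Return (unique people, unique email addresses) for the given records."""
    people = set()
    emails = set()
    for record in records:
        # "person_id" is a hypothetical key identifying a person.
        person_key = record.get("person_id")
        if person_key is not None:
            # Lower-case and trim so trivially different spellings collapse.
            people.add(str(person_key).strip().lower())
        # "emails" is a hypothetical list field; treat a missing field as empty.
        for email in record.get("emails", []):
            emails.add(email.strip().lower())
    return len(people), len(emails)


# Made-up sample: the first two records describe the same person.
sample = [
    {"person_id": "A1", "emails": ["jo@example.com"], "source": "PDL"},
    {"person_id": "a1", "emails": ["JO@example.com", "jo.work@example.com"], "source": "PDL"},
    {"person_id": "B2", "emails": [], "source": "PDL"},
]

print(count_unique(sample))
```

Holding every key in memory would not work for billions of records. At that scale the same idea would more plausibly be run as an aggregation inside the data store or as a batch job.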
What made the case confusing was that the Elasticsearch server was hosted on Google Cloud Services, while People Data Labs appears to be using Amazon Web Services. When contacted about the Elasticsearch server, both companies denied that the server belonged to them.
In an interview with Wired, PDL co-founder Sean Thorne said, “The owner of this server likely used one of our enrichment products, along with a number of other data-enrichment or licensing services. Once a customer receives data from us, or any other data providers, the data is on their servers and the security is their responsibility. We perform free security audits, consultations, and workshops with the majority of our customers.”
This news sparked a discussion on Hacker News. While some users were stunned by the sheer negligence of leaving the Elasticsearch server wide-open, others were questioning the core business model of these companies.
A user commented, “It has to exist on a private network behind a firewall with ports open to application servers and other es nodes only. Running things on a public IP address is a choice that should not be taken lightly. Clustering over the public internet is not a thing with Elasticsearch (or similar products).”
“It’s a tragedy that all of this data was available to anyone in a public database instead of…. checks notes… available to anyone who was willing to sign up for a free account that allowed them 1,000 queries. It seems like PDL’s core business model is irresponsible regarding their stewardship of the data they’ve harvested,” another user added.
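The first commenter's point about public IP addresses can be turned into a simple pre-deployment check. The sketch below uses Python's standard `ipaddress` module to flag a bind address that is either publicly routable or a wildcard. The function name `is_risky_bind_address` is hypothetical. The sketch does not read any Elasticsearch configuration. In Elasticsearch the bind address is controlled by the `network.host` setting, and an operator would pass that value in by hand.

```python
import ipaddress


def is_risky_bind_address(bind_address: str) -> bool:
    """Return True if binding a service to this address may expose it publicly."""
    ip = ipaddress.ip_address(bind_address)
    if ip.is_unspecified:
        # 0.0.0.0 (or ::) listens on every interface, including any public one.
        return True
    # is_global is True for addresses routable on the public internet.
    return ip.is_global


for address in ("127.0.0.1", "10.0.0.5", "0.0.0.0", "8.8.8.8"):
    print(address, is_risky_bind_address(address))
```

A check like this is only a first line of defence. It says nothing about firewall rules or whether authentication is enabled, both of which the commenter's advice also covers.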
Read the full report on Data Viper’s official website.