Checklist 425: AI Privacy Labels and Lots of Lost Logins

May 30, 2025 • 5 min read

AI Chatbots Ranked by Data Collection: Meta Tops the List in New Surfshark Report

In a recent episode of The Checklist, the team dissected a new report from VPN provider Surfshark ranking 11 popular AI chatbots by how much user data they collect. While Surfshark’s VPN service itself is well-regarded, the hosts urge listeners to take this specific report with a grain of salt, especially given its reliance on the App Store Privacy Labels that developers self-report to Apple.

AI Chatbots Ranked by Data Harvesting

Surfshark analyzed data collection behaviors and found the following ranking—from least to most data types collected:

Jasper
Pi
X’s Grok
Perplexity
ChatGPT
DeepSeek (China)
Microsoft Copilot
Claude (Anthropic)
Poe (Quora)
Gemini (Google)
Meta AI (Facebook)

“Meta AI collects the most user data, with 32 out of 35 data types, which is more than double the average of 13.”
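
As a quick sanity check on the quoted figures, the arithmetic can be worked through in a few lines of Python. Only the numbers 32, 35, and 13 come from the report as quoted; the variable names are illustrative.

```python
# Figures quoted from the Surfshark report; variable names are illustrative.
meta_types = 32      # data types Meta AI is reported to collect
total_types = 35     # data types tracked in the report
average_types = 13   # average across the 11 chatbots

share = meta_types / total_types    # fraction of all tracked data types
ratio = meta_types / average_types  # multiple of the average

print(f"Meta AI collects {share:.0%} of the tracked data types")
print(f"That is {ratio:.1f}x the average")
```

The ratio comes out a little under 2.5, so "more than double the average" holds.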

Surprising Standouts: DeepSeek and Grok

The hosts expressed surprise at DeepSeek’s middle-of-the-pack ranking (#6 of 11), especially given prior episodes (Checklists No. 412 and 416) highlighting its major security flaws. Citing ZDNet and Ars Technica, they reminded listeners:

DeepSeek reportedly forwarded sensitive user data to Chinese-government-operated firms (like China Mobile) and ByteDance (owner of TikTok).
It also transmitted sensitive data via unencrypted channels, leaving it vulnerable to interception.

Given that, its mid-tier placement raises skepticism—especially when Surfshark’s methodology relies almost entirely on Apple’s Privacy Labels and developer-supplied privacy policies.

Methodology and Its Limitations

Surfshark’s ranking methodology included:

Number of data types collected
Whether data is linked to identity
Inclusion of third-party advertisers
App Store Privacy Labels
Privacy policies (for ChatGPT and DeepSeek)

The issue? Privacy Labels are self-reported by developers, with little oversight or enforcement. The hosts compared this to an earlier misfire: pCloud once deemed Clubhouse secure despite serious concerns, because it relied on the same App Store disclosures.

Who’s Really Watching You?

The episode raised pressing questions about how much trust users should place in apps approved by Apple:

Apps in the App Store are unlikely to be malware—but that doesn’t mean they’re privacy-friendly.
Meta AI and Copilot are the only ones that openly disclose collecting identity-linked data for ad purposes, but with so little accountability, others might be doing the same without saying so.

The hosts remind users that transparency ≠ safety. Meta and Google may appear more invasive—but at least they disclose their practices.

Don’t Be Passive About Privacy

Even if you’re only using ChatGPT via iOS 18’s anonymized Siri integration, it’s worth reading the App Privacy Labels. But do so critically—because what’s declared might not be all that’s collected.

“Keep an ear out for news on websites and podcasts like this one… and make as informed a decision as you’re able.”

Massive Data Leaks Underscore Urgent Need for Strong Passwords and Two-Factor Authentication

Cybersecurity is personal—and the math doesn’t lie: 364,000 plus 184 million equals your wake-up call to strengthen your digital defenses.

LexisNexis Breach: Sensitive Data on 364,000 People Exposed

In a report first covered by TechCrunch, LexisNexis Risk Solutions, a major data broker serving insurance firms and law enforcement, disclosed a breach impacting over 364,000 individuals.

The compromised data includes:

Full names
Dates of birth
Phone numbers
Email and mailing addresses
Social Security numbers
Driver’s license numbers

Ironically, the firm—dedicated to helping clients detect fraud and risk—became a risk itself. These are precisely the kinds of data used in identity theft, account hijacking, and other forms of cybercrime.

Mystery Database Leaks 184 Million Logins

Adding to the chaos is an unrelated but equally troubling revelation from Wired: security researcher Jeremiah Fowler found an exposed ElasticSearch database containing 184,162,718 records—including credentials linked to Apple, Facebook, Google, and even government accounts.

Fowler couldn’t identify who compiled the dataset. Speculation suggests it may be a stash from infostealer malware, amassed either by cybercriminals or by researchers investigating breaches. In the sample of 10,000 records he examined, several users he contacted confirmed that the data was authentic.

It’s Time to Get Proactive

While strong and unique passwords have long been a mantra from The Checklist and SecureMac, this episode emphasized that password strength alone is no longer enough—especially when malware captures credentials before they’re even encrypted.

Key Recommendations

Enable Two-Factor Authentication (2FA)
- Whether via app, hardware key, or biometrics, 2FA adds a critical barrier if your password gets leaked.
- Example: 23andMe made 2FA mandatory only after a breach exposed data on roughly 6.9 million users.
Monitor All Accounts
- Stay alert for signs of suspicious activity.
- Enable login alerts and account change notifications wherever possible.
Demand Better Security from Providers
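
The app-based 2FA recommended above usually means time-based one-time passwords (TOTP, RFC 6238): the authenticator app and the service each derive a short code from a shared secret and the current time, so a leaked password alone is not enough to log in. The following is a minimal Python sketch of that derivation using only the standard library. The function names are illustrative, and a real service would also need secure secret storage, rate limiting, and tolerance for clock drift.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    # The counter is encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time-step counter."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


if __name__ == "__main__":
    # Test secret from the RFCs; a real secret is random and kept private.
    print(totp(b"12345678901234567890"))
```

Because the code changes every 30 seconds, a stolen password from a dump like the one described here does not by itself produce a valid login.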

Hosting Provider Responds to Database Discovery

The World Host Group, whose infrastructure housed the massive trove of 184M logins, said the data was uploaded to an “unmanaged server” by a fraudulent user.

“The system has since been shut down… We will cooperate with law enforcement and provide any relevant customer data,” the company told Wired.

Still, the incident raises alarms about how easily enormous data sets can be hidden in plain sight, with little trace of origin or intent.

Final Equation: 364K + 184M = Take Security Seriously

Whether from a known corporate breach or a mystery dump of login data, the message is loud and clear: your digital identity is under constant threat. Strengthen it with:

Unique passwords
Two-factor or multi-factor authentication
Vigilance over all your accounts
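
For the unique-passwords item, one option is to let software generate them. Below is a minimal sketch using Python’s standard secrets module. The helper name generate_password, the 20-character default, and the 12-character minimum are assumptions made for illustration, not recommendations from the episode; in practice a password manager handles this for you.

```python
import secrets
import string

# Letters, digits, and punctuation; some sites restrict which symbols are allowed.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 12:  # illustrative minimum, not an official threshold
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # One distinct password per account, so a single leak cannot unlock the rest.
    for account in ("email", "bank", "social"):
        print(account, generate_password())
```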