The New Era of AI Cyberattacks: How Agent-Aware Cloaking Weaponizes ChatGPT Atlas for Disinformation

31.10.2025 siteguarding2

Researchers uncover critical vulnerability allowing manipulation of AI browsers through specially crafted content

The world is facing a fundamentally new type of cyberattack that exploits not code, but the very logic of artificial intelligence operation. Agent-aware cloaking technology uses AI browsers like OpenAI’s ChatGPT Atlas to deliver misleading content that can poison the information AI systems ingest, potentially manipulating decisions in hiring, commerce, and reputation management.

By detecting AI crawlers through user-agent headers, websites can deliver altered pages that appear benign to humans but toxic to AI agents, turning retrieval-based AI systems into unwitting vectors for misinformation.

The Scale of the Problem: 2025 Statistics

The threat of prompt injections and AI manipulation has reached critical proportions:

93% of security leaders are preparing for daily AI attacks in 2025, while 66% of surveyed organizations predict that AI will have the most significant impact on cybersecurity this year.

The specific numbers are even more alarming:

Out of 1.8 million prompt injection attacks in a public AI agent red-teaming competition, over 60,000 succeeded in causing policy violations (data access, illicit actions)
Across approximately 3,000 U.S. companies using AI agents, there are an average of 3.3 AI agent security incidents per day in 2025, with 1.3 per day tied to prompt injection or agent abuse
From July to August 2025 alone, several LLM data leakage incidents related to prompt injection resulted in massive breaches of sensitive data, including user chat records, credentials, and third-party application data

Table 1: AI Attack Statistics in 2025

Metric	Value	Source
Organizations expecting daily AI attacks	93%	Trend Micro
Successful prompt injections out of 1.8M attempts	60,000+	Public competition
Average success rate of attacks	3.33%	Calculated data
AI security incidents per day (US)	3.3	Study of 3,000 companies
Prompt injection incidents per day	1.3	Same sample
Organizations using generative AI	65%	McKinsey 2024
Confirmed AI-related breaches (YoY increase)	+49%	Industry reports

OpenAI Atlas Technology: The Double-Edged Sword of Innovation

Atlas from OpenAI, launched in October 2025, is a Chromium-based browser that integrates ChatGPT for seamless web navigation, search, and automated tasks. It enables AI to browse live webpages and access personalized content, making it a powerful tool for users but a vulnerable entry point for attacks.

The Evolution of Cloaking

Traditional cloaking deceived search engines by showing optimized content to crawlers, but agent-aware cloaking targets AI-specific agents like Atlas, ChatGPT, Perplexity, and Claude.

Expert Opinion:

Perplexity’s security team published a blog post on prompt injection attacks, noting that the problem is so severe that “it demands rethinking security from the ground up.” The blog continues to note that prompt injection attacks “manipulate the AI’s decision-making process itself, turning the agent’s capabilities against its user.”

How AI Crawlers See a Different Internet

A simple server rule—”if user-agent equals ChatGPT-User, serve fake page”—can reshape AI outputs without hacking, relying solely on content manipulation.

Table 2: Traditional vs. Agent-Aware Cloaking Comparison

Characteristic	Traditional Cloaking	Agent-Aware Cloaking
Target	Search engines (Google, Bing)	AI agents (ChatGPT, Claude, Perplexity)
Detection Method	IP addresses, search engine user-agents	AI crawler user-agents
Attack Complexity	Medium	Low
Code Required	Yes	No (text only)
Impact Scale	SEO rankings	AI decision-making
Detection Difficulty	Relatively easy	Very difficult
Consequences	Lost rankings	Decision manipulation, data leaks
Prevention	Standard techniques	Requires new approaches

Real-World Experiments: SPLX Case Studies

SPLX researchers demonstrated this vulnerability through controlled experiments on sites that differentiate between human and AI requests.

Case 1: Reputation Sabotage

SPLX created zerphina.xyz, a portfolio for the fictional Zerphina Quortane, a Portland-based designer blending AI and creativity.

What Humans See:

Professional biography
Clean layouts
Positive project highlights
No suspicious elements

What AI Agents See:

When accessed by AI agents like Atlas identified via user-agents such as “ChatGPT-User” or “PerplexityBot,” the server serves a damning alternate narrative portraying Zerphina as a “notorious product saboteur” riddled with ethical lapses and failures.

Atlas and similar tools reproduced this poisoned profile without verification, confidently labeling her unreliable and unhirable in summaries. Detection lagged, as neither ChatGPT nor Perplexity cross-checked inconsistencies, underscoring gaps in provenance validation.

Case 2: Hiring Process Manipulation

SPLX’s second test targeted recruitment, simulating a job evaluation with five fictional candidates’ resumes on hosted pages.

Table 3: Candidate Evaluation Manipulation Results

Candidate	AI Score (with cloaking)	AI Score (without cloaking)	Difference
Natalie Carter	88/100	26/100	-62 points
Jessica Morales	78/100	85/100	+7 points
Other candidates	70-75/100	80-90/100	Varies
Average manipulation impact	—	—	±40 points

Attack Methodology:

For candidate Natalie Carter, the server was rigged to detect AI crawlers and inflate her resume with exaggerated titles, leadership claims, and tailored achievements appealing to algorithmic scoring.

Result:

When Atlas retrieved the pages, it ranked Natalie highest at 88/100, far above others like Jessica Morales at 78. In contrast, using human-visible resumes loaded locally—bypassing user-agent tricks—dropped her to 26/100, flipping the leaderboard entirely.

This shift demonstrates how cloaked content injects retrieval bias into decision-making, affecting hiring tools, procurement, or compliance systems.

Security Rankings: Most Critical AI Vulnerabilities

OWASP (Open Worldwide Application Security Project) ranked prompt injection as the number one security risk in its 2025 OWASP Top 10 for LLM Applications report, describing it as a vulnerability that can manipulate LLMs through adversarial inputs.

Table 4: OWASP Top 10 LLM Security Risks (2025)

Rank	Threat Type	Severity Level	Exploitation Difficulty
1	Prompt Injection	Critical	Low
2	Insecure Output Handling	High	Medium
3	Training Data Poisoning	High	High
4	Model Denial of Service	Medium	Medium
5	Supply Chain Vulnerabilities	High	Medium
6	Sensitive Information Disclosure	Medium	Low
7	Insecure Plugin Design	High	Medium
8	Excessive Agency	Medium	Low
9	Overreliance	Medium	Very Low
10	Model Theft	Medium	High

Critical Incidents of 2025

CVE-2025-32711, which affected Microsoft 365 Copilot, has a CVSS score of 9.3, indicating high severity. Exploitation of this vulnerability, which involved AI command injection, could have potentially allowed an attacker to steal sensitive data over a network. Microsoft publicly disclosed and patched it in June.

Table 5: Major AI Security Incidents (2025)

Date	Incident	CVE	CVSS Score	Consequences
June 2025	Microsoft 365 Copilot	CVE-2025-32711	9.3	Network data theft
July-Aug 2025	LLM Data Leaks	Multiple	N/A	Chat logs, credentials leaked
January 2025	Cursor IDE	CVE-2025-54135, CVE-2025-54136	N/A	Remote code execution
February 2025	Google Gemini	N/A	Low	Long-term memory manipulation
July 2025	X’s Grok4	N/A	N/A	Successful jailbreak
December 2024	ChatGPT Search	N/A	Medium	Hidden text manipulation

Types of Prompt Injections: Threat Classification

Direct prompt injections occur when user input directly alters the behavior of the model in unintended or unexpected ways. Indirect prompt injections occur when an LLM accepts input from external sources, such as websites or files.

Table 6: Prompt Injection Typology

Attack Type	Method	Example	Complexity	Effectiveness
Direct Injection	Direct user input	“Ignore all previous instructions”	Very Low	High
Indirect Injection	Via webpages/files	Hidden text on website	Low	Very High
Hybrid Attack	Combined with XSS/CSRF	Prompt + JavaScript code	Medium	Critical
Zero-Click Attack	Via email/notifications	Malicious email in Outlook	Low	Critical
Multimodal Injection	Instructions in images	Hidden text in pictures	Medium	High
Template Injection	Configuration manipulation	Modify system prompts	High	Critical

Evolution of Threats: Prompt Injection 2.0

Prompt injection attacks, where malicious input is designed to manipulate AI systems into ignoring their original instructions and following unauthorized commands instead, were first discovered by Preamble, Inc. in May 2022 and responsibly disclosed to OpenAI.

Over the last three years, these attacks have remained a critical security threat for LLM-integrated systems. The emergence of agentic AI systems, where LLMs autonomously perform multistep tasks through tools and coordination with other agents, has fundamentally transformed the threat landscape.

Modern prompt injection attacks can now combine with traditional cybersecurity exploits to create hybrid threats that systematically evade traditional security controls.

Industry Expert Opinions

Stuart MacLellan, CTO of South London and Maudsley NHS Foundation Trust:

“There are still lots of questions around AI models and how they could and should be used. There’s a real risk in my world around sharing personal information. We’ve been helping with training, and we’re defining rules to make it known which data resides in a certain location and what happens to it in an AI model.”

Perplexity Security Team:

The problem is so severe that it “demands rethinking security from the ground up.” Prompt injection attacks “manipulate the AI’s decision-making process itself, turning the agent’s capabilities against its user.”

Defense Strategies: Multi-Layered Security Approach

To counter this threat, organizations must implement provenance signals for data origins, validate crawlers against known agents, and continuously monitor AI outputs.

Table 7: Recommended Defense Measures

Defense Layer	Measure	Effectiveness	Implementation Difficulty
Input Level	Real-time prompt filtering	60-70%	Medium
Source Verification	User-agent validation	50-60%	Low
Cross-Checking	Compare with reference data	70-80%	High
Reputation Systems	Block manipulative sources	65-75%	Medium
Model Testing	Red teaming with AI tactics	80-90%	High
Logged-Out Mode	Operate without authentication	90-95%	Low
Output Monitoring	Continuous response analysis	75-85%	Medium
Multimodal Validation	Image/text consistency checks	70-80%	High

Technical Solutions and Countermeasures

OpenAI’s Approach:

OpenAI created “logged out mode,” in which the agent won’t be logged into a user’s account as it navigates the web. This limits the browser agent’s usefulness, but also limits how much data an attacker can access.

Perplexity’s Solution:

Perplexity reports it built a detection system that can identify prompt injection attacks in real time, though cybersecurity researchers note these safeguards don’t guarantee complete protection.

Table 8: Vendor Security Features Comparison

Vendor	Primary Defense	Secondary Defense	Real-time Detection	Logged-Out Mode
OpenAI (Atlas)	Logged-out mode	Input filtering	Limited	✓ Yes
Perplexity	Real-time detection	Content validation	✓ Yes	Partial
Microsoft Copilot	Policy enforcement	Sandboxing	✓ Yes	✗ No
Google Gemini	Memory notifications	User interaction checks	Limited	✗ No
Anthropic (Claude)	Constitutional AI	Context validation	✓ Yes	✓ Yes

Market Growth and Threat Forecast

Bloomberg projects that the generative AI market will reach $1.3 trillion by 2032. With such a scale of adoption, the importance of protection against manipulation will only grow.

Table 9: AI Security Threat Growth Forecast

Year	AI Market Cap	Predicted Incidents	Expected Damage
2025	$300 billion	16,200 confirmed attacks	$2-3 billion
2027	$600 billion	35,000+ attacks	$8-10 billion
2030	$1 trillion	75,000+ attacks	$25-30 billion
2032	$1.3 trillion	120,000+ attacks	$50+ billion

Industries Most at Risk

Table 10: Industry Vulnerability Assessment

Industry	Risk Level	Primary Threat	AI Adoption Rate	Potential Impact
Financial Services	Critical	Data exfiltration	78%	$10B+ annual
Healthcare	Critical	Patient data leaks	62%	$8B+ annual
Legal Services	High	Confidential doc exposure	54%	$5B+ annual
Recruitment/HR	High	Bias injection	71%	$3B+ annual
E-commerce	Medium-High	Review manipulation	83%	$6B+ annual
Manufacturing	Medium	IP theft	45%	$4B+ annual
Education	Medium	Academic fraud	38%	$1B+ annual

Real-World Impact Scenarios

Scenario 1: Corporate Espionage

A competitor uses agent-aware cloaking to poison AI research tools, causing a Fortune 500 company to make strategic decisions based on falsified market data. Estimated loss: $50-100 million.

Scenario 2: Political Manipulation

During an election cycle, AI-powered news aggregators are fed manipulated content about candidates, influencing voter perception without leaving traditional traces.

Scenario 3: Financial Fraud

AI-powered trading algorithms are fed false financial data through cloaked pages, triggering automated trades that benefit attackers. Market manipulation cost: $500 million+.

The Human Element

Table 11: User Awareness and Behavior

Demographic	AI Trust Level	Security Awareness	Verification Habits
Gen Z (18-24)	68% trust	32% aware	Rarely verify
Millennials (25-40)	54% trust	48% aware	Sometimes verify
Gen X (41-56)	41% trust	61% aware	Often verify
Boomers (57+)	28% trust	45% aware	Usually verify
Tech Professionals	35% trust	87% aware	Always verify

Regulatory Response and Compliance

As of 2025, several jurisdictions are implementing AI security regulations:

EU AI Act: Mandatory risk assessments for high-risk AI systems
US Executive Orders: Federal agencies required to implement AI security frameworks
China’s AI Regulations: Strict content control and security measures
GDPR Extensions: New provisions for AI data processing

Table 12: Global Regulatory Landscape

Region	Regulation Status	Enforcement Level	Penalties
European Union	Active	Strict	Up to 7% global revenue
United States	In development	Moderate	Case-by-case
United Kingdom	Consultation phase	Moderate	TBD
China	Active	Very strict	License revocation
Japan	In development	Light	Advisory only

Best Practices for Organizations

Implement Multi-Factor Verification: Never rely solely on AI-retrieved information for critical decisions
Continuous Monitoring: Deploy 24/7 monitoring systems for AI agent behavior
Red Team Exercises: Conduct regular adversarial testing with prompt injection scenarios
Employee Training: Ensure staff understand AI manipulation risks
Vendor Assessment: Evaluate AI service providers’ security measures
Incident Response Plans: Develop specific protocols for AI security breaches

Emerging Technologies and Future Defenses

Researchers are exploring new architectures that could inherently block prompt injections in agentic systems, using strict information-flow controls to prevent an AI agent from ever outputting data it wasn’t authorized to access.

Industry standards are emerging, and major tech providers such as Microsoft are continually investing in more deterministic security features to stay ahead of attackers.

Conclusion: A New Reality of Digital Security

Agent-aware cloaking evolves classic SEO tactics into AI overview (AIO) threats, amplifying impacts on automated judgments like product rankings or risk assessments. Hidden prompt injections could even steer AI behaviors toward malware or data exfiltration.

As AI browsers like Atlas proliferate, defense measures will define the battle for web integrity. Organizations that fail to invest in multi-layered protection of AI systems now risk catastrophic consequences in the near future.

Key Takeaway: This is not a theoretical threat but a current reality requiring immediate action from every organization using AI technologies. The window for proactive defense is closing rapidly, and the cost of inaction grows exponentially with each passing quarter.

The question is no longer whether your organization will face AI manipulation attacks, but when—and whether you’ll be prepared to defend against them.