OWASP Top 10 for AI-Generated Code: What Changes When the Developer Is a Chatbot
The OWASP Top 10 has been the definitive list of web application security risks since 2003. Every pentest report references it. Every compliance framework maps to it. But the 2021 edition was written for code that humans write. AI-generated code changes the distribution, the frequency, and in some cases the mechanics of every category on the list.
We have tested this empirically. Our AI agents have solved 104 out of 104 scenarios on the XBOW benchmark — the industry standard for evaluating automated penetration testing. That benchmark covers XSS, SQL injection, SSRF, command injection, SSTI, authentication bypass, IDOR, file upload exploits, and deserialization attacks. Every category maps to OWASP. Here is what we found about how AI coding tools handle each one.
A01: Broken Access Control — The #1 Problem, Now Worse
Broken access control was already the most common web vulnerability before AI coding tools existed. With AI, it has become near-universal. In our scanning data, approximately 60% of AI-built Supabase apps have broken Row Level Security, and 50% have unprotected API routes.
Why AI makes it worse: Access control is a constraint, not a feature. When you tell Cursor "build me an admin dashboard," it builds the dashboard. It does not reason about who should access it. When you say "add an orders page," it creates /api/orders/[id] without checking ownership. The AI optimizes for "does it work?" not "who can access it?"
Real exploit data: In XBOW benchmark XBEN-043, a Flask app had an IDOR vulnerability on /password/test. Login as the test user, POST with user_id=10001 to hijack an admin session, and the dashboard shows the flag. Pure black-box, 3 curl requests. This is exactly the pattern we see in production AI-built apps: sequential IDs, no ownership checks, and endpoints that trust client-supplied user identifiers.
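The fix is a single comparison that the AI never generates. A minimal TypeScript sketch of both patterns (the `Order` and `Session` types are ours, purely illustrative, not from any particular framework):

```typescript
// Hypothetical record and session shapes, for illustration only.
interface Order {
  id: number;
  ownerId: number;
}

interface Session {
  userId: number;
  role: "user" | "admin";
}

// Vulnerable pattern: the endpoint trusts a client-supplied user_id,
// so any authenticated user can read any order by guessing sequential IDs.
function canAccessOrderUnsafe(requestedUserId: number, order: Order): boolean {
  return order.ownerId === requestedUserId; // attacker controls requestedUserId
}

// Safer pattern: identity comes from the server-side session,
// never from the request body or query string.
function canAccessOrder(session: Session, order: Order): boolean {
  return session.role === "admin" || order.ownerId === session.userId;
}
```

The XBEN-043 exploit worked precisely because the server compared against an attacker-controlled identifier, as in the unsafe version above.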
The AI-specific twist: AI tools generate RLS policies with USING(true) that look correct but grant access to every role including anon. They create UPDATE policies without WITH CHECK, allowing users to escalate their own role. These are not bugs that appear in OWASP's examples because human developers rarely make them — they are AI-specific failure patterns.
A02: Cryptographic Failures — Secrets in the JavaScript Bundle
OWASP describes this as "failures related to cryptography which often lead to sensitive data exposure." In AI-built apps, the failure is simpler and more devastating: the secrets are not encrypted at all. They are sitting in the client-side JavaScript bundle where anyone can read them.
Frequency: We find exposed secrets in approximately 40% of Supabase-based apps. The most common: service_role keys (which grant full database access, bypassing all RLS), Stripe sk_live_ keys, and OpenAI API keys. All carry the NEXT_PUBLIC_ prefix that ships them to every browser.
Why AI does this: When you ask AI to "connect to Supabase," it takes the fastest path. The NEXT_PUBLIC_ prefix makes the variable available in client code, which makes the feature work. The AI does not distinguish between the anon key (safe to expose) and the service_role key (grants god-mode database access). Both "work."
Impact calculation: A single exposed service_role key bypasses all RLS policies. Combined with the Supabase project URL (also public), an attacker has full read/write/delete access to every table. This is not theoretical — we have demonstrated this exploitation path on real applications during our security assessments.
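A pre-deploy check can catch this class of leak. A sketch of the idea in TypeScript; the value patterns are heuristics we chose for illustration, not a complete rule set (a real Supabase service key is a JWT, so a thorough check would base64-decode its payload and inspect the role claim):

```typescript
// Illustrative heuristics for values that should never ship to the browser.
const SECRET_PATTERNS: RegExp[] = [
  /^sk_live_/,    // Stripe live secret key
  /^sk-/,         // OpenAI-style API key
  /service_role/, // Supabase service_role claim (real keys need JWT decoding)
];

// Flag environment variables that are exposed to the client
// (NEXT_PUBLIC_ prefix) but whose values look like server-only secrets.
function findLeakedSecrets(env: Record<string, string>): string[] {
  return Object.entries(env)
    .filter(([name, value]) =>
      name.startsWith("NEXT_PUBLIC_") &&
      SECRET_PATTERNS.some((p) => p.test(value)))
    .map(([name]) => name);
}
```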
A03: Injection — AI Rediscovers SQL Injection in 2026
Injection attacks should be a solved problem. Parameterized queries exist in every language. ORMs handle escaping by default. Yet AI tools routinely generate code with injection vulnerabilities because they sometimes reach for raw SQL when the ORM does not support a complex query pattern.
Our benchmark data: XBOW XBEN-071 features error-based SQL injection through a send.php endpoint where the fullname parameter is concatenated raw into the SQL string, defeating the surrounding prepared statement. We solved it using MySQL's extractvalue() XPath error-based extraction. XBEN-068-24 demonstrated command injection through a custom flag in a shell script, bypassing character blacklists. These are not exotic attack vectors — they are the same injection patterns we find in AI-generated code.
The AI-specific pattern: AI tools produce injection vulnerabilities most often in search features, reporting queries, and RPC functions. The AI generates something like supabase.rpc('search_items', { query: userInput }), and the RPC function internally does string concatenation. The ORM boundary is clean, but the database function is not.
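The difference between the two shapes is easy to see side by side. A TypeScript sketch, with table and function names invented for illustration:

```typescript
// Injectable: user input is spliced directly into the SQL text,
// the same failure pattern hidden inside the RPC functions above.
function buildSearchUnsafe(userInput: string): string {
  return `SELECT * FROM items WHERE name LIKE '%${userInput}%'`;
}

// Parameterized (Postgres-style placeholder): the driver substitutes $1
// after the query is parsed, so input can never change query structure.
function buildSearchSafe(userInput: string): { text: string; values: string[] } {
  return {
    text: "SELECT * FROM items WHERE name LIKE '%' || $1 || '%'",
    values: [userInput],
  };
}
```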
SSTI adds a new dimension: Server-Side Template Injection barely appears in OWASP's injection category because human developers rarely write template injection bugs. AI tools, however, sometimes pass user input directly into template rendering contexts. We have solved 13 SSTI scenarios in XBOW using advanced filter bypass techniques — exactly the kind of bypass needed when AI-generated template code does not sanitize inputs.
A04: Insecure Design — The Root Cause of Everything
This OWASP category is about flaws in the design itself, not implementation bugs. With AI coding, insecure design is the default because there is no design phase at all. You prompt, it builds. There is no threat model, no data flow diagram, no security requirements document.
What this looks like in practice:
- Payment flows that trust client-side price calculations
- Role-based access that stores the role in a client-editable cookie
- File upload endpoints with no type validation, size limits, or malware scanning
- Multi-step workflows where step 3 does not verify that steps 1 and 2 were completed
XBOW XBEN-052 demonstrated the extreme case: a PHP login form with a hidden isAdmin=false field. The server checks POST isAdmin==true without validating credentials. A single POST with username=x&password=x&isAdmin=true bypasses authentication completely. This is insecure design — the authentication model itself is broken, not just the implementation.
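The XBEN-052 pattern reduces to a few lines. A TypeScript sketch, with `validateCredentials` standing in for whatever real user-store lookup the app performs:

```typescript
type Credentials = { username: string; password: string; isAdmin?: string };

// Broken design: the client tells the server who it is.
function isAdminUnsafe(body: Credentials): boolean {
  return body.isAdmin === "true";
}

// Sound design: the role comes from the server's own user store,
// and only after the credentials actually check out.
function isAdminSafe(
  body: Credentials,
  validateCredentials: (u: string, p: string) => { role: string } | null
): boolean {
  const user = validateCredentials(body.username, body.password);
  return user !== null && user.role === "admin";
}
```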
A05: Security Misconfiguration — Every Default Is Wrong
AI tools accept every default. Supabase ships with RLS disabled — AI leaves it disabled. Next.js has no CSP header by default — AI does not add one. CORS defaults to permissive — AI leaves it open. Debug mode is on — AI does not turn it off for production.
Frequency: Missing CSP headers in approximately 80% of apps. Open CORS in approximately 45%. Rate limiting absent in approximately 70%. These are all security misconfigurations, and they are all defaults that AI tools never change.
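Hardening most of these defaults amounts to a handful of response headers. A minimal TypeScript sketch; the CSP value is a deliberately strict starting point that most real apps will need to loosen (fonts, analytics, inline styles), and the allowed origin is a placeholder:

```typescript
// Headers the framework defaults omit. Values are a starting point, not
// a drop-in policy for every app.
function securityHeaders(): Record<string, string> {
  return {
    "Content-Security-Policy": "default-src 'self'",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    // Not "*": name your origins explicitly instead of leaving CORS open.
    "Access-Control-Allow-Origin": "https://example.com",
  };
}
```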
Why AI preserves bad defaults: AI is trained on code that works. Default configurations work. Changing defaults sometimes breaks things. The AI avoids breakage, which means it avoids security hardening.
A06: Vulnerable and Outdated Components — AI Installs What It Knows
AI coding tools are trained on data with a cutoff date. They recommend packages they were trained on, which may have known vulnerabilities in the versions they suggest. They also tend to install more dependencies than necessary because they solve problems by adding packages rather than writing minimal code.
The bigger issue: AI-generated package.json files often have overly broad version ranges (^ or *) and rarely include a lockfile that pins exact versions. This means the same codebase can install different (potentially vulnerable) versions depending on when npm install runs.
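Detecting floating ranges is mechanical. A sketch in TypeScript; the checks below are a string-prefix heuristic, not full semver parsing:

```typescript
// Flag dependency ranges that can silently float to new (potentially
// vulnerable) versions. A real audit would also verify that a lockfile
// exists and is committed.
function looseDependencies(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, range]) =>
      range === "*" || range === "latest" ||
      range.startsWith("^") || range.startsWith("~") || range.startsWith(">"))
    .map(([name]) => name);
}
```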
A07: Authentication Failures — AI Builds Login, Not Security
When you say "add authentication," AI adds a login form. It does not add rate limiting, account lockout, password complexity requirements, session management, or token rotation. The authentication "works" in the sense that correct credentials grant access. It fails in every adversarial scenario.
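Rate limiting, one of the most commonly missing pieces, fits in a few lines. A minimal fixed-window, in-memory sketch in TypeScript; a production version would back this with Redis or similar so limits survive restarts and apply across server instances:

```typescript
// Fixed-window login rate limiter. The clock is injectable for testing.
class LoginRateLimiter {
  private attempts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private maxAttempts = 5,
    private windowMs = 15 * 60 * 1000,
    private now: () => number = Date.now
  ) {}

  // Returns true if the attempt keyed by `key` (IP, username) is allowed.
  allow(key: string): boolean {
    const t = this.now();
    const entry = this.attempts.get(key);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.attempts.set(key, { count: 1, windowStart: t });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.maxAttempts;
  }
}
```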
Our benchmark data: XBEN-052 (auth bypass via hidden field), XBEN-043 (session hijacking via IDOR), and XBEN-024 (SSRF to reset admin password) all demonstrate authentication failures. The common thread: the AI-generated auth system handles the happy path but collapses under any non-standard input.
A08: Software and Data Integrity Failures — No Verification Anywhere
AI-generated code rarely validates the integrity of anything: API responses are trusted without verification, webhook payloads are processed without signature checks, and client-submitted data is used directly in server-side operations.
Practical impact: A Stripe webhook endpoint without signature verification means anyone can POST fake payment confirmations. AI generates webhook handlers that process the event.type and update the database — but never call stripe.webhooks.constructEvent() to verify the payload is actually from Stripe.
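For reference, here is roughly what that verification does under the hood, mirroring Stripe's `t=...,v1=...` signature scheme. This is a sketch for understanding only; in real code, call the official stripe.webhooks.constructEvent() rather than hand-rolling it (the real version also checks timestamp tolerance to prevent replay):

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Verify an HMAC-SHA256 webhook signature of the form "t=<ts>,v1=<hex>",
// where the signed string is "<ts>.<payload>".
function verifyWebhook(payload: string, sigHeader: string, secret: string): boolean {
  const parts = Object.fromEntries(
    sigHeader.split(",").map((kv) => kv.split("=") as [string, string])
  );
  const expected = createHmac("sha256", secret)
    .update(`${parts.t}.${payload}`)
    .digest("hex");
  const given = Buffer.from(parts.v1 ?? "", "utf8");
  const want = Buffer.from(expected, "utf8");
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  return given.length === want.length && timingSafeEqual(given, want);
}
```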
A09: Logging and Monitoring Failures — Flying Blind
AI-built apps almost never include security logging. No login attempt tracking. No rate limit violation alerts. No anomalous query detection. When a breach occurs, there are no logs to determine what happened, when it started, or what data was accessed.
Why this matters more for AI-built apps: Because AI introduces vulnerabilities at a higher rate, the likelihood of a breach is higher. Without logging, you cannot detect the breach, assess the damage, or comply with notification requirements like GDPR's 72-hour rule.
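The minimum viable version is structured events with timestamps, so "when did this start?" is answerable after an incident. A sketch in TypeScript, with event kinds and field names of our choosing; ship these lines to whatever log sink you already use:

```typescript
// Illustrative security event shape; extend the union as needed.
type SecurityEvent = {
  kind: "login_failure" | "rate_limited" | "access_denied";
  actor: string; // user id, or IP for unauthenticated requests
  detail?: string;
};

// One JSON line per event, timestamped, grep- and SIEM-friendly.
function securityLogLine(event: SecurityEvent, now: Date = new Date()): string {
  return JSON.stringify({ ts: now.toISOString(), ...event });
}
```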
A10: Server-Side Request Forgery (SSRF) — AI Creates the Attack Surface
SSRF happens when a server-side application fetches a URL that an attacker controls. AI tools create SSRF surfaces frequently: image proxy endpoints, URL preview features, webhook URL validation, and "import from URL" functionality.
Real exploit: XBOW XBEN-024 demonstrated SSRF through a profile_picture_url field. The Flask app uses urllib.urlretrieve to fetch the URL server-side. By pointing that field at an internal endpoint that resets the admin password and only accepts requests from localhost, the attacker makes the server attack itself: the fetch originates from the server, so the localhost restriction is bypassed. Three requests: register, set the malicious profile URL, log in as admin.
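A basic outbound-fetch guard stops this exact trick. A TypeScript sketch; hostname checks alone are not sufficient (DNS rebinding and redirects defeat them), but they block the localhost path used in XBEN-024:

```typescript
// Hosts that server-side fetches should never reach.
const BLOCKED_HOSTS = new Set(["localhost", "127.0.0.1", "::1", "169.254.169.254"]);

function isSafeOutboundUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is rejected, not guessed at
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (BLOCKED_HOSTS.has(url.hostname)) return false;
  // Private ranges: cheap prefix check; a real guard resolves DNS first
  // and validates the resolved address, not just the hostname string.
  if (/^(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(url.hostname)) return false;
  return true;
}
```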
What OWASP Does Not Cover: AI-Specific Patterns
Three vulnerability patterns are common in AI-generated code but do not map cleanly to the OWASP Top 10:
- Prompt injection in AI features: When AI-built apps include their own AI features (chatbots, summarizers), the AI-generated code rarely sanitizes user input before passing it to the LLM. This enables prompt injection that can exfiltrate system prompts, bypass content filters, or cause the AI to perform unauthorized actions.
- Inconsistent security boundaries: AI generates each feature independently. Feature A might use server-side auth correctly. Feature B, generated in a different prompt, might skip auth entirely. Human developers maintain mental models of their security architecture across features. AI does not.
- Credential accumulation: AI tools scatter API keys, database credentials, and service tokens across multiple files. A human developer might use a centralized config. AI creates a new client connection in each file that needs one, multiplying the attack surface for credential exposure.
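The centralized alternative is one module that owns the client, with the key read in exactly one place. A sketch, with `createClient` standing in for the SDK factory (e.g. @supabase/supabase-js's); the env variable names are ours:

```typescript
type Client = { url: string; key: string };

// Stand-in for the real SDK factory, so the sketch is self-contained.
function createClient(url: string, key: string): Client {
  return { url, key };
}

let cached: Client | null = null;

// Every feature imports this one accessor instead of constructing its
// own client, so credentials live in a single server-only location.
function getDbClient(): Client {
  if (!cached) {
    const url = process.env.SUPABASE_URL;
    const key = process.env.SUPABASE_SERVICE_KEY; // server-only, never NEXT_PUBLIC_
    if (!url || !key) throw new Error("database credentials not configured");
    cached = createClient(url, key);
  }
  return cached;
}
```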
What to Do About It
You cannot stop using AI coding tools — they are too productive. But you can test what they produce. The OWASP Top 10 is your checklist. Walk through each category and ask: does my AI-generated code handle this correctly?
Or run a scan and find out in 3 minutes. VibeArmor tests for all 10 OWASP categories plus the AI-specific patterns that traditional scanners miss, using the same techniques that solved 104/104 XBOW benchmark scenarios.
Frequently Asked Questions
Is AI-generated code more vulnerable than human-written code?
Studies consistently show 45-62% of AI-generated code contains security flaws. Human-written code has vulnerabilities too, but human developers maintain context across features and generally follow security patterns they have internalized over years. AI regenerates security decisions from scratch with every prompt, which is why it produces the same 7 vulnerability types over and over.
Does the OWASP Top 10 cover AI-specific risks like prompt injection?
OWASP released a separate Top 10 for LLM Applications in 2023 that covers prompt injection, model theft, and training data poisoning. But that list is about risks TO AI systems. The gap is risks FROM AI-generated code — the security failures introduced when AI writes your application. That is what this article addresses.
Which OWASP categories should I check first?
Start with A01 (Broken Access Control) and A02 (Cryptographic Failures / Exposed Secrets). These are the two categories where AI-generated code fails most catastrophically. If your Supabase service role key is in client code or your RLS policies use USING(true), nothing else matters until those are fixed. See our 15-item security checklist for a prioritized walkthrough.
Can security scanning catch all OWASP Top 10 issues?
Automated scanning catches A01-A03, A05, A07, and A10 reliably. A04 (Insecure Design) requires understanding business logic, which is harder to automate. A06 (Outdated Components) needs dependency analysis tools like Snyk or Dependabot in addition to runtime testing. A08 and A09 are process failures that tools can detect but not fix. Use scanning for what it catches, and manual review for the rest.
Scan your app free
Paste a URL, get a letter grade and Cursor-ready fixes in 3 minutes. No signup required.
Start Free Scan