Why Your AI-Built App Gets an F (And How to Get an A)
We launched VibeArmor with a traditional security scoring model. Within 48 hours, we realized it was completely broken.
Shopify scored an F. Stripe scored a D. Meanwhile, a todo app with zero authentication and an exposed database got a B+ because it had all its HTTP headers in place.
That is the fundamental problem with traditional security scanners: they measure hygiene instead of hackability.
The Hygiene Problem
Traditional scanners check for things like missing X-Frame-Options and X-Content-Type-Options headers, and whether your cookies have the Secure flag. These are real issues, but they are not how apps get hacked.
Here is what a typical hygiene-based scanner produces:
- Stripe.com — "Missing X-Frame-Options, missing Permissions-Policy, server version disclosed." Grade: D.
- Shopify.com — "Open CORS policy, missing CSP on some routes, cookie without SameSite." Grade: F.
- Random todo app — "All headers present, HTTPS configured, cookies look good." Grade: B+.
That todo app had no authentication, no row-level security (RLS), and the Supabase service role key in the client bundle. But it had great HTTP headers, so it scored well.
This is not useful. It is actively misleading.
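To make the blind spot concrete, here is a minimal sketch of a hygiene-only check. The header list, the name `hygiene_score`, and the fraction-based scoring are illustrative assumptions, not the logic of any real scanner.

```python
# Hypothetical hygiene-only check: it inspects response headers and nothing else.
EXPECTED_HEADERS = (
    "x-frame-options",
    "x-content-type-options",
    "content-security-policy",
    "permissions-policy",
)


def hygiene_score(response_headers: dict[str, str]) -> float:
    """Return the fraction of expected headers that are present (0.0 to 1.0)."""
    present = {name.lower() for name in response_headers}
    hits = sum(1 for header in EXPECTED_HEADERS if header in present)
    return hits / len(EXPECTED_HEADERS)


# An app with no authentication at all can still send every header.
todo_app_headers = {
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Content-Security-Policy": "default-src 'self'",
    "Permissions-Policy": "geolocation=()",
}
print(hygiene_score(todo_app_headers))  # 1.0
```

Nothing in this function ever asks whether a stranger can read the database, which is why a perfect result here says so little.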
The Tier Pivot: Hackability Over Hygiene
We rebuilt our entire scoring system around one question: can someone actually hack this app?
Every check is now classified into one of three tiers:
Tier 1: Can Someone Steal Your Data? (36 checks)
These prove exploitability. Exposed secrets, authentication bypass, SQL injection, cross-user data access. If you fail any Tier 1 check, your grade drops hard — because these are the things that lead to data breaches, not theoretical risks.
A single exposed service role key is worth more than 50 missing HTTP headers.
Tier 2: Are Your Defenses Solid? (34 checks)
These are real security gaps that require specific conditions to exploit. HTTPS misconfigurations, missing Content-Security-Policy, no rate limiting on login, missing cookie security flags. They matter, but they do not prove someone can walk in and take data right now.
Tier 3: Informational (30 checks)
Best practices that are good to know but never affect your grade. Missing X-Frame-Options on a marketing page is not a vulnerability. Server version disclosure is trivia, not a hack. These checks are shown for completeness but carry zero weight.
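One way to picture the three tiers is as a lookup from check to tier. The sketch below is an assumption-laden illustration: the check identifiers are invented for this example, and only the tier assignments themselves come from the descriptions above.

```python
from enum import IntEnum


class Tier(IntEnum):
    EXPLOITABLE = 1    # Tier 1: can someone steal your data?
    DEFENSE_GAP = 2    # Tier 2: are your defenses solid?
    INFORMATIONAL = 3  # Tier 3: shown in the report, never graded


# Hypothetical check identifiers mapped to the tiers described in the text.
CHECK_TIERS = {
    "exposed_service_role_key": Tier.EXPLOITABLE,
    "sql_injection": Tier.EXPLOITABLE,
    "missing_csp": Tier.DEFENSE_GAP,
    "no_login_rate_limit": Tier.DEFENSE_GAP,
    "missing_x_frame_options": Tier.INFORMATIONAL,
    "server_version_disclosed": Tier.INFORMATIONAL,
}


def affects_grade(check_id: str) -> bool:
    """Only Tier 1 and Tier 2 findings can move the grade."""
    return CHECK_TIERS[check_id] is not Tier.INFORMATIONAL


def graded_findings(failed_checks: list[str]) -> list[str]:
    """Filter a list of failed checks down to the ones that carry weight."""
    return [check for check in failed_checks if affects_grade(check)]


print(graded_findings(["missing_x_frame_options", "exposed_service_role_key"]))
# ['exposed_service_role_key']
```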
What Changed in Practice
With the new scoring model:
- Stripe scores an A+ — because it has zero exploitable vulnerabilities, even if it is missing some informational headers.
- Shopify scores an A — because its CORS is intentional (they run a public API) and nothing is actually exploitable.
- That todo app scores an F — because an exposed service role key and no RLS means anyone can read every user's data.
This is the correct ranking. The scanner now agrees with what a penetration tester would tell you.
How the Scoring Works
The math is intentionally weighted toward hackability:
- Tier 1 critical finding: -25 points. One exposed secret on an otherwise clean app takes you from an A to a C+.
- Tier 1 high finding: -15 points.
- Tier 1 medium finding: -5 points.
- Tier 2 findings: -10 / -5 / -2 points depending on severity.
- Tier 3 findings: 0 points. Always zero. They show up in your report but never hurt your grade.
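The deductions above can be sketched as a small lookup table. Two things here are assumptions rather than stated facts: that a clean app starts at 100, and that the three Tier 2 values map to critical, high, and medium in that order. The names `DEDUCTIONS` and `raw_score` are hypothetical.

```python
# (tier, severity) -> points deducted.
# Assumption: Tier 2's "-10 / -5 / -2" follows the same critical/high/medium order as Tier 1.
DEDUCTIONS = {
    (1, "critical"): 25,
    (1, "high"): 15,
    (1, "medium"): 5,
    (2, "critical"): 10,
    (2, "high"): 5,
    (2, "medium"): 2,
}


def raw_score(findings: list[tuple[int, str]], start: int = 100) -> int:
    """Subtract each finding's deduction from the starting score (floors not applied)."""
    total = 0
    for tier, severity in findings:
        if tier == 3:
            continue  # Tier 3 is always worth zero points
        total += DEDUCTIONS[(tier, severity)]
    return start - total


print(raw_score([(1, "critical")]))                # 75
print(raw_score([(2, "high"), (3, "medium")]))     # 95
```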
There are also floors to prevent absurd results:
- If you have zero Tier 1 findings, your score cannot drop below 75 (C+) no matter how many Tier 2 issues exist.
- If you have zero Tier 1 and zero Tier 2 findings, your score cannot drop below 90 (A-).
- The absolute floor is 40 (F) — we do not go lower.
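The floors can be read as a clamp applied after the deductions are summed. The sketch below takes an already-computed raw score plus the number of Tier 1 and Tier 2 findings; the function name `apply_floors` and its signature are assumptions made for illustration.

```python
def apply_floors(raw: int, tier1_count: int, tier2_count: int) -> int:
    """Clamp a raw score to the floor that matches the finding counts."""
    floor = 40  # absolute floor (F)
    if tier1_count == 0:
        floor = 75  # no Tier 1 findings: never below C+
        if tier2_count == 0:
            floor = 90  # no Tier 1 or Tier 2 findings: never below A-
    return max(raw, floor)


# Many Tier 2 issues but nothing exploitable: held at 75.
print(apply_floors(raw=20, tier1_count=0, tier2_count=9))   # 75
# Several Tier 1 findings: only the absolute floor applies.
print(apply_floors(raw=10, tier1_count=4, tier2_count=0))   # 40
```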
What This Means for You
If your app gets an F, do not panic. It means we found something that is actively exploitable — and every finding comes with a specific fix you can paste into Cursor.
The path from F to A is usually 3-5 fixes. Move your secrets server-side, enable RLS, add rate limiting to your login endpoint. That alone will get most apps to a B or higher.
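Of those three fixes, rate limiting is the easiest to show in a few lines. What follows is a minimal in-memory sliding-window sketch, assuming a single server process; the class name, the limit of 5 attempts per 60 seconds, and the choice of key (for example an IP address or account email) are all illustrative. A production setup would more likely use a shared store or the rate limiting built into your framework or host.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class LoginRateLimiter:
    """Allow at most max_attempts login attempts per key within a sliding window."""

    def __init__(
        self,
        max_attempts: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._attempts: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for key and report whether it is within the limit."""
        now = self._clock()
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            return False
        attempts.append(now)
        return True


limiter = LoginRateLimiter(max_attempts=3, window_seconds=60.0)
results = [limiter.allow("203.0.113.7") for _ in range(4)]
print(results)  # [True, True, True, False]
```

Call `allow` before checking the password, and reject the request when it returns `False`.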
Stop worrying about HTTP headers. Start worrying about whether someone can read your users' data.