LLM API Rate Limiting: Protecting Against Abuse

"My Startup Got a $28,000 OpenAI Bill Overnight"

I’ll never forget the panicked email from a developer last month. Their startup’s OpenAI API key leaked on GitHub, and attackers racked up $28,000 in unauthorized usage before anyone noticed. Worse? Their account got banned for abuse.

This isn’t rare. In 2024, we’re seeing 3-5 major LLM API key leaks per week across OpenAI, Google Gemini, and Mistral. Without proper rate limiting, a single exposed key can bankrupt your project.

Why LLM APIs Are High-Risk Targets

Unlike traditional APIs, AI services:

  • Charge per token ($$$ adds up fast)
  • Lack automatic spending caps (OpenAI only added these in 2023)
  • Get scraped 24/7 by bots hunting for leaked keys

Here’s what happens when attackers find your key:

# Attackers will hammer your endpoint like this:
import openai

for _ in range(10_000):  # Brutal 10k requests
    openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Generate 10,000 words"}],
    )
# Result: $5,000+ bill before breakfast
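
Where does a number like that come from? Here’s a rough back-of-envelope estimate. The output size and per-token price below are illustrative assumptions (roughly in line with 2023-2024 GPT-4 list pricing), not current rates; check your provider’s pricing page before relying on them.

# Rough cost estimate for the attack loop above (assumed numbers, for illustration only)
requests = 10_000
output_tokens_per_request = 8_000        # about what an 8K-context model can return per call
price_per_1k_output_tokens = 0.06        # USD; assumed, check current provider pricing

total_output_tokens = requests * output_tokens_per_request   # 80 million tokens
estimated_cost = total_output_tokens / 1_000 * price_per_1k_output_tokens
print(f"Estimated output cost: ${estimated_cost:,.0f}")      # roughly $4,800

Even with conservative assumptions, an unattended key burns through thousands of dollars in hours.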

Rate Limiting 101: Your First Defense

1. Client-Side Throttling (Python Example)

from tenacity import retry, stop_after_attempt, wait_exponential
import openai  # legacy (pre-1.0) SDK syntax; with openai>=1.0 use client.chat.completions.create

@retry(stop=stop_after_attempt(5),  # give up after 5 attempts instead of retrying forever
       wait=wait_exponential(multiplier=1, min=4, max=60))
def safe_completion(prompt):
    return openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,  # always cap output tokens
    )

# What this buys you:
# - Exponential backoff when calls fail (including 429 rate-limit responses)
# - A bounded number of retries, so errors never turn into an infinite loop
# - A hard cap on output tokens per request
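
Backoff only kicks in after the API starts pushing back. If you also want to cap how fast your own code can spend money, a small in-process limiter in front of safe_completion works. Here’s a minimal sketch; the 20-requests-per-minute figure is an arbitrary example, not a provider limit:

import time
import threading
from collections import deque

class RequestRateLimiter:
    """Allow at most max_calls requests per rolling window of `period` seconds."""

    def __init__(self, max_calls=20, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()              # timestamps of recent calls
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the rolling window
                while self.calls and now - self.calls[0] > self.period:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                wait = self.period - (now - self.calls[0])
            time.sleep(wait)              # sleep outside the lock, then re-check

limiter = RequestRateLimiter(max_calls=20, period=60.0)

def throttled_completion(prompt):
    limiter.acquire()                     # wait for a free slot before spending tokens
    return safe_completion(prompt)        # reuses the retry-wrapped helper above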

2. Server-Side Protection

Most providers offer rate limits, but you must enable them:

Provider        Rate Limit Setting Location
OpenAI          Organization Settings → Rate Limits
Anthropic       Account Dashboard → API Security
Google Gemini   GCP Console → Quotas

Pro Tip: Set limits at 50% of your expected max usage to prevent spikes.
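
Dashboards are the backstop, but you can also enforce a budget in your own code so a runaway job stops itself instead of running until the provider cuts you off. A minimal sketch follows; the $50 daily cap and the per-token prices are placeholder assumptions, not real rates:

import time

DAILY_BUDGET_USD = 50.0                   # self-imposed cap; pick your own number
_spend = {"day": time.strftime("%Y-%m-%d"), "usd": 0.0}

def record_usage(prompt_tokens, completion_tokens,
                 prompt_price_per_1k=0.03, completion_price_per_1k=0.06):
    """Track estimated spend; prices are illustrative, check your provider's pricing."""
    today = time.strftime("%Y-%m-%d")
    if _spend["day"] != today:            # reset the counter each day
        _spend["day"], _spend["usd"] = today, 0.0
    _spend["usd"] += (prompt_tokens / 1_000) * prompt_price_per_1k
    _spend["usd"] += (completion_tokens / 1_000) * completion_price_per_1k

def check_budget():
    """Call before every request; raises once the daily budget is exhausted."""
    if _spend["usd"] >= DAILY_BUDGET_USD:
        raise RuntimeError(f"Daily LLM budget of ${DAILY_BUDGET_USD} exhausted")

Call check_budget() before each request and feed record_usage() the token counts from the response’s usage field, which both OpenAI and Anthropic return with every completion.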

How to Monitor for Leaks (Before It’s Too Late)

Even with perfect rate limiting, a leaked key is catastrophic. Tools like Leaked.now scan GitHub 24/7 to detect exposed API keys before attackers find them. I’ve seen this save teams from six-figure disasters.

Combine this with:

  • GitHub secret scanning (free for public repos)
  • Environment variable audits (never hardcode keys! See the sketch after this list)
  • Weekly usage alerts (CloudWatch/Prometheus)
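
On the “never hardcode keys” point, the fix is a few lines: load the key from the environment and fail fast if it’s missing, so a credential never lands in your repository in the first place.

import os
import openai

# Read the key from the environment instead of committing it to source control
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; refusing to start")

openai.api_key = api_key    # legacy SDK; with openai>=1.0, pass api_key=... to OpenAI()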

5 Must-Do Steps in 2024

  1. Enable provider rate limits (OpenAI/Claude/Gemini dashboards)
  2. Implement client-side throttling (like the Python example above)
  3. Set billing alerts ($100 threshold recommended; see the CloudWatch sketch below)
  4. Rotate keys monthly (especially after staff changes)
  5. Monitor leaks with Leaked.now or similar
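
For step 3, if you publish your estimated LLM spend to CloudWatch as a custom metric (per the usage-alert bullet in the previous section), wiring up the $100 alarm is a few lines of boto3. The namespace, metric name, and SNS topic ARN below are placeholders for whatever you actually publish, not fixed values:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumes a custom metric "EstimatedLLMSpendUSD" in namespace "MyApp/LLM"
# (published elsewhere via put_metric_data) and an existing SNS topic for alerts.
cloudwatch.put_metric_alarm(
    AlarmName="llm-spend-over-100-usd",
    Namespace="MyApp/LLM",                # placeholder namespace
    MetricName="EstimatedLLMSpendUSD",    # placeholder metric name
    Statistic="Maximum",
    Period=3600,                          # evaluate hourly
    EvaluationPeriods=1,
    Threshold=100.0,                      # the $100 threshold from step 3
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:llm-billing-alerts"],  # placeholder ARN
)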

A single unprotected API key can cost you thousands. Take 10 minutes today to lock things down; your CFO will thank you.


Need Help? Bookmark this guide or share it with your team. For real-time leak detection across 25+ AI providers, check out Leaked.now. Stay safe out there!


Don't Let Your API Keys Get Stolen

Every day, hundreds of API keys are leaked on GitHub. Leaked.now helps you find and secure exposed credentials before attackers do.

🔐 Monitors OpenAI, Claude, Gemini, Mistral & more
🚨 Instant alerts when keys are found
📧 Responsible disclosure to protect developers

Start Monitoring Free →