Axon Shield

Part of the Akamai WAF Defense Guide - Rate limiting for DDoS protection conflicts with server-to-server traffic that scales with business growth. How static thresholds break legitimate integrations and the operational process required to maintain protection without blocking customers.

Rate Limiting Reality: Why DDoS Protection Breaks Legitimate Business Growth

The DDoS Protection Paradox: Defending Against Attacks Without Blocking Growth

Enterprise DDoS mitigation follows predictable implementation: establish baseline traffic volumes, set rate limits above normal patterns, block traffic exceeding thresholds. Security teams configure Akamai with conservative limits, monitor for violations, adjust as needed.

This works perfectly until legitimate traffic grows.

A financial services API handling 10,000 requests per hour at peak establishes rate limits at 15,000 requests per hour with comfortable overhead. Business development signs a new enterprise customer. Their integration generates 8,000 requests hourly. Suddenly the service handles 18,000 requests per hour - triggering rate limiting designed to prevent DDoS attacks.

Marketing campaigns that spike demand, viral social posts that multiply signups, or large online events can all trigger the same behavior: DoS rate limits become a self-inflicted denial of service.

Legitimate customer traffic gets blocked. Enterprise customer experiences service failures. Sales team escalates urgently. Security team emergency-raises limits. Pattern repeats with next customer onboarding.

The protection mechanism becomes the availability problem.


Server-to-Server Traffic Patterns That Rate Limiting Cannot Accommodate

Consumer web traffic follows predictable distributions. Individual users generate modest request volumes. Peak traffic correlates with time zones and business hours. Rate limiting based on per-IP thresholds works because no legitimate user generates thousands of requests per minute.

Server-to-server integrations violate every assumption:

High-Volume Legitimate Sources

Integration scenario: Partner company processes payroll for 50,000 employees. Their payroll system calls your API to verify employment data. On payday, the system generates:

  • Initial verification requests (50,000)
  • Tax calculation callbacks (50,000)
  • Payment confirmation updates (50,000)

Total: 150,000 API calls from single IP address in 2-hour processing window.

Rate limit configured for DDoS protection: 10,000 requests per hour per IP.

Result: Payroll integration fails. 50,000 employees don't get paid on schedule. Enterprise customer escalates to executive level. Your API appears unreliable.

The rate limit designed to prevent service disruption causes service disruption.
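
Back-of-the-envelope arithmetic makes the mismatch obvious. The sketch below simply restates the numbers from the scenario above:

  # Illustrative numbers from the payroll scenario above.
  requests_per_run = 50_000 * 3      # verification + tax callbacks + payment confirmations
  window_hours = 2                   # payroll processing window
  per_ip_hourly_limit = 10_000       # DDoS-oriented per-IP limit

  effective_hourly_rate = requests_per_run / window_hours
  print(f"{effective_hourly_rate:,.0f} req/h vs limit {per_ip_hourly_limit:,} req/h")
  # 75,000 req/h vs limit 10,000 req/h: 7.5x over the threshold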

Unpredictable Scaling Patterns

Consumer traffic scales gradually. User growth follows measurable trends. Capacity planning predicts volume increases months in advance.

B2B integrations scale in discrete jumps:

  • Enterprise customer goes live: +40% traffic overnight
  • Partner migrates from competitor: +60% volume unannounced
  • Seasonal business (tax filing, enrollment periods): 10x normal load

Rate limits based on historical averages become obsolete during growth phases. Organizations are left choosing between two sub-optimal strategies, which Akamai support typically recommends based on your risk tolerance.

Option A: Conservative limits that block legitimate traffic during growth
Result: Customer complaints, emergency limit increases, security controls that impede business. With a good operational model, these periods are short.

Option B: Generous limits that accommodate growth
Result: Reduced DDoS protection effectiveness, larger attack surface. Application testing needs to be adjusted according to these higher rate limits.

Neither option is satisfactory.

Regional Traffic Concentration

Consumer services distribute globally. DDoS attacks often originate from geographically diverse botnets. Geographic rate limiting (restricting traffic from specific regions) provides useful signal for attack detection.

B2B integrations concentrate geographically:

  • Partner's entire infrastructure routes through single datacenter
  • Cloud provider runs all services in specific AWS region
  • Regulatory requirements mandate data processing in particular jurisdiction

All legitimate traffic appears to originate from narrow geographic area. Geographic rate limiting designed to detect botnet distribution instead flags concentrated legitimate sources.


The Operational Debt of Manual Rate Limit Management

Organizations address rate limiting conflicts through manual exception processes:

  1. Customer integration fails
  2. Support ticket escalated to operations
  3. Operations identifies rate limiting as cause
  4. Security approves exception for specific IP
  5. Configuration updated manually
  6. Integration resumes

This works at small scale. At enterprise scale with hundreds of integrations, the process creates operational overhead:

Exception proliferation - Each partner, each customer, each integration requires documented exception. Exception lists grow to hundreds of entries. Nobody remembers why specific IPs were whitelisted.

Configuration drift - Exceptions added during incidents rarely get reviewed. IP addresses change (partner migrates datacenters). Old exceptions persist indefinitely because removing them might break something.

Security degradation - Expansive exception lists undermine DDoS protection. Attackers can exploit exceptions if they compromise whitelisted infrastructure.

Change management overhead - Every exception requires approval, documentation, implementation, testing. Security team becomes bottleneck for business integrations.

At the UK financial institution, we documented 47 manual rate limit exceptions accumulated over 18 months. Security team spent approximately 8 hours monthly reviewing exception requests, investigating false positives, and updating configurations.

More concerning: 20% of the 47 exceptions were for IP addresses no longer in use. The partners had migrated infrastructure, but exception removal required formal security review. Nobody volunteered to potentially break something by removing obsolete rules.


Why "Just Raise the Limits" Fails

The obvious solution appears simple: set rate limits high enough to accommodate legitimate traffic. If current limits cause false positives, double them.

This approach fails because it fundamentally misunderstands DDoS attack economics. It also means that raising rate limits after an application has been handed over to production may push it beyond the limits it was tested for, letting DoS traffic probe backend resiliency and cause real problems - from increased infrastructure costs to actual downtime.

Attack Volume Scales With Available Protection

Attackers observe your rate limiting behavior. If your service remains available at 50,000 requests per minute, they generate 60,000. You raise limits to 75,000 to accommodate growth. Attacks scale to 90,000.

I have regularly observed attackers testing those limits and optimizing their attacks to stay just under the blocking thresholds.

The arms race favors attackers because:

  • Attack traffic costs are minimal (compromised IoT devices, amplification attacks)
  • Defense costs scale linearly (bandwidth, processing capacity)
  • Your business requires availability guarantees
  • Attackers just need to exceed your tolerance threshold

Industry research documents DDoS attacks exceeding 1 Tbps [1]. No rate limit accommodates attack traffic at this scale. Protection requires distinguishing legitimate from malicious traffic, not simply raising thresholds.

Application-Layer Attacks Hide in High-Volume Traffic

Volumetric DDoS attempts to overwhelm infrastructure capacity. Detection is straightforward: traffic volume exceeds normal patterns by orders of magnitude.

Application-layer attacks operate differently. Rather than flooding services with traffic, attackers send carefully crafted requests designed to:

  • Consume disproportionate server resources (complex database queries)
  • Exploit algorithmic complexity (regexp patterns causing exponential processing)
  • Trigger cascading failures (cache invalidation forcing expensive recomputation)

Even when infrastructure capacity isn't the target, application-layer attacks are easy to miss if your Akamai monitoring is not tuned for them. You can configure filters to surface these "layer 7" attacks, but if that has to be done manually every time, you will either miss them or detect them too late.

Attack scenario in a financial company:

Normal traffic: 20,000 requests per minute, average processing time 50ms per request.

Attack traffic: 2,000 requests per minute, average processing time 2 seconds per request. Carefully crafted requests with longest processing times.

Total volume is modest (2,000 requests ≪ 20,000 normal traffic). Rate limiting based on request count doesn't trigger. But 2,000 requests × 2 seconds = 4,000 seconds of processing time = roughly 66 minutes of compute capacity consumed per minute. Effectively 4x the processing time required for genuine traffic.

Server resources exhaust. Legitimate traffic queues. Response times degrade. Service becomes unavailable despite traffic volume well within "normal" thresholds.

Raising rate limits doesn't address this attack pattern. You need request pattern analysis, not volume-based filtering.
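
A minimal sketch of the gap, using the illustrative numbers above: request counting sees the attack as a tenth of normal traffic, while compute-time accounting shows it consuming four times the capacity.

  # Illustrative numbers from the scenario above (not measured data).
  normal = {"requests_per_min": 20_000, "avg_seconds": 0.05}
  attack = {"requests_per_min": 2_000, "avg_seconds": 2.0}

  for label, t in (("normal", normal), ("attack", attack)):
      compute_seconds = t["requests_per_min"] * t["avg_seconds"]
      print(f"{label}: {t['requests_per_min']:>6} req/min -> "
            f"{compute_seconds:,.0f} compute-seconds per minute")

  # normal:  20000 req/min -> 1,000 compute-seconds per minute
  # attack:   2000 req/min -> 4,000 compute-seconds per minute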


The Rate Limiting Architecture That Actually Works

Effective DDoS protection requires multiple layers with different detection methodologies:

Layer 1: Volumetric Protection at Network Edge

Akamai Prolexic absorbs multi-terabit DDoS floods before they reach application infrastructure. This layer handles:

  • UDP/TCP floods
  • DNS amplification attacks
  • NTP reflection attacks
  • SSDP attacks

Traffic doesn't route to origin servers. Scrubbing centers filter attack traffic at network edge.

This protection is based on IP ranges, not domain names, and the smallest range you can onboard to Prolexic is a /24. Routing all traffic through the Prolexic scrubbing centers adds latency, but the protection is always-on and catches even very short burst attacks or probes.

Rate limiting at this layer: Very high thresholds. You're protecting against infrastructure saturation, not application abuse. Limits measured in gigabits per second or millions of packets per second.

Monitoring: Log integration into external SIEM systems is limited; I have implemented it with a serverless AWS pipeline feeding, for example, Splunk. Real-time response isn't required here, but what many Akamai users miss is bandwidth planning. Each of your services evolves, and traffic patterns change with it; at the end of the day, all of that traffic has to flow over your fiber into your data centers. Prolexic shows you sudden changes in traffic patterns, and you can also monitor long-term trends, but you need an external service for this to become part of your operational model.

Layer 2: Akamai Kona Per-IP Rate Limiting with Adaptive Thresholds

Rather than static limits, implement percentile-based thresholds:

Baseline calculation: Analyze 30 days of traffic. Calculate 95th percentile request volume per IP per hour. Set initial limit at 2× this value.

Adaptive adjustment: Recalculate weekly. If legitimate traffic grows, thresholds rise automatically. If traffic declines, protection tightens.

Exception handling: When IP exceeds threshold, evaluate:

  • Is this IP documented as legitimate integration? (Allow)
  • Does traffic pattern match known attack signatures? (Block)
  • Is this first violation or repeated pattern? (Alert vs Block)

This accommodates organic growth without manual exception management.

The problem - Akamai does not provide any tools for this kind of traffic and rate limit management. It provides APIs to automate the calculations and adjustments, but you need to build the processing yourself.
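
Since that processing has to live outside Akamai, a minimal sketch of the weekly recalculation might look like the following. It assumes you already export per-IP hourly request counts from your edge logs; the data layout and sample numbers are hypothetical.

  from collections import defaultdict
  from statistics import quantiles

  def adaptive_thresholds(hourly_counts, multiplier=2.0, floor=1_000):
      """hourly_counts: iterable of (ip, hour, request_count) tuples covering
      the last 30 days, aggregated from exported edge logs (hypothetical schema)."""
      per_ip = defaultdict(list)
      for ip, _hour, count in hourly_counts:
          per_ip[ip].append(count)

      thresholds = {}
      for ip, counts in per_ip.items():
          # 95th percentile of observed hourly volume for this source
          p95 = counts[0] if len(counts) < 2 else quantiles(counts, n=100)[94]
          thresholds[ip] = max(int(p95 * multiplier), floor)
      return thresholds

  # Recalculated weekly: growing integrations gain headroom automatically,
  # shrinking ones get tighter protection.
  sample = [("203.0.113.10", h, 4_000 + h % 7 * 500) for h in range(720)]
  print(adaptive_thresholds(sample)["203.0.113.10"])   # roughly 14000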

Layer 3: Akamai Kona Request Pattern Analysis

Beyond volume, analyze request characteristics:

Legitimate integration patterns:

  • Consistent user agent strings
  • Predictable endpoint distribution (calls spread across multiple APIs)
  • HTTP method distribution appropriate to integration (reads vs writes)
  • Authentication tokens validate correctly

Attack patterns:

  • Randomized user agents (botnet trying to appear diverse)
  • Concentrated on single endpoint (targeting specific vulnerability)
  • Unusual HTTP method distribution (all POST requests for read-only API)
  • Authentication failures or missing credentials

Akamai provides tools to classify these attacks and generates the corresponding logs. The processing and analysis of these patterns, however, is up to you and has to be done externally.
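
As with adaptive thresholds, that analysis has to run outside Akamai. A rough sketch of scoring one source against the signals above; the field names and cut-off values are illustrative, not tuned recommendations.

  from collections import Counter

  def pattern_flags(requests):
      """requests: dicts with 'user_agent', 'path', 'method', 'auth_ok' for one
      source IP or API key over an observation window (hypothetical schema)."""
      if not requests:
          return []
      n = len(requests)
      agents = Counter(r["user_agent"] for r in requests)
      paths = Counter(r["path"] for r in requests)
      methods = Counter(r["method"] for r in requests)
      auth_failures = sum(1 for r in requests if not r["auth_ok"])

      flags = []
      if len(agents) / n > 0.5:                  # botnet-style randomized user agents
          flags.append("diverse_user_agents")
      if paths.most_common(1)[0][1] / n > 0.9:   # hammering a single endpoint
          flags.append("single_endpoint_concentration")
      if methods.get("POST", 0) / n > 0.8:       # writes dominating a read-heavy API
          flags.append("unusual_method_mix")
      if auth_failures / n > 0.2:                # failing or missing credentials
          flags.append("auth_failures")
      return flags

A source that accumulates several flags gets investigated before it gets blocked; a single signal on its own is rarely conclusive.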

Layer 4: Business Logic Validation

Ultimate protection requires understanding legitimate business processes:

Payroll integration example: Partner processes biweekly payroll. Predictable traffic spikes every 14 days. Outside payroll windows, traffic should be minimal.

Detection logic: If this IP generates high traffic on non-payroll dates, investigate. Either:

  • Partner changed processing schedule (verify through business relationship)
  • Account compromise (attackers using legitimate credentials)

This requires coordination between security operations and business relationship management. Ultimately, this is where your operational model has to focus. It is hard to run efficient perimeter protection without understanding your business processes, including the events that generate traffic spikes.

Akamai does not offer tools to centrally manage trusted IP addresses for machine-to-machine traffic that scales with, for example, marketing campaigns. This is another aspect that deserves dedicated external processing capabilities.
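
One way to encode that business knowledge in the external tooling is a simple schedule check per trusted source. The 14-day cadence comes from the payroll example above; the anchor date and function name are hypothetical.

  from datetime import date

  def expected_spike(anchor_payday: date, today: date, cadence_days: int = 14) -> bool:
      """True if today falls on the partner's known payroll cadence."""
      return (today - anchor_payday).days % cadence_days == 0

  # High traffic from this partner on a non-payroll date -> investigate:
  # either a schedule change (confirm with the business owner) or compromised credentials.
  if not expected_spike(date(2024, 1, 5), date.today()):
      print("Spike outside payroll window: open an investigation ticket")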


Server-to-Server Authentication as Rate Limiting Alternative

Rate limiting by IP address assumes identity verification through network location. This assumption breaks when:

  • Legitimate partners share IP ranges with other organizations
  • Cloud providers use dynamic IP allocation
  • Partners operate through proxy infrastructure with rotating IPs

Alternative approach: Authenticate at application layer, rate limit by authenticated identity.

Implementation:

  • Issue API keys to integration partners
  • Rate limits apply to API key, not source IP
  • Partner operating from multiple IPs consumes single rate limit bucket
  • Attackers without valid API keys blocked regardless of IP address

Advantages:

  • Survives IP address changes without configuration updates
  • Enables granular rate limiting (different limits for different partners)
  • Provides attribution for troubleshooting (which partner caused traffic spike?)

Challenges:

  • Requires API authentication implementation (some legacy systems lack this)
  • Key management overhead (issuance, rotation, revocation)
  • Doesn't prevent attacks using compromised credentials

This is not a complete replacement for IP-based rate limiting but a complementary control. You still need to manage trusted IP addresses efficiently to ensure you don't create a gap between rate limits and application-level authentication - a gap that can be exploited for DoS attacks.
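
A minimal sketch of the complementary control: a token bucket keyed by the authenticated API key rather than the source IP. Partner names and per-minute limits are illustrative.

  import time

  class ApiKeyRateLimiter:
      """Token bucket per API key: the limit follows the partner, not the IP."""
      def __init__(self, limits_per_minute):
          self.limits = limits_per_minute    # e.g. {"partner-payroll": 2_000}
          self.buckets = {}                  # api_key -> (tokens, last_refill_timestamp)

      def allow(self, api_key: str) -> bool:
          limit = self.limits.get(api_key)
          if limit is None:
              return False                   # unknown key: reject regardless of source IP
          now = time.monotonic()
          tokens, last = self.buckets.get(api_key, (limit, now))
          tokens = min(limit, tokens + (now - last) * limit / 60.0)   # continuous refill
          allowed = tokens >= 1
          self.buckets[api_key] = (tokens - 1 if allowed else tokens, now)
          return allowed

  limiter = ApiKeyRateLimiter({"partner-payroll": 2_000})
  print(limiter.allow("partner-payroll"))   # True
  print(limiter.allow("unknown-key"))       # False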


The Operational Process Nobody Budgets For

Effective rate limiting requires ongoing operational investment:

Weekly Traffic Analysis

Review traffic patterns, identify:

  • Legitimate sources approaching rate limits (proactive adjustment needed)
  • Anomalous traffic patterns not triggering existing rules (detection gaps)
  • Exception list entries for obsolete IP addresses (cleanup opportunities)

Time investment: 2-4 hours weekly for operations engineer.
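
Part of that review can be automated: flag every documented integration consuming more than a chosen share of its current limit, so the threshold can be raised before it starts blocking traffic. The 80% headroom figure below is an assumption, not a recommendation.

  def approaching_limit(weekly_peaks, thresholds, headroom=0.8):
      """weekly_peaks: {source: peak requests/hour this week};
      thresholds: {source: current rate limit}. Returns sources needing review."""
      return {
          src: (peak, thresholds[src])
          for src, peak in weekly_peaks.items()
          if src in thresholds and peak >= headroom * thresholds[src]
      }

  print(approaching_limit({"203.0.113.10": 9_200}, {"203.0.113.10": 10_000}))
  # {'203.0.113.10': (9200, 10000)} -> raise the limit before the next growth step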

Quarterly Exception Reviews

Systematic validation of all rate limiting exceptions:

  • Does this integration still exist?
  • Is the whitelisted IP still accurate?
  • Can we replace IP-based exception with API key authentication?
  • What is blast radius if this IP is compromised?

Time investment: 8-12 hours quarterly for security team.

Incident Response Integration

When DDoS attacks occur, post-incident analysis should evaluate:

  • Did attack traffic exploit rate limiting exceptions?
  • Could tighter limits have mitigated impact?
  • Were legitimate customers affected by emergency rate limit changes?
  • What procedural improvements prevent recurrence?

Time investment: Variable based on incident frequency, 4-8 hours per significant event.

At the UK bank, we formalized this operational process during months 3-5 of implementation - not because the technical configuration is complex, but because operational discipline determines whether protection remains effective as the business evolves.

Organizations deploying Akamai budget for initial configuration. Few budget for ongoing operational overhead required to maintain effectiveness. The gap between deployment and operational maturity determines whether DDoS protection works when attacks occur.


What We Built at the Financial Institution

Initial state: Static rate limits based on traffic analysis from 6 months prior. Limits unchanged since initial deployment. 47 manual exceptions accumulated.

Transformation deliverables:

Adaptive threshold calculation - Weekly recalculation of per-IP limits based on rolling 30-day traffic patterns. Accommodates organic growth automatically.

API key-based rate limiting - Migrated major integration partners from IP-based to authentication-based rate limiting. Reduced IP exception list by 60%.

Pattern-based detection - Beyond volume, analyze request characteristics to identify application-layer attacks hiding in legitimate traffic volumes.

Operational playbook - Documented procedures for weekly analysis, quarterly reviews, incident response. Security team knows exactly how to respond when rate limiting triggers.

Result: No customer-impacting false positives from rate limiting post-implementation. DDoS attack mitigation improved through pattern-based detection catching attacks that volume-based limits missed.

The protection mechanism stopped being an availability problem.


References

  1. Cloudflare. (2024). "DDoS Threat Report." Cloudflare Radar. https://radar.cloudflare.com/reports/ddos
  2. Akamai Technologies. (2024). "State of the Internet / Security: DDoS and Application Attacks." https://www.akamai.com/internet-station/cyber-research
  3. NIST. (2023). "Guide to DDoS Attacks Prevention and Response." NIST Special Publication 800-189.
  4. Arbor Networks. (2023). "Worldwide Infrastructure Security Report." NETSCOUT Arbor.
  5. OWASP Foundation. (2021). "Denial of Service Cheat Sheet." OWASP Cheat Sheet Series.
  6. Jonker, M., et al. (2017). "Millions of Targets Under Attack: a Macroscopic Characterization of the DoS Ecosystem." ACM Internet Measurement Conference.
  7. Krämer, L., et al. (2015). "AmpPot: Monitoring and Defending Against Amplification DDoS Attacks." Research in Attacks, Intrusions, and Defenses.
  8. Akamai Technologies. (2024). "Prolexic DDoS Protection." Akamai Product Documentation.
  9. Paxson, V. (2001). "An Analysis of Using Reflectors for Distributed Denial-of-Service Attacks." ACM Computer Communication Review.
  10. Zargar, S.T., et al. (2013). "A Survey of Defense Mechanisms Against Distributed Denial of Service (DDoS) Flooding Attacks." IEEE Communications Surveys & Tutorials.