Axon Shield

Part of the Akamai WAF Defense Guide - SIEM integration with Akamai WAF fails due to data volume overwhelming bandwidth and licensing limits. Why external filtering before SIEM ingestion is mandatory, not optional, and the architectural patterns that actually work.

SIEM Integration Reality: Why Akamai's Data Volume Requires External Filtering

The Integration That Breaks SIEM Economics

Security operations centers consolidate telemetry from across enterprise infrastructure into Security Information and Event Management (SIEM) platforms. Firewalls, intrusion detection systems, authentication services, application logs - all feed into centralized SIEM for correlation analysis and threat detection.

Adding Akamai to this architecture appears straightforward. Configure log forwarding, route security telemetry to SIEM collectors, enable correlation rules combining edge protection data with internal security signals.

Then the data volume arrives.

Akamai's global infrastructure processes requests without any visible bottleneck, and the same holds for the volume of logs it can generate. An incorrectly configured Akamai WAF can easily produce gigabytes of logs daily. SIEM platforms sized for internal security event volumes (tens of thousands of events daily) are suddenly hit with millions of Akamai events.

The SIEM licensing model charges based on data volume ingested. Akamai integration doesn't just add another data source - it becomes the dominant consumption driver overwhelming existing capacity allocation.

This comes with three immediate effects.

Failure Mode 1: SIEM Storage Exhaustion
Data retention policies designed for weeks of internal logs provide only days of retention when Akamai volume is included. Historical analysis becomes impossible.

Failure Mode 2: License Limit Violation
SIEM vendors license based on daily ingestion volume (gigabytes per day or events per second). A single configuration error can push Akamai data beyond licensed capacity. The result: substantial overage fees or throttled ingestion that reduces visibility into problems.

Failure Mode 3: Query Performance Degradation
Even if storage and licensing accommodate volume, query performance suffers. Security analysts running investigations wait minutes for search results that previously returned in seconds. Response time during active incidents becomes unacceptable.

At a large internet company, initial Akamai SIEM integration consumed more than 75% of total SIEM capacity while representing only one of over 20 data sources. The economics made comprehensive integration untenable.


Why "Just Upgrade SIEM Capacity" Fails the Cost-Benefit Analysis

The obvious solution appears to be capacity expansion: increase SIEM licensing, add storage infrastructure, scale computational resources.

It is the simplest approach - you just pay more. However, even if you have the budget, this approach fails cost-benefit calculations. Akamai security telemetry will contain substantial noise - tons of data you never want to see. This data is created because WAF and DoS alerts are not tuned and/or Akamai protection is never switched from alerting to protection.

The Signal-to-Noise Problem

Security events requiring investigation:

  • WAF blocks indicating actual attack attempts
  • Anomalous traffic patterns suggesting reconnaissance
  • Geographic inconsistencies indicating compromised credentials
  • Rate limiting violations from unexpected sources

With badly optimized alerting, these may represent well under 0.1% of your Akamai telemetry volume.

Informational events not requiring investigation:

  • Routine health checks from monitoring services
  • Expected rate limiting on known high-volume integrations - or even all incoming traffic
  • Geographic distribution matching normal user base
  • WAF alerts on legitimate traffic payload

These represent more than 99.9% of volume but provide minimal, or even negative, security value in SIEM.

Ingesting all telemetry to extract the valuable 0.1% creates a data economics problem: paying SIEM licensing costs for storing and indexing information that will never be queried.

SIEM Vendor Licensing Models

Major SIEM platforms (Splunk, QRadar, LogRhythm, Microsoft Sentinel) charge based on:

Data volume ingestion - Gigabytes per day or events per second
Typical enterprise license: 50-100 GB/day
Akamai WAF telemetry from moderate traffic volume can easily represent a multiple of your licensed volume.

Adding Akamai consumes >100% of existing capacity. Organizations must either:

  • Reduce retention for other sources (compliance risk)
  • Remove other data sources (visibility gap)
  • Purchase additional capacity (cost increase)

The cost calculation:

SIEM licensing costs approximately $500-2,000 per GB/day annually, depending on vendor and volume tier [1].

Ingesting 50 GB/day of Akamai data: $25,000-100,000 annually.

If only 0.1% has security value, you're spending $24,975-99,900 annually to store noise.

Alternative approach: Filter Akamai data before SIEM ingestion and send only security-relevant events. Reduce volume from 50 GB/day to 50 MB/day (0.1%). Cost: $25-100 annually.

The cost differential is 1,000:1. External filtering becomes an economic necessity.
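The arithmetic behind these figures can be reproduced with a few lines of Python. This is only a sketch of the calculation: the constants are the numbers quoted in the text (50 GB/day, $500-2,000 per GB/day annually, 0.1% of events with security value), and the function and variable names are illustrative.

```python
# Figures quoted in the text; names are illustrative, not from any vendor tool.
FULL_GB_PER_DAY = 50.0             # Akamai volume ingested without filtering
RELEVANT_FRACTION = 0.001          # 0.1% of events assumed to have security value
RATE_LOW, RATE_HIGH = 500.0, 2000.0  # USD per GB/day of licensed volume, per year


def annual_cost_range(gb_per_day: float) -> tuple[float, float]:
    """Return the (low, high) annual licensing cost for a daily ingestion volume."""
    return gb_per_day * RATE_LOW, gb_per_day * RATE_HIGH


full = annual_cost_range(FULL_GB_PER_DAY)
filtered = annual_cost_range(FULL_GB_PER_DAY * RELEVANT_FRACTION)

print(f"Full ingestion:     ${full[0]:,.0f}-{full[1]:,.0f} per year")
print(f"Filtered ingestion: ${filtered[0]:,.0f}-{filtered[1]:,.0f} per year")
```

Swapping in your own daily volume and contract rate gives a quick first estimate before any filtering infrastructure is built.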

Even if you fine-tune WAF and DoS protection, there are still advantages to an external filtering layer between Akamai and your SIEM ingestion.

Akamai has a proprietary authentication mechanism that is not widely supported.

  • It is more difficult to directly collect all logs - in particular, Prolexic integration is almost impossible without your own connectors.
  • The feedback loop must be implemented internally, including logic for, e.g., adjustments based on changes in genuine traffic vs. DoS limits.

The External Filtering Architecture

Effective Akamai SIEM integration requires a data processing layer between Akamai telemetry generation and SIEM ingestion:

Component 1: Data Collection from Akamai

Akamai provides multiple mechanisms for log delivery:

DataStream - Pushes logs to customer-configured endpoints (S3, Azure Blob, Google Cloud Storage, HTTPS endpoints) [2]. Near-real-time delivery with configurable aggregation windows.

SIEM integration modules - Akamai provides pre-built connectors for major SIEM platforms. These modules handle authentication and initial data formatting.

API-based retrieval - Programmatic log access through Akamai APIs for custom integration scenarios.

Critical choice: Where does filtering occur?

Option A: Filter at SIEM ingestion
Data flows: Akamai → SIEM → Filter → Storage
Problem: SIEM has already consumed licensing capacity evaluating data before filtering

Option B: Filter before SIEM ingestion
Data flows: Akamai → Filter → SIEM → Storage
Advantage: Only relevant data consumes SIEM licensing

Option B is architecturally correct. It requires infrastructure to receive Akamai logs, apply filtering logic, and forward the filtered subset to SIEM.

Component 2: Filtering Logic Implementation

What constitutes a security-relevant event? The shortest definition is: any unusual event. If you log billions of items whose volume simply follows overall traffic patterns, the unusual events become invisible.

High-priority events:

  • WAF blocks of attacks
  • Rate limiting violations from "untrusted" IP addresses
  • Geographic anomalies (traffic from countries not in expected baseline)
  • Failed authentication attempts - rate limits on selected endpoints
  • Protocol violations indicating reconnaissance

Medium-priority events (forward with sampling):

  • Custom rules, tailored monitoring, events not relevant to a particular application
  • Routine health checks / traffic spikes

Low-priority events (archive without SIEM forwarding):

  • Legitimate requests (misconfigured WAF / DoS)
  • Normal traffic within expected patterns but triggering misconfigured WAF rules
  • Informational logging

You should never log low-priority events, except for short-term troubleshooting. Medium-priority events should be regularly reviewed and disabled when not required.

This approach reduces SIEM volume by >99% while retaining security visibility.
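The three priority levels above can be expressed as a small classification function. This is a minimal sketch under stated assumptions: the event fields (`category`, `action`, `client_ip`, `country`) and the baseline sets are hypothetical placeholders, not actual Akamai log field names, and would need mapping to whatever your log format provides.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "forward"    # always forwarded to SIEM
    MEDIUM = "sample"   # forwarded with sampling
    LOW = "archive"     # archived, never forwarded


# Hypothetical baselines; in practice these come from your own traffic analysis.
EXPECTED_COUNTRIES = {"US", "GB", "DE"}
TRUSTED_IPS = {"203.0.113.10"}  # known high-volume integrations
MEDIUM_CATEGORIES = {"custom_rule", "health_check"}


def classify(event: dict) -> Priority:
    """Map one parsed log event to a priority level (field names are assumptions)."""
    category = event.get("category")
    country = event.get("country")

    if category == "waf" and event.get("action") == "deny":
        return Priority.HIGH
    if category == "rate_limit" and event.get("client_ip") not in TRUSTED_IPS:
        return Priority.HIGH
    if category == "protocol_violation":
        return Priority.HIGH
    if country is not None and country not in EXPECTED_COUNTRIES:
        return Priority.HIGH
    if category in MEDIUM_CATEGORIES:
        return Priority.MEDIUM
    return Priority.LOW
```

Keeping the rules in one plain function makes them easy for the security team to read and audit, which matters more here than clever matching logic.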

Component 3: External Processing Infrastructure

Organizations implement filtering layer using:

Serverless computing (AWS Lambda, Azure Functions, Google Cloud Functions)
Advantages: Scales automatically with data volume, pay-per-execution pricing, no infrastructure management
Challenges: Execution time limits, stateless processing model, cold start latency

Stream processing platforms (Apache Kafka, AWS Kinesis, Azure Event Hubs)
Advantages: High throughput, persistent buffering, exactly-once processing semantics where the platform offers them (for example, Kafka transactions)
Challenges: Operational complexity, infrastructure management overhead

Log aggregation tools (Logstash, Fluentd, Vector)
Advantages: Rich filtering capabilities, multiple output destinations, extensive plugin ecosystem
Challenges: Requires dedicated infrastructure, scaling complexity

I prefer serverless computing with direct forwarding to the SIEM itself. One example of the approach is serverless filtering using AWS Lambda:

  1. Akamai DataStream pushes logs to an S3 bucket, with a preset maximum object lifetime to limit the amount of data held in S3
  2. S3 event triggers Lambda function
  3. Lambda applies filtering logic
  4. Filtered events published to SIEM ingestion endpoint
  5. If compliance requires it, full logs are retained in S3 Glacier simply by updating the data lifecycle policy
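Steps 2 to 4 can be sketched as a Lambda handler. Several things here are assumptions rather than facts from the setup described: the log objects are taken to be gzip-compressed, newline-delimited JSON; the `action` field and the `deny` value are placeholders for real filtering logic; and `SIEM_URL` is a hypothetical ingestion endpoint. The S3 reader and the SIEM forwarder are passed in as parameters so the filtering can be exercised locally without AWS access.

```python
import gzip
import json
import os
import urllib.request
from urllib.parse import unquote_plus

# Hypothetical SIEM HTTPS ingestion endpoint, supplied through the environment.
SIEM_URL = os.environ.get("SIEM_URL", "https://siem.example.invalid/ingest")


def is_security_relevant(event: dict) -> bool:
    """Placeholder rule: keep denied requests only ('action' is an assumed field)."""
    return event.get("action") == "deny"


def filter_log_object(raw: bytes) -> list[dict]:
    """Decompress one gzip object of JSON lines and keep the relevant events."""
    kept = []
    for line in gzip.decompress(raw).splitlines():
        if line.strip():
            event = json.loads(line)
            if is_security_relevant(event):
                kept.append(event)
    return kept


def read_s3_object(bucket: str, key: str) -> bytes:
    import boto3  # bundled with the AWS Lambda Python runtime
    return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()


def forward_to_siem(events: list[dict]) -> None:
    """POST the filtered events as newline-delimited JSON to the SIEM endpoint."""
    body = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    request = urllib.request.Request(
        SIEM_URL, data=body, method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def handler(event, context, read=read_s3_object, forward=forward_to_siem):
    """Entry point for the S3 trigger: filter each new object and forward the rest."""
    objects = forwarded = 0
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        kept = filter_log_object(read(bucket, key))
        if kept:
            forward(kept)
        objects += 1
        forwarded += len(kept)
    return {"objects": objects, "forwarded": forwarded}
```

A production version would also need retry handling and SIEM authentication, both of which depend on the SIEM in use and are left out here.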

Cost comparison:

Full SIEM ingestion: ~$80,000 annually
Serverless filtering + selective SIEM: ~$2,400 annually (Lambda execution) + ~$1,200 annually (reduced SIEM consumption) = ~$3,600 total

Cost reduction: 95%


The Compliance Challenge: Retention vs. Analysis

Security operations face competing requirements:

Compliance mandates - Regulations (PCI-DSS, GDPR, SOX, HIPAA) require security log retention ranging from 90 days to 7 years depending on industry and jurisdiction.

Operational analysis - Security teams need rapid query capabilities for incident investigation. Historical trend analysis requires months or years of searchable data.

Economic constraint - SIEM platforms are optimized for rapid search, at high cost. Compliance storage requires cost-efficiency over query performance.

Traditional approach: store everything in SIEM, accept high costs as compliance requirement.

Alternative architecture:

Hot storage (SIEM) - Recent security-relevant events for active investigation. Retention: 30-90 days. Optimized for query performance.

Warm storage (intermediate tier) - Filtered security events beyond active investigation window. Retention: 90 days to 1 year. Slower query performance, lower cost.

Cold storage (compliance archive) - Complete raw logs for compliance requirements. Retention: 1-7 years. Minimal query capability, minimal cost.

Example cost structure:

Tier | Retention | Volume | Cost/GB/Month | Monthly Cost
Hot (SIEM) | 90 days | 45 GB | $50 | $2,250
Warm (S3 Standard) | 30 days | 2 TB | $0.023 | $46
Cold (S3 Glacier) | 5 years | 120 TB | $0.004 | $480

Total: roughly $2,800/month vs. $547,500/month storing everything in SIEM.
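The tiered total can be checked with a short calculation. This sketch uses the volumes and per-GB prices from the example above and assumes 1 TB = 1,000 GB, which is what the per-tier monthly costs imply; the names are illustrative.

```python
# (tier name, stored volume in GB, USD per GB per month) from the example above.
TIERS = [
    ("Hot (SIEM)", 45, 50.0),
    ("Warm (S3 Standard)", 2_000, 0.023),
    ("Cold (S3 Glacier)", 120_000, 0.004),
]

# Monthly cost per tier is simply stored volume times the per-GB monthly price.
monthly_costs = {name: volume_gb * price for name, volume_gb, price in TIERS}
total = sum(monthly_costs.values())

for name, cost in monthly_costs.items():
    print(f"{name:<20} ${cost:>9,.2f}")
print(f"{'Total':<20} ${total:>9,.2f}")
```

The three tiers sum to $2,776, which the text rounds to about $2,800 per month. Nearly all of it comes from the hot tier, so the volume admitted into SIEM is the number worth controlling.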

This architecture requires:

  1. Clear data classification (what goes to which tier)
  2. Automated lifecycle policies (data migration between tiers)
  3. Procedures for cold storage retrieval when needed
  4. Validation that compliance requirements are met
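Item 2, automated lifecycle policies, might look like the following for the S3 part of the architecture. It is a sketch only: the rule ID, the `raw/` prefix, and the 30-day and 5-year values are illustrative choices based on the example tiers, not a recommendation. The function only builds the configuration dictionary; applying it is a single boto3 call, shown as a comment.

```python
def build_lifecycle_config(prefix: str, glacier_after_days: int, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration: warm tier first, then Glacier, then expiry."""
    if glacier_after_days >= expire_after_days:
        raise ValueError("objects must move to Glacier before they expire")
    return {
        "Rules": [
            {
                "ID": "akamai-raw-logs-tiering",  # illustrative rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move raw logs from S3 Standard (warm) to Glacier (cold).
                "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
                # Delete once the compliance retention period has passed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# 30 days warm, 5 years (1,825 days) total retention, as in the example tiers.
config = build_lifecycle_config("raw/", glacier_after_days=30, expire_after_days=5 * 365)

# To apply it (requires AWS credentials; the bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="akamai-raw-logs", LifecycleConfiguration=config)
```

Keeping retention values as explicit parameters makes it easier to show an auditor where the compliance period is enforced.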

The Real-Time Processing Challenge

Security monitoring requires near-real-time visibility. Attackers operate on timescales measured in minutes. Detection delays measured in hours render defenses ineffective.

External filtering architecture introduces processing latency:

Typical data flow timeline:

  • T+0: Event occurs at Akamai edge
  • T+60 seconds: Event aggregated in DataStream batch
  • T+90 seconds: Batch delivered to S3
  • T+95 seconds: S3 event triggers Lambda function
  • T+100 seconds: Lambda processes and filters data
  • T+105 seconds: Filtered events forwarded to SIEM
  • T+120 seconds: SIEM indexes data for query

Total latency: ~2 minutes from event occurrence to SIEM visibility.

This delay affects visibility only - Akamai itself blocks attack traffic as soon as rate limits are breached or malicious content is detected. Blocking is immediate.

Architectural solution: Multi-tier detection

Tier 1: Real-time blocking at Akamai
WAF rules configured to automatically block high-confidence attacks. No SIEM involvement. Immediate response.

Tier 2: Near-real-time SIEM alerting
Filtered events trigger SIEM correlation rules. Security team investigates within minutes. Suitable for medium-confidence threats requiring human analysis.

Tier 3: Historical analysis
Cold storage enables investigation of attack campaigns spanning weeks or months. Latency irrelevant for retrospective analysis.

This separates immediate response (automated) from investigation (human-driven with acceptable latency).


What We Built at the Financial Institution

Initial state: Akamai SIEM integration attempted direct log forwarding. SIEM storage exhausted in 4 days. Retention reduced to 10 days to manage capacity. Compliance requirements violated. Cost exceeded budget by $120,000 annually.

Transformation architecture:

AWS Lambda filtering layer

  • Akamai DataStream → S3 bucket
  • Lambda triggered by S3 events
  • Filtering logic: attack score thresholds, geographic anomalies, rate limit violations
  • Filtered events → SIEM via HTTPS endpoint

Tiered storage strategy

  • Hot (SIEM): 90 days, security-relevant events only
  • Warm (S3 Standard): 30 days, raw logs
  • Cold (S3 Glacier): 5 years, complete raw logs

Volume reduction

  • Akamai telemetry: 47 GB/day
  • After filtering: 890 MB/day (98% reduction)
  • SIEM consumption decreased from 73% to 4% of total capacity

Cost impact

  • Previous annual cost: $142,000 (SIEM licensing + overages)
  • Post-implementation: $5,800 (Lambda execution + reduced SIEM + S3 storage)
  • Cost reduction: 96%

Compliance validation

  • Complete raw logs retained 5 years (regulatory requirement)
  • Security-relevant events queryable in SIEM (operational requirement)
  • Audit trail demonstrating data handling procedures

Timeline: 8 weeks from design to production deployment.

Operational improvement: SIEM query performance improved 40x (smaller dataset to search). Security team response time decreased from minutes to seconds during active investigations.

The integration that broke SIEM economics became a cost-effective operational capability.


The Organizational Challenge: Who Owns the Filtering Layer?

External filtering architecture creates organizational complexity:

Security team perspective: "We need all security data for complete visibility. Filtering might miss critical events."

Infrastructure team perspective: "SIEM costs are out of control. We need aggressive filtering to manage budget."

Compliance perspective: "Regulations require comprehensive log retention. We cannot selectively discard data."

These perspectives aren't wrong - they represent legitimate but competing priorities.

Resolution requires:

Joint ownership model - Security team defines what constitutes a security-relevant event. Infrastructure team implements filtering infrastructure. Compliance validates retention policies.

Transparency in filtering logic - Document exactly what gets filtered and why. Security team can audit filtering rules to ensure critical events aren't excluded.

Escape valve for investigation - When security team needs comprehensive data for specific investigation, they can query cold storage directly. Slower but possible.

Quarterly review process - Evaluate filtering effectiveness. Are we missing important events? Are we forwarding noise that should be filtered?

At the bank, we discovered filtering logic initially excluded important correlation signals. Security team couldn't identify attack campaigns because filtering removed context needed to connect related events.

Refinement: Instead of binary filtering (forward or discard), we implemented sampling for medium-priority events. Don't forward every instance, but forward a representative sample. Security team gained pattern visibility without overwhelming SIEM.
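One way to sample medium-priority events while keeping pattern visibility is to sample per key rather than globally, so that every distinct rule and source combination still appears at least once. The sketch below illustrates that idea and is not the logic used at the bank; the `rule_id` and `client_ip` field names and the 1-in-100 rate are assumptions.

```python
from collections import Counter


def sample_medium_priority(events, key_fields=("rule_id", "client_ip"), keep_every=100):
    """Keep the first event for each key, then every `keep_every`-th one after it."""
    seen = Counter()
    kept = []
    for event in events:
        key = tuple(event.get(field) for field in key_fields)
        if seen[key] % keep_every == 0:
            kept.append(event)
        seen[key] += 1
    return kept


# 250 repeats of one pattern plus a single rare event from another source.
batch = [{"rule_id": "custom-17", "client_ip": "198.51.100.7"}] * 250
batch.append({"rule_id": "custom-42", "client_ip": "192.0.2.9"})

sampled = sample_medium_priority(batch)
print(len(batch), "->", len(sampled))  # 251 -> 4
```

The counters live only for the duration of one batch, which suits a stateless function processing one log object at a time. Sampling consistently across batches would need shared state.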

This required iterative refinement over 3 months. Initial filtering logic was too aggressive. Adjusted based on security team feedback. Converged on balance between cost control and visibility.

The technical implementation is straightforward. The organizational alignment process requires time.


References

  1. Gartner Research. (2023). "Market Guide for Security Information and Event Management." Gartner ID G00789932.
  2. Akamai Technologies. (2024). "DataStream 2 User Guide." Akamai TechDocs. https://techdocs.akamai.com/datastream2/docs
  3. Amazon Web Services. (2024). "AWS Lambda Pricing." AWS Documentation. https://aws.amazon.com/lambda/pricing/
  4. NIST. (2023). "Guide to Computer Security Log Management." NIST Special Publication 800-92.
  5. PCI Security Standards Council. (2023). "Payment Card Industry Data Security Standard v4.0." https://www.pcisecuritystandards.org/
  6. Splunk. (2024). "Splunk Pricing and Licensing." Splunk Documentation.
  7. Elastic. (2024). "Logstash Reference." Elastic Documentation.
  8. Cloud Security Alliance. (2023). "Security Guidance for Cloud Computing v5.0." CSA Industry Guidance.
  9. Apache Software Foundation. (2024). "Apache Kafka Documentation." https://kafka.apache.org/documentation/
  10. Miller, D., et al. (2021). "Cost-Effective Security Event Management Through Intelligent Log Filtering." IEEE Security & Privacy.