Security Data Lake

Their SIEM was choking. Retention was capped at 90 days because of storage costs, which left no way to do historical analysis or hunt for slow-burn attacks. When they needed to investigate something from six months earlier, the data was simply gone.

Security isn't just about real-time alerts. Sometimes you need to look back and connect dots that weren't obvious at the time, and you can't do that if you're throwing away evidence every 90 days.

I built a security data lake in BigQuery that could store years of telemetry at a fraction of the SIEM's cost. Cloud logs, network flows, endpoint data, authentication events: everything went into the lake with multi-year retention.

The architecture separated hot and cold data. Recent events stayed in the SIEM for real-time detection, and everything older than 90 days moved to the lake for long-term analysis. Analysts could query across both tiers seamlessly.

That opened up threat hunting that wasn't possible before. By analyzing authentication patterns across the entire dataset, I found an APT that had been operating for eight months. The attacker was careful: never triggering rate limits, always operating during business hours, slowly expanding access. Individual events looked normal. The pattern over months was clearly malicious.

I also built ML models on the historical data to establish behavioral baselines: what's normal for this user, at this time of day, for this type of resource access. Those baselines surfaced anomalies that rule-based detection would miss.

The cost savings were significant. SIEM costs dropped 60% while retention actually increased from 90 days to 3 years. More data, better analysis, lower cost: a rare trifecta in security. And when the firm was subpoenaed for security records in a legal case, we could provide complete evidence instead of saying "that data was deleted."
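The idea of a behavioral baseline can be illustrated with a much simpler technique than the ML models used in this engagement: a per-user statistical baseline. The sketch below assumes a hypothetical `AuthEvent` record (user, timestamp, resource type) already pulled from the lake. It buckets activity by user, hour of day, and resource type, then scores a new day's count as a z-score against that history. The names `AuthEvent`, `build_baseline`, `anomaly_score`, and the `min_std` floor are illustrative assumptions, not the production schema or models.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean, pstdev


# Hypothetical event shape; the real lake schema is not described here.
@dataclass(frozen=True)
class AuthEvent:
    user: str
    timestamp: datetime
    resource_type: str


BaselineKey = tuple[str, int, str]  # (user, hour of day, resource type)


def build_baseline(events: list[AuthEvent]) -> dict[BaselineKey, tuple[float, float]]:
    """Mean and population std of daily event counts per (user, hour, resource type)."""
    counts: dict[BaselineKey, dict[date, int]] = defaultdict(lambda: defaultdict(int))
    all_days: set[date] = set()
    for e in events:
        key = (e.user, e.timestamp.hour, e.resource_type)
        counts[key][e.timestamp.date()] += 1
        all_days.add(e.timestamp.date())

    baseline = {}
    for key, per_day in counts.items():
        # Days with no activity count as zero, so sporadic access stays "rare".
        series = [per_day.get(d, 0) for d in all_days]
        baseline[key] = (mean(series), pstdev(series))
    return baseline


def anomaly_score(baseline, user, hour, resource_type, observed, min_std=1.0):
    """Z-score of an observed daily count; combinations never seen before have mean 0."""
    mu, sigma = baseline.get((user, hour, resource_type), (0.0, 0.0))
    # Floor the std so perfectly regular behavior doesn't divide by zero.
    return (observed - mu) / max(sigma, min_std)


# Usage: 30 days of history where "alice" logs into a database twice at 9 a.m.
history = [
    AuthEvent("alice", datetime(2025, 1, day, 9, minute), "database")
    for day in range(1, 31)
    for minute in (5, 40)
]
baseline = build_baseline(history)
print(anomaly_score(baseline, "alice", 9, "database", 2))  # 0.0: matches the baseline
print(anomaly_score(baseline, "alice", 3, "database", 5))  # 5.0: never seen at 3 a.m.
```

A per-day score like this catches bursts. A careful attacker who stays inside normal hours and volumes only stands out when the same comparison runs across months of history, which is what multi-year retention makes possible.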

Client

Financial Services Firm

Deliverables

Centralized ingestion
Long-term retention
Advanced analytics
Threat hunting

Year

2026

Role

Security Data Engineer
