
In February 2026, a small startup received a large Google Cloud bill after a privilege escalation incident. Its API key, originally used for Google Maps and Firebase, inherited Gemini API permissions once the Gemini API was enabled, and attackers later exploited it. By the time a developer detected the issue, the cost had already reached $82,314. When he revoked the key immediately, it broke the startup's own production environment.
This incident raises a key question: Could better incident response practices have reduced the impact?
Revoke before mapping dependencies, and you break production. Delay while scoping, and you extend the exposure window. This tradeoff is exactly why incident response needs a structured approach, such as the one described in NIST SP 800-61 Rev. 3 and CSF 2.0.
This article adapts that guidance phase by phase to help your team build a decision-gated sequence for responding to secrets incidents more effectively and efficiently.
The table below provides a quick view of each phase in a secrets incident playbook, including what to do, the decision gate, and common mistakes to avoid.
| Phase | What you do | Decision gate | Common mistake |
|---|---|---|---|
| 1. Detection | Monitor for anomaly signals such as API spikes, unusual IP traffic, and unexpected resource usage. Include collaboration tools in scanning. | Is this a confirmed credential-related incident or noise? | Treating operational alerts as security signals or relying on a single signal |
| 2. Blast radius analysis | Identify all services, pipelines, and environments that use the credential. Trace dependencies and assess permissions. | Do we fully understand where the credential is used and what it can access? | Revoking before mapping dependencies or limiting scope to the leak source |
| 3. Containment | Revoke the compromised credential at the provider and verify invalidation. Notify relevant teams and track MTTRv. | Has the credential been fully invalidated without causing unintended outages? | Revoking without notifying stakeholders or delaying due to uncertainty |
| 4. Rotation | Issue a new credential, propagate it across systems, validate usage, then revoke the old one. Use an overlap window. | Is the new credential working correctly across all dependent systems? | Rotating after revocation or skipping the overlap window, leading to race conditions |
| 5. History cleanup | Remove exposed secrets from Git history and logs using tools like BFG. Coordinate the force push and team re-clone. | Has the secret been fully removed from all historical records? | Assuming that deleting a secret in a new commit removes it from history, or skipping team coordination |
| 6. Post-incident | Document the incident, assign owners, track metrics, and tie root causes to guardrails. | Have we clearly documented the root cause and defined the corrective actions required to prevent recurrence? | Treating the incident as resolved without capturing lessons or improvements |
With the phases summarized, the next step is to define the roles responsible for each one.
During a crisis, one of the main causes of delay is unclear authority. When no one is explicitly assigned to a phase, people hesitate. This does not mean every responsibility needs a dedicated person: in small teams, a single engineer may act as incident handler and hold several roles. The goal is to define each role and its responsibilities clearly, then assign members of your incident response team to those roles. The table below maps this out.
| Role | Phase owned | Core responsibility | Escalation authority |
|---|---|---|---|
Incident lead | 1 | Managing the timeline and removing blockers | Can declare a "critical incident" and escalate response |
Blast radius analyst | 2 | Identifying all consumers (application, CI/CD, MCP) | Can veto immediate revocation if risk is too high |
Revocation owner | 3 & 5 | Executing credential revocation and Git history cleanup | Can force-push to main without standard PR review |
Communications lead | 3 & 4 | Managing the notification clock (GDPR/SOC 2) | Can authorize public or customer-facing statements |
Post-incident owner | 6 | Root cause analysis and long-term fixes | Can mandate security-focused remediation work |
Note that when personal data is involved, GDPR requires the organization, as data controller, to notify the supervisory authority within 72 hours of becoming aware of the breach, and the communications lead typically owns that deadline. The clock starts at awareness, not when the investigation is complete.
Now that roles are clearly aligned to responsibilities, the next step is to walk through each phase in detail.
The first step is configuring your systems to surface the right signals. Your detection setup should answer three questions: Did something abnormal happen? Is a credential involved? Is this likely a real threat or just noise?
In most workflows, teams set up alerts for different types of events, such as system failures and security incidents. Keep these signals separate. Operational alerts such as OOM kills, deployment errors, and uptime issues rarely point to a secrets incident. Instead, configure your systems to watch for API anomalies, traffic from unfamiliar IP addresses, and unexpected cloud resource consumption, and make these signals visible to your security operations center. A sudden spike in requests to an admin endpoint, or a cluster unexpectedly doubling in size, should be traced back to the credentials behind that activity.
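As a sketch of what "API anomaly" can mean in practice, the function below compares the current per-minute request count for one credential against a rolling baseline. The three-sigma threshold, the minimum history length, and the floor on the spread are illustrative assumptions, not recommended values; tune them against your own traffic.

```python
from statistics import mean, stdev


def is_spike(history: list[float], current: float, k: float = 3.0, min_points: int = 10) -> bool:
    """Return True if `current` exceeds the baseline mean by more than k standard deviations.

    `history` holds recent per-minute request counts for a single credential.
    """
    if len(history) < min_points:
        return False  # too little baseline data to judge; rely on other signals
    baseline = mean(history)
    # Floor the spread at 1 request/minute so a perfectly flat baseline
    # (standard deviation 0) does not flag every tiny fluctuation.
    spread = max(stdev(history), 1.0)
    return current > baseline + k * spread
```

A spike on its own is only a lead: it still has to be tied back to a specific credential and corroborated before it counts as an incident.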
Aside from detection within your infrastructure, teams often overlook collaboration tools such as Slack, Jira, or Confluence. GitGuardian’s State of Secrets Sprawl 2026 report revealed that 28% of incidents occur entirely outside of source code. An engineer might paste production snippets or temporary keys into Slack to help debug an issue. That secret becomes searchable in the tool and may be cached on multiple devices. To address this, implement secret scanning across collaboration tools and their APIs to detect these exposures.
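A minimal sketch of collaboration-tool scanning, assuming you already receive message text from somewhere (a Slack export, an Events API subscription, a Jira webhook; that plumbing is left out). The patterns below match widely used token prefixes and are illustrative only; a maintained scanner covers far more formats.

```python
import re

# Widely used token prefixes; illustrative, not an exhaustive rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"(?:AKIA|ASIA)[0-9A-Z]{16}"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "github_classic_pat": re.compile(r"ghp_[0-9A-Za-z]{36}"),
    "stripe_live_secret_key": re.compile(r"sk_live_[0-9A-Za-z]{24,}"),
}


def scan_message(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, redacted_token) pairs found in one chat message."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            token = match.group(0)
            # Redact before reporting so the alert itself does not re-leak the secret
            findings.append((name, f"{token[:4]}...{token[-4:]}"))
    return findings
```

For example, `scan_message("try AKIAIOSFODNN7EXAMPLE in staging")` reports one `aws_access_key_id` finding, redacted as `AKIA...MPLE`.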
In practice, initial detection is rarely based on a single signal. Combine secret scanning alerts, threat intelligence, usage anomalies, and billing spikes to confirm a breach. Once a credential-related incident is verified, the next step is to map its blast radius.
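One way to encode "more than one signal" is a simple corroboration gate: a credential is treated as a confirmed incident only when at least two independent sources point at it. The `Signal` shape, the source names, and the threshold of two are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str         # e.g. "secret_scanner", "billing", "threat_intel", "usage_anomaly"
    credential_id: str  # the credential the signal points at


def confirmed_incidents(signals: list[Signal], min_sources: int = 2) -> set[str]:
    """Return credential IDs backed by at least `min_sources` distinct signal sources."""
    sources_by_credential: dict[str, set[str]] = {}
    for signal in signals:
        # A set ensures repeated alerts from the same source count only once
        sources_by_credential.setdefault(signal.credential_id, set()).add(signal.source)
    return {cred for cred, sources in sources_by_credential.items() if len(sources) >= min_sources}
```

Counting distinct sources rather than raw alerts keeps a single noisy billing alarm from declaring an incident on its own.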
Blast radius analysis maps every place where an affected secret is stored or used. Scoping the analysis to where the leak happened is not enough: you must identify every service, pipeline, and environment that references the credential.
For example, if you trace an exposed Stripe API key to a payment microservice, also map the systems that microservice triggers, such as a fulfillment worker and a subscription job, plus any other place the key appears, such as MCP configuration files. This gives you a clear chain of dependencies.

The dependency chain helps you identify where fallback paths or temporary alternatives are needed to prevent critical services from failing during incident response.
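If you record which systems consume a credential and which systems each consumer triggers, the dependency chain becomes a graph walk. The sketch below is a breadth-first traversal over a hypothetical, hand-maintained dependency map; the node names mirror the Stripe example and are placeholders.

```python
from collections import deque


def blast_radius(graph: dict[str, list[str]], credential: str) -> list[str]:
    """Return every system that depends on `credential`, directly or transitively,
    in breadth-first discovery order."""
    seen = {credential}
    order = []
    queue = deque([credential])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in seen:  # skip shared or cyclic dependencies
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Hypothetical dependency map for the Stripe example
graph = {
    "stripe_api_key": ["payment-service", "mcp-config.json"],
    "payment-service": ["fulfillment-worker", "subscription-job"],
    "subscription-job": ["payment-service"],  # a cycle, handled by `seen`
}
```

Here `blast_radius(graph, "stripe_api_key")` returns the direct consumers first (`payment-service`, `mcp-config.json`), then the downstream jobs, which is a reasonable order for deciding where fallbacks are needed.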
Beyond preventing outages, mapping also forces you to examine the permissions the secret holds. A key used only to confirm payments might still have broader access to sensitive data, such as customer PII. Flag these cases during analysis, as they may require legal or compliance notification before revocation.
The table below provides a clear reference for what to flag for escalation based on the scope of access.
| Credential type | Access scope | Priority | Escalation required |
|---|---|---|---|
Root / admin cloud key | Full infrastructure / IAM | Critical | Legal, CTO, CISO |
Database connection string | PII / user data | High | Legal, data privacy officer |
Production API key | Service-to-service flow | Medium | Product / engineering lead |
CI/CD deployment token | Build and deploy systems | High | DevOps lead |
Dev / sandbox key | Non-production data | Low | Engineering lead |
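
Encoding the table in your response tooling removes one judgment call under pressure. The sketch below mirrors the table; the credential-type keys are hypothetical identifiers, and the PII override (raise to at least High and add Legal) is an assumption drawn from the earlier point that narrowly scoped keys can still reach sensitive data.

```python
# Mirrors the escalation table: credential type -> (priority, who to escalate to)
ESCALATION = {
    "root_admin_cloud_key": ("Critical", ["Legal", "CTO", "CISO"]),
    "database_connection_string": ("High", ["Legal", "Data privacy officer"]),
    "production_api_key": ("Medium", ["Product / engineering lead"]),
    "cicd_deployment_token": ("High", ["DevOps lead"]),
    "dev_sandbox_key": ("Low", ["Engineering lead"]),
}


def escalation_for(credential_type: str, can_read_pii: bool = False) -> tuple[str, list[str]]:
    """Look up priority and escalation contacts, raising them if the key can reach PII."""
    priority, contacts = ESCALATION[credential_type]
    contacts = list(contacts)  # copy so the shared table is never mutated
    if can_read_pii:
        # Assumed policy: PII access means at least High priority and a Legal contact
        if priority in ("Medium", "Low"):
            priority = "High"
        if "Legal" not in contacts:
            contacts.append("Legal")
    return priority, contacts
```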
After you have fully determined the blast radius, you can proceed to containment.
Containment is where you stop the bleeding. Urgency is expected, but doing it safely matters just as much. Before revoking any credential, notify the individuals or teams responsible for it. For example, if production credentials are involved, notify the engineering lead or on-call staff. If the secret involves PII, inform the legal team and identify any potential impact on affected users.
Revoke the credential at the issuing provider: an AWS access key within the AWS account that owns it, a GitHub PAT within GitHub. Before proceeding, verify that the credential is no longer usable by confirming that requests made with it are rejected. Many providers return a 401, though some respond with a 403 or another error code, so check what yours returns. Also track the Mean Time to Revoke (MTTRv).
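Revocation checks are worth scripting, because some providers take a short while to propagate a revocation. A minimal sketch using only the standard library: `http_status` makes one request with the old credential, and `verify_revoked` polls until the provider rejects it. The endpoint, headers, retry count, and the set of "rejected" status codes are assumptions you would adapt per provider.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, headers: dict[str, str]) -> int:
    """Send one GET with the old credential in `headers` and return the HTTP status.

    Network failures (URLError) are deliberately not caught: an unreachable
    endpoint must never be mistaken for a revoked credential.
    """
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as HTTPError


def verify_revoked(
    check: Callable[[], int],
    rejected: tuple[int, ...] = (401, 403),
    attempts: int = 5,
    delay_s: float = 10.0,
) -> bool:
    """Poll `check` until the old credential is rejected, allowing for propagation delay."""
    for attempt in range(attempts):
        if check() in rejected:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

A hypothetical call would be `verify_revoked(lambda: http_status("https://api.example.com/v1/me", {"Authorization": "Bearer OLD_KEY"}))`. Passing the check in as a function keeps the polling logic provider-agnostic.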
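The MTTRv formula is:

$$\text{MTTRv} = \frac{1}{N}\sum_{i=1}^{N}\left(t^{\text{revoke}}_i - t^{\text{detect}}_i\right)$$

where $N$ is the number of incidents in the measurement period, and $t^{\text{detect}}_i$ and $t^{\text{revoke}}_i$ are the detection and verified revocation timestamps of incident $i$.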
This measures how long it takes to invalidate a compromised credential after detection and helps you evaluate how quickly incidents are contained.
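To make the metric concrete, here is a minimal Python sketch that averages the gap between detection and verified revocation across incidents. Representing each incident as a `(detected_at, revoked_at)` pair is an assumption; in practice these timestamps would come from your incident records.

```python
from datetime import datetime, timedelta


def mttrv(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean Time to Revoke: the average of (revoked_at - detected_at) over incidents."""
    if not incidents:
        raise ValueError("MTTRv needs at least one incident")
    total = sum((revoked - detected for detected, revoked in incidents), timedelta())
    return total / len(incidents)
```

With a single incident detected at 09:00 UTC and revoked at 09:45 UTC, `mttrv` returns 45 minutes; adding a second incident that took 15 minutes brings the mean down to 30 minutes.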
Slow MTTRv may not seem tied to how you manage secrets, but when the same secret is copied across multiple systems, you have less confidence in the blast radius. You spend extra time verifying dependencies before revoking, which increases MTTRv. With centralized storage, dependencies are easier to trace, which allows faster and more confident revocation. The same factor also affects how quickly you can update secrets during rotation.
The aim of this phase is to replace credentials without significant downtime and restore normal operations. Two factors affect this goal: how secrets are managed and how rotation is executed. The first directly affects the second.
When secrets are not managed centrally, you often end up updating every copy by hand, in each system that holds one, after revocation.
With centralized secrets management, updates originate from a single source and propagate across connected systems. For example, Doppler allows cross-environment propagation of secrets. A single root configuration in production can push updated secrets to services such as Kubernetes, Vercel, and GitHub. This reduces the risk of downtime, but correct rotation procedures are still required to avoid it.
A common mistake is assuming that rotation should happen after revocation. Deleting an old credential and then injecting a new one does not guarantee zero downtime. It often creates a race condition: some service instances have already received the new key, while others are still using the old one and start failing.
To reduce this risk, implement a dual-credential state. Configure your workflow with an overlap window (usually between five and fifteen minutes) during which the old credential stays valid alongside the new one. This allows the new secret to propagate fully before the old one is revoked, helping prevent downtime during rotation.
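When you control the verifying side (for example, an internal service-to-service key), the overlap window can live in the verifier itself. The sketch below keeps the current key plus the previous one until a deadline; the class name and the ten-minute default are assumptions. For third-party providers, the overlap instead comes from the provider allowing two active keys at once, as AWS IAM does with up to two access keys per user.

```python
import hmac
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class DualCredential:
    """Verifier-side state: the current key plus, temporarily, the previous one."""
    current: str
    previous: Optional[str] = None
    previous_expires_at: float = 0.0  # epoch seconds

    def rotate(self, new_key: str, overlap_s: float = 600.0, now: Optional[float] = None) -> None:
        """Promote `new_key` and keep the old key valid for `overlap_s` seconds."""
        now = time.time() if now is None else now
        self.previous, self.current = self.current, new_key
        self.previous_expires_at = now + overlap_s

    def accepts(self, presented: str, now: Optional[float] = None) -> bool:
        """Accept the current key, or the previous key until the overlap window closes."""
        now = time.time() if now is None else now
        # compare_digest avoids leaking key contents through comparison timing
        if hmac.compare_digest(presented.encode(), self.current.encode()):
            return True
        return (
            self.previous is not None
            and now < self.previous_expires_at
            and hmac.compare_digest(presented.encode(), self.previous.encode())
        )
```

After `rotate("new-key", overlap_s=600, now=1000.0)`, both keys authenticate until t=1600; from then on only the new key does.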
Before finalizing rotation, confirm that services depending on the new key can authenticate and complete requests without errors. If the old key has already been revoked and the new one is misconfigured, it can trigger a secondary outage.
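This sequence can be used as a reference for safe rotation:
1. Issue a new credential while the old one stays active.
2. Propagate the new credential to every consumer identified during blast radius analysis.
3. Keep both credentials valid for the overlap window.
4. Validate that each dependent service authenticates and completes requests with the new credential.
5. Revoke the old credential at the provider.
6. Verify that requests using the old credential are rejected.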
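A minimal orchestration sketch of this issue, propagate, validate, then revoke order. `Provider`, the `consumers` installers, and the `uses_new_key` check are hypothetical hooks you would wire to your own issuer and deployment tooling; the key property is that a failed validation aborts before the old key is revoked.

```python
import time
from typing import Callable, Dict, Protocol


class Provider(Protocol):
    """Hypothetical wrapper around your credential issuer (cloud IAM, GitHub, Stripe, ...)."""

    def issue(self) -> str: ...
    def revoke(self, key: str) -> None: ...


def rotate(
    provider: Provider,
    old_key: str,
    consumers: Dict[str, Callable[[str], None]],  # consumer name -> installs a key there
    uses_new_key: Callable[[str], bool],           # consumer name -> authenticates with new key?
    overlap_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    new_key = provider.issue()                # issue; the old key is still valid
    for install in consumers.values():        # propagate to every known consumer
        install(new_key)
    sleep(overlap_s)                          # overlap window: both keys work
    failing = [name for name in consumers if not uses_new_key(name)]  # validate
    if failing:
        # Stop before revoking: the old key still works, so nothing breaks
        raise RuntimeError(f"new key not working for: {', '.join(failing)}")
    provider.revoke(old_key)                  # revoke only after validation succeeds
    return new_key                            # then confirm the old key is rejected
```

Injecting `sleep` makes the function easy to test and lets you plug in a smarter wait, such as polling until all consumers report the new key.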
To fulfill compliance and auditing requirements, document when the secret was rotated and which systems were affected. Use a centralized system to make it easier to maintain that audit trail. Doppler, for example, provides rotation event logs that can be used during post-incident reviews.
This phase is about ensuring there are no leftover exposed secrets in your logs or Git history. A common misconception is that deleting secrets in a new commit removes them from history. It doesn’t. The secret still exists in previous commits. Anyone who clones the repository can check out earlier revisions or use scanning tools like TruffleHog to discover it.
After rotation and revocation, use BFG Repo-Cleaner to remove secrets and sensitive data from the repository history. While git-filter-repo can also be used, BFG is often preferred for speed.
After the rewrite, expire the reflog and garbage-collect old objects in your local copy so the secret cannot be pushed again. Then force push the cleaned history to the remote.
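The sketch below lists the usual BFG clean-up sequence as argument lists, run from Python for consistency with the other examples. It assumes `bfg.jar` is available locally and that `replacements_file` lists the leaked values, one per line (BFG replaces each with `***REMOVED***`); the file names and working directory are placeholders. Note that BFG leaves the contents of your latest commit untouched by default, so remove the secret from the current files in a normal commit first, and delete the replacements file afterward, since it contains the secret.

```python
import subprocess
from typing import List


def bfg_cleanup_commands(repo_url: str, replacements_file: str,
                         bfg_jar: str = "bfg.jar",
                         workdir: str = "repo-mirror.git") -> List[List[str]]:
    """Build the clean-up sequence: mirror clone, rewrite, purge old objects, push."""
    return [
        ["git", "clone", "--mirror", repo_url, workdir],
        # Replace every value listed in replacements_file throughout history
        ["java", "-jar", bfg_jar, "--replace-text", replacements_file, workdir],
        # Drop reflog entries and unreachable objects that still hold the secret
        ["git", "-C", workdir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", workdir, "gc", "--prune=now", "--aggressive"],
        # A mirror clone pushes all refs; --force overwrites the rewritten history
        ["git", "-C", workdir, "push", "--force"],
    ]


def run(commands: List[List[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```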
Cleaning history is a sensitive operation. If a team member still has a local branch based on the old history, pushing changes can reintroduce the secret. Instruct all team members to delete their local copies and re-clone the cleaned repository.
After cleanup, run a full repository scan to verify that no secrets remain in history, and then proceed with post-incident activities.
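For a targeted check on the specific leaked value, Git's pickaxe search is a useful complement to a full scanner. `git log -S` lists commits whose diffs change how often a string occurs, so any commit that introduced or removed the secret shows up; an empty result after cleanup suggests the value is gone from all refs. This is a sketch: passing the secret as a command-line argument exposes it in process listings, so run it only on a trusted machine.

```python
import subprocess
from typing import List


def pickaxe_command(repo: str, secret: str) -> List[str]:
    # --all searches every ref; --format=%H prints one commit hash per line
    return ["git", "-C", repo, "log", "--all", "--format=%H", "-S", secret]


def commits_containing(repo: str, secret: str) -> List[str]:
    """Return hashes of commits that still add or remove `secret`; empty means clean."""
    result = subprocess.run(pickaxe_command(repo, secret),
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]
```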
The goal of incident response is not only to fix the data breach but also to ensure the organization is protected against that specific failure mode. An important step toward this is producing a detailed incident report that captures the incident, the corrective actions taken, their owners, and associated metrics. This document can be used to evaluate how the incident was handled, define strategies to prevent recurrence, and strengthen the organization's security posture.
Include details such as detection timestamp, blast radius, revocation timestamp, MTTRv, root cause, and corrective actions. Map each action to an owner to make it verifiable.
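Keeping the report as structured data makes the "every action has an owner" rule checkable. A minimal sketch, assuming these field names (they follow the list above but are not a standard schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class CorrectiveAction:
    description: str
    owner: str          # named person or team; makes the action verifiable
    done: bool = False


@dataclass
class IncidentRecord:
    incident_id: str
    detected_at: datetime
    revoked_at: datetime
    blast_radius: List[str]
    root_cause: str
    actions: List[CorrectiveAction] = field(default_factory=list)

    @property
    def mttrv(self) -> timedelta:
        """Time to revoke for this single incident."""
        return self.revoked_at - self.detected_at

    def unowned_actions(self) -> List[CorrectiveAction]:
        """Actions with no owner; the report is not ready to close while any exist."""
        return [a for a in self.actions if not a.owner.strip()]
```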
Alongside the documentation, tie each root cause to a technical guardrail. The post-mortem should clearly show how to prevent the failure from happening again and mitigate threats. For example, map hardcoded secrets in .env or .json files to pre-commit hooks that block such commits. For secrets exposed in Slack or Jira, deploy a secret scanning bot for collaboration tools and APIs.
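As an illustration of the pre-commit guardrail, here is a minimal Git hook written in Python. It scans only lines added in the staged diff and blocks the commit if any look like secrets. The patterns are illustrative assumptions, far narrower than a maintained scanner; to try it, save it as `.git/hooks/pre-commit` and make it executable.

```python
#!/usr/bin/env python3
"""Minimal pre-commit hook sketch: block commits that stage likely secrets."""
import re
import subprocess
import sys

# Illustrative patterns only; a maintained scanner covers far more formats.
SECRET_PATTERNS = [
    re.compile(r"(?:AKIA|ASIA)[0-9A-Z]{16}"),
    re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    re.compile(r"(?i)(?:api[_-]?key|secret|token)\s*[:=]\s*['\"]?[0-9A-Za-z_\-]{16,}"),
]


def find_secrets(text: str) -> list[str]:
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]


def added_lines(diff: str) -> str:
    """Keep only added lines, skipping the '+++ b/path' file headers."""
    return "\n".join(
        line[1:] for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++ ")
    )


def main() -> int:
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(added_lines(diff))
    if hits:
        print(f"pre-commit: blocked, {len(hits)} possible secret(s) staged", file=sys.stderr)
        return 1  # a non-zero exit status makes Git abort the commit
    return 0


if __name__ == "__main__":
    sys.exit(main())
```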
Below is an example of a completed incident record:
Incident ID: SEC-2026-05-02-GCP
Severity: Critical
Summary: Unauthorized Gemini API usage via leaked legacy Maps key.
| Field | Value | Notes |
|---|---|---|
Detection timestamp | 2026-05-02 09:00 UTC | Triggered by unexpected cloud resource consumption |
Revocation timestamp | 2026-05-02 09:45 UTC | Verified via 401 response on Gemini endpoint |
MTTRv | 45 minutes | Target for this tier is < 60 minutes |
Blast radius | GCP Project | Impacted: Payment-Service, Search-UI |
Root cause | Privilege escalation | Legacy key inherited Gemini permissions without restrictions |
Financial impact | $1,280.00 | Contained before costs reached the “hockey stick” growth curve |
Corrective actions:
Just as teams plan sprints and stages in a software development cycle, secrets incident response should be planned before it is needed. It should follow a standard procedure that is tested and continuously improved until it is efficient enough to stop leaks quickly and meet compliance requirements. Use this playbook as a template for your team, and create a clear sequence of activities that lets you act without introducing new risks.



