
For a single developer working locally, running `export GITHUB_TOKEN=...` before starting a Model Context Protocol (MCP) server is common. It is quick and convenient. However, the same pattern becomes a problem when it is extended to production secrets and production environments.
In production, credentials must have a defined rotation policy. Security teams require SOC 2 audit logs, and multiple engineers need controlled access without manually sharing a single token. None of that works when secrets are shell-exported into runtime environments.
This article focuses on building secure and scalable MCP servers. It covers production architecture for secrets management, deployment patterns, runtime security, multi-user access, monitoring, and incident response.
Building secure and scalable production MCP servers takes more than a local workflow: it takes deliberate choices about architecture, credentials, access control, and operations.
There are five security principles you must apply to your MCP workflow for your systems to be considered resilient. Treat them as a universal baseline for designing and enforcing security policies; otherwise, as systems scale, security unravels from the foundation. Let's walk through them.
Your MCP servers should never be given full admin rights to your system. Always grant the absolute minimum permissions required to perform specific tasks. For a GitHub MCP server, this could mean granting only the `metadata:read` scope when repository context is all that is needed.
Least privilege should also apply to tool exposure. MCP servers should dynamically limit the tools returned by `list_tools` based on the authenticated user's permissions. If a user does not have write access, destructive tools such as `delete_repository` should not be exposed at all.
Limiting privilege reduces the blast radius of a breach and an attacker's ability to move through your entire system. Below is a checklist for implementing least privilege.
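The dynamic tool filtering described above can be sketched in a few lines. The tool names, permission levels, and data shapes here are illustrative, not taken from a specific MCP SDK:

```javascript
// Sketch of least-privilege tool exposure: filter the tools returned by
// list_tools based on the authenticated caller's permissions.
const ALL_TOOLS = [
  { name: "get_repository", requires: "read" },
  { name: "create_issue", requires: "write" },
  { name: "delete_repository", requires: "admin" },
];

// Simple ordered permission model for the sketch.
const RANK = { read: 0, write: 1, admin: 2 };

function listToolsFor(user) {
  // Only expose tools the user is actually allowed to invoke.
  return ALL_TOOLS.filter((t) => RANK[user.permission] >= RANK[t.requires]);
}

const readOnlyUser = { id: "u1", permission: "read" };
console.log(listToolsFor(readOnlyUser).map((t) => t.name));
// A read-only user never even sees delete_repository in the tool list.
```

The key point is that filtering happens server-side at discovery time, so destructive tools are invisible to under-privileged users rather than merely rejected at invocation time.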
Implementation checklist:
Secrets used for staging and development are often given more privileges than they need. Avoid promoting such secrets into production or letting them persist there. Use different credentials for every environment.
Doppler’s UI, for example, provides separate configs within a single project for dev, staging, and production secrets. Secrets stored in each config are properly isolated from one another.
Implementation checklist:
It must be possible to change your secrets instantly without downtime. Hardcoded tokens make that impossible. Any change requires a code update, a commit, and a redeploy.
In production, your MCP server should be configured to receive secret updates at runtime. When a secret changes, the new value is injected without restarting the entire system. A leaked key can be rotated in under five minutes using this model.
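One way to make a server rotation-ready is to read secrets through a short-lived cache that refetches from the secrets manager at runtime, instead of reading `process.env` once at boot. A minimal sketch, where `fetchSecret` stands in for a call to your secrets manager's API (e.g. Doppler) and is an assumption of this example:

```javascript
// Rotation-ready secret access: handlers read through this cache, so a
// rotated value is picked up within one TTL without restarting the server.
class SecretCache {
  constructor(fetchSecret, ttlMs = 60_000) {
    this.fetchSecret = fetchSecret; // async fn that calls the secrets manager
    this.ttlMs = ttlMs;
    this.value = null;
    this.fetchedAt = 0;
  }

  async get() {
    const now = Date.now();
    if (this.value === null || now - this.fetchedAt > this.ttlMs) {
      this.value = await this.fetchSecret(); // re-reads, picks up rotations
      this.fetchedAt = now;
    }
    return this.value;
  }
}
```

Request handlers then call `await cache.get()` per request; after a rotation, every instance converges on the new value within the TTL, with no redeploy.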
Implementation checklist:
Your MCP servers should not rely on a single security control. Security must be layered so that if one control fails, others continue to protect the system.
Not every server needs to be publicly accessible, and not every service should be permitted to perform unrestricted file operations against internal environments or external data sources. Use network-level controls such as security groups, network ACLs, and Kubernetes NetworkPolicies to restrict ingress and egress traffic.
Encryption should protect data in transit between MCP servers and backend APIs. TLS prevents interception and credential leakage during service-to-service communication.
Implementation checklist:
Activity history and audit logs are the difference between guessing and knowing when something unusual happens. Logging secret access, system interactions, identity usage, and potential training-data exposure provides visibility into how the MCP server operates and responds to requests.
Your logs should answer three questions: who accessed the system, what was accessed, and when it was accessed.
Implementation checklist:
These practices translate directly into the design of production-ready MCP servers.
A client-server architecture in MCP aims to create a controlled context layer between AI models, internal systems, and carefully exposed external capabilities. MCP works to expose available tools and structured data to the model through the MCP protocol, but only within clearly defined boundaries.
A production-grade MCP server architecture must remove ambiguity around three areas: the security of the runtime environment, the source and handling of credentials, and the blast radius in the event of compromise. It must also make detection and incident response straightforward. If any of these areas are unclear, the system will not pass security review.
Below is a production reference that shows the primary components of an MCP architecture.

This structure separates request handling at the transport layer, secret management, and backend interaction, while maintaining observability across the entire request path. A production-ready server requires several key components to meet operational and security expectations; the sections below walk through each of them.
The aim of creating a dedicated environment for each MCP server is to run it inside an immutable, unprivileged execution environment. Base images such as `gcr.io/distroless/nodejs` or `node:alpine` are preferred for MCP containers. Distroless is particularly suitable because it removes the shell and most common OS utilities, which shrinks the available execution surface.
Avoid using `npm start` inside containers, as it may not correctly forward OS signals such as `SIGTERM` for graceful shutdown. Instead, invoke Node directly to start your server. Additionally, use a multi-stage build to keep the final image small and reduce the attack surface.
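A multi-stage Dockerfile along these lines puts both recommendations together. It assumes the server entrypoint is `server.js` at the project root, and the distroless tag shown is one published variant; adjust both for your build:

```dockerfile
# Stage 1: install production dependencies with a full Node image.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Stage 2: copy only the app into a shell-less distroless runtime.
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=build /app /app
# The distroless image's entrypoint is the node binary itself, so this
# invokes Node directly and SIGTERM reaches the process for graceful shutdown.
CMD ["server.js"]
```

Because the final stage has no shell or package manager, an attacker who compromises the process has far fewer tools to work with inside the container.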
A secure way to manage secrets and inject them into your MCP environment is by using the Doppler Kubernetes Operator. Secrets stored in Doppler can be synced from the Doppler API into a native Kubernetes Secret object using the operator. The MCP server then consumes these secrets as environment variables.
This keeps secrets storage centralized while allowing runtime injection through native Kubernetes mechanisms.
However, for teams that do not want secrets materialized as Kubernetes Secrets, the init container method can be used. The init container talks to the Doppler API, downloads the secrets, and writes them to a shared memory volume such as `/dev/shm`. When the main MCP server starts, it reads the secrets from that mounted path.
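A sketch of that pod layout is below. The image, container names, and download command are illustrative; the important parts are the `emptyDir` with `medium: Memory` (tmpfs-backed, like `/dev/shm`, so secrets never touch disk) and the read-only mount in the main container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mcp-server
spec:
  volumes:
    - name: secrets
      emptyDir:
        medium: Memory   # tmpfs-backed shared volume; never written to disk
  initContainers:
    - name: fetch-secrets
      image: dopplerhq/cli        # illustrative; any image with the Doppler CLI
      command: ["sh", "-c", "doppler secrets download --no-file --format env > /secrets/.env"]
      env:
        - name: DOPPLER_TOKEN
          valueFrom:
            secretKeyRef:
              name: doppler-service-token
              key: token
      volumeMounts:
        - name: secrets
          mountPath: /secrets
  containers:
    - name: mcp-server
      image: my-registry/mcp-server:latest   # placeholder image
      volumeMounts:
        - name: secrets
          mountPath: /secrets
          readOnly: true
```

The main container reads `/secrets/.env` at startup, and no Kubernetes Secret object is ever created.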
MCP servers communicate frequently with APIs from servers such as GitHub, Slack, weather data services, or internal services like databases. Always apply network policies so that each server instance only reaches the backends it needs.
For example, if an AI agent only needs to read Jira tickets and search Confluence pages, you can allow outbound access to the Jira and Confluence APIs while blocking access to the production customer database, the payments API, and the open internet.
A good practice is to start with a default-deny egress policy and then add allow rules one backend at a time.
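That default-deny-then-allow progression can be expressed as two NetworkPolicies. The pod label and the CIDR below are placeholders (203.0.113.0/24 is a documentation range); in practice you would allow your backend's published egress IPs or use a DNS-aware egress gateway:

```yaml
# Deny all egress from MCP server pods except DNS resolution.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-default-deny-egress
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53   # DNS only, so approved backends can still be resolved
---
# Then add one allow rule per approved backend.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-allow-jira
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24   # placeholder for the backend's egress IPs
      ports:
        - protocol: TCP
          port: 443
```

Because NetworkPolicies are additive, each new backend is a new small policy, which keeps the allow list reviewable.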
You need to track whether your production server is healthy and how the AI models it provides context to interact with your internal and external tools. Logs must be available for debugging and tracing, but raw logs are not enough: they should follow a structured format so they can be queried, correlated, and understood quickly.
Use a logger like Pino or Winston to output JSON logs. Add redaction rules to prevent sensitive information, such as secrets and credentials, from appearing in logs.
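In practice Pino's `redact` option handles this configuration for you; the dependency-free sketch below shows the underlying idea of structured JSON output with sensitive fields masked. The field names are illustrative:

```javascript
// Minimal structured JSON logger with redaction. Field names here are
// examples; real deployments would use Pino/Winston with redaction rules.
const SENSITIVE_KEYS = new Set(["token", "apiKey", "password", "authorization"]);

function redact(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    if (SENSITIVE_KEYS.has(key)) {
      out[key] = "[REDACTED]";
    } else if (value && typeof value === "object" && !Array.isArray(value)) {
      out[key] = redact(value); // recurse so nested credentials are masked too
    } else {
      out[key] = value;
    }
  }
  return out;
}

function logEvent(level, msg, fields = {}) {
  const entry = { level, msg, time: new Date().toISOString(), ...redact(fields) };
  console.log(JSON.stringify(entry)); // one JSON object per line
  return entry; // returned so the entry can be inspected in tests
}
```

Every log line is a single JSON object, so the three audit questions (who, what, when) map directly to queryable fields, and credentials never reach the log pipeline.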
Your telemetry should cover metrics across the protocol layer, including JSON-RPC tool calls, request durations, errors, credential fetch time, and how the server executes each request. Add OpenTelemetry so you can follow the request lifecycle from the user's request through credential fetch, MCP interaction, and backend API call.
Modeling your MCP server on this architecture ensures that your secrets and internal tools remain secure in production. However, once multiple engineers begin using a shared MCP deployment, additional controls are required.
When one person uses an MCP server locally, it often runs with a single service credential, and that works. But when 100 engineers use one centralized MCP server in production, an important question must be answered: When the model processes client requests such as `create_issue`, whose permission does that represent?
Is it a shared company bot? An individual user? Or a team-scoped role credential?
If the answer is a shared credential that every engineer uses, then you already have a security concern.
For multi-user MCP deployments operating under a client-server model, the patterns below address how identity and credential boundaries should be handled.
This pattern is common for MCP servers running as web services. The server should be configured to retrieve an authorization token tied to the user’s session rather than from static configuration.
Furthermore, the server should not blindly trust a token the client sends during authentication. An OpenID Connect (OIDC) flow must be used during the initialization process to validate the user’s identity and verify the authorization context before the request is processed. Only after validation should the backend API be called using that user’s delegated permissions and validated input parameters.
In this pattern, the MCP server remains stateless with respect to long-term credentials. It does not store user tokens beyond the lifetime of the request.
The example below shows how an MCP server validates an identity using OIDC before processing user interaction or client requests.
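For brevity this sketch checks only the token's issuer, audience, and expiry claims; a real server must also verify the token's signature against the provider's JWKS (for example with the `jose` library) before trusting any claim. The issuer and audience values are illustrative:

```javascript
// Claim-level validation of an OIDC ID token before handling a request.
// NOTE: signature verification against the IdP's JWKS is omitted here and
// is mandatory in production.
function decodeSegment(seg) {
  return JSON.parse(Buffer.from(seg, "base64url").toString("utf8"));
}

function validateIdToken(token, { issuer, audience, now = Date.now() }) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed JWT");
  const claims = decodeSegment(parts[1]);
  if (claims.iss !== issuer) throw new Error("untrusted issuer");
  if (claims.aud !== audience) throw new Error("wrong audience");
  if (typeof claims.exp !== "number" || claims.exp * 1000 <= now) {
    throw new Error("token expired");
  }
  return claims; // delegated identity, valid for this request only
}
```

Only after validation succeeds does the server call the backend API with that user's delegated permissions; the claims are discarded when the request completes, so no long-term user credential is ever stored.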
Many MCP servers are designed to run over standard input and output (stdio), which works well for local processes. Here, the host application starts the MCP server as a sub-process on the same machine. But in production environments where hundreds of engineers use a centralized deployment, it becomes impractical to manage separate OS-level sub-processes per session.
This 1:1 stateful connection between the host and the server process does not scale properly in containerized environments such as Kubernetes, where load balancing and horizontal scaling are expected. It increases resource usage and makes it harder to centralize identity control, logging, and policy enforcement. For production deployments, HTTP-based transports such as Server-Sent Events (SSE) or WebSockets integrate better with load balancers, ingress controllers, and centralized proxy patterns.
A practical solution is to deploy a centralized MCP proxy that sits between MCP hosts and multiple server instances. The LLM sends the request to the proxy. The proxy identifies the user over a dedicated connection, using the session context or authorization header, then fetches the appropriate user- or role-scoped credentials before forwarding the request to the MCP server.

In large organizations, managing unique credentials for every user accessing an MCP deployment can become complex. An alternative approach is to define role-based credentials aligned with teams or functions, such as SRE, frontend, or engineering.
Each function here would have clearly defined permissions that map to specific MCP-supported tools, custom integrations, server capabilities, or backend APIs. And data access would be tied to that function rather than an individual user. When a member of the SRE team invokes the MCP server, the server retrieves an SRE-scoped credential. When a junior developer invokes it, a read-only credential may be used instead.
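The role-to-credential mapping can be sketched as a lookup table plus a runtime secret fetch. The role names, secret names, and `getSecret` function are all illustrative assumptions:

```javascript
// Role-scoped credential resolution: each team/function maps to one
// credential with a bounded set of scopes, instead of per-user tokens.
const ROLE_CREDENTIALS = {
  sre: { secretName: "MCP_SRE_TOKEN", scopes: ["read", "write", "deploy"] },
  frontend: { secretName: "MCP_FRONTEND_TOKEN", scopes: ["read", "write"] },
  junior: { secretName: "MCP_READONLY_TOKEN", scopes: ["read"] },
};

function resolveCredential(user, getSecret) {
  const role = ROLE_CREDENTIALS[user.role];
  if (!role) throw new Error(`no credential mapping for role: ${user.role}`);
  // getSecret fetches the named secret from the secrets manager at runtime,
  // so role credentials stay rotation-ready.
  return { token: getSecret(role.secretName), scopes: role.scopes };
}
```

Unknown roles fail closed, which matters: a user whose team mapping is missing should get no credential at all, not a default one.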
When these multi-user patterns are implemented correctly, your MCP deployment can support multiple identities without blurring identity boundaries. But identity isolation alone is not enough to secure MCP secrets. Credentials still need to be rotated regularly, and how rotation is handled determines whether your system experiences downtime or stays continuously available.
An important question to ask when designing reliable MCP deployments is: What happens to active traffic when credentials change? In production, there are many reasons credentials change: token expiration, leaked secrets, scheduled rotation, or an auditor requesting proof of rotation. Whatever the trigger, secret rotation must be structured to avoid downtime.
Manual rotation should not even be considered part of a production strategy. It leads to irregular rotation patterns, stale credentials, and downtime. When a token is rotated manually, the configuration must be updated, and services restarted. In a multi-instance deployment, this introduces failure windows and 401 spikes while instances reload.
A safer approach is the dual-credential phase. The MCP server is designed to temporarily accept two credentials: a primary token (new) and a fallback token (old). When rotation begins, the new token is introduced while the old token remains valid. If the new token is not yet active on the provider side, the server falls back to the old token. The old credential is only revoked after all instances confirm successful use of the new one.
The current and previous tokens come from a secrets source that can expose both versions under one stable key.
Doppler supports this approach within its rotated secrets functionality. During rotation, Doppler maintains both the new and previous credentials for an overlap window. This enables server instances to retrieve and begin using the new secret before the old credential is revoked. Whether rotation is scheduled or triggered, the overlap period reduces the risk of authentication failures during propagation.
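The fallback behavior during the overlap window can be sketched in a few lines. `callBackend` stands in for any authenticated API call and is an assumption of this example; the error-shape check (`err.status === 401`) is illustrative:

```javascript
// Dual-credential phase: try the new (primary) token first, and fall back
// to the previous one only on an auth failure during the overlap window.
async function callWithRotation(callBackend, { current, previous }) {
  try {
    return await callBackend(current);
  } catch (err) {
    // Fall back only on auth errors, and only while a previous token exists.
    if (err.status === 401 && previous) {
      return await callBackend(previous); // old token still valid in overlap
    }
    throw err; // anything else (5xx, network) is a real failure
  }
}
```

Once monitoring confirms every instance is succeeding with the current token, the previous one is revoked and the `previous` field is simply left unset.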
A rotation timeline can be standardized for your environment to ensure sufficient propagation time across all server instances or pods.
| Time | Action |
|---|---|
| T+0:00 | New token generated |
| T+0:01 | Token rotated in Doppler |
| T+0:15 | All MCP instances receive updated secret |
| T+0:15 | Old token revoked |
| T+0:16 | Old token version removed from Doppler |
Automated dual-phase rotation not only prevents downtime but also eliminates credential rotation as a bottleneck during incident response.
A team must maintain a defined incident response strategy for MCP-related security events. While detection, response, and post-incident hardening should follow a structured process, different scenarios may require different containment steps. Below are common examples.
A developer accidentally commits a `.env` file or hardcodes a service token into a repository.
Detection: Most code hosting platforms provide secret scanning on commit. Alerts may be triggered when `.env` files or hardcoded credentials are detected in a repository.
Response timeline:
T+0m: Automatic alert via Slack or PagerDuty.
T+2m: Immediately revoke the exposed credential.
T+5m: Generate and rotate a new secret through a secret manager such as Doppler to avoid further hardcoding or environment file exposure.
Post-incident hardening: Implement pre-commit hooks using tools such as Husky or secret-scanning utilities to prevent secrets from being committed locally. Restrict credential scope where possible.
Your database API, responsible for retrieving customer records or consumer profiles, shows a sudden spike in calls.
Detection: Anomaly detection through distributed tracing or metrics monitoring flags abnormal behavior. OpenTelemetry traces show a single user_id triggering the same MCP tool hundreds or thousands of times per minute.
Response timeline:
T+0m: Identify the affected MCP server instance or associated user identity using structured logs and trace IDs.
T+1m: Disable or revoke the credential to immediately stop further backend access.
T+10m: Review LLM conversation logs and request payloads to determine whether the behavior was caused by prompt injection, automation abuse, or compromised credentials.
Post-incident hardening: Implement rate limiting at the MCP proxy layer, add anomaly thresholds on tool invocation frequency, and restrict tool discovery to approved roles only.
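The proxy-layer rate limiting suggested above is commonly implemented as a per-user token bucket. A minimal sketch, with illustrative limits:

```javascript
// Per-user rate limiting for MCP tool calls using a token bucket.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;       // burst size
    this.refillPerSec = refillPerSec; // sustained rate
    this.tokens = capacity;
    this.last = Date.now();
  }

  allow(now = Date.now()) {
    const elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should reject with a rate-limit error
  }
}

const buckets = new Map();
function allowToolCall(userId, capacity = 10, refillPerSec = 1) {
  if (!buckets.has(userId)) buckets.set(userId, new TokenBucket(capacity, refillPerSec));
  return buckets.get(userId).allow();
}
```

A runaway user exhausts their own bucket within seconds while everyone else's traffic continues unaffected, which caps the blast radius of the abuse scenario above.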
An attacker exploits a vulnerability in a Node.js dependency used by your MCP server and gains shell access inside the container.
Detection: Runtime security tools such as Falco or Sysdig detect anomalous behavior, such as a shell being spawned inside the container or unexpected outbound network connections.
Response timeline:
T+0m: Update the NetworkPolicy to block lateral movement to internal databases and backend services.
T+1m: Terminate the compromised pod. In an immutable deployment, a clean pod should automatically replace it.
T+3m: Assume all environment variables holding sensitive data were exposed. Rotate every secret in the affected environment immediately.
Post-incident hardening: Adopt distroless base images that remove the shell and unnecessary binaries. Review container permissions and reduce network egress to only required backends.
A well-defined incident response process adds another layer of protection against serious damage when a breach occurs. Treat it as a mandatory checkpoint before deploying to production.
A pre-production checklist should serve as the final checkpoint before deployments. Ensure that you validate the following control categories.
Infrastructure
Credentials
Access control
Monitoring
Audit and compliance
Incident response
Developer experience
Building secure MCP servers that scale requires deliberate effort: a layered security model and an architecture that supports growth. This guide covered foundational principles, including least privilege and rotation readiness, along with production patterns such as credential isolation, zero-downtime rotation, and structured incident response. Treat these practices as security baselines to check your MCP ecosystem against.
Here is a practical rollout plan:
Week 1: Audit your MCP deployment against the pre-production checklist.
Week 2: Migrate to Doppler for centralized and scalable secrets management.
Week 3: Implement containerization for your MCP servers and enforce network isolation.
Week 4: Set up monitoring, logging, tracing, and include production runbooks.
Within 2–3 months: Enable automated rotation and validate incident response drills.



