May 11, 2026
21 min read

Building secure and scalable MCP servers


For a single developer working locally, running `export GITHUB_TOKEN=...` before starting a Model Context Protocol (MCP) server is common. It is quick and convenient. However, the same pattern becomes a problem when it is extended to production secrets and production environments.

In production, credentials must have a defined rotation policy. Security teams require SOC 2 audit logs, and multiple engineers need controlled access without manually sharing a single token. None of that works when secrets are shell-exported into runtime environments.

This article focuses on building secure and scalable MCP servers. It covers production architecture for secrets management, deployment patterns, runtime security, multi-user access, monitoring, and incident response.

TLDR

To build secure and scalable production MCP servers, you need:

  • Security principles such as least privilege, credential isolation, rotation readiness, defense in depth, and auditability
  • A production architecture that includes a containerized runtime, centralized secrets management (e.g., Doppler), strict network isolation, and structured monitoring with tracing
  • Multi-user controls such as per-user or role-based credentials, OIDC validation, and no shared global tokens
  • A zero-downtime rotation strategy
  • Incident response readiness

Five security principles for production MCP servers supporting AI systems

There are five security principles you must apply to your MCP workflow for your systems to be considered resilient. These practices should serve as a universal protocol for designing and enforcing security policies; otherwise, as systems scale, they can unravel from the foundation. Here is what they are:

1. Least privilege

Your MCP servers should never be given full admin rights to your system. Always apply the absolute minimum permissions required to perform specific tasks. For a GitHub MCP server, this could mean granting only the metadata:read scope when repository context is needed.

Least privilege should also apply to tool exposure. MCP servers should dynamically limit the tools returned by list_tools based on the authenticated user's permissions. If a user does not have write access, destructive tools such as delete_repository should not be exposed at all.

Limiting privilege reduces the blast radius and attackers' ability to infiltrate your entire system in the event of a breach. Below is a checklist for implementing least privilege.

Implementation checklist:

  • Scope API tokens to read-only unless write access is required.
  • Restrict credentials to specific resources rather than granting global access.
  • Avoid wildcard permissions in IAM or API scopes.
  • Review and remove unused permissions regularly.
  • Prevent MCP service accounts from having infrastructure-level admin rights.
  • Dynamically scope exposed MCP tools at runtime based on the caller's role or identity.
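As a sketch of that last point, tool exposure can be filtered by role before `list_tools` responds. The role names and tool list below are hypothetical, not part of any official MCP SDK:

```typescript
// Sketch of least-privilege tool exposure: filter the tools returned by
// list_tools based on the caller's role. Names here are illustrative.
type Role = "reader" | "maintainer" | "admin";

interface Tool {
  name: string;
  requiredRole: Role;
}

const ROLE_RANK: Record<Role, number> = { reader: 0, maintainer: 1, admin: 2 };

const ALL_TOOLS: Tool[] = [
  { name: "get_repository_metadata", requiredRole: "reader" },
  { name: "create_issue", requiredRole: "maintainer" },
  { name: "delete_repository", requiredRole: "admin" },
];

// Only return tools the authenticated caller may invoke, so a read-only
// user never even sees destructive tools like delete_repository.
function listToolsFor(role: Role): Tool[] {
  return ALL_TOOLS.filter((t) => ROLE_RANK[role] >= ROLE_RANK[t.requiredRole]);
}
```

The same filter should also gate actual tool invocation, since discovery-time filtering alone can be bypassed by a crafted request.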

2. Credential isolation

Most of the time, secrets used for staging and development are given more privileges than they should be. Avoid promoting such secrets into production or letting them persist there. Use different credentials across all environments.

Doppler’s UI, for example, provides separate configs within a single project for dev, staging, and production secrets. Secrets stored in each config are properly isolated from one another.

Implementation checklist:

  • Use separate credentials for dev, staging, and production.
  • Restrict access to production secrets to approved roles only.
  • Store secrets in a centralized secret management system, not in local files.
  • Use separate credentials per MCP tool when needed.

3. Rotation readiness

It must be possible to change your secrets instantly without downtime. Hardcoded tokens make that impossible. Any change requires a code update, a commit, and a redeploy.

In production, your MCP server should be configured to receive secret updates at runtime. When a secret changes, the new value is injected without restarting the entire system. A leaked key can be rotated in under five minutes using this model.

Implementation checklist:

  • Do not hardcode secrets in source code or container images.
  • Do not bake .env files into production builds.
  • Fetch secrets at runtime.
  • Read secrets from memory instead of static environment files.
  • Define and document a secret refresh interval.
  • Create and test an emergency rotation runbook.
  • Revoke old credentials without restarting all MCP instances.
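The runtime-fetch items above can be sketched as a small secret cache with a refresh interval. The fetcher is injected so it could be backed by any secrets manager API; the class and TTL here are illustrative:

```typescript
// Sketch of runtime secret fetching with a refresh interval, instead of
// reading a static .env file at boot. Everything here is illustrative.
type SecretFetcher = () => Promise<string>;

class RuntimeSecret {
  private value: string | null = null;
  private fetchedAt = 0;

  constructor(
    private fetcher: SecretFetcher,
    private ttlMs: number, // the documented refresh interval
  ) {}

  // Return the cached secret, re-fetching once the TTL has elapsed so a
  // rotated value is picked up without restarting the process.
  async get(now: number = Date.now()): Promise<string> {
    if (this.value === null || now - this.fetchedAt >= this.ttlMs) {
      this.value = await this.fetcher();
      this.fetchedAt = now;
    }
    return this.value;
  }
}
```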

4. Defense in depth

Your MCP servers should not rely on a single security control. Security must be layered so that if one control fails, others continue to protect the system.

Not every server needs to be publicly accessible, and not every service should be permitted to perform unrestricted file operations against internal environments or external data sources. Use network-level controls such as security groups, network ACLs, and Kubernetes NetworkPolicies to restrict ingress and egress traffic.

Encryption should protect data in transit between MCP servers and backend APIs. TLS prevents interception and credential leakage during service-to-service communication.

Implementation checklist:

  • Avoid exposing MCP servers publicly unless required.
  • Isolate internal services using network segmentation.
  • Enable TLS for all service-to-service communication.
  • Never transmit secrets in plain text.
  • Use Kubernetes NetworkPolicies or cloud security groups to restrict east-west traffic.

5. Auditability

When something unusual happens, activity history and audit logs are the difference between guessing and knowing. Logging secret access, system interactions, identity usage, and potential training data exposure provides visibility into how the MCP server responds to requests and operates.

Your logs should answer three questions: who accessed the system, what was accessed, and when it was accessed.

Implementation checklist:

  • Log all secret access events.
  • Record the caller's identity for every access.
  • Capture timestamps for each event.
  • Retain audit logs for at least 90 days.
  • Centralize logs in a secure logging system.
  • Restrict access to logs to authorized personnel.
  • Monitor logs for anomalies and suspicious behavior.
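A minimal structured audit event that answers those three questions might look like the following sketch; the field names are illustrative, not a standard schema:

```typescript
// Sketch of a structured audit event answering "who, what, when".
interface AuditEvent {
  actor: string;     // who accessed the system
  action: string;    // e.g. "secret.read", "tool.invoke"
  resource: string;  // what was accessed
  timestamp: string; // when, as ISO 8601
}

function auditEvent(
  actor: string,
  action: string,
  resource: string,
  at: Date = new Date(),
): AuditEvent {
  return { actor, action, resource, timestamp: at.toISOString() };
}
```

Emitting one of these per secret access or tool call, as a JSON log line, makes the "retain for 90 days" and "monitor for anomalies" items above straightforward to implement in any log pipeline.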

These practices translate directly into the design of production-ready MCP servers.

Core components of a production-grade MCP architecture

A client-server architecture in MCP aims to create a controlled context layer between AI models, internal systems, and carefully exposed external capabilities. MCP works to expose available tools and structured data to the model through the MCP protocol, but only within clearly defined boundaries.

A production-grade MCP server architecture must remove ambiguity around three areas: the security of the runtime environment, the source and handling of credentials, and the blast radius in the event of compromise. It must also make detection and incident response straightforward. If any of these areas are unclear, the system will not pass security review.

Below is a production reference that shows the primary components of an MCP architecture.

Production-grade MCP server architecture showing request flow, runtime isolation, Doppler-managed secrets, and centralized monitoring with audit logs and rotation events.

This structure separates request handling at the transport layer, secret management, and backend interaction, while maintaining observability across the entire request path. A production-ready server requires these key components to meet operational and security expectations. The sections below walk through each of them.

Containerized MCP server

Each MCP server should run inside its own immutable, unprivileged execution environment. Images such as gcr.io/distroless/nodejs or node:alpine are preferred for MCP containers. Distroless is particularly suitable because it removes the shell and many common OS utilities, which reduces the available execution surface.

Avoid using npm start inside containers, as it may not correctly forward OS signals such as `SIGTERM` for graceful shutdown. Instead, invoke Node directly to start your server. Additionally, use a multi-stage build to keep the final image size small and reduce the attack surface.
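A multi-stage build along these lines might look like the following sketch. The image tags, file paths, and entrypoint (`dist/server.js`) are assumptions about your project layout, not a definitive Dockerfile:

```dockerfile
# Build stage: install dependencies and compile with the full Node image.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npm prune --omit=dev

# Runtime stage: distroless image with no shell or package manager.
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
# Run as the unprivileged user that distroless images provide.
USER nonroot
# The distroless entrypoint invokes Node directly, so SIGTERM reaches the
# process for graceful shutdown; there is no `npm start` in the path.
CMD ["dist/server.js"]
```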

Secrets management: Doppler + Kubernetes

A secure way to manage secrets and inject them into your MCP environment is by using the Doppler Kubernetes Operator. Secrets stored in Doppler can be synced from the Doppler API into a native Kubernetes Secret object using the operator. The MCP server then consumes these secrets as environment variables.

This keeps secrets storage centralized while allowing runtime injection through native Kubernetes mechanisms.

However, for teams that do not want secrets materialized as Kubernetes Secrets, the init container method can be used. The init container talks to the Doppler API, downloads the secrets, and writes them to a shared memory volume such as /dev/shm. When the main MCP server starts, it reads the secrets from that mounted path.
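For the operator approach, a minimal DopplerSecret resource could look like this sketch. The field names follow the Doppler Kubernetes Operator's documented CRD shape, but verify them against the current operator docs before use:

```yaml
apiVersion: secrets.doppler.com/v1alpha1
kind: DopplerSecret
metadata:
  name: mcp-server-secrets            # placeholder name
  namespace: doppler-operator-system
spec:
  # Kubernetes Secret holding the Doppler service token used to sync.
  tokenSecret:
    name: doppler-token-secret
  # Native Kubernetes Secret the operator creates and keeps in sync.
  managedSecret:
    name: mcp-server-env
    namespace: default
```

The MCP server Deployment then references `mcp-server-env` via `envFrom.secretRef`, so rotations in Doppler propagate through the operator without code changes.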

Network isolation

MCP servers communicate frequently with APIs from servers such as GitHub, Slack, weather data services, or internal services like databases. Always apply network policies so that each server instance only reaches the backends it needs.

For example, if an AI agent only needs to read Jira tickets and search Confluence pages, you can allow outbound access to the Jira and Confluence APIs while blocking access to the production customer database, the payments API, and the open internet.

A good practice is to start with a default-deny egress policy and then add allow rules one backend at a time.
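That default-deny-then-allow approach can be sketched with two NetworkPolicies. The pod labels and the backend CIDR below are placeholders for your own values:

```yaml
# Default-deny egress for MCP server pods: nothing leaves unless allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-default-deny-egress
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Egress
  egress: []
---
# Allow rule added one backend at a time, e.g. HTTPS to an approved API.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-allow-approved-backend
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24   # placeholder CIDR for the approved backend
      ports:
        - protocol: TCP
          port: 443
```

In practice you will usually also need an allow rule for DNS (port 53 to the cluster resolver), since the default-deny policy blocks name resolution too.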

Monitoring and observability

You need to track whether your production server is healthy and how AI models, including the LLMs it provides context to, interact with your internal and external tools. Logs must be available for debugging and tracing, but raw logs are not enough. They should follow a structured format so they can be queried, correlated in real time, and understood quickly.

Use a logger like Pino or Winston to output JSON logs. Add redaction rules to prevent sensitive information, such as secrets and credentials, from appearing in logs.
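Pino and Winston ship their own redaction mechanisms; as an illustration of what redaction does, here is a minimal sketch that masks configured sensitive fields before a record is serialized:

```typescript
// Sketch of log redaction: mask configured sensitive fields (including
// nested ones) before a log record is written. Key names are illustrative.
const SENSITIVE_KEYS = new Set(["authorization", "apiToken", "password", "secret"]);

function redact(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (SENSITIVE_KEYS.has(key)) {
      out[key] = "[REDACTED]";
    } else if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects so deeply placed secrets are masked too.
      out[key] = redact(value as Record<string, unknown>);
    } else {
      out[key] = value;
    }
  }
  return out;
}
```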

Your telemetry should cover metrics across the protocol layer, including tool calls via JSON-RPC, request durations, errors, credential fetch time, and how the server executes each request. Add OpenTelemetry so you can follow the request lifecycle from user request through credential fetch to the backend API call, and analyze MCP interactions along the way.

Modeling your MCP server on this architecture ensures that your secrets and internal tools remain secure in production. However, once multiple engineers begin using a shared MCP deployment, additional controls are required.

Securing multi-user MCP deployments for MCP hosts and AI applications

When one person uses an MCP server locally, it often runs with a single service credential, and that works. But when 100 engineers use one centralized MCP server in production, an important question must be answered: When the model processes client requests such as create_issue, whose permission does that represent?

Is it a shared company bot? An individual user? Or a team-scoped role credential?

If the answer is a shared credential that every engineer uses, then you already have a security concern.

For multi-user MCP deployments operating under a client-server model, the patterns below address how identity and credential boundaries should be handled.

Pattern 1: Per-user credential injection for MCP clients (remote MCP servers)

This pattern is common for MCP servers running as web services. The server should be configured to retrieve an authorization token tied to the user’s session rather than from static configuration.

Furthermore, the server should not blindly trust a token the client sends during authentication. An OpenID Connect (OIDC) flow must be used during the initialization process to validate the user’s identity and verify the authorization context before the request is processed. Only after validation should the backend API be called using that user’s delegated permissions and validated input parameters.

In this pattern, the MCP server remains stateless with respect to long-term credentials. It does not store user tokens beyond the lifetime of the request.

The example below shows how an MCP server validates an identity using OIDC before processing user interaction or client requests.
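As a hedged sketch of that flow: in production you would verify the token signature against the provider's JWKS with a library such as jose; the snippet below only shows the standard claim checks (issuer, audience, expiry) on a decoded payload:

```typescript
// Sketch of OIDC identity validation before a request is processed.
// NOTE: signature verification against the provider's JWKS is required in
// production and is deliberately elided here.
interface IdTokenClaims {
  iss: string; // issuer
  aud: string; // audience (this MCP server's client ID)
  sub: string; // stable user identifier
  exp: number; // expiry, seconds since epoch
}

// Decode the JWT payload segment (header.payload.signature).
function decodeClaims(idToken: string): IdTokenClaims {
  const payload = idToken.split(".")[1];
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

function validateClaims(
  claims: IdTokenClaims,
  expectedIssuer: string,
  expectedAudience: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  return (
    claims.iss === expectedIssuer &&
    claims.aud === expectedAudience &&
    claims.exp > nowSeconds
  );
}
```

Only after these checks (and signature verification) pass should the server call the backend API with that user's delegated permissions.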

Pattern 2: Proxy pattern (stdio MCP servers)

Many MCP servers are designed to run over standard input and output (stdio), which works well for local processes. Here, the host application starts the MCP server as a sub-process on the same machine. But in production environments where hundreds of engineers use a centralized deployment, it becomes impractical to manage separate OS-level sub-processes per session.

This 1:1 stateful connection between the host and the server process does not scale properly in containerized environments such as Kubernetes, where load balancing and horizontal scaling are expected. It increases resource usage and makes it harder to centralize identity control, logging, and policy enforcement. For production deployments, HTTP-based transports such as Server-Sent Events (SSE) or WebSockets integrate better with load balancers, ingress controllers, and centralized proxy patterns.

A practical solution is to deploy a centralized MCP proxy that sits between MCP hosts and multiple server instances. The LLM sends the request to the proxy. The proxy identifies the user over a dedicated connection, using the session context or authorization header, then fetches the appropriate user- or role-scoped credentials before forwarding the request to the MCP server.

Architectural diagram showing an MCP Host sending requests through a centralized MCP Proxy, which authorizes requests and fetches credentials before forwarding them to various MCP Server Instances.

Pattern 3: Role-based credentials

In large organizations, managing unique credentials for every user accessing an MCP deployment can become complex. An alternative approach is to define role-based credentials aligned with teams or functions, such as SRE, frontend, or engineering.

Each function here would have clearly defined permissions that map to specific MCP-supported tools, custom integrations, server capabilities, or backend APIs. And data access would be tied to that function rather than an individual user. When a member of the SRE team invokes the MCP server, the server retrieves an SRE-scoped credential. When a junior developer invokes it, a read-only credential may be used instead.
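The mapping can be as simple as a role-to-secret-key lookup that the server resolves against its secrets manager at request time. The role names and secret keys below are hypothetical:

```typescript
// Sketch of role-scoped credential lookup: map the caller's team role to
// a credential reference instead of using one shared token.
type TeamRole = "sre" | "frontend" | "junior-dev";

const ROLE_CREDENTIAL_KEY: Record<TeamRole, string> = {
  sre: "MCP_SRE_TOKEN",           // write access to infra tooling
  frontend: "MCP_FRONTEND_TOKEN", // scoped to frontend repositories
  "junior-dev": "MCP_READONLY_TOKEN", // read-only everywhere
};

// Resolve which secret the server should fetch for this caller.
function credentialKeyFor(role: TeamRole): string {
  return ROLE_CREDENTIAL_KEY[role];
}
```

Because the credential is chosen per request, audit logs still record which role performed each action even though individual users do not hold unique tokens.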

When these multi-user patterns are implemented correctly, your MCP deployment can support multiple identities without blurring identity boundaries. But identity isolation alone is not enough to secure MCP secrets. Credentials still need to be rotated regularly, and how rotation is handled determines whether your system experiences downtime or remains in continuous operation.

Zero-downtime credential rotation

An important question to ask when designing reliable MCP deployments is: What happens to active traffic when credentials change? In production, there are many reasons for credentials to change: token expiration, leaked secrets, scheduled rotation, or an auditor requesting proof of rotation. Whatever the trigger, it is critical to structure secret rotation to avoid downtime.

Manual rotation should not even be considered part of a production strategy. It leads to irregular rotation patterns, stale credentials, and downtime. When a token is rotated manually, the configuration must be updated, and services restarted. In a multi-instance deployment, this introduces failure windows and 401 spikes while instances reload.

A safer approach is the dual-credential phase. The MCP server is designed to temporarily accept two credentials: a primary token (new) and a fallback token (old). When rotation begins, the new token is introduced while the old token remains valid. If the new token is not yet active on the provider side, the server falls back to the old token. The old credential is only revoked after all instances confirm successful use of the new one.

The current and previous tokens come from a secrets source that can expose both versions under one stable key.
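The fallback logic can be sketched as follows; the backend-call signature and the `RotatedSecret` shape are illustrative:

```typescript
// Sketch of the dual-credential phase: try the new (primary) token first
// and fall back to the previous one on an auth failure, so rotation can
// propagate without a failure window.
interface RotatedSecret {
  current: string;   // new token
  previous?: string; // old token, present only during the overlap window
}

type BackendCall = (token: string) => Promise<{ status: number }>;

async function callWithRotation(
  secret: RotatedSecret,
  callBackend: BackendCall,
): Promise<{ status: number; usedFallback: boolean }> {
  const primary = await callBackend(secret.current);
  // Only retry on auth failure, and only while the old token still exists.
  if (primary.status === 401 && secret.previous) {
    const fallback = await callBackend(secret.previous);
    return { status: fallback.status, usedFallback: true };
  }
  return { status: primary.status, usedFallback: false };
}
```

Tracking how often the fallback path is taken is a useful rotation-health metric: it should spike briefly during rotation and then return to zero.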

Doppler supports this approach within its rotated secrets functionality. During rotation, Doppler maintains both the new and previous credentials for an overlap window. This enables server instances to retrieve and begin using the new secret before the old credential is revoked. Whether rotation is scheduled or triggered, the overlap period reduces the risk of authentication failures during propagation.

A rotation timeline can be standardized for your environment to ensure sufficient propagation time across all server instances or pods.

  • T+0:00: New token generated
  • T+0:01: Token rotated in Doppler
  • T+0:15: All MCP instances receive updated secret
  • T+0:15: Old token revoked
  • T+0:16: Old token version removed from Doppler

Automated dual-phase rotation not only prevents downtime but also eliminates credential rotation as a bottleneck during incident response.

Responding to MCP security incidents

A team must maintain a defined incident response strategy for MCP-related security events. While detection, response, and post-incident hardening should follow a structured process, different scenarios may require different containment steps. Below are common examples.

Scenario 1: Credential leaked via git commit

A developer accidentally commits a .env file or hardcodes a service token into a repository.

Detection: Most code hosting platforms provide secret scanning on commit. Alerts may be triggered when .env files or hardcoded credentials are detected in a repository.

Response timeline:
T+0m: Automatic alert via Slack or PagerDuty.
T+2m: Immediately revoke the exposed credential.
T+5m: Generate and rotate a new secret through a secret manager such as Doppler to avoid further hardcoding or environment file exposure.

Post-incident hardening: Implement pre-commit hooks using tools such as Husky or secret-scanning utilities to prevent secrets from being committed locally. Restrict credential scope where possible.

Scenario 2: Suspicious API activity

Your database API, responsible for retrieving customer records or consumer profiles, shows a sudden spike in calls.

Detection: Anomaly detection through distributed tracing or metrics monitoring flags abnormal behavior. OpenTelemetry traces show a single user_id triggering the same MCP tool hundreds or thousands of times per minute.

Response timeline:
T+0m: Identify the affected MCP server instance or associated user identity using structured logs and trace IDs.
T+1m: Disable or revoke the credential to immediately stop further backend access.
T+10m: Review LLM conversation logs and request payloads to determine whether the behavior was caused by prompt injection, automation abuse, or compromised credentials.

Post-incident hardening: Implement rate limiting at the MCP proxy layer, add anomaly thresholds on tool invocation frequency, and restrict tool discovery to approved roles only.
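A per-user limit at the proxy can be sketched as a fixed-window counter keyed by user_id; the thresholds and window size below are illustrative:

```typescript
// Sketch of per-user rate limiting at the MCP proxy layer: a fixed-window
// counter keyed by user_id. Production systems often prefer token-bucket
// or sliding-window variants; this shows the simplest form.
class ToolCallLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private maxCallsPerWindow: number,
    private windowMs: number,
  ) {}

  // Returns true if the call is allowed, false once the user exceeds the
  // limit within the current window.
  allow(userId: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.maxCallsPerWindow;
  }
}
```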

Scenario 3: MCP server container compromised

An attacker exploits a vulnerability in a Node.js dependency used by your MCP server and gains shell access inside the container.

Detection: Runtime security tools such as Falco or Sysdig detect anomalous behavior, such as a shell being spawned inside the container or unexpected outbound network connections.

Response timeline:
T+0m: Update the NetworkPolicy to block lateral movement to internal databases and backend services.
T+1m: Terminate the compromised pod. In an immutable deployment, a clean pod should automatically replace it.
T+3m: Assume all environment variables holding sensitive data were exposed. Rotate every secret in the affected environment immediately.

Post-incident hardening: Adopt distroless base images that remove the shell and unnecessary binaries. Review container permissions and reduce network egress to only required backends.

A proper incident response process in your workflow limits the damage a breach can cause. It should be a mandatory checkpoint before deploying to production.

Pre-production security checklist

A pre-production checklist should serve as the final checkpoint before deployments. Ensure that you validate the following control categories.

Infrastructure

  • Your MCP servers run in a containerized, immutable environment.
  • Containers run as a non-root user.
  • Distroless or minimal base images are used.
  • NetworkPolicies enforce restricted egress.
  • No unnecessary public-facing access.

Credentials

  • Secrets are managed through a secrets manager like Doppler.
  • No secrets are hardcoded in your source code.
  • Rotation interval is defined (for example, 30 days).
  • No .env files are used in production.
  • Dual-phase rotation strategy is implemented.
  • Emergency rotation procedure is documented.

Access control

  • No shared credentials across all users.
  • Per-user or role-based credential model is defined.
  • OIDC validation is enforced for user identity.
  • Least privilege is enforced on backend APIs.
  • Role permissions are documented and reviewed.

Monitoring

  • Structured JSON logging is enabled.
  • Secrets are redacted from logs.
  • Tool invocation metrics are configured to be captured.
  • Credential fetch latency is trackable.
  • OpenTelemetry tracing is configured.

Audit and compliance

  • Secret access and rotation events are logged.
  • Audit logs are retained for at least 90 days.
  • Access to audit logs is restricted.
  • Security review documentation is set up.

Incident response

  • Credential leak response playbook is defined.
  • Container compromise response is documented.
  • Rotation under five minutes is tested.
  • Alerting is integrated with Slack/PagerDuty.
  • Post-incident review process is defined.

Developer experience

  • Secret scanning is enabled in CI.
  • Pre-commit hooks prevent secret commits.
  • Local development uses a separate environment config.
  • Documentation is available for onboarding.

What’s next

Building secure MCP servers that scale requires deliberate effort to define a layered security model and an architecture that supports growth. This guide covered foundational principles, including least privilege and rotation readiness. It also discussed production patterns such as credential isolation, zero-downtime rotation, and structured incident response. Treat these practices as security baselines to check against your MCP ecosystem.

Here is a practical rollout plan:

Week 1: Audit your MCP deployment against the pre-production checklist.

Week 2: Migrate to Doppler for centralized and scalable secrets management.

Week 3: Implement containerization for your MCP servers and enforce network isolation.

Week 4: Set up monitoring, logging, tracing, and include production runbooks.

Within 2–3 months: Enable automated rotation and validate incident response drills.
