Kubernetes RBAC Deep Dive: Securing Multi-Tenant Clusters
I’m going to say something that’ll upset people: if your developers have cluster-admin access in production, you’re running on borrowed time. I don’t care how small your team is. I don’t care if “everyone’s responsible.” It’s insane, and I’ve got the scars to prove it.
This article is the RBAC deep dive I wish I’d had before a developer on my team ran `kubectl delete namespace production-api` on a Friday afternoon. Not maliciously. He thought he was pointed at his local minikube. He wasn’t. That namespace had 14 services, and we spent the weekend rebuilding it from manifests that were — let’s be generous — “mostly” up to date.
That Monday, I locked down RBAC properly. Here’s everything I learned.
What RBAC Actually Is (and Isn’t)
RBAC — Role-Based Access Control — is Kubernetes’ authorization mechanism. It answers one question: “Is this identity allowed to do this thing to this resource?” That’s it. It doesn’t handle authentication (that’s your identity provider’s job). It doesn’t handle network-level isolation. It’s purely about API server authorization.
The building blocks are simple:
- Role — a set of permissions scoped to a single namespace
- ClusterRole — a set of permissions scoped cluster-wide
- RoleBinding — attaches a Role to a user, group, or service account within a namespace
- ClusterRoleBinding — attaches a ClusterRole to an identity cluster-wide
That’s the entire model. Four objects. The complexity isn’t in the primitives — it’s in how you compose them for real multi-tenant clusters where teams shouldn’t be able to touch each other’s stuff.
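To make the composition concrete, here's a toy evaluator in Python — my own sketch of the matching semantics, not the real API server implementation (which also handles subresources, non-resource URLs, and more):

```python
# Toy model of RBAC rule matching -- a simplified sketch of the semantics,
# not the actual API server authorizer.

def rule_allows(rule, verb, api_group, resource, name=None):
    """Return True if a single policy rule permits the request."""
    def matches(allowed, value):
        return "*" in allowed or value in allowed

    if not matches(rule["verbs"], verb):
        return False
    if not matches(rule["apiGroups"], api_group):
        return False
    if not matches(rule["resources"], resource):
        return False
    # resourceNames, when present, restricts the rule to named objects
    names = rule.get("resourceNames")
    if names and name not in names:
        return False
    return True

def is_allowed(rules, verb, api_group, resource, name=None):
    """RBAC is purely additive: any one matching rule grants access."""
    return any(rule_allows(r, verb, api_group, resource, name) for r in rules)

# Illustrative rules, shaped like the Role examples later in this article
developer_rules = [
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["get", "list", "create", "update"]},
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list"]},
]

print(is_allowed(developer_rules, "create", "apps", "deployments"))  # True
print(is_allowed(developer_rules, "delete", "apps", "deployments"))  # False
print(is_allowed(developer_rules, "update", "", "secrets"))          # False
```

The key property the sketch captures: there are no deny rules in RBAC. Everything is deny-by-default, and permissions only ever accumulate.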
The Multi-Tenant Problem
When I say “multi-tenant,” I mean multiple teams sharing a single Kubernetes cluster. Maybe it’s your platform team, your backend team, and your data team all on the same cluster. Each team gets their own namespaces, and the expectation is that Team A can’t accidentally (or intentionally) mess with Team B’s workloads.
Without RBAC, every authenticated user can do everything. Read secrets from any namespace. Delete deployments anywhere. Scale down someone else’s statefulset. It’s the Wild West, and it only works until it doesn’t.
Here’s what a sane multi-tenant RBAC setup looks like at a high level:
- Each team gets dedicated namespaces
- Team members get Role/RoleBindings scoped to their namespaces only
- A small platform team gets broader (but still not unlimited) ClusterRole access
- Service accounts are per-application, per-namespace, with minimal permissions
- Nobody — and I mean nobody — gets cluster-admin in production except break-glass scenarios
Building Namespace-Scoped Roles
Let’s start with the most common pattern: giving a development team access to their own namespace. Here’s a Role that lets developers do their day-to-day work without being able to do anything catastrophic:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-backend
  name: backend-developer
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/exec", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
```
Notice what’s missing? No delete on deployments. No access to namespaces themselves. No ability to modify secrets (only read). No access to persistent volumes or storage classes. This is deliberate. Developers can deploy and debug, but they can’t nuke things or escalate privileges.
Now bind it:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-team-developers
  namespace: team-backend
subjects:
- kind: Group
  name: "backend-developers"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: backend-developer
  apiGroup: rbac.authorization.k8s.io
```
I always bind to groups, never individual users. When someone joins or leaves the team, you update group membership in your identity provider — not in Kubernetes manifests. This scales. Individual user bindings don’t.
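Because the bindings are this uniform, they're easy to template per team. Here's a sketch — team names and the `team-developer` role are illustrative, not from a real cluster:

```python
# Sketch: generate one group-bound RoleBinding per team namespace.
# Team names and the role name are made up for illustration.

def role_binding(team, role="team-developer"):
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"{team}-developers",
            "namespace": f"team-{team}",
        },
        "subjects": [{
            "kind": "Group",  # bind to IdP groups, never individual users
            "name": f"{team}-developers",
            "apiGroup": "rbac.authorization.k8s.io",
        }],
        "roleRef": {
            "kind": "Role",
            "name": role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }

bindings = [role_binding(t) for t in ("backend", "data", "platform")]
print([b["metadata"]["namespace"] for b in bindings])
# ['team-backend', 'team-data', 'team-platform']
```

Feed the output through a YAML serializer (or your GitOps tooling of choice) and onboarding a new team becomes a one-line change.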
ClusterRoles for Platform Teams
Your platform or SRE team needs broader access. They’re managing the cluster itself — nodes, namespaces, CRDs, cluster-wide networking. But even here, I don’t hand out cluster-admin. I build a custom ClusterRole:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: platform-engineer
rules:
- apiGroups: [""]
  resources: ["namespaces", "nodes", "persistentvolumes"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["create", "update", "patch"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies", "ingresses"]
  verbs: ["*"]
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
```
The platform team can create namespaces and manage RBAC bindings (so they can onboard new teams), but they can’t delete namespaces or modify workloads directly. If a deployment needs fixing, the owning team does it. The platform team manages the platform, not the applications.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-team-binding
subjects:
- kind: Group
  name: "platform-engineers"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: platform-engineer
  apiGroup: rbac.authorization.k8s.io
```
Service Accounts: The Forgotten Attack Surface
Here’s where most teams get sloppy. Every pod in Kubernetes runs with a service account. If you don’t specify one, it gets the default service account for that namespace. And that default service account? It often has more permissions than you’d expect, especially if someone’s been lazy with RoleBindings.
I create a dedicated service account for every application, with exactly the permissions it needs and nothing more:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-processor
  namespace: team-backend
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-backend
  name: order-processor-role
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "watch"]
  resourceNames: ["order-processor-config"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
  resourceNames: ["order-processor-db-credentials"]
```
Two things to notice. First, automountServiceAccountToken: false. Unless your application actually talks to the Kubernetes API, don’t mount the token. Most apps don’t need it, and that token is a privilege escalation vector if the pod gets compromised. Second, I’m using resourceNames to restrict access to specific configmaps and secrets. The order-processor can read its own config and its own database credentials. It can’t list all secrets in the namespace.
Then in the deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: team-backend
spec:
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      serviceAccountName: order-processor
      automountServiceAccountToken: true # this app needs API access
      containers:
      - name: order-processor
        image: myregistry/order-processor:v2.1.0
```
This pairs well with pod security controls — PodSecurityPolicies on older clusters, Pod Security Admission since their removal in Kubernetes 1.25 — to create defense in depth. RBAC controls what the service account can do; pod security controls what the container itself can do.
Aggregated ClusterRoles: DRY Permissions
One pattern that saved me a ton of duplication: aggregated ClusterRoles. Instead of copying the same rules into every team’s Role, you define building blocks and compose them.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: workload-reader
  labels:
    rbac.mycompany.com/aggregate-to-developer: "true"
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-debugger
  labels:
    rbac.mycompany.com/aggregate-to-developer: "true"
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/exec"]
  verbs: ["get", "list", "watch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-developer
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac.mycompany.com/aggregate-to-developer: "true"
rules: [] # rules are automatically filled in by the controller
```
The team-developer ClusterRole automatically inherits rules from any ClusterRole with the matching label. Need to give all developers access to a new CRD? Create one small ClusterRole with the right label. Done. Every team’s developer role picks it up without touching their bindings.
You then use RoleBindings (not ClusterRoleBindings) to scope this ClusterRole to specific namespaces:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: data-team-developer-binding
  namespace: team-data
subjects:
- kind: Group
  name: "data-developers"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: team-developer
  apiGroup: rbac.authorization.k8s.io
```
This is a subtlety people miss: you can bind a ClusterRole with a RoleBinding, and it only grants those permissions within that namespace. The ClusterRole is just a reusable template at that point.
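What the aggregation controller does can be sketched in a few lines: it unions the rules of every ClusterRole whose labels match the selector. The label key follows the article's example; the merge logic here is simplified (no deduplication):

```python
# Sketch of ClusterRole aggregation: union rules from every role whose
# labels satisfy the matchLabels selector. Simplified -- the real
# controller also deduplicates and watches for label changes.

def aggregate(cluster_roles, match_labels):
    """Collect rules from roles whose labels include all matchLabels."""
    merged = []
    for role in cluster_roles:
        labels = role.get("labels", {})
        if all(labels.get(k) == v for k, v in match_labels.items()):
            merged.extend(role["rules"])
    return merged

roles = [
    {"name": "workload-reader",
     "labels": {"rbac.mycompany.com/aggregate-to-developer": "true"},
     "rules": [{"apiGroups": ["apps"], "resources": ["deployments"],
                "verbs": ["get", "list", "watch"]}]},
    {"name": "pod-debugger",
     "labels": {"rbac.mycompany.com/aggregate-to-developer": "true"},
     "rules": [{"apiGroups": [""], "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch", "create"]}]},
    {"name": "unrelated-role", "labels": {}, "rules": [{"apiGroups": ["batch"]}]},
]

selector = {"rbac.mycompany.com/aggregate-to-developer": "true"}
print(len(aggregate(roles, selector)))  # 2 -- unrelated-role is skipped
```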
The Break-Glass Pattern
I promised nobody gets cluster-admin. That’s the steady state. But emergencies happen — a node goes sideways at 3am, and your on-call engineer needs to do things that their normal role doesn’t allow.
I handle this with a break-glass ClusterRoleBinding that’s normally inactive. We use a short-lived token approach:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: emergency-admin
  annotations:
    break-glass/reason: "Emergency access - requires incident ticket"
    break-glass/expires: "2h"
subjects:
- kind: Group
  name: "oncall-emergency"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
The oncall-emergency group is empty by default. When someone needs emergency access, they go through a workflow (we use a Slack bot backed by a Lambda) that adds them to the group, creates an audit trail, and automatically removes them after two hours. Every use triggers a post-incident review.
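The lifecycle of that workflow — grant with a TTL, sweep expired grants, keep an audit trail — can be modeled simply. This is an in-memory sketch for illustration; the real plumbing (Slack bot, Lambda, identity provider) is left out:

```python
# Sketch of the break-glass lifecycle: expiring group membership with an
# audit trail. In-memory stand-in for the IdP-backed workflow described
# in the article.
import time

class BreakGlassGroup:
    def __init__(self, ttl_seconds=2 * 60 * 60):  # two-hour default TTL
        self.ttl = ttl_seconds
        self.members = {}   # user -> expiry timestamp
        self.audit_log = []

    def grant(self, user, reason, now=None):
        now = time.time() if now is None else now
        self.members[user] = now + self.ttl
        self.audit_log.append((now, "grant", user, reason))

    def sweep(self, now=None):
        """Remove expired members; run this on a schedule."""
        now = time.time() if now is None else now
        for user, expiry in list(self.members.items()):
            if expiry <= now:
                del self.members[user]
                self.audit_log.append((now, "revoke", user, "ttl expired"))

group = BreakGlassGroup()
group.grant("alice", "INC-1234: node failure", now=0)
group.sweep(now=3600)             # one hour in: still a member
print("alice" in group.members)   # True
group.sweep(now=7201)             # past the two-hour TTL: removed
print("alice" in group.members)   # False
```

The audit log is the point: every grant names an incident, and every revoke is automatic, so there's nothing to forget at 3am.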
Is this more overhead than just giving the SRE team cluster-admin? Yes. Has it prevented accidental damage? Also yes. The friction is the feature.
Auditing: Trust but Verify
RBAC is only as good as your ability to verify it’s working. I run two checks regularly.
First, kubectl auth can-i is your best friend for spot-checking:
```shell
# Can the backend team delete namespaces?
kubectl auth can-i delete namespaces \
  --as-group=backend-developers \
  --as=test-user -n team-backend
# no

# Can they view pods in their namespace?
kubectl auth can-i get pods \
  --as-group=backend-developers \
  --as=test-user -n team-backend
# yes

# Can they view pods in another team's namespace?
kubectl auth can-i get pods \
  --as-group=backend-developers \
  --as=test-user -n team-data
# no
```
Second, I dump all permissions for a given identity periodically and review them. Here’s a quick script I keep around:
```bash
#!/bin/bash
# audit-rbac.sh - dump effective permissions for a group

GROUP=$1
NAMESPACES=$(kubectl get ns -o jsonpath='{.items[*].metadata.name}')

for ns in $NAMESPACES; do
  echo "=== Namespace: $ns ==="
  kubectl auth can-i --list \
    --as-group="$GROUP" \
    --as=audit-user \
    -n "$ns" 2>/dev/null | grep -v "^Resources"
  echo ""
done
```
Run this monthly. Diff it against the previous run. Permission creep is real, and it’s how you end up back at “everyone can do everything” six months after you locked things down.
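The diff itself is trivial to script. Here's a sketch that treats each dump as a set of permission lines — the sample lines are made up, and real `kubectl auth can-i --list` output is columnar, so you'd normalize whitespace first:

```python
# Sketch: diff two permission dumps to spot permission creep. The sample
# permission lines are illustrative, not real kubectl output.

def diff_permissions(previous, current):
    """Report permissions added or removed between two audit runs."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}

last_month = [
    "pods [get list watch]",
    "deployments.apps [get list watch]",
]
this_month = [
    "pods [get list watch]",
    "deployments.apps [get list watch]",
    "secrets [get list create update]",   # uh oh -- who granted this?
]

report = diff_permissions(last_month, this_month)
print(report["added"])    # ['secrets [get list create update]']
print(report["removed"])  # []
```

Anything in `added` that nobody can explain gets reverted, and you go find the binding that granted it.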
Common Mistakes I’ve Seen (and Made)
Wildcard verbs on secrets. I’ve seen verbs: ["*"] on secrets more times than I can count. This means the identity can create, update, and delete secrets — which in practice means they can overwrite TLS certificates, database credentials, anything. Read-only on secrets unless there’s a specific, documented reason.
Forgetting about escalate and bind. These are special verbs for RBAC resources. If someone can bind roles, they can grant themselves any permission that role contains. If they can escalate, they can modify a role to include permissions they don’t currently have. Lock these down to your platform team only.
Not restricting pods/exec. Giving someone exec access to pods is essentially giving them shell access to your containers. That’s fine for debugging in dev, but in production it should be tightly controlled. A compromised developer laptop plus pods/exec access equals game over.
Ignoring the default service account. Every namespace has one. If you’ve bound any roles to it (or if an operator has), every pod in that namespace inherits those permissions unless it specifies a different service account. I strip all bindings from default service accounts and set automountServiceAccountToken: false on them.
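Catching that last mistake is easy to automate. This sketch flags any RoleBinding whose subjects include a default service account — in practice you'd feed it the parsed output of `kubectl get rolebindings -A -o json`; the sample bindings here are made up:

```python
# Sketch: flag RoleBindings granting permissions to a namespace's
# default service account. Sample bindings are illustrative.

def bindings_to_default_sa(bindings):
    """Return (namespace, binding-name) pairs that bind the default SA."""
    flagged = []
    for b in bindings:
        for s in b.get("subjects", []):
            if s.get("kind") == "ServiceAccount" and s.get("name") == "default":
                flagged.append((b["metadata"]["namespace"], b["metadata"]["name"]))
    return flagged

bindings = [
    {"metadata": {"namespace": "team-backend", "name": "sloppy-binding"},
     "subjects": [{"kind": "ServiceAccount", "name": "default"}]},
    {"metadata": {"namespace": "team-data", "name": "ok-binding"},
     "subjects": [{"kind": "ServiceAccount", "name": "order-processor"}]},
]

print(bindings_to_default_sa(bindings))
# [('team-backend', 'sloppy-binding')]
```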
Putting It All Together
Here’s the mental model I use for any new cluster. It’s not complicated, but it requires discipline:
- Namespaces per team, per environment: `team-backend-prod`, `team-backend-staging`, etc.
- Aggregated ClusterRoles as reusable permission templates.
- RoleBindings per namespace, bound to IdP groups.
- Per-application service accounts with `resourceNames` restrictions.
- Network policies to complement RBAC at the network layer.
- Pod security to restrict container-level capabilities.
- Break-glass process for emergencies, with automatic expiry and audit trail.
- Monthly permission audits with `kubectl auth can-i --list`.
The developer who deleted our production namespace? He’s still on the team. He’s actually one of our strongest advocates for strict RBAC now. Turns out, people appreciate guardrails once they’ve experienced what happens without them.
If you’re just getting started with Kubernetes, RBAC might feel like overhead. It’s not. It’s the difference between a cluster you can trust and one you’re afraid to look at on Monday morning. And if you’re running anything resembling multi-tenant — even just two teams on the same cluster — you need this. Not next quarter. Now.
The cluster-admin days are over. Good riddance.
For a broader look at securing Kubernetes beyond just RBAC, I’ve written about the full threat model and defense-in-depth approach separately.