Platform Engineering: Building an Internal Developer Platform

Platform engineering is DevOps done right. Or maybe it’s DevOps with a product mindset. Either way, it’s the recognition that telling every team to “own their own infrastructure” without giving them decent tooling is a recipe for chaos. I’ve watched organisations try the “you build it, you run it” approach and end up with fifteen different ways to deploy a container, nine half-configured Terraform repos, and developers who spend more time fighting YAML than writing features.

An internal developer platform (IDP) is supposed to fix that. It’s the layer that sits between your raw infrastructure and your application teams, offering self-service capabilities with sensible defaults. Golden paths instead of gatekeeping. Guardrails instead of tickets.

But here’s the thing nobody tells you at the conference talks: building the platform is the easy part. Getting people to actually use it is where everything falls apart.

The Platform Nobody Used

I need to tell you about the time I built an internal developer platform that nobody used. It’s not a comfortable story, but it’s an important one.

We had a growing engineering org, maybe sixty developers across eight teams. Deployments were inconsistent. Some teams had solid CI/CD pipelines with GitHub Actions, others were still doing manual deployments. The infrastructure team was drowning in tickets. Classic scaling pain.

So we did what any self-respecting platform team would do: we went away for three months and built something beautiful. A slick CLI tool. A service catalog. Automated environment provisioning. Terraform modules wrapped in abstractions. We even had a portal with a nice UI.

We launched it with a demo. People clapped. Then they went back to their desks and kept doing exactly what they’d been doing before.

The problem was obvious in hindsight. We’d built what we thought developers needed without actually asking them. We’d assumed we knew the pain points. We’d optimised for infrastructure elegance when developers just wanted to ship code without waiting three days for a database. We’d created a product without talking to our customers.

That failure taught me more about platform engineering than any success since. The platform isn’t the technology. It’s the relationship between the platform team and the developers who use it.

What Platform Engineering Actually Is

Let me be precise about this because the term gets thrown around loosely. Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organisations. The internal developer platform is the product that results from this work.

It’s not just “infrastructure as code with a wrapper.” It’s not a Kubernetes dashboard. It’s not a ticket system with a nicer frontend.

A good IDP reduces cognitive load. A developer shouldn’t need to understand VPC peering to deploy a service that talks to a database. They shouldn’t need to know the difference between an ALB and an NLB to expose an endpoint. They should be able to express what they need in terms they understand, and the platform should handle the how.
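Here’s a minimal Python sketch of that idea for the load balancer case. The `EndpointRequest` type and the shape of the returned dictionary are hypothetical and made up for illustration. The only real-world knowledge baked in is that an ALB works at layer 7 (HTTP, gRPC) and an NLB at layer 4 (TCP, UDP).

```python
from dataclasses import dataclass

# Hypothetical developer-facing request: it states *what* is needed, not *how*.
@dataclass(frozen=True)
class EndpointRequest:
    service: str
    protocol: str  # "http", "grpc", "tcp" or "udp"
    public: bool = False

# Layer-7 protocols map to an Application Load Balancer,
# layer-4 protocols to a Network Load Balancer.
_LAYER7 = {"http", "grpc"}
_LAYER4 = {"tcp", "udp"}

def resolve_load_balancer(req: EndpointRequest) -> dict:
    """Translate an endpoint request into details the developer never has to see."""
    proto = req.protocol.lower()
    if proto in _LAYER7:
        lb_type = "application"
    elif proto in _LAYER4:
        lb_type = "network"
    else:
        raise ValueError(f"unsupported protocol: {req.protocol!r}")
    return {
        "name": f"{req.service}-lb",
        "type": lb_type,
        "scheme": "internet-facing" if req.public else "internal",
    }

print(resolve_load_balancer(EndpointRequest("orders", "http", public=True)))
# {'name': 'orders-lb', 'type': 'application', 'scheme': 'internet-facing'}
```

The developer only ever writes the request. If the platform team later changes how endpoints are exposed, no service definitions need to change.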

This is fundamentally a DevOps culture problem dressed up as a technology problem. You can have the most sophisticated platform in the world, but if your organisation still thinks in terms of “dev team throws code over the wall to ops,” the platform won’t save you.

The shift is from infrastructure as a service request to infrastructure as a product. Your developers are your customers. Your platform is your product. You need product thinking: user research, feedback loops, iteration, and the willingness to kill features that aren’t working.

The Anatomy of an Internal Developer Platform

Every IDP I’ve built or worked with has roughly the same layers, even if the specific tools differ. Understanding these layers helps you figure out where to start and what to prioritise.

The bottom layer is your infrastructure orchestration. This is where your Terraform modules live, your cloud provider APIs, your Kubernetes clusters. It’s the raw capability. Most organisations already have this in some form, even if it’s messy.

Above that sits your configuration management and environment management. How do you define what a “staging environment” means? How do you ensure consistency between environments? This is where GitOps with ArgoCD becomes incredibly valuable, because it gives you a declarative, auditable way to manage environment state.
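One way to answer “what does staging mean?” is to write it down as data: a shared base plus a small per-environment overlay, rendered deterministically so the result can be committed to the Git repo that Argo CD syncs from. This is a sketch under assumptions. The keys, values and the merge behaviour are my own illustration, similar in spirit to overlay tools but not any specific tool’s format.

```python
import copy
import json

# Hypothetical base definition shared by every environment.
BASE = {
    "replicas": 1,
    "resources": {"cpu": "250m", "memory": "256Mi"},
    "monitoring": {"enabled": True, "alerting": False},
}

# Per-environment overlays: only the differences are written down.
OVERLAYS = {
    "staging": {"replicas": 2, "monitoring": {"alerting": True}},
    "production": {
        "replicas": 3,
        "resources": {"cpu": "500m", "memory": "512Mi"},
        "monitoring": {"alerting": True},
    },
}

def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict with overlay values recursively applied on top of base."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

def render_environment(name: str) -> str:
    """Render one environment as stable, sorted JSON suitable for committing to Git."""
    if name not in OVERLAYS:
        raise KeyError(f"unknown environment: {name}")
    return json.dumps(deep_merge(BASE, OVERLAYS[name]), indent=2, sort_keys=True)
```

Because the output is sorted and deterministic, a diff in the GitOps repo shows exactly what changed in an environment and nothing else.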

Then you’ve got your developer interface layer. This is what developers actually interact with. It might be a CLI, a web portal, a set of APIs, or some combination. Backstage from Spotify has become the de facto standard here, but it’s not the only option and it’s not always the right one.

Finally, there’s the integration layer. How does your platform connect to your CI/CD systems, your monitoring stack, your security scanning tools? This is often the most underestimated layer. Getting the integrations right is what makes a platform feel seamless rather than like yet another tool developers have to context-switch into.

The mistake I see most often is teams starting at the developer interface layer. They install Backstage, build some templates, and call it a platform. But if the underlying orchestration and configuration layers are a mess, you’ve just put a pretty face on chaos.

Golden Paths, Not Golden Cages

The concept of golden paths is central to platform engineering, and it’s also the concept most likely to be implemented badly.

A golden path is a supported, well-maintained, opinionated way to accomplish a common task. Want to deploy a new microservice? Here’s the golden path: use this template, it sets up your repo with CI/CD, creates your Kubernetes manifests, provisions a database, configures monitoring and alerting. You can go from idea to production in an afternoon.

The key word is “supported.” A golden path isn’t a mandate. Developers should be able to step off the path when they have a legitimate reason. Maybe they need a technology the platform doesn’t support yet. Maybe they have unusual performance requirements. The platform should make the common case easy, not make the uncommon case impossible.

I’ve seen platform teams get this wrong in both directions. Some build golden paths that are so rigid they become golden cages. You must use this language, this framework, this database. No exceptions. Developers rebel, fork the templates, and you’re back to chaos.

Others build golden paths that are so flexible they’re meaningless. Here’s a template that generates a Dockerfile. That’s it. Good luck with everything else. Developers look at it, shrug, and keep doing their own thing.

The sweet spot is opinionated defaults with escape hatches. Your golden path should handle 80% of use cases beautifully. For the other 20%, provide building blocks that developers can compose themselves, with documentation that explains the tradeoffs.
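“Opinionated defaults with escape hatches” can be encoded directly. In this sketch the golden path values and the `justification` rule are hypothetical choices for illustration. Overrides are allowed, but stepping off the path requires a written reason, and the result is flagged so the platform team can see how often people leave the path.

```python
from dataclasses import dataclass, field

# Hypothetical golden-path defaults: the opinionated choices for the common case.
GOLDEN_DEFAULTS = {
    "language": "python",
    "database": "postgresql",
    "ci": "github-actions",
}

@dataclass
class ServiceSpec:
    name: str
    overrides: dict = field(default_factory=dict)
    # Escape hatch: deviating from a default requires a written reason.
    justification: str = ""

def resolve_spec(spec: ServiceSpec) -> dict:
    """Apply overrides on top of the defaults, refusing unexplained deviations."""
    unknown = set(spec.overrides) - set(GOLDEN_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    deviations = {k: v for k, v in spec.overrides.items() if GOLDEN_DEFAULTS[k] != v}
    if deviations and not spec.justification.strip():
        raise ValueError(f"overriding {sorted(deviations)} requires a justification")
    resolved = {**GOLDEN_DEFAULTS, **spec.overrides}
    # Recorded so off-path services show up in adoption metrics.
    resolved["on_golden_path"] = not deviations
    return resolved
```

For example, `resolve_spec(ServiceSpec("search", {"database": "elasticsearch"}, "needs full-text search"))` succeeds but is marked as off the golden path. The same override with no justification is rejected.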

Starting Small: The Thinnest Viable Platform

If you’re starting from scratch, resist the urge to build everything at once. I learned this the hard way with that platform nobody used. Start with the thinnest viable platform, the smallest thing that provides real value to real developers.

Talk to your developers first. Not a survey. Actual conversations. Sit with them while they deploy. Watch where they get stuck. Ask what they dread. You’ll find the pain points quickly, and they’re rarely what you expect.

In my experience, the highest-value starting points are usually one of these:

Environment provisioning. If developers wait days for a new environment, fix that first. Give them self-service environment creation with sensible defaults. Even if it’s just a CLI command that runs some Terraform and updates a config file, that’s a massive win.
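Here’s roughly what “a CLI command that runs some Terraform and updates a config file” can look like. This is a sketch, assuming a hypothetical Terraform module in `infra/environments` that accepts `env_name` and `size` variables, and a hypothetical `environments.json` registry. It only prints the commands unless `--apply` is passed. The Terraform flags used (`-chdir`, `-input=false`, `-auto-approve`, `-var`) are standard CLI options.

```python
from __future__ import annotations

import argparse
import json
import subprocess
from pathlib import Path

# Hypothetical locations: adjust to your own repo layout.
MODULE_DIR = "infra/environments"
REGISTRY = Path("environments.json")

def terraform_commands(env: str, size: str) -> list[list[str]]:
    """Build the Terraform invocations for one environment (not executed here)."""
    common = ["terraform", f"-chdir={MODULE_DIR}"]
    return [
        common + ["init", "-input=false"],
        common + ["apply", "-input=false", "-auto-approve",
                  "-var", f"env_name={env}", "-var", f"size={size}"],
    ]

def register(env: str, size: str, registry: Path = REGISTRY) -> dict:
    """Record the environment in a JSON config file that other tools can read."""
    data = json.loads(registry.read_text()) if registry.exists() else {}
    data[env] = {"size": size}
    registry.write_text(json.dumps(data, indent=2, sort_keys=True))
    return data

def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Create a self-service environment")
    parser.add_argument("name")
    parser.add_argument("--size", choices=["small", "medium", "large"], default="small")
    parser.add_argument("--apply", action="store_true", help="actually run Terraform")
    args = parser.parse_args(argv)
    for cmd in terraform_commands(args.name, args.size):
        print(" ".join(cmd))
        if args.apply:
            subprocess.run(cmd, check=True)  # stop on the first failing step
    if args.apply:
        register(args.name, args.size)
```

Defaulting to a dry run is a deliberate choice. Developers can see exactly what will happen before anything touches the cloud account.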

Service scaffolding. If every new service requires copying an old repo and spending two days ripping out the previous team’s business logic, build a template. A good create-service command that generates a repo with CI/CD, basic monitoring, and deployment configuration saves enormous time.
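A `create-service` command doesn’t need to be sophisticated to be useful. This sketch renders a handful of templates into a new directory. The file layout, the `make test` step and the `deploy/service.yaml` descriptor are hypothetical placeholders, and a real golden path would keep its templates in a versioned template repository.

```python
from pathlib import Path
from string import Template

# Hypothetical templates: CI, deployment descriptor and default alerting.
TEMPLATES = {
    ".github/workflows/ci.yml": Template(
        "name: ci\n"
        "on: [push]\n"
        "jobs:\n"
        "  test:\n"
        "    runs-on: ubuntu-latest\n"
        "    steps:\n"
        "      - uses: actions/checkout@v4\n"
        "      - run: make test\n"
    ),
    "deploy/service.yaml": Template(
        "name: $name\n"
        "team: $team\n"
        "replicas: 2\n"
    ),
    "monitoring/alerts.yaml": Template(
        "service: $name\n"
        "alerts:\n"
        "  - error_rate_above: 0.05\n"
    ),
}

def create_service(root: Path, name: str, team: str) -> list[Path]:
    """Write a new service skeleton under root/name and return the files created."""
    target = root / name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    created = []
    for rel_path, template in TEMPLATES.items():
        path = target / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(template.substitute(name=name, team=team))
        created.append(path)
    return created
```

Refusing to overwrite an existing directory keeps the command safe to run by mistake.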

Database provisioning. This one comes up constantly. Developers need a database. They file a ticket. Someone provisions it manually three days later. Give them a self-service way to get a database with appropriate defaults for their environment.
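“Appropriate defaults for their environment” is mostly a lookup table. In this sketch the per-environment values are illustrative, not recommendations, and `DatabaseRequest` is a hypothetical request type. The developer names a service and an environment, optionally overrides storage, and the platform fills in the rest.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-environment defaults; the numbers are illustrative only.
DB_DEFAULTS = {
    "dev": {"storage_gb": 20, "backups": False, "high_availability": False, "deletion_protection": False},
    "staging": {"storage_gb": 50, "backups": True, "high_availability": False, "deletion_protection": False},
    "production": {"storage_gb": 200, "backups": True, "high_availability": True, "deletion_protection": True},
}

@dataclass(frozen=True)
class DatabaseRequest:
    service: str
    environment: str
    engine: str = "postgresql"
    storage_gb: Optional[int] = None  # optional override of the environment default

def plan_database(req: DatabaseRequest) -> dict:
    """Combine a developer's request with the defaults for their environment."""
    if req.environment not in DB_DEFAULTS:
        raise ValueError(f"unknown environment: {req.environment}")
    plan = {"name": f"{req.service}-{req.environment}", "engine": req.engine}
    plan.update(DB_DEFAULTS[req.environment])
    if req.storage_gb is not None:
        if req.storage_gb <= 0:
            raise ValueError("storage_gb must be positive")
        plan["storage_gb"] = req.storage_gb
    return plan
```

The plan would then be handed to whatever actually provisions databases, such as a Terraform module, so the three-day ticket becomes a single request.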

Pick one. Build it. Ship it. Get feedback. Iterate. Then pick the next one.

This is where site reliability engineering principles intersect with platform engineering. You’re not just building features; you’re building reliability into the developer workflow from the start. Every golden path should include monitoring, alerting, and sensible resource limits by default.
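One way to make “by default” real is to fill in whatever a team hasn’t specified before a manifest is applied. This sketch operates on a Kubernetes Deployment as a plain dictionary. The `spec.template.spec.containers[].resources` path is the real Kubernetes structure, but the default values and the `platform.example.com/alerting` label are hypothetical. In a cluster, the same logic would typically live in a mutating admission webhook or a policy engine such as Kyverno.

```python
import copy

# Illustrative defaults; tune them per workload.
DEFAULT_RESOURCES = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
# Hypothetical label that a platform-owned alerting controller would look for.
ALERTING_LABEL = "platform.example.com/alerting"

def apply_reliability_defaults(deployment: dict) -> dict:
    """Return a copy of a Deployment with resource settings and alerting opt-in filled in."""
    result = copy.deepcopy(deployment)
    pod_spec = result["spec"]["template"]["spec"]
    for container in pod_spec["containers"]:
        resources = container.setdefault("resources", {})
        for section, values in DEFAULT_RESOURCES.items():
            target = resources.setdefault(section, {})
            for key, value in values.items():
                # Only fill gaps; explicit team choices always win.
                target.setdefault(key, value)
    labels = result["spec"]["template"].setdefault("metadata", {}).setdefault("labels", {})
    labels.setdefault(ALERTING_LABEL, "default")
    return result
```

Using `setdefault` throughout means the platform never overrides a value the team chose on purpose. It only supplies the ones they forgot.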

The Platform Team as a Product Team

This is the mindset shift that separates platform engineering from traditional infrastructure teams. You’re not a service desk. You’re not a gatekeeper. You’re a product team, and your product is the developer experience.

That means you need the same things any product team needs. A roadmap driven by user needs, not by what’s technically interesting. Regular user research. Usage metrics. A feedback mechanism that’s low-friction enough that developers actually use it. A willingness to deprecate things that aren’t working.

I run platform teams with a few non-negotiable practices:

Weekly office hours. Developers can drop in, ask questions, report problems, or just vent. This is where you learn what’s actually happening on the ground. The developer who’s been silently working around a bug in your CLI for three weeks will mention it casually during office hours. You’d never find it in a ticket queue.

Embedded rotations. Platform engineers spend time embedded with application teams, pairing on real work. Nothing teaches you about your platform’s rough edges faster than trying to use it for a real project under real deadlines.

Internal SLOs. Yes, your platform should have service level objectives. How long does it take to provision a new service? What’s the success rate of deployments through the golden path? If you can’t measure it, you can’t improve it.
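Measuring those two SLOs needs very little code once you record the events. This sketch uses a nearest-rank 95th percentile for provisioning time and a simple success ratio for golden-path deployments. The targets (30 minutes, 95%) and the `DeployRecord` type are hypothetical and only there for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DeployRecord:
    service: str
    succeeded: bool

def nearest_rank_percentile(values: list, pct: int) -> float:
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

def check_slos(provision_minutes: list, deploys: list,
               p95_target: float = 30.0, success_target: float = 0.95) -> dict:
    """Compare observed provisioning time and deploy success rate with their targets."""
    if not deploys:
        raise ValueError("no deployments recorded")
    p95 = nearest_rank_percentile(provision_minutes, 95)
    rate = sum(d.succeeded for d in deploys) / len(deploys)
    return {
        "provision_p95_minutes": p95,
        "provision_ok": p95 <= p95_target,
        "deploy_success_rate": rate,
        "deploy_ok": rate >= success_target,
    }
```

A percentile is used rather than an average so that one fast week can’t hide the developers who waited hours.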

A public roadmap. Developers should know what’s coming and have a way to influence priorities. Transparency builds trust, and trust is the currency of platform adoption.

The hardest part of running a platform team is saying no. You’ll get requests for every possible feature and integration. You need to be ruthless about prioritisation. A platform that does three things well beats a platform that does thirty things poorly.

Technical Decisions That Matter

I’m not going to prescribe a specific tech stack because context matters enormously. But there are a few technical decisions that I’ve seen make or break platforms.

Abstraction level is the big one. How much of the underlying infrastructure do you expose to developers? Too little and they can’t debug problems. Too much and you haven’t actually reduced cognitive load. I lean toward exposing the “what” and hiding the “how.” Developers should know they have a PostgreSQL database with 16GB of RAM. They shouldn’t need to know it’s running on a specific EC2 instance type in a specific subnet.
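Here’s a sketch of that split. The developer-facing `DatabaseView` carries only the engine and the memory. The platform picks the smallest instance class that satisfies the request and attaches networking details. The instance classes below are illustrative RDS-style names with the memory sizes I understand them to have, and `private-data` is a hypothetical subnet group, so verify both against your provider’s current documentation.

```python
from dataclasses import dataclass

# Illustrative mapping from memory (GiB) to RDS-style instance classes, smallest first.
INSTANCE_CLASSES = [
    (4, "db.t3.medium"),
    (16, "db.r6g.large"),
    (32, "db.r6g.xlarge"),
    (64, "db.r6g.2xlarge"),
]

@dataclass(frozen=True)
class DatabaseView:
    """What the developer sees: the 'what'."""
    engine: str
    memory_gib: int

def to_infrastructure(view: DatabaseView, subnet_group: str = "private-data") -> dict:
    """Derive the 'how' from the 'what'. subnet_group is a hypothetical platform-owned name."""
    for memory, instance_class in INSTANCE_CLASSES:
        if memory >= view.memory_gib:
            return {
                "engine": view.engine,
                "instance_class": instance_class,
                "subnet_group": subnet_group,
            }
    raise ValueError(f"no instance class offers {view.memory_gib} GiB")
```

Because developers only depend on `DatabaseView`, the platform team can move to a new instance family later without touching a single service definition.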

API-first design matters more than you’d think. Your platform should be programmable. If the only way to interact with it is through a web UI, you’ve already lost the developers who live in their terminals. Build APIs first, then build UIs and CLIs on top of them. This also makes integration with CI/CD pipelines dramatically easier.
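The shape of API-first is easiest to see in miniature. In this sketch, `PlatformAPI` is a hypothetical contract and `InMemoryPlatform` is a stand-in for a client that would call the platform’s real HTTP API. The CLI is deliberately thin: it parses arguments, calls the API and prints JSON, with no business logic of its own. A web UI or a CI pipeline would call the same contract.

```python
import argparse
import json
from typing import Protocol

class PlatformAPI(Protocol):
    """The programmable contract; UIs, CLIs and CI pipelines all call this."""
    def create_service(self, name: str, team: str) -> dict: ...
    def list_services(self) -> list: ...

class InMemoryPlatform:
    """Illustrative backend; a real one would make HTTP calls to the platform API."""
    def __init__(self) -> None:
        self._services: dict = {}

    def create_service(self, name: str, team: str) -> dict:
        if name in self._services:
            raise ValueError(f"service {name!r} already exists")
        self._services[name] = {"name": name, "team": team}
        return self._services[name]

    def list_services(self) -> list:
        return sorted(self._services.values(), key=lambda s: s["name"])

def run_cli(api: PlatformAPI, argv: list) -> str:
    """A thin CLI layer: parse arguments, call the API, return JSON output."""
    parser = argparse.ArgumentParser(prog="platform")
    sub = parser.add_subparsers(dest="command", required=True)
    create = sub.add_parser("create-service")
    create.add_argument("name")
    create.add_argument("--team", required=True)
    sub.add_parser("list-services")
    args = parser.parse_args(argv)
    if args.command == "create-service":
        result = api.create_service(args.name, args.team)
    else:
        result = api.list_services()
    return json.dumps(result, sort_keys=True)
```

Because the CLI emits JSON, it is just as easy to script from a pipeline as it is to use by hand.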

State management is where platforms get complicated. Your platform is managing state across multiple systems: cloud provider resources, Kubernetes objects, DNS records, certificates, monitoring configurations. You need a clear model for how state is tracked, how drift is detected, and how conflicts are resolved. GitOps helps enormously here because it gives you a single source of truth, but it’s not a complete answer for every type of resource.
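Drift detection is, at its core, a diff between declared and observed state. This sketch assumes both sides have already been loaded into dictionaries keyed by a resource ID: “desired” from Git, “actual” from the cloud or cluster APIs. The resource keys and attributes are hypothetical. It reports missing, unmanaged and changed resources and leaves the resolution policy (revert, alert, or adopt) to you.

```python
from dataclasses import dataclass, field

@dataclass
class DriftReport:
    missing: list = field(default_factory=list)    # declared in Git, absent in reality
    unmanaged: list = field(default_factory=list)  # present in reality, not declared
    changed: dict = field(default_factory=dict)    # key -> {attr: (desired, actual)}

def detect_drift(desired: dict, actual: dict) -> DriftReport:
    """Compare desired state (from Git) with actual state (from provider APIs)."""
    report = DriftReport()
    report.missing = sorted(desired.keys() - actual.keys())
    report.unmanaged = sorted(actual.keys() - desired.keys())
    for key in sorted(desired.keys() & actual.keys()):
        want, have = desired[key], actual[key]
        diffs = {
            attr: (want.get(attr), have.get(attr))
            for attr in sorted(want.keys() | have.keys())
            if want.get(attr) != have.get(attr)
        }
        if diffs:
            report.changed[key] = diffs
    return report
```

For Kubernetes objects, Argo CD already performs this comparison. A sketch like this is mainly useful for resource types that fall outside it, such as DNS records or certificates.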

Secrets management deserves special attention. Every developer needs secrets. Database credentials, API keys, certificates. If your platform doesn’t have a good story for secrets, developers will put them in environment variables, commit them to repos, or paste them in Slack. Build secrets management into your golden paths from day one.
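A common pattern is to keep only references to secrets in configuration and resolve them at deploy time. In this sketch, the `secret://` scheme, the key-name heuristics and `DictSecretStore` are all hypothetical stand-ins. A real platform would back `SecretStore` with its actual secrets manager. The validation step is what stops a plaintext password from slipping into a repo.

```python
from collections.abc import Mapping
from typing import Protocol

SECRET_PREFIX = "secret://"  # hypothetical reference scheme
SENSITIVE_HINTS = ("password", "token", "api_key", "secret")

class SecretStore(Protocol):
    def get(self, path: str) -> str: ...

class DictSecretStore:
    """Illustrative stand-in for a real secrets manager client."""
    def __init__(self, data: Mapping) -> None:
        self._data = dict(data)

    def get(self, path: str) -> str:
        return self._data[path]

def validate_config(config: Mapping) -> list:
    """Return keys that look sensitive but hold a literal value instead of a reference."""
    return sorted(
        key for key, value in config.items()
        if any(hint in key.lower() for hint in SENSITIVE_HINTS)
        and not value.startswith(SECRET_PREFIX)
    )

def resolve_config(config: Mapping, store: SecretStore) -> dict:
    """Swap secret references for real values at deploy time; Git only ever holds references."""
    problems = validate_config(config)
    if problems:
        raise ValueError(f"plaintext secrets not allowed: {problems}")
    return {
        key: store.get(value[len(SECRET_PREFIX):]) if value.startswith(SECRET_PREFIX) else value
        for key, value in config.items()
    }
```

The key-name heuristic is crude and will miss things, so treat it as one guardrail among several. Dedicated secret scanners are still worth running in CI.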

Measuring Platform Success

You need to know if your platform is actually working. Not in the “we built cool stuff” sense, but in the “we’re making developers more productive” sense.

The metrics I care about most:

Time to first deployment. When a new developer joins or a new service is created, how long until code is running in production? This is your north star metric. If your platform is working, this number should be dropping.

Deployment frequency. Are teams deploying more often? More frequent deployments usually mean smaller changes, which mean lower risk and faster feedback loops.

Golden path adoption rate. What percentage of new services use the golden path? If it’s below 70%, something’s wrong. Either the golden path doesn’t cover enough use cases, or it’s too painful to use, or developers don’t know about it.

Platform NPS. Ask developers: would you recommend this platform to a colleague? It’s a blunt instrument, but it captures sentiment that usage metrics miss. A developer might use your platform because they have to, not because they want to. NPS tells you the difference.

Ticket volume for the platform team. This should be trending down over time as self-service capabilities mature. If it’s going up, you’re building the wrong things or your documentation isn’t good enough.

Don’t measure lines of YAML generated or number of Terraform modules created. Those are vanity metrics. Measure outcomes for developers.
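Three of these metrics are easy to compute once you collect the raw events. This sketch shows median time to first deployment, golden path adoption rate and NPS, using the standard NPS definition: the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The input shapes, such as the `golden_path` flag on each service record, are hypothetical.

```python
from datetime import datetime
from statistics import median

def median_hours_to_first_deploy(pairs: list) -> float:
    """Median hours between (created, first_production_deploy) datetime pairs."""
    if not pairs:
        raise ValueError("no services to measure")
    return median((deployed - created).total_seconds() / 3600 for created, deployed in pairs)

def adoption_rate(new_services: list) -> float:
    """Share of new services created through the golden path (0.0 to 1.0)."""
    if not new_services:
        raise ValueError("no services to measure")
    return sum(1 for s in new_services if s["golden_path"]) / len(new_services)

def nps(scores: list) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("no responses")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)
```

For example, survey scores of `[10, 9, 8, 7, 3]` give two promoters and one detractor out of five, so an NPS of 20.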

The Adoption Problem

Let me come back to where I started: the platform nobody used. Because adoption is the existential challenge of platform engineering.

You can’t mandate adoption. Well, you can, but mandated adoption breeds resentment and workarounds. Developers are creative people. If they don’t want to use your platform, they’ll find ways around it that are worse than not having a platform at all.

Adoption has to be earned. Your platform has to be genuinely better than the alternative. Not theoretically better. Not better-according-to-the-architecture-diagram better. Actually, tangibly, measurably better for the developer sitting at their desk trying to ship a feature.

A few tactics that have worked for me:

Find your champions. In every engineering org, there are developers who are enthusiastic about tooling and infrastructure. Find them. Give them early access. Incorporate their feedback. When they start telling their teammates “you should try this, it’s actually good,” that’s worth more than any launch announcement.

Solve a burning problem first. Don’t start with the elegant long-term vision. Start with the thing that’s causing the most pain right now. If developers are spending hours debugging deployment failures, fix that. Quick wins build credibility.

Make migration gradual. Don’t ask teams to rewrite everything to use your platform. Provide migration paths that let them adopt incrementally. Maybe they start by using your CI/CD templates but keep their existing infrastructure. Then they migrate their infrastructure. Then they adopt your monitoring defaults. Each step should provide standalone value.

Document relentlessly. Not just how to use the platform, but why. Developers are more likely to adopt something when they understand the reasoning behind the design decisions. “We chose this approach because…” is more persuasive than “Do it this way.”

Where This Is All Going

Platform engineering isn’t a fad. The problems it solves (cognitive load, inconsistency, slow feedback loops, the strain of scaling an engineering organisation) aren’t going away. If anything, they’re getting worse as systems get more distributed and the cloud-native ecosystem gets more complex.

What I expect to see is platforms becoming more opinionated and more automated. The golden paths of today require developers to make choices. The golden paths of tomorrow will make more of those choices automatically based on context. You’re deploying a service that handles payment data? The platform knows that means encryption at rest, specific compliance controls, and restricted network access. You don’t have to ask for it.
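A first step toward context-driven defaults can be as simple as deriving controls from a data classification label that the service declares. The classifications and control names in this sketch (including `pci_audit_logging`) are hypothetical placeholders for whatever your compliance requirements actually specify.

```python
# Hypothetical mapping from data classification to controls applied automatically.
CONTROL_RULES = {
    "payment": {"encryption_at_rest", "restricted_network", "pci_audit_logging"},
    "personal": {"encryption_at_rest", "restricted_network"},
    "public": set(),
}

def required_controls(data_classes: set) -> set:
    """Union of the controls implied by every data class a service handles."""
    unknown = data_classes - set(CONTROL_RULES)
    if unknown:
        raise ValueError(f"unknown data classes: {sorted(unknown)}")
    controls = set()
    for data_class in data_classes:
        controls |= CONTROL_RULES[data_class]
    return controls

def apply_controls(service: dict) -> dict:
    """Return the service definition with its derived controls attached."""
    controls = required_controls(set(service.get("handles", [])))
    return {**service, "controls": sorted(controls)}
```

The developer declares `"handles": ["payment"]` and never has to ask for encryption or network restrictions. Rejecting unknown classifications is deliberate: a typo should fail loudly rather than silently skip a control.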

I also expect the boundary between platform engineering and site reliability engineering to blur further. The platform isn’t just about provisioning; it’s about the entire lifecycle. Deployment, observability, incident response, capacity planning. The platform team that only handles provisioning is leaving value on the table.

But the core principle won’t change. Talk to your developers. Build what they need. Make the right thing the easy thing. Iterate relentlessly. Treat your platform as a product, not a project.

That platform nobody used? We eventually rebuilt it. This time we started with conversations, not code. We sat with developers for two weeks before writing a single line. The second version was less technically impressive but wildly more successful. Turns out developers didn’t want a portal. They wanted a CLI that worked and documentation that didn’t lie.

The best internal developer platform is the one developers actually choose to use. Everything else is just infrastructure with a marketing problem.