Ephemeral Environments And DORA Metrics

DevOps Research and Assessment (DORA) Metrics are a widely used standard for measuring your engineering org’s performance. Improving your DORA Metrics isn’t something that you can do overnight; good DORA Metrics are the result of dedicated efforts to improve your infrastructure, people, and process.

While ephemeral environments are only part of the DORA equation, they can have a significant impact on how you build, test, and ship software. Once you remove infrastructure blocks and pain points from the equation, you’ll start to see improvements to people and process.

Change lead time

Change lead time is the length of time it takes for a commit to get deployed to production. For reference, top-performing orgs have a change lead time of under an hour, whereas low scorers can have lead times of months. With ephemeral environments, you can shorten lead times by:

Testing code changes immediately: instead of waiting for staging to free up, your PR’s ephemeral environment spins up quickly and automatically, allowing you to test, iterate, and test again until your changes are ready for merge.
Sharing environments across teams: design, product, and code reviews don’t have to be a big deal anymore — you can simply direct your collaborators to a live, interactive environment reflecting your code changes for async feedback.
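
To make the definition concrete, here is a minimal sketch of how change lead time could be computed from commit and deploy timestamps. The records and the function name `median_lead_time` are hypothetical, and using the median (rather than the mean) is an assumption about how you might summarize the data, not something DORA or any particular tool prescribes.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: (commit time, production deploy time) per change.
changes = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 45)),    # 45 min
    (datetime(2024, 5, 1, 13, 30), datetime(2024, 5, 1, 15, 0)),  # 90 min
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 10, 30)),  # 30 min
]


def median_lead_time(records):
    """Return the median time from commit to production deploy."""
    seconds = [
        (deployed - committed).total_seconds()
        for committed, deployed in records
    ]
    return timedelta(seconds=median(seconds))


print(median_lead_time(changes))
```

In practice, the timestamps would come from your version control and deployment tooling rather than a hard-coded list.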

Deployment frequency

Deployment frequency reflects how often code changes are deployed to production. Many orgs aspire to achieve multiple deployments per day: true continuous delivery. Deployments happen more often with ephemeral environments, owing to:

Smaller deployments: since you can test earlier and at the PR level, you can deploy individual commits as soon as they’re verified by your CI/CD pipeline.
Testing features individually: with static environment limitations, E2E and integration tests happen infrequently, and against multiple features at a time. Testing features independently surfaces bugs and regressions quickly and eliminates unnecessary wait time.
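
As a rough illustration, deployment frequency can be sketched as the number of production deployments divided by the length of the observation window. The dates below and the helper `deployments_per_day` are hypothetical, and averaging over a fixed window is one assumed way to express the metric.

```python
from datetime import date

# Hypothetical sample data: one entry per production deployment.
deploy_dates = [
    date(2024, 5, 1), date(2024, 5, 1),
    date(2024, 5, 2),
    date(2024, 5, 3), date(2024, 5, 3), date(2024, 5, 3),
]


def deployments_per_day(dates, window_days):
    """Average number of production deployments per day over the window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return len(dates) / window_days


# Six deployments over a three-day window.
print(deployments_per_day(deploy_dates, window_days=3))
```

A value above 1.0 here would correspond to the "multiple deployments per day" goal mentioned above.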

Change fail percentage

Change fail percentage is the percentage of deployments that cause production failures. Keeping this number low is crucial, and anything above 10% can indicate that your delivery process and/or test suite is not reliable. You can keep this low with ephemeral environments for a few reasons:

More opportunities for stakeholder review: traditional static staging environments aren’t usually shareable or user-friendly. When done right, ephemeral environments are easily distributed across teams and domains, getting more eyes and more UAT on your code changes.
Increased test exposure: in addition to enabling more human testing, ephemeral environments also enable easier, more frequent automated testing. Since they can spin up earlier on in the pipeline, you’ll be testing against code changes multiple times per day, instead of per week — meaning features can be tested 10-40x more prior to deployment.
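
The calculation behind change fail percentage is simple enough to sketch: failed deployments divided by total deployments, times 100. The outcome list and the function name are hypothetical; the 10% threshold is the one cited above.

```python
# Hypothetical sample data: True means the deployment caused a production failure.
deployment_failed = [False] * 17 + [True] * 3  # 20 deployments, 3 failures

# Threshold taken from the text above: more than 10% can signal an unreliable process.
THRESHOLD_PERCENT = 10


def change_fail_percentage(outcomes):
    """Percentage of deployments that caused a production failure."""
    if not outcomes:
        raise ValueError("need at least one deployment")
    return 100 * sum(outcomes) / len(outcomes)


rate = change_fail_percentage(deployment_failed)
print(f"{rate:.1f}% of deployments failed")
if rate > THRESHOLD_PERCENT:
    print("Above the 10% mark: worth reviewing the delivery process and test suite")
```

How you decide that a deployment "failed" (a rollback, a hotfix, an incident ticket) is an assumption you would need to pin down for your own pipeline.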

Failed deployment recovery time

Failed deployment recovery time is the average time it takes to restore service after a deployment causes a failure or introduces a regression. Ideally, there should be little to no turnaround. Ephemeral environments help you keep this low by:

Expediting your remediations: test your patches and hotfixes as soon as they’re ready, getting instant feedback and then merging/deploying within the same hour.
Providing production-like conditions: your ephemeral environments simulate production closely, so you can run your entire test suite against remediations and patches before they hit production, and deploy with more confidence.
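
Since this metric is defined as an average, a minimal sketch is the mean of the time between a failure starting and service being restored. The incident records and the helper `mean_recovery_time` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical sample data: (failure detected, service restored) per failed deployment.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),  # 30 min
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 15, 30)),  # 90 min
]


def mean_recovery_time(records):
    """Average time from failure to restored service."""
    if not records:
        raise ValueError("need at least one incident")
    total = sum(
        (restored - failed for failed, restored in records),
        timedelta(),  # start value so timedeltas can be summed
    )
    return total / len(records)


print(mean_recovery_time(incidents))
```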
