Measurement
These practices measure an operating model to decide if it's worthy of further investment. The cost effectiveness of business outcome protection is an informative, actionable measure of an operating model.
Leading and trailing measures can be used to gauge progress in deployment throughput, service reliability, and learning culture. It's important to select measures that are holistic, actionable, and do not create the wrong incentives. See Measuring Continuous Delivery by Steve Smith.

These practices are linked to our principles that operating models are insurance for business outcomes and that operating models are powered by feedback.
Measure the deployment throughput of digital services as follows, with a calculation sketch after the table:
Measure | Type | Description | Suggested Implementation |
---|---|---|---|
Loose Coupling | Leading | Is a product team able to change a digital service without orchestrating simultaneous changes with other digital services | Inspect deployment history. Calculate how many deployments of different digital services occurred in lockstep |
Pre-Approved Changes | Leading | Does a product team have permission to pre-approve its own change requests for low risk, repeatable changes | Inspect ticketing system. Calculate how many digital services have pre-approved change request templates |
Deployment Lead Time | Trailing | How long does a release candidate for a digital service take to reach deployment | Instrument deployment pipeline. Calculate days from production deployment back to build date |
Deployment Frequency | Trailing | How often are digital services deployed to customers | Instrument deployment pipeline. Calculate rate of production deployments |
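As a rough illustration of the two trailing measures above, here is a minimal sketch assuming a deployment pipeline has been instrumented to record a build timestamp and a production deployment timestamp per release candidate. The record format and field names are illustrative assumptions, not from any particular pipeline tool.

```python
from datetime import datetime
from statistics import median

# Hypothetical pipeline records: one entry per production deployment.
# Field names are illustrative assumptions, not from any specific tool.
deployments = [
    {"built_at": datetime(2023, 5, 1, 9, 0), "deployed_at": datetime(2023, 5, 2, 14, 0)},
    {"built_at": datetime(2023, 5, 3, 10, 0), "deployed_at": datetime(2023, 5, 3, 16, 0)},
    {"built_at": datetime(2023, 5, 8, 11, 0), "deployed_at": datetime(2023, 5, 10, 9, 0)},
]

# Deployment Lead Time: days from build to production deployment, per release candidate.
lead_times_days = [
    (d["deployed_at"] - d["built_at"]).total_seconds() / 86400 for d in deployments
]
print(f"Median deployment lead time: {median(lead_times_days):.1f} days")

# Deployment Frequency: rate of production deployments over the observed period.
period_days = (deployments[-1]["deployed_at"] - deployments[0]["deployed_at"]).days or 1
print(f"Deployment frequency: {len(deployments) / period_days:.2f} deployments per day")
```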
Measure the reliability of digital services as follows, with a calculation sketch after the table:
Measure | Type | Description | Suggested Implementation |
---|---|---|---|
Failure Design | Leading | Does a digital service include bulkheads, caches, circuit breakers | Survey product teams. Are they confident they can cope with downstream failures |
Service Telemetry | Leading | Does a digital service have custom logging, monitoring, and alerts | Survey product teams. Are they confident they can quickly diagnose abnormal operating conditions |
Availability | Trailing | What is the availability rate of a digital service, as a Nines Of Availability e.g. 99.9% | Instrument digital services. Calculate percentage of requests that are fully or partially successful |
Time To Restore Availability | Trailing | How long does it take to re-attain an availability target once it has been lost | Instrument digital services. Calculate minutes from loss of availability target to re-attainment of availability target |
Financial Loss Protection Effectiveness | Trailing | What amount of expected financial loss per incident is protected by a faster than expected Time To Restore Availability | Instrument digital services and incident financial losses. Calculate maximum incident financial loss, up to time to restore allotted by availability target. Calculate percentage of maximum incident financial loss protected by a time to restore faster than allotted by availability target |
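To make the Availability and Time To Restore Availability measures concrete, here is a minimal sketch assuming a digital service has been instrumented to emit per-request success counts and incident timestamps. The values and variable names are illustrative assumptions, not from any specific monitoring tool.

```python
from datetime import datetime

# Hypothetical telemetry for one digital service over a week (illustrative data).
requests_total = 1_000_000
requests_successful = 999_200   # fully or partially successful responses

# Availability: percentage of requests that are fully or partially successful.
availability = 100 * requests_successful / requests_total
print(f"Availability: {availability:.2f}%")  # 99.92%

# Time To Restore Availability: minutes from losing the availability target
# to re-attaining it, taken from incident timestamps (illustrative values).
target_lost_at = datetime(2023, 5, 4, 10, 15)
target_restored_at = datetime(2023, 5, 4, 10, 45)
time_to_restore_mins = (target_restored_at - target_lost_at).total_seconds() / 60
print(f"Time to restore availability: {time_to_restore_mins:.0f} minutes")
```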
Financial loss protection effectiveness is a measure we've used with customers to gauge the cost effectiveness of You Build It You Run It at scale. It measures the proportion of projected financial loss per incident that goes unrealised, because the actual time to restore is faster than the projected time to restore. It can be compared between digital services, product teams, and even operating models.
In the earlier furniture retailer example, the bedroom digital service had a maximum financial exposure of $200K per hour, and a 99.0% availability target.
Software Service | Maximum Financial Exposure In An Hour | Availability Target | Tolerable Unavailability In A Week |
---|---|---|---|
bedroom | $200K | 99.0% | 1:40:48 |
Assume the bedroom service is unavailable for 30 mins, and the incident financial loss is calculated as $100K. The tolerable unavailability per week for 99.0% availability is 1 hour 41 mins, which would have produced a $336K incident financial loss. Financial loss protection effectiveness is therefore 70%, as $236K of a theoretical $336K loss was unrealised thanks to a faster than expected time to restore, as reproduced in the sketch after the table.
Software Service | Incident Duration | Incident Financial Loss | Incident Financial Loss At Tolerable Unavailability Per Week | Incident Financial Loss Protection | Incident Financial Loss Protection Effectiveness % |
---|---|---|---|---|---|
bedroom | 0:30:00 | $100K | $336K | $236K | 70% |
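The worked example above can be reproduced with a short calculation. This is a minimal sketch of the arithmetic, assuming the same $200K per hour exposure, 99.0% availability target, and 30 minute incident; the variable names are illustrative.

```python
# Financial loss protection effectiveness for the bedroom service example.
max_exposure_per_hour = 200_000        # $200K maximum financial exposure per hour
availability_target = 0.99             # 99.0% weekly availability target
incident_financial_loss = 100_000      # $100K actual loss from a 30 minute incident

# Tolerable unavailability per week at the availability target (1:40:48 for 99.0%).
hours_per_week = 24 * 7
tolerable_unavailability_hours = hours_per_week * (1 - availability_target)  # 1.68 hours

# Maximum incident financial loss if the full tolerable unavailability were consumed.
loss_at_tolerable_unavailability = max_exposure_per_hour * tolerable_unavailability_hours  # $336K

# Loss unrealised because the time to restore beat the availability target.
loss_protected = loss_at_tolerable_unavailability - incident_financial_loss  # $236K
effectiveness = 100 * loss_protected / loss_at_tolerable_unavailability      # ~70%

print(f"Incident financial loss protection: ${loss_protected / 1000:.0f}K")
print(f"Financial loss protection effectiveness: {effectiveness:.0f}%")
```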
For more on measuring financial loss protection effectiveness, see our case study on How to do digital transformation at John Lewis & Partners.
Measure the progress of a learning culture as follows, with a calculation sketch after the table:
Measure | Type | Description | Suggested Implementation |
---|---|---|---|
Post-Incident Review Readers | Leading | How many readers does a post-incident review for a digital service have | Instrument post-incident reviews. Calculate how many unique visitors each review receives |
Chaos Days Frequency | Leading | How often are Chaos Days run for a digital service | Instrument Chaos Day reviews. Calculate rate of review publication |
Improvement Action Lead Time | Trailing | How long does an improvement action from a post-incident review for a digital service take to be implemented | Instrument post-incident reviews. Calculate days from improvement action definition to completion |
Improvement Action Frequency | Trailing | How often are improvement actions implemented for a digital service | Instrument post-incident reviews. Calculate days between implementation of improvement actions |
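As an illustration of the trailing learning culture measures, the sketch below assumes post-incident reviews have been instrumented with improvement action records that carry definition and completion dates. The record format is an assumption for illustration only.

```python
from datetime import date
from statistics import mean

# Hypothetical improvement action records from post-incident reviews (illustrative).
actions = [
    {"defined_on": date(2023, 4, 3), "completed_on": date(2023, 4, 14)},
    {"defined_on": date(2023, 4, 20), "completed_on": date(2023, 5, 2)},
    {"defined_on": date(2023, 5, 5), "completed_on": date(2023, 5, 19)},
]

# Improvement Action Lead Time: days from action definition to completion.
lead_times = [(a["completed_on"] - a["defined_on"]).days for a in actions]
print(f"Mean improvement action lead time: {mean(lead_times):.1f} days")

# Improvement Action Frequency: days between successive completed actions.
completions = sorted(a["completed_on"] for a in actions)
gaps = [(later - earlier).days for earlier, later in zip(completions, completions[1:])]
print(f"Mean days between completed improvement actions: {mean(gaps):.1f}")
```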
Trends in post-incident review readership are of particular interest. Usage of reviews in training materials and citations in internal company documents could also be measured. See Markers of progress in incident analysis by John Allspaw.
You Build It You Run It on the Xinja banking platform
This customer was a fintech startup, with very limited money and few skilled employees. It was a greenfield project in every sense, covering business processes as well as technical implementation. I worked on the single technical team responsible for delivery, so responsibility for the entire system rested with us. It was the classic case for You Build It You Run It.
To achieve this, automation was key. Adoption of Continuous Delivery enabled us to focus our energy on development, not deployments. All our changes were tested and deployed through a single automated pipeline, for simplicity and ease of tracking. Operational tasks needed to be in code as well, to allow for seamless deployments of infrastructure and application changes.
Monitoring and system reliability were also at the heart of the system design, since fewer issues meant more time for development. Code quality checks and system health awareness also steered the priorities of our team.
Since the business and operational knowledge was contained within our team, the customer only needed to contact us to raise their issues. This meant issues were resolved rapidly, and learnings from the issues were fed back into the design, testing and monitoring of the system.