These practices refer to the initial selection of an operating model. We believe a hybrid operating model is essential, because we agree with Martin Fowler's statement that a software service is either a strategic differentiator or a utility. We consider a digital service to be a strategic differentiator, and a foundational system to be a utility.
We've created an operating model selector. Once you understand your failure tolerances and different levels of feature demand, you can map out which operating model is right for you.
To help understand which operating model is best for your software service, follow these selection practices.
Financial exposure is the maximum revenue loss and costs a software service can incur upon failure. An availability target is a desired level of availability, expressed as a number of nines. Each additional nine of availability means more implementation effort.
Calculate an availability target for a software service via this process:
1. Estimate financial exposure bands for availability levels. This is a two-step process for all software services:
   1. Estimate high, medium, and low exposure bands based on organisation-wide financial forecasts and historic incident losses. We ask "what data do we have for how much money we could lose in an hour?"
   2. Assign the availability levels 99.9%, 99.0%, and 95.0% to the high, medium, and low exposure bands. 99.99% can be used if your organisation has a genuine need for extreme reliability. We ask "for this exposure band, how much unavailability are we actually willing to tolerate - and how much of the exposure are we able to accept as a loss?"
2. Calculate the availability target by estimating financial exposure. This is a multi-step process for each software service:
   1. A product manager estimates their service exposure based on their own financial forecasting. We ask "what's the maximum revenue loss and costs we'll incur if this software service is unavailable for an hour?"
   2. Automatically link a service exposure to the highest exposure band it fits into. We ask "what's the financial exposure band that covers this service exposure?"
   3. Automatically link a service exposure to the availability target for that exposure band. We ask "what's the availability target for the financial exposure band that covers this service exposure?"
3. Periodically reflect on financial exposure and availability. This is for all software services. Compare recent incident losses with our availability target calculator, on at least a quarterly basis. We ask "do we have any new financial data that warrants an update to our financial exposure bands, our availability levels, or the availability targets for our software services?"
Assume a furniture retailer has a self-hosted COTS ecommerce platform, a custom bedroom frontend, and a custom appointments frontend. Financial losses from prior incidents for multiple software services are examined. Their different traffic profiles and different incident losses allow for an arbitrary grouping into financial exposure bands of $7K, $100K, and $800K loss in one hour.
The furniture retailer examines its appetite for unavailability at its different exposure bands, and commits to 95.0%, 99.0%, and 99.9% as its availability levels. For example, for the $800K exposure band, 99.9% of a week is 167 hours 49 mins 55 secs, so the tolerable unavailability in one week, the remaining 0.1%, is 10 mins 5 secs.
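The arithmetic behind tolerable unavailability can be sketched in a few lines of Python. This is a minimal illustration rather than part of any calculator described here; the function name `tolerable_unavailability` and the rounding to whole seconds are assumptions made for the example.

```python
from datetime import timedelta

WEEK = timedelta(weeks=1)


def tolerable_unavailability(availability_percent: float,
                             period: timedelta = WEEK) -> timedelta:
    """Return the downtime allowed in `period` at the given availability level.

    Hypothetical helper: the result is rounded to the nearest whole second.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(seconds=round(period.total_seconds() * unavailable_fraction))


# Print the weekly allowance for each availability level mentioned in the text.
for level in (95.0, 99.0, 99.9, 99.99):
    print(f"{level}% -> {tolerable_unavailability(level)} per week")
```

For 99.9% this gives 10 minutes 5 seconds per week, which matches the figure quoted for the $800K exposure band.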
Product managers then use their service forecasts to estimate a maximum financial exposure of $810K in an hour for the ecommerce platform, plus $6K and $200K for the appointments and bedroom frontends respectively.
Each software service is matched to an availability target via its service exposure. For example, the bedroom service is assigned a 99.0% availability target, as its service exposure of $200K in an hour fits into the $100K exposure band, which carries the 99.0% availability level.
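The matching step could be automated along the following lines. This is a sketch under stated assumptions, not the retailer's actual calculator: it uses the three bands and availability levels given above ($7K, $100K, and $800K mapped to 95.0%, 99.0%, and 99.9%), it assumes each band figure is a lower threshold so that a service lands in the highest band it reaches, and it assumes anything below the lowest threshold defaults to the lowest band. `ExposureBand` and `band_for` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureBand:
    name: str
    min_hourly_loss: int        # assumed lower threshold, in dollars per hour
    availability_target: float  # availability level, as a percentage


# Bands must be listed in ascending order of threshold.
BANDS = (
    ExposureBand("low", 7_000, 95.0),
    ExposureBand("medium", 100_000, 99.0),
    ExposureBand("high", 800_000, 99.9),
)


def band_for(service_exposure: int, bands=BANDS) -> ExposureBand:
    """Return the highest band whose threshold the service exposure reaches.

    Assumption: exposures below the lowest threshold fall into the lowest band.
    """
    thresholds = [band.min_hourly_loss for band in bands]
    index = bisect_right(thresholds, service_exposure) - 1
    return bands[max(index, 0)]


# Service exposures estimated by the product managers, in dollars per hour.
for service, exposure in (("ecommerce platform", 810_000),
                          ("appointments frontend", 6_000),
                          ("bedroom frontend", 200_000)):
    band = band_for(exposure)
    print(f"{service}: {band.name} band, {band.availability_target}% target")
```

If your organisation treats band figures as ceilings rather than thresholds, the lookup would need to pick the lowest band that covers the exposure instead.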
The furniture retailer reviews its financial exposure bands once a quarter. Financial losses from prior incidents are examined, and if necessary the financial exposure bands are recalculated. This ensures the risk of financial exposure is revalidated as the business changes.
Indirect financial losses caused by reputational damage also need to be considered. In the private sector this includes customer credits and subscription cancellations. In the public sector, it's employee time spent on manual, paper-based fallbacks. Reputational damage needs to be tied into core business metrics such as customer lifetime value and customer satisfaction, and it can be tracked using Net Promoter Score.
Availability targets can also be adjusted around peak business events. This involves a product manager calculating peak and non-peak financial exposures for their software service, based on their financial forecasting. The availability target for the software service is then upgraded just before the peak business event, and downgraded just after it.
This is an effective cost optimisation that doesn't affect operability incentives. It replaces the notion of eyes on glass monitoring and peak support in Ops Run It. It ensures a balance between service reliability and run cost for You Build It You Run It.
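One way to picture the upgrade and downgrade around a peak business event is a date-based lookup. The sketch below is illustrative only: the band thresholds, the exposure figures, the sale dates, and the function names `target_for` and `target_on` are all hypothetical.

```python
from datetime import date

# Hypothetical exposure bands as (minimum hourly loss in dollars, availability
# target as a percentage), ordered from highest to lowest threshold.
EXPOSURE_BANDS = ((800_000, 99.9), (100_000, 99.0), (0, 95.0))


def target_for(hourly_exposure: int) -> float:
    """Return the availability target for the highest band the exposure reaches."""
    for minimum_loss, target in EXPOSURE_BANDS:
        if hourly_exposure >= minimum_loss:
            return target
    raise ValueError("hourly exposure cannot be negative")


def target_on(day: date, peak_start: date, peak_end: date,
              peak_exposure: int, non_peak_exposure: int) -> float:
    """Return the availability target in force on `day`.

    `peak_start` and `peak_end` should be set slightly wider than the event
    itself, so the upgrade happens just before it and the downgrade just after.
    """
    in_peak = peak_start <= day <= peak_end
    return target_for(peak_exposure if in_peak else non_peak_exposure)


# Hypothetical sale period for a service with $900K peak and $150K non-peak exposure.
sale_start, sale_end = date(2024, 11, 25), date(2024, 12, 2)
print(target_on(date(2024, 11, 29), sale_start, sale_end, 900_000, 150_000))
print(target_on(date(2024, 12, 10), sale_start, sale_end, 900_000, 150_000))
```

Keeping the peak window as data rather than as a manual process makes it easier to review alongside the exposure bands each quarter.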
Product feature demand is customer demand for new product features.
A deployment target is a required level of deployment frequency. An increase in frequency means more implementation effort.
Calculate a deployment target using this process:
1. Estimate feature demand bands for all software services. We estimate product feature demand based on organisation-wide financial forecasts, customer research, and live analytics. We ask "how much demand is there in a month for product features?"
2. Match deployment frequency levels to feature demand bands. We match a feature demand band with a deployment frequency. We ask "for this feature demand band, how often do we need to deploy product features to satisfy demand?"
3. Estimate feature demand for a software service. A product manager estimates their feature demand based on customer research and analytics. We ask "what's the feature demand we'll see from customers, for this software service?"
4. Calculate the feature demand band for a software service. We automatically link service demand with the feature demand band it fits into. We ask "which of our feature demand bands covers this service demand?"
5. Calculate the deployment target for a software service. We automatically link service demand onto the deployment target for a particular feature demand band. We ask "what's the deployment target for the feature demand band that covers this service demand?"
6. Periodically reflect on feature demand bands and deployment targets. We compare recent analytics with our deployment target calculator, on at least a quarterly basis. We ask "do we have any new data that suggests an update to our feature demand bands, or our deployment frequency levels?"
The furniture retailer defines relative feature demand bands, from low to very high. The feature demand of one software service relative to another is recognised as the most important property. Relative feature demand bands allow the unique characteristics of different furniture domains to be taken into account.
The furniture retailer identifies monthly, fortnightly, weekly, and daily deployments as its deployment frequency levels, and ties them to feature demand bands.
Product managers estimate low demand for the ecommerce platform, high demand for the appointments frontend, and very high demand for the bedroom frontend.
Each software service is matched to a deployment target by its service demand. For example, the bedroom service is assigned to daily deployments, as its service demand is believed to be very high.
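The deployment target lookup can be sketched as a simple mapping. The text names four frequency levels and bands running from low to very high; pairing them in ascending order (low with monthly through to very high with daily) is an assumption here, confirmed by the text only for the very high band. The names `FeatureDemand`, `DEPLOYMENT_TARGETS`, and `deployment_target` are hypothetical.

```python
from enum import Enum


class FeatureDemand(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    VERY_HIGH = "very high"


# Assumed pairing of feature demand bands with deployment frequency levels.
DEPLOYMENT_TARGETS = {
    FeatureDemand.LOW: "monthly",
    FeatureDemand.MEDIUM: "fortnightly",
    FeatureDemand.HIGH: "weekly",
    FeatureDemand.VERY_HIGH: "daily",
}

# Service demand as estimated by the product managers.
SERVICE_DEMAND = {
    "ecommerce platform": FeatureDemand.LOW,
    "appointments frontend": FeatureDemand.HIGH,
    "bedroom frontend": FeatureDemand.VERY_HIGH,
}


def deployment_target(service: str) -> str:
    """Return the deployment frequency for a service via its demand band."""
    return DEPLOYMENT_TARGETS[SERVICE_DEMAND[service]]


for service in SERVICE_DEMAND:
    print(f"{service}: {deployment_target(service)} deployments")
```

Because the bands are relative, a quarterly review only needs to change the two mappings, not the lookup itself.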
Adopt You Build It You Run It for any foundational system or digital service, when you require extreme reliability:
- 99.99% availability protection.
- 1 minute of tolerable unavailability per week.
It's rare for our customers to truly require 99.99% availability. The engineering effort involved is enormous.
Adopt You Build It You Run It for your digital services, when you have these desired outcomes:
- Weekly, daily, or more frequent deployments.
- 95.0% to 99.9% availability protection.
- 8 hours 24 minutes to 10 minutes of tolerable unavailability per week.
For the furniture retailer, the appointments and bedroom frontends match with You Build It You Run It.
In the private sector, it's possible to have digital services with a low level of revenue exposure, and high feature demand. For example, a medical publishing company with a per-institution subscription model could have customers clamouring for new features, but a website failure would not incur direct revenue loss as customer dissatisfaction is not linked to subscription renewal. Operational costs and indirect revenue loss would need to be factored into the availability target calculation.
We'd urge caution if you believe a low availability target and a low deployment target apply to your digital services. Low financial exposure on failure and low product feature demand could signify you're building the wrong thing for your customers.
Select Ops Run It for your self-hosted COTS applications and custom integrations, when your desired outcomes are:
- Monthly to fortnightly deployments.
- 95.0% to 99.9% availability protection.
- 8 hours 24 minutes to 10 minutes of tolerable unavailability per week.
Foundational systems usually require infrequent code changes after launch.
For the furniture retailer, the ecommerce-platform COTS application matches with Ops Run It at a high availability target.
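The selection rules above can be combined into a small selector. This is a minimal sketch, assuming an availability target and a deployment target have already been calculated for the service; the `DeploymentFrequency` enum and `select_operating_model` function are hypothetical names, and the sketch does not model deployments more frequent than daily or the caution about low targets on both dimensions.

```python
from enum import IntEnum


class DeploymentFrequency(IntEnum):
    # Ordered from least to most frequent, so levels can be compared.
    MONTHLY = 1
    FORTNIGHTLY = 2
    WEEKLY = 3
    DAILY = 4


YOU_BUILD_IT_YOU_RUN_IT = "You Build It You Run It"
OPS_RUN_IT = "Ops Run It"


def select_operating_model(availability_target: float,
                           deployment_target: DeploymentFrequency) -> str:
    """Suggest an operating model from the two targets."""
    if not 95.0 <= availability_target <= 100.0:
        raise ValueError("availability target is outside the ranges covered here")
    # Extreme reliability applies to any foundational system or digital service.
    if availability_target >= 99.99:
        return YOU_BUILD_IT_YOU_RUN_IT
    # Weekly or more frequent deployments point to You Build It You Run It.
    if deployment_target >= DeploymentFrequency.WEEKLY:
        return YOU_BUILD_IT_YOU_RUN_IT
    # Monthly to fortnightly deployments point to Ops Run It.
    return OPS_RUN_IT


print(select_operating_model(99.9, DeploymentFrequency.MONTHLY))
print(select_operating_model(95.0, DeploymentFrequency.WEEKLY))
print(select_operating_model(99.99, DeploymentFrequency.MONTHLY))
```

The output is a starting point for a conversation rather than a verdict, since indirect losses and reputational damage may shift a service into a different band.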