You Build It You Run It governance is a radical departure from Ops Run It. These practices are a clear, long-term commitment to You Build It You Run It. They're hard to implement, but if none of them are put in place, it's unlikely your on-call product teams will be successful. Here's an approximate sequence for implementing governance practices.
Empower the budget holders for product teams to be accountable for deployment throughput, service reliability, and the learning culture for their digital services. The budget holder for a product team will be one of:
- Your Head of Product, if team funding comes from a product budget
- Your Head of Delivery, if your organisation has separate Product and IT departments and team funding comes from an IT budget
Making this change encourages the budget holders to translate business goals into operational objectives, incorporate customer feedback into development activities, and strike a balance between delivering product features and operational features. They devolve responsibilities such as availability target selection and on-call support to their product teams.
This is a radical departure from Ops Run It, in which the Head of Operations is accountable for the reliability of all software services. The Head of Operations retains accountability for foundational systems in You Build It You Run It. We recommend this RACI model:
Operability incentives are maximised for product teams when their budget holders are accountable for live support. If this practice isn’t implemented, you’ll suffer from the responsible but unaccountable pitfall.
Pay for on-call support for a digital service from the product team budget. This means the product team budget holder pays for:
- On-call standby costs. Compensation for one or more product team members making themselves available in case of an out of hours incident. This is part of the ongoing run cost.
- On-call callout costs. Compensation for one or more product team members responding to an out of hours incident. This is a per-incident cost.
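As a rough illustration of how these two cost types combine into a run cost estimate for one digital service, here is a minimal sketch. The `OnCallRates` type, the `annual_on_call_cost` function, and all pay figures are hypothetical placeholders, not recommended rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnCallRates:
    """Hypothetical pay rates; the figures used below are placeholders."""
    standby_per_week: float      # paid per person for being available out of hours
    callout_per_incident: float  # paid per person for responding to an incident


def annual_on_call_cost(rates: OnCallRates,
                        people_on_standby: int,
                        expected_incidents: int,
                        responders_per_incident: int = 1) -> float:
    """Estimate the yearly on-call cost charged to one product team budget."""
    # Standby is an ongoing run cost: paid every week, incident or not.
    standby = rates.standby_per_week * 52 * people_on_standby
    # Callout is a per-incident cost: paid only when someone responds.
    callout = rates.callout_per_incident * expected_incidents * responders_per_incident
    return standby + callout


# Assumed example: one person on standby, six out of hours incidents a year.
example_rates = OnCallRates(standby_per_week=300.0, callout_per_incident=100.0)
example_cost = annual_on_call_cost(example_rates, people_on_standby=1, expected_incidents=6)
```

With these assumed figures, standby dominates the total, so the estimate is driven mainly by how many people are on standby rather than by how many incidents occur.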
If the on-call opex budget is owned by your Head of Operations, transfer their budget line item for digital services into an on-call capex budget allocated to the product team budget holders. Split the on-call capex budget into a line item per digital service, and allocate each line item to the corresponding product team. This maximises incentives for each product team budget holder to:
- Establish desired business outcomes prior to any development efforts.
- Choose an appropriate availability target that balances business outcomes insurance with a run cost estimate.
- Prioritise the protection of live product functionality alongside the delivery of new product features.
- Lend their credibility to organisational changes that improve on-call experience, such as product teams automating their own telemetry toolchains.
- Incorporate graceful degradation and adaptability into the customer experience.
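When weighing an availability target against a run cost estimate, it can help to convert the target into the downtime it permits. The sketch below shows that standard arithmetic. The function name and the 30-day period are assumptions made for illustration.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target permits per period."""
    if not 0 < availability_percent <= 100:
        raise ValueError("availability_percent must be in the range (0, 100]")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the downtime allowance.
    return total_minutes * (1 - availability_percent / 100)


# Compare the allowance for a few candidate targets over a 30-day period.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% allows about {allowed_downtime_minutes(target):.1f} minutes of downtime")
```

Each extra nine cuts the allowance by a factor of ten, which is one way to see why a higher target usually implies a higher run cost.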
Offer a level of pay to on-call developers for 24x7 support that recognises the inconvenience of out of hours support. We recommend choosing one of these payment models, based on your own organisational context:
We don't recommend callout payments only, as it's unfair remuneration. It's important to compensate your L1 on-call developers for the disruption that on-call standby causes to their lives outside of work. Always being available via phone, pager, and/or laptop out of hours has an impact on people.
Changing remuneration means updating contracts with suppliers and employees, to recognise 24x7 L1 support as a paid activity for product team developers. This can be a difficult process, and there's no single right answer on how much to pay. Transparency during this process is important, so people can understand the available funding. There's a lot of variability in how organisations approach this. See the 2019 On-call compensation survey by Spike Lindsey et al.
We've heard senior managers in different organisations say that You Build It You Run It payment models are too expensive compared to Ops Run It, particularly if operations teams are outsourced to a third party managed service for lower costs. It's a flawed comparison, because operating models are insurance for business outcomes. You Build It You Run It has a more cost-effective premium than Ops Run It for digital services where higher standards of deployment throughput and service reliability are required. Ops Run It remains the right choice for COTS applications.
Your product team developers already have intrinsic motivation to be on-call. It's vital they have an extrinsic motivation as well, one that recognises their contribution to your organisation out of hours. This is hard to put into place, but if it doesn't happen, people will decline to do on-call and you'll be in the limited on-call schedule pitfall.