Balancing Sustainable Growth in Systems and Teams

As software leaders, we are responsible for the systems we run and those who build and maintain them. Too often, these responsibilities are treated as separate domains—technical metrics in one corner, people and culture in another. In practice, they are tightly interwoven.

To deliver lasting value, engineering organisations must pursue throughput and stability—in both systems and teams. It’s not enough to launch features or ship platforms once. We must build the capability to improve, adapt, and scale sustainably.

This post identifies the metrics to measure and track so that you can sustain a valuable product and keep innovating on it.


Systems: Metrics drive sustained value

The technical side of throughput and stability demands rigorous measurement across three areas: business value, system health and system viability.

System value metrics

Define and measure business KPIs to confirm the value your system provides. Many domains have industry norms for measuring success: acquisition marketing, SEO, and customer service each track their own specific metrics. Generally, the top-line metrics should show whether the platform increases customer engagement, retention, or revenue. Where there are multiple layers of metrics to capture and track, Value Driver Trees can help map them. Measuring business value anchors the system’s impact in business performance, not just technical success.
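To make the idea of layered metrics concrete, here is a minimal sketch of a value driver tree. The `Driver` class, the metric names, and the numbers are all hypothetical and chosen only for illustration; a real tree would use the KPIs your own domain tracks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Driver:
    """One node in a hypothetical value driver tree."""
    name: str
    value: Optional[float] = None          # measured value, for leaf metrics only
    children: List["Driver"] = field(default_factory=list)
    combine: str = "sum"                   # how children roll up: "sum" or "product"

    def evaluate(self) -> float:
        # Leaves return their measurement; parents combine their children.
        if not self.children:
            if self.value is None:
                raise ValueError(f"leaf metric {self.name!r} has no measurement")
            return self.value
        results = [child.evaluate() for child in self.children]
        if self.combine == "product":
            total = 1.0
            for result in results:
                total *= result
            return total
        return sum(results)


# Illustrative numbers only: revenue = active customers x revenue per customer.
revenue = Driver("monthly_revenue", combine="product", children=[
    Driver("active_customers", children=[
        Driver("new_customers", value=200),
        Driver("retained_customers", value=1800),
    ]),
    Driver("revenue_per_customer", value=25.0),
])

print(revenue.evaluate())  # 50000.0
```

Structuring the metrics this way shows which lower-level measures (here, acquisition and retention) feed the top-line number, so a change in revenue can be traced to the driver that moved.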

System health metrics

System health needs to be measured to confirm that the system still provides the business service it was intended to provide. System health metrics have evolved past low-level measures of CPU and RAM utilisation. If my CPU utilisation is at 95%, should I be happy that I am getting high value from my cloud provider or worried that my system is about to crash? Ambiguous metrics like these are no longer our front-line leading measures. We are also less interested in binary measures like the system being “up” or “down”.

Instead, the industry has evolved to use service-level objectives (SLOs). An SLO defines a tolerance for how often a service-level indicator (SLI) must meet a target, for example: 99.9% of requests are served in under 100ms. SLOs let you cover many aspects of your system with relatively few metrics. They often represent what your users experience and can capture the impact of events like high CPU load, low memory, deployments, and node scaling without you having to monitor each one explicitly. SLOs also give you error budgets, the amount of failure the objective tolerates, and the rate at which a budget is being spent is a leading indicator of poor user experience.

By using Service Level Objectives and error budgets, teams can balance innovation with reliability, using data to guide risk tolerance. This links to delivery metrics, which will be covered later.
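As a rough illustration of how an error budget falls out of an SLO, the sketch below computes how much budget remains for a request-based objective. The function name and the traffic figures are assumptions made for this example; only the 99.9% target comes from the text above.

```python
def error_budget_remaining(total_requests: int, bad_requests: int, slo_target: float) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    # The budget is the number of requests allowed to miss the target.
    allowed_bad = total_requests * (1.0 - slo_target)
    return 1.0 - bad_requests / allowed_bad


# Hypothetical month: 1,000,000 requests, 400 of them slower than 100ms,
# measured against a 99.9% objective (so about 1,000 slow requests are tolerated).
remaining = error_budget_remaining(1_000_000, 400, 0.999)
print(f"{remaining:.0%} of the error budget remains")  # 60% of the error budget remains
```

A team with most of its budget left has room to take delivery risks; a team that has overspent (a negative result) has a data-backed reason to prioritise reliability work.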

System viability metrics

A system may meet its business growth goals and system health targets and still introduce risks. Its cost could exceed its revenue contribution, it could fail to meet regulatory obligations, or it could diverge from architectural standards. These constraints should be re-evaluated regularly as costs, regulations, and standards evolve. What was once suitable may no longer be, because of economic, political or technical drift. Our systems are only sustainable while they remain viable. Viability ensures the system can be trusted to evolve without undue risk or burden.

System metrics checklist

Your system metrics checklist looks something like this:

1. Value: Business KPIs

  • Is the platform increasing customer engagement, retention, or revenue?
  • Are we shortening the time to value for internal or external users?

2. Health: SLOs and Error Budgets

  • Are SLOs explicitly defined for key services?
  • Are our SLOs within their error budget?
  • If an SLO is breached, do we have enough supporting data to understand why?

3. Viability: Cost, Compliance & Architecture

  • Are system costs aligned with business budgets?
  • Are security, privacy and regulatory requirements actively enforced?
  • Are architectural standards being followed to ensure long-term maintainability?

Delivery: Metrics for sustainable innovation

Shipping software once is easy. Continuously improving and extending it is where real organisational capability is revealed. To ensure ongoing throughput and stability in delivery, leaders must look beyond project milestones. Delivery performance and team health are the metrics for sustainable innovation.

It is all too common that the longer a project or product exists, the longer the gap between deliveries of significant value. Many factors can reduce delivery throughput, including:

  • Increasing support costs (security patching, bug fixes)
  • Excessive dependencies
  • Complications from increased feature set
  • Rotation of team members
  • Team fatigue
  • Aging technology

The business will expect repeat performances regardless of what is slowing delivery down. The throughput established early on becomes the expectation. Stakeholders will expect not only consistent throughput but also have a low appetite for any increase in system instability or delivery unpredictability. While speed is almost always favoured in the short term, predictability usually does more to help a company function well, because it allows other teams to plan. These may include other product delivery teams, finance, marketing, or sales.

Delivery performance

The DORA report and the book Accelerate identify four key measures that indicate a high-performing software delivery team. They have become the industry standard for assessing delivery capability:

  • Deployment Frequency: How often do we release to production?
  • Lead Time for Changes: How long does it take for a code change to reach users?
  • Change Failure Rate: How often do deployments introduce issues?
  • Mean Time to Recovery (MTTR): How quickly can we restore service?

Many teams find that improving one measure helps improve the others, generally because the cultural shifts required create a virtuous loop. These metrics help distinguish between velocity that scales and speed that breaks things. There is also a two-way relationship between delivery performance and system health: unreliable systems make for unpredictable delivery pipelines, and unpredictable delivery makes systems harder to keep reliable.
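The four measures can be derived from a simple log of deployments. The sketch below is one hedged way to do it: the `Deployment` record, its field names, and the sample data are hypothetical, and it uses the median for lead time and the mean for recovery time as a simplification rather than as the official DORA calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                    # when the change was committed
    deployed_at: datetime                     # when it reached production
    failed: bool = False                      # did it cause an incident or rollback?
    restored_at: Optional[datetime] = None    # when service was restored, if it failed


def dora_summary(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise the four delivery measures over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed deployment has a recorded restore time.
        "mean_time_to_recovery": sum(recoveries, timedelta()) / len(recoveries) if recoveries else None,
    }


# Illustrative data: three deployments over three days, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
log = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4),
               failed=True, restored_at=t0 + timedelta(days=1, hours=5)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6)),
]
summary = dora_summary(log, period_days=3)
print(summary)
```

Tracking these figures per period, rather than as one-off snapshots, is what reveals whether throughput and stability are trending together or trading off against each other.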

Team health

An unhealthy team cannot maintain a healthy system.

The Accelerate book shares Ron Westrum’s typology of organisational cultures: pathological, bureaucratic and generative. Bureaucratic cultures, and generative cultures even more so, encourage cooperation and an environment in which to learn and improve.

High-performing teams across industries (software, aviation, sports, military, etc.) have common threads in what they measure to ensure team health and performance. They value a shared understanding of objectives and intent, clarity on roles and communication flows, clarity on what is important and urgent, the ability to adapt, and safety and resilience under pressure.

These qualities are usually measured with surveys. Tools like Atlassian’s Team Health Monitor and Culture Amp’s engagement surveys offer actionable insights into:

  • Clarity on direction
  • Clarity of roles and ownership
  • Psychological safety and team cohesion
  • Satisfaction, burnout, and feedback loops
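Survey results only become a metric once they are aggregated and tracked. The sketch below averages 1 to 5 scores per dimension and flags low ones. It is not the API of any of the tools named above; the function, the dimension names, the threshold, and the responses are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def summarise_survey(responses: List[Dict[str, int]], alert_below: float = 3.0) -> Dict[str, dict]:
    """Average 1-5 scores per dimension and flag any that fall below alert_below."""
    scores: Dict[str, List[int]] = {}
    for response in responses:
        for dimension, score in response.items():
            scores.setdefault(dimension, []).append(score)
    return {
        dimension: {
            "average": mean(values),
            "needs_attention": mean(values) < alert_below,
        }
        for dimension, values in scores.items()
    }


# Illustrative responses from three team members.
responses = [
    {"direction": 4, "roles": 4, "safety": 5, "sustainable_pace": 2},
    {"direction": 5, "roles": 3, "safety": 4, "sustainable_pace": 3},
    {"direction": 4, "roles": 4, "safety": 4, "sustainable_pace": 2},
]
survey = summarise_survey(responses)
for dimension, result in survey.items():
    print(dimension, round(result["average"], 2), result["needs_attention"])
```

Comparing these averages between survey rounds matters more than any single score, since the trend is what signals emerging burnout or misalignment.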

Leaders must treat team health as a first-class concern, like system health. Team health is systemic health: burned-out or misaligned teams will undermine system reliability over time.

Delivery metrics checklist

Your delivery metrics checklist looks something like this:

1. Delivery Performance: The DORA Metrics

  • Deployment Frequency: How often do we release to production?
  • Lead Time for Changes: How long does it take for a code change to reach users?
  • Change Failure Rate: How often do deployments introduce issues?
  • Mean Time to Recovery (MTTR): How quickly can we restore service?

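2. Team Health: Culture as a Platform

  • Is the team clear on direction?
  • Are roles and ownership clear?
  • Do team members report psychological safety and team cohesion?
  • Are we tracking satisfaction, burnout, and feedback loops?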


Summary: Throughput and Stability – Balancing Sustainable Growth in Systems and Teams

To deliver lasting value, software leaders must pursue throughput and stability in both their systems and the teams that build and maintain them. Sustained delivery requires focusing on business value, operational resilience, and team health.

These values are intertwined. We can improve team health by providing clear direction with KPIs, SLOs and clear compliance guardrails. This reduces ambiguity and team anxiety. We can enhance system health by having a healthy team delivering that system.

Ensure your system delivers sustained value by surfacing these metrics:

  1. Business KPIs
  2. SLOs and Error Budgets
  3. Viability

Ensure systems can continue to evolve sustainably by tracking and addressing the following:

  1. DORA Metrics for Delivery Performance
  2. Team Health

Leaders should have the insight to understand which metrics to use based on the lifecycle stage of the business or product. High-performing engineering teams aren’t just fast—they’re reliable, sustainable, and resilient. That means balancing throughput (delivery speed) with stability (system health and team resilience).
