There is a question that every CTO or CEO should ask themselves regularly, but that most avoid out of discomfort: if your best engineer were hit by a bus tomorrow, what would happen to your operation? It is not a morbid question. It is one of the most important questions in technology risk management. It even has a name: the bus factor.

The concept is simple in theory and devastating in practice. The bus factor of a system or organization is the minimum number of people who, if suddenly removed, would cause critical failure in the operation. Bus factor 1 means that a single person knows how something works. If they leave, fall ill, or simply hand in their resignation on a Monday, you have a serious problem on your hands.

In over 20 years working in technology, with stints at IBM and AWS and serving companies such as BTG, B3, XP and Inter, I can say with confidence: key-person dependency is one of the most common and most underestimated risks in the Brazilian market. And, contrary to what it may seem, it is not a technical problem. It is a management problem.

Why the bus factor is so low in most companies

Before talking about solutions, we need to understand the diagnosis. Key-person dependency does not arise by accident — it is, in most cases, actively built by the organization's culture.

The first factor is miscalibrated recognition. When a company rewards the "hero" — the one who resolves the incident at 3 in the morning, the one who is the only person who knows how to start the legacy system — it is, unconsciously, incentivizing knowledge hoarding. The person who knows more than everyone else becomes indispensable, and indispensability turns into power. Why would they ever teach anyone else?

The second factor is delivery pressure. In environments where the team is always chasing deadlines, documenting, mentoring and doing pair programming seem like luxuries. The result is that knowledge becomes concentrated in those who do the work, and those who do the work never have time to transfer what they know.

The third factor, more subtle, is rapid growth without structure. Startups and fast-expanding companies tend to hire quickly, but rarely invest in deep onboarding or in creating knowledge redundancy. The person who built the system in the early years becomes the sole reference for everything — and that status crystallizes over time.

The real cost of dependency: beyond the obvious risk

Most managers recognize the risk when a key person resigns or goes on leave. But the cost of key-person dependency begins long before that.

First, there is the speed cost. When only one person knows how a given system works, any demand involving that system goes through them. This creates a bottleneck. The team gets blocked waiting for the availability of an overloaded individual, and delivery slows down. In mid-sized companies, it is not uncommon to see entire teams operating at reduced capacity because one or two people concentrate critical knowledge.

Second, there is the negotiation cost. People who know they are irreplaceable negotiate from a position of strength — and sometimes abuse it. Inflated salaries, refusal of certain tasks, behaviors that are difficult to manage. The company becomes held hostage because it simply cannot afford to lose that person at that moment.

Third, there is the innovation cost. Systems and processes that depend on a single person rarely evolve. Modernization gets stuck because any change depends on that person's availability and willingness. I have seen organizations in the financial sector maintaining systems from the 1990s partly because the only person who understood the architecture had no interest — or capacity — to lead the modernization effort.

Research from Gartner indicates that companies with a low bus factor in critical areas take, on average, 3 to 5 times longer to recover operations after the unexpected departure of a senior technical employee. In Brazil, where the technology market has an average turnover rate exceeding 20% per year, this translates into real and recurring operational risk.

How to diagnose your organization's bus factor

Before acting, you need to measure. Here are the practical questions I ask during technology risk management consulting engagements to map the bus factor of an organization:

  • Which systems or processes would create a critical incident if the person responsible were unavailable for two weeks?
  • Who on the team receives the most emergency direct messages? Those people are probably holding undocumented knowledge.
  • Which areas have up-to-date technical documentation and which ones depend on "asking so-and-so"?
  • If you opened the code repository, how many critical modules have commits from only one person in the last 12 months?
  • What was the impact the last time someone in technology took two weeks of vacation with no contact?
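The repository question in particular can be answered with a few lines of code. The sketch below is a minimal illustration in Python, assuming the commit history has already been exported as (module, author, date) records. The sample data, the module names and the single_author_modules function are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical commit records: (module, author, commit date).
# In practice these would be exported from your version control history.
COMMITS = [
    ("billing", "ana", date(2024, 3, 1)),
    ("billing", "ana", date(2024, 5, 12)),
    ("auth", "bruno", date(2024, 2, 20)),
    ("auth", "carla", date(2024, 4, 2)),
    ("ledger", "ana", date(2022, 1, 15)),  # outside the window, ignored
    ("ledger", "davi", date(2024, 6, 1)),
]


def single_author_modules(commits, today, window_days=365):
    """Return {module: author} for modules with exactly one author in the window."""
    cutoff = today - timedelta(days=window_days)
    authors = defaultdict(set)
    for module, author, when in commits:
        if when >= cutoff:
            authors[module].add(author)
    return {
        module: next(iter(people))
        for module, people in sorted(authors.items())
        if len(people) == 1
    }


print(single_author_modules(COMMITS, today=date(2024, 7, 1)))
# {'billing': 'ana', 'ledger': 'davi'}
```

Commit authorship is only a proxy for knowledge: someone may understand a module without having touched it recently, and the reverse. Treat the output as a list of places to ask questions, not as a verdict.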

This diagnosis frequently reveals situations that leadership knew intuitively but had never quantified. A dependency map — crossing critical systems with the people who know them deeply — is the starting point for any organizational resilience plan in technology.
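A dependency map can start as something very simple. The sketch below assumes a deliberately crude model: each critical system lists the people who know it deeply, and a system's bus factor is just the size of that list. The system names, the people and the function names are invented for illustration.

```python
# Hypothetical dependency map: critical system -> people who know it deeply.
DEPENDENCY_MAP = {
    "payments-gateway": {"ana"},
    "core-ledger": {"ana", "bruno"},
    "onboarding-api": {"bruno", "carla", "davi"},
}


def bus_factor_by_system(dep_map):
    """How many people would have to leave before nobody knows each system."""
    return {system: len(people) for system, people in dep_map.items()}


def sole_owners(dep_map):
    """For each person, the systems where they are the only one who knows it."""
    owners = {}
    for system, people in dep_map.items():
        if len(people) == 1:
            (only,) = people
            owners.setdefault(only, []).append(system)
    return owners


factors = bus_factor_by_system(DEPENDENCY_MAP)
print(factors)                # per-system bus factor
print(min(factors.values()))  # weakest point across the mapped systems
print(sole_owners(DEPENDENCY_MAP))
```

In this model the organization is only as resilient as its weakest critical system, which is why the minimum is the number worth watching. Real knowledge is not binary, so a production version would likely grade depth of knowledge rather than list names.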

Practical strategies to raise the bus factor

The good news is that a low bus factor is a solvable problem. The bad news is that there is no quick fix. It requires consistency, culture and structure. These are the approaches that work in practice:

1. Knowledge sharing that is structured, not voluntary

Waiting for people to share knowledge spontaneously does not work. Knowledge sharing needs to be a formal part of the work process. This means setting aside sprint time for documentation, creating knowledge transfer rituals (such as weekly "tech talks" where someone presents a system to the team), and making the ability to teach an explicit criterion in performance evaluations.

A fintech company I worked with in Brazil instituted a simple rule: no feature goes to production unless at least two different engineers are able to explain how it works. The result was a significant reduction in key-person dependency on critical systems in less than a year.

2. Pair programming and code review as knowledge distribution tools

Pair programming is not just a code quality practice. It is one of the most efficient ways to transfer tacit knowledge — the kind that is not written down anywhere. Well-conducted code reviews have the same effect. When multiple people review code before merging, knowledge about that system is naturally distributed.

The problem is that many Brazilian companies treat these practices as optional or abandon them under deadline pressure. Leadership needs to protect these rituals, especially when the team is under pressure — which is exactly when the temptation to cut corners is greatest.

3. Deliberate rotation of responsibilities

One of the most powerful, and least used, strategies is the rotation of technical ownership. Periodically, responsibility for maintaining a given system passes from one person to another. This forces knowledge transfer, exposes documentation gaps, and distributes context about critical systems throughout the team.

In the context of technical succession, this is also fundamental. When a senior position opens up, the company cannot depend on hiring someone from outside to cover knowledge that should exist internally.

4. Documentation as a product, not bureaucracy

Poor documentation is almost as dangerous as no documentation, because it creates a false sense of security. Technical documentation needs to be treated with the same rigor as code: versioned, reviewed, updated and tested. Incident runbooks, ADRs (Architecture Decision Records), and wikis organized by system are investments that pay dividends every time a new person needs to take over a context.

5. Incentives aligned with knowledge distribution

If you evaluate and promote engineers solely based on individual delivery, you are incentivizing knowledge concentration. Evaluation criteria need to explicitly include: how many people has this engineer helped grow? What is the quality of the documentation they produce? Can the team operate well when they are absent?

This realignment of incentives is perhaps the most powerful change of all, and the one that meets the most resistance, because it changes the logic of power within the technical team.

The role of leadership in organizational resilience

A low bus factor is almost always a symptom of leadership failure. Not necessarily incompetence, but omission. Leadership that does not actively address key-person dependency is managing the problem reactively: waiting for the crisis to happen before acting.

CEOs and CTOs need to understand that organizational resilience in technology is not the exclusive responsibility of the engineering team. It is a strategic decision that begins with how the company structures its incentives, its processes and its learning culture.

There is also a human dimension that leadership needs to consider. The person behind a bus factor of 1 is frequently overloaded, anxious and trapped. They cannot take a real vacation. They cannot disconnect. In many cases, they are the first to resign when they find an opportunity where they do not carry that weight. Raising the bus factor is good for the company, but it is also good for people.

When an organization depends on individuals to survive, it is not resilient. It is fragile with good luck. The difference between the two only shows up when the luck runs out.

Where to start in practice

If you have reached this point recognizing your organization in any of the scenarios described, the next step is concrete. Do not try to solve everything at once — prioritize by risk.

Start by identifying the three systems or processes with the highest criticality to the business and the lowest knowledge redundancy. For each one, map: who knows it, what is not documented, and who could learn it. Then create a 90-day plan to raise the bus factor of those three points from 1 to at least 2.
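As a rough illustration of prioritizing by risk, the sketch below ranks systems by a hypothetical score: criticality divided by the number of people who know the system. The 1 to 5 criticality scale, the scoring formula and the sample systems are assumptions made for this example, not a standard method.

```python
from dataclasses import dataclass


@dataclass
class SystemRisk:
    name: str
    criticality: int  # hypothetical scale: 1 (minor) to 5 (business stops)
    knowers: int      # people who could run the system alone today


def risk_score(system):
    """Higher means riskier; a system nobody knows is treated as the worst case."""
    if system.knowers == 0:
        return float("inf")
    return system.criticality / system.knowers


def top_priorities(systems, limit=3):
    """Names of the riskiest systems, ties broken alphabetically."""
    ranked = sorted(systems, key=lambda s: (-risk_score(s), s.name))
    return [s.name for s in ranked[:limit]]


# Hypothetical inventory for illustration only.
SYSTEMS = [
    SystemRisk("payments-gateway", criticality=5, knowers=1),
    SystemRisk("core-ledger", criticality=5, knowers=2),
    SystemRisk("onboarding-api", criticality=3, knowers=3),
    SystemRisk("batch-reports", criticality=2, knowers=1),
    SystemRisk("legacy-settlement", criticality=4, knowers=1),
]

print(top_priorities(SYSTEMS))
# ['payments-gateway', 'legacy-settlement', 'core-ledger']
```

The exact formula matters less than the conversation it forces: agreeing on a criticality number for each system, and on who honestly counts as able to run it alone, is where most of the value lies.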

This simple exercise — done honestly — reveals more about the real resilience of your technology organization than any formal audit. And it gives leadership a tangible starting point to act before a crisis forces their hand.

Companies that build resilient systems do not depend on heroes. They build teams where anyone can take on any context, where knowledge flows freely, and where the departure of a single person is a loss — not a catastrophe.

That is the difference between a mature technology organization and one that is, permanently, one resignation away from chaos.

If you want to assess your organization's bus factor and build a concrete technical resilience plan, get in touch. This is exactly the type of problem I address with CEOs, CTOs and founders — before it becomes an emergency.