198. Bus Factor

How many coders can you afford to lose to a series of inexplicable bus accidents before your project fails? As morbid as it sounds, your bus factor is an important measure of how risky your project is. Today we unpack the bus factor and the top ways you can boost your team’s capabilities while increasing project resilience. Along the way, we look at the benefits of pair programming, documenting your coding journey, temporarily locking team members out of projects, and rotating coding roles so that your team develops a better sense of your codebase. Later, we discuss why you should prioritize building team competency over how fast you can ship features. We wrap up by chatting about why someone might prefer to have a low bus factor. Sometimes you can’t avoid bus collisions, even if you look both ways before crossing the street. Join us to learn more about dealing with the bus factor and building projects that aren’t tied to the performance of one or two individuals.


Key Points From This Episode:

  • We define what the bus factor is and why you want a high bus factor number.
  • Pair programming as a way of creating continuous feedback.
  • The benefits of temporarily locking coders out of projects.
  • Why it’s so important to write clean code and document your programming.
  • Rotating team members to ensure holistic codebase knowledge. 
  • The dangers of prioritizing speed over building team competency.
  • Creating a culture of learning, leading to asynchronous pairing. 
  • Pull requests as a way of detecting your bus factor. 
  • Exploring why someone might prefer a low bus factor.


Transcript for Episode 198. Bus Factor


[0:00:00.5] MN: Hello, and welcome to The Rabbit Hole, the definitive developer's podcast, living large in New York. I'm your host, Michael Nunez. Our co-host today:

[0:00:09.0] WJ: William Jeffries

[0:00:10.3] MN: And our producer.

[0:00:11.8] DA: Dave Anderson.


[0:00:11.5] MN: And today, we'll be talking about the bus factor, and things to do before you walk out into the street, not knowing where the bus is coming from.

[0:00:23.2] DA: This is a common problem for software developers: not looking when you're crossing the street, probably because you're checking your phone. So, that's the topic for this episode.

[0:00:35.8] MN: I mean, you want to debug before you get to work, and what better time to do that than when you're on your phone, checking, and not looking across the street while the light is still red...

[0:00:45.4] DA: Checking the PR.

[0:00:46.4] MN: Yeah, I don't know. Does anyone ever check their PRs on their phone? I think my phone screen is big enough that I could just hit the green button when I'm ready to merge something, or rather, ready to approve something. But what am I to do if I do that while crossing the street and then get hit by a bus? Wow. We need to do something.

[0:01:08.3] DA: You're a father.

[0:01:09.6] MN: Yes. I need to watch out. Yeah, it's more than just the code base, people.

[0:01:13.7] WJ: That's good Stripe advice.

[0:01:13.3] DA: Why don't we have a definition of what the bus factor is?

[0:01:18.4] WJ: So, the bus factor is the number of people who would have to get hit by a bus before nobody knew how to maintain your code base anymore.

[0:01:27.2] MN: Okay, so you want that number to be high, right? That's the idea.

[0:01:31.8] WJ: Yes.

[0:01:32.9] DA: Brutal, but yes. You want more people to have to get hit by a bus before you can no longer maintain your code base. What is the right number? Five? I don’t know, 10, 20?

[0:01:48.4] WJ: I think it depends on your code base. But I mean, if the bus factor is one, you're in big trouble. And for a lot of code bases, the bus factor is one.
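To make the definition above more concrete, here is a minimal Python sketch that estimates a bus factor from a hypothetical map of who knows which files. It assumes a simplified greedy heuristic, similar in spirit to the ones used to estimate a repository's "truck factor": repeatedly remove the person who knows the most files, and stop once more than half of the files have nobody left who knows them. The function name `estimate_bus_factor`, the 50% threshold, and the sample data (borrowing names from this episode) are all illustrative assumptions, not something discussed on the show.

```python
def estimate_bus_factor(file_knowers: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Estimate how many people must leave before more than `threshold`
    of the files have nobody left who knows them (greedy heuristic)."""
    # Copy the sets so the caller's data is not mutated.
    remaining = {path: set(people) for path, people in file_knowers.items()}
    total = len(remaining)
    if total == 0:
        return 0

    removed = 0
    while True:
        orphaned = sum(1 for people in remaining.values() if not people)
        if orphaned / total > threshold:
            return removed

        # Count how many files each remaining person still covers.
        coverage: dict[str, int] = {}
        for people in remaining.values():
            for person in people:
                coverage[person] = coverage.get(person, 0) + 1
        if not coverage:
            # Everyone is gone (only reachable when threshold >= 1).
            return removed

        # Remove the person covering the most files; ties are broken
        # alphabetically so the result is deterministic.
        top = min(coverage, key=lambda p: (-coverage[p], p))
        for people in remaining.values():
            people.discard(top)
        removed += 1


# Hypothetical team where Carlos is "the checkout guy".
siloed = {
    "checkout.py": {"carlos"},
    "cart.py": {"carlos"},
    "payments.py": {"carlos"},
    "timezones.py": {"dave", "michael"},
}

# The same hypothetical code base after some pairing and rotation.
shared = {
    "checkout.py": {"carlos", "dave"},
    "cart.py": {"carlos", "michael"},
    "payments.py": {"carlos", "dave"},
    "timezones.py": {"dave", "michael"},
}

print(estimate_bus_factor(siloed))  # prints 1
print(estimate_bus_factor(shared))  # prints 3
```

In the siloed map, losing Carlos alone orphans three of the four files, so the estimate is 1. After pairing spreads that knowledge around, it takes three departures. Real tools would infer "who knows what" from commit or review history rather than from a hand-written map, so treat this purely as an illustration of the idea.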

[0:01:57.4] DA: That's true. Sometimes, for recording, the bus factor is one, when I’ve forgotten to hit record. And then it's like, “Oops, sorry guys. Got hit by that bus.” In a metaphorical sense.

[0:02:11.9] MN: Yeah, got to record again. I'm recording right now. Is that safe to say? Just making sure. So, now we've got the definition, and we know that we want to keep the bus factor high. That way, in the event that more people on the team get hit by a bus, it wouldn’t be a problem.

[0:02:33.1] DA: Or if you want to look on the bright side, maybe they win the lottery, and they go buy a private island and live there where there's no buses, as far away from any buses as possible.

[0:02:43.0] WJ: Yeah, so you want a high lottery number, I guess, is what we're saying. Tips on how to win the lottery.

[0:02:51.3] MN:  I'll bring it back to a morbid situation. But like if your entire team was in an elevator shaft after coming back from lunch —

[0:02:59.3] DA: Wait. Is this inspired by a real-life scenario?

[0:03:04.5] MN: Well, kind of. I've had a project manager separate us at elevator shafts because of this.

[0:03:14.1] WJ: It's like presidential succession. You can't have everybody in the line of succession in the same room at the same time.

[0:03:18.9] MN: No, no, no. Don’t do that.

[0:03:19.6] DA: That’s the bus factor. 

[0:03:24.0] MN: So, I'll bring it up. We want to have that number as high as possible. What are some ways to increase it that come to mind first?

[0:03:35.2] DA: Besides cloning, which has some dubious moral quandaries associated with it, we like to advocate for pair programming. I'm just going to take that one. We've talked about it a bunch. We've done so many episodes on it; I'm covered in tattoos about pair programming.

Pairing helps by creating a continuous feedback loop between two people, where you are continuously sharing information about the problem you're working on right now, and also about context: both the context you share and the context that may be unique to you, that only you know. You each bring different experience with this system and other systems, which helps even out your bus factor.

[0:04:25.7] WJ: There is a trick that they use in the finance industry to prevent embezzlement and other kinds of financial fraud where they'll make bankers take like a month of vacation where they're totally locked out of the system and can't touch anything. Because, you know, if you're in the system, you can cover your tracks. But if you're missing for a month, then usually something will come out, something will come to light. 

And so, one strategy is making sure that your people take vacation, long chunks of vacation, and that they do not have access to the system while they're gone, because the same thing will happen, except with siloed knowledge and manual processes that only that person is able to do. You'll realize, “Oh, actually, it turns out that there's some monthly job that runs on the database. And if this person isn't around to coax it, it is going to crash in a way that affects users.” And then you find that out, and everybody else on the team learns and documents how to do it properly, so that it's no longer contributing to your bus factor.

[0:05:38.3] DA: I do love that idea, because it's much more positive than anything we've been talking about prior to this. Yes, just take a vacation and relax and...

[0:05:48.8] WJ: Just put them in jail with no access to computers.

[0:05:54.5] DA: Okay, let's bring it down a notch. That's where we were before.

[0:05:59.3] MN: This person is going on their honeymoon. Let's bring it back to the lightness of it.

[0:06:07.2] DA: Well, their marriage will be very short if they bring a computer with them. So, in the interests of their work-life balance, the bus factor, and their marriage, don't let them work. I think that's a great, healthy attitude too. William, you mentioned, “Hey, if something goes wrong, then the team will learn about it.” And it's not about blaming, it's like, “Oh, okay. We didn't know that we didn't know this, or that this was the only guy who knew it.” Unless you did know about it, and then that's your problem. But yeah, let's figure it out, document it, and then move on.

[0:06:49.1] MN: Yeah, I think both William and Dave brought up the idea of documentation. I've found that to be helpful. What tends to happen, especially when I'm learning a code base I'm unfamiliar with, is that I keep a journal, if you will, a tech journal of what I'm learning: little tidbits of code that I need to remember or understand. I keep them on the side and read them every so often. Then I try to put them into documentation, preferably in source code, whether that's a Markdown file someone can read if necessary, or, as I've used in the past, something like Storybook, I think it's called. It's a React third-party library that lets you build components in a sandbox environment, and it lets you document them in a way that you can play around with them in that little sandbox.

But the last place I would try to document things, even though it might seem like a good place, is in comments. I know that comments can get old and stale, and I try to avoid that. Keeping documentation that lives in source code and Markdown up to date can be a little bit harder, but I feel like comments often end up misleading when you use them for things you want people to understand and know.

[0:08:10.4] WJ: I would agree. I would say the one exception is if you are using your comments to generate documentation automatically with a tool like Yardoc or Swagger, in which case that's preferable to having the documentation live in a wiki far from your code. It's at least more likely to be updated if it's next to the code. But actually writing clean code that's easy to follow and doesn't need a ton of documentation is definitely a better approach, along with robust test coverage that effectively documents behavior as well as preventing things from breaking.

This is one of the reasons why I love tools like RSpec, which are much more verbose and give you an opportunity to say in plain English what the feature is supposed to do, or what the method is supposed to do.

[0:09:00.3] DA: Yeah, totally agree. RSpec is like documentation that you interact with: you see it run on all of your PRs, and you run it locally. Hopefully, you're living it and breathing it. I think Storybook has that in common with it, too. It's very interactive, and it can also be used to generate documentation about the inputs, outputs, and different use cases of your components, functions, and styles. I think it's nice because everyone can see it and know about it, even if they're non-technical.

[0:09:49.0] WJ: Another thing that I think can help with bus factor is team rotation. A lot of times somebody builds something, and then, because they're on the team, anytime anything breaks or anything needs to be added or changed in that area of the code base, people say, “Oh, have Dave do it. He wrote that. He knows it well.” The problem with that is that Dave then knows it even better, and even better, and even better, and everybody else continues to not know it at all.

So, if you rotate Dave off of the team, all of a sudden, the next time a story comes along that affects that part of the code base, it's not Dave who gets assigned to it. It's a little bit easier than mandatory vacation, where people are totally cut off, because people can still say, “Hey, Dave, can you pair with me on this? I don't know how to do it.”

[0:10:35.3] DA: I agree. I do not want to be that person. We've talked about that symptom before, in the classic case of the time lord, where it's like, “Oh, I do not understand how time zones work and this is very confusing. So, you, the time lord, shall handle the time zone issues henceforth.” And that's not fun. That's not fun. It's better to share the context and spread it around. It is tough sometimes, because you want to optimize for speed. “We just want to get it done quickly. So, if we just have Bobby do it, then it'll be faster.” But that's kind of a fallacy that may catch up to you, because that person may become a bottleneck. If they're not there, or if they have five other things to do, then you just have to wait because you can't help yourself.

[0:11:36.3] MN: Here's an example. Suppose I am a project manager. I'm putting my project manager hat on, and I know that Carlos is the checkout guy. I know I can reach out to Carlos. “Carlos, take care of this checkout story, because I know you are the person for the job, Carlos. Take it. Take the story.” What ends up happening is that if Carlos goes on vacation, or has to leave, the checkout story will take longer, because Carlos isn't around to be the checkout guy anymore.

So, that's something I want to mention for people who want the speed of getting the feature out quickly: it's not always about how fast you get it out. It's whether the team understands the entire code base well enough that anyone can play any role, and it's not just Carlos on checkout.

[0:12:40.4] WJ: Right. Sometimes Carlos doesn't die when he gets hit by a bus. He just gets maimed and can't come back to work for a month.

[0:12:46.2] MN: Yeah, imagine that. And then what are you going to do? Call Carlos at his hospital bed? He has a full body cast on, and you just kind of slide a keyboard under his fingertips so that he can type. That would be cruel.

[0:13:03.4] WJ: We're going to need a trigger warning for this episode.

[0:13:07.4] DA: I think another way that you can try to reduce bus factor is to try to encourage people to lean into learning and teaching each other, which is kind of a form of, I guess, asynchronous pairing. But if people are broadcasting, and teaching the people around them, these things that they're learning in some kind of a helpful way to the team, then that can be pretty helpful as well.

[0:13:36.8] MN: Do you think that pull requests are a good tool for reducing bus factor?

[0:13:44.0] WJ: Yeah, definitely. I think code reviews force somebody to at least look at the code and try to understand it well enough to make some sensible comments. I think they can also be a way of detecting whether you have a bus factor problem. If nobody feels comfortable reviewing a particular pull request, and everybody's like, “Hey, I don't really feel qualified to comment on this because I don't really know what's going on,” that's a red flag.

[0:14:12.1] DA: Oh, my god. I've been in that situation before. It was like, “Oh, no. I’ve become the guy.” And yeah, it can be really challenging to get out of that. If you find yourself in that situation, I would definitely recommend being personable about code reviews: actually doing them synchronously, or making yourself very available for questions and guiding people, providing the context they need to be comfortable reviewing the code.

[0:14:49.9] WJ: Yeah, mob code reviews, or even mob programming on features, are another good way to reduce bus factor.

[0:14:57.4] DA: Mobbing. We got to talk about mobbing.

[0:14:59.1] WJ: What are some reasons why people might not want to minimize their bus factor? What are some arguments you might get?

[0:15:06.8] MN: Job security. “Only I can fix the checkout page, bro. Only I can do it.” That way, the organization is essentially dependent on the person who's responsible for that important feature. I don't know if that's ever really possible, but that's the first thing that comes to mind.

[0:15:31.5] DA: Totally, yeah. Definitely. It could also be that there are like incentives in place that encourage a particular person to act in a way that reinforces the bus factor. If I have a bonus that I'm going to get, if this thing never has a problem, then I'm going to be like on top of that and fix it as quickly as possible and I'm not going to trust anyone else to do it, because I have a mortgage I need to pay.

[0:16:03.4] MN: And I need that bonus, baby.

[0:16:04.8] DA: Right, exactly.

[0:16:06.9] WJ: I've seen pre-assigning tickets incentivize people to hoard knowledge of a particular section of the code, because when it comes time to estimate, you don't want to commit to more tickets than you can actually finish in a sprint. So, if you know a bunch of tickets are going to get pre-assigned to you, there's an incentive to make sure that there are a couple of areas of the code base where you are the expert and can estimate with a higher degree of accuracy, so that you always get your work for the sprint done, even if it comes at the expense of anybody else knowing that section of the code base.

[0:16:47.1] MN: My argument is that it's called an estimate because you are estimating the work that you can do. And I think you should allow other people to estimate work that they can do anywhere in the code base.

[0:16:58.8] WJ: But I'm just protective of this section of the code base, because it's important to me, and I want to make sure that it works. I just don't trust you to do as good of a job as I will.

[0:17:10.2] MN: That's great. But you'll be going on vacation next week, William. And when I say vacation, it might be that bus that's coming out in the middle of the street.

[0:17:22.3] DA: Sending the bus after him, just like a Final Destination curse by a gypsy in the street.

[0:17:27.5] MN: The Sopranos or something. But the idea is that, yes, if you are super protective of your code, I can totally understand that. But it won't benefit the team when, eventually (I won't say ultimately, because that sounds even more grim), you are out of the office. Unless you never take vacation, you live and breathe lines of code, and your work-life balance is just work-work balance. Then I don't have an argument against that.

[0:18:03.9] WJ: “Look, Bobby, we have a big release coming. I can't wait around for you to figure out how to do this. I'm the one who's going to save the day here.”

[0:18:11.9] MN: And you can save it by pairing. I know you might think that'll slow people down, and it could. But knowledge transfer is important. We need that bus factor as high as possible. So, stop hogging all the code.

[0:18:28.7] WJ: “I think we need to wait until next quarter. Right now, what we need is speed. I got to ship this feature. I can't wait for you to learn how to do this.”

[0:18:37.4] MN: He’s got the need for speed.

[0:18:38.7] DA: Can’t let that bus factor go below 55 miles an hour, otherwise Sandra Bullock is going to blow up.

[0:18:53.1] MN: Oh, my God, we did it. We brought it all back. We tied it all together. We tied it all together.

So, if you're listening to this, first of all, look both ways before you cross the street on your way to work. Don't look at the PR on your phone. And when you are at work, try your best to increase that number: that vacation factor, that honeymoon factor. Well, you want to have only one honeymoon. That's the ideal. But the vacation factor is important. Make sure you keep that number of vacations high. Don't get hit by a bus. That way, the team can streamline its process and release features without too much of a bottleneck.


[0:19:40.7] MN: Follow us now on Twitter @radiofreerabbit so we can keep the conversation going. Like what you hear? Give us a five-star review and help developers like you find their way into The Rabbit Hole. Never miss an episode, subscribe now however you listen to your favorite podcast.

On behalf of our producer extraordinaire, Dave Anderson, and my amazing co-host, William Jeffries, and me, your host, Michael Nunez, thanks for listening to The Rabbit Hole.


Links and Resources:

The Rabbit Hole on Twitter


Michael Nunez on LinkedIn

Michael Nunez on Twitter

David Anderson on LinkedIn

David Anderson on Twitter

William Jeffries on LinkedIn

William Jeffries on Twitter