267. Continuous Delivery (Replay)

Building on last week's episode about continuous integration, today we explore the idea of continuous delivery and whether it is the best way forward. We start off with some basics, defining continuous delivery and what can truly fall into this category. From there we weigh the value of a deployment button and the difference that this single step in the process can make. William makes a strong case for why full automation and real continuous delivery is a better approach and how this philosophy will force developers to take up more responsibility and acquire better tools for their part in the process. We also talk about the aversion to deployment that can grow in coders and how confronting this pain is the best way to improve. Other points from our discussion include the utility of a regular cadence for deployment, companies that continuous delivery seems to suit, and what might characterize a team that works with a strictly automated process. For all this, join us on The Rabbit Hole today!

Key Points From This Episode:

What is continuous delivery and how does it relate to continuous integration?
The difference between full automation and the added buffer of a button.
An argument for getting accustomed to full automation of code deployment.
The importance of monitoring and alerts coupled with deploying at a regular cadence.
Examples of the types of companies that seem to favor continuous delivery.
Smarter times to get rid of buttons that stand in the way of full automation.
How full automation leads to a greater need for responsibility on the part of each coder.
Confronting the pain and potential issues of putting code into production more often.
The changes that setting up continuous delivery will herald within an engineering team.

Transcript for Episode 267. Continuous Delivery

[INTRODUCTION]

[00:00:01] MN: Hello and welcome to The Rabbit Hole, the definitive developers podcast, living large in New York. I’m your host, Michael Nunez. Our cohost today –

[00:00:09] DA: Dave Anderson.

[00:00:10] MN: And our producer.

[00:00:12] WJ: William Jeffries.

[00:00:13] MN: Today, we’ll be talking about continuous delivery where we’re shipping out code fast, and it's all automated here, baby.

[00:00:21] DA: Yeah. It's finally time for some sweet, sweet jelly to balance out the peanut butter and the hamburger that we discussed earlier.

[00:00:34] MN: Yeah. We'll talk about the definition of what is continuous delivery and our experience in it, why you should add more of that jelly to your sandwich. Maybe if there's any risk, we'll have a discussion about it.

[00:00:47] WJ: What is continuous delivery?

[00:00:51] DA: Just to recap a little bit about last week because we were talking about continuous integration, so that is continuously taking all the code changes that you're making and bringing them into the main line or your trunk or your master. That itself doesn't say anything about what happens to all those changes. Eventually, you have to put them in a place where they can be used by people in production. So you might do that in a manual way. You might just have someone log into the server and drop it. You could automate it, so you could have a button that someone presses. But I think the more aggressive stance of continuous delivery is that you can reliably release your code at any time.

[00:01:41] WJ: So you just automatically deploy when the master build succeeds. If you merge to master and the CI server passes, you just go ahead and deploy to production.

[00:01:55] MN: If you're feeling froggy, baby, you could do that. Maybe there are some different checks. I know there are other places that would have like a button that you want to press.

[00:02:04] WJ: Is that still continuous delivery?

[00:02:07] MN: With a pause, I guess. Just like the button is there for you to click, but it's not continuous until there is some human interaction, I guess. So it may not be fully automated.

[00:02:18] WJ: I guess it depends on how often people are actually clicking the button. I mean, if somebody reliably clicks the button every time, it's sort of like the CI server with the rubber chicken and the bell, it's not like truly automated, but it is continuous. But I think a lot of times people put that manual intervention in the way because they're scared, and then that just makes them even more scared because now it's extra their fault for hitting the button.

[00:02:50] MN: Oh, man. So you’re telling people to remove the button, make it fully automated, and blame it on the computers.

[00:02:55] WJ: I think that if at all possible, you should deploy to production whenever you have a green build and just get everybody in the habit of writing their code so that that's safe. It’s a different way of working. You just have to have a lot of test coverage, and there's some mitigation that you can put in place because every once in a while, you're going to deploy something bad. But that happens even if it's not automated, even if you're doing manual deployment. Sometimes, bad things get into production. If you could have easy rollbacks, maybe even automated rollbacks if there are obvious errors in production, good monitoring setup.
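As a rough illustration of what William describes here (deploy on every green build, then roll back automatically if production looks unhealthy), the flow might be sketched as below. This is a minimal sketch, not anything the hosts say they use: the function name `deploy_on_green` and the `deploy`, `is_healthy`, and `rollback` callables are hypothetical stand-ins for whatever a team's CI server, monitoring, and release tooling actually provide.

```python
from typing import Callable


def deploy_on_green(
    build_passed: bool,
    deploy: Callable[[], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Deploy automatically after a green build; roll back if unhealthy.

    All four arguments are hypothetical hooks supplied by the caller.
    """
    # A red build never reaches production.
    if not build_passed:
        return "skipped"

    # Green build: ship it without waiting for a human to press a button.
    deploy()

    # Ask monitoring whether production still looks fine after the release.
    if is_healthy():
        return "deployed"

    # Obvious errors in production: undo the release automatically.
    rollback()
    return "rolled_back"


if __name__ == "__main__":
    outcome = deploy_on_green(
        build_passed=True,
        deploy=lambda: print("deploying"),
        is_healthy=lambda: False,  # pretend monitoring reports errors
        rollback=lambda: print("rolling back"),
    )
    print(outcome)
```

The health check is where the monitoring and alerting mentioned in the conversation would plug in; how "healthy" is decided is left entirely to the caller.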

[00:03:38] DA: Testing in production, in other words –

[00:03:41] MN: Testing in production.

[00:03:43] DA: Kind of having monitoring and alerting and like easily observable logging. Those kind of things that we've talked about previously on the podcast lets you validate the change, even when it's out there in the wild. But, yeah, even if you're not deploying continuously through an automated fashion and you are doing it manually on a periodic basis, you can still use all these principles to help you have assurance in your builds when you do deploy. But the risk that you face there is that you do need to try to deploy with some regular cadence and manage that manually to avoid all of these changes kind of balling up into one big blob of the delta code that goes out into production at once because that can make it very hard to figure out what actually broke the situation.

[00:04:53] MN: Yeah. Having smaller commits and smaller pieces of work going out makes it easier to debug the thing when you are continuously delivering the code.
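The point Dave and Mike make about batch size can be illustrated with a small sketch. It assumes, purely for illustration, that commits ship in fixed-size batches and that exactly one commit is at fault; the function `suspect_commits` and the commit names are hypothetical.

```python
from typing import List


def suspect_commits(commits: List[str], batch_size: int, bad_commit: str) -> List[str]:
    """Return every commit that shipped in the same release as bad_commit.

    Assumes commits are released in order, batch_size at a time.
    """
    index = commits.index(bad_commit)
    # Find the start of the release batch that contains the bad commit.
    start = (index // batch_size) * batch_size
    return commits[start:start + batch_size]


if __name__ == "__main__":
    history = ["c1", "c2", "c3", "c4", "c5", "c6"]
    # Deploying on every merge: only the bad commit itself is a suspect.
    print(suspect_commits(history, 1, "c4"))
    # One big release: every commit in the blob is a suspect.
    print(suspect_commits(history, 6, "c4"))
```

With a batch size of one, the list of suspects is just the change that went out, which is the situation the hosts describe as easier to debug.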

[00:05:06] WJ: Do you guys support continuous delivery? Is that a thing you guys like to do?

[00:05:10] MN: Currently, the project that I’m working on has the button. Yeah, we have a button.

[00:05:17] WJ: Okay. Fully automated, just like deploy to production on merge to master, does anybody want to advocate for that?

[00:05:24] DA: I mean, I think that like a lot of companies do take that approach like Google, Facebook. All those different kinds of companies have a button and somebody who is shepherding the code through in a controlled fashion. I think that makes sense at some kind of a scale because if you have 10 developers working on code, then maybe they're pushing out changes 5 times a day or 50 times a day if they're 10x developers. If you have 200 10x developers, then that's like just thousands of changes continuously going out, and your pipeline may not be able to keep up with that.

[00:06:11] MN: Well, that would be insane.

[00:06:12] DA: Or if you have thousands of developers, then that's just game over.

[00:06:16] WJ: Wait. Is this an argument for using continuous delivery or against using continuous delivery?

[00:06:22] DA: It is a potential argument for not having automated deployments with every merge.

[00:06:28] WJ: But you can scale. You could scale your pipeline faster than you could scale the human who's got to hit the button, right? You'd have to hire a new human.

[00:06:39] MN: You can just create buttons I guess or –

[00:06:42] DA: I mean, the robots are taking our jobs, so can't we just have that one job? Can I just have the professional button pusher job?

[00:06:52] MN: Yeah. Don't take my job, bro. Stop it. Well, you're taking my job.

[00:06:56] WJ: Nobody wants to argue for fully automating everything. All right, fine. I will argue for fully automating everything, no human intervention. Come at me, bro.

[00:07:08] MN: I mean, I’ve actually worked at places that have it, and there is like a sense of – I think, as you mentioned earlier, William, you can get rid of the button when you're extremely confident in the test suite that is already in place and that you're introducing as the developers there, the rollback mechanism with your continuous delivery and the monitoring that allows you to see whether something is wrong.

Because I feel like when you don't have the – If you're not confident in the rollback, then you're more likely going to make sure that everything is working fine before you deliver or you do the button that will continue from there. But when you got those other two, the logging, the tracking, the applications that I’ve used in the past escape my brain right now. But being able to know where you are and what's happening and how it's affected the application, the button could very well go away because you have confidence in the other things.

[00:08:06] WJ: All right. Here's my argument. By putting the button in place, you give developers permission to ship bad code because it's somebody else's problem. They're the ones hitting the button.

[00:08:19] MN: Wow.

[00:08:21] WJ: You might ask, “Why is there so much bad code that developers are merging into master?” It's because they know it's safe, because QA is going to catch it. UAT is going to catch it. DevOps is going to catch it. It's not their problem.

[00:08:37] MN: William woke up and chose spicy today.

[00:08:40] DA: Right. Kimchi for breakfast.

[00:08:44] MN: Oh, man. No. I mean, I never thought of it that way but that is really interesting. I guess subconsciously not thinking about or triple thinking about certain test suites and stuff like that. The button is there because someone else could potentially click and merge your stuff in there too. Then it's like, “Oh, that's Bobby's problem. He hit the button.”

[00:09:06] DA: I have definitely had that conversation with somebody where it's like weeks in I’m like, “I’ve never deployed here. I’ve never hit the button.” He's like, “Don't worry. Someone will hit the button.”

[00:09:19] MN: In the place that I worked in before, we used to call it like – what is it like? It’s a train. Imagine a train with just like the engine in front, and then your PR gets merged in. It's like a car on the train. Then someone will message, like, “Hey, I’m pushing the button. Who wants to jump on this train?” Then everyone else will add their car, their PR is merged into master, and then you press the button, and it's like the choo-choo train going into production. That's just what we used to call it. “You all jump on the train if you want your stuff to get merged in. I’m pressing the button. I’m making sure everything is fine.” That was the idea. Train cars, baby.

[00:10:00] DA: Yeah. I mean, there is the argument too that if something hurts, if it's scary to push into production, then you should do it more often. Even if that pain involves service loss in some way or a bug getting out there, then you lean into that pain because pain is just your body telling you, teaching you a lesson. You have to learn the lesson and then fix the pain.

[00:10:30] MN: Learn the lesson and fix the pain. There you go.

[00:10:33] DA: Yeah. It’s just weakness leaving your body as – Well, no. That's not true because like you do actually need to fix the problem. You can't just suffer the pain and just keep continuously deploying.

[00:10:45] MN: Just keep eating it. Just keep eating the pain. It's all right. All the bugs are out there. Customers are very angry, but it's pain leaving your body.

[00:10:54] DA: Right. But you're just growing. You're swelling, as a monstrosity.

[00:11:02] WJ: I think if the engineer who wrote the code merges the code and deploys the code as a result, then that engineer is going to be ready and waiting to deal with the issue if there is one in production. Whereas if it's somebody in some other department or if it's somebody days later with a bunch of other commits at the same time, then they're going to be much worse positioned to fix the problem in production. They don't even know what the code changes were.

I think engineers are scared of continuous delivery because they know they don't have the tools to make sure that their code did not break production. I think if you actually set up continuous delivery, your engineers will very quickly realize that they need to make a bunch of changes in order to avoid breaking master, in order to avoid breaking production. That's good. That's a great way to get good tooling for your engineers.

Then people will tell you like, “Hey, I need a lower environment that I can automatically deploy it to, like a branch deployment so that I can test my changes on a remote environment that's more prod-like. And hey, I need good dashboards in place so that I can watch after the code gets deployed to production and make sure that everything is fine. I need a button that I can hit that will instantly roll back my code changes.” They'll give you a list of all the things that they need in order to be able to do this better, and it'll just make all of your productions better.

[00:12:34] DA: Right, and meaningful metrics for you to see the invisible pieces that are moving behind the scenes for your queues and systems talking to each other and things like that. It takes a good amount of deliberate thought and practice and discipline in order to get to that level.

[00:12:57] WJ: Yeah. Because you're doing it on every merge, every change set that goes out is very small. It’s much easier for that engineer to troubleshoot if something goes wrong. It's their code. It's only their code that went out.

[00:13:11] MN: I think with the tooling as you mentioned, William, slowly but surely we'll get rid of the button. Soon in the future, no one would have a button to push. Everything will be completely automated. Then you will ultimately win, William. They will take my job. They will take my job as the button pusher. I do agree with the – definitely, branch deployments, like you mentioned, are like a solid way to ensure that things work because you can have something that's really close to production but have your own little sandbox to make sure that things are working well.

I think Dave joked earlier about, you know, testing on production, which is the blood pressure is always high when you have to test on production for certain things. There are better ways to mitigate that, so you don't introduce bugs to production when you're trying to test in production. I think the branch deployments are one of the many tools that we can utilize.

[00:14:06] DA: Yeah. Just takes some investment and thought, when you get there.

[00:14:11] MN: Make that list, baby. Get rid of that button. Let's do it.

[00:14:13] DA: 2021, fire the button.

[00:14:17] MN: 2021, fire the button. Fire Mike, from pushing the button. We hope your journey to continuous delivery is as smooth as it is to deploy and roll back if necessary.

[OUTRO]

[00:14:31] MN: Follow us now on Twitter @radiofreerabbit, so we can keep the conversation going. Like what you hear? Give us a five-star review and help developers just like you find their way into The Rabbit Hole and never miss an episode. Subscribe now however you listen to your favorite podcast. On behalf of our producer extraordinaire, William Jeffries, and my amazing co-host, Dave Anderson, and me, your host, Michael Nunez, thanks for listening to The Rabbit Hole.

[END]

Links and Resources:

The Rabbit Hole on Twitter