308. BDD [Behavior-driven Development] (Replay)

On today's episode, our host Dave Anderson and producer William Jeffries discuss end to end testing. They explore this interesting topic, looking at what exactly user testing entails and who benefits from this kind of testing. They also talk about the advantages and disadvantages of end to end testing, stressing the importance of making sure it is a necessary step in the development process, as it is often time consuming. Both Dave and William share some of the problems they have experienced with end to end testing and how those problems were resolved. To go further into the world of end to end testing, join us today!

Key Points From This Episode:

  • What end to end testing is.
  • What a well written end to end test looks like in code.
  • How Gherkin goes the extra step compared to something written in JavaScript.
  • How Gherkin syntax works.
  • What advantages using Gherkin brings.
  • What the disadvantages of Gherkin are.
  • William shares a recent end to end testing problem he encountered.
  • How to strike the balance between saving time and testing everything.
  • How to decide which browser to test with.
  • Discussing visual regression and how to fix those bugs.
  • And much more!

Transcript for Episode 308. BDD (Replay)

[0:00:01.9] DA: Hello and welcome to The Rabbit Hole, the definitive developer’s podcast in fantabulous Chelsea, Manhattan. I’m your host, Dave Anderson. With me today we have our fantabulous producer.

[0:00:11.7] WJ: William Jeffries. Good to be here.

[0:00:14.9] DA: You got upgraded. Everything’s fantabulous today.

[0:00:19.4] WJ: Feeling fantabulous.

[0:00:21.3] DA: Yup. Today, we’re going to be talking about end to end testing.

[0:00:25.5] WJ: Yes, both ends, all the way.

[0:00:27.5] DA: Yeah, both ends. One end being the beginning and the other end being –

[0:00:32.3] WJ: The end.

[0:00:33.8] DA: I guess? I mean, think about like that, you want to do testing on all the things.

[0:00:40.3] WJ: I think of it as like a top to bottom, you know?

[0:00:42.1] DA: Yeah, that’s true. It’s a top and bottom.

[0:00:43.2] WJ: It’s like the top being the UI and then the bottom being like your database or persistence layer?

[0:00:48.4] DA: Yeah, we’ve already talked about end to end testing where it’s like, the breadth of it, where you want to cover all the features in your unit tests. It’s really confusing with this concept.

[0:01:00.6] WJ: It’s a pyramid.

[0:01:01.0] DA: We’re talking about.

[0:01:02.0] WJ: Time is a flat circle.

[0:01:06.9] DA: [Inaudible] place. Haven’t yet watched any of it.

[0:01:10.8] WJ: I haven’t seen any of it. I just like the phrase. It doesn’t really – relate to end to end testing at all. End to end testing is very linear. You start at the beginning and you go to the end.

[0:01:23.3] DA: We’re talking about writing code that’s going to orchestrate actions that a user would take on a site. In the browser itself.

[0:01:29.7] WJ: Driving that browser, beep, beep.

[0:01:31.8] DA: Driving the browser. All aboard. Get your cookies in here and then actually hitting a database and making changes to the database and all of the wonderful, beautiful complexity that arises from –

[0:01:51.2] WJ: Everything that you have to do to service a user. All of the things that can break, that’s what makes it such a high value test is that it actually executes the entire stack. Anything that could break for an actual user gets tested by this one end to end test.

[0:02:08.0] DA: What does it look like when an end to end test is written well?

[0:02:11.5] WJ: I think it depends on what you’re writing it in, right? Because if you’re using Cucumber and you’re using Gherkin syntax, it’s going to look very different than if you’re just using RSpec, right?

[0:02:26.4] DA: I guess with RSpec, much like other testing frameworks that you might use in JavaScript that are kind of BDD flavored, you would have nested blocks with some text that describes what they are, and you might end up with something that looks like a functional spec.

How is Gherkin different from that?

[0:02:48.7] WJ: Yeah, I think with Gherkin, you’re taking an extra step and you’re taking the time to write out what it is that your test does in plain English: “Given I am an admin, when I go to my home page, then I see the admin panel.” It’s not really code, it’s English, and then under the hood, you have to implement definitions for each one of those steps.
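
To make the Given/When/Then example above concrete, here is a minimal Python sketch of how plain-English steps can be tied to step definitions. It is not Cucumber's real API: the `step` decorator, the `STEPS` registry, and `run_scenario` are hypothetical names invented for illustration, and the "browser" is faked with a dictionary.

```python
# Hypothetical sketch: a tiny exact-match step runner, not Cucumber's real API.
SCENARIO = """\
Given I am an admin
When I go to my home page
Then I see the admin panel
"""

STEPS = {}  # maps exact step text -> implementation function


def step(text):
    """Register a function as the definition for one plain-English step."""
    def register(func):
        STEPS[text] = func
        return func
    return register


@step("I am an admin")
def given_admin(ctx):
    ctx["role"] = "admin"


@step("I go to my home page")
def visit_home(ctx):
    # Stand-in for driving a real browser: build the page from the role.
    ctx["page"] = ["admin panel"] if ctx["role"] == "admin" else []


@step("I see the admin panel")
def see_admin_panel(ctx):
    assert "admin panel" in ctx["page"]


def run_scenario(text):
    """Strip the Given/When/Then keyword and run the matching definition."""
    ctx = {}
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        STEPS[body](ctx)  # a KeyError here means an undefined step
    return ctx


result = run_scenario(SCENARIO)
```

The scenario text stays readable for non-programmers, while each line only runs if a definition with exactly matching text has been registered.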

[0:03:13.9] DA: It’s all in one place like you just put all those definitions in a text file somewhere?

[0:03:19.8] WJ: Yeah. You have to figure out how to organize your step definitions which is always a nightmare.

[0:03:26.3] DA: The unification of those step definitions is separate from the actual thing that you’re trying to do?

[0:03:32.2] WJ: There’s a feature file, and then the feature file has a collection of scenarios in it, and then each one of those scenarios describes a flow through the browser, assuming it’s a browser that you’re testing, which, I mean, as web developers, is kind of what we’re familiar with?

[0:03:46.7] DA: Right, yeah. I guess, it’s just a way to write a test, you could write it for even like a unit of code like a class or some kind of API.

[0:03:56.9] WJ: Yeah, it’s literally the same doing it in Gherkin syntax.

[0:04:04.0] DA: I guess I’d be doing it wrong if I was using it to unit test my code.

[0:04:07.4] WJ: I mean, it would technically run, it would work, the code will compile.

[0:04:13.5] DA: I have a test which is maybe better than not having a test.

[0:04:16.5] WJ: That is true.

[0:04:18.0] DA: You’d be coming out ahead in that sense, but – there’s a lot of overhead.

[0:04:23.5] WJ: Yeah, there’s a ton already, you have to maintain that extra code, you have to maintain English, which is harder to maintain than regular code. Editors are not built for that.

[0:04:37.1] DA: I’ve not worked much with Gherkin syntax. I’ve worked on projects that have it, but then mainly I’ve kind of lived around the unit testing and written those kinds of tests. Let the end to end tests like do their own thing. Do you actually need to have the exact same string in your definition file as you have – where does it even look for it?

[0:05:00.0] WJ: In your feature file, yeah.

[0:05:00.6] DA: Your feature file.

[0:05:01.3] WJ: The feature file has to match the step definition verbatim unless you’re using – I mean, you can do regex and like string pattern matching stuff. If you wanted to have a step definition that’s “as an admin” and “as a customer” and “as a super user”, you could have one step and then you could just regex each one of those things, yeah, and then have like a local variable that is the word customer or admin or super user or whatever.

Then you could dynamically change the behavior of that one step if you wanted to refactor and reuse your step.
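
The pattern-matching idea can be sketched in a few lines of Python using the standard `re` module. This is only an illustration under assumptions: the pattern, the `given_role` function, and the `PANELS` data are made up here, and real Cucumber step definitions look different.

```python
import re

# Hypothetical pattern-based step: one definition covers several roles.
AS_A_ROLE = re.compile(r"^I am an? (admin|customer|super user)$")

PANELS = {  # illustrative data, not from the episode
    "admin": "admin panel",
    "customer": "order history",
    "super user": "system settings",
}


def given_role(step_text):
    """Return the context for an 'I am a <role>' step, or None if no match."""
    match = AS_A_ROLE.match(step_text)
    if match is None:
        return None
    role = match.group(1)  # the captured word drives the step's behavior
    return {"role": role, "home_panel": PANELS[role]}
```

For example, `given_role("I am a customer")` and `given_role("I am an admin")` both hit the same definition, and the captured role plays the part of the local variable described above.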

[0:05:36.9] DA: It sounds super powerful.

[0:05:38.2] WJ: Yeah, definitely. I mean, it’s very flexible. I think you have to ask yourself why you want this tool though because I think it gets used for the wrong reasons, often.

[0:05:50.4] DA: Yeah, what are the outcomes? Really why we should be writing code. We should be writing code not just to write code.

[0:05:56.7] WJ: I mean, it is fine. But we also got to get paid.

[0:06:01.3] DA: That’s true, we got to get paid, okay? They’ll pay you even for that if it doesn’t do it. Well, maybe not. I guess, like the actual end result that you’re hoping to have is that you want to know that your application is working in a real context.

[0:06:18.9] WJ: Yeah, you can get that without Gherkin. Without that extra English syntax. I think the value you get out of it is mostly from defining terms. It is like by forcing yourself to express it in English, it forces you to verbalize exactly what it is that you want the app to do and then now you have actual domain language in your head and so when you go to write the code, it ends up much more closely modeling your domain, you get that domain driven design kind of –

[0:06:52.1] DA: If you have like, if you come to it with a sense of discipline then you can get that domain driven flavor. I suppose you could also end up with language that doesn’t match at all and it’s just like, crazy. Yeah, I could see the benefit of that, spending the time to think about the scenarios and what the actual use cases are, the actual stories, because it’s a style that you can write a story in: given this and the other thing, then I expect this thing.

As a user, given that I’m a user, when I do this, then I expect something awesome to happen –

[0:07:34.3] WJ: Yeah, it does ease communication with product because you can point to a feature file and say look, this is exactly what it does in plain English. This paragraph will tell you the actual behavior of the app, I promise. It is programmatically enforced.

[0:07:51.4] DA: I’ve heard like a tale of the idea of a product manager or a business analyst who will write Gherkin files for you.

[0:08:00.3] WJ: That’s a nightmare, that’s a disaster, don’t ever do that. That’s a trap.

[0:08:05.6] DA: Yeah. It’s an idea, but maybe it’s still a programmatic, so.

[0:08:10.5] WJ: Right, I mean, they’re just going to write it wrong and they’re going to forget about really important steps and they’re going to make crazy assumptions that are really hard to program around. Like you can’t hand over that part of – I mean, that’s code, it goes in the code base. You should collaborate.

I think actually, it’s really helpful for collaborating with product because you could write out what the app is going to do with them and it’s like they can pair with you because they can read the code that you’re writing because it’s English.

[0:08:37.9] DA: Most of our managers are pretty good at like expressing things verbally and in written form.

[0:08:44.0] WJ: Yeah, it makes it so much easier for them to point out things that are missing. I find it’s really helpful to collaborate with QA, actually, when you’re writing those, because they’ll talk about edge cases that even product was not aware of.

[0:08:57.8] DA: Especially if they’re like, maybe not especially, but like even the case if they’re like manual QA if they’re just going and banging on this page.

[0:09:10.0] WJ: Yeah, they’re used to. They know all of the ways to break things.

[0:09:13.6] DA: Right. It’s a really interesting thing with people with that kind of skillset because you know, you work on a feature and you’re like, “Yeah, I really nailed this.” And then you hand it over to someone like that. There was someone I worked with who was just a really great – great QA. They just tore my feature apart.

I didn’t think about that, yeah, okay, I’m doing math so I guess I should have thought, okay, fine. I’ll do it all over again.

[0:09:47.4] WJ: Yeah, what happens if we put in a negative number?

[0:09:50.5] DA: Don’t do that, back off, get away from the keyboard.

[0:09:55.1] WJ: No user will ever try that.

[0:09:58.0] DA: Yeah, I expect some great benefits of this, like some collaboration that we can have and some more assurances that our system is behaving as we want it to and we have all the features down. What are some things that kind of go wrong or annoy you about it?

[0:10:14.6] WJ: Oh my god, so many things. So many, well I don’t know where to start. Organizing that code is really hard. Step definitions in particular are really hard to organize.

[0:10:24.9] DA: Yeah, they seem so atomic, like by themselves they may not have as much meaning.

[0:10:30.5] WJ: Right and then we try and reuse them and then you find that a word that it seems like it means just one thing actually means a very different thing in another context.

[0:10:40.6] DA: English.

[0:10:42.0] WJ: Yeah, English is not like code, it turns out. I mean there is a lot of weird browser stuff that happens.

[0:10:49.1] DA: Oh yeah, browsers are weird.

[0:10:50.7] WJ: Yeah, browsers are very weird and you are driving a browser with a piece of software that is probably written in a different language from the one that you are programming in.

[0:11:00.0] DA: It is like it’s shelling onto it in an asynchronous way, right? You are loading a webpage asynchronously.

[0:11:06.7] WJ: Right, yeah you have to have a web server serving your app and then you have to have another server, like driving a browser which may be on an entirely separate machine. There is a lot of moving parts. And then browsers are just often non-deterministic like they are making a bunch of asynchronous calls when they are loading the page. If you have a front end, if you have a single page app, there is a lot of JavaScript. A lot of times things load in a different order every time you refresh the page.

[0:11:33.5] DA: Yeah, I was looking at some tweets today about like React, like the React framework, and how there are no guarantees about certain aspects of behavior of a render, like something may happen one time, multiple times or not at all. It is like, “What? Wait.”

[0:11:57.3] WJ: This is not how computers are supposed to work.

[0:12:00.3] DA: It is quantum.

[0:12:01.9] WJ: Yeah, I mean the other day I ran into a problem where a test was flaking because we were getting a timeout, but when we watched it, a page appeared to load just fine and it only happened like I don’t know, maybe one out of half dozen test runs. It would fail because of a timeout. We would look at the page and clearly it has loaded, it has been loaded for 10 seconds and then it times out.

[0:12:28.9] DA: And so, you were like debugging it even the headless browser you were like spawning it.

[0:12:32.0] WJ: This was in a headed browser. I was trying to curve them yeah.

[0:12:34.6] DA: Okay, got you.

[0:12:36.7] WJ: And it turned out that there was one network request that the page was making that never resolved. It made I don’t know, a hundred network requests and all of them resolved. Some of them unsuccessfully, but all of them resolved, except for one which occasionally just took like 30 seconds, and so the connection would time out and that’s why we’d get this timeout here.

[0:13:00.7] DA: Even though that within the user, you have first band.

[0:13:02.1] WJ: Even though from a user’s perspective the entire UI is there and everything should be fine.

[0:13:07.4] DA: And this could happen in production and the UI could never get this one little piece of data.

[0:13:13.0] WJ: Yeah, this just confused Selenium. It actually wasn’t really a behavioral problem that affected any users. It just broke the automated tests. It made them flaky, and tracking those kinds of weird browser bugs down is the kind of thing that I think drives engineers to abandon their unit – their end to end tests.

[0:13:34.5] DA: How did you fix it in the end, did you make it less brittle?

[0:13:37.9] WJ: No, we fixed the app so that everything loaded properly. It’s like, “Why did that one request take 30 seconds?” This is not performant enough. We have higher standards. Actually, I think we ended up just eliminating the call because it wasn’t needed.
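
The debugging step described here, finding the one request out of a hundred that outlasts the page-load timeout, can be sketched as a simple filter over recorded request timings. The endpoint names, durations, and the `slow_requests` helper below are all hypothetical; in practice the timings would come from whatever network log your browser or driver exposes.

```python
def slow_requests(durations, timeout_s):
    """Return the names of requests that outlast the page-load timeout."""
    return sorted(name for name, secs in durations.items() if secs > timeout_s)


# Made-up timings in seconds for one page load.
durations = {
    "/api/user": 0.2,
    "/api/cart": 0.4,
    "/api/recommendations": 30.0,  # the occasional straggler
}

offenders = slow_requests(durations, timeout_s=10.0)
```

Once the straggler is identified, the choice is the one made in the episode: speed the call up, or remove it if the page does not actually need it.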

[0:13:52.2] DA: Okay that is fair.

[0:13:55.7] WJ: Yeah, I mean you get to the point where your tests are so flaky that people don’t trust them anymore if you are not good at keeping up with that kind of thing. I think if you are going to have tests, they have to be accurate every time. If you get a test that fails for the wrong reason, you have to address that right away because if you let that problem build then by the time you have 10 different tests that are no longer accurate, they fail for the wrong reason.

Or they pass for the wrong reason, now you have undermined the confidence in the test suite to the point where people won’t actually make decisions based off of it. And then now there is no point. Now you are maintaining code for no reason.

[0:14:34.1] DA: That is true and also, I guess if you do figure out what the source of the flakiness is, I have experienced this issue where there is an assumption that was incorrectly made about how the test was written. Maybe the fixture data was populating data with a sequence that, if you added a record right before it in the sequence of all of your tests running. It wouldn’t cause it to fail. Some specific assumption was made. It is possible that you made the assumption somewhere else.

So, it might be a good opportunity that you go in and fix that assumption in other places or see if there is a pattern of a bad behavior that you or someone else had implemented.

[0:15:17.3] WJ: Yeah, tests are good for that forcing refactoring.

[0:15:19.9] DA: You talked a little bit before about tests taking too much time, and that seems like a big challenge. If you discover these end to end tests, you’re like, “Oh my gosh they’re so amazing. I want to test everything. I want to know that everything is working properly and don’t want to throw any test away ever.” I am wondering how you balance that want with the need for the test suite to be fast enough that you can run it repeatedly.

[0:15:50.5] WJ: I have seen companies take different strategies. I saw one company just delete all but one of their acceptance tests. They had one end to end test that ran to make sure that the app actually worked at all and it was extremely fast. Pretty fast test suite, just the one test.

[0:16:07.3] DA: They just re-evaluated their priorities and they were like, “You know what?”

[0:16:10.2] WJ: Well they looked at how many times an end to end test has saved them from a production bug and they couldn’t find anything. They were like, “I don’t know if this is worth the investment.” I mean it cost a lot right, you are paying engineers to write the things and maintain the things because they break and you have to fix them. And then you are paying for the server costs to spin up a browser and run these automated tests with whatever cadence you’re running them on.

And then there is the time that it adds to whatever process that it is a part of, like if it is a part of deployment or if it is a part of merging.

[0:16:45.8] DA: It sounds like it is a really useful thing to have, but you have to think really hard about which ones, which cases, which features you are really going to be safeguarding in this end to end test.

[0:16:58.0] WJ: Right, I have seen companies use tags to categorize tests. So, you know you have your smoke test. These are the ones that we are going to run in production and maybe hook up to an automated roll back. Those are super high value and they’re going to be very fast and very cursory. It is like is the feature even there? Not all the cases are covered and you can’t really do anything that is going to mess with the database too much because it’s broad.

And then you can tag things for a specific feature. So if you want to test one specific feature, maybe that is the feature you are currently deploying or it is a particularly high value critical feature and then you can have a regression tag for all of the stuff that you want to run when you want to see if anything is broken in a while.
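
The tagging strategy can be illustrated with a small Python sketch. The test names and the `select` helper are hypothetical; real runners have their own tag syntax (Cucumber, for instance, puts `@smoke`-style tags in the feature file), so treat this only as a picture of the idea.

```python
# Hypothetical test inventory: each test name maps to its set of tags.
TESTS = {
    "homepage_loads": {"smoke"},
    "checkout_happy_path": {"smoke", "checkout", "regression"},
    "checkout_declined_card": {"checkout", "regression"},
    "profile_edit": {"regression"},
}


def select(tests, tag):
    """Return the names of all tests carrying the given tag, sorted."""
    return sorted(name for name, tags in tests.items() if tag in tags)


smoke_suite = select(TESTS, "smoke")            # fast, cursory, safe for production
checkout_suite = select(TESTS, "checkout")      # one specific feature
regression_suite = select(TESTS, "regression")  # the full, slower sweep
```

The smoke suite stays small on purpose, while the regression tag collects everything you want to run when checking whether anything has broken.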

[0:17:39.6] DA: So, we were talking a little bit before about how browsers can be a little bit unreliable, but if you are a publicly available website, you can’t control which browser people are accessing it with. How do you decide which ones you do the test with?

[0:17:58.1] WJ: Right, yeah, your browser support matrix, which ones you care about. I mean hopefully you have analytics and you know what percentage of your users are on which platform.

[0:18:05.2] DA: Yeah, that always helps like having an informed decision about that.

[0:18:09.9] WJ: Right, I mean because your user base might be really in Firefox or it might be a bunch of people who are stuck on corporate networks that restrict them to Internet Explorer six or something awful.
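
One way to turn analytics into a browser list is to add browsers in order of usage until a target share of users is covered. The sketch below assumes made-up percentages and a hypothetical `browsers_to_support` helper; the right coverage target is a business decision, not something the code can tell you.

```python
def browsers_to_support(shares, coverage):
    """Pick the most-used browsers until `coverage` percent of users is reached."""
    chosen, covered = [], 0
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for name, share in ranked:
        if covered >= coverage:
            break
        chosen.append(name)
        covered += share
    return chosen


# Illustrative usage percentages, not real analytics.
shares = {"Chrome": 62, "Safari": 20, "Firefox": 9, "Internet Explorer": 6, "Other": 3}

strict = browsers_to_support(shares, coverage=95)
relaxed = browsers_to_support(shares, coverage=80)
```

With these numbers, a 95 percent target pulls Internet Explorer into the list, while an 80 percent target stops after the two most popular browsers.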

[0:18:21.2] DA: Firefox 42. Yeah, it takes a certain amount of boldness to just be like, “we will not support anything but Chrome. Only the finest Chrome is allowed on this website.”

[0:18:35.9] WJ: Yeah, it is a great power move. I love that.

[0:18:39.4] DA: Yeah, but then they’re –

[0:18:40.2] WJ: It is a strong endorsement too of your favorite browser.

[0:18:44.5] DA: I mean as a developer I support it, although there are real financial implications of not supporting a browser, not supporting it well.

[0:18:54.3] WJ: Yeah and if your platform gets to be less popular, if your browser gets to be less popular then you could be missing out on major market share right down the road.

[0:19:02.4] DA: Yeah, I have worked on like an ecommerce site where they had those kinds of metrics, like they knew how many people were on those crusty IE platforms and they could do nothing about it. They kept on coming back, they wouldn’t upgrade their browsers, but like –

[0:19:17.9] WJ: We are cutting you off man.

[0:19:19.5] DA: But when there was a problem like with something that was supported or was needed on those older browsers, they actually saw a real financial impact because they were tracking it. They saw that conversions were down with this particular browser and it had a real impact on the bottom line. They’re like, “I guess it makes financial sense to support this crusty browser.”

[0:19:41.9] WJ: Yeah, that is sad. That makes me sad when old browsers win like that. Just get on the edge versions.

[0:19:50.2] DA: That is another layer of complexity too because when you are scripting, you are often doing it in a Linux environment like with your headless browser you know? Only the coolest toys, but then if you are doing IE8 or nine –

[0:20:05.9] WJ: Yeah, then you are going to test with it. I mean if you want an automated test and cover your full browser support matrix including IE then you are going to need a VM that can run Internet Explorer and a Selenium node that connects to it.

[0:20:22.3] DA: Right like actually having Windows somewhere in the browser, somewhere.

[0:20:27.6] WJ: Huge pain, and I mean it is not just limited to that, right? I mean, you can get really aggressive with your browser support matrix. I mean what about operating system? I mean are we including that? Are we talking about OS X and iOS and Android and Windows and Linux?

[0:20:46.5] DA: This is turning into a four-dimensional cube; this is no longer a matrix. It is a Tesseract.

[0:20:53.9] WJ: Well and then what view port widths are we talking about? How many breakpoints do you want?
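
How quickly the matrix grows once operating systems and viewport widths are added can be shown with `itertools.product`. The browser, system, and width lists below are illustrative assumptions, and the `ONLY_ON` restrictions are a simplification for the sake of the example.

```python
from itertools import product

BROWSERS = ["Chrome", "Firefox", "Safari", "Internet Explorer"]
SYSTEMS = ["OS X", "iOS", "Android", "Windows", "Linux"]
WIDTHS = [320, 768, 1280]  # hypothetical breakpoints in pixels

# Simplified assumption: some browsers only exist on certain systems.
ONLY_ON = {"Internet Explorer": {"Windows"}, "Safari": {"OS X", "iOS"}}


def supported(browser, system):
    """True unless the browser is restricted to systems that exclude this one."""
    allowed = ONLY_ON.get(browser)
    return allowed is None or system in allowed


matrix = [
    (browser, system, width)
    for browser, system, width in product(BROWSERS, SYSTEMS, WIDTHS)
    if supported(browser, system)
]
```

Even after pruning impossible pairings, three short lists multiply into dozens of combinations, each of which would need a machine capable of running it.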

[0:21:00.9] DA: Yeah, are we talking about visual regressions now?

[0:21:03.8] WJ: I mean, I guess if you want to go there, yeah like. I mean normally people have designers all set specific breakpoints that they care most about, but in reality, they want responsive design like any view port is going to look at least not terrible.

[0:21:21.2] DA: Right, yeah just pixel by pixel, but it is really hard to do visual regression testing like there aren’t a lot of tools out there that do it well.

[0:21:31.4] WJ: Yeah, I haven’t seen anything that I liked. Although there are solutions out there.

[0:21:35.0] DA: Yeah, I have Googled it a couple of times. There are some SaaS-y things, you know, software as a service, not things with a lot of attitude.

[0:21:44.4] WJ: Oh, that do.

[0:21:46.7] DA: Not discounting the level of attitude, that’s provided by these services.

[0:21:51.9] WJ: Yeah.

[0:21:52.2] DA: You have to pay money for those things because it is non-trivial problem.

[0:21:56.3] WJ: Yeah, and I don’t think that it is that high value of a problem, honestly. Like the visual regressions that don’t affect functionality are not the bugs that I’m scared of. Fixing them is usually pretty trivial because you go in and you tweak some CSS. User experience is not that affected because they are generally able to come up with a workaround.

[0:22:17.8] DA: They’re able to efficiently enough, they’re able to get through and you know.

[0:22:22.1] WJ: Usually –

[0:22:22.7] DA: Overlook your misshapen buttons.

[0:22:24.4] WJ: Yeah. Usually it’s only in a certain browser or a certain viewport width or some particular combination of weird factors that are better resolved by telling those people like, upgrade your browser.

[0:22:39.8] DA: Because it’s good for you. Zero [inaudible 0:22:42] out there, people.

[0:22:45.0] WJ: Really. Honestly, if you're on IE 6, you’re used to the internet looking broken.

[0:22:52.5] DA: That’s true actually, if it looked normal then you’d be shocked.

[0:22:57.4] WJ: What, how are all the buttons in the right position on this page? It’s so strange.

[0:23:02.0] DA: I honestly think that people who are most strongly impacted by a visual regression are the people closest to you like the designers on your team.

[0:23:09.4] WJ: My god, it’s nails on chalkboard. What is this shade of red? It’s not on our style guide.

[0:23:15.0] DA: Yeah, they will slay you.

[0:23:19.3] WJ: No, actually, for real though, it would be great to hear a designer’s perspective on all this because we’re very biased as developers here. If you are a designer and you want to be on The Rabbit Hole, hit us up at radiofreerabbit on Twitter and maybe we’ll have you on the show.

[0:23:34.9] DA: All right, well I think we learned a lot about end to end testing like there’s a lot of aspects to it here. Some awesome in moderation and some things that are a little bit more challenging but overall, very useful tool to have in your basket.

Follow us now on Twitter @radiofreerabbit so we can keep the conversation going. Like what you hear? Give us a five star review and help developers just like you find their way into The Rabbit Hole. And never miss an episode, subscribe now however you listen to your favorite podcast. On behalf of our producer extraordinaire, William Jeffries, and our amazing host, Michael Nunez, who is out being a dad, and me, your host, Dave Anderson, thanks for listening to The Rabbit Hole.

[END]

Links and Resources:

The Rabbit Hole on Twitter