32. What Makes a Good Test?

Today on the show we’re talking about unit testing, which is the most important thing we do for our clients, and about unit testing best practices. So, what makes a good unit test? We have seen good ones and bad ones; what sets a bad one apart from the good ones? The first step is to discover what your actual unit is and the best way to test it, and then to ensure that the unit is tested. Getting the definition of a unit is the first important thing, and that can mean very different things depending on the language or framework you are working in. Stay tuned as we dive into the importance of defining the specific unit to be tested, the types of tests to use, and why you should be writing more unit tests to ensure that each unit of code is working effectively.

Key Points From This Episode:

The importance of defining exactly what a unit is and aligning your team.
SQLAlchemy defined; building objects out of things stored in the database.
Strategies for speeding up the testing process overall.
Making unit tests a requirement as part of the code review process.
Private methods and whether or not they should be tested.
Finding the balance of wrapping code repetition into a function that is descriptive.
How to leverage context blocks.
Functional tests and how they help in the realm of the unit test.
Happy path tests, smoke tests, and applying the wisdom of tests.
And much more!

Transcript for Episode 32. What Makes a Good Test?

[0:00:01.9] MN: Hello and welcome to The Rabbit Hole, the definitive developer’s podcast in fantabulous Chelsea, Manhattan. I’m your host, Michael Nunez. My co-host today.

[0:00:10.3] DA: Dave Anderson.

[0:00:11.2] MN: Our producer.

[0:00:12.0] WJ: William Jeffries.

[0:00:13.2] MN: And our regular guest.

[0:00:14.2] CQ: Charles Quirin.

[0:00:15.6] MN: And today, we’ll be talking about unit testing. Unit testing is the most important thing that we do at the clients that we’re on, and we’ll be talking about unit testing best practices.

[0:00:26.5] DA: Yeah, what does make a good unit test? I’ve seen some bad ones. You know, what sets a bad one apart from the good ones?

[0:00:34.5] CQ: I think that you need to kind of discover what your actual unit is and the best way to test it, and then ensure that that unit is tested. So I guess getting the definition of a unit would be the first important thing, and that could mean very different things depending on what language or framework you’re working in.

[0:00:55.2] MN: Yeah.

[0:00:56.1] DA: Like, my Capybara feature test is not a unit of code?

[0:01:00.1] CQ: That is probably not a unit of code.

[0:01:02.4] DA: Okay. What about if I have a test where I’m creating some database records?

[0:01:08.7] CQ: That may be an integration test but some teams may call that a unit test.

[0:01:14.0] DA: Yeah. I guess we just need to get on the same page.

[0:01:18.3] CQ: Yeah. I think it’s really important to get on the same page with your team as to what a unit test is. This is a topic I’ve been thinking about a lot in the last year or so, since I’ve been doing a lot more React, and what good unit tests for a React class would look like.

[0:01:38.1] WJ: I think another factor there is how tightly coupled different sections of your codebase are. If we’re thinking about a unit as an atomic unit that cannot be subdivided any further, then some languages, or some frameworks, couple things so closely that you could end up with two very separate things that do need to be tested together.

I’m thinking you could have an ORM that’s so deeply connected to your models that it would be impossible to test an individual function without a database solution.

[0:02:14.9] DA: Right. Just say it’s Rails. Dammit, Rails.

[0:02:21.8] CQ: He’s using SQLAlchemy.

[0:02:24.8] MN: What is SQLAlchemy?

[0:02:26.1] CQ: SQLAlchemy is an ORM in the Python world. A lot of people who use Flask pair it with SQLAlchemy so they can build objects out of things stored in their database.

[0:02:40.7] MN: Yeah, basically Active Record.

[0:02:43.0] CQ: Yup, basically Active Record.

[0:02:43.8] MN: Same deal.

[0:02:45.5] DA: What kind of testing framework do you use for that? Is there... I guess there’s Factory Boy when you’re working with that kind of stuff, right?

[0:02:53.5] CQ: Factory Boy could be helpful if you’re creating mock objects.

[0:02:58.9] MN: We mentioned dummy data frameworks and mocking and stubbing. Do those play a role in having a slow test suite? How do we speed up the test suite in general in our codebase? There are several reasons why test cases can be slow when you run all your specs. What are some of the things that could cause that?

[0:03:24.8] WJ: Well, one thing could be that your code is slow, and that can actually be helpful: your tests are pointing out which areas of your code are slow. A lot of test runners have a feature that highlights which individual tests are the slowest, which might point you in the right direction of where you can make some gains by refactoring.

[0:03:42.2] MN: Could it also be that you’re not isolating your tests enough? You thought you wrote a unit test, but it actually exercises half the codebase by accident?

[0:03:50.4] CQ: You could also break apart some of the units that you already determined were units, because maybe those units are not small enough. For example, if there’s a function that takes a little while to return, you can have a test for that function, and then separately, if there’s another unit of code that depends on that function, you can mock that function out. It could be that you just have to refactor to pull out that functionality.

[0:04:16.6] WJ: You could also have an inverted pyramid, where you just have too many feature tests and not enough unit tests.
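A minimal Python sketch of mocking out a slow function so the unit that depends on it can be tested quickly. It uses the standard library's `unittest.mock`; `RateClient`, `PriceConverter`, and the two-second delay are hypothetical stand-ins, and the test is a plain pytest-style function.

```python
import time
from unittest import mock


class RateClient:
    """Hypothetical slow dependency, e.g. a wrapper around a network call."""

    def fetch_rate(self, currency):
        time.sleep(2)  # stands in for a slow request
        return 1.1


class PriceConverter:
    """The unit under test: it depends on the slow RateClient."""

    def __init__(self):
        self.client = RateClient()

    def convert(self, amount, currency):
        return round(amount * self.client.fetch_rate(currency), 2)


def test_convert_multiplies_by_rate():
    # Patch the slow method on the class so no real sleep or request happens.
    with mock.patch.object(RateClient, "fetch_rate", return_value=2.0) as fake_rate:
        assert PriceConverter().convert(10, "EUR") == 20.0
    # The fake records how it was called, so the interaction can be checked too.
    fake_rate.assert_called_once_with("EUR")
```

`RateClient.fetch_rate` itself would still get its own, slower test; the point is that only that one test pays the cost.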

[0:04:22.9] MN: I’ve seen that before.

[0:04:24.5] WJ: It’s tempting to do because feature tests give you so much more comfort that everything’s working.

[0:04:30.2] DA: Yeah, as long as they’re working properly and not being flaky, because sometimes they’re pretty non-deterministic.

[0:04:36.1] WJ: Very true.

[0:04:37.1] MN: PhantomJS.

[0:04:39.4] WJ: Browsers in general, yeah.

[0:04:43.0] CQ: What’s a strategy if you are moving from a test suite that relies so heavily on a great number of feature tests or end-to-end tests?

[0:04:54.8] DA: I find that just the act of creating the first unit test for something is sometimes the hardest thing. If there’s no test file when I look for it, and there are a million feature tests, then it’s very tempting to just say, “Okay, I’ll bolt on another feature test at the end of that.”

Just write that first test file and have it there, even if you don’t exercise all of the code in that particular module. Start with one test, and then whenever you come back to it, leave things better than you found them and keep growing that coverage over time. Maybe eventually you delete some feature tests.

[0:05:36.4] WJ: You can make unit tests a requirement as part of the code review process, I’ve seen that.

[0:05:40.8] DA: Yeah, peer pressure.

[0:05:44.2] MN: There’s also code coverage to ensure that you have tests, right? If a particular file doesn’t have X percent of its code tested, you won’t even be able to merge it on GitHub. I find that to be a pretty good tactic to, you know, make fun of coworkers, make sure that they write tests. We don’t want to make fun of our coworkers, but it’s a good way to say, “Hey, we can bring up the coverage and make sure it stays at that level.”

[0:06:11.9] CQ: Right, it kind of gamifies it, you got to like get a high score.

[0:06:16.9] WJ: Shout out to Code Climate for making this super good.

[0:06:19.1] DA: Yeah.

[0:06:20.7] MN: We all agree that private methods shouldn’t be unit tested, or is it kosher? You don’t normally test private methods, right?

[0:06:30.1] CQ: That can be like a really interesting argument and in some languages, it’s more clear cut than others. For example in Python, there is no true privacy. There’s like the belief and trust in the developers. You usually have to denote that something’s private by putting an underscore in front of the function or method.

Some developers that I’ve worked with on Python code were adamant about testing private functions or methods. It’s something that I don’t personally prefer to do, but I think it ultimately comes back to, like I said, defining what a unit of code is, and if that’s your intentional public interface, then so be it. Then maybe you test it a bit more like what some people call an integration test.

I think it reminds me of an article that Martin Fowler wrote about sociable unit tests and solitary unit tests. I believe that’s the terminology he used. If you believe in more sociable unit tests, that means your unit tests may call other functions or rely on other classes to a certain extent. If you want to do more solitary unit tests, you may be mocking and stubbing a lot of stuff.

I find, when I unit test, if I’m constantly mocking and stubbing a lot of different functions and maybe some classes, that’s an anti-pattern and there’s an issue there: either my unit of code is too small and I have to make it bigger, or I have to refactor my code because these different modules are too dependent on each other.

[0:08:23.6] DA: Right, yeah, and there are sometimes way more graceful ways to get that same effect, like dependency injection, rather than stubbing things that are private. I like the idea that if you’re writing a test for something, it should make the code easier to work with in the future rather than harder. If you’re just doing double-entry bookkeeping, where you’re literally writing the same code in your test as you have in the method itself, then maybe that’s not a good test, and maybe you should pull back a little bit and see if the unit you should be testing is a little bigger.
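A small Python sketch of the underscore convention and of testing only through the public interface. The `Invoice` class, its helpers, and the 8% tax rate are hypothetical; note that Python does not stop a test from calling `_subtotal()` directly, which is exactly the trust-based privacy described above.

```python
class Invoice:
    """Hypothetical class with one public method and two "private" helpers."""

    def __init__(self, line_items):
        self.line_items = line_items  # list of (price, quantity) tuples

    def total(self):
        """Public interface: the behavior callers actually rely on."""
        return self._subtotal() + self._tax()

    def _subtotal(self):
        # Leading underscore means "private by convention"; nothing enforces it.
        return sum(price * qty for price, qty in self.line_items)

    def _tax(self):
        return round(self._subtotal() * 0.08, 2)  # assumed 8% rate


def test_total_includes_tax():
    # The private helpers are exercised only via the public total() method.
    assert Invoice([(10.00, 2), (5.00, 1)]).total() == 27.0
```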
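A rough Python illustration of the sociable versus solitary distinction, assuming a hypothetical `Checkout` that depends on a `TaxCalculator`. The sociable test uses the real collaborator; the solitary test swaps in a `unittest.mock.Mock`. If most tests look like the second one, that can be the smell described above.

```python
from unittest import mock


class TaxCalculator:
    """Hypothetical collaborator."""

    def tax_for(self, amount):
        return round(amount * 0.1, 2)


class Checkout:
    """Hypothetical unit under test; its collaborator is passed in."""

    def __init__(self, calculator):
        self.calculator = calculator

    def total(self, amount):
        return amount + self.calculator.tax_for(amount)


def test_total_sociable():
    # Sociable: the real TaxCalculator participates in the test.
    assert Checkout(TaxCalculator()).total(100) == 110.0


def test_total_solitary():
    # Solitary: the collaborator is replaced with a test double.
    calculator = mock.Mock()
    calculator.tax_for.return_value = 7
    assert Checkout(calculator).total(100) == 107
    calculator.tax_for.assert_called_once_with(100)
```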
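One way to read the dependency injection point, as a hedged Python sketch: instead of stubbing a private clock lookup, the hypothetical `Greeter` accepts a `now` callable, and the test simply hands in a fixed time. Production code keeps the default `datetime.now`.

```python
from datetime import datetime


class Greeter:
    """Hypothetical class whose behavior depends on the current time."""

    def __init__(self, now=datetime.now):
        # The clock is injected; by default it is the real datetime.now.
        self._now = now

    def greeting(self):
        return "Good morning" if self._now().hour < 12 else "Good afternoon"


def test_morning_greeting():
    # No stubbing of internals: just pass in a fixed clock.
    greeter = Greeter(now=lambda: datetime(2024, 1, 1, 9, 0))
    assert greeter.greeting() == "Good morning"


def test_afternoon_greeting():
    greeter = Greeter(now=lambda: datetime(2024, 1, 1, 15, 0))
    assert greeter.greeting() == "Good afternoon"
```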

[0:09:03.3] WJ: I think it can also be helpful to delete tests as you’re writing them. For example, we were talking about private methods and whether or not you should test them. Sometimes I will test-drive out private methods before I make them private. Then I will either make them private, or I’ll delete them and refactor them away. I find that helpful for two reasons. One is that TDD gives me a sort of steering mechanism; it’s like headlights showing me the way. The other is that it allows me to clean up my test suite so that it’s more readable and better formatted, because I’m already in the mindset that some of these tests are going to need to be deleted. It gets you closer to the minimum number of tests.

[0:09:55.1] CQ: Yeah, I think that also helps with self-documenting code. With self-documenting code, you want your tests to be extremely readable, because they basically explain what the piece of code you’re exercising is doing.

If you have a lot of tests in there that are not relevant, or that are only relevant to private methods, they may not be as helpful in describing what that public method or public interface does. Dave, I think you were kind of hinting at something about test readability, and something that I actually came across today: how DRY is too DRY for tests?

[0:10:36.5] DA: Yeah, the old DAMP versus DRY argument, right? DRY is “Don’t repeat yourself,” and DAMP is... what is that? I don’t actually know what it means.

[0:10:53.2] WJ: By the magic of post-production.

[0:10:56.0] MN: DAMP means descriptive and meaningful phrases. D-A-M-P.

[0:11:02.0] DA: Definitely not an acronym.

[0:11:05.9] MN: Descriptive and meaningful phrases versus don’t repeat yourself.

[0:11:09.9] CQ: Yeah, I don’t know how “phrases” kind of works into that. I guess in your test descriptions you want descriptive and meaningful phrases, but I like to have my expectations extremely readable, and sometimes even my setup pretty readable, because sometimes things can be a little bit obfuscated in code, and ideally you don’t want it that way.

You want the next person that comes across that code to at least be able to run and read the test and see, “Okay, I’m asserting that this object is giving me back a dictionary that looks X, Y, and Z way,” but if you build that dictionary with a bunch of helpers or something, it may not be so clear what it looks like.

[0:11:54.0] DA: Right, if you expect that the output is the result of some helper function then you have to go to page 10 and find the helper function and then keep on working backwards.
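A minimal DAMP-style sketch in Python: the expected dictionary is written out literally in the assertion, so a reader sees its exact shape without chasing a helper. `serialize_user` and its fields are hypothetical.

```python
def serialize_user(user_id, name, email):
    """Hypothetical unit under test: turns fields into an API payload."""
    return {"id": user_id, "name": name, "email": email.lower(), "active": True}


def test_serialize_user():
    # DAMP: the expected payload is spelled out right here in the test,
    # even if the same literal shape appears in neighboring tests.
    assert serialize_user(7, "Ada", "ADA@EXAMPLE.COM") == {
        "id": 7,
        "name": "Ada",
        "email": "ada@example.com",
        "active": True,
    }
```

The DRY alternative would build the expected value with a shared helper, which saves typing but sends the reader "to page 10" to see what is actually being asserted.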

[0:12:06.1] CQ: Yeah, I hate to repeat myself, I guess, but I have actually learned to love repetition in test code, because it’s helped me a lot when previous authors have been more descriptive, or more verbose, in their test code. It does lead to a lot more code and a lot more repetition, unfortunately, but it is very helpful.

[0:12:31.8] MN: Well, you mentioned that when you do have a lot of repetition... isn’t it possible to have, you know, the best of both worlds, where you wrap the repetition in a function that is descriptive, so that you can read it and understand what it’s doing and still see the repetition happen anyway?

[0:12:51.1] CQ: Yeah, I guess there’s a balance to that.

[0:12:52.8] WJ: I think sometimes it works and sometimes it doesn’t, because sometimes that abstraction, extracting that method, makes your code less readable for the purpose of testing. Now you’re misdirecting people’s attention away from the spec that you just wrote and into the definition of this method, when what you really want them to focus on is the functionality that you’re testing.

[0:13:17.7] DA: I see.

[0:13:18.9] CQ: For example, if I were testing that there was a string being returned by a method, I would actually type out that string even if I had to put it in the next 10 tests. Because I just prefer to see what that string is. I mean, there might be some exceptions but like, I think that it’s a lot easier to just be able to read the string than to kind of look at a variable.

[0:13:43.5] MN: Okay. I’m going to peacefully disagree. I’d rather have a variable with the string, and then if you ever need to know what that string is, you can just go up there whenever you get the chance. And in the event that the string changes for whatever reason, you only have to make that change in one place, which is what I prefer out of the two. But I can totally see why you would want to see that string in the test, because the variable is not the point of the spec; the point of the spec is the things after it. So drawing people away to a variable is no bueno.

[0:14:17.6] WJ: I mean, it’s a question of degree, right? In that instance, I think you could go either way with it. But let’s say we were testing a rock-paper-scissors class, you know, that takes two hand throws of rock, paper, or scissors and then determines the winner.

[0:14:36.0] DA: Right.

[0:14:37.4] WJ: In your spec, you would want to pass in rock and rock, rock and paper, paper and paper, et cetera, right? All the way through all the different permutations.

[0:14:46.6] DA: Right.

[0:14:47.0] WJ: Now, you could extract that into a method called “evaluate” in your test suite, have it generate your rock, your paper, your scissors, and all the different permutations, have your evaluate method calculate them correctly, return that result, and compare it to what the class actually does. But now your test suite has implemented all of the functionality of the class it is supposed to be testing, so it does not add any new clarity for the reader.
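A Python sketch of the alternative to an "evaluate" helper: the expected winner for each pairing is typed out by hand as a table, so the test does not re-implement the rules it checks. The `winner` function and the `BEATS` mapping are hypothetical.

```python
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def winner(first, second):
    """Hypothetical unit under test: return the winning throw, or None on a tie."""
    if first == second:
        return None
    return first if BEATS[first] == second else second


# Expected outcomes are written out literally rather than computed,
# so a reader can check each row against the rules at a glance.
CASES = [
    ("rock", "rock", None),
    ("rock", "paper", "paper"),
    ("rock", "scissors", "rock"),
    ("paper", "paper", None),
    ("paper", "scissors", "scissors"),
    ("scissors", "scissors", None),
]


def test_winner():
    for first, second, expected in CASES:
        assert winner(first, second) == expected, (first, second)
```

With pytest, `@pytest.mark.parametrize` could turn each row into its own test case, which gives a separate failure message per pairing.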

[0:15:18.1] MN: I see.

[0:15:19.3] CQ: I think it also varies depending on what you’re testing. Compared with the rock-paper-scissors example, in another case you may be dependent on some external library, say one that’s talking to the GitHub API. So let’s say you just put a spy on the method that makes that request. You can make an assertion on that spy, and in your assertion, you will not care about the other stuff. So I think the important thing is what your test is testing, and in that expectation you want it to be apparent what is expected, whether it’s an object or a string or whatever you are testing against.

[0:16:02.3] DA: I think you touched on something important that we haven’t really talked about much yet, which is how many expectations you should have in your ideal unit test.
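A hedged Python sketch of asserting on a spy. A `unittest.mock.Mock` stands in for a hypothetical HTTP client, records every call made on it, and the test checks only the request that matters. `RepoStarrer` is made up for illustration, and the path assumes GitHub's "star a repository" endpoint (`PUT /user/starred/{owner}/{repo}`); no real request is sent.

```python
from unittest import mock


class RepoStarrer:
    """Hypothetical unit that asks an HTTP client to star a GitHub repo."""

    def __init__(self, http_client):
        self.http_client = http_client

    def star(self, owner, repo):
        self.http_client.put(f"/user/starred/{owner}/{repo}")
        return True


def test_star_sends_put_request():
    http_client = mock.Mock()  # records every call made on it, like a spy
    RepoStarrer(http_client).star("octocat", "hello-world")
    # The assertion only cares about the request; everything else is ignored.
    http_client.put.assert_called_once_with("/user/starred/octocat/hello-world")
```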

[0:16:11.8] CQ: In each test?

[0:16:13.9] DA: Yeah, in each actual test case itself.

[0:16:17.6] CQ: Yeah. I like to have one.

[0:16:20.4] DA: That’s a good number.

[0:16:22.1] MN: Yeah, one is a good number. Yes.

[0:16:24.3] CQ: Maybe in certain exceptions two, although I cannot think of one of those exceptions right now.

[0:16:29.2] DA: I have sometimes squeezed two in, like if I have a lat and a long field. I don’t want to duplicate things, and it’s really a two-tuple, maybe.

[0:16:38.4] CQ: I actually had a test case that I improved upon today. It was in JavaScript, and I was comparing something that was stored in a variable against, I can’t remember, something else. But there was a chance that both things were undefined. I noticed this bug because I wrote another test and I was like, “This test is not working, but this test should be working if this is defined,” and then I was like, “Oh, of course, we’re testing if undefined is equal to undefined.”

So I quickly added another expect clause asserting that we expect it not to be undefined, and I think in cases like that, where it is directly relevant to the test case, it can be very helpful. I don’t think I needed to add a separate test case checking that it’s not undefined, because I didn’t really care whether it was undefined. It was more that the test wasn’t working when it was undefined; in that case the initial test case should fail.
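A tiny Python sketch of the lat/long case: compare both fields as one tuple, so the test still has a single expectation. The `Location` dataclass and `parse_location` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    lat: float
    long: float


def parse_location(text):
    """Hypothetical unit under test: parses 'lat,long' into a Location."""
    lat, long = (float(part) for part in text.split(","))
    return Location(lat, long)


def test_parse_location():
    loc = parse_location("40.75,-74.0")
    # One expectation on the pair instead of two separate field checks.
    assert (loc.lat, loc.long) == (40.75, -74.0)
```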
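The same false positive has a Python analog: `None == None` is true, just as `undefined === undefined` is in JavaScript. In this sketch, `find_user_name` and the user data are hypothetical; the first test passes even though nothing was found, and the second adds a guard expectation so that case fails loudly instead.

```python
def find_user_name(users, user_id):
    """Hypothetical lookup that returns None when the id is missing."""
    return users.get(user_id, {}).get("name")


def test_same_name_false_positive():
    users = {1: {"name": "Ada"}}
    # Bug in the test: both ids are missing, so this compares None with None
    # and passes even though no name was ever found.
    assert find_user_name(users, 2) == find_user_name(users, 3)


def test_same_name_with_guard():
    users = {1: {"name": "Ada"}, 2: {"name": "Ada"}}
    expected = find_user_name(users, 1)
    # Guard expectation: fail if the value we compare against is missing.
    assert expected is not None
    assert find_user_name(users, 2) == expected
```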

[0:17:45.5] DA: Right, you’re getting a false positive?

[0:17:47.4] CQ: Exactly.

[0:17:48.4] MN: Right, but I think in your example, some purists, I will say, who believe in one expectation per test and one only would probably have written that very same test and then a separate test that just checked whether it was undefined.

[0:18:03.6] CQ: I think that would have been fine as well.

[0:18:05.0] MN: Right.

[0:18:05.4] WJ: I think it comes down to the error messages that you are going to produce, because if it’s undefined and you are calling a method on it, you get an “undefined is not a function” kind of error message, which is my least favorite JavaScript error.

[0:18:19.5] CQ: Unless of course you are comparing undefined to undefined in which case it just says that something is not equal to something else.

[0:18:27.2] MN: “Undefined is not a function” is not my favorite.

[0:18:31.3] DA: Is there any other message? It’s just that one right?

[0:18:35.1] CQ: How about context blocks, how do you leverage those?

[0:18:39.9] WJ: I think you have to have context. In the English sense of the word, there has to be some context for the function to operate inside of. Like if you have a dog class and it behaves differently indoors than outdoors: when it’s indoors, it won’t poop because it’s well-trained, right? Then the contexts in which you test the poop method could be indoors and then outdoors, and then you will find that it will eat in either location.

[0:19:06.1] DA: Yeah, the tricky thing with that, though, is that maybe your dog has an age thing, and at some age it becomes trained not to poop inside, and now you have an extra context in here. Then you could have any number of further contexts, like is the dog sick, does it hate you?

[0:19:26.6] MN: When it hates you.

[0:19:28.9] CQ: So you’re talking about nesting contexts?

[0:19:31.4] DA: Right.

[0:19:32.1] CQ: What is the balance between nesting contexts and not nesting contexts? Because I have seen test suites where you have five nested contexts and you had before blocks in each context and variables that are being defined in each one and soon you didn’t know what the hell was going on in that test.

[0:19:53.2] DA: Right, yeah I guess it’s a question like is it really just one context where you know the dog is old and sick and doesn’t like you or? I like dogs, by the way.

[0:20:06.2] MN: Yeah, it’s true. I am not a fan of dogs. I am scared of them, terrified.

[0:20:10.1] CQ: I like to flatten my contexts but that’s not very dry so I guess that also comes to a balance.

[0:20:18.7] DA: Right.

[0:20:20.0] MN: Yeah, I mean I agree with what William mentioned before. In the English sense, the particular test has to have some kind of context so that when you read it, you can say, “dog class, when indoors, it doesn’t poop on the ground.” And when that test case fails, at least in Mocha, which comes to mind when testing React, it will read that way, and then you know exactly, “Oh, that context has to do with when it’s indoors, okay. Why is the dog pooping everywhere indoors? We need to fix that.” I think it works very well when tests fail; that’s the error message you want to see.

[0:21:02.5] WJ: Yeah, I think that’s a good rule of thumb: what is going to give the developer the best error message in the future? Because that’s when your tests really matter: when they fail and you’re trying to use them to guide you to a problem.

[0:21:15.8] DA: Right, coming back to what I was saying before: if the test makes the code easier to work with, to refactor and extend, then that’s a good test.

[0:21:25.3] MN: So we’ve been talking about unit tests for some time, but what is a functional test, and how does that help out in the realm of the unit test?
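Python has no `describe`/`context` blocks like Mocha or RSpec, but one rough equivalent is a flat test class per context. In this sketch the `Dog` class is hypothetical, and `setup_method` is pytest's per-test setup hook; a failure would be reported under a node ID such as `TestDogIndoors::test_does_not_poop`, which reads much like the sentence above.

```python
class Dog:
    """Hypothetical class from the example: behavior depends on location."""

    def __init__(self, location):
        self.location = location

    def poops(self):
        return self.location == "outdoors"  # well-trained: never indoors

    def eats(self):
        return True  # eats in either location


# Each class is one flat context; there is no nesting to untangle.
class TestDogIndoors:
    def setup_method(self, method=None):
        self.dog = Dog("indoors")

    def test_does_not_poop(self):
        assert not self.dog.poops()

    def test_eats(self):
        assert self.dog.eats()


class TestDogOutdoors:
    def setup_method(self, method=None):
        self.dog = Dog("outdoors")

    def test_poops(self):
        assert self.dog.poops()

    def test_eats(self):
        assert self.dog.eats()
```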

[0:21:34.4] CQ: So a functional test usually tests a specific... it’s not that you’re testing a specific function; you’re testing a specific chunk or slice of functionality. I think that sometimes the term “functional” is used interchangeably with end-to-end tests, which is used interchangeably with smoke tests. I think they can be a little bit different, because usually when someone is talking about a smoke test, they mean that you’re following a specific path through an application and clicking around. If you have a content management system, maybe you’re entering some type of content and going from the point where you create the content to the point where you submit it and it gets accepted, or something like that.

[0:22:26.7] DA: Right, your happy path.

[0:22:28.5] CQ: Yeah, happy path testing. Happy path tests are when you are testing that everything works, basically, not when you try to submit that content and somehow get an error because you can’t talk to the API or whatever.
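At the unit level, the same distinction can be sketched in Python as a happy path test plus a sad path test. `submit_content` and `ApiUnavailable` are hypothetical, and a `unittest.mock.Mock` plays the API; `side_effect` makes the mocked call raise.

```python
from unittest import mock


class ApiUnavailable(Exception):
    """Hypothetical error raised when the backend cannot be reached."""


def submit_content(api, title, body):
    """Hypothetical CMS action: saves content and returns a status message."""
    try:
        api.create(title=title, body=body)
    except ApiUnavailable:
        return "Could not save, please try again"
    return "Published"


def test_submit_happy_path():
    # Happy path: the API accepts the content.
    api = mock.Mock()
    assert submit_content(api, "Hello", "First post") == "Published"
    api.create.assert_called_once_with(title="Hello", body="First post")


def test_submit_when_api_is_down():
    # Sad path: the API call raises, and the user sees a friendly message.
    api = mock.Mock()
    api.create.side_effect = ApiUnavailable()
    assert submit_content(api, "Hello", "First post") == "Could not save, please try again"
```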

[0:22:46.0] WJ: Yeah, I found out that the term smoke test apparently comes from electronics. One way of testing a circuit board is to pass current through it and see if it smokes. If it smokes, then your board is fried, you did it wrong, and you have to start over. Throw that away.

[0:23:00.9] MN: Oh gosh.

[0:23:03.4] DA: Yeah, I have totally done that before. Yep.

[0:23:05.6] CQ: I always thought it was from plumbing, and I think this is from an article I read: to see if a pipe is tight and secure before actually running water through it and having water spread out everywhere, they’ll put smoke through the pipe and see if it comes out the other end. If smoke comes out any other place in the pipe, that signifies that the pipe is broken.

[0:23:32.6] MN: All right so it seems like –

[0:23:34.1] CQ: I might be wrong.

[0:23:36.0] DA: I like both these stories but regardless, where there’s smoke there’s fire.

[0:23:40.6] WJ: That’s what I originally thought it was about, because when I first started hearing the term, we were running tests in production right after a deploy to see if we needed to roll back, and my thinking was, “If those smoke tests fail, it’s because the build is on fire.”

[0:23:58.7] DA: Yeah, it is a very visceral description. But as wonderful as they are, and as tangible as their results are, there’s definitely a balance. Martin Fowler, as always with the testing wisdom and other wisdom, has a good article on the test pyramid, and on how sometimes you want to have more functional tests but you should resist that and stay towards the bottom of the pyramid, which is the unit tests, and, you know, have a little bit of a sandwich with some integration tests in the middle.

[0:24:34.8] CQ: Here’s a question. Could you have functional tests that are not necessarily tests of the UI?

[0:24:40.3] DA: I guess you could if you’re testing, like, a REST API. You could test that as a functional test. I guess it would be like a contract test for the consumers of that.

[0:24:51.6] MN: Interesting. In the smoke tests that William and Charles both brought up, you pass something in through one side and hopefully things happen on the other end. In William’s circuit board example, electricity goes through and, if something is wrong, fries the entire board, and in Charles’ example with the pipes, I guess that’s why you test it from one end to the other, and that’s why it’s a thing.

[0:25:21.6] WJ: End-to-end test.

[0:25:21.5] MN: End-to-end test. There you go, with some smoke. There you go. Just to conclude the conversation on unit tests: they’re very helpful, you should write them, and make sure that you’re writing more unit tests to ensure that the particular unit of code you have is working effectively.

[0:25:41.1] DA: Yeah, write a bunch, and have fewer functional tests.

[0:25:44.8] MN: Cool, so that wraps up the episode. I’d like to thank the cohost, thanks Dave.

[0:25:51.4] DA: Thanks man.

[0:25:52.5] MN: Our producer, William.

[0:25:54.8] WJ: Anytime.

[0:25:55.7] MN: And our guest, Charles, thank you.

[0:25:59.1] CQ: Thank you.

[0:25:59.6] MN: Awesome. Feel free to hit us up, twitter.com/radiofreerabbit. This is The Rabbit Hole. We’ll see you next time.

Links and Resources:

The Rabbit Hole on Twitter

Unit Test

Martin Fowler