- TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.
- When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
- What triggers writing a test is what matters. Overzealous testers write a test for each new public method in a class. This leads to testing the implementation details of the actual unit, because most classes are only consumed within a single unit.
- Behavior-driven testing makes the most sense for deciding what needs tests. If it's required behavior for the people across the boundary, it needs a test. Otherwise, tests may be extraneous or even harmful.
- As such, a good trigger rule is "one test per desired external behavior of the unit, plus one test per bug fixed". The test for each bugfix comes from experience -- they delineate tricky parts of your unit and enforce working code around them.
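The "one test per bug fixed" half of that rule can be sketched like so; the function and the bug are invented for illustration:

```python
def parse_version(s):
    """Hypothetical unit; a leading 'v' once crashed it (invented bug)."""
    s = s.lstrip("v")                      # the fix for that bug
    major, minor = s.split(".")[:2]
    return int(major), int(minor)

# One test per desired external behavior:
assert parse_version("1.2") == (1, 2)

# Plus one test per bug fixed -- it pins down the input that once broke:
assert parse_version("v1.2") == (1, 2)
```

The regression test documents exactly which tricky input the unit must keep handling, without saying anything about how.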
If I recall correctly, another very important point he makes in that talk is that it's fine to delete tests. TDD tends to result in a lot of tests being created as a sort of scaffolding to support initial development. The thing about scaffolding is, when you're done using it, you tear it down.
I don't think he mentions it during the talk, but the next step after deleting all those tests is a little bit more refactoring for maintainability. Now that you've deleted all the redundant tests, you can then guard against future developers unwittingly becoming tightly coupled to your implementation details, by taking all the members that used to only be exposed for testing purposes, and either deleting them or making them private.
> you should delete tests for everything that isn't a required external behavior
Wait, I'm terribly confused here.
Aren't a huge part of tests to prevent regression?
In attempting to fix a bug, that could cause another "internal" test to fail and expose a flaw in your bugfix that you wouldn't have caught otherwise. And it's not uncommon for your flawed bugfix to not cause an "external" test to fail, because it's related to the codepath there was never a good enough external test for in the first place -- hence why the bug existed.
I can't imagine why you would ever delete tests prematurely. I mean, running internal tests is cheap. I see zero benefit at real cost.
And not only that, when devs don't document the internal operation of a module sufficiently, keeping the tests around serves as at least a kind of minimal reference of how things work internally, to help with a future dev trying to figure it out.
If you're refactoring an implementation, then obviously at that point you'll delete the tests that no longer apply, and replace them with new ones to test your refactored code. But why would you delete tests prematurely? What's the benefit?
> In attempting to fix a bug, that could cause another "internal" test to fail and expose a flaw in your bugfix that you wouldn't have caught otherwise.
If an external test passes and an internal test fails, the external test isn't really adding any value, is it? And if the root of your issue is "What if test A doesn't test the right things", doesn't the whole conversation fall apart (because then you have to assume that about every test)?
IME this is a common path most shops take. "We have to write tests in case our other tests don't work." Which is a pretty bloaty and wildly inefficient strategy to "Our tests sometimes don't catch bugs." Write good tests, manage and update them often. Don't write more tests to accommodate other tests being written poorly.
> I mean, running internal tests is cheap.
Depends on your definition of cheap, I guess.
My last job was a gigantic Rails app, over a decade old. There were so many tests that running the entire suite took ~3 hours. That long a gap between "pushed code" and "see if it builds" creates a tremendous number of problems. Context switching is cost. Starting and stopping work is cost.
I'm much more of the "Just Enough Testing" mindset. Test things that are mission critical and complex enough to warrant tests. Go big on system tests, go small on unit tests. If you can, have a different eng write tests than the eng that wrote the functionality. Throw away tests frequently.
I understand what you're saying, but in my experience that's not very robust.
I've often found that an internal function might have a parameter that goes unused in any of the external tests, simply because it's too difficult to devise external tests that will cover every possible internal state or code path or race condition.
So the internal tests are used to ensure complete code coverage, while external tests are used to ensure all "main use cases" or "representative usage" work, and known frequent edge cases.
That doesn't mean the external tests aren't adding value -- they are. But sometimes it's just too difficult to set up an external test to guarantee that a deep-down race condition gets triggered in a certain way, but you can test that explicitly internally.
It's not that anyone is writing tests poorly, it's just that it simply isn't practically feasible to design external tests that cover every possible edge case of internal functionality, while internal tests can capture much of that.
And if your test suite takes 3 hours to run, there are many types of organizational solutions for that... but this is the first I've ever heard of "write fewer tests" being one of them.
> I've often found that an internal function might have a parameter that goes unused in any of the external tests,
It seems that you're still thinking about "code". What if you thought about "functionality"? If an external test doesn't test internal functionality, what is it testing?
> But sometimes it's just too difficult to set up an external test to guarantee that a deep-down race condition gets triggered in a certain way, but you can test that explicitly internally.
I would argue that if you're choosing an orders of magnitude worse testing strategy because it's easier, your intent is not to actually test the validity of your system.
> while internal tests can capture much of that.
We can agree to disagree.
> And if your test suite takes 3 hours to run, there are many types of organizational solutions for that... but this is the first I've ever heard of "write fewer tests" being one of them.
I was speaking about a real scenario that features a lot of the topics that you're describing. My point was not that it was good, my point was that testing dogmatism is very real and has very real costs. To describe writing/running lots of (usually unnecessary) tests as "cheap" is a big red flag.
Not the poster you replied to, but I've been thinking of it lately in a different way. Functional tests show that a system works, but if a functional test fails, the unit test might show where/why.
Yes, you'll usually get a stack trace when a test fails, but you might still spend a lot of time tracing exactly where the logical problem actually was. If you have unit tests as well, you can see that unit X failed, which is part of function A. Therefore you can fix the problem quicker, at least for some set of cases.
Internal code A has 5 states, piece B has 8 states.
Testing them individually requires 13 tests.
Testing them from the outside requires 5x8=40 tests.
Now, if you think of it that way, maybe you _do_ want to test the combinations, because that might be a source of bugs. And if you do it well, you don't actually need to write 40 tests; you can have some mechanism to loop through them.
But the basic argument is that the complexity of the 40 test cases is actually _more_ than the 13 needed to test the internal parts as units.
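The arithmetic above can be sketched concretely; both "pieces" here are invented stand-ins:

```python
MODES = ["a", "b", "c", "d", "e"]   # the 5 states of piece A (invented)
LEVELS = list(range(8))             # the 8 states of piece B (invented)

def normalize(mode):
    """Piece A: map a mode name to an index."""
    return MODES.index(mode)

def encode(level):
    """Piece B: map a level to a bit flag."""
    return 1 << level

# Unit-level: 5 + 8 = 13 cases cover each piece completely.
for i, m in enumerate(MODES):
    assert normalize(m) == i        # 5 cases
for lvl in LEVELS:
    assert encode(lvl) == 2 ** lvl  # 8 cases

# Black-box: exercising every pairing of states needs 5 * 8 = 40 cases.
combos = [(m, lvl) for m in MODES for lvl in LEVELS]
assert len(combos) == 40
```

Whether the 40 pairings are worth enumerating depends on whether the composition itself can fail; looping over `combos` keeps that cheap if you decide it can.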
FWIW, my own philosophy is to write as much pure-functional, side-effect-free code that doesn't care about your business logic as possible, and have good coverage for those units. Then compose them into systems that do deal with the messy internal state and business-logic if-statements that tend to clutter real systems, and ensure you have enough testing to cover all branching statements, but do so from a perspective external to the system.
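A minimal sketch of that split, with invented names: the pure core gets cheap, exhaustive coverage, while the shell is left for outside-in tests:

```python
def apply_discount(price_cents, loyalty_years):
    """Pure, side-effect-free business rule: trivially unit-testable. (Invented.)"""
    if loyalty_years >= 5:
        return price_cents * 90 // 100
    if loyalty_years >= 1:
        return price_cents * 95 // 100
    return price_cents

# Covering every branch of the pure core takes three lines:
assert apply_discount(1000, 0) == 1000
assert apply_discount(1000, 1) == 950
assert apply_discount(1000, 5) == 900

def checkout(db, customer_id, price_cents):
    """Imperative shell: state and I/O live here; test it from the outside."""
    total = apply_discount(price_cents, db.loyalty_years(customer_id))
    db.record_sale(customer_id, total)
    return total
```

The shell stays thin enough that a handful of system-level tests over `checkout` cover the wiring, while the branching lives in the easily tested core.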
I've got the impression that you are both talking slightly past each other.
At least my impression is that these "internal tests" you talk about are valid unit tests -- but not for the same unit. We build much of our logic out of building blocks, which we also want to be properly tested, but that doesn't mean we have to re-test them on the higher level of abstraction of a piece of code that composes them.
From that thought, it's maybe even a useful "design smell" you could watch out for if you encounter this scenario (in that you could maybe separate your building blocks more cleanly if you find yourself writing a lot of "internal" tests)?
Isn't the idea of unit testing forgotten here? The point is to validate the blocks you build and use to build the program. To make sure you've done each block right, you test them, manually or automated... and automated testing is just generally soooo much easier. If you work like that, and don't only add tests after you've written large chunks of code, you'll have constructed your program so that there's no overhead in the tests. An advanced test that does lots of setup and complicated calculations generally isn't the test's fault; it's the code itself that requires that complexity to be tested.
Wanna underline here that system tests are slow, unit tests are fast.
This said, I agree that you should throw away tests in a similar fashion as you do code. When a test does not make sense, don't be afraid to throw it out, but keep enough left to define the function of the code, in a documenting way. Let the code/tests speak! :D
Imo the value of unit tests is partially as a record for others to see "hey look, this thing has a lot of its bases covered".
Especially if you're building a component that is intended to be reused all over the place, would anyone have confidence in reusing it if it wasn't at least tested in isolation?
If the test suite took hours, couldn't part of the problem be that a lot of those tests should have been more focused unit tests? With small unit tests and mocking, you could run millions of tests in 3 hours.
There were all kinds of problems with the test suite that could've been optimized. The problem was that there were too many to manage, and that deleting them was culturally unacceptable.
Lots of them made real DB requests. It's hard to get a product owner to justify having devs spend several months fixing tests that haven't been modified in 9 years.
If it can cause a regression, it's not internal. My rule of thumb is "test for regression directly", meaning a good test is one that only breaks if there's a real regression. I should only ever be changing my unit tests if the expected behavior of the unit changes, and in proportion to those changes.
A well-known case is the Timsort bug, discovered by a program verification tool. Also well known is the JDK binary search bug that had been present for many years. (This paper discusses the Timsort bug, and references the binary search bug: http://envisage-project.eu/proving-android-java-and-python-s...)
In both cases, you have an extremely simple API, and a test that depends on detailed knowledge of the implementation, revealing an underlying bug. Obviously, these test cases, when coded, reveal a regression. Equally obviously, the test cases do test internals. You would have no reason to come up with these test cases without an incredibly deep understanding of the implementations. And these tests would not be useful in testing other implementations of the same interfaces, (well, the binary search bug test case might be).
In general, I do not believe that you can do a good job of testing an interface without a good understanding of the implementation being tested. You don't know what corner cases to probe.
Using implementation to guide your test generation ("I think my code might fail on long strings") is fine, even expected. Testing private implementation details ("if I give it this string, does the internal state machine go through seventeen steps?") is completely different.
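To illustrate the distinction with an invented example: the choice of input can be implementation-guided while the assertion stays on observable output:

```python
def truncate(s, limit=16):
    """Hypothetical unit under test."""
    return s if len(s) <= limit else s[:limit - 1] + "…"

# Implementation-guided but still external: suspecting that long strings
# are risky picks the *inputs*, yet the assertions stay on observable output.
assert truncate("short") == "short"
assert len(truncate("x" * 1000)) == 16

# The anti-pattern would instead assert on private details -- e.g. that an
# internal state machine took seventeen steps -- coupling the test to the
# implementation rather than to the behavior.
```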
That's not what he's saying. He's saying the test should measure an externally visible detail. In this case that would be "is the list sorted". This way the test will still pass without maintenance if the sorting algorithm is switched again in the future. You can still consider the implementation to create antagonistic test cases.
One of my colleagues helped find the Timsort bug and recently another such bug (might be the Java binary search, don't remember).
The edge case to show a straightforward version of that recent bug basically required a supercomputer. The artifact evaluation committee complained even.
So you can try to test for that only based on output. But it's gigantically more efficient to test with knowledge of internals.
This sounds like a case where no amount of unit testing would ever have found the bug. Someone found the bug either by reasoning about the implementation or by using formal methods, and then wrote a test to demonstrate it. You could spend your entire life writing unit tests for this function and chances are you would never find the issue. I'd say this is more of an argument for formal methods than for any particular approach to testing.
One doesn't need detailed knowledge of the implementation: if a given initial state produces invalid output, we can write a test for that. Though yes, having knowledge of the implementation lets you find the state that produces the invalid result.
Fair enough. And how do you know, before causing a regression, whether your test could detect one? In other words, how can you tell beforehand whether your test checks something internal or external?
"External" functionality will be behavior visible to other code units or to users. If you have a sorting function, the sorted list is external. The sorting algorithm is internal. Regression tests are often used in the context of enhancements and refactorings. You want to test that the rest of the program still behaves correctly. Knowing what behavior to test is specific to the domain and to the technologies used. You can ask yourself, "how do I know that this thing actually works?"
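As a sketch of that sorting example (the insertion sort here is an arbitrary stand-in), the tests pin down only the external contract, so swapping the algorithm later costs no test maintenance:

```python
def my_sort(xs):
    """Stand-in implementation (insertion sort); the algorithm is internal."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

# External contract checks: these pass for *any* correct algorithm,
# whether insertion sort today or Timsort tomorrow.
for case in ([3, 1, 2, 2, -5], [], [1], list(range(10, 0, -1))):
    assert my_sort(case) == sorted(case)
```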
Isn’t the point that internal functions often have a much smaller state space than external functions, so it’s often easier to be sure that the edge cases of the internal functions are covered than that the edge cases of the external function are covered?
So, having detailed tests of internal functions will generally improve the chances that your test will catch a regression.
> Isn’t the point that internal functions often have a much smaller state space than external functions
That's the general theory, and why people recommend unit tests instead of only the broader possible integration tests. But things are not that simple.
Interfaces do not only add data, they add constraints too. And constraints reduce your state space. You want to cut your software at the smallest interface complexity you can find and test those pieces -- they are what people originally called "units". You don't want to test any high-complexity interface; those tests will harm development and almost never give you any useful information.
It's not even rare that your units are composed of vertical cuts through your software, so you'll end up with only integration tests.
The good news is that this kind of partition is also optimal for understanding and writing code, so people have been practicing it for ages.
I agree that they would help in the regression testing process, especially in diagnosing the cause. However, I think those are usually just called "unit" tests, not "regression" tests. For instance, the internal implementation of a feature might change, requiring a new, internal unit test. The regression test would be used to compare the output of the new implementation of the feature versus the old implementation of the feature.
Worth noting that performance is an externally visible feature. You shouldn't be testing for little performance variations, but you probably should check for pathological cases (e.g. takes a full minute to sort this particular list of only 1000 elements).
For features, I need to take the time to think of required behavior. If I just focus on the implementation, the tests add no documentation and I'm not forced through the exercise of thinking about what matters.
> Aren't a huge part of tests to prevent regression?
Just a quibble: I would argue that a huge benefit of tests is preventing regression, but that's a very small part of the value of tests.
The main value I get out of tests is informing the design of the software under test.
* Tests are your straight-edge as you try to draw a line.
* They're your checklist to make sure you've implemented all the functionality you want.
* They're your double-entry bookkeeping to surface trivial mistakes.
But I think I mostly agree with your point. I delete tests that should no longer pass (because some business logic or implementation details are intentionally changing). I will also delete tests that I made along the way when they're duplicating part of a better test. If a test was extremely expensive to run, I suppose I might delete it. But in that case I would look for a way to cover the same logic in tests of smaller units.
All legitimate tests are[0] regression tests. TDD, to the extent that it's actually useful, is the notion that sometimes the bug being regression-tested is a feature request.
Edit: 0: I guess "can be viewed as" if you want to be pedantic.
> Aren't a huge part of tests to prevent regression?
Depends on the kind of tests. Old school "purist" unit tests are meant to help you verify the correctness of the code as you're writing it. Preventing regressions is better left to integration tests and E2E tests, or smoke tests. Alternatively to "unit tests" if your definition of "unit" is big enough (in which case it only works within the unit).
It's totally fine and common to write unit tests that aren't meant to catch bugs across significant refactors. If you do it right, they should be so easy to author that throwing them away shouldn't matter.
Integration, E2E, and smoke tests are generally slow, flaky, and hard to write. They should not cover/duplicate all the cases your unit tests cover.
They are good at letting you know all your units are wired up and functioning together. In all the codebases I've ever worked in, I would feel way more comfortable deleting them vs deleting the unit tests.
Why would you want to, when the same coverage as unit tests will run in under a minute, be smaller and easier to understand/change, and can all be run on your laptop?
It all depends on your definition of unit/integration; what I'm talking about as unit tests you may very well be calling integration tests...
One of the main points I was making is that you shouldn't have significant duplication in test coverage, and if you do, I'd much rather stick with the unit tests and delete the others.
> Unit tests are generally much harder to understand and need to be changed much more frequently.
Changed more frequently, yes.
Harder to understand is usually because they're not-quite-unit-tests-claiming-to-be.
E.g.: a test for a function that mocks some of its dependencies, but also does shenanigans to deal with some global state without isolating it. So you get a test that only tests the unit (if that), but has a ton of exotic techniques to deal with the globals. Worst of all worlds.
Proper unit tests are usually just a few lines long, with little to no abstraction, and they test code you can see in the associated file, without dealing with code you'd have to dig deeper to find.
If you can refactor (make a commit changing only implementation code, not touching any test code) and the tests still pass then you’re probably fine.
If you’re changing tests as you change the code you’re not refactoring. You have zero confidence that your changed behaviour and changed test didn’t introduce an unintended behaviour or regression.
If you can refactor without touching your tests and your tests still compile afterwards, either the refactor was extremely trivial and didn't change any interfaces, or you only had end-to-end tests.
I think the point is that if you have to change a test to make it pass or run after refactoring, it is not useful as a regression test. By changing it you might have broken the test itself so you have less confidence.
There is also the question of what a unit is. If you test (for example) the public interface of a class as a black box unit, you can refactor your class internals as much as you want and your tests don't need to change. You have high confidence you've done it correctly. At this point adding more fine-grained tests inside the class seems like more of a compliance activity than one that actually increases confidence, since you probably would've had to change a bunch of them to make them work again anyway.
Personally the way I'd phrase it is you need to refactor your tests just like you'd refactor the app code, but even looking at doing that independent of any app code refactoring.
Agreed. I would take an even stronger position, and say that a high degree of mocking actually implies two things: First, yes, you're testing at too fine-grained a level. Second, it's a code smell that suggests you may be working with a fundamentally untestable design that relies overmuch on opaque, stateful behavior.
Mocks are worthwhile, though. Otherwise you end up not being able to unit test anything that accesses an external API, such as databases, REST services, etc.
IMO, the database is often an integral part of the program and should be part of the test (a real database in a Docker image).
For instance, if you are not relying on unique constraint in the DB to implement idempotency you are probably doing something wrong, and if you are not testing idempotent behaviour you are probably doing something wrong.
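A minimal sketch of that idempotency idea, using in-memory SQLite as a stand-in for the dockerized database (table and names invented):

```python
import sqlite3

# A real (in-process) database enforces the UNIQUE constraint the code
# relies on -- something a mock would merely pretend to do.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (request_id TEXT UNIQUE, amount INT)")

def record_payment(request_id, amount):
    """Idempotent insert: a retry with the same request_id is a no-op."""
    try:
        db.execute("INSERT INTO payments VALUES (?, ?)", (request_id, amount))
    except sqlite3.IntegrityError:
        pass  # duplicate delivery -- the constraint makes the retry safe

record_payment("req-1", 500)
record_payment("req-1", 500)   # retried; must not double-charge
rows = db.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
assert rows == 1
```

The test exercises the actual constraint behavior, so it would fail loudly if someone removed the UNIQUE index during a refactor.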
It really depends on your definition of unit. In the London school of TDD, no, a unit cannot extend across an I/O boundary. The classicist school takes a more flexible, pragmatic approach.
You mean fakes/stubs, right? Unless you're testing whether you're correctly implementing the protocol exchange with an external party, you don't need to record the API calls.
How do you test the tests that are testing your mocks? That said verifying mocks are a great help - they won't let you mock methods that don't exist on the real object.
Some mocking libraries, like the VCR library in ruby, can be turned off every now and then so you tests hit real endpoints. It is worth doing from time to time.
Bertrand Meyer had the right of it, but I had to figure this out myself before I ever saw him quoted on the subject.
Me:
Code that makes decisions has branches. Branches require combinatoric tests.
Code with external actions requires mocks.
Therefore:
Code that makes decisions and calls external systems requires combinatorics for mocks.
Bertrand, more (too?) concisely:
Separate code that makes decisions from code that acts on them.
Follow this pattern to its logical conclusions, and most of your mocks become fixtures instead. You are passing in a blob of text as an argument instead of mocking the code that reads it from the file system. You are looking at a request body instead of mocking the PUT function in the HTTP library.
The tests of external systems are much fewer, and tend to be testing the plumbing and transportation of data. If I give you a response body do you actually propagate it to the http library? And even here, spies and stubs are simpler than full mocks.
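A sketch of the fixtures-over-mocks idea, with invented field names: because the decision-making code takes a plain string, the test hands it a literal blob instead of mocking an HTTP or filesystem layer:

```python
import json

def parse_order(body):
    """Decision-making code: a pure function of the response body."""
    data = json.loads(body)
    return data["id"], data["qty"] * data["unit_price"]

# A fixture -- a literal blob -- replaces the mocked HTTP/file layer:
FIXTURE = '{"id": "o-1", "qty": 3, "unit_price": 250}'
assert parse_order(FIXTURE) == ("o-1", 750)
```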
I used this strategy when developing a client library for a web socket API. It was hugely helpful. I could just include the string of the response in my tests, instead of needing a live server or even a mock server for testing. Tests were much simpler to write and faster to execute.
One would argue that you should change your string fixtures to match and verify that the new API response doesn't break anything with your existing API client. Then you change the API client and verify that all the old tests still work as expected.
Better yet is if you keep the old fixtures and the new fixtures and ensure that your API client doesn't suddenly throw errors if the API server downgrades to before the new field was added.
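A sketch of keeping both fixtures side by side (field names invented): the client must tolerate a server with or without the new field:

```python
import json

OLD = '{"name": "alice"}'                      # pre-upgrade response
NEW = '{"name": "alice", "nickname": "al"}'    # post-upgrade response

def parse_user(body):
    data = json.loads(body)
    return data["name"], data.get("nickname")  # tolerate the missing field

# Keeping both fixtures guards both the upgrade and the downgrade path:
assert parse_user(OLD) == ("alice", None)
assert parse_user(NEW) == ("alice", "al")
```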
Yes, you should delete tests for everything that isn't a required external behavior, or a bugfix IMO.
For the edification of junior programmers who may end up reading this thread, I’m just going to come right out and say it: this is awful advice in general.
For situations where this appears to be good advice, it’s almost certainly indicative of poor testing infrastructure or poorly written tests. For instance, consider the following context from the parent comment:
> Otherwise you're implicitly testing the implementation, which makes refactoring impossible.

> A big smell here is if the large majority of your tests are mocked. This might mean you're testing at too fine-grained a level.
These two points are in conflict and help clarify why someone might just give up and delete their tests.
The argument for deleting tests appears to be that changing a unit’s implementation will cause you to have to rewrite a bunch of old unrelated tests anyway, making refactoring “impossible.” But indeed that’s (almost) the whole point of mocking! Mocking is one tool used for writing tests that do not vary with unrelated implementations and thus pose no problem when it comes time to refactor.
Now there is a kernel of truth about an inordinate amount of mocking being a code smell, but it’s not about unit tests that are too fine-grained but rather unit tests that aren’t fine-grained enough (trying to test across units) or just a badly designed API. I usually find that if testing my code is annoying, I should revisit how I’ve designed it.
Testing is a surprisingly subtle topic and it takes some time to develop good taste and intuition about how much mocking/stubbing is natural and how much is actually a code smell.
In conclusion, as je42 said below:
> Make sure your tests run (very) fast and are stable. Then there is little cost to pay to keep them around.
The key, of course, is learning how to do that. :)
Did you ever actually refactor code with a significant test suite written under heavy mocking?
The mocking assumptions generally end up re-creating the behavior creating the ossification. Lots of tests simply mock 3 systems to test that the method calls the 3 mocked systems with the proper API -- in effect testing nothing, while baking in lower level assumptions into tests for people refactoring what actually matters.
You might personally be a wizard at designing code to be beautifully mocked, but I've come across a lot of it and most has a higher cost (in hampering refactoring, reducing readability) than benefit.
> Did you ever actually refactor code with a significant test suite written under heavy mocking?
I have. The assumptions you make in your code are there whether you test them or not. Better to make them explicit. This is why TDD can be useful as a design tool. Bad designs are incredibly annoying to test. :)
For example if you have to mock 3 other things every time you test a unit, it may be a good sign that you should reconsider your design not delete all your tests.
It sounds like your argument is “software that was designed to be testable is easy to test and refactor”.
I think a lot of the gripes in the thread are coming from folks who are in the situation where it’s too late to (practically) add that feature to the codebase.
You seem to think the rationale is testing performance; but from GP it seems that the rationale is avoiding the tests ossifying implementation details against refactoring rather than protecting external behavior to support refactoring.
> Mocking is one tool used for writing tests that do not vary with unrelated implementations
What if I chose the wrong abstractions (coupling things that shouldn't be coupled and splitting things in the wrong places) and have to refactor the implementation to use different interfaces and different parts?
All the tests will be testing the old parts using the old interfaces and will all break.
The issue that takes experience here is how to determine what's a unit. "The whole program" is obviously too big. "every public method or function" is obviously too small.
Even if your code never graduates to being used by multiple teams in your project or on others, “You” can turn into “you and your mentee” anyway, if you’re playing your cards right.
Every feature of the lexer should be testable through test cases written in the syntax of the language. That includes handling of bad lexical syntax also. For instance, a malformed floating-point constant or a string literal that is not closed are testable without having to treat the lexer as a unit. It should be easy to come up with valid syntax that exercises every possible token kind, in all of its varieties.
For any token kind, it should be easy to come up with a minimal piece of syntax which includes that token.
If there is a lexical analysis case (whether a successful token extraction or an error) that is somehow not testable through the parser, then that is dead code.
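A toy sketch of that claim (the grammar is invented): every lexer behavior, including errors, is reachable through the public entry point with a minimal piece of source text:

```python
import re

TOKEN = re.compile(r"\s*(\d+|[+()])")

def tokenize(src):
    """The lexer: an implementation detail, never tested directly here."""
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out

def evaluate(src):
    """The public surface: every lexer feature is reachable from here."""
    return sum(int(t) for t in tokenize(src) if t.isdigit())

# Each lexer behavior probed via minimal source text, valid or not:
assert evaluate("1 + 2 + 30") == 33
try:
    evaluate("1 + $")   # malformed token: the error surfaces externally
    raise AssertionError("expected SyntaxError")
except SyntaxError:
    pass
```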
The division of the processing of a language into "parser" and "lexer" is arbitrary; it's an implementation detail which has to do with the fact that lexing requires lookahead and backtracking over multiple characters (and that is easily done with buffering techniques), whereas the simplest and fastest parsing algorithms like LALR(1) have only one symbol of lookahead.
Parsers and lexers sometimes end up integrated, in that the lexer may not know what to do without information from the parser. For instance a lex-generated lexer can have states in the form of start conditions. The parser may trigger these. That means that to get into certain states of the lexer, either the parser is required, or you need a mock up of that situation: some test-only method that gets into that state.
Basically, treating the lexer part of a lexer/parser combo as public interface is rarely going to be a good idea.
> For any token kind, it should be easy to come up with a minimal piece of syntax which includes that token.
There is the problem: any failure in the lexer now surfaces only through the parser. The test is too far from the point of failure. I'll now spend my time trying to understand a problem that would have been obvious if the lexer were tested directly.
>Basically, treating the lexer part of a lexer/parser combo as public interface is rarely going to be a good idea.
This is part of the original point, the parser is the public interface which is why the OP was suggesting it should be the only contact point for the tests.
Lexer/Parsers are one of the few software engineering tasks I do routinely where it's self evident that TDD is useful and the tests will remain useful afterwards.
Indeed! I recall a lexer and parser built via TDD with a test suite that specified every detail of a DSL. A few years later, both were rewritten completely from scratch while all the tests stayed the same. When we got to passing all tests, it was working exactly as before, only much more efficiently.
From that experience, I would say that in some contexts, tests shouldn't be removed unless what it's testing is no longer being used.
If you have a good answer to that, then the lexer is separate (as others said). If you don't, then write parser tests for the lexer so that you can more easily refactor the interface between them.
There is no one right answer, only trade-offs. You need to make the right decision for you. (Though I will note that there is probably a good reason parsing and lexing are generally separated, and that probably means the best trade-off for you is to keep them separate. But if you decide differently, you are not necessarily wrong.)
I’ve watched this play out a few times with different teams and different code bases (eg, one team two projects).
Part of the reason existing tests lock in behavior and prevent rework/new features is that the tests are too complicated. Complicated tests were expensive to write. Expense leads to sunk cost fallacy.
I’ve watched a bunch of people pair up for a day and a half trying to rescue a bunch of big ugly tests that they could have rewritten solo and in hours if they understood them, learn nothing, and do the same thing a month later. The same people had no problem deleting simple tests and replacing them with new ones when the requirements changed.
Conclusions:
- the long term consequences of ignoring the advice of writing tests with one action and one assertion are outsized and underreported.
- change your code so it doesn't need elaborate mocks
- choose a test framework that supports setup methods
- choose a framework that supports custom/third party assertions, sometimes called matchers. You won’t use this often, but when you do, you really do.
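As a sketch of those guidelines (the cart and discount names are made up): a setup method holds the shared fixture, so each test below stays at one action plus one assertion:

```python
import unittest

class TestCartTotal(unittest.TestCase):
    def setUp(self):
        # Shared fixture lives here, so the tests themselves stay
        # at one action and one assertion each.
        self.cart = [("widget", 100), ("gadget", 50)]

    @staticmethod
    def total(cart, discount=0):
        # Hypothetical stand-in for the code under test.
        return sum(price for _, price in cart) * (100 - discount) // 100

    def test_total_without_discount(self):
        self.assertEqual(self.total(self.cart), 150)

    def test_total_with_discount(self):
        self.assertEqual(self.total(self.cart, discount=10), 135)

# Run quietly so the sketch is self-checking when executed directly.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCartTotal)
assert unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

A test like either of these is cheap enough to delete and rewrite when requirements change, which is exactly the point.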
> Otherwise you're implicitly testing the implementation, which makes refactoring impossible.
Red-green refactoring isn't, and shouldn't be, a goal of unit testing. Integration and E2E tests provide that. Unit tests are mostly about making sure the individual pieces work as you author them, as well as implicitly documenting the intent of those individual pieces.
If done properly, they're always quick/easy/cheap to author, and thus are throwaway. When you refactor significantly (more than the unit), you just throw them away and write new ones (at which point their only goal is for you to understand the intent of the code you were shuffling around, and making sure you're breaking what you expected to break). Delete, rewrite.
People are resistant to getting rid of unit tests when they did complex integration tests that took forever to write instead. So the tests feel like they were wasted effort. Those tests are totally valuable, in this case for things such as red green refactoring, but then yes, you have to carefully pick and choose what you're testing to avoid churn.
I would also test implementation details that are legitimately complicated and might fail in subtle ways, or where the intended behavior isn't obvious.
If I've implemented my own B+ tree, for example, you better bet your butt I'll be keeping some property tests to document and verify that it conforms to all the necessary invariants.
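A lightweight version of such a property test, sketched here with plain `random` rather than a real property-testing framework, and with a trivial `SortedSet` standing in for the B+ tree (a real one would also assert node occupancy, in-node key ordering, and leaf links):

```python
import random

class SortedSet:
    # Trivial stand-in for the B+ tree under test.
    def __init__(self):
        self._keys = set()
    def insert(self, key):
        self._keys.add(key)
    def items(self):
        return sorted(self._keys)

def check_sorted_invariant(make_container, trials=100):
    # Property: iterating the container yields the distinct inserted
    # keys in sorted order, for randomly generated workloads.
    for _ in range(trials):
        keys = [random.randrange(1000) for _ in range(random.randrange(50))]
        container = make_container()
        for key in keys:
            container.insert(key)
        assert list(container.items()) == sorted(set(keys)), keys
    return trials

check_sorted_invariant(SortedSet)
```

Tests like this survive a from-scratch rewrite of the data structure untouched, because they only speak about invariants, not internals.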
Tests took work to produce and provide some sort of information.
It seems foolhardy to start off a process by throwing away information which could inform it.
Not having tests which cover the implementation makes refactoring impossible if the goal of refactoring is to preserve certain salient aspects of the implementation, rather than uproot it entirely.
Why not just start refactoring first? Then see what breaks, and decide on a case-by-case basis who wins: do you keep the refactoring that broke the test and delete the test, or do you back out that aspect of the refactoring?
When one does this, one hopefully also gets a feel for which tests will be useful and which will be thrown out early, and starts writing more of the first kind.
Couple this with a well designed language and a good IDE that can do the trivial refactorings (method rename - including catching overload collisions etc) and it becomes easy to maintain tests.
You do not want to delete tests that provide insight into how your unit works internally.
If you were to delete these and then hit a regression, you need more time to analyse the faulty external behaviour and draw conclusions about how the inner parts produce it.
If you didn't write the code yourself, you might end up in a situation where you are never able to fully fix the issue within a reasonable time.
Side-note: I have seen these problems multiple times in production, where missing tests resulted in a large and expensive engineering effort to figure out the inner-mechanics of a particular piece of code.
Make sure your tests run (very) fast and are stable. Then there is little cost to keeping them around.
I agree with deleting tests. But when raising this with any team I've ever worked on, I might as well have said I was going to go drop the prod database. Deleting tests, in my experience, comes with a massive stigma that I am not sure how to surmount.
If there was a bug, this bug should be replicated in the test. You then solve the bug, make sure the test (and others pass), and you'll be (relatively) sure the bug will not be reintroduced with a later change.
Every bug you find is an "edge-case" you didn't anticipate. Leave it in the test for the future. I find that the "table test" approach of Go works surprisingly well with this. You just add a case to the table, and often that's all you have to do.
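A Python sketch of that table-test pattern (the `slugify` function and the bug descriptions are invented): each fixed bug becomes one more row in the table, and adding the row is usually the whole test change:

```python
def slugify(title):
    # Hypothetical function under test.
    return "-".join(title.lower().split())

CASES = [
    ("Hello World", "hello-world"),
    # regression row for a (made-up) past bug: repeated spaces used
    # to produce empty segments like "hello--world"
    ("Hello   World", "hello-world"),
    # regression row: leading/trailing whitespace leaked into the slug
    ("  Hello World  ", "hello-world"),
]

for title, expected in CASES:
    assert slugify(title) == expected, (title, expected)
```

The table keeps the edge cases visible in one place, which doubles as documentation of what has already gone wrong once.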
Well-meaning, smart people wrote some good, general guidelines. Somehow it got turned into an industry of people who want money to tell me how I'm living my life wrong.
Agile can be summed up as making little releases and showing them to the client, so that the developers can quickly decide what to develop next.
Everything else naturally derives from this. All the other practices are just ways, which you can adopt or not, to accomplish that. You don't need to hire a consultant to explain, with metaphysical justifications, why you're running your story-point poker meeting wrong.
This is an excellent point. I once had to deal with an external contract firm on a project that I was hired to fix. We had issues of production code breaking so badly that it brought down the entire server (N+1 query issue that triggered 50k queries).
The tests passed. When I emergency patched the issue and deployed it to production, the contract firm got mad at me for breaking their tests...to fix a production emergency.
It’s put me on guard against militant test ideologues with no concept of real priorities ever since.
We generally wouldn't allow that either. We've run into cases where emergency fixes cause even more damage (e.g., the system is up, but now it's processing payments wrong), so you have to prove beyond a shadow of a doubt that the test failures are irrelevant or less bad than the current incident.
Often times it's less effort/more expedient to make the change pass tests (or update the tests) than convince all of the stakeholders that what you're about to do is safe, but the break-glass is there if needed.
Maybe you'd call this a militant test ideology, but I think it's perfectly reasonable. Systems are complex, and people can get tunnel vision during a bad outage.
The way the tests were written in that case, they were hard-coded to how the work was being done rather than to the result produced. Both the code and the tests were bad.
Fair point. In this particular case the primary developer for them wanted to “publicly shame” me in Slack. Seemed much more ideology driven at the time.
> The tests passed. When I emergency patched the issue and deployed it to production, the contract firm got mad at me for breaking their tests...to fix a production emergency.
I get the point, but it depends on which tests failed.
Tests for unreleased features and trivial UX stuff are not the same as a test making sure not every customer gets a 50% discount.
I think most disenchantment with TDD comes from the second point that you notice. If one attempts to test every method of every class one ends up testing implementation details that are very much subject to change. Also, one can easily end up testing trivial points like 'is the processor still capable of adding two integers'. As you note in your third point it seems much more productive to test properties of code that the customer could potentially recognize as something they value.
TDD really isn't dead for me. I do it pretty much every day. Both in work and in personal projects.
I am not sure talks/conversations like these are very valuable. In the end it turns out that every practical question has the answer 'it depends'. Maybe the important thing to realize is that most questions do have the answer 'it depends' and that one can never stop using one's brain.
> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
This is a really important aspect, and I think is one of the key things that separates "journeyman" level from "master" on the subject of testing.
> The structure of the tests must not reflect the structure of the production code, because that much coupling makes the system fragile and obstructs refactoring. Rather, the structure of the tests must be independently designed so as to minimize the coupling to the production code.
The "one test class per class" or "one test file per file" approach is an extremely common anti-pattern, and it's insidious because a lot of engineers think it's the obviously correct way of writing tests.
I often find that before I've diagnosed and fixed a bug, I don't know what the best test is.
So I tend to: fix the bug; write the test; see the test fail on the old version of the code; see the test pass on the new version of the code.
Something else I'll throw in on tests. Many times I've caught a bug I would otherwise be introducing because an existing test — which wasn't written to catch my new mistake — fails.
Sometimes you’re in a hurry to fix a bug, so you write the functionality and ship it out fast after some manual testing, without an automated test reproducing the bug. That’s okay!
Write the test afterwards in some branch, and after committing the tests, make an extra commit that undoes the bug fix. Let your CI run it and confirm it fails there, then bring back the bug fix on the next commit.
That way, you can have flexibility to fix things fast, but still keep the regression test that’s proven to check for it around for the future.
Note that if you see _another_ developer taking the time to write a test when you wouldn't, it doesn't necessarily mean they are wasting time.
As I am debugging something, I need to write myself a very clear description of my hypothesis of the steps to reproduce the bug -- otherwise it is hard to see incremental progress. I work faster if I can have the machine execute those steps.
Fully agreed. I didn't intend to imply a developer writing a test is wasting time, either!
Our team has been in situations where we're highly confident of the root cause, but we know creating a test to duplicate this might take hours, if not days. It might even be a fairly finicky scenario to try and setup.
Rather than letting customers have to handle the negative consequence of our bug for hours, we'll make the change, run it through our existing test suite (to make sure we're not making yet more troubles for ourselves!), and then release it after another teammate reviews the change.
But a test certainly will help with the confidence that the right thing was changed. I would definitely encourage writing a test if it's easy for your group to handle whatever negative consequences the bug existing in the wild is producing for as long as it takes to write a test.
Why is that Ok? How does the dev know that they've not undone anything else? Also, how does the dev know that the fix is complete? Or that it caters to the defect?
This anxiety about pushing out a fix - at the risk of undoing other working functionality - ought to be addressed first. It is better (and safer) to get into the habit of writing a test to reproduce the defect and then writing the fix. After all, if the fix appears trivial, then the test for the defect ought not to take too much time either. I have been able to get to this mindset with practice.
Because being dogmatic is exactly what causes people to start ignoring this sort of methodology. Pragmatism really does have to win out sometimes. Maybe you're in a hurry because the bug is causing active downtime. Some bugs really are "obvious" once they've failed and you're looking at the code. Maybe development of a proper test involves some test infrastructure work that's a larger undertaking than you have the opportunity for at the moment. Maybe you have a solid manual/QA testing system behind you, allowing you at least temporary assurance that your fix is valid.
No team does everything perfectly all the time, and that's fine. The real question is what gets done about it afterward: is the technical debt that you've incurred paid down in a reasonable time frame?
This nicely encapsulates what I was hoping to say, but didn't take the time to write out. Thank you!
> Maybe development of a proper test involves some test infrastructure work that's a larger undertaking than you have the opportunity for at the moment.
This is something I've encountered many times.
> Maybe you have a solid manual/QA testing system behind you, allowing you at least temporary assurance that your fix is valid.
I would hope most people are doing this anyway, especially when a big production bug has been found.
I've fixed plenty of bugs which I could not reproduce and thus could not test, simply based on the source code and description of problem from customer. Based on the symptoms it was clear what the code must be doing, studying the code reveals it is clearly wrong, and so the fix was obvious.
So make a patch, make a new build, send to customer and get a call "yeah that fixed it, thanks!".
Sometimes the issue is I simply don't have time to reproduce it, like that time a blocking bug had to be fixed within 30 minutes, or the customer would have had to charter their own helicopter to get a few packages to an offshore oil platform, rather than piggy-back on the worker transport heli.
Other times it's some combination of the customers system it's running on and configuration of our software that I can't reproduce.
Not saying it's ideal, but it's quite possible to successfully fix issues without being able to reproduce and test.
I once fixed a bug like that, then went wait a minute. Sure enough source control revealed I'd made exactly the same fix 2 years ago, and 2 years before that someone else had done exactly the same. In the odd years someone else undid that fix to fix a different bug that seemed unrelated. Once I figured out what the other problem was I was able to find the more complex fix for both situations.
Yeah I always check source control (blame/annotate), even if I wrote the code myself, just to be sure I'm not missing some context.
Automated tests is pretty great, but a lot of the stuff we do is difficult to test, mostly due to a lot of legacy code that's not well confined. As we work on a piece of code we try to clean that part up, but it takes time.
>Why is that Ok? How does the dev know that they've not undone anything else?
If the bug is "I thought we were supposed to have a 10 minute timeout and we accidentally set the timeout to 10 seconds" it's pretty screamingly obvious that if you change the time from "10" to "600" the problem is now solved and you haven't broken anything else.
Religiously applying this rule of thumb as you describe causes ALL sorts of problems including the problem of people writing tests for the above kind of behavior. That test will fail when it's changed from 600 to 1800 deliberately and that will create a pointless waste of time for everybody.
Yeah, and then you find out that that constant was also used in another piece of code as number of minutes, so you've just changed another timeout in the system from 10 minutes to 10 hours.
Yes, of course that constant should never have been a naked integer in the first place, but we live in an imperfect world. One thing I like about Go's standard library is that almost everything takes time.Duration instead of a plain integer that's then interpreted to be milliseconds (or micro/nanoseconds in the most unexpected place, gotcha, developer, you should've read the docs!)
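Python's analogue of `time.Duration` is `datetime.timedelta`; a small sketch of the idea (the constant name is invented):

```python
from datetime import timedelta

# The unit is part of the value, so a bare "10" can't silently be
# read as seconds in one place and minutes in another.
REQUEST_TIMEOUT = timedelta(minutes=10)

def timeout_seconds(timeout: timedelta) -> float:
    # Convert explicitly, and only at the boundary that truly
    # needs a raw number (e.g. a socket API).
    return timeout.total_seconds()

assert timeout_seconds(REQUEST_TIMEOUT) == 600.0
```

Reusing `REQUEST_TIMEOUT` elsewhere then can't change its meaning: it stays ten minutes no matter which unit the consumer works in.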
> This anxiety of pushing out at fix - at the risk of undoing other working functionality - ought to be addressed first.
It really depends on how urgent the fix is. Naturally you want the fix to be as isolated as possible so it does not regress other functionality. But even non-urgent fixes can sometimes benefit from getting pushed out following quick manual testing. For example we use Sentry.io for capturing errors in our applications and LogDNA for logging.
Occasionally we'll encounter some kind of edge case where we see a spike in logged errors which blows us past our Sentry and LogDNA quotas. Pushing a fix out before the test is written can be beneficial in cases like this, although yes, it's worth avoiding if possible.
Sounds like I could have been more thorough in describing why I think "that's okay."
> How does the dev know that they've not undone anything else?
I didn't write it, but I was assuming a scenario where a system already has a comprehensive automated test suite. If you have most functionality under test, then hopefully you're pretty confident that it won't undo anything else.
> Also, how does the dev know that the fix is complete?
The same way a dev knows if one single automated test that addresses the one known failure scenario is a complete fix.
In other words: you don't know. You keep doing some manual testing, and watching whatever logs or status indicators, to see if things go back to normal after deployment.
> Or that it caters to the defect?
I also didn't write this, but I had in mind some level of manual testing before deploying the code change to ensure it caters to the defect.
> Why is that Ok?
Hopefully my answers above help explain why I said that's okay.
I'm not advocating for flippantly shipping code without having a variety of other guard rails in place; I was primarily talking about when a bug has really bad consequences for users, your team is confident that writing a test is going to take a long time, and you do some level of manual testing to confirm all seems generally well.
We tend to have a web focus on this site but not all IT is like that.
If I have a data processing job that takes 3 hours and it fell over at the 1 hour mark (and let's say whoever wrote it didn't have the foresight to make it resume neatly, because that's added complexity that never got budgeted), I'm going to fix the obvious bug and kick off reprocessing immediately. Possibly after some messing around in a REPL to confirm how the code acts.
While it's going, I can then do some manual tests and sanity checks and cancel/restart if necessary - but if not, I've gained a lot of time.
Nope. With experience comes wisdom. I've had the situation happen when I was sure I knew what the bug was. I wrote a test that I expected to fail—and it passed. My diagnosis was incorrect and the bug was somewhere else. Had I not done this, I might have not only not fixed the bug, I would have likely introduced new bugs.
The effort put into manual testing can be committed directly to adding a test case. Furthermore, a fix should be accepted only when the full test suite, including the new tests, passes. Manual testing alone, depending on what the issue was, may let a regression in other parts of the software slip through. Just a thought...
I agree with the last part, but the ordering isn’t crucial. For example you can stash your changes in the implementation to check that the test fails when the change is not present and passes when it is. For this (among other reasons) it is nice to have a test runner that re-runs when it detects file changes.
The ordering is the one part of TDD that always bugged me: you have to write the tests first. But I often prefer to experiment and try a couple of approaches before deciding on one. Having tests first would add a lot more overhead to that way of working.
To be fair though, if your tests are at the right abstraction level, the specific approach you are choosing for the implementation shouldn't matter for the test.
Writing the test first also forces you to think about what API you actually want to expose. Once you've got the API right, there is still room for experimenting.
In this approach you decide on the abstraction (i.e., the API) by writing example code (i.e., some tests) that uses the API. The tests are how you decide which abstraction seems to make the most sense.
It sounds like you actually implement the abstraction to decide whether it seems like the right one, which is a lot more work.
Yes, my position is that tests as client don't really tell you the truth about the abstraction because they don't represent a real usage of it.
It is better to write tests for code when you know what it is and what it should do. Tests also introduce a drag on changing strategies- if the choice you made when you wrote them is now not necessarily the optimal one, you must now change your tests or convince yourself that you were actually right the first time.
If people like to work this way then great, I'm just explaining why for me it feels bad and runs counter to my instincts.
I think I understand what you mean. At the same time though, one crucial takeaway for me from Ian's talk is that my tests might be on a too small scale if they are not useful whilst I am changing the implementation strategy.
For example, I found it useful to ditch concepts like the testing pyramid and focus on writing e2e tests for my HTTP API instead of trying to cover everything with module or function level tests. That makes it much less likely that they need to change during refactorings and hence provide more value.
I generally think that "What is going to break this test?" is a really powerful question to ask to evaluate how good it is. Any answer apart from "a change in requirements" could be a hint that something is odd about the software design. But to ask this question, I need to write the test first or at least think about what kind of test I would write. At some point, writing the actual test might be obsolete as just thinking about it makes you realize a flaw in the design.
Other interesting questions I like to ask myself are: "How much risk is this test eliminating?" and "How costly is it to write and maintain?"
In reality I tend to do both: write example client code to think through the abstraction (some call this “README-driven development”) and then write tests once the implementation is under way. Though you can get the first as a side effect of the second, I find that good tests aren’t really good example code (too fragmented, focus on edge cases, etc.).
“TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.”
Totally agree. Somehow every good idea gets converted to a rigid ideology after a while. Same for OOP. It’s a solid idea but then the ideologues pushed it way too far. And instead of dialing back a little we see other ideologues declare “X is dead“ and the pendulum swings into another extreme direction.
My company is generally behind the curve so now people have been bitten by the REST, JSON, Microservice bug. They don’t know why or what it really is but things have to be done that way. That together with calling themselves “agile” without understanding what it means besides using JIRA and having fixed sprints.
> My company is generally behind the curve so now people have been bitten by the REST, JSON, Microservice bug. They don’t know why or what it really is but things have to be done that way.
This resonates with me. My first job out of college was with a big, very old insurance company. My team lead became obsessed with using microservices for some reason, even though we were only building internal web apps that would have about 1,000 users on a busy day. There would be no performance concerns whatsoever that would warrant "breaking up a monolith" to make it more scalable. But microservices were a great way for the team to feel like we were using trendy tech despite not having any idea how to really go about doing it or any particular reason for doing so.
> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
Every example I've read pertaining to unit testing uses a function as the unit to test. The easiest functions to test are ones that don't have side-effects (network, I/O, disk, etc). Could you point me to an example where a unit test applies to something beyond a function?
This bugs me as well. The people who argue about whether unit tests works will often redefine "unit" to mean anything from "the entire application" to a single function.
Colloquially, however, anything above the level of a self contained function or class is called something else - typically an integration test.
>> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
It's OK to dislike unit testing, but please don't redefine the term to avoid it. That's not helpful. Instead, try to find the papers (by NASA or IBM?) that show unit testing finds only very few actual bugs, making it low value.
That said, there are IMHO some units worth testing more.
They aren't redefining it though. The term has always been fuzzy. A common boundary of unit testing has always been a module's publicly exposed interface.
Wasn't the "clear boundary" definition the original, that was later interpreted as syntactic boundary (function/class) instead of semantic boundary (a chunk of business logic)?
It's worth noting that the finer-grained your unit, the more likely you will be able to write tests targeted at specific bugs.
For instance, if you are only testing through the API of the service, you may have a hard-to-impossible time confirming you recover gracefully from certain exceptions. You generally don't have service-level APIs to throw various exceptions intentionally.
Point being, the overzealous testers do have some good points, even if they miss the forest for the trees.
The bug triggered the exception. The test case encapsulates reproducing the bug. The bug is fixed. The bug can no longer be reproduced. As long as the bug remains fixed, the test passes.
Another way of thinking about it. Unless your exceptions are a documented part of your API no one cares about them - they only care about the outcome they actually expect. If you construct tests that pass for positive outcomes or fail for any other outcome then your exceptions remain implementation details.
I think GP is referring to nondeterministic exceptions. For instance, if the service under test depends on some other service, then you may need to test the scenario where the other service is unavailable. The exception is not triggered by a bug, it is triggered by an operational externality.
For networking related problems you can deterministically control failures from the test using something like Toxiproxy. This can be especially useful if you’re working out a particular bug (e.g. gracefully handling a split brain situation or something).
A more general approach would be to just run your happy path tests while wrecking the environment (e.g. randomly killing instances, adding latency, dropping packets, whatever).
I’ve found that the latter often uncovers problems that you can use the former to solve.
Testing these sort of things with unit tests can work, but I’m more confident in tests that run on a ‘real’ networking stack, instead of e.g. mocking a socket timeout exception.
Imagine I am implementing a service that queries 20 separate MySQL database servers to generate a report. (I'm not saying this is a good architecture, it's merely to illustrate the point.) I know that sometimes one of the MySQL instances might be down, e.g. due to a hardware failure. When this happens, my service is supposed to return data from the other 19 databases, along with a warning message indicating that the data is incomplete.
I would like to write a test to verify that my code properly handles the case where one of the MySQL instances has experienced a hardware failure. The point is that I can't do this as a strict black-box test where I merely issue calls to my service's public API.
[edit] And of course "testing the externalities" doesn't help here. I can test the MySQL instances and verify that they are all running, but that doesn't remove the need for my code to handle the possibility that at some point one of them goes down.
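A sketch of that scenario in Python (names invented; plain callables stand in for the 20 database connections), showing how the failure path can be driven from a test without any real hardware fault:

```python
def build_report(fetchers):
    # Aggregate rows from every reachable source; degrade gracefully
    # when a source is down, and report the degradation.
    rows, failures = [], 0
    for fetch in fetchers:
        try:
            rows.extend(fetch())
        except ConnectionError:
            failures += 1
    warning = f"{failures} source(s) unavailable" if failures else None
    return rows, warning

def healthy_shard():
    return [1, 2]

def dead_shard():
    # Test double simulating the hardware failure.
    raise ConnectionError("shard offline")

rows, warning = build_report([healthy_shard, healthy_shard, dead_shard])
assert rows == [1, 2, 1, 2]
assert warning == "1 source(s) unavailable"
```

The key design move is that the data sources are injected, so the test can place a failing one behind the same interface the real databases use.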
Second scenario: you've done this (or someone else has) and now you need to maintain it (we've all been there!). In this case my original post holds. Your test suite mocks the databases for unit tests anyway, right? So write some tests checking that when the various databases are down, your service gives appropriate responses.
Yeah, sometimes for practical reasons you don't want to, or can't test directly across the API, as good testing practice would dictate.
Taken to the extreme, the philosophy I laid out leads to something that looks like "only integration and end-to-end tests" depending on your architecture.
So I try to be pragmatic whenever possible, but I think leaning towards BDD works better, after 18 months of doing it.
Corrupted is not the right way to describe it. Both TDD and agile provide something amazing to management: a way to turn the black hole called "software engineering" into something tangible and quantifiable. This of course also makes it possible to apply some bad management practices to software engineering as well. People like to complain about agile (and apparently TDD), but I would argue that there are also huge success stories that don't make it to HN.
Let's keep in mind this Fowler post has no science in it. It's just some "lauded practitioners'" view of TDD. Our industry is driven by this kind of discourse. There's very little good scientific research to answer questions like these in software in general. How much is any good on TDD? One paper? Two?
Replying to myself (facepalm): just to make sure people know what I think. I've been doing TDD since 2002. I'm of the "You will take this out of my cold, dead hands" variety on the usefulness of TDD. That doesn't mean that there's any science that proves it.
> - When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
I don't understand how this matches to the idea of "don't write one line of code unless it's necessary to make a failing test to pass".
All of the design advantages I've ever heard advertised for TDD come from writing a bit of test, a bit of code, a bit more test, a bit more code.
If instead you are writing a few tests and then an entire module, you're doing test-first development, but it's definitely not test-driven development as I've ever seen it presented by proponents.
You need to write a module. You write tests that exercise the functionality of the module. But to pass those tests you need a couple of classes. You write tests for the classes (or at least the one you intend to work on next). The class needs some methods, so you write tests for those. You write code for the methods until all the method tests pass. Hopefully the tests for the class then pass too; otherwise you might need to update the methods and their tests. And so on.
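A toy red/green sketch of that loop (fizzbuzz as a stand-in; all names invented): the test is written first and would fail, then just enough code is added to make it pass:

```python
# Step 1 (red): the test exists before the implementation, so
# running it at this point would fail with a NameError.
def test_fizzbuzz():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(15) == "FizzBuzz"
    assert fizzbuzz(7) == "7"

# Step 2 (green): just enough implementation to pass the test.
def fizzbuzz(n):
    out = ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")
    return out or str(n)

test_fizzbuzz()
```

The same rhythm scales up: the module-level test stays red while the class- and method-level loops underneath it go green one by one.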
I used to be a big proponent of this, until I suggested this to my manager: "We ran the stats on our bugtracker, and bugs coming back are really rare, so we like to focus our effort on testing with a higher ROI".
Yeah, I never understood this safeguarding against bugs resurfacing after being fixed once. I only saw bugs coming back at a company that didn't use version control and instead copied source code back and forth with a USB stick.
I can understand something like test driven bug fixing, where you basically create a simple test to reproduce the bug quickly and then fix the bug using that. In many cases that is the most efficient workflow.
The test succeeding can then serve as evidence of the bugfix (though it might not be enough). So if you have already written the test, you might as well leave it in there, because it usually doesn't bother anyone, and the chance that someone breaks this exact same thing again, while tiny, isn't non-existent.
But fixing a bug and then putting extra work just for a test, if there is another easier way to prove that the bug is fixed? No, thanks.
> Yeah, I never understood this safeguarding against bugs resurfacing after being fixed once.
In my experience it's not infrequent for bugs to unknowingly only get half-fixed, not realizing that the true problem actually lies a level deeper, or has a mirror case, or whatever. Maybe a good example is that a parameter to a command is 0, the bugfix sets it to be 1, but a later bugfix changes it back to 0, when the correct bugfix would set it to be 0 in some cases and 1 in others.
And that if you fix the bug without a test, then the second related bug crops up a couple months later, and somebody else tries to fix it similarly naively, and can wind up re-introducing the first bug if there isn't a test for it.
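The half-fix scenario above can be made concrete. The `retry_flag` function and the mode names are invented for illustration; the point is that pinning *both* cases with tests prevents the second naive fix from re-introducing the first bug.

```python
# Hypothetical parameter from the example above: it was hard-coded to 0,
# the first bugfix flipped it to 1, but the correct behavior is 1 only
# in some cases and 0 in others.
def retry_flag(mode):
    # Correct resolution of both bugfixes: 1 for legacy mode, 0 otherwise.
    return 1 if mode == "legacy" else 0

# Regression tests pin BOTH behaviors. A later "fix" that blindly flips
# the value back to 0 everywhere now fails the first assertion instead
# of silently resurrecting the original bug.
assert retry_flag("legacy") == 1
assert retry_flag("modern") == 0
```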
Basically, in practice bugs have this nasty habit of clustering and affecting each other -- if the code was trickier than usual to write in the first place, it's going to be trickier than usual to fix, and more likely than usual to continue to have problems.
So keeping tests for fixed bugs is kind of like applying extra armor where your tank gets hit -- statistically, it's going to pay off.
I have a perfect real-world example. About four years ago some of my code broke in certain cases. I came up with a fix that relied on a case-sensitive regex to check for those cases. I think I made it case-sensitive because I wanted to make sure it didn't trigger accidentally on something added in the future. And these case names had never changed, right?
Yep, now that I've spelled it out, what happened is obvious. Three years later, I got ordered to change one letter in these case names from lower case to upper case. Of course I didn't remember that I'd used a case-sensitive test against the names three years before. And bam, the bug was back, and as there was no test for it, I shipped code with the bug.
The good news is the bug was obvious as soon as the customers tried to compile my code, so it didn't cause any harm but embarrassment on my part. Even so, it took me a while to track down what was going on. Imagine my shock when I got into the code and found the fix I thought I needed to make was already there... but itself needed to be fixed!
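A hedged reconstruction of the regex story above (the actual pattern and case names aren't given, so these are invented): a case-sensitive match works until the names it matches change case, and without a regression test the breakage ships.

```python
import re

# The original fix: case-sensitive on purpose, to avoid accidental
# matches on future additions. It silently stops matching once the
# case name is renamed from "foo_case" to "Foo_case".
buggy = re.compile(r"foo_case")

# The second fix, three years later: tolerate case changes.
fixed = re.compile(r"foo_case", re.IGNORECASE)

assert buggy.search("Foo_case enabled") is None       # the regression
assert fixed.search("Foo_case enabled") is not None   # now pinned by a test
assert fixed.search("foo_case enabled") is not None   # old spelling still works
```

With the first assertion kept as a permanent regression test, the rename would have failed CI instead of reaching customers.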
Avoiding too many layers of useless indirection is not hard: avoid refactoring effort that looks like a line by line, linear normalization of syntax.
Target tests and refactor for the confusing, incomplete chunks according to the team if there is one.
Semantic understanding of the system will improve. Instead of fetishizing code patterns, fetishize systemic understanding.
Personally, that habit has made it so I write better code from the start. It’s acted like a forcing function to reconsider if a habit is useful or just a habit.
My code went from deep OOP hierarchies of indirection, to composable, more functional, chunks. I import less, define fewer objects to begin with, compute a larger variety of useful objects, and can pull together features faster.
Have standardized machine? Nursing my own symbol library is where most of the fun is. With respect to Martin Fowler and the rest, who are great engineers in their own right, but this all smells like pandering to efficiency, importing a shared model, which impacts resiliency.
We shouldn’t have people think within the same context box every day. Software philosophy has been taken over by the equivalent of popular bean counters, focused on minimizing the idea space for the perception of productivity gains. It’s cognitive indirection, IMO.
What I got out of it was that tests for regressions are really good, but there are lots of considerations to make when determining which other tests to write, and why you are doing it. A good read nevertheless.
> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
Yes, this. However, when we go in the direction of testing a complete service, is it not more convenient to call it an integration test?
My personal view is that it's an integration test only if the test actively involves external components (like a real database, some other api, etc..). I don't use integration test terminology if all those external components are mocked/stubbed out (even if it encompasses a large "unit").
I also call those mocked surface area tests "functional" tests, partially to differentiate them from non-mocked integration tests, and partially because people get too hung up on the "unit" term.
Often it's convenient to test the functions in your database access layer against a real or convincingly simulated database (e.g. sqlite). This is "like a unit test" in that it focuses on the details of individual functions, but "like an integration test" in that it crosses a system boundary rather than mocking it. I've not found it productive to use either term when talking about it.
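The pattern described above can be sketched with Python's built-in `sqlite3` as the "convincingly simulated" database. The schema and the `get_user_email` function are invented for illustration; the shape is what matters: a focused, function-level test that crosses a real database boundary instead of mocking it.

```python
import sqlite3

# A data-access function under test. Unit-test-like in scope (one
# function, one behavior), integration-test-like in that it runs real
# SQL against a real (in-memory) database engine.
def get_user_email(conn, user_id):
    row = conn.execute(
        "SELECT email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None

# In-memory SQLite stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

assert get_user_email(conn, 1) == "a@example.com"
assert get_user_email(conn, 2) is None  # missing row handled, not raised
```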
The parent above you didn't necessarily imply that the whole microservice was being integrated before testing it. It is reasonable that a microservice can exist as a whole, but not yet integrated with its other deliverable components, and remain sufficiently unit-testable.
It's also extremely likely that the microservice needs to be integrated to test it :)
Test coverage is the most conveniently accessible numeric value in the neighborhood of software quality, therefore it is software quality. Merging a change that reduces test coverage is reducing software quality. That's okay, sometimes we all have to cut corners, but defending this choice means you don't value quality, and are therefore not a culture fit.
This is a rampant problem in the enterprise world, and it drives me nuts. I regularly have to work for clients who mandate using Sonar Qube (or/and other SAST tools) with strict policies, and also require 85%-100% test coverage on all projects regardless of how much sense it makes.
Predictably, teams have to spend way too much energy getting "waivers" approved by some ridiculous group, and inevitably end up creating tests that don't actually test anything, just to get the coverage figures up.
Error handling taking up 50% of the code is definitely a problem with Go itself. For each meaningful line, there's an accompanying "if err != nil {return err}", so if you want coverage you end up testing this kind of boilerplate.
I've lost so much time making these arguments with people. Unfortunately the combination of dogma + an industry of sham consultants who have monetized that have created a monster.
>As such, a good trigger rule is "one test per desired external behavior of the unit, plus one test per bug fixed". The test for each bugfix comes from experience -- they delineate tricky parts of your unit and enforce working code around them.
So much this. The smoothest integration I've ever worked on was one where I owned the API and back-end and another team built the front end. I defined the API behaviors, built the tests to verify those behaviors, and created a mock API for the front-end folks to use while testing.
Over the course of building the API, I slowly got my test success rate from 0% to 100% and always immediately knew when I accidentally changed behaviors. When we finally integrated, there were literally 0 errors related to the client/server interface. Yes, there was some UI wonkiness and we discovered some issues of scale, but there were no issues with parameter values, HTTP response codes, error messages, etc. It was a) amazing and b) the only time I've had the luxury to build something in that manner.
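The setup above can be sketched as a shared contract check run against both the real handler and the mock handed to the other team, so the two can't drift apart. All names here (`get_user`, `mock_get_user`, the response shape) are invented for illustration, not from the original comment.

```python
# Real back-end handler (stand-in for the actual API endpoint).
def get_user(user_id):
    if user_id == 1:
        return {"status": 200, "body": {"id": 1, "name": "Ada"}}
    return {"status": 404, "body": {"error": "not found"}}

# Mock API given to the front-end team during development.
def mock_get_user(user_id):
    canned = {1: {"id": 1, "name": "Ada"}}
    if user_id in canned:
        return {"status": 200, "body": canned[user_id]}
    return {"status": 404, "body": {"error": "not found"}}

# One behavior test, run against BOTH implementations: this is what
# keeps integration painless -- the mock is verified against the same
# contract as the real thing.
def check_contract(handler):
    ok = handler(1)
    assert ok["status"] == 200 and "name" in ok["body"]
    missing = handler(99)
    assert missing["status"] == 404 and "error" in missing["body"]

for impl in (get_user, mock_get_user):
    check_contract(impl)
```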
The main issue apart from the "Consulting Industry" is that too many people try to follow a methodology to the letter, but superficially, without trying to understand the point and deeper meaning.
That's why people argue about rules, or whether this or that can or cannot be done, instead of trying to understand the aim first and then the cost/benefit balance.
Which is a problem when you see something that seems like it should help, so you hire a consultant who can push the letter of the rules but doesn't understand them well enough to figure out how they should work in your organization.
I wasted a lot of time trying to get acceptance tests in here. They seem like a good idea, but I couldn't get traction on them, and the consultants preached the rules, not the how or why, so the rules got robotically followed with no helpful results. I'm still not sure whether the concept is flawed or just the execution, but we threw it out.
Yes, this is the best talk so far. Right now I am in a company that has a strict class-method testing strategy. This makes refactoring a pain in the A and worsens the ratio of programmer time spent to quality assured by the tests.
In "unit testing", the unit doesn't mean "the smallest thing we test which we decided is a class even if doesn't make sense"; it simply means "on its own" so external dependencies are not in the way.
My heuristic for determining how much a test is worth writing is something like this:
(number of outbound dependencies × human-understanding complexity) + non-immutability + code coverage added.
This leads to a grouping of tests and code that naturally fit together.
>- TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.
Trying to explain this to people just gets exhausting after a while.
It would be great if it worked this way. What I see instead is that managers are complaining if coverage is less than 80% and you have to write tooling to compensate for generated code and all that crap. It is the 10th circle of hell.
> What triggers writing a test is what matters. Overzealous testers test for each new public method in a class.
I have seen this before. At one company I was forced by a couple of other developers to write tests for accessors (get/set) on model classes. They would reject my PR if I didn't do it. To leadership it looked like I was against automated testing.
To me it's more important to write tests for the places where you're most likely to make mistakes. For example, calculations, business logic, or how the view renders your model as it changes. Not every single small method.
But how can I measure that with a red/yellow/green stoplight chart based on code coverage? I’m actually supposed to trust that the developers will do the right thing?!
Sorry. How is this hard? You test a function you write and write it down. Run the test. Why is this on Youtube? This is nuts - this isn't an existential developer crisis.
Couple of notes:
- TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.
- When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
- What triggers writing a test is what matters. Overzealous testers test for each new public method in a class. This leads to testing on implementation details of the actual unit, because most classes are only consumed within a single unit.
- Behavior driven testing makes most sense to decide what needs tests. If it's required behavior to the people across the boundary, it needs a test. Otherwise, tests may be extraneous or even harmful.
- As such, a good trigger rule is "one test per desired external behavior of the unit, plus one test per bug fixed". The test for each bugfix comes from experience -- they delineate tricky parts of your unit and enforce working code around them.