Do-nothing scripting: the key to gradual automation (2019)

sovietmudkipz · on Nov 2, 2021

I did this after seeing this before on HN. There were a few processes that were manual that benefitted from the technique in the article.

Learn from my folly: I even called them “do nothing scripts,” referencing this article. However; I was judged by peers for not writing the full automation versions as they didn’t appreciate the idea of gradual automation (programmer hubris?). Saying “do nothing scripts” in meetings did catch the awkward attention of leadership.

As a description, “do nothing” communicates a lot. As a brand, “do nothing” can use some improvements.

My short prescription of turning “do nothings” into “do some things” into “do all the things” didn’t help. We had some new people join the team and they had fun turning the do nothing scripts into a document. * Sigh *

I still build these type of process description scripts still. I usually don’t advertise them to peers until they do some of the things nowadays.

jjk166 · on Nov 2, 2021

I like the term "scaffold". It intuitively conveys that this is something quickly and easily started and which can stand on its own, but at the same time is fundamentally incomplete and a stepping stone towards something more permanent of greater value.

abalaji · on Nov 2, 2021

"scaffold scripts" that actually has a nice ring to it and is alliterative which makes it more memorable

bredren · on Nov 2, 2021

How about IterScripts. As in, scripts you iterate on.

tomrod · on Nov 3, 2021

"Iteratively Inclusive Scaffolding Scripts"

genewitch · on Nov 3, 2021

- as a Service

martin_a · on Nov 3, 2021

IISSaaS it is then.

tomrod · on Nov 3, 2021

Careful. We may spawn slithering Elder Gods.

a9h74j · on Nov 2, 2021

Next time try MVS -- Minimum Viable Script.

Actually, other applicable terms could be: a) programmed documentation; b) script as single-source of procedural truth

cutemonster · on Nov 4, 2021

"Print script" maybe?

Short for "print commands to run manually" script

scrooched_moose · on Nov 2, 2021

Something like "Checklist Script", as that's what they really provide?

BariumBlue · on Nov 2, 2021

"Process Automation" would be a good, encompassing term I think.

Even if there's no code being run, having all the steps down and easy to follow is a way to make the process "automatic". The term is extendible to completely automated computer code, while including non-computer-code (human) automation.

simonw · on Nov 2, 2021

How about "scripted playbook"?

bredren · on Nov 2, 2021

Too easily confused w Ansible

andrewflnr · on Nov 2, 2021

"Runnable checklist"

randomswede · on Nov 3, 2021

I ended up landing on "runnable plan".

nmca · on Nov 3, 2021

checkscript

qwertox · on Nov 2, 2021

I agree with this. It's like a Todo App but a Checklist Script.

dqv · on Nov 2, 2021

>My short prescription of turning “do nothings” into “do some things” into “do all the things” didn’t help. We had some new people join the team and they had fun turning the do nothing scripts into a document. * Sigh *

You know, it might be useful to do a little bit of extra work and turn it into its own exportable documentation so that people can't do that and have to rely on the script to get the documentation for it.

    donothing.py export

sovietmudkipz · on Nov 3, 2021

Yes this is also important. A complaint the new folks had for the “do nothing” scripts is the ability to browse all of the process without the script.

I’ve added this logic afterwards.

bradwood · on Nov 4, 2021

This. Make it spit out a markdown file which means your docs are always up to date.

thinkyfish · on Nov 3, 2021

I think they should be called Runbook Scripts. They basically walk you through a procedure with a text interface. I bet they would work even better as a slide deck.

dflock · on Nov 2, 2021

It's a Run Book Script.

adrianmonk · on Nov 2, 2021

How about "immediate placeholder scripts"?

"Immediate" is meant to accentuate that they can be deployed and start delivering value right now, something that (usually) can't be said for the implementation of actual automation.

And "placeholder" is meant to convey that you don't intend to stop there.

bryan_w · on Nov 2, 2021

If you have a ticketing system that has an API, change the print() statements into file_ticket_and_wait()

ByThyGrace · on Nov 3, 2021

What do nothing scripts automate is on the human side, particularly the shift of focus. It's perhaps correct to call them along the lines of "thought automation" or "decision automation" scripts.

lumannnn · on Nov 2, 2021

Maybe "<name>.playbook.sh" could give it a better look?

mro_name · on Nov 2, 2021

it's a running checklist, isn't it? Awesome.

hoosieree · on Nov 3, 2021

"Dynamic Documentation" has the word "dynamic" in it (cf. dynamic programming) and still descriptive.

nyanpasu64 · on Nov 3, 2021

Pet peeve: historical reasons aside, "inductive programming" (as in mathematical induction) is a more accurate/descriptive name than "dynamic programming".

gugagore · on Nov 4, 2021

You might as well change programming, too, since it is a very operations-research-y term:

. (Programming in this context does not refer to computer programming, but comes from the use of program by the United States military to refer to proposed training and logistics schedules, which were the problems Dantzig studied at that time.)

barosl · on Nov 3, 2021

I think "script as an instructor" might work, as I view this approach as that.

colmanhumphrey · on Nov 2, 2021

Maybe "Gradual Scripting" could work

mamcx · on Nov 2, 2021

Stepwise to-dos

throwaway20371 · on Nov 2, 2021

You could call it an "E-I Script" for Efficiency Interest Script. Over time your costs are gradually lowered as each step is automated - like accruing interest in a savings account.

genewitch · on Nov 3, 2021

ls -l

Let's see, e i... e i... Oh! There it is!

phaedrus · on Nov 2, 2021

My experience in corporate IT convinced me that even if an entire release could be automated down to a single press of a giant red button (and even if undoing the release could be another, single button), then culturally our organization would still figure out a way to turn the pressing of that button into a six hour ordeal requiring the participation of twelve people.

oscribinn · on Nov 2, 2021

My empirical evidence backs this up, I had deploying new releases to all servers completely automated down to just creating a tag in GitLab and every time the decision of actually running it turned into a series of emails and meetings throughout the week.

wing-_-nuts · on Nov 2, 2021

This is so true it hurts.

catern · on Nov 2, 2021

The idea here can be generalized to a primitive programming construct: The magic function "wish", which can do anything you want, just give it a string description.

For example:

    wish('copy these files to this host', files, host)

You can augment this further by allowing the program to specify a return type from the wish, and magically the wish will produce the type you want. So for example:

    data = wish(Path, 'the path of the data files I need')

Of course, the wish is actually implemented by sending a request to some human user. In my "wish" library for Python http://rsyscall.org/wish/ you can have a stack of handlers for wishes, just like exception handlers. If later you want to automate some wish, you can specify a handler which intercepts certain wishes and forwards the other ones on (by wishing again, just like an exception handler can re-raise).

bowmessage · on Nov 2, 2021

No offense, but this seems like unnecessary indirection harmful to script legibility, e.g. it would help me as a reader to see, instead, `scp.copy(files, host)`, instead of having to browse through a list of wish handlers to see which one implements file copy.

catern · on Nov 3, 2021

Yeah I agree, you'd only want to automate it that way if you couldn't modify the original script, which is kind of a niche use case... anyway I just think it's neat

Too · on Nov 3, 2021

Similar to behavior driven testing. For example behave framework.

It works exactly like wish does above but creates a whole language out of it.

For a human it reads very natural but understanding which combination of human words that map to underlying handlers is a hopeless and unnecessary level of indirection, and is why anyone shouldn’t touch these tools with a ten foot pole.

ilovetux · on Nov 2, 2021

I like your idea. Im gonna try to use your project soon, but due to my companies policies ill probably have to slowly implement your ideas. If it comes to that I might rename your "wish"es to "requirement"s as that would fit with my company's culture a bit more.

sullyj3 · on Nov 3, 2021

Side note: - I don't understand why classes are needed here. instead of

    procedure = [
        CreateSSHKeypairStep(),
        GitCommitStep(),
        WaitForBuildStep(),
        RetrieveUserEmailStep(),
        SendPrivateKeyStep(),
    ]
    for step in procedure:
        step.run(context)

Why not make them functions instead of classes. The URLs can be locals instead of class variables and the context argument becomes an argument to the function rather than a run method.

    procedure = [
        createSSHKeypairStep,
        gitCommitStep,
        waitForBuildStep,
        retrieveUserEmailStep,
        sendPrivateKeyStep,
    ]
    for step in procedure:
        step(context)

The OO is just gratuitous. Personally I probably wouldn't even bother with the for loop/context and would just pass the data that's actually needed by each function into it manually.

    createSSHKeypairStep(username)
    gitCommitStep(username)
    waitForBuildStep()
    email = retrieveUserEmailStep(username)
    sendPrivateKeyStep(email)

I get that it feels more modular - "oh all I need to do to add a step is add it to the list!" But like, all you need to do to add a step is add a line of code. There's no difference.

Much easier to see how the data is actually flowing that way. I found the fact that the context was mutated in a function it was passed into kind of gross. It's just globals with extra steps. Besides, this is less LOC anyway.

makapuf · on Nov 3, 2021

Like someone said very wisely in a python talk*, a class with two functions,one of which is init, is a function.

* https://pyvideo.org/pycon-us-2012/stop-writing-classes.html

snthpy · on Nov 3, 2021

I agree with you. I currently have a similar process where I jot down the steps in README.md or INSTRUCTIONS.md files. The script is a bit better in that it forces you to do the steps in order and you're less likely to skip a step by accident.

However if I'm going to start creating scripts like this then I think all the OO/Class boilerplate overhead would make me reluctant to do it. For me using functions with docstrings keeps more of the simplicity of the markdown approach while still allowing for gradual automation.

Here's a simple example:

https://gist.github.com/snth/c5c5a1236dd8ddf91973aed77d66cd9...

dullcrisp · on Nov 3, 2021

I assume the point is to let any one of the steps turn into something more complex and automated, as they say, without affecting the rest of the script. If each step is its own class, that gives you a place to go in and expand any one of them without worrying about conflicting with unrelated changes.

If the idea were to leave the script as it is, then of course your way is simpler.

sullyj3 · on Nov 3, 2021

Seems like premature abstraction to me. It's not difficult to make that change if you need it. In any case, I'm really struggling to envisage a scenario where you would need them to be classes - why couldn't a more automatic version still be a function?

In my opinion, in order to justify instantiating an object, it should actually contain data, and that data should actually be accessed by code other than methods internal to that class. This rules out using an object to represent a procedure - regular functions subsume that functionality.

Before first class functions became commonplace, it did make sense to use objects as procedures in some OO languages, to allow you to pass them around as values. Since most languages these days do have first class functions, this has become less necessary.

If you want a class in order to to split the procedure into multiple methods with access to shared mutable state, that's totally fine - you can just use nested functions to do the same thing.

geysersam · on Nov 3, 2021

I see what you mean. But in Python a function can be turned into a full fledged callable class at a later point without any change to the rest of the script.

Not that it matters much either way.

gitgud · on Nov 3, 2021

Devil's advocate; each class in the array has a consistent interface, making it easier to do common things every step like measurements and logging during iteration etc.

But I agree with you, it's simpler to just call the functions only giving them what they need. This prevents them from implicitly depending on context and makes reasoning about the code much easier....

xhevahir · on Nov 4, 2021

FWIW the author of the post apparently uses Go for this: https://github.com/danslimmon/donothing

Juliate · on Nov 3, 2021

You might want/need to decompose each step in a set of contained smaller behaviors.

This is a pattern easier to represent/embed within objects. But you might as well use a library of prefixed functions too.

tossaway9000 · on Nov 2, 2021

> Create an SSH key pair for the user.

> Send the user their private key via 1Password.

Why are you generating *private keys* for users, then sharing them? Not that this impacts the automation bits but IMHO users should known how to generate and maintain a key pair, and send you the public key.

renewiltord · on Nov 2, 2021

Did it all the time. Users weren’t sophisticated. They want a tool they can put a user/pass into and then upload and download. They have maybe some conditions from IT (SFTP okay, S3 okay, encrypted at rest, whatever).

We did this trivially with S3, our implementation guy gives them an access key ID and secret access key, tells them to install Cyberduck, and gives them a URL to paste. We’re off to the races.

Having the user generate the thing will turn going live from hours to days.

I’ve also done this analogously with SFTP. You keep the creds so you can help them because they’ll type it in wrong, their software will fuck it up, whatever.

iso1631 · on Nov 2, 2021

Agree, it's painful, and it really takes me out of the larger point they are trying to make

A workflow should be be

    User creates an SSH key pair, the private key never leaving the computer
    User sends public key to authoriser
    Authoriser pushes public key to Git (presumably an email + key touple?)
    Wait for the build job to finish (not sure what this does)
    Build process sends the email saying "You can now use your key"

Same with say a wireguard key. Or an SSL certificate.

I think the larger point is that you just have step by step instructions, and thus dont need to catch edge cases, but it also makes it harder to avoid skipping a step, which seems reasonable (I do this myself in some areas)

hollasch · on Nov 5, 2021

Not to weigh in on this particular issue, but I think your objection here is another piece of evidence weighing in the favor of this overall approach. It's perfect to have this debate immediately and so early in the process, before any time is spent in detailed implementation.

DantesKite · on Nov 2, 2021

tossaway9000 · on Nov 2, 2021

It's not a "private" key if its shared. Only the end user needs to know the private key details.

Consider the usecase of generating a user their initial password for a service, this almost always results in the user needing to immediately reset their password after initial login, this doesn't happen if someone is generating both parts of the key pair for you.

My sysadmin doesn't need to know my password, and doesn't need to know my private key or passphrase, that would allow them to impersonate me.

ghostly_s · on Nov 2, 2021

well, your sysadmin created your AD account, and probably has some means of root access to your box too, so they could already impersonate you. *

*this may or may not be true depending on how Enterprise-y your workplace is.

chousuke · on Nov 2, 2021

The difference is in the kind of audit trail it leaves. If a sysadmin impersonates a user, that leaves a different kind of trail than a user logging in with their own key that only they can access.

In principle, the sysadmin should explicitly avoid knowing any of the user's secrets, because if they do, the user can shift blame onto them: "It wasn't me, it was the admin!".

I will never generate a private key for a user and will initiate a reset process for any passwords and other personal secrets revealed to me; to do anything else would be irresponsible.

nerdponx · on Nov 2, 2021

If nothing else, they need to be able to access your files in case you leave.

lucb1e · on Nov 2, 2021

Unless it's all encrypted with the user's own password: sure, you can always invoke the admin powers and override the password field in the database or whatever. But the default mode should be that an account is only accessed by the person it belongs to.

Even as a security firm we have some exceptions in our company (disk encryption password for a physical device in the office is in a shared vault, for example), or support might do a remote control session with the user present, but it's definitely not the default setup procedure for a new employee to have shared credentials for example.

chousuke · on Nov 2, 2021

I think it should be noted that even with with encrypted data you'll generally just encrypt the master key with multiple passphrases; one (or more) master keys for admin use and one for the user themselves. There's really very rarely a good reason to share any secrets.

It's pretty much a necessity for any larger business that device data be protected with the user's own password and with a master password known to IT so that it can be accessed when the user's password is inevitably lost.

silvester23 · on Nov 2, 2021

(2019)

Discussed at the time: https://news.ycombinator.com/item?id=20495739

qwertox · on Nov 2, 2021

This was very enlightening for me.

But I wonder if it wouldn't be better if it were something like a ncurses list with checkboxable list items which one would progress trough.

Take a checklist for a pilot for example. Wouldn't it be not as good if the next item on the list only appeared after an item has been checked? It could be beneficial if you could peek a couple of items down the line in order to group your actions or get a better overview of the overall task.

"Do I put the water for the coffee in the microwave now or do I first finish this quick item and let the next long-running item start before I go and make my coffee?"

glacials · on Nov 3, 2021

You lost me at ncurses. IMO this idea works because the task is so simple no one would bat an eye at the thought of doing it. All it takes is one person to question why we're not using <technology> instead, or why we're not spending this ncurses amount of effort towards something else instead, to jeopardize the effort.

hollasch · on Nov 5, 2021

I think this would be a great idea if the intent was a checklist that would never be automated. If the idea is to create a scaffold for future incremental automation where tasks start falling off the checklist (until ideally none are left), then I'd rather start with that framework instead.

slightwinder · on Nov 3, 2021

Presenting one task after another has the benefit to enforce a focus on it. Planing ahead is fine, but can be also distract or even mislead.

thr0waway37373 · on Nov 3, 2021

This approach doesn't work well in the long term. I've worked as a successor in one of the author's previous company. All the "do-nothing scripts" diverges from the reality as time goes by. The scripts became a false prophet, not a single source of truth. When I came in, the fully automated parts are doing fairly well. On the other hands, most the one-off scripts doesn't work.

Falsy docs are worse than no documentation. We've tried looking at the scripts, but small divergents in the scripts destroyed our confidence in the scripts. Instead we figure how to do task by looking at the source code and actual deployments.

The lesson we learned from this is we need to treat scripts as a real software instead. We make scripts idempotent and run them periodically to see if anything breaks. For infrastructures we make them immutable and embraced the cattle mindset(vs pet mindset).

hollasch · on Nov 5, 2021

> All the "do-nothing scripts" diverges from the reality as time goes by.

Not sure I follow. In my experience, we have _tons_ of process that's written up in random pages in our company wiki. I assure you, nothing diverges as quickly as a set (regrettably) of duplicated wiki pages outlining a process.

If you're making the argument that a fully automated script is superior to a checklist, then ... well, sure. But while you're spending the resources to fully automate one process, duplicated wiki pages are being written for the 100 other checklists (at least in my experience).

The key here is that I fully agree with you on the divergence problem. But if a scripted checklist diverges, people complain and it's trivial to update the script. If you don't have these, then people either add a comment to one of the wiki pages, or a side note, or worse, write up a "better" wiki page with the real actual this-time-for-sure steps.

I believe the key here is to implement a consistent tactical approach to incremental development, capturing divergence along the way. If you just let them lie fallow, then yes, that's bad. But surely a fully-automated script allowed to lie fallow and diverge from reality is no better off.

NationOfJoe · on Nov 3, 2021

  We make scripts idempotent

does this mean change the commands that run to be idempotent? so the script/echo concept stays the same but you make sure the commands it echo's wont break anything if run more then once?

dbmikus · on Nov 3, 2021

Not the parent, but I'm guessing it's the idea that the script always moves the system to the same steady state. Similar to how Ansible tries to be and how Kubernetes controller loops work.

bazhova · on Nov 2, 2021

This is a really nice solution. For the 99% who don't work for a place with a massive DevOps team, this approach is a nice middle ground that actually works. If management doesn't like it they don't have to know. These scripts are for getting through the day with 10% more energy to spare at the end.

crispyambulance · on Nov 2, 2021

Cool. A nice medium on the way between "seat-of-your-pants" manual and completely inscrutable automation!

People like to brag that they never do something manually more than once, but in reality it takes time and practice to be able to fully automate any non-trivial process. You usually can't do it in one go. I really like that this article embraces that.

I especially like how it's broken down such that any particular function can be swapped out with full automation once the author gets enough confidence to be able to do that.

oehpr · on Nov 2, 2021

Hey I independently came to this as well!

We were performing some migrations to our servers and we weren't sure we had nailed the process down. Automating the procedure has a problem in that I can't just "reset" the server and start again. I could try to make all my operations idempotent or using a babushka style "Check - Set - Check" methodology, but that would greatly expand the work for a one off procedure that once it's done it's done.

I wasn't even sure of the generality of the steps, and they were complicated enough that copy and pasting them with all their parameters was tedious. So I created a structure just like this!

I called it a living procedure. An excerpt:

  #!/bin/bash
  source ./procedure-lib.sh

  question "Which client is this?" old_client
  question "What worker is $old_client on?" old_worker
  question "What is the targets name going to be? (it's fine if it's still $old_client)" new_client
  question "What worker is $new_client going to be on?" new_worker
  question "What branch is $new_client going to be on?" new_branch
  question "What should we name the migration? (typically something like migraiton-$(date --iso))" migration_name

  #================ Initialize the client on the new worker ================
  step "Login to $new_worker and Initialize $new_client in legacy mode using run.sh init_legacy, set their branch to $new_branch"\
    'eg.
    ssh '"$new_worker"'.lightship.works "sudo -i bash -c \"

These question functions also remembered the answer you put in last time (it was based off what the text of the question was, so questions that depend on the answer of previous questions did not presume anything) so that if you just hit "enter" then it would load the last one. This was vital because when a procedure ended up incorrect or needed to be amended, you could ctrl+c the script, fix it, then spam enter until you reached the point you were last time.

I've been thinking about creating a better structure to these scripts that allow you to gradually transform these living procedures into babushka style "Check Set Check" automations.

Flenser · on Nov 10, 2021

Thank you for introducing me to babushka style "Check - Set - Check" methodology. I assume you meant this:

https://github.com/benhoskings/babushka

SeriousM · on Nov 2, 2021

Usually I write instructions in markdown files and test them by redo all the steps. This saved me already many times because I will forget how to renew this damn certificate two years later, or setup the logging engine after 18 months ago. No need for a script.

qwertox · on Nov 2, 2021

I'm so scared of my certificate renewal script. It's really cool and sends me a summary via email of the executed tasks and runs automatically every month. But when I look at the code I don't know what the hell I was thinking when I wrote it.

I mean, it even fetches the certificates prior to renewing them and then re-fetches them afterwards in order to make sure that the servers are actually using the new ones.

dnautics · on Nov 2, 2021

I think this is eventually going to be how people use elixir's livebook library.

abledon · on Nov 3, 2021

i wish you could extend markdown syntax so certain things colored the editor differently.

# colors it color A

## colors it color B

hollasch · on Nov 5, 2021

So naturally, my knee-jerk reaction is to create some kind of checklist language and interpreter. Which is almost wrong, of course.

I say "almost", because it occurs to me that many steps are shared individually across many different workflows. If you see a single task as an automation target, then you end up writing simple scripts that perform very focused operations -- a strong advantage over a large complicated script that does many different things, some that are repeated for other complicated scripts.

In this setup, you end up with checklists that remain forever, with automation replacing some or all of the print commands.

Of course, my spidey sense tells me that I've just pushed the problem down a level, so ¯\_(ツ)_/¯.

nicbou · on Nov 2, 2021

I really like this idea. It goes hand in hand with an older post of mine: "no script is too simple" [1].

A script is a recipe that your whole team can follow. It's easier to update a recipe than to propagate changes across the whole team.

It's also similar to a flight check, which is a fancy name for a reusable to-do list.

[1] https://nicolasbouliane.com/blog/no-script-is-too-simple

dradtke · on Nov 2, 2021

Braintree created a tool for doing something similar: https://medium.com/braintree-product-technology/https-medium...

randomswede · on Nov 3, 2021

This is basically why I ended up writing Runnable Plans, over at https://github.com/vatine/runnable-plans/, the main differences are "you define the plan in YAML" (yeah, horrible, but better than hand-chasing a parser, that MAY come at a later date) instead of "in the script", "the plan has no inherent order" (to allow for future parallel execution), "saves success/failure to allow for later restart", and "can generate a GraphViz graph of the plan dependency ordering".

But, whatever works, works. Start somewhere, get it into a script, plan, whatever. Then, it is easier to identify steps that can be turned to entirely machine-operated.

gorgoiler · on Nov 2, 2021

Nice. A class with a single method can be nicely modelled as a closure.

Returning higher order functions and using format’s kwargs is an even more succinct version of this.

  def explode(**k)
    def run():
      return TheBomb(
        ’{user} is!’.format(**k))
    return run

  run(user=‘gorgoiler’)

nerdponx · on Nov 2, 2021

I'd almost always rather have a class than a function returning a function, in a language where classes are typical and natural. It imposes more structure on the code, provides some degree of namespacing, and makes it easier to extend.

gorgoiler · on Nov 3, 2021

Personally, classes feel particularly unnatural in Python.

Closures are also weird though, I get your point. I wouldn’t use them in code that’s widely shared.

The upside is truly private data.

iFreilicht · on Nov 3, 2021

A class with a single function can also be nicely modelled as a function.

gorgoiler · on Nov 3, 2021

Ahem, encapsulation of both data and code :)

ImportOllie · on Nov 2, 2021

I really like this, but in the example is there much value defaulting to wrapping each function in a class?

Jtsummers · on Nov 2, 2021

It gives a uniform interface (the run method) for each step. You can also, then, encapsulate more logic around that particular step if it needs more than just a single run function or some other stateful component.

_dain_ · on Nov 2, 2021

A plain function has a uniform interface too: just calling it.

FPGAhacker · on Nov 3, 2021

So the idea is that eventually real work will be done, likely by adding more functions. The class provides a nice encapsulation for that future work.

Jtsummers · on Nov 2, 2021

True, but that doesn't help you with encapsulation or the other (potential) benefits of using classes like inheritance.

Too · on Nov 3, 2021

Local variables inside a function are very much encapsulated.

Unless you need to keep state between two distinct actions on the same object, but in that case a single run() entry point will not be sufficient anyway.

Inheritance can be replaced by just calling a common function.

There are many “what if” here, which will likely never happen and until they do it’s premature abstraction. Most likely some other abstraction is needed by the time script grows.

_dain_ · on Nov 3, 2021

"Lots of times people think they might need something later. You don't. Or you can just do it later if it comes up."

https://youtu.be/o9pEzgHorH0?t=339

For glue code scripts like this you usually don't end up using encapsulation and inheritance or any other fancy OOP design patterns. It's usually grungy IO-bound procedural code, the sort of thing you'd write in bash if bash weren't terrible for long scripts. Don't burden it with premature abstractions. If you end up needing a class, just write the class later.

Jtsummers · on Nov 3, 2021

It's important to remember that YAGNI is a guideline not a rule. If someone spends 10-minutes thinking about the future instead of writing code immediately, and they realize a need that introduces some extra upfront effort but reduces longterm maintenance (fewer changes in the future because it already matches the desired structure), then YAGNI doesn't apply. For standard work, this is a pretty common situation to find yourself in: Recognizing that you've done X before and it led to a particular structure, so when doing X' start with the final structure instead of insisting on 100 extra refactors along the way.

ImportOllie · on Nov 3, 2021

Some great discussion and insights here thanks everyone.

My continued thoughts: since this is a framework for trying to reduce cognitive load, I'm now thinking a pragmatic approach would be to try it out a few times, either as a set of function steps or a set of class steps and see whether one approach falls out as a preferable pattern to default too.

Thinking about it, I can see why classes might be a reasonable default, if you don't need the state, it doesn't matter, if you do, you might be tempted to refactor the script which I think detracts from the goal of this framework.

If anyone is using this, in python, would be interesting to hear their experience.

IshKebab · on Nov 2, 2021

Yeah it's not the greatest style. You also don't need to derive from `object` anymore. But it gets the idea across and the advice is great!

tusharsadhwani · on Nov 2, 2021

notice the raw_input(), it's written in python 2.

hyperpallium2 · on Nov 3, 2021

Another benefit of this interactive to-do list: you'll find out pretty quickly if the steps really are that simple.

The test is you.

Many "slogs" require too much human decision to be automatable, but too little to be interesting. tedious. tiresome. irksome. In a way, also trite, bromidic/

paulmooreparks · on Nov 2, 2021

So, this is a wizard, basically. https://microsoft.fandom.com/wiki/Wizard

simonw · on Nov 2, 2021

I like Observable https://observablehq.com notebooks for this kind of thing - quickly building automated scripts that simply accept user input and use it to dynamically construct a copy-and-paste output.

You can run and edit them entirely in the browser, they can do anything you can do with JavaScript and they're easy to share with other people and create forks.

NationOfJoe · on Nov 3, 2021

Amazingly obvious concept, love it. I feel i could use this in my personal life.

Just wondering, i know it isn't the point, but does this exist as a service? For example is there a web service where my sales dept could add all the steps and it just goes through them one by one, but can include a start/completed webhooks.

kaycebasques · on Nov 2, 2021

I also independently stumbled on this idea. My rational was that it turns the process into more of a checklist, without all of the hassle of handling edge cases in my code. Because usually whenever I try to fully automate I re-learn the wisdom of that xkcd cartoon about "automation expectations versus reality".

Another wise thing about it is that it's a relatively low-friction way to start documenting institutional knowledge. Your engineers might hate writing docs, but maybe they won't mind writing a bash script like this. Which is essentially documentation in disguise. And since it's (presumably) living with the rest of your version control system you increase the chances that your engineers will explain changes to the process over time (via commit logs).

hollasch · on Nov 5, 2021

In my work, we have develop on Linux, MacOS and Windows. Another advantage to this approach is that it allows you not just to incrementally automate steps in the process, but _steps on platform X_. It eliminates yet another barrier to starting the process of automation.

tmerr · on Nov 2, 2021

I do something similar but take it a step further and statefully track the progress of the workflow. The code generates a file that is like a TODO list, and a separate command runs a single step before marking it complete. Great for longer running workflows (days).

stanislavb · on Nov 3, 2021

Braintree's Runbook seems to regard exactly this case of "Do-nothing scripting" https://github.com/braintree/runbook

PaulHoule · on Nov 3, 2021

I have a hobby of printing "three-sided cards" that have a front side (maybe a photograph, art reproduction, or anime character) and a back side that describes the image and has a QR code (and maybe an NFC tag) that points to a "web side".

This process has a number of steps, some of which are physical (handling paper), some of which are digital and scriptable, and some of which are digital but not scriptable (serious image modifications with photoshop.)

I use a process very similar to what's described in the article to manage the whole thing.

toby_tw · on Nov 5, 2021

Why don’t you just write this down into a checklist? Why codify it?

You need the interpreter to run it.

It’s less accessible to non technical people.

Actually running the script is an extra step that’s not needed.

This feels like an engineer overcomplicating something.

bones6 · on Nov 5, 2021

I immediately recognized this as a nice scaffolding to start the automation process of a slog task. There's lots of things that are difficult to fully automate. But breaking it down into step by step functions, you can identify and automate the low hanging fruit.

Then maybe you could request some help with the ones that are a stretch for you to automate yourself.

hollasch · on Nov 5, 2021

It addresses the false dichotomy between checklist and full automation. Documentation loves to spawn future copies and variants, while scripts tend to remain single-source.

dctoedt · on Nov 2, 2021

Could crude shell scripts be used for this purpose? For example, in zsh [0]:

  $ echo "Do the first step, then press any key."; read -s -k

EDIT: Apparently so: [1]

[0] Modeled on https://stackoverflow.com/questions/5215343/how-can-i-pause-...

[1] https://news.ycombinator.com/item?id=20495739

mr-karan · on Nov 2, 2021

The author has written a Golang version of the same as well [1].

1: https://github.com/danslimmon/donothing

Arch-TK · on Nov 2, 2021

I like the article. This is a great idea. I'm going to implement it over the next week. I have high hopes.

That being said, I have absolutely no clue what the advantage is to the python programming style exemplified in the article. Can someone explain why on earth I would ever want to convert:

    def do_something(context):
        pass
    ...
    do_something(c)

into

    class DoSomething(object):
        def run(self, context):
            pass
    ...
    DoSomething().run(c)

What is the point here?

ttymck · on Nov 2, 2021

In that contrived example, there is no point. In the real world, there are plenty of reasons to use a class instead of a function. Extensibility is the first that comes to mind. What answers are you expecting (i.e. performance)?

Arch-TK · on Nov 2, 2021

I want to know about this hypothetical extensibility. What extensibility does this approach achieve over just a bunch of functions?

Jtsummers · on Nov 3, 2021

Potential reasons to use classes over just a bunch of functions (and the syntax is rough and probably wrong, I don't use Python often):

1. You have a complex step that you want to refactor. Without classes you end up with something like:

  def Actual_Step(context):
    part1(context)
    part2(context)
    part3(context)

  def part1... 
  def part2...
  def part3...

Great, you've now polluted the larger namespace and further exacerbated the need to thread data through a bunch of function calls. With a class, you could do something like:

  class Actual_Step(object):
    var1, var2, var3 # things needed for this
    def run(self, context):
      # set the vars somehow
      self.part1()
      self.part2()
      self.part3()
    def part1(self):
      use var1
    def part2(self):
      use var2
    def part3(self):
      use var1, var2, var3

It's not stateless, but it works just fine. No polluting the larger namespace, and less threading a bunch of data through a bunch of calls. (Do you need the self.part1() bits in Python?)

2. You want to inherit. Maybe you have a collection of steps that all have the same setup/teardown bits, you can do this:

  class Common_Base(object):
    def setup(self, context):
      ...
    def teardown(self, context):
      ...
    def custom(self, context): # should be implemented by each derived class
      ...
    def run(self, context):
      self.setup(context)
      self.custom(context)
      self.teardown(context)

  class Specific_Instance(Common_Base):
    def custom(self, context):
      ...

And reuse that in multiple places.

3. Customizing at construction time. Maybe you have a step that goes out to some database, but what database may vary depending on what you're doing (for instance, at work we have test and ops modes):

  class Get_Data(object):
    def __init__(self, db_name):
      self.db_name = db_name
    def run(self, context):
      ...

And of course, since this is Python, you can change all of this to use a decorator or decorators and remove most of the boilerplate, but still retain the potential benefits.

In theory you could grow this (though at that point you may be better off with another system) so that different reports can be constructed from the steps themselves, without the need to execute them. Add elements like documentation strings that can be returned and prettified for different user interfaces or reporting purposes. All without ever having to touch the actual methods doing the work, just the way you interact with the classes holding them.

Arch-TK · on Nov 3, 2021

So the first example can be done like so:

    def do_thing(context):
        def step1():
            pass
        def step2():
            pass
        def step3():
            pass
        step1()
        step2()
        step3()

The second example can be replaced with:

    def make_thing_doer(db_name):
        def f(context):
            pass
        return f

I think both examples are clearer and involve less boilerplate.

But I specifically asked why that initial example posted to the blog was written the way it was. I will stress that "because in the future you may want to do X or Y with it" is not a really good reason because there is absolutely no reason why you couldn't just replace the function when it came to that point.

Moreover, the database example you gave would require changing the code just as much as the function example I gave.

specifically, in your case:

    tasks = [
        ThingDoer1(),
        ThingDoer2(db_name),
        ThingDoer3(),
    ]

or in my case:

    tasks = [
        do_something_1,
        make_thing_doer_2(db_name),
        do_something_3,
    ]

Once again, less code, fewer noisy parentheses, I don't see the advantage.

Edit: I missed the inheritance example at first read but this can trivially be implemented like so:

    def make_thing_doer(custom_step):
        def step1():
            pass
        def step3():
            pass
        def f(context):
            step1()
            custom_step(context)
            step2()
        return f

or more simply:

    def do_thing(custom_step, context):
        ...
        step1()
        custom_step(context)
        step3()
    
    def special_step(context):
        pass
    
    do_special_thing = lambda c: do_thing(special_step, c)

That doesn't implicitly inherit all the state. In which case I would concede that a class may be worthwhile. But as I already said, making the code so ugly initially is not justified by the possibility of eventually needing it to do this specific thing.

You can still make the code do this specific thing without ever needing to make it so ugly to begin with.

Nothing stops you from mixing the style and implementing __call__ in the class (or subclassing from a class which implements __call__ and calls run() if you don't like double underscores.

Jtsummers · on Nov 3, 2021

I mean, if you don’t like classes then don’t use classes. It’s not forced here. But if the author’s article is correct, they’re planning to change the script. Presumably they’ve done it enough to know a basic structure that’s worked well for them once they’ve started making changes. Why not set up that structure at the start if you know it’ll be the end result?

Arch-TK · on Nov 3, 2021

> Why not set up that structure at the start if you know it’ll be the end result?

Precisely because it's an unnecessary amount of code overhead.

There's no indication at all that IF he needed to expand the code that classes would be necessary UNTIL there were very specific requirements and there's also no reason why he should put that code in place in case of that requirement popping up.

It's always cheaper to write the simplest code you need at the time and then delete it (or rework some of it) and write the more complicated code you need in the future than it is to write more complicated code up front and expand less on some of it in the future.

If the point of the article is effectively "minimising the activation energy required to start automating a task" then surely the code should also be minimal?

Jtsummers · on Nov 3, 2021

> Precisely because it's an unnecessary amount of code overhead.

One extra line per step is not a lot of overhead.

Arch-TK · on Nov 3, 2021

It's more than one line though is it, it's a lot more _text_ even ignoring whitespace it's almost twice as much text. And that's the least important metric! Classes carry a lot more baggage. In the sense that if you see one, your expectations change and expand. The realm of possibilities expands too. This puts extra mental load when anyone new to the codebase has to look at it.

The fact that I looked at that code and spent 5 minutes trying to figure out why you would want classes there and what exactly it achieves is precisely what you want to avoid in codebases.

I audit code for a living, a lot of it, it takes far more work to audit code where the authors put a lot of unnecessary boilerplate or unnecessarily flexible code in place because it means I have to start figuring out what the authors intentions were. The expanded realm of possibilities means I have to check for more behaviors. It slows down the reading process.

slightwinder · on Nov 3, 2021

There is no benefit in using classes that way. Technically, it's even harmful, because it has slight overhead. It's just poor style to have a consistent look for the script. You could just do the same with functions, and replace them one by one with classes when the demand for more complex code comes up.

Though, it's also possible that the writer simple did'nt know better and has not enough understanding of python, or thinks other people who might extent them lack those knowledge.

jgtrosh · on Nov 2, 2021

This script is ridiculous in python with classes. It would be much cleaner as a sequence of prints and inputs.

JeremyNT · on Nov 2, 2021

Lengthy previous discussion (2019): https://news.ycombinator.com/item?id=20495739

kristianpaul · on Nov 2, 2021

The lesson for me here is to actually have scripts written where the business logic is clear to see and change.

This is a workflow in a python script but could also be a tool just by adding argument parsing. But also tools can be called from software pipelines and have tools to execute this for us (automation).

Look at https://aosabook.org/en/500L/a-continuous-integration-system...

mig39 · on Nov 2, 2021

This is a lot like the checklists a lot of industries use, just automated on your shell instead of being on paper.

I'm thinking of a pilot's checklist, or an operating theatre checklist.

mypalmike · on Nov 2, 2021

There's a difference though. I've worked in ops teams where there were many such checklists in Word documents, wikis, or whatever. Standard procedures involved reading from these runbooks and following the steps.

But here, you have code that plays the same role, but which can eventually be changed out with code that actually does the work for you. It's a seemingly small difference but it's actually huge if you have some decent developers who can spend a bit of time every once in a while pushing the needle forward in automating these manual steps.

Another note: I've found that it's hard to even get teams to consider the minor change of using the code version instead of the wiki. Inertia is a real problem.

nathanathan · on Nov 3, 2021

Combining this technique with an automation website like Jenkins and a "no code" programing language could be an interesting product. I imagine having a UI where prompts can be specified and predefined components like send X an email can be dropped into the workflow. Acknowledgments that steps and complete can be tracked, and long running workflows could be shared between users.

cuddlecake · on Nov 3, 2021

The underlying concept would be akin to Business Process Modelling.

Using something like https://camunda.com [0], you can execute a business process model. Such a model can include sending emails, waiting for responses, human action, custom API requests, etc.

Camunda is not no-code, but it comes close to what you imagine. I suspect that when the existing API landscape of a company needs to be used for workflows, coding/integration is inevitable.

There's also (truly) no-code platforms like https://bryter.com

[0]: I only know of this because there's a department at my current job that specializes in implementing workflows using camunda. I got explanations for it during moderate drinking, so my recounting of it may be incorrect in parts.

gregwebs · on Nov 2, 2021

I like to do this with a checklist. In Confluence you can clone the checklist page for the onboarding/release, etc and check off the cloned page.

Confluence is wiki-style (edit first, ask questions later), but I could definitely see git-style (pull request first) as an improvement for some situations.

bionhoward · on Nov 2, 2021

Makefiles can be great for this sort of thing. Over time you build up a really nice library of shell commands and it's trivial to organize them into a DAG of dependent steps. Lack of makefiles is a strong reason to avoid windows dev machines like the plague...(IMHO)

guhidalg · on Nov 2, 2021

Windows has MSBuild that is just as expressive as Makefiles, don’t use the lack of make as an excuse.

shadowfox · on Nov 3, 2021

Or use a `make` from any of Msys, Msys2, Cygwin, WSL, Gow or any of the assorted GNU util ports.

bionhoward · on Nov 3, 2021

Ok, I'll think of a different excuse next time

rgrs · on Nov 3, 2021

I use Bash aliases to do the same

patleeman · on Nov 2, 2021

One could pretty easily create a tool that takes Github Flavored Markdown and interactively displays the text and make checkbox items appear one at a time in a manner similar to this. Would be much more useful and easier to configure.

Jtsummers · on Nov 2, 2021

How would it be easier than the proposed Python example? Which only relies on bog standard Python (unless your steps are more complicated) and can (by virtue of encapsulating steps into a class) be wrapped up in nearly any user interface.

patleeman · on Nov 2, 2021

Fair. In my mind, easier in the sense that past the initial script I'm just writing markdown for future checklists. It's also compatible with existing markdown documentation.

hollasch · on Nov 5, 2021

That would make sense if your intent to was to remain a checklist forever.

lifeisstillgood · on Nov 2, 2021

Weirdly I have something like this called SOPys (usual python joke). It is pretty much that for pretty much the same reason only this is waaay better explained.

So - got my external validation . Will expand that repo when i get my own time back

tomrod · on Nov 2, 2021

As others have stated, "do-nothing" may have some baggage.

Consider something like "incrementally inclusive" or "eventually complete" or similar.

didip · on Nov 2, 2021

So... I am confused about something, isn't it better to start writing some automation scripts and document the missing piece for later?

slim · on Nov 2, 2021

That's implied. Optimisation is not premature when obvious.

throwaway20371 · on Nov 2, 2021

That's basically what the do-nothing script is. The difference is that before you write any automation, you document all the steps in the script. Right there - when you've got it all written down, and no automation work has been done yet - that in itself is a very valuable piece of work. Now you can point anyone in the company to that script, and they can all accomplish the task without having to figure it out for themselves. You can now scale that process N times (N = the number of people in your company). Just writing down the steps has become a force-multiplier of repeatable work. Then as you begin automating each step, people automatically receive the benefit of that automation. Because the documented steps and the automation are in the exact same place, both will always be up-to-date.

sakesun · on Nov 3, 2021

Soon AI will generate do-nothing script for us.

andai · on Nov 3, 2021

I was just thinking, I should make a tool to make these on the fly, I have ADHD and constantly forget where I was in a process. I also have many projects running at any one time, it would be great to be able to leave them open and pick up where I left off.

dragontamer · on Nov 2, 2021

How about you just... cut out the middleman and write an expect script (or pexpect script, for you Python programmers) ???

rthomas6 · on Nov 3, 2021

Why a class for each step instead of a function? They don't have individual contexts and don't even have any members. They don't even need to persist, it's literally a command line script. Object oriented style for no reason is my pet peeve.

lenkite · on Nov 2, 2021

This is nothing but a checklist right ?

andrewflnr · on Nov 2, 2021

It's an interactive checklist with a gradual upgrade path to full automation.

throwaway20371 · on Nov 2, 2021

This is the way. I wish this were taught in computer science class, development bootcamps, operations team onboarding, anywhere there is a procedure that is even slightly complicated to automate. It is the absolute best solution there is.

* Documentation of the entire procedure is contained in one place. No need to go sifting through 20 different sources of documentation. This lowers the human emotional barrier to "just get it done", as people will always avoid things they aren't comfortable/familiar with, or don't have all the steps to. This central point of documentation also enables rapidly improving the process by letting people see all the steps in one place, which makes it easier to fix/collapse/remove steps.

* Automation in small pieces over time avoids the trap of "a project" where one or more engineers have to be dedicated to this one task for a long period of time. Most things shouldn't be automated unless there is demonstrably greater value in the cost of automating them than the cost of not doing so. Automating only the most valuable/costly pieces first gives immediate gains without sinking too much into the entire thing.

* One unified "method" to encapsulate any kind of process means your organization can ramp up on processes easier, reducing overall organizational cost.

* In the absence of any other similar process, you are guaranteed to save time and money.

I would say that the only potential downside is if someone decides to "engineer" this method, making it more and more and more complicated, until it loses its value. KISS is a requirement for it to be sustainable.

csdvrx · on Nov 2, 2021

> only potential downside is if someone decides to "engineer" this method

It can be engineered, if you follow a gradual process.

On servers, I keep a log of what was deployed in a root directory following the sequential number _ goal format (ex: 00_partitions ... 90_web_server)

It is not fancy, most of the logs are not even scripts: many are just ASCII text files, that will only be used as a checklist if the same "goal" has to be achieved again. For example, 00_partition may be "gdisk /dev/nvme0n1" followed by a copy-pasted list of the partitions and some quick description about why it was done that way.

But that's on the first iteration only: the next iteration turns that into "do-nothing" script, the next iteration into a better script with basic checks (supporting both /dev/nvme0n1 and /dev/sda), then exception handling (if partitions already exist, etc), and so on: this gradual complexification process avoids the "premature optimization" of creating infrastructure-as-code for what you rarely need, while optimizing and fine tuning the parts you most often need.

Someone will certainly mention Terraform, or Ansible, or something else - yes, they exist and they are nice, but if you are doing everything there, you are over-engineering and wasting time: not everything needs your equal attention!

If you only install a webserver once in a blue moon, make a .txt checklist of the steps you followed.

But if you leave and breathe nginx options and certificate deployment, fully automatize all that, including the obscure details of what may fail if you use let's encrypt with some specific DNS configuration!

And if you don't know yet which is which, start small (a .txt checklist will cost you a few minutes) and the next time you find yourself doing the same thing, do it better using the previous artifact (the .txt file) to create a better one (a script, then a better script etc)

chousuke · on Nov 2, 2021

I would argue that all deployments (no matter how small) should have configuration management.

In the simplest case, a deployment consists of: 1. Install packages 2. Install configuration files, possibly from templates. 3. Configure services to start on boot.

This kind of automation is trivial to do with almost any tool, but there's no reason not to use something like Ansible that's designed for infrastructure automation, because you get encrypted secrets, templating and idempotency with zero effort and the result can be stored in a git repository somewhere.

The Ansible playbook is as fast to write as installing and configuring things manually, and after some practice, it's faster and the resulting system is of higher quality.

Even if you end up using the automation you write only once, it still has value because it doubles as a formalized description of what you did, and can be stored together with additional documentation. Over time you will also accumulate a library of bits and pieces that you can copy over to new setups, further improving your speed and quality.

csdvrx · on Nov 2, 2021

> I would argue that all deployments (no matter how small) should have configuration management.

I would argue that in most cases, they don't need anything but some documentation explaining what was installed, and why.

I will take a word file with screenshots over a broken script in an obscure language every single time.

> but there's no reason not to use something like Ansible that's designed for infrastructure automation,

There's a big one: my time isn't free.

If someone is willing to waste money on that, sure, I'll be happy to bill them for their extravagant tastes (but only after having done my best to explain them it's a waste of money)

And still, I will think about the next person that may have to maintain or tweak whay I wrote, so I will also leave a document full of screenshots in case they don't know ansible or whatever new fashionable tool that the client may have specifically requested.

> it's faster and the resulting system is of higher quality.

Not everything needs to be of high quality.

Forgive me if I'm assuming your gender, but I see a lot of black-and-white thinking among male sysadmins/devops: it's good or it's bad, it's high quality or it's not.

I prefer to have a "sufficient" degree of quality: if a checklist is enough, I will not waste time writing a script. If a shell script is enough, I will not waste time writing proper code - and so on.

> Over time you will also accumulate a library of bits and pieces that you can copy over to new setups, further improving your speed and quality

Except you assume a continuous progress, without any change of scope or tools, and with the tools themselves never evolving. It doesn't work like that: over time, you will accumulate a bunch of useless code for old versions.

Even small inconsequential changes (like unbound in debian 11 requesting spaces before some options, which wasn't the case before) will take some time and effort. Why waste your energy one one shots?

The do-nothing approach argues that you should avoid premature optimization, which strikes me a good approach in software in general.

brulard · on Nov 3, 2021

Such a needless dismissive use of "male" word here. Devops guys I work with are very reasonable and I have not seen a female one in my whole career.

loxias · on Nov 3, 2021

> Forgive me if I'm assuming your gender, but I see a lot of black-and-white thinking among male sysadmins/devops: it's good or it's bad, it's high quality or it's not.

Male lifetime linux nerd here, who started as a sysadmin, checking in just to say that I agree with everything "policy related" in your comments on this article. Knowing where to tune the knob between "high quality"/"good architecture" vs "can i just get this done now and move on?" is difficult, at least I don't know how it could be taught other than experientially.

IME, the predilection to see things as black-or-white is more correlated with age, than gender.

Anyway, "not all men". :P

csdvrx · on Nov 3, 2021

> checking in just to say that I agree with everything "policy related" in your comments on this article

Your nick seemed familiar - now I remember, I read your great comment in "I just want to serve 5TB" earlier today!

I also agree with everything you wrote about simplicity in software development: I'll take almost every time some dirty php scripts running baremetal over Docker + Golang + Kubernetes + Terraform + Gitlab + Saltstack + Prometheus + the new fashionable tool because with so many parts now begging for attention, nothing will get done quickly - if we're lucky and something gets done.

Knowing where to tune the knob is indeed very difficult, and I'm afraid most people now are just doing a cargo cult of whatever google does, except they are not google, and they don't understand the tradeoffs or the possible alternatives.

But at the scale of most companies, it's a folly to sacrifice flexibility and simplicity to some unachievable desire for software perfection!

It's also a very costly hubris: I have been asked way too often to improve the performance after having thown very expansive hardware at the problem, that still performs miserably due to missing the big picture.

The solution was almost always removing the useless parts, or when trying to disentangle the architecture astronaut fancy mess would have been too costly, start from scratch with a saner design: most recently, I replaced a few hundreds java files (and test and stuff) by about 10 lines of bash, and 20 lines of awk.

My work is not fancy, but it works, unlike the previous solution that was going to be ready the next month, every month, for almost a year...

To all those who want to do things like google, maybe apply there instead of over engineering/polishing your CV with fancy keywords at your employer or client expense?

> IME, the predilection to see things as black-or-white is more correlated with age, than gender.

I had noticed this weird pattern, and it was my best explanation even if I didn't like it much, because it's sexist.

But your version seems more plausible (Occam's Razor!), so thanks a lot for taking the time to post!

chousuke · on Nov 2, 2021

I do not think automation is "premature optimization", nor do I that think everything needs to be high-quality; I did not say that. I do think, however, that everything you do should be of acceptable quality.

And for me, having configuration management is the minimum level of acceptable quality. It's simply not possible to have acceptable quality of a system without some form of configuration management. I can't recall a single instance where I (or anyone else involved) ever said "wow, this unmanaged mess sure works well" :P

In some cases, the management can be as simple as a comment in some script explaining some part of the process was done manually, or simply a periodic snapshot backup of the server that can be restored when the configuration is broken, but the point is that a process must exist and it must be consciously chosen.

Free-form documentation is not an alternative to configuration management either; if you can document your configuration in a wiki, you might as well put it in a git repository in the form of a script or a template.

When done properly, It's the exact same amount of effort, except when you use automation tools, the documentation is in a format that's not ad-hoc and can actually help do the things it documents instead of requiring a human to interpret them (possibly introducing mistakes). "Setting up" Ansible requires literally nothing but a text file containing your server's hostname, and SSH access, which you already will have.

Also, I don't know where you got the idea that I would somehow assume unchanging scope? I am the first person to throw away useless code and tools; I consider code my worst enemy and it's practically my favourite activity to delete as much of it as I can. If some piece of automation is no longer fit for purpose, it gets rewritten if necessary. Throwing away code is no big deal, because the tools I use allow me to get things done efficiently enough that I can spend time refactoring and making sure the computer properly handles the tedious parts of whatever I'm working on.

Your unbound example is something that is trivially solved with configuration management. After an upgrade, you notice your configuration does not work, navigate into your git repository, update the template, and then deploy your changes to any server that happened to be running unbound using that same template (because you might have redundancy, if you're running DNS). If you make a mistake, you revert and try again. There is no manual ad-hoc process that comes even close to allowing you to manage change like this, but it is trivially enabled by existing, well-understood automation tools.

csdvrx · on Nov 3, 2021

Your definition of "acceptable quality" is my definition of "overengineering".

It does not take the same amount of effort, if only cause you mention how for unbound, you have to update the template.

For one shots, this is overkill.

randomswede · on Nov 3, 2021

For "truly one-shot", you're right. But a "truly one-shot" is not a production machine, it is a test bed, informing what the eventual production machine should look like.

Because even if you will only ever have a single production machine, it will have something go horribly wrong with it and need recreating from fresh hardware (or from a fresh VM or whatever).

I guess, if you're cloud-based, you could turn your finely tuned test box into a template, then you have something that is (effectively) scripted.

chousuke · on Nov 4, 2021

Leaving aside all the other benefits and even if you never need to rebuild your system, having some sort of IaaC automation in place allows for extremely powerful change management. When your system is defined as code[0], change over time can be reviewed with a "git log -p", which definitely beats searching through ticket comments or ad-hoc documentation and attempting to reconstruct the history of change.

It's a no-brainer nowadays that software should be developed with version control. I don't see why infrastructure should be treated differently.

[0] Ansible playbooks are code, no matter what some people may think. It's a declarative'ish programming language with a silly syntax.

chousuke · on Nov 3, 2021

There's no such thing as an oneshot if you're creating a system that someone will actually use and depend on.

All systems have a lifecycle, and even on a "trivial" system you have backups, access, monitoring, logging and security maintenance to worry about even before you consider how installing any useful software affects those things.

There are exceptions to any rule, of course, and I did in fact create a system where the configuration management is a snapshot backup just two weeks ago; but that system has no data on it, its lifecycle is expected to last for less than a year, and if/when it breaks, a backup restore can be performed without any additional considerations. It was also an emergency installation into a network that's not easily accessible with SSH, which is why I did not just use Ansible from the start.

I thought it would be a oneshot, but I did end up having to create a second instance of the system a few days later, fortunately with less emergency :P

Still even ignoring that, I fail to see what could possibly be overkill about literally 3 small files in a git repository. You call "overengineering" what is to me "5 minutes of effort with extremely relevant upsides". That's literally how much time it would take me to create a playbook for unbound if I already know what the configuration needs to look like; probably less, but most of the time will be lost to context-switching overhead.

My point being, most of the time will be spent actually configuring the software and the automation overhead is nothing in comparison compared to the value you get from it, and that's why I generally automate things by default: It provides more value than I put in effort.

When you start of learning configuration management and infra automation tools, there's a learning curve; in the beginning, you will be "wasting" time learning (what a silly statement) how to use your tools effectively, but with practice, you will learn how to effectively use the tools and where to apply them and how to approach managing specific kinds of systems such that over time, using the automation tools is simply easier and faster than doing it manually. That's what I meant when I mentioned "higher quality" earlier; you get it for free, with no effort, once you've put in a bit of practice first. It just sounds to me like you're arguing against doing things well in favour of doing things with strictly inferior tools.

rgj · on Nov 3, 2021

> Not everything needs to be of high quality.

But stuff connected to the internet needs to be or it will be compromised before you even finished installing it.

minetest2048 · on Nov 3, 2021

Rant/question:

The word software configuration management can mean 2 related but different thing:

1. Configuration management in system engineering sense, which is is a process to systematically manage, organize, and control the changes in the documents, codes, and other entities during the Software Development Life Cycle (guru99).

2. Something to manage your config files, from something simple as python/bash scripts to full infrastructure-as-code solutions such as terraform and ansible

When I think about configuration management, I (and the parent) thinks about the second meaning, but if I googled that, all of the search results points to the first meaning

chousuke · on Nov 3, 2021

Both are important. A good CMDB is key in finding your documentation that points you at the configuration management used for the actual system.

Let me tell you, just a wiki with a search is not enough beyond a certain size, and you hit that faster than you'd think.

crispyambulance · on Nov 3, 2021

> [...] all deployments (no matter how small) should have configuration management [...] Ansible [...] with zero effort [...]

No.

It's a powerful tool, Ansible, but let's not get carried way. There's a ton of complexity behind the scenes. If you over-do it you end up with a ream of ugly yaml and you're fighting with the tool as much as you are any real problems.

Arch-TK · on Nov 2, 2021

This is assuming you already know Ansible.

chousuke · on Nov 2, 2021

Sure, but then again, typing shell commands into text files is assuming you already know shell commands. You have to spend time to learn your tools at some point.

For simple configuration management, Ansible is a straight upgrade to most shells because of idempotency alone, never mind the fancier features like the more advanced modules, multi-node orchestration, or encrypted vaults. The YAML syntax is dumb and it has its issues, for sure, but it still does even the simple things much better than plain old shell.

Anyone who has any familiarity at all with UNIXy systems can learn Ansible from zero well enough in a day or two for it to start becoming truly useful, and if you don't have the foundation for that... why on Earth are you setting up a web server? I mean, it's of course fine to tinker with things for learning, but I was assuming a real deployment scenario.

Arch-TK · on Nov 2, 2021

> Sure, but then again, typing shell commands into text files is assuming you already know shell commands. You have to spend time to learn your tools at some point.

Don't I still need to know shell programming for Ansible? Or at least know all the systems I want to manage with it inside out?

Yes, I need to learn tools at some point. But as I see it, I am not a system administrator of anything but my own network of 8 infrastructure hosts. The effort required to recreate this with ansible (and I don't think ansible can actually idempotently handle ALL of these devices, not without serious limitations) seems far greater than maintaining a few scripts and keeping backups. Also, I already know bash (unlike ansible).

> Ansible is a straight upgrade to most shells because of idempotency alone

So, as I said, I know nothing about Ansible. But idempotency implies that Ansible always starts from nothing and builds from there. Does this mean that every time I want to change my server I have to wait 15 minutes for it to re-install the distro and re-configure everything? Do I have to keep my state on a different server? I don't see how this can't be achieved with just as much hassle with a script?

Surely I misunderstand this. But if I did, then surely it's not THAT idempotent.

> Anyone who has any familiarity at all with UNIXy systems can learn Ansible from zero well enough in a day or two for it to start becoming truly useful

My problem with this is that every time I've looked into Ansible, it didn't look like a day of work. It looked like a week of work converting my entire infrastructure to it, for very little benefit, in addition to having to change the way I do a lot of things to fit the Ansible blessed method of doing them. It may take a day to learn Ansible but it probably takes even more time than that to learn it to a standard where I would consider the knowledge reliable. It would require making mistakes and lots of practice before I felt like I could quickly recover from any mistake I could make using it as well as avoid those mistakes. Not just that, but because of my nonstandard setup I would likely have to spend extra time learning Ansible well enough that I can actually replicate my nontrivial setup.

jjnoakes · on Nov 3, 2021

> idempotency implies that Ansible always starts from nothing and builds from there

No, it doesn't. In Ansible you say something like "make sure apache is installed" and if apache is installed, nothing happens. If it isn't, it gets installed. Then you say "make sure apache is running" and if apache is running, nothing happens. If it isn't, it is started.

Arch-TK · on Nov 3, 2021

Okay, this is a rather limited form of idempotency. I don't see the advantage. My system's package manager and service manager already perform this function.

jjnoakes · on Nov 3, 2021

You should really spend a little time learning ansible before you critique it. Ansible isn't perfect, but the things you describe aren't how ansible works in general, so they aren't even valid criticism.

For example, it has idempotent modules for all sorts of things - contents in files, files and directories in the file system, etc - things that you COULD script in an ad-hoc and verbose way, but things which come built-in as one-liners in ansible.

It's quite convenient.

Arch-TK · on Nov 5, 2021

There are no resources which are seemingly suitable for my environment. If you're going to claim that I'm missing something, rather than telling me that I have things to learn (no shit sherlock), you could tell me specifically which initial impressions are wrong.

jjnoakes · on Nov 8, 2021

I did, a few comments up. This:

> idempotency implies that Ansible always starts from nothing and builds from there

...is wrong. It might be true that Ansible is unusable in your environment for some reason, but that's quite different fromage this specific false claim.

Here are a few more quotes that imply you should learn about Ansible before critiquing it for your use case (or, if you don't have time, then refrain from critiquing it in general):

> Don't I still need to know shell programming for Ansible?

No, Ansible uses a custom non-shell syntax and python modules. You can dip into shell scripts but you don't have to. Examples are everywhere in the Ansible documentation.

> Does this mean that every time I want to change my server I have to wait 15 minutes for it to re-install the distro and re-configure everything?

No. Ansible will examine your existing system and apply the changes you configure. Idempotency does not imply or require a functional-like OS or rebuilding from scratch; Ansible is more imperative.

Too · on Nov 3, 2021

> Don't I still need to know shell programming for Ansible?

No. Ansible has its own built in functions for creating files, managing systemd, docker and so on. These are built with idempotency in mind.

You can however call out to shell for situations where there is no built in. There are a lot of people who only ever use this role, and just see ansible as the fleet orchestration layer. Which imo defeats most of the benefits of using it, you might as well ssh a full script in that case.

As a side note I wouldn’t actually recommend Ansible for server management. Like you say learning all these blessed roles feels like relearning basics you already know and the syntax and directory structure is messy. It has no place if you use containers.

Arch-TK · on Nov 3, 2021

> Ansible has its own built in functions for creating files, managing systemd, docker and so on. These are built with idempotency in mind.

Do I still get idempotency if I do not use systemd or docker?

> You can however call out to shell for situations where there is no built in. There are a lot of people who only ever use this role, and just see ansible as the fleet orchestration layer. Which imo defeats most of the benefits of using it, you might as well ssh a full script in that case.

So it sounds like I wasn't entirely wrong in my first impressions that it would be useless for my situation where I don't think any of the "built ins" would really be suitable. Of the 8 machines on my network, only one has systemd (and I'm in the process of phasing it out because systemd seriously struggles to deal with services with dependencies on specific network interfaces being "UP", these issues are documented by freedesktop[0]).

> As a side note I wouldn’t actually recommend Ansible for server management.

Given the background of my infrastructure being a mixture of FreeBSD, OpenBSD, non-systemd Linux and systemd Linux machines. What would you recommend?

[0]: https://www.freedesktop.org/wiki/Software/systemd/NetworkTar...

rgj · on Nov 3, 2021

You seem to have a very strong opinion about Ansible while you keep emphasizing that you don’t know anything about Ansible at the same time.

As a result, all your arguments against Ansible seem to be based upon assumptions, some of them completely false.

Arch-TK · on Nov 3, 2021

I have opinions of my limited experience of trying to look into ansible once.

Why don't you tell me which "arguments" (correction, they're my opinions) are based on false "assumptions" (correction, they're my impressions) rather than just giving me this blanket statement to work from?

chousuke · on Nov 3, 2021

If you already have something that works, by all means, stick with it.

If you want to learn Ansible, you don't even have to throw away your scripts; Ansible is a perfectly good way to run ad-hoc scripts if that solves your problem better than writing a full-blown playbook or even a custom module.

Ansible is weird and annoying in the beginning, but it's still a good tool to learn on top of your existing knowledge, because it provides extremely useful features beyond what's possible with plain old shell, and more importantly, it's a common language for system administration tasks that anyone can learn and understand without having to figure out how your specific scripts accomplish the things that Ansible gives you for free. The same applies to any management tool like Terraform, Puppet or even Kubernetes manifests. I put my expertise in my Ansible scripts and provide an easy interface to them such that a more junior person can, say, upgrade an Elasticsearch cluster by issuing a documented "make upgrade" (I like to use Makefiles to provide a neat interface for "standard" operations. "make help" and anyone can get going.) command that does everything correctly even though they have no idea how to actually upgrade it manually. If they wanted to learn, they have all the resources available required to read and understand my playbooks and figure it out without me being there to teach them the particulars of whatever unholy custom script setup I might have used instead.

Ansible is also mostly useful once you already have a server up and running but with 0 configuration; it's pretty bad at actually installing new servers, and I'd recommend using better tools for that part (Terraform, kickstart, or maybe just a script that clones an image). Just a manual next-next-next install is also perfectly acceptable way to get the base OS installed if the defaults are fine, though beyond a few servers it's a good idea to have a better process.

My perspective is that of someone who works with very varied systems daily, ranging in size from one to hundreds of nodes. I can manage that kind of scale alone because I use automation, and Ansible in particular is a tool that fits extremely well in the 1-20 "size range" for an environment; It is extremely lightweight and low-investment and can be used for even single nodes to great effect; once you get beyond a couple dozen, something more "heavyweight" like Puppet will start showing its usefulness.

As for idempotency, it's a very useful feature for automation: basically "Only do something if it is required". With a shell script, you have to implement manual checks for everything you run such that if you re-run a script on a system where it's already been run once, you won't accidentally break things by applying some things twice. A side benefit of this is that you can run your playbooks in "check mode", ie. "Tell me what you would do, but don't actually do it". Extremely useful and very error-prone to implement manually (Ansible doesn't always get it right either).

csdvrx · on Nov 3, 2021

> With a shell script, you have to implement manual checks for everything you run such that if you re-run a script on a system where it's already been run once, you won't accidentally break things by applying some things twice

Using tools like grep + basic logic like || and && goes a long way...

I'm not saying there is no place for ansible, but in my personal experience, it's a very small one.

> Ansible is also mostly useful once you already have a server up and running but with 0 configuration; it's pretty bad at actually installing new servers, and I'd recommend using better tools for that part (Terraform, kickstart, or maybe just a script that clones an image).

Agreed!

More recently, I've found zfs clones of base installs surprisingly flexible.

Now I only with there was a way to do some kind of merge or reconciliation of zfs snapshots from a common ancestor that haven't diverged much in practice, spawning the differences into separate datasets per subdirectory (ex: if /a/ hasn't changed but only /a/b/c/d1 and /a/b/c/d2 differs, move d1 and d2 off to create a separate d dataset mounted in /a/b/c/ so you can keep the common parts identical )

rambambram · on Nov 2, 2021

Oh, what do I love .txt files. I use them for all kinds of simple checklists, logs, and basically everything. I had a manager once (not in the software field though) and she asked me annoyed why I kept using these strange files, and if I wanted to "just use Word and .doc files because that was more safe and compatible". I wasn't able to explain that text files were there and will be there for a long time. She also didn't understand the difference between a text file and a document. Not even when I pointed her to the .txt in the filename.

throwaway20371 · on Nov 2, 2021

> If you only install a webserver once in a blue moon, make a .txt checklist of the steps you followed.

This brings up a very important point about checklists that I don't think gets enough attention.

The problem happens when somebody "updates" that web server in-place. If they try to record what changes they made in the middle of the checklist, eventually when someone tries the whole checklist from the beginning, they'll find it's now broken; the steps aren't working as expected. This happens to me when I try to record changes in my VirtualBox configuration after I add a new system package or something; later I try to re-deploy my vbox, and it breaks.

So checklists should be considered immutable. Once you create them, don't assume they will work again if modified. Instead, if you make any change to the checklist, you must follow all the steps from beginning to end. This way you catch the unexpected problems and confirm the checklist still works for the next person.

notatoad · on Nov 3, 2021

i just went through this with a colleague this afternoon, and i was super happy when i realized what we had finally accomplished:

he asked if there was any way to access a server, and the answer was "no". the only way to "access" our production server is to modify the provisioner script. there is no way to "update it in place". it's taken a while to get here, but it's really freeing to realize that yes, i have the credentials and could probably get in, but i know my changes would be automatically reversed in the near-term and there's no point in even attempting to access a server directly. the server belongs to the deploy script, not to me.

csdvrx · on Nov 3, 2021

> the server belongs to the deploy script, not to me.

I prefer it when both the server and the deploy scripts belong to me :)

"infrastructure as code" with no way or extremely limited possibilities to ssh for emergencies strikes me as foolish overengineering / painting yourself in a corner, but if you like that, why not?

theshrike79 · on Nov 3, 2021

The possibility to ssh in an emergency is also a possibility ssh in when it's not an emergency and "just quickly change this one thing".

And then the server gets deployed via the script and the quick change isn't there any more.

Whoops.

My EC2 instances are all configured so that they can't be accessed from the outside. They boot up, fetch their install script from a set location and run it.

If they need changes, I either update the base image or the install script.

csdvrx · on Nov 3, 2021

> If they need changes, I either update the base image or the install script.

You lose some time and flexibility, just because you are afraid you may forget to integrate the quick change in your scripts.

My bash histories go into a global database to avoid this

nl · on Nov 3, 2021

Cattle, not pets.

Using SSH is what you do if it's a pet.

ynx · on Nov 3, 2021

Well...if I'm only tending to, say, three or four cows, then they may as well be pets for my purposes, even if most of my management is systematic.

You can do a lot with four servers.

csdvrx · on Nov 2, 2021

> The problem happens when somebody "updates" that web server in-place.

Imagine this is 28-nginx : I would create another script 29-nginx-update only recording the update, even if it: "echo apt-get update; apt-get upgrade nginx ; echo "make sure to fix variable $foo"

Next time I have to do that, I will integrate that into 28-nginx and remove 29-nginx-update

> eventually when someone tries the whole checklist from the beginning, they'll find it's now broken; the steps aren't working as expected.

Maybe I don't understand the issue, but my scripts or text files are simple and meant to be used in sequence. If I hack the scripts, I make sure it still works as expected - and given my natural laziness, I only ever update scripts when deploying to a new server or VM, so I get an immediate feedback if they stop working

Still, sometimes something may work as expected (ex: above, maybe $foo depends on a context?), but it only means I need to generalize the previous solution - and since the script update only happen in the context of a new deployment, everything is still fresh in my head, so I can do it easily

To help me with that, I also use zfs snapshots at important steps, to be able to "observe" what the files looked like on the other server at a specific time. The snapshots conveniently share the same name (ex etc@28-nginx) so comparing the files to create one ot more scripts can be easily done with diff -Nur using .zfs/snapshot/ cf https://docs.oracle.com/cd/E19253-01/819-5461/gbiqe/index.ht...

Between that + a sqlite database containing the full history of commands types (including in which directory, and their return code), I rarely have such issues

Shameless plug for that bash history in sqlite: https://github.com/csdvrx/bash-timestamping-sqlite

> So checklists should be considered immutable. Once you create them, don't assume they will work again if modified. Instead, if you make any change to the checklist, you must follow all the steps from beginning to end.

I agree: if I don't have time to fix 28-nginx, I write 29-nginx-update instead, with the goal next time to integrate it. But I don't try to tweak 28-nginx if I know I won't have the time to test it.

throwaway20371 · on Nov 2, 2021

It can work this way (that's how software patches have historically worked) but if you don't test it from the beginning, you will still find the odd case where that added step is broken, even though it seemed like it should have worked. The more you use that method, the more chances for breakage.

If you don't want to repeat the steps from the beginning, you could make a completely separate checklist to be followed on a given system that includes things like "make sure X package is installed", "make sure Y configuration is applied", so that the new checklist accounts for any inconsistencies. This is pretty common anyway as checklists are broken up into discrete purposes and mixed and matched.

nerdponx · on Nov 2, 2021

The problem I see is that someone will inevitably update the procedure (or make a change that unknowingly requires a change in the procedure) and not update the script. Either because they are pressed for time or because they forgot. Same as any other documentation.

The solution ultimately is for PMs to get it into their heads that software and infrastructure require maintenance like anything else, and consistently refusing to schedule time for software/dev-tool maintenance (such as updating documentation) has the same effect as refusing to schedule time for physical equipment maintenance. Then and only then do engineers have the freedom to set up mandatory procedures and checklists for their work, the way all engineers should be allowed and encouraged to do.

masukomi · on Nov 2, 2021

> The problem I see is that someone will inevitably update the procedure (or make a change that unknowingly requires a change in the procedure) and not update the script

why would your procedure be to do anything _other_ than "run script foo and do what it says"? If your procedure is not that, then your procedure doesn't reflect reality, and thus is outdated documentation that needs to be updated.

if the steps of the procedure only exist within the script then there's only one place to update it. And yes, this suggests the script should be very readable.

nerdponx · on Nov 2, 2021

> If your procedure is not that, then your procedure doesn't reflect reality, and thus is outdated documentation that needs to be updated.

Configurations change all the time. There is no technological safeguard against someone forgetting to write down the change in the playbook script; it has to be organizational.

chousuke · on Nov 2, 2021

Declarative configuration management systems solve this by unchanging your configuration after someone messes with it manually. :) Hard to forget to change the automation when it persistently undoes all your hard labour.

You can help solve the problem with technology, you just have to make the solution easier than working around it.

nineteen999 · on Nov 3, 2021

> Declarative configuration management systems solve this by unchanging your configuration after someone messes with it manually

Not always, there are frequently ways to do an "end-run" around tools like Puppet and Ansible; take for example the following list of /etc/*.d directories on a Redhat distribution:

/etc/bash_completion.d

/etc/binfmt.d

/etc/chkconfig.d

/etc/cron.d

/etc/depmod.d

/etc/dracut.conf.d

/etc/gdbinit.d

/etc/grub.d

/etc/init.d

/etc/krb5.conf.d

/etc/ld.so.conf.d

/etc/logrotate.d

/etc/lsb-release.d

/etc/modprobe.d

/etc/modules-load.d

/etc/my.cnf.d

/etc/pam.d

/etc/popt.d

/etc/prelink.conf.d

/etc/profile.d

/etc/rc0.d

... <snip> ...

/etc/rc6.d

/etc/rc.d

/etc/rsyslog.d

/etc/rwtab.d

/etc/statetab.d

/etc/sudoers.d

/etc/sysctl.d

/etc/tmpfiles.d

/etc/xinetd.d

/etc/yum.repos.d

Someone can manually log onto the environment and drop additional configuration files into those directories that vastly effect what is run on the system (and when it's run in the case of cron.d for example).

"Idempotency" tools like Puppet and Ansible are very good at saying, "this file should exist in this directory with this MD5 hash", but not as good at saying "this directory shouldn't contain anything except these files".

Of course you can list all the files out that you consider to be valid and their signatures in the above directories, but that's going to break next time Redhat pushes an update that installs/removes files from those directories.

I guess you could setup an audit script that checks that all the files in those directories match the expected RPM signatures, and then account for any local customisations (additions, removals, changes etc). But you are starting to get into a lot of extra work there.

Point I am making, is that these tools are not as forcibly idempotent as a lot of people assume.

chousuke · on Nov 3, 2021

Of course; no tool is perfect. But in the general case, they're good enough, and they do help.

For example, I manage nodes with Puppet, and Puppet can and will "clean" things like sudoers.d, yum.repos.d, nginx.conf.d etc. of files that it does not manage.

I don't do this for every possible directory because so far configuration drift in those has not been a problem and generally whatever comes from the packages by default functions fine and crucially, the system can be rebuilt from scratch using the configuration that is managed, so the important bits are there.

I will simply start managing more directories as needed.

csdvrx · on Nov 2, 2021

A script comparing the md5 or the timestamp of the configuration files against the md5 or the timestamps of the log entry in charge of these files can do that

I mean, if /etc/hosts is more recent than /log-directory/03-static-hosts-in-etc or the md5 you have recorded for this file, a daemon can easily create a ticket / send an email to whoever was logged at the time of the change.