Ask HN: A software to track errors and group into root causes?

sltr · 2025-12-16T16:01:07 1765900867

To help draw out some concrete software requirements, can you expand on what a "good way to manage the failures" would be and how what you've tried "does not work very well"?

- How many failures per day do you experience?

- How long do you need to retain records?

- What's the lifecycle of a failure? How does it get recorded, who investigates/triages, who responds?

- What are your custom scripts doing? Importing/exporting tickets?

- SSO, audit, compliance needs?

- You said JIRA was slow and limited. Is that just the UI or is creating/managing tickets too cumbersome? (Or yes to both X-D)

- What specifically broke with Google Sheets? Not enough rows/too slow?

You may well be looking at custom software, since a lot of the apps that come to mind are focused either on aggregation (Datadog, etc.) or on high-touch tickets, e.g. product development or customer support.

theamk · 2025-12-16T19:22:26 1765912946

The process is pretty informal, so there are no hard requirements. That said:

- Failures per day: let's say 0-100

- How long to retain records: no hard requirements? I guess a few months at least, some failures are pretty rare

- What's the lifecycle of a failure? Scripts record it, team members investigate it and assign it to a "root cause".

- Custom scripts:

(1) create ticket per failure

(2) create failure reports (to prioritize work - for example, if there were 50 failure reports with a root cause of 'github was down', the priority of "set up github mirror" will get bumped up)

(3) mass-update tickets (for example, if github.com is down, there will be a few dozen failed processes because of that)

(4) handle rules for automatic classification (again, if github.com is down, it'd be lovely if I can have a rule: "for the next 48 hours, every ticket which mentions github.com and 503 is auto-assigned to 'github was down' root cause")

- SSO, audit, compliance: nice but not required

- JIRA problems: search sucks. "Find similar ticket" sucks. Rules are missing (or need admin). Even something as simple as "close those 20 tickets and link them all to ABC-1234" is impossible.

- Google Sheets: not enough automation. At least I can do "filter rows, copy-paste the 'root cause' field into all of them", and it is pretty fast, but multi-line outputs don't look good and there is no automation (we did not explore Apps Script, maybe we should have...)
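
For illustration, the time-limited rule from item (4) is small enough to sketch in a few lines of Python. This is only a rough sketch under assumptions: the `Rule` class, its fields, and the `classify` function are made-up names, not part of any existing tool, and "mentions" is taken to mean a case-insensitive substring match on the failure output.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Rule:
    """Hypothetical auto-classification rule that stops applying after expires_at."""
    keywords: tuple[str, ...]  # every keyword must appear in the failure text
    root_cause: str
    expires_at: datetime

    def matches(self, text: str, now: datetime) -> bool:
        # An expired rule never matches, whatever the text says.
        if now >= self.expires_at:
            return False
        lowered = text.lower()
        return all(keyword.lower() in lowered for keyword in self.keywords)


def classify(text: str, rules: list[Rule], now: datetime) -> Optional[str]:
    """Return the root cause of the first active matching rule, or None."""
    for rule in rules:
        if rule.matches(text, now):
            return rule.root_cause
    return None


# Assumed example: the rule is created when the outage is noticed and lasts 48 hours.
start = datetime(2025, 12, 16, 12, 0)
rules = [Rule(("github.com", "503"), "github was down", start + timedelta(hours=48))]

log = "fatal: unable to access 'https://github.com/org/repo/': error: 503"
during_outage = classify(log, rules, start + timedelta(hours=1))
after_expiry = classify(log, rules, start + timedelta(hours=49))
```

Here `during_outage` comes back as the 'github was down' root cause and `after_expiry` is `None`, so anything that still fails after the window lands back in the manual triage queue instead of being silently misfiled.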

And yeah, I am getting the feeling this would be a custom job. We have resources in house to do so, but I was hoping there was an existing product. Surely there are people out there who run batch-like jobs and want them to be reliable? Something like data conversion jobs, CI builds, training jobs, etc...

Perhaps it's a good job for generative AI; I've heard it's pretty good at making websites (and security/availability is not an issue, as this will be an internal website not exposed to the internet). Or I may revisit Google's Apps Script...

sltr · 2025-12-16T21:56:39 1765922199

Thanks for your reply. I suggest looking at Airtable and _maybe_ Linear. Both have APIs and automations. You could likely get AI to rewrite your scripts.

If those don't work, you may have a business case for building it.

I'm a founder and dev looking for a good problem to solve. If the need could be proven (e.g. 10 people with decision power said they wanted it), I'd consider making it.

rlupi · 2025-12-16T08:50:21 1765875021

You're asking for a database app. What prevents you from building one?
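
To make "database app" a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`failures`, `root_causes`, `output`, and so on) and the sample rows are assumptions made up for illustration; it only shows the bulk "assign all of these to one root cause" update and the per-cause count used for prioritizing.

```python
import sqlite3

# In-memory database for the sketch; a real app would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_causes (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE failures (
    id            INTEGER PRIMARY KEY,
    occurred_at   TEXT NOT NULL,
    output        TEXT NOT NULL,
    root_cause_id INTEGER REFERENCES root_causes(id)  -- NULL until triaged
);
""")

conn.execute("INSERT INTO root_causes (name) VALUES (?)", ("github was down",))
conn.executemany(
    "INSERT INTO failures (occurred_at, output) VALUES (?, ?)",
    [
        ("2025-12-16T10:00:00", "clone failed: github.com returned 503"),
        ("2025-12-16T10:05:00", "fetch failed: github.com returned 503"),
        ("2025-12-16T11:00:00", "out of disk space"),
    ],
)

# Mass update: assign every untriaged failure mentioning github.com and 503.
cursor = conn.execute(
    """
    UPDATE failures
    SET root_cause_id = (SELECT id FROM root_causes WHERE name = ?)
    WHERE root_cause_id IS NULL AND output LIKE ? AND output LIKE ?
    """,
    ("github was down", "%github.com%", "%503%"),
)
updated = cursor.rowcount

# Report: number of failures per root cause, most frequent first.
report = conn.execute(
    """
    SELECT COALESCE(rc.name, 'unclassified') AS cause, COUNT(*) AS n
    FROM failures AS f
    LEFT JOIN root_causes AS rc ON rc.id = f.root_cause_id
    GROUP BY cause
    ORDER BY n DESC, cause
    """
).fetchall()
```

With the three sample rows, the update touches the two GitHub failures and the report lists 'github was down' ahead of the single unclassified one. Search, "find similar", and a UI on top are where the real work would be.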

theamk · 2025-12-16T18:57:33 1765911453

Nothing, and that's the route we'll likely end up taking.

It is just that we have money in the budget for those kinds of things, and if there is an existing product, we'd rather support its creators instead.