
I think the counterargument here is that you're now including a CSV decoder in every CSV data file. At the data sizes we're talking about, that's negligible overhead, but it still seems overly complicated to me, almost like it's trying too hard to be clever.

How many different storage format implementations will there realistically be?



> How many different storage format implementations will there realistically be?

Apparently an infinite number, if we go with the approach in the paper /s


It does open up the possibility of specialized compressors for the data in the file, which might be interesting for archiving, where an improved compression ratio is worth a lot.


That makes sense. I think fundamentally you're trading off space between the compressed data and the lookup tables stored in your decompression code. I can see that amortizing well if the compressed payloads are large, or if there are many payloads with the same distribution of sequences.
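To make the amortization point concrete, here's a small sketch (my own illustration, not from the paper) using zlib's preset-dictionary support as a stand-in for the "lookup tables": shipping the dictionary alongside every small payload costs you its size each time, while sharing it once across n payloads costs it only once. All the names and numbers below are assumptions for illustration.

```python
import zlib

# A preset dictionary standing in for the decoder's lookup tables.
# (Hypothetical repeated-header data; real dictionaries would be trained.)
shared_dict = b"timestamp,sensor_id,reading\n" * 4

def compress(payload: bytes, zdict: bytes) -> bytes:
    """Compress with an optional preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(payload) + c.flush()

payload = b"timestamp,sensor_id,reading\n2024-01-01,7,3.14\n" * 50
with_dict = compress(payload, shared_dict)

# Round-trip check: the decoder needs the same dictionary to decompress.
d = zlib.decompressobj(zdict=shared_dict)
assert d.decompress(with_dict) + d.flush() == payload

# Embedding the dictionary in every file adds len(shared_dict) per payload;
# storing it once amortizes that fixed cost across all n payloads.
n = 1000
per_file_total = n * (len(with_dict) + len(shared_dict))
amortized_total = len(shared_dict) + n * len(with_dict)
print(per_file_total, amortized_total)
```

The inequality only flips in favor of embedding when the fixed decoder/dictionary cost is tiny relative to each payload, which matches the point above: large payloads, or many payloads with the same sequence distribution, are where this pays off.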



