This is why I'm super irritated Microsoft dropped EPUB support in Edge when switching to Chromium, and why it's so frustrating that EPUB support isn't common by default in operating systems: It's literally just HTML/CSS in an opinionated structure. EPUB should be as ubiquitously supported as PDF is today; no browser has an excuse for not supporting it.
One of the possible reasons Chromium is a bad fit for EPUB is its lack of MathML support. But then it's not like all EPUB readers have it, and it's not like MathJax can't be used for rendering.
It really was. And just the fact that it was default-installed was a main perk: I could rely on it being on, say, a work PC, where I can't install software for personal use.
It was the nicest one I've used on any platform. Now I read epubs on Apple Books and I come away a little frustrated every time I use it (especially when I copy and paste).
You can make copy and paste work normally in Apple Books with the solutions on https://apple.stackexchange.com/q/137047 ‘Don't want iBooks to always paste the “Excerpt From” of what I have copied’.
Addons/extensions are a major security risk. For example,
Firefox addon: Download files and read and modify the browser’s download history, Access browser activity during navigation, Access your data for all web sites
Chrome Web Store obscures the full permission list, but the comments for that extension admit that it needs the "read and change all your data on the websites you visit" permission.
It's 2021, you should view all browser addons as a threat.
I think it was probably a combination of low usage (because a lot of people either use Kindle or a DRM-encumbered EPUB platform like Adobe Digital Editions), and the fact that Microsoft likely was more concerned with prioritizing porting over other more critical functionality like Active Directory integration into the Chrome codebase.
Presumably if Google decided to add EPUB support, Edge would get it back too, but Microsoft hasn't decided that feature is valuable enough to add onto their modifications.
I agree, I am just speculating as to why Microsoft did what it did. :)
I also think it's particularly sad that Windows 10 had a perfectly good EPUB app, Reader, that Microsoft deprecated aggressively to force everyone to read eBooks in Edge... only to remove eBook support from Edge too.
They don't support MathML which also is a part of the format apparently. I think Google dropped it presumably not to have to worry about securing the code base.
One of the reasons they couldn't just port it over is that it used some Trident-specific CSS layouts to display the book pages. I remember extracting the JS/CSS they used and being very confused.
> An ebook, unzipped, is rarely bigger than 1 MB, which is less than what most pages are today.
Oh no, we can't have that. Here are some "beautiful, performant and lightweight" Electron ePUB readers/organizers: https://www.electronjs.org/apps?q=epub
Rather than complaining about the Electron apps, write better cross-platform apps that don't use Electron.
I know it's fashionable to dog on Electron, but if it didn't fill a legitimate need, people wouldn't use it.
It's easy to compare an Electron app, that actually exists, with some imaginary native app that doesn't.
It's not so easy to find the budget and personnel to actually build dedicated apps for minority platforms. The choice generally isn't between "bloated Electron app" and "sleek native app". It's between "bloated Electron app" and "no app at all".
In the case of eBook readers, there's plenty of native software that fills that role in every platform, there's hardly a need for a cross-platform Electron ePUB reader.
I suspect Electron frequently fills a need for the developers, instead of their users. It's easy to deploy, cross-platform and stable, I give you that, but the users pay for it with RAM, disk, CPU and energy.
A single user might not mean much, but multiply that by the millions of Electron programs installed, that's the scale of lost resources that pay for the advantages of Electron.
> there's hardly a need for a cross-platform Electron ePUB reader.
If there's "no need" for them, why are people using them? How come you get to decide what other people "need"?
> I suspect Electron frequently fills a need for the developers, instead of their users.
It fills the need of the users to have actual apps they can install and run, rather than imaginary ones.
> that's the scale of lost resources that pay for the advantages of Electron.
People don't hand-write programs in assembly language any more, either, even though that means that you can no longer write a word processor that runs in 12K of RAM.
I'm not saying all Electron programs are bad. VSCode, for example, is surprisingly good for many use cases, and being an IDE with many features, its resource usage is pretty justified.
I'm also not buying that people need software that only exists as Electron programs. Check the categories in https://www.electronjs.org/apps, there's even taskbar notifications and app launchers, do those merit running a dedicated browser?
It's not that users need "non-imaginary" software and Electron fills that need, it's that most users don't know about native and web frameworks, and they will install software as long as their computer can run it, even if better alternatives exist right now.
> In the case of eBook readers, there's plenty of native software that fills that role in every platform, there's hardly a need for a cross-platform Electron ePUB reader.
I see this sort of assertion pop up often, but it never is accompanied by specific verifiable examples of what those options are.
Can you point out a single example of said native software that fills that role in every platform? A single one.
I think you misunderstood my response. It's not that the same software works as an eBook reader on every platform, but that nearly every platform has native and high-quality eBook readers. Try and run your Electron application on a Symbian phone or an iPad, for example.
Compared with simple HTML+JavaScript running in a webview, Qt is relatively hard to maintain and develop, relies on source code generation and preprocessors to work, its prototyping tools are subpar and undermaintained, it has no support for centralized theming, its basic support for component-specific theming is already CSS shoved in a convoluted way, its model/view take is absurd and very poorly engineered, and yeah, there's the fact that it forces you to write frontend code in C++. But wait, it's not even C++, because it requires code to be preprocessed to generate boilerplate code.
And let's not pretend that Qt's widgets successor is already a markup+javascript combo that takes the bad parts of javascript and bundles it with the bad parts of a custom markup language.
My assumption was that it was a branch of functionality that could be dropped to make any maintenance of classic Edge easier. I was really irritated when they dropped support as epub support for Windows has been minimal until recently.
Great. Instead of us choosing the simplest solution, i.e. native support in browsers that already have the HTML/CSS engine, we continue to build layers and layers of bloated abstractions, now with Javascript™.
Not the fault of the library developer - they’re just trying to help, but it’s like instead of fixing holes in the ship, we build pumps to dump the water out. Too many pumps on the ship and it gets bloated and can’t take any cargo. This is the current web in a nutshell. We need a ship captain that can guide us authoritatively.
I don't think epub is just HTML/CSS, there's something in the way it's being processed by readers.
I have tried a lot of browser extension based and standalone readers in the past and they all render things differently with different bugs. Something doesn't add up.
Structurally it's a renamed ZIP file containing a pre-defined collection of XHTML, XML, NCX, OPF, and HTML/CSS content files.
So it's not quite just HTML/CSS when packaged up, but it is just HTML/CSS when it comes to the actual text content.
Other than getting the various constituent files zipped up, with their interrelated contents synced, the only other oddity is that the renamed ZIP must always start with an uncompressed 'mimetype' file.
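That 'mimetype' oddity is easy to check for. A minimal sketch using Python's standard zipfile module; the function name check_epub_container is made up for illustration, and this only looks at the ZIP-level layout, not at whether the book inside is valid:

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_epub_container(path):
    """Return a list of ZIP-level problems found in an EPUB file (empty if none)."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive"]

    problems = []
    with zipfile.ZipFile(path) as zf:
        # infolist() follows the order of the archive's directory,
        # which normally matches the order the files were added in.
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            first = entries[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' is compressed")
            if zf.read(first) != EPUB_MIMETYPE:
                problems.append("'mimetype' has unexpected content")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
    return problems
```

An empty list means the container looks sane at this level; a tool like epubcheck is still the thing to use for real validation.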
The IDPF maintain the standard. The easy one is v2 (http://idpf.org/epub/201) but that is now deprecated. Unfortunately v3 allows more interactivity and scripting - and we all know how bad the tech industry is at keeping that kind of stuff secure.
Now that you mention it I am pretty sure v3 is not entirely backward compatible with v2. Which could explain some oddities on older e-readers or out-of-date software.
It's XHTML to be precise, and that doesn't change no matter what processing is done by readers, it's still (just) XHTML. The format is literally described in the submission :)
> they all render things differently with different bugs
> > they all render things differently with different bugs
> Just like HTML did in the beginning.
ePub wasn't born yesterday. It's revision 3, first released ~2006/7. XHTML was proposed to correct and prevent the kind of problems html4 had because of how it organically grew, specifically relying on its XML root (no pun intended). Epub should definitely not suffer from bugs like HTML had in the pre-XHTML and pre-HTML5 era.
I sometimes have bugs like:
- whole book is black
- some pages can't be loaded/read so I have to skip them
- some TOC and back links don't work like they should (probably bad markup)
There are also some reader oddities:
- completely inconsistent line-height
- aligned setting not working at all
etc.
Anyway, there's a reason XHTML2 didn't happen and we got HTML5 instead. Either ePub has some extensions that are not trivial to implement or most readers are buggy. Or both.
I've worked on an ePub parser and renderer and the issues you're describing sound pretty familiar.
The three main components of the ePub (aside from the actual pages) are the TOC, the spine and the manifest. The manifest basically tells you where everything is, the TOC is the table of contents which can link to various pages and the spine gives you the traversal order.
Some mistakes I've seen: using the TOC to traverse the book; using the spine to traverse the book but not handling hidden pages properly; not handling two-page spreads properly.
So yeah the spec is nuanced and it would be easy to make a reader that worked with a lot of books but then had weird issues on another set of books that aren't particularly different. We ended up writing our own parser because we kept finding issues with the main open source ones.
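To make the manifest/spine relationship concrete, here is a rough Python sketch of getting the reading order out of an EPUB: find the package (OPF) file via META-INF/container.xml, build an id-to-href map from the manifest, then walk the spine. The name reading_order is hypothetical, and the sketch assumes an unencrypted, well-formed file whose hrefs are plain relative paths (no percent-encoding or fragments). It reports the spine's linear attribute rather than acting on it; whether linear="no" is what "hidden pages" means above is my guess.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def reading_order(epub_path):
    """Return [(archive_path, is_linear), ...] in spine order."""
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml points at the package document (the .opf file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # Manifest: where everything is (id -> href, relative to the .opf file).
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }

    # Spine: traversal order, expressed as references to manifest ids.
    base = posixpath.dirname(opf_path)
    order = []
    for itemref in package.findall("opf:spine/opf:itemref", NS):
        href = manifest[itemref.get("idref")]
        full_path = posixpath.normpath(posixpath.join(base, href))
        order.append((full_path, itemref.get("linear", "yes") != "no"))
    return order
```

Note that the TOC never enters into it: it is a navigation aid for the reader, not the traversal order.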
You can use 7z to decompress and view an epub's contents. Typically: a table of contents `toc.xhtml` file, some `chapterXX.xhtml` files, maybe a few CSS and image files. (I don't remember the archive format epub uses, probably zip, but 7z will guess for you.)
Crazy right? And then you realize you can publish a legitimate epub using a JAMStack, which means some of us may have turned our onboarding documentation into a book, preloaded it onto a cheap branded android tablet, and then sent it to our premium clients as marketing schwag!
Wait, so you could actually do all of that and then let it interact with APIs as well? When JS gets involved like this, I can see some crazy applications in my mind packaged as an ".epub" book.
I guess it depends on what reader you're targeting then. A quick cursory search shows that not all of them support JS. Makes sense to me.
I used to work for a large US-based publisher with a big presence in education. I worked on the ePub parser and renderer written in React. As a company we basically took the standard and ran with it. Each book could have its own interactive widgets where kids could do reading comprehension questions or math problems and the system would capture all this for the teacher to grade. We had closed captioned audio for a lot of books that the ePub reader would co-ordinate and play. Last I heard they abandoned all that for a completely proprietary format though. It's been the only situation where I've gotten elbow deep into implementing a specification. It was interesting feeling out the nuances and finding the optional parts of the spec that actually end up being important because it was all planned to fit together.
The EPUB 3.0 spec includes JavaScript, so probably many existing readers support it. IIRC, iBooks on Mac also supports JavaScript, but it is activated only after the user clicks.
It's all at a very approachable level, too. I had to write an epub-maker as a necessary component for a project, and it turned out to be ~150 lines of python. You have to make a few indexes in XML and stick them into a zip file, basically.
I built it because I wanted to have something that made a daily brief news paper that was personalized and sent to my kindle. It makes an epub and uses kindlegen to convert it to a .mobi. There's a lot of fun epub formatting stuff you can do.
Here's the system that makes the daily newspaper, but it's been so long I'm not sure it's actually functional code outside of my production version:
Yes it's "just" HTML/CSS, but given the wide range of ePub reader capabilities, it's not like you can just take any web page and put it in an .epub. You have to be conservative, and use only basic stuff. Also JavaScript is not supported by most ePub readers, so many of the modern web "dynamic" niceties are not available.
For example, rendering math on the web has been a solved problem for many years thanks to MathJax and KaTeX, but these require JS, so cannot be used in ePubs (unless you know the reader supports scripting).
Kobos do. Apple iBooks does. PocketBook's Android app does (not sure about their readers though). Kindles still don't, which I feel is holding back its use in books at least. It seems like with KFX they will at least convert MathML to images for publishers now? So hopefully that will lead publishers to use it.
Out of all my epub math textbooks I have exactly one done with MathML and it's fucking glorious compared to the dogshit image based ones most publishers put out with blurry images intended for the 800x600 readers of 2007 and not modern 300 dpi ones. I had one book where literally 2/3rds of the equations were just missing from the file and unviewable on any device. This started on PAGE 7. I then had to ask the publisher to fix it, which they sort of did by replacing with a PDF version.
I found this out last week when I bought a Kobo Forma and started converting all of my favourite Markdown documents to epub to stick on there. Calibre even lets you create a TOC by specifying the header regex (#, ##, etc. for Markdown), it's great! I had to edit a few manually to tweak the layout, and Calibre (https://calibre-ebook.com/) has a nice editor for epubs built in.
It actually goes deeper than that: Epub is basically a specific implementation of DocBook [0], which is itself a specific XML specification derived from the grand daddy of markup languages SGML.
Epubs are basically what "motherfuckingwebsite.com" advocates for.
Despite being HTML/CSS, the layouts aren't particularly interesting though. Most content reads from top to bottom, and the formatting is identical whether you read it on a phone or a tablet.
I was at a developer conference and one of the original Apple guys was there (name evades me at the moment). He mentioned that after they built webkit and wanted to move into the book space with the iPad launch that Steve Jobs wanted to reuse all of the webkit work. They did that and made epub.
It's pretty easy to write code that generates and displays EPUB2.
EPUB3 is a dog's breakfast -- it's hard to think of a better example of "second system effect". As far as I know, there's still not even one reference implementation that supports the full standard, even though it's been out for nearly 10 years. It gains you very little over EPUB2 for standard novels written in western scripts. EPUB3 is only needed if you require embedded scripting, support for non-alphabetical or bidirectional scripts, etc. I believe that most commercial "EPUB3" files still have an EPUB2 toc.ncx file and are designed to fall back to EPUB2 if the reader doesn't support EPUB3 (there are a lot of readers like this).
Something that's easy to overlook: "The mimetype file must be a text document in ASCII that contains the string application/epub+zip. It must also be uncompressed, unencrypted, and the first file in the ZIP archive".
All the other files in the ZIP can be compressed normally.
What this means in practice is that uncompressing an EPUB is easy (just rename it to .zip, if necessary, and run unzip), but recompressing it requires some care.
Assuming you've got your book's content in an OEBPS folder, and the container XML file in the META-INF folder, you can do it like this:
zip -X0 test.epub mimetype
zip -X9Dr test.epub META-INF OEBPS
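If the zip command isn't handy, roughly the same thing can be sketched with Python's standard zipfile module. This is an illustration rather than a tested replacement for the commands above: pack_epub is a made-up name, src_dir is assumed to be the unpacked book (containing mimetype, META-INF and OEBPS), and out_path must live outside src_dir so the archive doesn't try to include itself.

```python
import os
import zipfile


def pack_epub(src_dir, out_path):
    """Zip an unpacked EPUB directory: 'mimetype' first and stored, the rest deflated."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # Step 1 (like `zip -X0`): the mimetype file goes in first, uncompressed.
        zf.write(
            os.path.join(src_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        # Step 2 (like `zip -X9Dr`): everything else, compressed. Only files
        # are written, so no separate directory entries end up in the archive.
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like pack_epub("mybook", "test.epub"), run from the directory that contains the mybook folder.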
Shoutout to some excellent software for ebook wrangling.
The first is the "Standard Ebooks"[1] toolset, which is a suite of Python scripts to create, process, and build ebooks in all common formats. The results on the Standard Ebooks site speak for themselves. They're impeccable in every way, and far better than many big name, commercially produced efforts.
You don't even need to load the books into calibre's database to view them. You can invoke 'ebook-viewer' from the command line directly with the epub file's path as the argument.
emacs can be a surprisingly comfortable text mode epub reader too, via nov.el¹ which has been discussed here². If you use a GUI emacs build you get inline images and other goodies, but starting emacs with -nw can be a reasonable solution for quickly checking a book in a term.
Note: You don't have to be a full-time emacs user to use nov.el.
I learned this a while back and used the Python web page scraping tool BeautifulSoup to take an eBook version of a cookbook and generate individual recipe files compatible with my favourite recipe manager, Paprika.
I recently published a book and going through the w3c epub specifications was a pain. Instead, I bought a book I wanted to read, then reverse engineered it.
For small files you can use the w3c online validator, which will give you an overwhelming list of errors.
Note: The kindle does not support epub, instead it uses kpf. For that you have to download a 333 MB program to convert your epubs.
Note: it's simply the wiki page, but in all my years that I read .epub files I never bothered to check the wiki page. So it is to my surprise I found out that it's just some XML and HTML/CSS!
Just look for that PK in the first 2 bytes of the file and it is a good chance it is a zip file. That jar/war one has saved me a few times in figuring out what exactly the compiler did to a program.
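That check is a two-liner in most languages. A quick Python sketch (looks_like_zip is just an illustrative name); as said, a match is only a good hint, and the standard library's zipfile.is_zipfile() does a more thorough job:

```python
def looks_like_zip(path):
    """Cheap sniff test: does the file start with the two bytes 'PK'?"""
    with open(path, "rb") as f:
        return f.read(2) == b"PK"
```

The same test works on .epub, .jar, .war, .docx and friends, since they are all ZIP containers underneath.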
Yeah, it is surprising at first, but after you think about it, maybe not so much.
If you need to cram a bunch of files into one package, zip is the obvious candidate. There are well-tested libraries and apps for dealing with zips for essentially every language and operating system.
To Downvoters: So there is something wrong with saying 'cool'? What is the problem this time? There is nothing malicious or 'offensive' about reacting to something by saying 'cool'. Come on.
I didn't downvote, but it's a low-effort comment that doesn't really add anything to the discussion. There is no information, no argument, no widening of context, no additional perspective, nothing. It also isn't an acknowledgement-type reply like "I see" ending a discussion. In other words, it decreases signal-to-noise ratio. You can compare it to "congrats" email chains at corps.
EDIT: It would be ok in a live discussion. But on a forum, not so much.
ePub is an open format but as the wiki page states "it is supported by almost all hardware readers, except for Kindle".
I can recommend Kobo as an e-ink e-reader that supports ePub with one caveat: Kobo requires you to sign-up for a Kobo account before you can even use the device - horrible. It's easy to search online to find a way to bypass this.
Although Kobo is an alternative to Kindle, you won't find the range of titles that Amazon sells. However, I think e-readers are best for text-only, small paperback-sized books. Anything else simply doesn't fit the small screen and is inferior to the physical version of a title. (Amazon sells a lot of Kindle titles that are simply unsuitable for small e-reader screens.)
One of the benefits of EPUB is that text can be reflowed, so the display size doesn't matter much, unlike PDF which sets a specific page size. I'm not sure about the MOBI format, but I assume it has similar features to EPUB?
At least it's possible to strip the DRM on Amazon books with the right set of tools, and Calibre is able to convert them to EPUB.
"One of the benefits of EPUB is that text can be reflowed, so the display size doesn't matter much"
I do feel that the e-ink reader screen size does matter because reflowed text only works well for small, paperback-sized books. Any book larger than this small size that also features tables, charts, images, diagrams, code listings and more, will not display well on a small e-ink screen.
Had an Amazon Kindle but it was practically useless for me due to how locked down it was. If I wanted to add something esoteric I had to mail it to Amazon for whatever reason. I don't mind paying for books, but I would hate to have Amazon decide what books I read.
People sometimes want to preserve whole websites that they can then access and use in their personal library. (Or at least, back before the cloud and streaming got a lot of people off of developing and maintaining their own personal library).
I am thinking of a scenario where, if there is a collapse (societal, economic, political, or technological), how can knowledge be disseminated and preserved in a resilient way?
What's going on with the link "wikipedia.org/wiki/EPUB#:~:text=EPUB%20is%20an%20e%2Dbook,smartphones%2C%20tablets%2C%20and%20computers."? There doesn't seem to be any ID in the page with that fragment.
Does EPUB support JavaScript as well? And if it doesn't, are there any similar alternatives? Seems like a single file document that can also pull in data from somewhere could be pretty useful to say the least.
Books should be immutable much like a true real website. Anyone using javascript in a book should not be writing a book. If you're writing javascript then go write an app.
I can see some useful cases. For example, in a computer science book, you could update a caption space that gets its data from the web. This would allow you to display an "obsolete sample code" warning below the examples. When the user is not connected to the internet, you could display "Get online to know code snippet status". And so on.
> in a computer science book, you could update a caption space that gets its data from the web.
First, there's opportunity for that web endpoint to stop functioning. Second, there's opportunity for that web endpoint to become taken over by malice. And third, there's opportunity to turn that caption space into an advertisement.
So, to put it succinctly: fuck no.
> This would allow you to display an "obsolete sample code" warning below the examples.
So now the book isn't timeless. It changes. It's no longer a book.
A better idea: include the "obsolete sample code" warning in the book and ask the user to check for the latest practices at a URL also included in the book.
> When the user is not connected to the internet, you could display "Get online to know code snippet status". And so on.
When the user is not connected to the internet should be the only case ever considered for a book. Otherwise you're not writing a book. You're writing an app.
Also, I wouldn't want my reading habits to be tracked by anyone, or at least I'd want that to be minimized. Let's say that I want to read the Communist Manifesto; that shouldn't go into the hands of people creating targeted political propaganda campaigns, or whoever keeps track of credit scores.
You know how online newspaper articles have ads appearing in between paragraphs? That is precisely how JS would be implemented in epubs.
Sure, we all have pleasant visions of truly interactive ebooks driven by creatively built JS content. But in the real world, ads would be the first thing to be added if JS was supported.
It is. But you can't really run an HTML file locally without at least setting up some type of server (if you plan to make requests). And I know setting up a server is extremely easy, but it's almost impossible for someone who hasn't programmed before.
The point you’re missing is that if you just change the file extension of an ePub to html it will work fine in the browser. There’s nothing special going on. Just because most .html files are served from a remote location doesn’t mean they need to be: you can send someone a .html file as a download, which they can open from their desktop, and it will work just as well as an ePub file. In fact, because the browser recognizes the file extension, it will likely run it better!
> if you just change the file extension of a ePub to html it will work fine in the browser
If that's the case, you didn't have a genuine EPUB to start with. To meet the spec it needs to be in a container (a renamed ZIP file) and have a handful of related metadata and navigation files alongside it.
That said, the actual text of the book is done by HTML/CSS, but within the EPUB container file.
I don't understand: by definition, where is that remote data going to come from, if not another computer?
Also, peer to peer software has made it easier.
I could agree on GIFs or animated PNGs or something, but AV1 I think is pushing it a bit too far. What you are looking for if you want an offline webpage is Electron, right?
And if I wanted scripts in my document¤, then instead of packaging the whole browser (that would be overkill!!) with the document like Electron does (?), I would rather use MHTML.
¤ The following was supposed to be my example, but since this website uses Flash, these scripts can no longer be run easily. But since they are described, you should have a good idea what they are going for:
> MHO EPUB should specifically be script-free.
Agreed.
Yeah maybe I am just knee jerking, with AV1. Broadly speaking however there would be two schools of image and video compression, one for computer generated images and one for images of natural concepts. So AV1 isn't really suited for animated graphs and the like at the end of the day.
On second thought, maybe allowing in AV1 would open the door for sound, which really would take us too far away from the book format.
I do definitely see the need for having animations such as the ones you link in your post, however. So many things that take sentence after sentence to describe can be described way more efficiently with an animation.
Hmm, I should really look into vector graphics one of these days.
Yeah, animated SVGs would be even better for this use case - but you need raster graphics support too for other use cases (some animations might require the inclusion of photographs).
I don't see how sound support is an issue - no more than color and high frame rate support are issues for a format that might also end up displayed on grayscale only displays incapable of high framerates. It's up to the creator of the media to take these into account (or not).
Yes epubs aren't books. I think you really nailed it there. E-ink display devices have way lower refresh rates than what we are used to and are as such probably really bad at, or unable to, play video.
When I checked, years ago, it seemed that .mobi included an embedded EPUB file. The books I checked also had it in the old format, presumably for compatibility, but I think I remember seeing something that this was not recommended as the only form.