Hacker News

I noticed that in the Arabic example they lost a space after the first letter on the third-to-last line. Can any native speakers confirm? (I only know enough Arabic to ask dumb questions like this; I'm curious to learn more.)

Edit: it looks like they also added a vowel mark not present in the input on the line immediately after.

Edit2: here's a picture of what I'm talking about, the before/after: https://ibb.co/v6xcPMHv



Arabic speaker here. No, it's perfect.


I am pretty sure it added a kasrah not present in the input on the second-to-last line. (I'm not saying it isn't super impressive, and that is almost certainly the right word, but I think that still means not quite "perfect"?)


Yes, it looks like it did add a kasrah to the word ظهري


Yep, and فمِنا too. This is not just OCR; it made some post-processing corrections or "enhancements". That could be good, but it could also be trouble given the 1% chance it makes a mistake in critical documents.


He means the space between the wāw (و) and the word


I added a pic to the original comment, sorry for not being clear!


And here I thought after reading the headline: finally a reliable Arabic OCR. I've never in my life found a good one that does the job decently, especially for a scanned document. Or is there something out there I don't know about?



