pointless

  • 0 Posts
  • 19 Comments
Joined 1 year ago
Cake day: June 23rd, 2023

  • Another vote for Tesseract. Just to clarify the terminology, though: PDF is a fragile format best used read-only, so you really don’t want to edit a PDF, but rather make a new one using the same (or cleaned-up) bitmaps and a new OCR text layer.

    Now, Tesseract is excellent at recognizing glyphs, but its layout detection falters, especially if the scanned image is a little fuzzy. When it falters, you get redundant line breaks and chunks of text in the wrong order, all of which gets incredibly annoying for searching and copying purposes. So if you can spare the time, and the text requires it, you may need to mark regions (mainly paragraphs and titles) on the bitmap image manually. A few frontends to Tesseract help with a task like that; check out, e.g., https://github.com/manisandro/gImageReader. Inside single-paragraph blocks of text, Tesseract doesn’t get confused as easily, and the text output comes out in the correct reading order and without redundant breaks.
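
If you'd rather script it than click through a GUI, here is a rough sketch of the "one block at a time" idea, assuming you have already cropped each paragraph or title into its own image and that the `tesseract` binary is on your PATH. The function names and the file names (`para1.png`, `para1`) are made up for illustration. `--psm 6` tells Tesseract to treat the image as a single uniform block of text, and the trailing `pdf` config asks for a searchable PDF (bitmap plus text layer).

```python
import subprocess
from pathlib import Path


def tesseract_pdf_command(image, out_base, lang="eng", psm=6):
    """Build the argv for one Tesseract run that writes <out_base>.pdf.

    psm=6 means "assume a single uniform block of text", which skips
    most of the automatic layout detection.
    """
    return [
        "tesseract",
        str(image),      # input bitmap (hypothetical pre-cropped region)
        str(out_base),   # output base name; Tesseract appends ".pdf"
        "-l", lang,
        "--psm", str(psm),
        "pdf",           # config that emits a searchable PDF
    ]


def ocr_region_to_pdf(image, out_base, lang="eng", psm=6):
    """Run Tesseract on one region and return the path of the new PDF."""
    subprocess.run(tesseract_pdf_command(image, out_base, lang, psm), check=True)
    return Path(str(out_base) + ".pdf")


if __name__ == "__main__":
    # Hypothetical file names; replace with your own cropped regions.
    print(ocr_region_to_pdf("para1.png", "para1"))
```

This only covers the OCR step for a single region; stitching the per-region results back into one page is exactly the part that a frontend like gImageReader handles for you.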






  • I mean, this is cringe AF.

    Kotlin ‘built by communism’? Because the founders of JetBrains are Russian? Is that it?

    Swift is ‘greed’ how? It’s been open source since 2015 or so, and it’s available on Linux. Apple’s graphical toolkits are ‘closed down’ and obviously restrict users’ freedoms, though I’m not sure how that implies ‘monopoly’. ‘Monopoly’ would be trying to dominate all toolkits, not having one’s own.

    Vague word associations are cool, I guess.