This looks amazing. I'd like to try it. I have scanned and OCRed entire print-based books to make PBs out of them, but this looks like it would cut the time required down by a factor of 20 or 30.
http://www.youtube.com/watch?v=6LY3y3eyVL8
http://booksorber.com
That is very cool indeed.
I think you're overstating the case. You'd still need good OCR software to get a PB from that. Otherwise it's just an image-based PDF.
I'm not convinced it's even a good idea. A cheap book scanner will always get better results than a warped photo, no matter how good the software.
If you're looking to tidy up book scans (and perhaps even photos), the open-source ScanTailor is excellent. I don't have a badly scanned document to show you its full capabilities, but here's a quick example. ScanTailor will split pages, rotate, straighten, centre, dewarp, despeckle, sort out the brightness/contrast and tidy up the margins, all in batch.
I have good OCR software, both Adobe Acrobat Pro and ABBYY FineReader Pro.
I'm not convinced it's even a good idea. A cheap book scanner will always get better results than a warped photo, no matter how good the software.
If you're looking to tidy up book scans (and perhaps even photos), the open-source ScanTailor is excellent.
I'm not looking to tidy up book scans, I'm looking to make the process faster. I already have a flatbed scanner and it does high resolution (600dpi) images, but it takes nearly 1 minute per page, when you count the 30 seconds or so for the scan and then opening the top, turning the page, positioning the book back on the glass, closing the top and pressing the cover down and holding it to make the book as flat as possible. It looks from that video like you could turn the pages and photograph them at about 2 to 3 seconds per page. Maybe that's a bit optimistic, but a factor of 10 speedup does not sound out of the realm of possibility and even that would be a huge time savings for me.
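As a rough sketch of that estimate: the per-page timings below come from the paragraph above (about 60 seconds on the flatbed, 2 to 3 seconds with a camera, or 6 seconds for the more cautious factor-of-10 case). The 300-page book length and the function name are assumptions made purely for illustration.

```python
def scan_hours(pages, seconds_per_page):
    """Total capture time in hours for a book of the given length."""
    return pages * seconds_per_page / 3600

PAGES = 300  # hypothetical book length, not from the thread

flatbed = scan_hours(PAGES, 60)           # ~1 minute per page on the flatbed
camera_optimistic = scan_hours(PAGES, 3)  # 3 seconds per page, as in the video
camera_cautious = scan_hours(PAGES, 6)    # the "factor of 10" case

print(f"flatbed: {flatbed:.2f} h")
print(f"camera (3 s/page): {camera_optimistic:.2f} h, speedup x{flatbed / camera_optimistic:.0f}")
print(f"camera (6 s/page): {camera_cautious:.2f} h, speedup x{flatbed / camera_cautious:.0f}")
```

This only counts capture time; dewarping, OCR and proofreading are separate steps and are not modelled here.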
I already have a very high resolution camera, a tripod, and a good light source, so I wouldn't have to buy any extra equipment. The software is free to try, so it seems totally worth giving it a spin. And if it works well enough, it's only € 29.00 to buy, which would pay for itself in just a few PBs.
It remains to be seen how well the software straightens warped photos, but the demo video looked like it does a pretty good job. And the sample PDF produced by Booksorber is pretty good. Adobe Acrobat had no problem converting it (except the mathematical formulas, but I'm not going to have those in PBs I do for Logos). Acrobat and FineReader can handle slight warps pretty well too.
ScanTailor will split pages, rotate, straighten, centre, dewarp, despeckle, sort out the brightness/contrast and tidy up the margins, all in batch.
It looks like that's exactly what Booksorber does too. But it also appears to control your camera remotely to take the photos in sequence quickly. I'd need to learn more about how it does that; what kind of cameras it's compatible with, and how does it know when you're ready with the page turned? Or do you have to keep up a regular pace?
The only case I was stating was "this looks cool" and "I'd like to try it." I don't think I was overstating, except maybe I exaggerated how much faster it might make my workflow.
Anyway, thanks for the tip about ScanTailor.
He had a remote for his Canon DSLR camera in one hand. He clicks the shutter button every time he turns the page, which is a good idea. I've scanned photos in bulk using this method. It's way better than a flatbed scanner, and with my DSLR the quality and resolution are still quite good as well.
I already have a flatbed scanner and it does high resolution (600dpi) images, but it takes nearly 1 minute per page, when you count the 30 seconds or so for the scan and then opening the top, turning the page, positioning the book back on the glass, closing the top and pressing the cover down and holding it to make the book as flat as possible.
Perhaps you just need a better book scanner. I can do six pages per minute with mine.
If you give Booksorber a try, I'd love to hear how you get on. You might find this interesting: http://dcoetzee.tumblr.com/post/52197794319/book-scanning-and-my-experience-with-booksorber
I was looking at fully automated DIY book scanners on YouTube. Some were ingenious and could scan 100 pages an hour.
Hi Rosie,
I've done some scanning and OCR and have found that the crucial issue is, as Mark Barnes says, not the scanning/photo process but the OCR conversion from image to text. That is, if text-based output is required rather than an image-based PDF (which is an option if very large file sizes are not an issue).
On one project I am still working on, it would have been far too expensive or labour-intensive to use professional book scanning. So, having by chance come across this company http://www.blueleaf-book-scanning.com/ I sent my 600+ page book for scanning and was very happy with the result. The cheaper scanning option basically uses a guillotine to cut the spine off. Once the covers are discarded, what remains are loose-leaf pages which can be fed quickly through a bulk-fed scanner (like a photocopier in reverse). Since this involves flat-page scanning, no correction to the page shape is needed. Of course, that is the 'destructive' option, but there is a more expensive 'non-destructive' option which is basically traditional book scanning.
It still needed OCR proofreading to ensure that the text is 100% correct for the text version. In terms of characters needing correction, I would estimate the accuracy rate to be much higher than 99% from the high-contrast, medium-to-large font size original book that I supplied for scanning (less optimised books can result in much lower accuracy rates). Yet, with an average of 1,500 characters per page across 600+ pages (as in the book I supplied), even a 99.5% accuracy rate would result in 7.5 corrections per page, totalling over 4,500 corrections in the book. After basic OCR corrections, I've got some other people working on parts of the text to correct and proofread it. This is taking quite a bit of time! The scanning stage is not the main issue in a project like this; the OCR correction/proofreading is.
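The arithmetic behind that estimate can be sketched as follows. The figures (1,500 characters per page, 600 pages, 99.5% accuracy) are the ones quoted above; the function name is hypothetical, and the model assumes every misrecognised character needs exactly one correction, which is a simplification.

```python
def expected_corrections(chars_per_page, pages, accuracy):
    """Return (corrections per page, corrections in the whole book)."""
    error_rate = 1.0 - accuracy          # fraction of characters misread
    per_page = chars_per_page * error_rate
    return per_page, per_page * pages

per_page, total = expected_corrections(1500, 600, 0.995)
print(f"{per_page:.1f} corrections per page, {total:.0f} in the book")
```

Plugging in 0.999 instead of 0.995 shows how sensitive the workload is to accuracy: the same book would need roughly a fifth as many corrections.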
Now, if there had already been a digital text-based version of the publication available to purchase, I would have had no hesitation in buying it, as the overall project costs of going down the scan/OCR route are staggeringly high compared with purchasing a readily available digital version.
To convert a home or ministry bookshelf using this process would be too costly and labour-intensive. But for the occasional project it can speed things up greatly and provide exceptionally high quality results. At least the scanning process was much quicker and cheaper than it might otherwise have been for a ministry-based project.
As Mark correctly points out, the text would not be searchable in Logos format. Nor could you copy the text to paste into a sermon or document. You would only have photographs of each page. Not a very good solution for Logos. In fact, there would be no point in having the output in Logos. You could read it better in a PDF reader.
Hey folks, I've done this before a number of times, and I know what I'm doing. The text is searchable in Logos when I scan and OCR a book. I have professional OCR software. I know how time-consuming it is to proofread and fix the OCR errors, which are substantial even with high quality software. But that is the part that I actually enjoy and am quite efficient at. I am not interested in cutting the spines off and ruining my paper books. Nor am I interested in doing this for a massive number of books in my library, just a few that are out of print and not available as digital texts anywhere. I wasn't looking for advice or help, just pointing out something I thought was cool.
So, thanks everyone for the advice/discouragement. In honour of Milford I will try not to focus on how unfun it is to have people dump cold water on one's enthusiasm in a "hey, this is cool!" post. I know you all meant well.
Hi Rosie, sorry - I didn't intend to rain on your parade.
I thank you Rosie for the post. I will try to find the time to scan some of my books as well. [:D]
My apologies. I am sure that you know that the text would have to be run through a very good OCR program, then carefully edited before it was searchable in Logos. But I am equally sure that many reading this probably did not know that.
I certainly did not intend to be rude.
Speaking of very good OCR software that is affordable: OmniPage 18 is currently available for less than $49 from a source with a very good reputation. I picked it up today, and it is a great help in making Logos Personal Books, no matter how you scan the book. Go search for it like an Amazon warrior.
Just wondering what the copyright situation would be in digitizing books that one owns? I have researched online and it seems to be a divided topic of discussion. Some compare it to digitizing copies of CDs to use on iPods and other devices. Others say it falls outside of the Fair Use clause.
Does anyone else have any more info? I have some old college textbooks I would like to have digitized...
Just posting so that this will show in my conversations!