Is Making a PBB Difficult? (When the add-on is available)


Posts 1829
Rick | Forum Activity | Posted: Wed, Apr 21 2010 6:47 PM

I have been thinking about how neat it would be to be able to create our own books that are not available in Logos format but are in the public domain.

Does a person need any special coding skills? I can see that it would be much easier to convert a book from an existing electronic file such as a .pdf, but what about hardback books? I am especially interested in the writings of one of my denomination's early founders, but the book is about 450 pages long in hardback. Would most of the time be spent scanning it, or would that be only the beginning of something I should ultimately avoid? I believe that I can also get the book in .pdf format.

One last question: if it looks as if it would be too difficult for me to do personally, are there any businesses or individuals who do it for a reasonable per-page fee (assuming it is ethical and within the license agreement for the owner of the PBB builder)?

Thanks.

Posts 19315
Rosie Perera | Forum Activity | Replied: Wed, Apr 21 2010 8:48 PM

You can skim through the tutorial on how to do a PBB in the old Logos 3 method: http://www.logos.com/media/pbb/tutorial.html. I'm sure there will be something similar once PBB is implemented in Logos 4. But it's not much harder than formatting a Word document with various heading styles, and putting in a few special codes for links (which are well documented and not too hard to understand).

If you can get a book already in PDF format, I'd certainly do that to avoid the time spent scanning.

Posts 1547
Blair Laird | Forum Activity | Replied: Wed, Apr 21 2010 9:02 PM

Rosie Perera:
If you can get a book already in PDF format, I'd certainly do that to avoid the time spent scanning.

I am personally hoping Logos figures out how to make scanned PDFs searchable. I have all my public domain books in my Google library for that very purpose. I don't know how they did it, but they figured out how to make them searchable. I just don't like having two separate search engines. They should be unveiling the PBB soon; last I heard, it did not have scanned PDF book support. But we will see if they have figured out how to incorporate the technology.

 

Posts 19315
Rosie Perera | Forum Activity | Replied: Wed, Apr 21 2010 9:14 PM

Blair Laird:

Rosie Perera:
If you can get a book already in PDF format, I'd certainly do that to avoid the time spent scanning.

I am personally hoping Logos figures out how to make scanned PDFs searchable. I have all my public domain books in my Google library for that very purpose. I don't know how they did it, but they figured out how to make them searchable. I just don't like having two separate search engines. They should be unveiling the PBB soon; last I heard, it did not have scanned PDF book support. But we will see if they have figured out how to incorporate the technology.

Scanned PDFs can become searchable by running optical character recognition (OCR) software on them. Most scanning software has that option built in nowadays. It's slower, and it makes the PDF file take up more space when it has the OCRed text in it as opposed to just the images, but it means that you can then select text and copy/paste it, as well as search it. As I've got Adobe Acrobat, I can create PDF files from a Word document that are fully selectable and searchable. There are cheaper alternatives for doing this too (e.g., print to PDF using the BullZip PDF printer driver). If you have a scanned image PDF that has not been OCRed, there are ways to do that too, and probably free utilities for doing it.
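To make the trade-off above concrete, here is a minimal Python sketch of the idea, not of any real PDF library. It assumes a scanned page can be modeled as an image plus an optional hidden text layer; `ScannedPage`, `search`, and `size_estimate` are hypothetical names invented for this illustration, and the page contents are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class ScannedPage:
    """One page of a scanned book: the picture plus an optional OCR text layer."""
    number: int
    image_bytes: bytes   # the scanned picture, always kept
    ocr_text: str = ""   # hidden text layer; empty until OCR has been run


def search(pages, query):
    """Return the numbers of the pages whose OCR text contains the query."""
    needle = query.lower()
    return [p.number for p in pages if needle in p.ocr_text.lower()]


def size_estimate(pages):
    """Rough size in bytes: the images plus the UTF-8 encoded text layer."""
    return sum(len(p.image_bytes) + len(p.ocr_text.encode("utf-8")) for p in pages)


# Hypothetical sample data: the same two pages before and after OCR.
image_only = [ScannedPage(1, b"\x00" * 1000), ScannedPage(2, b"\x00" * 1000)]
with_text = [
    ScannedPage(1, b"\x00" * 1000, "In the beginning was the Word"),
    ScannedPage(2, b"\x00" * 1000, "Grace and peace to you"),
]

print(search(image_only, "grace"))  # an image-only scan has nothing to search
print(search(with_text, "grace"))   # the text layer makes page 2 findable
print(size_estimate(with_text) > size_estimate(image_only))
```

The image is kept untouched in both cases; only the extra text layer makes the page searchable, which is also why the OCRed file ends up larger.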

My guess is that Logos would prefer to have you convert your existing scanned OCRed PDFs to Logos PBBs (and presumably they will someday provide a tool to make that easier) rather than provide native support in Logos for viewing/searching PDF files. It would be easier to write such a conversion tool than to write a full-blown Acrobat Reader knock-off. And the books would be able to be more fully integrated into your library that way, as they'd have real Logos indexing (if the book were a special translation of the Bible, for example, it could be indexed by Scripture reference and thus linked in with your commentaries), you could edit them to add hyperlinks, etc.

Posts 1547
Blair Laird | Forum Activity | Replied: Wed, Apr 21 2010 10:02 PM

I am not sure that Google is using OCR software "as we know it". Most of the time, typical OCR software does a poor job of recognizing the text. Some books turn out just fine, so in reality it is hit and miss (at least that has been my experience). If you download a book in EPUB format, you will see that they used typical OCR software. I am new to OCR, but from trying out multiple programs, I can see that they convert the picture to text. Google is leaving the text intact and hyperlinking the text also. You can go to the index of the book and jump to certain chapters. I am not familiar with any OCR software that leaves the image intact yet makes it searchable and hyperlinkable. But like I said, I am pretty new at the OCR thing.

Blessings in Christ.

Posts 19315
Rosie Perera | Forum Activity | Replied: Wed, Apr 21 2010 10:41 PM

Blair Laird:

I am not familiar with any OCR software that leaves the image intact yet makes it searchable and hyperlinkable.

 

Adobe Acrobat (the full version, not just Acrobat Reader) can do it:

Settings...

Here's the job it did with an article about my brother in Haiti. Not perfect text recognition, but pretty good. I can select the text and only a few word chunks are left out:

5428.rick_bam.pdf

This is just like what happens on Google when some of the text in scanned books couldn't be recognized. I guarantee you they use some sort of OCR software, probably something proprietary that is better than what's commercially available for the rest of us poor sods.

Posts 1547
Blair Laird | Forum Activity | Replied: Wed, Apr 21 2010 11:11 PM

Rosie Perera:
This is just like what happens on Google when some of the text in scanned books couldn't be recognized. I guarantee you they use some sort of OCR software, probably something proprietary that is better than what's commercially available for the rest of us poor sods.

It has to be some special software. I have used Adobe, but I found that Nuance's OCR software worked better for me. Using either program, I was not aware of any hyperlinking capability. Also, they had problems with other languages such as Hebrew, Greek, Aramaic, and Latin. I agree Google uses some sort of OCR software, but it is not like any OCR software that I am aware of.

Posts 19315
Rosie Perera | Forum Activity | Replied: Thu, Apr 22 2010 12:24 AM

Blair Laird:

Rosie Perera:
This is just like what happens on Google when some of the text in scanned books couldn't be recognized. I guarantee you they use some sort of OCR software, probably something proprietary that is better than what's commercially available for the rest of us poor sods.

It has to be some special software. I have used Adobe, but I found that Nuance's OCR software worked better for me. Using either program, I was not aware of any hyperlinking capability.

I'm not familiar with Nuance, but in Adobe, you do this using the "Link Tool" on the Advanced Editing toolbar or menu.

Blair Laird:

I agree Google uses some sort of OCR software, but it is not like any OCR software that I am aware of.

And yes, I know it's some special software, not Adobe Acrobat (and probably not Nuance either) that they're using to do this. The behavior and look of their hyperlinks is quite different from the ones that Acrobat puts in, and they've got a higher quality of text recognition than at least Adobe does. I just meant that it is some sort of OCR technology which they are using, and whatever engine it is they're using, it's all rooted in the same idea of optically recognizing characters to convert an image to searchable text, being able to mark locations in it with bookmarks, adding hyperlinks to be able to jump to those bookmarks, etc. There's no other way to do it when you're starting with a printed book without having access to the original data files. You've got to do some optical character recognition, which is how Adobe and Nuance approach it. Google probably wrote their own higher quality one, rather than taking one "off the shelf" so to speak. So we're just dithering over nothing about whether it is "like" other OCR software. It is like in kind, but not in quality. And Google's is most likely not available to the public, to help in making PBBs, which is how this whole conversation got started. So you win, and I quit trying to have the last word. ;-)

Posts 1367
JimTowler | Forum Activity | Replied: Thu, Apr 22 2010 5:30 AM

Rosie,

About 10 years ago I worked on a project for capturing paper forms. It used a rather large and fast Kodak sheet scanner.

The software was broken into separate modules and used a workflow system (multiple Unix machines).

It could run OCR modules on more than one server and then feed errors and needed corrections to multiple human operators.

The human operators were given the error-text without context, and they only had to hit the correct letter or numeric key if it was simple, or they could get a small context, or the whole page if required (but slower).
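A workflow like the one described above might be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the per-character confidence score, the 0.9 threshold, and the names `OcrChar`, `split_for_review`, and `apply_corrections` are all hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass


@dataclass
class OcrChar:
    """One character guessed by an OCR module (hypothetical structure)."""
    page: int
    position: int
    guess: str
    confidence: float  # assumed score from 0.0 to 1.0


def split_for_review(chars, threshold=0.9):
    """Separate confident guesses from those queued for a human operator."""
    accepted, review = [], []
    for c in chars:
        (accepted if c.confidence >= threshold else review).append(c)
    return accepted, review


def apply_corrections(chars, corrections):
    """Merge operator keystrokes back in; corrections maps (page, position) to a key."""
    return "".join(corrections.get((c.page, c.position), c.guess) for c in chars)


# Made-up sample: the OCR module misreads two letters as digits.
chars = [
    OcrChar(1, 0, "L", 0.99),
    OcrChar(1, 1, "0", 0.55),  # should be the letter "o"
    OcrChar(1, 2, "g", 0.97),
    OcrChar(1, 3, "o", 0.98),
    OcrChar(1, 4, "5", 0.60),  # should be the letter "s"
]

accepted, review = split_for_review(chars)
corrections = {(1, 1): "o", (1, 4): "s"}  # keys pressed by the operators

print([c.guess for c in review])
print(apply_corrections(chars, corrections))
```

Only the doubtful characters reach an operator, so many OCR servers and many operators can work on the same queue in parallel.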

My point - Someone as large as Google need not be limited by what we normally think of as an OCR package on a PC in the corner of the office where the scanner is installed.

EDIT: Which is really the point Rosie was making ...

Posts 19315
Rosie Perera | Forum Activity | Replied: Thu, Apr 22 2010 5:34 PM

Jim Towler:

My point - Someone as large as Google need not be limited by what we normally think of as an OCR package on a PC in the corner of the office where the scanner is installed.

EDIT: Which is really the point Rosie was making ...

Precisely. I was never saying Google was using a PC-based OCR package.
