PBB Request: Al-Bukhari Hadith

Page 1 of 1 (8 items)
This post has 7 Replies | 1 Follower

Posts 544
Armin | Forum Activity | Posted: Wed, Mar 21 2012 9:24 AM

I am having trouble converting a PDF into docx format to create a PBB. I tried, but either I could not get it into a proper format, or the tool crashed or claimed to be out of memory. Here is the file:

http://d1.islamhouse.com/data/en/ih_books/single/en_Sahih_Al-Bukhari.pdf

Is anyone willing and able to do this?

Thanks!

Armin

Posts 23679
Forum MVP
MJ. Smith | Forum Activity | Replied: Wed, Mar 21 2012 4:06 PM

What software did you use to extract the text from the PDF format? I would consider helping but fear the headers and footers would make for a very tedious conversion task.

Orthodox Bishop Hilarion Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."

Posts 544
Armin | Forum Activity | Replied: Wed, Mar 21 2012 9:42 PM

Adobe Acrobat 9 Pro either stated that it is out of memory or it crashed. Calibre did a conversion but it is very messy: the formatting is pretty bad and all "ll" have been replaced with "l " (i.e., l+space).

Armin

Posts 23679
Forum MVP
MJ. Smith | Forum Activity | Replied: Thu, Mar 22 2012 1:00 PM

Armin:
Adobe Acrobat 9 Pro either stated that it is out of memory or it crashed.

What I'd suggest is that you break it into several pieces, each small enough to avoid the out-of-memory problem. When you create the PB itself, you can use multiple input files, which will put all of it back together into one book.
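
As a rough illustration of the splitting idea, here is a minimal Python sketch that only works out which page ranges each piece would cover. The function name `page_ranges` and the chunk size of 500 are assumptions made for the example; the page count of 1,700 is the figure mentioned later in this thread. The actual splitting would still be done in whatever PDF tool you use.

```python
def page_ranges(total_pages, pages_per_chunk):
    """Return inclusive, 1-based (first, last) page ranges covering the document.

    Hypothetical helper: it only plans the split, it does not touch any PDF.
    """
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("total_pages and pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


if __name__ == "__main__":
    # 1,700 pages split into pieces of at most 500 pages (chunk size is arbitrary).
    for first, last in page_ranges(1700, 500):
        print(f"pages {first}-{last}")
```

If a piece still triggers the out-of-memory error, a smaller chunk size can be tried; the pieces stay in order, so they can be fed to the PB builder as multiple input files.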

Orthodox Bishop Hilarion Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."

Posts 1334
Robert M. Warren | Forum Activity | Replied: Sat, Mar 24 2012 4:11 PM

Armin:

Try this. I'm not extremely fastidious when I make PBs, but maybe this will be useful to you until a more complete rendition is available. Basically, it uses Word's Heading 1 style for Volumes and Heading 2 for Books. (The tool I use to convert the PDFs does a good job at preserving formatting and I used Word's Find & Replace with Styles, so it didn't take very long to produce.) So, Logos will show those levels in the Contents pane, and the Book name will be referenced when the resource appears in a Search hit.

I tried it out and it compiled without any errors.

Pleasant Lord's Day tomorrow!

7608.SAHIH BUKHARI.docx

 

Win7 Android 4.4.2

Posts 544
Armin | Forum Activity | Replied: Sat, Mar 24 2012 9:55 PM

Hi Robert,

Thank you so much. You did an amazing job. I tried MJ. Smith's suggestion of splitting the file. It helped prevent the crashing. But the format was still a real mess and it would have been impossible for me to find the time to manually fix the formatting of 1,700 pages.

Thank you again!

Armin

Posts 1334
Robert M. Warren | Forum Activity | Replied: Sun, Mar 25 2012 4:47 AM

Armin:
Thank you again!

You're welcome.

In case it will be useful to you, this is what I use to convert PDFs:

 

  1. I use the free MobiPocket Creator to import the PDF. MobiPocket can create .prc files for Kindle, but for this use I import only, and then I'm done with MobiPocket.
  2. One of the intermediate steps MobiPocket does is create an HTML file of the PDF contents. I go to the folder where this is, which on my Win 7 computer is My Documents\My Publications\name of PDF.
  3. The HTML file is there with the same name as the PDF.
  4. I open the HTML file in a browser, do Ctrl+A to select all and Ctrl+C to copy.
  5. I paste it into Word.
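
For anyone who wants to find the intermediate file from a script, steps 2 and 3 can be sketched in Python as below. This is only a sketch under assumptions: the helper name `expected_html_path` is made up for the example, the `.html` extension is a guess (the output may use a different extension), and the location of the My Publications folder varies by machine, so it is passed in rather than hard-coded.

```python
from pathlib import PureWindowsPath


def expected_html_path(pdf_path, publications_root):
    """Build the path where the imported HTML is expected to be.

    Follows the layout described above: a folder named after the PDF inside
    "My Publications", holding an HTML file with the same name as the PDF.
    The ".html" extension is an assumption, not confirmed by the thread.
    """
    name = PureWindowsPath(pdf_path).stem  # file name without ".pdf"
    return PureWindowsPath(publications_root) / name / (name + ".html")


if __name__ == "__main__":
    # Both paths below are placeholders for illustration only.
    print(expected_html_path(
        r"C:\Books\en_Sahih_Al-Bukhari.pdf",
        r"C:\Users\me\Documents\My Publications",
    ))
```

`PureWindowsPath` is used so the path is built with Windows rules regardless of where the script runs; it does not check whether the file actually exists.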

 

In many cases, the formatting is such that the TOC heading styles are already in place. Also, BCV citations are typically in some unique font-attribute combination, which can facilitate using Find and Replace to make Bible Milestones. I don't do any academic or professional work with Logos (just a plain ol' Christian reading), so I don't sweat the page numbers or other things required for academic citation.

I've tried some of the free PDF converters, but my experience with them is that they turn all lines into paragraphs, which I think will hose the whole thing. I've also tried Calibre with mixed results; I think my bad results from Calibre are from a lack of experience using it and tweaking the process.
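
If converter output does end up with every line as its own paragraph, one possible workaround is to rejoin lines with a simple heuristic before pasting into Word. The Python sketch below is not something used in this thread; the function name `rejoin_lines` and the rule "a line ending in sentence punctuation closes the paragraph" are assumptions, and the rule will misjudge headings and sentences that happen to end exactly at a line break.

```python
def rejoin_lines(lines):
    """Merge hard-wrapped lines into paragraphs using a crude heuristic.

    A paragraph ends at a blank line, or at a line whose last character is
    sentence-ending punctuation. Everything else is treated as a wrapped
    continuation and joined with a single space.
    """
    paragraphs = []
    current = []
    for line in lines:
        text = line.strip()
        if not text:
            # Blank line: close any paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(text)
        if text.endswith((".", "!", "?", ":", '"')):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = ["This is the first", "line of a paragraph.", "", "A second paragraph", "follows here."]
    for paragraph in rejoin_lines(sample):
        print(paragraph)
```

The result would still need a read-through, since the heuristic cannot tell a short heading from a wrapped line.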

In this particular case, I explicitly applied the Word styles because I thought you might also find the Word document useful for other purposes and I tried to replicate the header/footer with volume/book, which required that Word see explicit styles applied. But the neat thing about the above technique is that often you don't even have to explicitly apply the styles; Logos will generally recognize them for TOC purposes if Word uses them in a Word TOC. So, a quick check to see how Logos might use the headings would be to generate a Word TOC.
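
As an alternative quick check outside Word, the explicitly applied heading styles in a .docx can be listed with a short Python script, since a .docx file is a zip archive containing `word/document.xml`. This is a sketch under assumptions: the helper name `list_headings` is made up, and the style IDs `Heading1` and `Heading2` are what English versions of Word usually write for the built-in Heading 1 and Heading 2 styles; other languages or custom styles may use different IDs. It only sees styles applied explicitly to a paragraph, and it says nothing about how Logos itself reads the file.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_headings(docx_path, style_ids=("Heading1", "Heading2")):
    """Return (style_id, text) pairs for paragraphs that use the given style IDs."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    headings = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        if style is None:
            continue  # paragraph has no explicit style
        style_id = style.get(f"{{{W_NS}}}val")
        if style_id in style_ids:
            # A heading's text may be split across several runs; join them.
            text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            headings.append((style_id, text))
    return headings


if __name__ == "__main__":
    import sys

    for style_id, text in list_headings(sys.argv[1]):
        indent = "  " if style_id == "Heading2" else ""
        print(f"{indent}{text}")
```

Run with the path to the Word document as the only argument; the indented output gives a rough preview of the Volume/Book outline.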

Incidentally, I do use Calibre, but I use it to convert a Mobi/Kindle format into PDF and then use the above technique. It's like going to Miami by way of China, but it generally works. Each individual can wrestle with the ethics of format-shifting, but I always use source files that I have already paid for or that are in the public domain.

Hope that's useful.

Win7 Android 4.4.2

Posts 544
Armin | Forum Activity | Replied: Wed, Mar 28 2012 6:07 AM

Thanks so much, Robert, for this excellent step-by-step guide. It will be useful for my next PBB.

Armin

Copyright 1992-2015 Faithlife / Logos Bible Software.