PBB Request: Al-Bukhari Hadith
I am having trouble converting a PDF into docx format to create a PBB. Every attempt so far has failed: either the output was badly formatted, or the tool crashed or reported that it was out of memory. Here is the file:
http://d1.islamhouse.com/data/en/ih_books/single/en_Sahih_Al-Bukhari.pdf
Is anyone willing and able to do this?
Thanks!
Armin
Comments
-
What software did you use to extract the text from the PDF format? I would consider helping but fear the headers and footers would make for a very tedious conversion task.
Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."
0 -
Adobe Acrobat 9 Pro either stated that it is out of memory or it crashed. Calibre did a conversion but it is very messy: the formatting is pretty bad and all "ll" have been replaced with "l " (i.e., l+space).
Armin
0 -
Armin said:
Adobe Acrobat 9 Pro either stated that it is out of memory or it crashed.
What I'd suggest is that you break it into several pieces - small enough to not create the out of memory problem. To create the PB itself you can use multiple input files which will put all of it together into one book again.
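To plan the split before opening a PDF tool, a minimal Python sketch like the one below can list the page ranges for each piece. It only does the arithmetic; the actual splitting still has to be done in whatever PDF tool you use. The function name `page_ranges`, the 200-page chunk size, and the `partNN.pdf` file names are assumptions for illustration, not anything Acrobat or Logos requires.

```python
def page_ranges(total_pages, chunk_size):
    """Return 1-based, inclusive (first, last) page ranges covering the document."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    first = 1
    while first <= total_pages:
        # The final piece may be shorter than chunk_size.
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    # Illustrative values only: adjust the page count and chunk size to your PDF.
    for number, (first, last) in enumerate(page_ranges(1700, 200), start=1):
        print(f"part{number:02d}.pdf: pages {first}-{last}")
```

If a piece still triggers the out-of-memory error, rerun with a smaller chunk size and split again.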
0 -
Armin:
Try this. I'm not extremely fastidious when I make PBs, but maybe this will be useful to you until a more complete rendition is available. Basically, it uses Word's Heading 1 style for Volumes and Heading 2 for Books. (The tool I use to convert the PDFs does a good job at preserving formatting and I used Word's Find & Replace with Styles, so it didn't take very long to produce.) So, Logos will show those levels in the Contents pane, and the Book name will be referenced when the resource appears in a Search hit.
I tried it out and it compiled without any errors.
Pleasant Lord's Day tomorrow!
macOS (Logos Pro - Beta) | Android 13 (Logos Stable)
0 -
Hi Robert,
Thank you so much. You did an amazing job. I tried MJ. Smith's suggestion of splitting the file. It helped prevent the crashing. But the format was still a real mess and it would have been impossible for me to find the time to manually fix the formatting of 1,700 pages.
Thank you again!
Armin
0 -
Armin said:
Thank you again!
You're welcome.
In case it will be useful to you, this is what I use to convert PDFs:
- I use the free MobiPocket Creator to import the PDF. MobiPocket can create .prc files for Kindle, but for this use I import only, then I'm done with MobiPocket.
- One of the intermediate steps MobiPocket does is create an HTML of the PDF contents. I go to the folder where this is, which on my Win 7 computer is My Documents\My Publications\name of PDF.
- The HTML file is there with the same name as the PDF.
- I open the HTML file in a browser, do Ctrl+A to select all and Ctrl+C to copy.
- I paste it into Word.
In many cases, the formatting is such that the TOC heading styles are already in place. Also, BCV citations are typically in some unique font-attribute combination, which can facilitate using Find and Replace to make Bible Milestones. I don't do any academic or professional work with Logos (just a plain ol' Christian reading), so I don't sweat the page numbers or other things required for academic citation.
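A quick way to see which headings survived the conversion, before pasting into Word, is to list the heading elements in the intermediate HTML file. The Python sketch below uses only the standard library. It assumes the HTML marks headings with ordinary `h1` to `h6` tags, which I have not verified for MobiPocket's output, and the sample markup and titles are made up for illustration.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements in an HTML document."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments of the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace left over from the conversion.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append((self._level, text))
            self._level = None


def outline(html_text):
    """Return the headings found in html_text, in document order."""
    parser = HeadingOutline()
    parser.feed(html_text)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    # Hypothetical sample; for a real file, read its text with open(...).read().
    sample = "<h1>Volume 1</h1><p>text</p><h2>Book 1: <i>Sample</i> Title</h2>"
    for level, text in outline(sample):
        print("  " * (level - 1) + text)
```

If the list comes back empty, the converter probably expressed headings through font attributes instead, and the styles will need to be applied in Word with Find and Replace.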
I've tried some of the free PDF converters, but my experience with them is that they turn every line into its own paragraph, which I think will hose the whole thing. I've also tried Calibre with mixed results; I think my bad results from Calibre are from a lack of experience using it and tweaking the process.
In this particular case, I explicitly applied the Word styles because I thought you might also find the Word document useful for other purposes and I tried to replicate the header/footer with volume/book, which required that Word see explicit styles applied. But the neat thing about the above technique is that often you don't even have to explicitly apply the styles; Logos will generally recognize them for TOC purposes if Word uses them in a Word TOC. So, a quick check to see how Logos might use the headings would be to generate a Word TOC.
Incidentally, I do use Calibre, but I use it to convert a Mobi/Kindle file to PDF and then use the above technique. It's like going to Miami by way of China, but it generally works. Each individual can wrestle with the ethics of format-shifting, but I always use source files that I have paid for already or are in the public domain.
Hope that's useful.
0 -
Thanks so much, Robert, for this excellent step-by-step guide. It will be useful for my next PBB.
Armin
0 -
Interesting! Thanks for the software tip. I typically use ABBYY FineReader Pro for Mac and a Windows parallel for ABBYY Enterprise. It is excellent at extracting text and OCR, but I’m not sure that it’s as good at doing a TOC. Thanks for the tip.
0