PBB from website?

Back in Libronix, PBBs were developed from HTML files. I figured out then how to convert relatively complex websites into PBB books. The books maintained the internal navigation hyperlinks, and even turned them into a TOC, so I could navigate as easily as on the website, and the books had the additional Logos benefits.
My question is this: does it seem possible to somehow convert a website's HTML files to DOCX files in a way that would make a workable PBB? Adobe Acrobat Professional allows something similar. That is, you can build a navigable multi-page PDF document from a website. I'm not expecting something quite so automatic, but I am hoping there is a somewhat easy way to concatenate the site's HTML pages into a book such that the links between pages are navigable and also show up in the TOC. Is this pie in the sky?
Thanks,
Dudley
Comments
-
Dudley, you'll be pleased to know that it is indeed possible to save a webpage as an HTML file, open it in Word, make any changes you might want for aesthetics, and compile your book. I did exactly that with the forum page where you asked the question. All links were preserved. Sweet!
"I read dead people..."
0 -
I am not sure it works exactly as Dudley may want. For one thing, the images/icons etc. don't come through well or at all.
Second, while the links are preserved, they are links to the Internet site, so in a way the context is broken. If you download a group of related web pages (as Dudley mentioned), the links on the Internet go between those pages. But once the pages come into a document, the links will NOT go between the pages of the document; they will go to a URL on the Internet. That could be disastrous if an author has customized a page and wants that version to come up rather than the one on the Internet.
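To see how many links in a saved page would be affected, something like the following might help. It is only a rough sketch using Python's standard library: the names `LinkCollector` and `classify_links` are made up for illustration, and it simply treats any link with a scheme or host (http, https, mailto, and so on) as pointing outside the downloaded set.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def classify_links(html):
    """Split a page's links into (internal, external) lists.

    Assumption: a link with a URL scheme or a host part leaves the
    local copy; everything else (relative paths, #fragments) stays in it.
    """
    parser = LinkCollector()
    parser.feed(html)
    internal, external = [], []
    for href in parser.links:
        parsed = urlparse(href)
        if parsed.scheme or parsed.netloc:
            external.append(href)
        else:
            internal.append(href)
    return internal, external


if __name__ == "__main__":
    sample = '<a href="ch1.html">One</a> <a href="https://example.com/">Site</a>'
    print(classify_links(sample))
```

The external list is the set of links that would still jump out to the Internet after the page is pulled into a document.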
0 -
Agreed, Dominick: he'll have to rework the links to point to sections of the book... but it looks like he's already figured out how to do that in L3. Perhaps not, though.
"I read dead people..."
0 -
Dominick is right. So far I can get only partially what I'm hoping for. Software such as HTTrack (free, GNU license) will download a website to a local drive and build the directories recursively so that the links will point within the off-line downloaded site rather than back out to the internet. In L3, since it worked directly with the HTML files, those internal pointers were nicely carried over to the PBB.
What I can't figure out how to do, in any automatic fashion anyway, is to carry those links over from a set of HTML files to a set of DOCX files in such a way that they are maintained in the PBB in L4. I may be missing something simple.
Adobe Acrobat Professional allows you to convert a web page to PDF and then continue to append web pages from the site. You can end up with a multi-page PDF, and all the internal links work. Oddly, if I convert the multi-page PDF to Word, links external to the document work, but the internal hyperlinks disappear. Again, I may be missing something simple.
0 -
Dudley C. Rose said:
Software such as HTTrack (free, GNU license) will download a website to a local drive and build the directories recursively so that the links will point within the off-line downloaded site rather than back out to the internet....
What I can't figure out how to do, in any automatic fashion anyway, is to carry those links over from a set of HTML files to a set of DOCX files in such a way that they are maintained in the PBB in L4. I may be missing something simple.
Dudley, I'm a longtime user of HTTrack as well (great product and FREE!), and I can confirm that your findings are consistent with my own. Although links are converted to point to the local store rather than the internet, importing the HTML into Word preparatory to compiling a PBB will not convert the links from the local store to (for instance) chapter headings within the PB.
You would literally have to import every local store file into the Word document and then reconstruct every link to point to the header for that section vs. either the internet or local store location. Messy. Labor intensive. Ewww.
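For what it's worth, the relinking step could probably be scripted before the pages ever reach Word. Below is a minimal sketch in Python, not a tested PBB workflow. It assumes the mirrored pages sit in one flat folder, that each page's body has already been read into a string, and that links use double-quoted `href` attributes. The names `anchor_for`, `rewrite_links`, and `merge_pages` are hypothetical. Whether Word and the PBB compiler keep the resulting in-document anchors as working links is something I have not verified.

```python
import re
from pathlib import PurePosixPath

# Matches href="target" or href="target#fragment" (double quotes only).
HREF_RE = re.compile(r'href="([^"#]*)(#[^"]*)?"')


def anchor_for(filename: str) -> str:
    """Build an anchor name from a page filename (letters, digits, underscores)."""
    return "pg_" + re.sub(r"[^A-Za-z0-9]+", "_", filename).strip("_")


def rewrite_links(html: str, known_pages: set[str]) -> str:
    """Point links to known local pages at in-document anchors instead."""

    def repl(match):
        target = match.group(1)
        name = PurePosixPath(target).name
        # Leave absolute URLs and unknown files untouched.
        if name in known_pages and "://" not in target:
            # Any #fragment on the original link is dropped, so the
            # link lands at the top of the target page's section.
            return f'href="#{anchor_for(name)}"'
        return match.group(0)

    return HREF_RE.sub(repl, html)


def merge_pages(pages: dict[str, str]) -> str:
    """Concatenate page bodies into one HTML file, one heading per page."""
    known = set(pages)
    parts = []
    for filename, body in pages.items():
        # The heading gives a TOC entry; the named anchor is the link target.
        parts.append(f'<h1><a name="{anchor_for(filename)}"></a>{filename}</h1>')
        parts.append(rewrite_links(body, known))
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"


if __name__ == "__main__":
    demo = {
        "index.html": '<p><a href="ch1.html">Chapter 1</a></p>',
        "ch1.html": '<p><a href="index.html">Back</a></p>',
    }
    print(merge_pages(demo))
```

The single merged HTML file is what you would then open in Word and save as DOCX. Pages with the same filename in different folders would collide in this sketch, so a real site mirror would need the full relative path in the anchor name.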
"I read dead people..."
0 -
Thanks, Brother Mark. Maybe some coding wizard will figure out a way to do such a conversion. I could even live with the internal links surviving a conversion from HTML>PDF>DOCX
0 -