Passage list

At some sites, if I try to add passages by URL, it finds nothing. If I copy to clipboard, it finds plenty of passages. Am I doing something wrong?

Here is a sample that for me finds no passages by URL, but finds 87 passages by clipboard:

http://www.vatican.va/holy_father/john_paul_ii/encyclicals/documents/hf_jp-ii_enc_17042003_ecclesia-de-eucharistia_en.html

Here is another that finds 79 from the clipboard but none by URL:

http://www.vatican.va/holy_father/john_paul_ii/encyclicals/documents/hf_jp-ii_enc_15101998_fides-et-ratio_en.html


Comments

Dave Hooton

I can reproduce that. Could the underlying HTML be protecting it?

NetworkGeek

https://community.logos.com/discussion/comment/176746#Comment_176746

Is there any comment on this? Should I enter it as a bug or something?

Mark Barnes

https://community.logos.com/discussion/comment/177349#Comment_177349

I presume this is because of the formatting. All the book names are in italics, which means that Mk 26:26 actually looks like <i>Mk</i> 26:26 when you download the page. It's therefore not parsed correctly.
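To see why the italics matter, here is a minimal Python sketch. `REF` is a hypothetical, deliberately simple reference pattern (book abbreviation, whitespace, chapter:verse), not what Logos actually uses. Run against the raw HTML, the closing `</i>` tag sits between the book name and the chapter, so nothing matches. Run against the text the browser displays, the reference is found.

```python
import re

# Hypothetical, simplified reference pattern: an abbreviation from a
# small fixed list, whitespace, then chapter:verse.
REF = re.compile(r"\b(Mt|Mk|Lk|Jn)\s+(\d+):(\d+)")

raw_html = "<i>Mk</i> 26:26"  # what the downloaded page contains
visible = "Mk 26:26"          # what the browser displays (and copies as plain text)

print(REF.findall(raw_html))  # [] because "</i>" separates the book from "26:26"
print(REF.findall(visible))   # [('Mk', '26', '26')]
```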

NetworkGeek

https://community.logos.com/discussion/comment/177355#Comment_177355

Interesting... but it's on the clipboard that way as well. If you paste into Word you can see the italics; if you paste into Notepad you don't, because Notepad is text-based and strips them out.

Would one think that if they are properly parsing it when they read it off the clipboard, that they could properly parse it when reading it via URL?

Mark Barnes

https://community.logos.com/discussion/comment/177360#Comment_177360

Would one think that if they are properly parsing it when they read it off the clipboard, that they could properly parse it when reading it via URL?

It doesn't work like that, but like this:

The browser parses the HTML and produces the text output.
When you copy the text from your browser, it places several versions onto the clipboard, in particular a formatted and an unformatted version.
When you paste the text into Logos (the same goes for Notepad), Logos requests the unformatted version, as it has no need for the formatting.

So it's the browser that is essentially stripping out the formatting, not Logos. It would be very hard for Logos to do this itself, or at least it would be very hard to come up with a system that always worked (HTML is a complicated beast), though getting something that worked perhaps 90% of the time would be much more achievable. I'm not sure it's worth it, though.

NetworkGeek

https://community.logos.com/discussion/comment/177369#Comment_177369

Thanks Mark, for the analysis.

So what we are saying, is that if a web site does virtually any formatting of the bible notations, it won't be able to be recognized by Logos? That hardly seems like a very useful feature then. FYI I was a programmer for over 30 years, and stripping any HTML formatting out of the text before it's parsed is very trivial. That's why it seems like a half-baked feature to me: if it won't work much of the time given how they implemented it, why implement it at all?

I'll just use copy to clipboard from now on, and hope they didn't spend a lot of time implementing that feature :^)

Mark Barnes

https://community.logos.com/discussion/comment/177522#Comment_177522

So what we are saying, is that if a web site does virtually any formatting of the bible notations, it won't be able to be recognized by Logos?

No, I'm saying that if a website does any formatting within the notation it's a problem. So <i>Matt</i> 26:3 is a problem, but <i>Matt 26:3</i> isn't.

FYI I was a programmer for over 30 years, and stripping any HTML formatting out of the text before it's parsed is very trivial.

Yes and no. I'm sure Logos are aware of whatever the .NET equivalent of KSES is. But just stripping HTML formatting does not solve the problem. Take this example:

<li>1. I went for a walk with Matt.</li>
<li>2. Then I got pizza.</li>

Strip out the tags and you end up with: 1. I went for a walk with Matt. 2. Then I got pizza.

So stripping tags in that scenario makes things worse, not better.
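A minimal Python sketch of this pitfall, assuming a naive one-line regex tag stripper and a hypothetical reference pattern that allows an optional period after the book abbreviation and an optional verse. Stripping the tags runs the two list items together, and "Matt. 2" is then read as a chapter reference.

```python
import re

html = "<li>1. I went for a walk with Matt.</li>\n<li>2. Then I got pizza.</li>"

def naive_strip(s: str) -> str:
    """Remove every tag and collapse whitespace, as a one-line stripper might."""
    no_tags = re.sub(r"<[^>]+>", " ", s)
    return " ".join(no_tags.split())

# Hypothetical pattern: book, optional period, whitespace, chapter, optional :verse.
REF = re.compile(r"\b(Matt)\.?\s+(\d+)(?::(\d+))?")

text = naive_strip(html)
print(text)               # 1. I went for a walk with Matt. 2. Then I got pizza.
print(REF.findall(text))  # [('Matt', '2', '')]  a false "Matthew 2"
```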

When I said that something that works 90% of the time might be feasible, I had in mind a script that removed inline-level HTML but not block-level HTML (so it kept block-level tags such as <p>, <div> and <li>, but removed inline tags such as <i>, <b> and <span>). But it would still fail with references added by JavaScript, and it might also introduce some unintended consequences.
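The 90% approach can be sketched in Python with the standard library's `html.parser`: inline tags are dropped, block-level tags become line breaks, and a hypothetical reference pattern only allows spaces or tabs (not newlines) between the book and the chapter. `BLOCK_TAGS` is an assumed and far from complete set. As noted, references inserted by JavaScript would still be missed, because the parser only sees the downloaded HTML.

```python
import re
from html.parser import HTMLParser

# Assumed, incomplete set of block-level elements that should break lines.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "br", "tr", "td", "h1", "h2", "h3"}

class BlockAwareText(HTMLParser):
    """Collect text, turning block-level tags into newlines and
    silently dropping inline tags such as <i>, <b> and <span>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def visible_text(html: str) -> str:
    parser = BlockAwareText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

# Hypothetical pattern; [ \t]+ (not \s+) so a match cannot span a line break.
REF = re.compile(r"\b(Matt|Mk)\.?[ \t]+(\d+)(?::(\d+))?")

html = ("<li>1. I went for a walk with Matt.</li><li>2. Then I got pizza.</li>"
        "<p>See <i>Mk</i> 14:24.</p>")
print(REF.findall(visible_text(html)))  # [('Mk', '14', '24')]
```

The list items no longer produce a false "Matt 2", while the italicised reference is still found.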

NetworkGeek

https://community.logos.com/discussion/comment/177527#Comment_177527

In a past life I actually did a lot of work "screen scraping" scripture verses off of public web sites using .NET and C#. I used regular expressions (http://en.wikipedia.org/wiki/Regular_expression), a very powerful yet obscure/unreadable "language" that .NET and many other languages and platforms support. Using regexp I could get accurate scripture verses with pretty much anything embedded in or around them, or even if the verse notation went across multiple lines. It could even correct and reformat the abbreviations where a web site used non-standard book abbreviations. The actual extraction/formatting code was probably under 20 lines; regexp is that powerful (did I mention obscure? <g>).
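The idea can be illustrated with a short Python sketch; the original .NET code is not shown in the thread, so this is only an illustration, not that implementation. The pattern allows any mix of whitespace (including line breaks) and HTML tags between the book name and the chapter, and a small, hypothetical `ABBREV` table maps non-standard abbreviations to one canonical form.

```python
import re

# Hypothetical abbreviation table: variants -> canonical book abbreviation.
ABBREV = {"mt": "Matt", "matt": "Matt", "mk": "Mark", "mar": "Mark",
          "lk": "Luke", "jn": "John", "phil": "Phil"}

BOOK = "|".join(sorted(ABBREV, key=len, reverse=True))  # try longest first
REF = re.compile(
    rf"\b({BOOK})\.?"       # book abbreviation, optional period
    r"(?:\s|<[^>]*>)+"      # any mix of whitespace (incl. newlines) and tags
    r"(\d+):(\d+)",         # chapter:verse
    re.IGNORECASE,
)

def find_refs(html: str) -> list[str]:
    """Return references found in raw HTML, in canonical 'Book C:V' form."""
    return [f"{ABBREV[m.group(1).lower()]} {m.group(2)}:{m.group(3)}"
            for m in REF.finditer(html)]

sample = "(cf. <i>Mt</i> 26:28; <i>Mk</i>\n14:24; <em>Phil.</em> 2:8)"
print(find_refs(sample))  # ['Matt 26:28', 'Mark 14:24', 'Phil 2:8']
```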

It's really not that hard; someone just has to believe it's important that this work correctly 100% of the time. I guess with all the features that need to go in, and all the more important bug fixes, some features are just going to get an 80% first-pass implementation.

MJ. Smith

https://community.logos.com/discussion/comment/177543#Comment_177543

In this particular case the source code makes the problem evident:

Certainly it is a gift given for our sake, and indeed that of all
humanity (cf. Mt 26:28; Mk 14:24; Lk 22:20; Jn 10:15), yet it is first and foremost a gift to the Father: “a sacrifice that the Father accepted, giving, in return for this total self-giving
by his Son, who 'became obedient unto death' (Phil 2:8),

I'd make an educated guess (based on more evidence than this) that the routine expects the whole reference string to fall within a single pair of tags rather than being split by an end tag. This shouldn't be a hard fix, but perhaps at the cost of creating other problems? More analysis needed.
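This guess can be illustrated with a Python sketch. `refs_per_run` models the suspected behaviour (an assumption, not Logos's actual code): it searches each run of text between tags on its own, so a reference split by an end tag, such as `<i>Mt</i> 26:28`, is never seen whole. `refs_joined` concatenates the runs before searching. The function names, the `REF` pattern and the sample HTML are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical, simplified reference pattern.
REF = re.compile(r"\b(Mt|Mk|Lk|Jn|Phil)\s+(\d+):(\d+)")

class TextRuns(HTMLParser):
    """Collect the text between tags as separate runs."""

    def __init__(self) -> None:
        super().__init__()
        self.runs: list[str] = []

    def handle_data(self, data):
        self.runs.append(data)

def text_runs(html: str) -> list[str]:
    parser = TextRuns()
    parser.feed(html)
    parser.close()
    return parser.runs

def refs_per_run(html: str) -> list[tuple[str, str, str]]:
    # Suspected behaviour: each run must contain the whole reference.
    return [m for run in text_runs(html) for m in REF.findall(run)]

def refs_joined(html: str) -> list[tuple[str, str, str]]:
    # Alternative: search the concatenated text of all runs.
    return REF.findall("".join(text_runs(html)))

sample = "(cf. <i>Mt</i> 26:28; <i>Mk</i> 14:24) ... (Phil 2:8)"
print(refs_per_run(sample))  # [('Phil', '2', '8')]
print(refs_joined(sample))   # [('Mt', '26', '28'), ('Mk', '14', '24'), ('Phil', '2', '8')]
```

Joining the runs is the simple fix; the possible "other problems" are the ones raised earlier in the thread, such as adjacent block elements running together.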

Mark Barnes

https://community.logos.com/discussion/comment/177543#Comment_177543

I have dabbled in RegEx for a multi-lingual version of RefTagger I started to code (like most of my projects, still unfinished!). You should test your expression on the HTML in question, and if it works, send it to Bradley!