Semantic Web compatibility

It should be possible to express the hyperglossary in a way that is compatible with established Web standards, specifically SKOS, RDFa-Lite and WebAnnotation.

The first part is easy: embedding the glossary itself in an HTML document.

Suppose a web page http://example.com/glossary/first-term represents a single term. It can be made into a SKOS concept thus:

<html lang="en" typeof="skos:Concept">
<head>
</head>
<body>
<div>
    <h1 property="skos:prefLabel">My first glossary term</h1>
    <div property="skos:definition">
        <p>The definition of my first term.</p>
    </div>
    <p>This term specializes <a property="skos:broader" resource="second-term" href="second-term">another term</a></p>
</div>
</body>
</html>

The key attributes would be inserted in the WordPress template.

It is also possible (but not necessary) to have multiple terms in one page:

http://example.com/glossary could contain

<html lang="en" vocab="http://www.w3.org/2004/02/skos/core#" typeof="ConceptScheme">
<head>
  <base href="http://example.com/glossary/">
</head>
<body>
<div>
    <h1>A glossary with a few entries</h1>
    <div resource="first-term" typeof="Concept">
        <h2 property="prefLabel">My first glossary term</h2>
        <p property="definition">The definition of my first term.</p>
        <span property="inScheme" resource="."></span>
    </div>
    <div resource="second-term" typeof="Concept">
        <h2 property="prefLabel">My second glossary term</h2>
        <p property="definition">The definition of my second term.</p>
        <span property="broader" resource="first-term"></span>
        <span property="inScheme" resource="."></span>
    </div>
</div>
</body>
</html>

(Here I used the vocab attribute to avoid repeating the skos: prefix.)

Note that the HTML tags are irrelevant: they could be <a>, <dt>, <span>, whatever.

What matters is: a unique identifier given by the resource attribute, which may be a full URL or relative to the page URL (it would be good practice to also set an id="..."); and the typeof and property attributes with those exact literal values.
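
To illustrate that only the attributes matter, here is a minimal Python sketch (standard library only) that reads the glossary markup from a page written with <ul>, <li>, <b> and <i> instead of <div>, <h2> and <p>. The class name ConceptExtractor and the sample page are my own, for illustration. It handles only the shapes shown in this post and is not a conforming RDFa processor; a real RDFa library should be preferred in practice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ConceptExtractor(HTMLParser):
    """Naive reader for the glossary markup; NOT a conforming RDFa processor."""

    def __init__(self, base):
        super().__init__()
        self.base = base      # document URL, overridden by <base href>
        self.concepts = {}    # concept IRI -> {property name: [values]}
        self.stack = []       # per open element: (subject, literal property, text)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "base" and a.get("href"):
            self.base = urljoin(self.base, a["href"])
        if tag in VOID:
            return
        # Inherit the current concept from the parent element, if any.
        subject = self.stack[-1][0] if self.stack else None
        types = [t.split(":")[-1] for t in (a.get("typeof") or "").split()]
        if "Concept" in types:
            subject = urljoin(self.base, a.get("resource") or "")
            self.concepts.setdefault(subject, {})
        literal = None
        prop = a.get("property")
        if prop and subject:
            name = prop.split(":")[-1]   # drop a "skos:" prefix if present
            target = a.get("resource", a.get("href"))
            if target is None:
                literal = name           # the value is the element's text
            else:
                iri = urljoin(self.base, target)
                self.concepts[subject].setdefault(name, []).append(iri)
        self.stack.append((subject, literal, []))

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        subject, literal, text = self.stack.pop()
        if subject and literal:
            value = " ".join("".join(text).split())  # normalise whitespace
            self.concepts[subject].setdefault(literal, []).append(value)

    def handle_data(self, data):
        # Text belongs to every open element that is collecting a literal.
        for _, literal, text in self.stack:
            if literal:
                text.append(data)


# Same data as the multi-term example, but with different tags.
PAGE = """
<html lang="en" vocab="http://www.w3.org/2004/02/skos/core#" typeof="ConceptScheme">
<head><base href="http://example.com/glossary/"></head>
<body>
<ul>
  <li resource="first-term" typeof="Concept">
    <b property="prefLabel">My first glossary term</b>:
    <i property="definition">The definition of my first term.</i>
  </li>
  <li resource="second-term" typeof="Concept">
    <b property="prefLabel">My second glossary term</b>:
    <i property="definition">The definition of my second term.</i>
    <a property="broader" href="first-term">broader term</a>
  </li>
</ul>
</body>
</html>
"""

extractor = ConceptExtractor("http://example.com/glossary/")
extractor.feed(PAGE)
for iri, props in extractor.concepts.items():
    print(iri, props)
```

The sketch keys everything on resource, typeof and property (plus href for links), which is why swapping the tags does not change what it finds.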

(By email, I had sent the resource as #first-term instead of first-term, but here I’m supposing that they also exist as independent pages.)
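
How a relative resource value turns into a full IRI can be checked with Python's urljoin, which implements standard relative-reference resolution. This is only a sketch: the resolve helper is a hypothetical name, the base is the one from the <base> element of the multi-term example, and CURIEs (prefix:name values), which RDFa also allows in resource, are not handled.

```python
from urllib.parse import urljoin

# Base taken from the <base href="..."> of the multi-term example.
BASE = "http://example.com/glossary/"


def resolve(resource, base=BASE):
    """Resolve a relative (or absolute) resource value against the base."""
    return urljoin(base, resource)


# A relative name becomes an independent page under the glossary.
print(resolve("first-term"))   # http://example.com/glossary/first-term
# "." points back at the glossary itself (the ConceptScheme).
print(resolve("."))            # http://example.com/glossary/
# A fragment stays inside a single glossary page instead.
print(resolve("#first-term", "http://example.com/glossary"))
# -> http://example.com/glossary#first-term
# A full URL is left untouched.
print(resolve("http://example.com/other/term"))
```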

The skos:broader property gives thesaurus-like relationships between concepts. Other types of relationships between concepts would require other ontologies (to be determined).
I’m less sure about tagging, but there are a few ontologies for it, such as https://lov.linkeddata.es/dataset/lov/vocabs/tag

The more difficult part is how to say that a text fragment refers to that concept.
(The following examples could be on another page, or even be internal links in the glossary description.)
Many vocabularies (e.g. FOAF) define the topic of a document, but the notion of a text fragment is less common. Let’s use that of WebAnnotation.

So what I propose is as follows: the HTML element containing the reference to a concept mostly just needs an id attribute. It may or may not be a link with an href.

<p>text with a <span id="target1">simple span</span> or even a <a id="target2" href="http://example.com/my-glossary#second-term">simple href</a></p>

We could mark the fact that this is a glossary reference with a class, as microformats do; but to make it visible to the semantic web, we’d independently have a WebAnnotation stanza elsewhere in the same document:

<span resource="#target1_anno" vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
    <span property="hasTarget" resource="#target1" typeof="SpecificResource">
        <span property="hasSource" resource="."></span>
        <span property="hasSelector" typeof="CssSelector">
            <span property="rdf:value" content="#target1" lang=""></span>
        </span>
    </span>
    <span property="motivatedBy" resource="oa:identifying"></span>
    <span property="hasBody" resource="http://example.com/glossary/second-term"></span>
</span>

There are other ways to express that stanza, but this is a simple one, albeit verbose. Every occurrence of ‘target1’ is a placeholder; the rest is constant, apart from the hasBody resource, which names the concept being referenced.
Many libraries can extract information from such markup; dokie.li is a good example. There is more documentation I need to digest here.
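
Since only the target id and the concept IRI vary, the stanza lends itself to being generated. Below is a minimal Python sketch of such a generator; the function name annotation_stanza and the id check are my own assumptions, not part of any existing plugin. It writes the motivation as the CURIE oa:identifying, on the assumption that the oa prefix from the RDFa initial context is available, because a bare value in a resource attribute would be read as a relative IRI rather than as a term of the vocabulary.

```python
import re
from html import escape

# The constant part of the stanza; {tid} and {body} are the only variables.
STANZA = """<span resource="#{tid}_anno" vocab="http://www.w3.org/ns/oa#" typeof="Annotation">
    <span property="hasTarget" resource="#{tid}" typeof="SpecificResource">
        <span property="hasSource" resource="."></span>
        <span property="hasSelector" typeof="CssSelector">
            <span property="rdf:value" content="#{tid}" lang=""></span>
        </span>
    </span>
    <span property="motivatedBy" resource="oa:identifying"></span>
    <span property="hasBody" resource="{body}"></span>
</span>"""


def annotation_stanza(target_id, concept_iri):
    """Return the RDFa stanza linking the element #target_id to a concept."""
    # Restrict ids to characters that are safe in a bare "#id" CSS selector.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_-]*", target_id):
        raise ValueError(f"unsuitable id for a CSS selector: {target_id!r}")
    # Escape the IRI so that characters such as & stay valid in an attribute.
    return STANZA.format(tid=target_id, body=escape(concept_iri, quote=True))


print(annotation_stanza("target1", "http://example.com/glossary/second-term"))
```

A WordPress filter could do the same thing on the server side: give each glossary reference an id, then append one generated stanza per reference at the end of the post.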

It is also possible to propose such annotations from outside, by wrapping them in ActivityPub. I’ll develop that later.

Rejected alternatives:

  1. Put the annotation span in the text, which would be more conventional RDFa, but probably less legible HTML:
    <p>text with a <span resource="#anno" vocab="http://www.w3.org/ns/oa#" typeof="Annotation"><span property="hasTarget" id="target1" resource="#target1" typeof="SpecificResource">not so simple annotation<span property="hasSource" resource="#"></span><span property="hasSelector" typeof="CssSelector"><span property="rdf:value" content="span#target1" lang=""></span></span></span><span property="motivatedBy" resource="oa:identifying"></span><span property="hasBody" resource="http://example.com/my-glossary#second-term"></span></span></p>
  2. Use another, simpler vocabulary for the link between fragment and concept, while still using WebAnnotation for the fragment:
    <p>text with a <span vocab="http://www.w3.org/ns/oa#"  resource="#target1" typeof="SpecificResource"><a property="dc:subject" href="http://example.com/my-glossary#second-term">simple span</a></span></p>

However, the SpecificResource here is quite incomplete, and we don’t know whether anyone can consume this combination of vocabularies (or any similar one, such as sioc:topic).

12 Comments

  • Frode Hegland

    The use case I am looking at is all about anyone being able to create an entry with a normal wordpress interface. How would this look in the actual page? Can we simply use headings and body text to make it parseable for knowledge graphs to read?

        • skreutzer

          There could be a WordPress plugin developed that offers some fields and stores the data according to Gyuri’s proposal, and then a theme/page/filter that renders such a post to make it look like a normal post for a web visitor in the browser, if that’s what the two of you have in mind. If the “normal WordPress interface” refers to what comes with the standard installation, you’re in essence asking to do glossary according to your proposal and not according to Gyuri’s. There are some issues with yours, which include that it isn’t semantic. We could define that the WordPress post structure is supposed to be interpreted as a glossary entry, but then how would one distinguish from ordinary posts? I know that you use a WordPress category for it, which works for a single instance if people hard-code the ID of that category, but other WordPress instances will have other categories/IDs. Go by the name of the category? Fine, then explicitly write that into a spec. Still, it wouldn’t be better than microformats (much worse actually as long as it’s not established), which is why I have avoided this per default in my proposal and Gyuri’s proposal is specifically there to improve it. And then you have to do special tricks anyway to include multiple terms or references to other terms, which isn’t good technical design. So Gyuri’s proposal needs to be looked at in terms of what it does technically at the expense of user interface (for now), while your proposal aims at user interface at the expense of technical usefulness. I don’t see why one should be dismissed on no other basis than just another focus/perspective. The easy solution is for Gyuri to abandon his proposal (and then why write it in the first place?) or use his anyway by ignoring yours, but if you want something decent on the same (!) shared WordPress instance, I guess you really have to discuss those things instead of adding more proposals or picking a one-sided one. 
The real work in our context is to consolidate them.

          • Marc Antoine Parent

            Brief answers, because I agreed to postpone this.
            1. I’m not competing with Gyuri, I think we can make the HTML expression of hyperglossary standards-compliant, and I’m providing a template to do so. It is possible to do it within the work Gyuri is doing, though he does not see it as a priority, which I understand.
            2. I do not think that how this HTML expression is accomplished in WordPress needs to be standardized. It can be done using a category and templates, but there can be other ways. That’s an implementation concern.
            3. It’s not worse than microformats, precisely because I am proposing to use established standards.

          • skreutzer

            @Marc-Antoine Parent, 2018-11-22T13:34 (instead of a reply, for lack of support for threaded conversations more than 5 levels deep): What’s “HTML expression of hyperglossary” referring to, Frode’s, Gyuri’s or my proposal, or a fourth of yours? “Standards-compliant”, to what standards? One or several HTML specs, WordPress spec, SKOS/RDFa-Lite/WebAnnotation? I’m referring to “standards” as the well-known specs for open formats/protocols, but more importantly, in the jrnl context, there are 3 now that conflict, and either you need to decide/“standardize” on one or support multiple ones in clients/systems, as there’s no capability infrastructure planned that would take care of it.

            I think the crucial point is to have somebody who does the template, so the technical benefits of semantic encoding can be realized, while Frode’s demand to use a normal WordPress interface is taken care of as well. In terms of priority, I’m totally puzzled why glossary keeps coming up again and again: if we had been serious about demonstrating a glossary capability for demo day, we would have resolved those questions a long time ago, but I think as each individual has his own presentation, conflicting formats need to be stored in the same WordPress instance, which also makes only limited sense to me.

            2. Specifically the notion of using categories, do you imagine that there never will be any other jrnl instance than jrnl.global? Do you intend to install a globally unique glossary category on WordPress/plugin/template installation, that avoids conflicts with existing categories? Is it just a hack for now, for the demo, and later you’ll scrap it and make something better than relying on the category? Do you expect clients to have lists of which category name/ID it is for every jrnl instance out on the web? Do you alternatively think about a plugin that translates the glossary category to semantic output on external request? How do your plugins/themes know which one is the glossary category, if you don’t add it yourself and let the user manually create it as it is now?

            Can’t understand 3. because as indicated above, I’m not sure what proposal and standards you’re referring to.
