Hypertext

Resilient Citations

A fragile citation is a citation with a mutable source, that is, source text that can be modified or deleted after the citation is published. A reference to a conventional web URL (such as in the traditional “Retrieved at” format) is a fragile citation, since the webpage may be changed, the server may go down, the domain name may expire, and so on.

A resilient citation, in contrast, is a citation with an immutable source, or permanent link to its source text.

Great strides are being made towards a permanent web with technologies such as IPFS, Arweave, and Swarm. Such a web would effectively make all published resources resilient. These are early technologies, however: the timeline for their development and scaling is unknown, and they may be supplanted by other technologies before reaching maturity. In the meantime, what is a reasonable solution to the fragile citation problem?

Instead of attempting to solve the problem of fragile citations globally for all documents, we can narrow the requirement to a single document and its citations. The problem then becomes: how can a citation preserve its source text for as long as the citing document exists? This is more tractable, as the permanence of the source is conditional and limited to the citations actually used.

This may be achieved in two steps:

  1. Archive the source text with the citing document. This provides the conditional permanence that depends only on the preservation of the citing document.
  2. Prove the authenticity of the source text using TLSNotary. This lets a reader verify that the archived text is what the source server actually delivered, without tampering after retrieval.
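
The two steps above can be pictured as a small record stored alongside the citing document. The following is only a minimal sketch: the `ArchivedCitation` type, its field names, and the `archive_citation` and `is_intact` helpers are hypothetical, and the `proof` field is treated as an opaque string because the format of a TLSNotary proof is not specified here. Note that the SHA-256 digest only detects later corruption of the archived copy; authenticity of the retrieval would still rest on the notarization proof.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchivedCitation:
    """Hypothetical record bundled with the citing document."""
    url: str                     # where the source text was retrieved from
    retrieved_at: str            # ISO 8601 timestamp of retrieval
    source_text: str             # archived copy of the cited text (step 1)
    sha256: str                  # digest of source_text at archiving time
    proof: Optional[str] = None  # opaque notarization proof (step 2); format assumed


def _digest(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def archive_citation(url: str, retrieved_at: str, source_text: str,
                     proof: Optional[str] = None) -> ArchivedCitation:
    """Create a record that travels with the citing document."""
    return ArchivedCitation(url, retrieved_at, source_text,
                            _digest(source_text), proof)


def is_intact(citation: ArchivedCitation) -> bool:
    """Check that the archived text still matches its recorded digest.

    This does not prove authenticity; it only shows the archived copy
    has not changed since it was recorded.
    """
    return _digest(citation.source_text) == citation.sha256
```

A citing document could serialize one such record per citation, so that the cited text survives for exactly as long as the document itself does.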

I suggest that the HTTP Range header would be a useful way to delimit the specific span of source text cited and to reduce bandwidth requirements. Unfortunately, Range is not in widespread use for this purpose, nor is it enabled on platforms such as WordPress. In lieu of Range support at the source, a proxy server could retrieve the full resource and then re-serve it with Range support, along with the appropriate notarization to prove its authenticity.
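
To make the Range idea concrete, here is a rough sketch of both sides: building a byte-range request header for a cited span, and the slicing a proxy might perform on a full resource it has already retrieved. The function names `range_header` and `serve_range` are hypothetical, the sketch handles only a single `first-last` or `first-` byte range, and it leaves out networking and notarization entirely.

```python
from typing import Dict, Tuple


def range_header(start: int, end: int) -> str:
    """Build a Range header value for bytes start..end (inclusive)."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"bytes={start}-{end}"


def serve_range(body: bytes, header_value: str) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a single byte-range request.

    Anything this sketch does not understand (other units, suffix
    ranges, multiple ranges) is ignored and the full body is returned.
    """
    total = len(body)
    full = (200, {"Accept-Ranges": "bytes", "Content-Length": str(total)}, body)

    unit, _, spec = header_value.partition("=")
    first, _, last = spec.partition("-")
    first, last = first.strip(), last.strip()
    if unit.strip() != "bytes" or not first.isdecimal():
        return full
    if last and not last.isdecimal():
        return full

    start = int(first)
    if start >= total:
        # Range starts beyond the end of the resource.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    end = min(int(last), total - 1) if last else total - 1
    if end < start:
        return full

    payload = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(payload)),
    }
    return 206, headers, payload
```

For example, `serve_range(b"The quick brown fox", range_header(4, 8))` yields a 206 response whose payload is `b"quick"`. Byte offsets only identify the cited span reliably against the exact bytes that were archived, which is one more reason to keep the archived copy with the citing document.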

By publishing source text with the citing document and including a TLSNotary proof of retrieval, resilient citations become possible using technologies readily available today. This forms the infrastructure for a deeply linked web of collective sensemaking with or without the use of Semantic Web or Linked Data mechanisms of machine-readability.

2 Comments

  • Peter Flynn

    Lack of resilience was always given by my Humanities academics as the reason why print books are still used as source references.

    I believe (must check with a librarian) that some university repositories require (or at least recommend) that theses, for example, be accompanied by a zip file of any cited sources that are not permanent (i.e. web documents).

    The use of Range would be nice, but servers would need to be able to support document-structure-specific chunking, as is sometimes done in serving Humanities documents from corpora. The TEI provides specific tagging so an editor of an edition can specify the structure used for citations (and thus retrieval too); in effect saying “in this document, refer to Chapter and Section using dot notation, e.g. 3.5 means chapter 3, section 5”. In another document, it could be folio and line number, or stanza and line, etc. The Range header would need to understand units other than bytes.
