[1] An ever-growing body of academic publications is available online, ideally open access. Most publications today are prepared electronically from the start, and more and more older publications are retro-digitized. By their very nature, academic publications form an interconnected network of knowledge. Yet online academia does not fully exploit the hypertextuality of the web. It is exceedingly tedious to follow citations to other works, and references to pages or subsections have to be reached by manually scrolling through page-based (PDF) documents. In contrast, Open Web technologies make it straightforward to prepare academic publications in a truly interconnected way.
[2] The Open Web Platform is a concept advanced by the World Wide Web Consortium (W3C). It is the collection of open and royalty-free technologies that enable the Web. Basically, this means HTML content with CSS/JavaScript layout and various additional technologies like MathML and SVG. With ongoing advances in HTML/CSS and browser technology, the quality and possibilities of layout are approaching the sophistication of LaTeX, while additionally being adaptive/responsive. Precise cross-referencing is possible through hyperlinking with hash-formatted URI fragments. Annotations can be added by using the Web Annotation standard of the W3C, as implemented, for example, by Hypothesis. Also note that HTML/CSS is much better suited to accessibility, whereas PDF is often unusable for screen readers.
[3] This document proposes some best practices for the preparation of scientific publications on the Open Web, with the explicit goal of contributing to a dialogue about the many practical details involved. As a practical example, the preparation of a linguistic monograph is detailed using the document preparation framework Pandoc.
[4] Note that this plea for Open Web Publishing is largely independent of the question of Open Access Publishing. Ideally, scientific publications are open access, so that any links to them easily resolve without paywalls. However, even with restricted access to the content the principles of Open Web Publishing still hold.
[5] The current de facto standard of electronic publication is to provide scientific works as page-based PDF. This format is a one-to-one reflection of the traditional book form factor. Moving to purely web-based publications implies various changes to this model. I propose the following best practices:
STABLEID/index.html for the actual full text HTML file.
However, depending on the publisher’s preferences this location might be
used as a landing page, so an alternative location for the actual
full text might be needed, e.g. STABLEID/fulltext.html.
index.html for linking to the actual content.
By adding id=sec4.1 to a heading in the HTML it becomes
possible to use a link like STABLEID/index.html#sec4.1 to
link directly to this section. A user following such a link will be
taken directly to that specific heading. The format of the IDs can
be freely chosen, as long as the IDs are clearly communicated to
readers. However, for consistency I propose the following identifiers
for scientific publications:
sec4.1
#5.34.
[6] As an example, consider the manuscript being prepared for publication at github.com/cysouw/diathesis. This manuscript is a testbed for the various practices proposed above. Because it is not yet published in a final form, the stable identifier is of course not yet stable. However, for testing, the following location can be used for the full text of the publication: cysouw.github.io/diathesis/. Try adding any fragment to this link to cross-reference a specific part.
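A fragment only resolves when an element carrying the matching id exists in the HTML. The following minimal Python sketch, which is not part of the toolchain described here, lists the ids that a page exposes as link targets. The names IdCollector and linkable_ids and the sample markup are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def linkable_ids(html):
    """Return all ids in document order; each one can be used as #fragment."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical markup using the identifier formats mentioned in the text
sample = '<h2 id="sec4.1">Heading</h2><p id="5.34">Paragraph text.</p>'
print(linkable_ids(sample))
```

Running such a check before publication would show which fragments readers can actually cite.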
[7] Note that using index.html has the extra benefit that it can be
left out of the links, so the references become even shorter.
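The shortening can be sketched in a few lines of Python. The helper fragment_link is a hypothetical name, and the https scheme in the usage example is an assumption, since the location above is given without one.

```python
from urllib.parse import urlsplit, urlunsplit


def fragment_link(base, fragment, drop_index=True):
    """Attach a fragment to a full-text location, optionally omitting index.html."""
    parts = urlsplit(base)
    path = parts.path
    if drop_index and path.endswith("/index.html"):
        # Keep the trailing slash so the server still resolves to index.html
        path = path[: -len("index.html")]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, fragment.lstrip("#"))
    )


# Assumed scheme (https); host and path are taken from the text above
base = "https://cysouw.github.io/diathesis/index.html"
print(fragment_link(base, "#sec4.1"))
print(fragment_link(base, "#sec4.1", drop_index=False))
```

Both forms point to the same target; the first is simply the shorter way of writing it.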
[8] This publication was prepared in Markdown using Pandoc for the conversion to HTML, though any other toolchain could just as well be used. To extend the functionality of Pandoc, various filters (Pandoc parlance for extensions) were used:
pandoc-crossref for cross-references to sections, figures and tables.
crossref-adapt for changing the IDs of these cross-references to the format proposed
above, so they can transparently be cited.
count-para to add numbers to text paragraphs. Refer to a specific paragraph for
example as (Cysouw 2021: #2.7). Adding the suffix to a stable link
takes the reader directly to the paragraph, e.g. (Cysouw 2021: #2.7).
pandoc-ling for the layout, numbering and cross-reference of linguistic examples,
using the ex identifiers.
citeproc for citation and bibliography.
[9] To insert
cross-references to other Open Web Publications it would be highly
desirable for reference managers to insert the proper links. For
example, in preparing the above-mentioned publication I used Pandoc’s citeproc to
prepare the in-text references and bibliography. Ideally, when I
add a citation like [@cysouw2021: #2.7] in such a work, the
hash-based suffix would be transformed into a link to
STABLEID/index.html#2.7.
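Such a transformation could look roughly as follows. This is only a Python sketch of the desired behaviour, not a feature of citeproc or of any existing reference manager. The function citation_to_link, the regular expression, and the lookup table from citation keys to stable identifiers are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Matches citations of the form [@key: #fragment], as in the example above
CITATION = re.compile(r"\[@(?P<key>[^\]:\s]+):\s*#(?P<fragment>[^\]\s]+)\]")


def citation_to_link(citation: str, stable_ids: Dict[str, str]) -> Optional[str]:
    """Turn a citation with a hash-based suffix into a fragment link.

    Returns None when the citation has no hash suffix or the key is unknown.
    """
    match = CITATION.fullmatch(citation.strip())
    if match is None or match["key"] not in stable_ids:
        return None
    return f"{stable_ids[match['key']]}/index.html#{match['fragment']}"


# Hypothetical lookup table; STABLEID is the placeholder used in the text
stable_ids = {"cysouw2021": "STABLEID"}
print(citation_to_link("[@cysouw2021: #2.7]", stable_ids))
```

In practice the lookup table would have to come from the bibliographic database, for example from a field holding the stable identifier of each cited work.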