Academic publishing on the open web

Michael Cysouw

1 The academic web

[1] An ever-growing body of academic publications is available online, ideally open access. Most publications today are prepared electronically from the start, and more and more older publications are being retro-digitized. By their very nature, academic publications form an interconnected network of knowledge. Yet academia online does not fully exploit the hypertextuality of the web. Following citations to other works is exceedingly tedious, and references to pages or subsections have to be reached by manually scrolling through page-based (PDF) documents. In contrast, Open Web technologies make it straightforward to prepare academic publications in a truly interconnected way.

[2] The Open Web Platform is a concept advanced by the World Wide Web Consortium (W3C). It is the collection of open and royalty-free technologies that enable the Web. Basically, this means HTML content with CSS/JavaScript layout and various additional technologies like MathML and SVG. With the ongoing advances in HTML/CSS and browser technology, the quality and possibilities of layout are approaching the sophistication of LaTeX, while additionally being adaptive/responsive. Precise cross-referencing is possible through hyperlinking with URI fragments, i.e. the part of a link that follows the hash sign (#). Annotations can be added by using the Web Annotation standard of the W3C, as implemented for example by Hypothesis. HTML/CSS is also much better suited to accessibility, whereas PDF is often unusable with screen readers.

[3] This document proposes some best practices for the preparation of scientific publications on the Open Web, with the explicit goal of contributing to a dialogue about the many practical details that this involves. As a practical example, the preparation of a linguistic monograph is detailed using the document preparation framework Pandoc.

[4] Note that this plea for Open Web Publishing is largely independent of the question of Open Access Publishing. Ideally, scientific publications are open access, so that any links to them resolve easily without paywalls. However, even when access to the content is restricted, the principles of Open Web Publishing still hold.

2 Best practices for academic web-publication

[5] The current de facto standard of electronic publication is to provide scientific works as page-based PDF. This format is a one-to-one reflection of the traditional book form factor. Moving to purely web-based publications implies various changes to this model. I propose the following best practices:

3 Example

[6] As an example, consider the manuscript being prepared for publication at github.com/cysouw/diathesis. This manuscript is a testbed for the various practices proposed above. Because it is not yet published in a final form, the stable identifier is of course not yet stable. However, for testing, the following location can be used for the full text of the publication: cysouw.github.io/diathesis/. Try adding any fragment to this link to cross-reference a specific part, e.g.

[7] Note that using index.html has the extra benefit that it can be left out of the links, so the references can be shortened even further, e.g.:
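The way such fragment links are put together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the https scheme is assumed (the location is given above without one), the helper name fragment_link is hypothetical, and the fragment 2.7 is borrowed from the citation example in section 4, not checked against the actual identifiers in the manuscript.

```python
from urllib.parse import urldefrag, urljoin

# Location of the test publication; the "https://" scheme is an assumption.
BASE = "https://cysouw.github.io/diathesis/"


def fragment_link(base: str, fragment: str, explicit_index: bool = False) -> str:
    """Build a link to one part of a web publication (hypothetical helper).

    With explicit_index=True the file name index.html is spelled out;
    otherwise it is left implicit, which gives the shorter form.
    """
    page = urljoin(base, "index.html") if explicit_index else base
    # Drop any fragment already present on the base before adding the new one.
    page, _ = urldefrag(page)
    return f"{page}#{fragment.lstrip('#')}"


# Illustrative fragment only; "2.7" is taken from the citation example.
print(fragment_link(BASE, "2.7"))
print(fragment_link(BASE, "2.7", explicit_index=True))
```

Both forms point at the same part of the same document, provided the server delivers index.html for the directory URL, as GitHub Pages does.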

[8] This publication was prepared in Markdown using Pandoc for the conversion to HTML, though any other toolchain could just as well be used. To extend the functionality of Pandoc, various filters (Pandoc parlance for extensions) were used:

4 Cross referencing

[9] To insert cross-references to other Open Web Publications, it would be highly desirable for reference managers to insert the proper links. For example, in preparing the above-mentioned publication I used Pandoc's citeproc to prepare the in-text references and bibliography. Ideally, when I add a citation like [@cysouw2021: #2.7] in such a work, the hash-based suffix would be transformed into a link of the form STABLEID/index.html#2.7.
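The desired transformation can be sketched with plain string processing in Python. This is not how citeproc or any existing reference manager works; it is only a minimal illustration of the mapping. The lookup table STABLE_IDS, the function name citation_to_link, and the regular expression are all hypothetical, and STABLEID is kept as the placeholder used above.

```python
import re
from typing import Mapping

# Hypothetical lookup from citation key to stable identifier.
# "STABLEID" is the placeholder from the text, not a real address.
STABLE_IDS = {"cysouw2021": "STABLEID"}

# Matches a single citation with a hash-based locator, e.g. [@cysouw2021: #2.7]
CITATION = re.compile(r"\[@(?P<key>[^\s:\]]+):\s*#(?P<fragment>[^\]\s]+)\]")


def citation_to_link(citation: str, stable_ids: Mapping[str, str]) -> str:
    """Turn a citation like [@key: #fragment] into STABLEID/index.html#fragment."""
    match = CITATION.fullmatch(citation.strip())
    if match is None:
        raise ValueError(f"not a citation with a hash-based locator: {citation!r}")
    # An unknown citation key raises KeyError here.
    base = stable_ids[match["key"]].rstrip("/")
    return f"{base}/index.html#{match['fragment']}"


print(citation_to_link("[@cysouw2021: #2.7]", STABLE_IDS))
```

In a real toolchain the stable identifier would presumably come from the bibliographic record of the cited work (for instance its URL field) and not from a hand-written table.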