A Disconnect between the Free Software Movement and Open Science

— Marc Jones and Robert L. Read, PhD

The Problem is Terminology

Academic researchers and the Free Software Movement (FSM) use the word publish differently.

The difference in the meaning of the word publish (and publication) creates a disconnect when the Free Software and academic communities try to collaborate, or when the academy adopts the ethos inspired by the Free Software Movement and the Creative Commons community, as we learned at a recent hackathon on behalf of Project Drawdown.

Publication:

Free Software Publication means putting anything, no matter how trivial or unrefined, online and potentially accessible to the world, expecting it to be revised periodically and possibly linked to by others.
Academic Publication means putting a work in an academic journal after it has been critically reviewed and circulated for review by peers and trusted advisors, where it will be eternally unchanged, and possibly referenced by others.

Free Software culture expects that works are “published = made accessible and known to a limited audience” from day one. Academic researchers often expect that things are not “published = announced in a peer-reviewed forum” until they have been thoroughly vetted and refined. This difference in expectations can challenge free software developers and academics who work together on projects. Nonetheless, such collaboration is extraordinarily useful and becomes more so every day. Creating a productive working relationship means creating a common understanding about expectations, the language used to describe the process, and the process of implementing the project.

This essay attempts to explain that difference in expectations so that these communities can work together more easily.

The Mutability of Published Works and Expected Quality

Expectations in the FSM are that works are eternally mutable: they are constantly improved and are never in a final state. In a free software project, the contributors may not know who will make the next improvement, since in theory a wide audience is invited to contribute. Contributors are recognized rather discreetly, sometimes not even by name, in the commit logs and spread out in comments throughout the code. Contributors to a free software project are not individually responsible for its overall quality.

In academia the expectation is that a definite list of authors will take responsibility and great care for moving the work from conception to final, published state, after which it should not need any serious revision. Academia demands non-repudiation: each author is expected to stand behind the conclusions of the work with their reputation.

Free software is “published” immediately. By published, free software authors mean it is available for anyone who cares to discover it, examine it, comment upon it, and even build a rival to it. There is no expectation that the work is highly usable, let alone finalized. Everyone accepts that some bugs will exist. In fact, the expectation is that you will make it available before having any confidence that it is bug free! Functionality is often achieved before all of the ideas the original author had in mind have come to fruition. The ethos of “release early and release often” embodies this idea. Well after first publication, a project will reach its first point of functionality. At that point free software authors will frequently make a “release.” The “release” of an open source software project is symbolic; it is an assertion of readiness rather than a revelation of information. A published “final” release is an indication that the authors believe it has some degree of usability. To working programmers, the release is a non-event; the development process immediately continues to revise the code base to add more functionality and fix the bugs that are expected to be discovered in the previous release.

Quality Standards

Academic publication (and traditional writing at large) has a different standard to meet for publication, which is a momentous event. The work product is the explanation of an idea. Authors are judged and criticized on how accurate or complete the idea is. Significant flaws in the idea or its explanation are problems indicating that publication and sharing the idea was premature. Publication is a seal indicating a level of quality and finality. Not only does free software lack this sense of finality, but its standard for quality, minimal functionality, is entirely different. Typically in academic publication there is no comparable standard of being functional. English prose can only express an idea; to the extent it is a “useful” idea (as opposed to just an abstract idea) it requires someone to apply it through more work. Free software has the advantage of doing work in its current state, typically without the user even understanding the ideas expressed.

Free software developers often expect works to be accessible to anyone, or “Open from day one”, even before anything useful is done. To them published does not mean publicized. The expected audience of a nascent project is tiny. Nonetheless, developers expect the underlying ideas, goals and data involved to also be shared publicly. They will expect that every mistake and half-way step will be made freely available to any party that cares to go looking. They will not be concerned that mistakes will reflect poorly on them early in the process. They expect to be judged initially on the progress and the process, and on the quality of the work only when the developer specifically states that they believe it is high quality. Software is never “done”.

In contrast academics have the expectation that works are only shared broadly with others when they have reached a final “done” or permanent state. The final, permanent state requires that the first publication be of high, even meticulous, quality and free of all serious flaws. The finality of the state turns the work into an artifact that allows others to judge and critique it as soon as it is published. Any reputational impact rests on the state of the work at the moment of first publication. Academic publishers seek to be respectful of their readers’ time by producing the highest-quality work possible.

Free software developers have the advantage of being able to layer fixes over their bugs, which then get buried in the revision control history. Academics’ mistakes are hard to correct silently once an article has been published.

Rivalry

In some cases there is a race to reach this final point of publication since reputational interests are disproportionately granted to those who publish first. Those who follow cite previous works, increasing the reputation of the previous works. In academia, sharing too much too soon might enable someone else to craft a publication that preempts the work and the reputational rewards that it carries. Authors of an academic paper circulate select drafts of the paper before publication to only a few individuals the author trusts. The author hopes that any criticism will be in private and that the carefully selected reader won’t attempt to compete or usurp the opportunity to be the first to publish by rushing to the presses.

FSM developers expect that anyone could look at the work in progress and criticize it, contribute to it, or be inspired to create a rival to it. These activities have largely been embraced by the free software community and turned into opportunities to accelerate the progress of the body of free software generally. Rival works are common and to some extent validate the value of the work they seek to supplant. It is impossible to count the number of competing GNU/Linux distributions, and LLVM has long been competing to supplant GCC as the default compiler in the free software world. These rivals encourage diverse approaches until one dominant modality emerges, which can only rest on its laurels for so long. As an example, in the important field of version control systems, RCS was replaced by CVS, which was largely supplanted by SVN, which lost to a competitive field of distributed source code control systems, until git emerged as the dominant player.

Method of Reuse

Despite the different understandings around the meaning of “publish” and the expectations that come with the act of publishing, there are many similarities between writing an academic paper and developing free software. Both recognize the need to build on the works of others. Academic papers do this through a rigorous method of citation. Free Software does this by incorporating libraries written by previous authors into the work, or by modifying existing software directly. Both methods of production also recognize the need to circulate works prior to their general release to communities of knowledgeable individuals that can offer critical feedback to give a diversity of thought on the quality of work done so far and identify further work.

When a research paper cites a previous work it acknowledges the priority of the earlier work. Academics acknowledge the contributions of those who have expressed ideas previously:

to give credit to those who thought of it first,
to show that they are contributing something new beyond what has been expressed previously, and
to give the reader a pointer to valuable reading on related ideas.

There are, however, no legal restrictions on the use of the idea in an academic paper. The citation process is not directly regulated by law, but rather by industry standards, which carry consequences.

For instance, suppose you write a paper criticizing another person’s paper for a logical flaw: you need to cite the flawed work to give readers a reference point. But if your reference point were not fixed in time, and the author of the original paper repaired the logical flaw after reading your paper, your criticism would no longer make sense in reference to the now corrected paper. Such a change to the paper being criticized robs those who criticize it of the reputational reward, undermining the motivation to interact and collaborate to move the ideas and the field forward.

In contrast, free software references code which changes frequently. By using some software, you are providing a small reputational reward to the author, not for having any particular idea fixed in time, but for having working code. When you borrow the implemented code you have a social requirement of acknowledgement. Typically it is even a legal duty, since you are literally copying the text of copyrighted software into your work.

Software languages have been designed to make a weaker form of reuse, reuse by reference, possible through “libraries”, which facilitate the use of software created by other authors. Often the expectation is that you will only reference the functional work rather than textually including it, because there is a recognition that the software’s functionality will change and you want to be able to easily take advantage of the improvements. If someone fixes a flaw in the software you are incorporating into your software, your software doesn’t typically stop working. Just the opposite: the hope is that your software now works better because it benefits from the fix as well.
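The difference between textually copying code and referencing a library can be illustrated with a small Python sketch. Everything in it is hypothetical and invented for illustration: the Library class stands in for an installed library, and mean_v1 and mean_v2 stand in for a buggy first version and a later fixed version of one of its functions. It is a toy model of the idea, not a description of how any real package manager works.

```python
def mean_v1(values):
    """Hypothetical first published version: integer division truncates."""
    return sum(values) // len(values)


def mean_v2(values):
    """Hypothetical later revision: the bug is fixed with true division."""
    return sum(values) / len(values)


class Library:
    """Hypothetical stand-in for an installed library that can be upgraded."""

    def __init__(self):
        self.mean = mean_v1

    def upgrade(self):
        # The upstream author publishes a fix.
        self.mean = mean_v2


def demo():
    lib = Library()

    # Reuse by copying: a snapshot of the function as it was at copy time,
    # comparable to pasting the source text into your own project.
    copied_mean = lib.mean

    # Reuse by reference: the function is looked up each time it is called,
    # comparable to depending on the library rather than including its text.
    def referenced_mean(values):
        return lib.mean(values)

    before = (copied_mean([1, 2]), referenced_mean([1, 2]))
    lib.upgrade()
    after = (copied_mean([1, 2]), referenced_mean([1, 2]))
    return before, after


if __name__ == "__main__":
    print(demo())  # prints ((1, 1), (1, 1.5))
```

Before the upgrade both forms of reuse return the same truncated result. After it, only the caller that references the library benefits from the fix, while the copied snapshot keeps the old behavior.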

Conclusion: How to Cooperate

Free software developers and researchers typically have different professional reward systems and expectations around mutability, rivalry, and means of reuse of their work, but this does not imply they are at cross purposes on a particular project. The key is to create a shared understanding and language to be able to precisely discuss the goals of the individuals and the goal of the overall project. The natural language of these groups diverges most around the terminology of publication. We present the examples below as a guide to clarifying this confusion.

FSM developers should be ready to say:

“…publish the software…” …by which we mean… “…place it in a publicly accessible repository without publicizing it.”

“…release the software…” …by which we mean… “…make a minor release which will only be noticed by dedicated parties.”

“…make a major release…” …by which we mean… “…we will make a public announcement which, while largely symbolic, will attract a lot of attention.”

“…make this freely available…” …by which we mean… “…make it accessible with documentation and a license that allows it to be vetted, shared and improved, but does not carry with it any expectation that it is perfect, free of error, or even works very well.”

Academic researchers need to be prepared to say:

“…data not ready to be published…” …by which we mean… “…we don’t mind people looking at the data, but we don’t want to publicize it yet.”

“…algorithm and model are not ready to be published…” …by which we mean… “…we don’t mind them being in a public repository under a public license as long as documentation and version control clearly reflect that we are still working on them and track our changes.”

“…of course we invite people to improve this work, but we have not published it yet…” …by which we mean… “…we want it to be made accessible under a license but with an understanding that using this without giving us academic credit should be considered plagiarism.”