Write Good Documentation

What it is, why it’s hard, and how to make it happen

Good documentation is hard. Most developers want to have it, but few are passionate about writing and maintaining it. That being said, my personal philosophy is that written communication is the most effective way to decouple human dependencies and the most efficient way to scale information as teams grow.

What it is

Code tells what, docs tell why

Code should be readable. As a good rule of thumb, write code so that it can be easily understood by a developer who has been in the industry full-time for a year or two. This is a good way to guard against being too clever but also allows for the reasonable use of specialized language features.

When code is readable, developers who are new to the codebase will easily be able to determine what the code does. For example, I could easily look at a FizzBuzzService and be able to predict what the output would be for various numbers. What is not obvious is why the FizzBuzzService is a stand-alone module rather than being part of the NumberManager.

Regardless of how readable the functionality is, the reason why the functionality exists (or why it exists in its current form) will not be present in the code. For example, why does code deviate from established patterns? Why do two seemingly identical web service routes exist? Why does utility code seem to be duplicated rather than abstracted? Looking at code gives a down-in-the-weeds view of a project that is often much less clear than the bird’s-eye view. Well-written documentation explains why code needs to function the way it does as well as how it fits into the larger picture.

The idea behind self-documenting code is that as much information as possible is contained in the code itself. The reality is that, in most cases, the question “why is the code like this?” must be answered long-form in a comment, README, wiki, or some other form of human-oriented writing.
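To make the what/why split concrete, here is a minimal Python sketch built on the FizzBuzzService example from above. The module name, the helper it mentions, and especially the reasons given in the comments are invented for illustration; the point is only where each kind of information lives.

```python
"""fizz_buzz_service: a hypothetical stand-alone module.

WHY this is not part of NumberManager (an invented reason, for illustration):
assume the FizzBuzz rules change more often than the rest of the number code,
so keeping them separate lets them be released on their own schedule.
"""
from typing import List


def fizz_buzz(n: int) -> str:
    # WHAT this does is readable from the code itself, so it needs no comment.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def fizz_buzz_range(start: int, stop: int) -> List[str]:
    # WHY a plain loop instead of a shared batch helper (hypothetical):
    # assume that helper caches results and callers here need fresh output.
    return [fizz_buzz(n) for n in range(start, stop)]
```

A new developer can predict the output of fizz_buzz(15) without help. Only the docstring and the second comment tell them why the code is shaped the way it is.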

In summary, if you were to walk someone through your code and have to whiteboard something or rationalize why your code works the way it does, write that down. If someone could read through your code without needing you to be present to explain anything, then no documentation is needed.

Documentation is a singly-linked tree

For the purpose of this article, the internet is just a big, singly-linked web of pages. Parent pages have references to child pages (hyperlinks) but child pages don’t natively contain any data about what is linking to them. There are services that provide data about “backlinks” but that data is gathered by scraping large portions of the internet and then reconstructing everything as a doubly-linked data set.

Also for the purpose of this article, I assume that you are using some form of web-accessible format for documentation. (If you have binders full of documentation, please stop reading and consider another career path.) Documentation has the ability to link out to other documents, but any one document does not contain data with respect to its parent page. In this sense, documentation for any given project could be represented by one or more singly-linked trees of information nodes.

In a singly-linked tree, you can traverse from the root to the leaves, but not the other way. If you start with a node part way down a branch, you will fundamentally never be able to navigate to certain parts of the tree. Doing so would require traversing from a child to a parent, which is not possible. And obviously, even if you start at the root, you will not be able to traverse to nodes outside the tree.
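The traversal limit described above can be sketched in a few lines of Python. The page names and the LINKS table are made up for the example; each page knows only the pages it links to, never the pages that link to it.

```python
from typing import Dict, List, Set

# Hypothetical documentation pages: each page lists only its outgoing links.
LINKS: Dict[str, List[str]] = {
    "README": ["docs/deploy", "services/README"],
    "docs/deploy": ["docs/redis"],
    "services/README": [],
    "docs/redis": [],
}


def reachable(start: str, links: Dict[str, List[str]]) -> Set[str]:
    """Return every page that can be reached from `start` by following links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        # Only child links are available; there is no way back to a parent.
        stack.extend(links.get(page, []))
    return seen
```

Starting from "README", reachable returns all four pages. Starting from "docs/deploy", it returns only "docs/deploy" and "docs/redis", so a reader who lands there has no path to "services/README".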

Thinking about documentation this way leads to two conclusions.

First, include all of your documentation for a project in one singly-linked tree. This means that while you don’t have to have all of your docs in one system, you should at least not have any orphaned pages. Everything should be accessible (without leaving the tree) by clicking enough links. An example of “leaving the tree” would be information that exists only in someone’s head. This type of information is frustratingly only exposed by serendipitous conversations with the right people and, therefore, it is horribly unreliable to depend on being able to find it. (If nothing else, keep that “in the tree” by leaving a written note that “Bob is the only guy who knows how Redis is deployed.”)

Secondly, good documentation has a root node that can be located predictably. If all of your data is in one tree, then all you have to know to get all of the data is the location of the root. From there you can traverse and find the rest.

Later on I’ll get to more specific ways to actually implement such a tree, but for now just think about documentation as a single, singly-linked tree.

Why it’s hard

It’s hard to find

Written communication is only useful if you know where it is. People are easy to find. They sit at the same desk every day. They have phone numbers, email addresses, and so on. Documentation, on the other hand, could live anywhere. Unless specific care is taken to optimize for discovery, written material gets lost. Period.

As anyone who has ever launched their own blog or website knows, getting the content online is the easy part. But when the “go live” button is clicked and no one shows up, you realize that content without discoverability is a depressing waste of time and effort. The proof of this is the multi-billion dollar (billion with a “B”) SEO industry whose sole purpose is essentially selling discoverability. You can have the best product, but if you can’t attract visitors, you will sell nothing.

Documentation is no different. The best rule of thumb is to assume that by default no one will ever find or care about what you write. That sounds harshly pessimistic, but it’s reality. Just like any other online content requires specific marketing strategies to attract and engage users, you have to intentionally market your docs to do the same.

But getting content in front of users is only half the battle. It also has to actually be useful, which leads into …

It goes out of date

Most developers start out their careers with aspirations to write software, not long essays about software. When time or motivation is limited, if you have the choice between updating either the code or the docs, it’s usually just the code that gets any attention.

This is a problem because incorrect documentation is probably worse than none at all. It’s impossible to trust documentation that is routinely out of date, so developers end up going back to the humans who have the most up-to-date information. This reinforces the notion that writing anything but code is a waste of time and keeps system knowledge locked safely away in human brains.

How to make it happen

Document as close to the stuff as possible

The best way to make code-oriented documentation discoverable is to put it in the actual code. One of the best ways to think about this is to rubber duck your way through your project. If you would need to explain something verbally that is not explicitly declared in the code, that should be documented in writing.

If you had to break with normal conventions for a legitimate reason, type out a comment right there in the code explaining why. If you need to explain or give guidelines for a set of services in a particular directory, put a short README right beside them in the folder. If you would need to whiteboard out the architecture of the whole project (and you likely do), put a diagram in a README in the root of the repository. Some system- or platform-level documentation could very reasonably exist outside of any project repository. This is fine as long as it is discoverable, but we’ll get to that in just a bit.

This accomplishes two things. First it aids discoverability. If you are reading the code, you are likely to find the comment. If you are adding a new service, you will likely see the README in the services directory. If you are looking at the project repository for the first time, you are bound to see the diagram.

Secondly, as you are updating code, you are much more likely to see that the comment or other docs are now out of date. This is also true when reviewing code. When looking at changes, take a quick look for any comments or READMEs in the immediate area that may or may not have been updated. If you have any questions about the code during a review, consider adding the answers to them to the docs before approving the changes.
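The review habit above can be partly automated. What follows is a rough Python sketch, not a finished tool: the function name is made up, and it assumes you can supply the list of changed paths yourself (for example from git diff --name-only) along with the set of README paths that exist in the repository.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def stale_readme_candidates(changed: Iterable[str], readmes: Set[str]) -> List[str]:
    """Return READMEs that sit beside changed files but were not changed themselves.

    Both arguments use repository-relative paths with forward slashes.
    A flagged README is only a candidate: a human still decides if it is stale.
    """
    changed_set = set(changed)
    flagged: Set[str] = set()
    for path in changed_set:
        if path in readmes:
            continue  # the README itself was edited, nothing to flag for it
        sibling = str(PurePosixPath(path).parent / "README.md")
        if sibling in readmes and sibling not in changed_set:
            flagged.add(sibling)
    return sorted(flagged)
```

For a change set of services/billing.py, lib/util.py, and lib/README.md, this flags only services/README.md, because the README beside lib/util.py was updated in the same change.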

The general rule is that the closer the documentation is to the code, the more likely it is to be found and thus utilized and maintained. Put documentation in the same place as the stuff that it documents. Then, when you update the stuff, update the docs.

Make it discoverable

Going back to the concept of documentation being a singly-linked tree, the root node should always be placed in a specific, predictable location for every single project without exception. I cannot stress this principle enough. It is not enough to say “the docs are on the Wiki” or “it’s posted in Slack.” Any IT employee who knows the name of a project or system should be able to find the root node within minutes at any time of day without the need to contact any other human. Without exception.

My personal preference is to use a README in the root of the project repository as this root node. I find this to be the best choice for two main reasons. First, as a developer, if you are going to be working on a project, you will already have access to the source code and thus the README. Secondly, GitHub (like other similar products) does a fantastic job of prominently displaying the root README on the home page of the project. Basically, if you can find the project, then you can find the root node, and if you can find the root node, then you can traverse the entire documentation tree.

There are other options besides the repository root, but the key thing to keep in mind is that without exception the root node must be discoverable by anyone at any time without asking anyone else. (Without exception.)

From this root node, start linking out to all of the other written material, build systems, logging systems, environments, and so on that are related to the project. It would be perfectly acceptable if the root node only contained links and no explicit information, although it’s likely best to have some actual content. As long as it’s a sufficiently predictable starting point for discovering all of the other existing written material, it’s doing its job.
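If the docs live in the repository as Markdown files, the “no orphaned pages” rule can be checked mechanically. The sketch below is one possible approach under several assumptions: docs are .md files, links use the inline [text](path) form, and the root node is a README.md at the repository root. The function names are my own, and links to external systems are simply skipped.

```python
import re
from pathlib import Path
from typing import List, Set

# Captures the target of an inline Markdown link such as [text](docs/deploy.md),
# stopping before any "#fragment" so that page.md#section resolves to page.md.
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s#]+)")


def linked_markdown_files(page: Path) -> Set[Path]:
    """Return the local .md files that `page` links to and that exist on disk."""
    targets: Set[Path] = set()
    for raw in LINK_TARGET.findall(page.read_text(encoding="utf-8")):
        if "://" in raw:  # external link (wiki, build system, ...), not a local file
            continue
        candidate = (page.parent / raw).resolve()
        if candidate.suffix == ".md" and candidate.is_file():
            targets.add(candidate)
    return targets


def orphaned_docs(repo_root: Path, root_name: str = "README.md") -> List[Path]:
    """List .md files under `repo_root` that cannot be reached from the root README."""
    repo_root = repo_root.resolve()
    seen: Set[Path] = set()
    stack = [repo_root / root_name]
    while stack:
        page = stack.pop()
        if page in seen or not page.is_file():
            continue
        seen.add(page)
        stack.extend(linked_markdown_files(page))
    all_docs = {p.resolve() for p in repo_root.rglob("*.md")}
    return sorted(all_docs - seen)
```

Calling orphaned_docs(Path(".")) from the repository root returns the Markdown files nobody can click their way to. An empty list means every page is in the tree, at least as far as local Markdown links go.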

Put it in source control whenever possible

Not all docs ought to live in the root README. As in the example given earlier, sometimes you will want to explain the purpose of or guidelines for a collection of services that live in a particular directory. In this case you could create another README in that folder and then link to it from the repo root. By virtue of being alongside the code, that documentation will be in source control.

Other documentation such as a user guide, deployment instructions, or other written material isn’t necessarily an obvious choice to put in the same repository. However, consider creating a “docs” folder in the root of the repository for this type of content. This lets you track changes to these documents as the project evolves.

When using GitHub or similar products that implement Pull Requests, code reviews can also serve to review documentation, not just code. If documentation changes, it’s obvious and those changes get an extra set of eyes. If code changes and documentation does not, then that gives the reviewer an opportunity to notice. All of this is made easier when docs are as close to the documented code as possible. (See above.)

One last consideration is that when documentation is in source control and large code changes need to be rolled back, any documentation changes are also rolled back by default. Granted, that is a rare case, but anything to help keep docs up to date is a good thing.

Respond to verbal questions with written answers

Everything discussed so far assumes that you have already been rigorously following best practices and documenting everything as you go. If this is true of you, I applaud you and would ask you to send me your résumé.

The reality is that your documentation will always be in various states of completion. This means that invariably someone will stop by and ask how something works because it has not been documented. You could just do the easy thing and tell them. Alternatively, you could do the easy thing AND pretend to take the high road by telling them and then asking them to write down the answer for later. However, the best bet is to tell them that you’ll get back to them with the info in just a bit. Take the time to write out the minimum information that they need in the correct place (see above) and then send them a link.

In the end, it takes about the same amount of time to write down a brief explanation as it does to explain it twice. (And if wherever you work, you only have to explain things two times, again, I applaud you!) Additionally, if the conversation is happening over email, you will be writing out your response anyway. Being intentional about leaving the written material in a discoverable place takes the same amount of time, but has a far greater return on investment.

Responding to frequently asked questions with written answers ensures that the most requested information gets documented first. Responding with discoverable, written answers ensures that the information scales out efficiently to the whole team.

Write more than just documentation

Writing documentation is fundamentally different than writing code. Getting good at one doesn’t magically make you better at the other. Joel Spolsky argues that being able to write is one of the three most important things to learn as a developer (along with economics and C programming) because “the most successful people are the ones who can explain their ideas the best.” He also writes about specs (which just as easily applies to all forms of documentation):

“If you need to write specs and you can’t, start a journal, create a [blog], take a creative writing class, or just write a nice letter to every relative and college roommate you’ve blown off for the last 4 years. Anything that involves putting words down on paper will improve your spec writing skills.” — Joel Spolsky

Writing is a skill that you can only improve with practice. I don’t claim to be the best author (and so I thank you for reading this far) but honestly one of the reasons why I write is to just put in hours writing English rather than code. Please don’t just write code, and please, please, PLEASE don’t let your only non-code writing be documentation.

If you write awesome code, you know you didn’t get there overnight. Take the time to learn to write. Start by writing bad documentation today, then learn from your mistakes and, over time, start writing good documentation.

Did I miss anything here? Do you have any crazy documentation horror stories? Drop a note below and let me know! :)