We Need to Discuss Code Legibility

Written by Cheopys | Published 2019/11/05


Fruitless, Exhausted Discussion

If you are even reading this, thank you. Most of us are sick to death of the topic, but not because it’s been settled. Far from it. Companies have given up, seeing code layout as a strictly personal matter and not as important as shipping product. They are dead wrong.
Most of us have dealt with so much contention on the topic of code formatting and personal coding style that the subject has lain fallow for years. Developers each choose something different and defend it against all suggestion and opposition. Companies regard it as tangential to their projects even as they impose every latest management fad and “manifesto” on us. There are more important things than formatting quibbles. Like stories and standups and patterns.

The Goal

It is this writer’s firm belief that people are more alike than different, and that coders who see “personal coding style” as the hallmark of individuality are forwarding an illegitimate outlook, seeking to do in code what dogs do with fire hydrants. There should be one maximally legible way to lay out software lexical artifacts, one that is satisfactory for
  • all developers
  • on all teams
  • in all companies
Expression of individuality is better done with personal items, like plastic action figures and commercial sports pennants.
This article is not a coding standards document; readers use many different languages, each with its own idioms. I will present some general principles with the expectation that the reader will apply them creatively and honestly.
Remember that the objective of code layout is reliability in reading, not expression of the writer’s awesomeness. Many habits persist from early C idioms that, given the size and complexity of modern projects, are overdue for retirement. Most examples will be snippets in C# with references to C and others.

Ineffectual Solutions

The contentious matter is usually settled in one of two wholly inadequate ways.
Tyrannical Coding Standards: “read this document and apply the mandated style exactly,” with a certain silkily sinister suggestion that any disagreement, however politely presented, will be called “whining,” and will not be well-received.
Free-for-all: it’s all subjective, right? To each his own. Every developer has his own scent-mark and applies it to all his work, aggressively mismatching the code around it.

Horrible Team Standards

The coding standards documents are never motivated by clarity, always by consistancy (sic), creating vast codebases that are nearly impossible to read but uniformly impossible, as the author revels in his tyranny and boasts about the grim and harsh formatting he’s come to use. As if consistency is its own virtue.
The standard is never based on scientific studies of parsing and differs from written language as much as possible. One person’s starkly illegible preferences become canonical. And if you don’t like it, there are plenty of others who would be happy to have your job.
“I would recommend you not whine about this”
— from such a standards document forwarded to me from a friend at Microsoft

To Each “Their” Own

Free-for-all is even worse. Come, let us have some candor: a lot of coders regard the fact that nobody else can read their code as proof of their superiority. They will adopt the most bizarre layouts they can dream up and earnestly claim they “find it to be more readable.”
They are lying. There is no other conceivable explanation for the freakish bizarreness of so much code.

The Two Great Formatting Sins

For something that so many see as emblematic of their individuality, it’s remarkable how few parameters go into their choices. Two prevail: clutter and inversion.

Clutter: Whitespace is for Sissies

This is how many of us got our start:
10 FORA=1TO10:PRINTA:NEXTA
For those too young to remember the JFK assassination this is an example of GW-BASIC, an interpreted language that came with MS-DOS. It had no whitespace requirements to speak of and even the most hard-core clutterists might balk at this now. Keywords were not separated from variables and it was a point of honor for many to cram everything together, to put multiple statements on one line.
To add even minimal whitespace
10 FOR A=1 TO 10:PRINT A:NEXT A
was regarded as effete. Enough of that, OK? This brings back bad memories.
While no modern language would allow this level of compression, it isn’t hard to see why people continue to do things like this:
if(a==1) printf("a=1");
With the solitary space after the closing parenthesis seen as an act of generosity, likely to be removed by the next person editing the code, since the language doesn’t require it.
(*shudder*)
While GW-BASIC is history, crammed-together code remains esteemed despite its eye-strain, regardless of the modernity of the language. Many see it as some sort of distinction, dedication to their work.

Inversion: Backwards is Better

Inversion takes many forms but the basic rule is to reverse every standard of written language, the goofier the better; after all, code isn’t prose, right?
void uninffn(
    int a
    ,char b
    ,double c
){
Since commas customarily go at line ends, put them at the beginning and
  • have an excuse ready
  • don’t give the function or the variables informative names (omitting letters is idiomatic clutter)
  • use just enough space so it will compile
  • pat yourself on the back, because your teammates are going to help raise the price of your Bayer stock.
It is near certain that some reading this are scratching their heads, wondering what fault anyone could possibly find with this. Isn’t this the “canonical form” for function bodies introduced in one of those books from Microsoft Press?
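By way of contrast, here is a minimal C sketch of a declaration punctuated the way prose is: commas at line ends, names that say what they hold, and parameter types and names aligned in columns. `FormatOrderLine` and its parameters are hypothetical; the original `uninffn` gives no hint of its purpose.

```c
#include <stdio.h>

/* Hypothetical example: formats an order line into a caller-supplied buffer.
   Commas sit at line ends, as in written language, and each name says what it holds. */
int FormatOrderLine(char   *buffer,
                    size_t  bufferSize,
                    int     quantity,
                    char    sizeCode,
                    double  unitPrice)
{
    return snprintf(buffer, bufferSize, "%d x %c @ %.2f", quantity, sizeCode, unitPrice);
}
```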

An Avalanche of Eyestrain

One of the Microsoft journals in the 1990s (Microsoft Journal? MSDN?) had an article where the author’s code samples introduced the trailblazing inversion of spaces immediately within parentheses:
if( ( a=unscrutFn( 0 ) )==2 )
    gothedach( );
Since this was spectacularly goofy it caught on like wildfire in a dead cornfield in heavy wind and half the developer community was copying it within a month. Note that the only whitespace in the expression is where the eyes don’t expect it, and that everything else is as crammed together as the compiler will allow.
The single following clause has no curly braces because the language doesn’t require them. Yet people who wrote — who still write — like this will insist that it makes perfect sense. And narrow their eyes in unconcealed hostility at any suggestion that it’s not perfectly clear.
A quarter century later I still can’t read this. I call it out because it was and is appallingly offensive to the eyes, and wholly at odds with everything we know about how we read.
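For comparison, here is a sketch of the same logic in C with the assignment pulled out of the condition, spaces placed where prose puts them, and braces around the single clause. `unscrutFn` and `gothedach` come from the snippet above and are stubbed here with hypothetical bodies so the example compiles; `RunCheck` and `gothedachCalls` are likewise invented for illustration.

```c
/* Hypothetical stand-ins for the functions in the original snippet. */
static int unscrutFn(int seed)
{
    return seed + 2;
}

static int gothedachCalls = 0;

static void gothedach(void)
{
    gothedachCalls++;
}

/* The original condition, untangled: assign first, then compare. */
int RunCheck(int seed)
{
    int result = unscrutFn(seed);

    if (result == 2)
    {
        gothedach();
    }

    return result;
}
```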
Clutter and inversion are just two prominent offenses; there are many other formatting offenses but most of them are variations on these two. I would add a third: failure to tabularize. Please keep reading.

Why It Matters

When we do our work, where do you think the time goes? What do we spend the most time doing? Your answer may depend on where you work; if the true answer is “meetings” then you have my pity.
But the activity that occupies the most of our time
  • isn’t writing code
  • isn’t debugging
  • isn’t writing documentation
  • isn’t commits or other layers of process
No, what we spend more time doing at our desks than all other activities combined is …
Reading Code.
That’s right. Reading code is our foremost activity, so one would think that enabling ease and accuracy in the endeavor would be of paramount importance. One would think. Instead we are resigned to accepting these deliberate and inconsiderate obstacles. This is as backwards as can be, and the worst part about it is that legibility is nothing like the personal matter that most believe.

The Cost of Bugs

I am not going to brook any argument that illegible code doesn’t cause bugs. At the very least it causes eyestrain and headaches but in my experience when the code is hard to read we are going to make more mistakes.
There’s more to it than eyestrain. In the early days of software development, bug tracking was on the backs of envelopes, on whiteboards, an unstructured and haphazard affair. No longer. Bug tracking is a serious part of the work of many different team members.
When a bug is discovered by a tester
  1. The tester spends 5–20 minutes making an entry in a bug tracking database; considerably longer with a detailed repro scenario.
  2. A daily triage meeting with perhaps a half-dozen attendees decides a priority and assigns it to a developer.
  3. The developer spends anywhere from a minute to days fixing and verifying the bug and updating its status in the database.
  4. Hours or days later the tester retests and updates the status again; if it is not fixed, goto (2) or (3).
Once fixed, the bug is closed.
Total elapsed time is man-hours at best, possibly days. And there may be days or weeks between any of those steps, while the work languishes in an aging Git branch whose merge will be increasingly problematical.
All this for a bug that very likely would never have happened had the code not been visually obfuscated.
Of course only some bugs are the result of misreading code, but given the cost of even one such bug, one would think that companies would be adamant about having maximally legible work. This happens to not be the case.

Why? Why?

It is simply astonishing that, this deep into the era of information technology and with so much of our time affected by it, this remains unsettled. The reasons are as above:
  • Passionate developer identification with personal coding style; territorial scent-marking
  • Futility of consensus on what is “readable”
  • Management exasperation with formatting arguments
  • The belief that legibility is “subjective”
It’s the last of these, subjectivity, that holds all the cards here, because eliminating this idea would dispose of all the others. Quite simply, we know what is best, and the answer is so obvious that the metaphor I use is this:

The Undiscovered Continent

Imagine if someone picked up a satellite photo and saw an unknown continent the size of Australia in the middle of the Pacific. It’s not a new seamount or volcanic event; it’s heavily forested and has trees hundreds or thousands of years old.
Astonished, he goes to older photos and sees that this continent has been there all along, and was somehow overlooked all this time (a similar thing actually happened with the number of human chromosomes).
THIS is the state of software formatting. We know exactly what works but deliberately ignore it because of that “personal coding style” thing. And given that
  • reading code takes more of our time than anything
  • the cost of misreading
  • the wanton illegibility of so very much code
this is simply astonishing.
Communications scientists have studied how we recognize words and how we read text. There is no reason to believe that reading code is substantially different from reading prose; newspapers use narrow columns, and nearly all written languages use similar demarcation between tokens, even bidirectional Middle Eastern languages.
Youwon’tseebookswritten( like-this ),notever. We all have years of experience and deeply ingrained habits for reading.
For example:
Eyes make brief, unnoticeable movements called saccades approximately three to four times per second. Saccades are separated by fixations, which are moments when the eyes are not moving. During saccades, visual sensitivity is diminished, which is called saccadic suppression. This ensures that the majority of the intake of visual information occurs during fixations. Lexical processing does, however, continue during saccades. The timing and accuracy of word recognition relies on where in the word the eye is currently fixating. Recognition is fastest and most accurate when fixating in the middle of the word. This is due to a decrease in visual acuity that results as letters are situated farther from the fixated location and become harder to see.
Excerpt from Word Recognition on Wikipedia
As far as this writer knows, he is the only person ever to have written a coding standards document after considering independent research in word recognition instead of merely canonizing personal preference.

Example: The Ubiquity of Tables

I offer as an example the structure of a two-dimensional table. Tables are so familiar to us all that nobody has ever needed to be taught how to interpret a spreadsheet. Rows and columns; vertical alignment.
But for some reason, when we come to code, we elect to reject that which we all share in favor of some perverse homage to individuality, imagining differences between people where they don’t exist. If there is one core point to this article, that was it right there.

A Few Scattered Notions

The ideas that follow are rather scattershot, in no way a complete list but a few examples to think about, and one, the final, to take very seriously.

Never Omit Braces

No example needed. Use them. They’re free. Hard to believe people still do this.
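The point hardly needs proof, but for completeness here is a minimal C sketch of the classic failure, using hypothetical functions: a second statement indented as though it belonged to an unbraced `if`.

```c
/* Hypothetical example: members get 10 off the price and free shipping. */

/* Without braces only the first statement is conditional; the indentation lies. */
int TotalUnbraced(int price, int isMember)
{
    int shipping = 5;

    if (isMember)
        price   -= 10;
        shipping = 0;   /* runs for everyone, members or not */

    return price + shipping;
}

/* With braces the structure the eye sees is the structure the compiler sees. */
int TotalBraced(int price, int isMember)
{
    int shipping = 5;

    if (isMember)
    {
        price   -= 10;
        shipping = 0;
    }

    return price + shipping;
}
```

With the unbraced version a non-member pays 100 instead of 105, and nothing about the layout hints at it. GCC’s `-Wmisleading-indentation` (part of `-Wall`) will flag the first function, but a compiler warning is a poor substitute for code that reads correctly.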

One Operation Per Line

Don’t do this:
if (condition) executeCondition();
Yes, even for loops:
for(int index = 0; 
    index < operations.Count(); 
    index++)
{
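A minimal C sketch of the same rule, assuming a hypothetical array of operation flags: the condition and its action sit on separate lines, each clause of the `for` header gets a line of its own, and every body is braced.

```c
#include <stddef.h>

/* Hypothetical example: counts how many entries in the array are nonzero (enabled). */
size_t CountEnabledOperations(const int *operations,
                              size_t     operationCount)
{
    size_t enabledCount = 0;

    for (size_t index = 0;
         index < operationCount;
         index++)
    {
        if (operations[index] != 0)
        {
            enabledCount++;
        }
    }

    return enabledCount;
}
```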

Clear Variable Naming

Early languages like Fortran and early implementations of C imposed limits on the length of variable names, and monitors were limited to 80 columns, so developers used abbreviated words. The habit persists even though the restrictions are long gone. This should stop.
Modern IDEs usually have autocomplete, so there is no longer any excuse for
int NumGrads;
The first part stands for “number,” which adds nothing to the int declaration. Is it ordinal or cardinal? A count, an index, or an offset? And even if “number” were useful information, which here it isn’t, leaving off the other three letters is pointless. Let’s quit dropping letters.
This
int GraduateCount;
is explicit and unmistakable. Don’t use garbled abbreviations like Ctx for Context; spell things out. The only exceptions are unmistakable abbreviations for excessively long words, like Info for Information. Nearly everything else should use complete words.
There are two possible conventions for compound variable names: increasing specificity or increasing generality.
In increasing generality the names look more like English compound nouns:
AnimalSpeciesCount
AnimalSpeciesIndex
PlantSpeciesCount
PlantSpeciesIndex
Here
  • Animal is the most specific part of the name
  • Species is less specific
  • Count is least specific.
This has the advantage of reading like “White House” languages (adjective precedes noun) such as English or German.
In increasing specificity the variable names are easier to use in dropdown autocomplete lists:
CountSpeciesAnimal
CountSpeciesPlant
IndexSpeciesAnimal
IndexSpeciesPlant
This has the advantage of reading more like “Casa Blanca” languages (adjective follows noun) such as Spanish or Vietnamese.
The difference between these two is a matter of choice, and depends mostly on how one uses autocomplete lists. But names are fully spelled out in both choices.
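A rough C illustration of fully spelled-out names in the increasing-generality order. The `SurveyTotals` struct and `TotalSpeciesCount` function are hypothetical; the field names are borrowed from the lists above.

```c
/* Hypothetical survey tallies; names are spelled out, most specific word first. */
typedef struct
{
    int AnimalSpeciesCount;
    int PlantSpeciesCount;
} SurveyTotals;

/* Total species of either kind: no "Num", no dropped letters. */
int TotalSpeciesCount(const SurveyTotals *totals)
{
    return totals->AnimalSpeciesCount + totals->PlantSpeciesCount;
}
```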

Strong Typing

It was most disappointing to return to C# after many years working in other languages and note that the var keyword had been added; the strong, explicitly declared typing in C# is one of its most attractive features. As with the wrongheaded throw keyword in all languages that use it, var had a narrow justification but was instantly used by lazy developers to save a few seconds of typing.
I don’t care that one can hover the mouse cursor over a var-declared variable and see its actual type; good developers keep two hands on the keyboard and only use a mouse to change focus where keyboard shortcuts are missing or where a change of context is excessively awkward.
The only legitimate use for var in C# is for a single instance of an anonymous object:
var Bachelors = new
{
    Graduates  = db.Graduates .Where(g => g.Type == Bachelor),
    Ceremonies = db.Ceremonies.Where(c => c.Graduates
                              .Any  (g => g.Type == Bachelor))
};
Any anonymous object that is used more than once should be declared as a Model object to enforce proper typing. It’s not that much work; just do it. This will catch misspellings and allow the editor to suggest missing variable names. I don’t use var in C# at all.
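C has no var and no anonymous objects in the C# sense, but the untagged struct raises a similar question: it cannot be referred to again by name, so as soon as an aggregate is passed between functions it needs a named type. A minimal sketch of that rough analogue of a Model object, with hypothetical `DegreeSummary`, `SummarizeDegree`, and `SummariesMatch` names:

```c
#include <stddef.h>

/* Hypothetical named "model" type, declared once and reused everywhere. */
typedef struct
{
    size_t GraduateCount;
    size_t CeremonyCount;
} DegreeSummary;

/* Every use is checked against the single declaration above. */
DegreeSummary SummarizeDegree(size_t graduateCount,
                              size_t ceremonyCount)
{
    DegreeSummary summary = { .GraduateCount = graduateCount,
                              .CeremonyCount = ceremonyCount };
    return summary;
}

/* Returns 1 when both summaries hold the same counts, 0 otherwise. */
int SummariesMatch(DegreeSummary first,
                   DegreeSummary second)
{
    return first.GraduateCount == second.GraduateCount &&
           first.CeremonyCount == second.CeremonyCount;
}
```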

Obsolete Operators

In the C language typing was very weak, and it was “cool” to use the ! operator as a test for 0 or NULL. This was the “idiomatic” test for string equality:
if (!strcmp(name1, name2)) …
This is hideous. In many more modern languages, C# among them, this is no longer allowed; the ! operator applies only to Boolean expressions. But even then the autocorrupting reformatting done by many IDEs will render this significant but tiny symbol all but invisible and very easy to overlook, and it should be abandoned.
Don’t use magic numbers for such tests either; use enumerations and let the ! operator and others like it disappear. Again, generalize this idea.
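A minimal C sketch of both suggestions, with hypothetical names: the result of `strcmp` is compared with 0 explicitly, and the outcome is an enumeration rather than a bare 0 or 1.

```c
#include <string.h>

/* Hypothetical named outcomes instead of a bare zero/nonzero test. */
typedef enum
{
    NamesDiffer = 0,
    NamesMatch  = 1
} NameComparison;

NameComparison CompareNames(const char *firstName,
                            const char *secondName)
{
    /* strcmp returns 0 when the strings are equal; say so explicitly. */
    if (strcmp(firstName, secondName) == 0)
    {
        return NamesMatch;
    }

    return NamesDiffer;
}
```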

Vertical Alignment

This is the final entry in this article and the clearest example of departing from clutter and inversion in favor of clarity. I use vertical alignment obsessively and while I have had many people tell me through clenched teeth that they will! not! take the trouble to do this, I have never had one tell me it was hard to read. This is the biggest undiscovered continent of them all.

Example 1

The example below assigns the members of a class:
Person person = new Person
{
    FirstName = "Chris",
    LastName = "Jeffers",
    Address = "123 Mockingbird Lane",
    City = "Gotham",
    State = "Moribund",
    PostalCode = 12345,
    Phone = 2062169069,
    PhoneType = PhoneType.Mobile,
    Notes = "Remote Only"
};
In this case the code is “consistent” in that there is exactly one space on each side of the assignment operator but the result is jagged and ugly. It is the work of a few extra seconds to do this instead:
Person person = new Person
{
    FirstName  = "Chris",
    LastName   = "Jeffers",
    Address    = "123 Mockingbird Lane",
    City       = "Gotham",
    State      = "Moribund",
    PostalCode = 12345,
    Phone      = 2062169069,
    PhoneType  = PhoneType.Mobile,
    Notes      = "Remote Only"
};
There is more to this than obsessive-compulsive arrangement. It is now like a table:
  • A column of lvalues
  • A column of assignment operators
  • A column of rvalues
and instantly engages our familiarity with tables.
Doing this in Visual Studio requires writing the closing brace and semicolon before filling in the members; otherwise typing the semicolon will “beautify” the code back to the jaggedness of the first example, which is some jerk’s idea of consistancy (sic). Either way, aligning is at most a few seconds of extra work.
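C’s designated initializers (C99 and later) support the same tabular layout. A minimal sketch, assuming a hypothetical `Person` struct and `PhoneKind` enumeration that mirror the C# class above; keeping the phone number as text is a choice of this sketch, not of the original.

```c
/* Hypothetical C counterpart of the Person class above. */
typedef enum
{
    PhoneKindHome,
    PhoneKindMobile
} PhoneKind;

typedef struct
{
    const char *FirstName;
    const char *LastName;
    const char *Address;
    const char *City;
    const char *State;
    int         PostalCode;
    const char *Phone;
    PhoneKind   PhoneType;
    const char *Notes;
} Person;

/* Columns of member names, assignment operators, and values. */
Person MakeSamplePerson(void)
{
    Person person =
    {
        .FirstName  = "Chris",
        .LastName   = "Jeffers",
        .Address    = "123 Mockingbird Lane",
        .City       = "Gotham",
        .State      = "Moribund",
        .PostalCode = 12345,
        .Phone      = "2062169069",
        .PhoneType  = PhoneKindMobile,
        .Notes      = "Remote Only"
    };

    return person;
}
```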

Example 2

Jagged:
List<Bird> parrots = db.Birds.Where(b => b.Order == Psittacidae).ToList();
List<Bird> crows = db.Birds.Where(b => b.Order == Corvidae).ToList();
List<Bird> hummings = db.Birds.Where(m => m.Order == Trochilidae).ToList();
Aligned:
List<Bird> parrots  = db.Birds
                        .Where(b => b.Order == Psittacidae)
                        .ToList();
List<Bird> crows    = db.Birds
                        .Where(b => b.Order == Corvidae)
                        .ToList();
List<Bird> hummings = db.Birds
                        .Where(m => m.Order == Trochilidae)
                        .ToList();
Which is easier to read? The Order names are all in a sort of column, and there is exactly one operation per line. Yes, it’s a few seconds of extra work. The differences between the three types are easily isolated.
Nothing I have learned to do in entering code has been more effective at enhancing legibility than rigorous employment of vertical alignment.
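The same principle applies to data. A minimal C sketch with a hypothetical `Bird` table and `CountByFamily` function: the records are laid out as rows and columns, so a misplaced entry stands out at a glance. Psittacidae, Corvidae, and Trochilidae are bird families, so the field is named `Family` here.

```c
#include <stddef.h>

/* Hypothetical bird families and records, laid out as a table. */
typedef enum
{
    Psittacidae,
    Corvidae,
    Trochilidae
} BirdFamily;

typedef struct
{
    const char *CommonName;
    BirdFamily  Family;
} Bird;

static const Bird Birds[] =
{
    { "Scarlet Macaw",      Psittacidae },
    { "Budgerigar",         Psittacidae },
    { "Common Raven",       Corvidae    },
    { "Blue Jay",           Corvidae    },
    { "Anna's Hummingbird", Trochilidae },
};

/* Counts the table rows belonging to one family. */
size_t CountByFamily(BirdFamily family)
{
    size_t count = 0;

    for (size_t index = 0;
         index < sizeof Birds / sizeof Birds[0];
         index++)
    {
        if (Birds[index].Family == family)
        {
            count++;
        }
    }

    return count;
}
```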

Example 3

Vertical alignment allows instant visual apprehension of brace-matching since opening and closing braces are aligned and one does not need to throw the caret around to find a matching brace at end of line.
This should settle the end of line / beginning of line debate once and for all.
public void ExampleFunc()
{
   …
}
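A slightly fuller C sketch of the same layout, using a hypothetical `SumPositiveEntries` function and `ColumnCount` constant: each closing brace sits directly under its opening brace, so the nesting can be read straight down the page without hunting for braces at line ends.

```c
/* Named column count instead of a magic number. */
enum { ColumnCount = 3 };

/* Hypothetical example: sums the positive entries of a matrix. */
int SumPositiveEntries(int matrix[][ColumnCount],
                       int rowCount)
{
    int total = 0;

    for (int row = 0;
         row < rowCount;
         row++)
    {
        for (int column = 0;
             column < ColumnCount;
             column++)
        {
            if (matrix[row][column] > 0)
            {
                total += matrix[row][column];
            }
        }
    }

    return total;
}
```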

Conclusion, For Now

Cluttered code, personal coding styles, illegibility as superiority, these are leftovers from the early days of our industry, childish toys that must now be lovingly shelved as our work moves into its adulthood.
Who am I kidding? Its adolescence. We have moved from an era of ad hoc to an era of fads, giving new names to old ideas and pretending we’re doing revolutionary new things. But we need to put the Era of Eyestrain behind us and let go of the idea that there is anything individual or subjective about doing that which we do more than anything: reading code.
And that means learning to write it for others to read, not to mark our territory. Because it isn’t ours anymore. It’s our collaborators’ too.

Written by Cheopys | Titles of my articles have been badly edited without my consent
Published by HackerNoon on 2019/11/05