Best Practices for Naming Variables: What the Research Shows
Naming things is sometimes said to be the hardest problem in computer science. Now, someone is trying to make that task easier.
Felienne Hermans, an associate professor at Leiden University’s Institute of Advanced Computer Science, where she leads the Programming Education Research Lab, has developed a set of research-based guidelines that will, in her words, “help you to get better at naming things,” at least in the realm of software development.
She summarized her work in a talk at the online conference It Will Never Work in Theory: Live!
Variations in Variable Names
First, there’s what not to do. Hermans gives the example of a method that checks the validity of an input and has been given the logical name isValid(). But what happens if the developer decides to return more than a “true” or “false” value, for example by also converting the input to a different format and returning its reformatted version along with its validity?
This is one of three common ways a method name can go wrong: the method returns more than its name promises.
The other two are just as bad: returning less than the name promises or, even more befuddling, returning the opposite of what the name implies. “These are things you want to avoid because they make it more confusing,” Hermans points out. The same is true for variable names, where the name can promise less, more, or the reverse of what the variable actually stores.
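The three failure modes are easiest to see side by side. The following Python sketch is a hypothetical illustration, not an example from Hermans' talk: it assumes a validator for ISO-format date strings, and every function name in it is invented. The first three functions carry misleading names; the last two split the work so each name says exactly what its function returns.

```python
from __future__ import annotations

from datetime import date

# Hypothetical validators for ISO dates such as "2021-03-15".


def is_valid(text: str) -> tuple[bool, str | None]:
    """MORE than promised: also returns a reformatted copy of the input."""
    try:
        parsed = date.fromisoformat(text)
    except ValueError:
        return False, None
    return True, parsed.strftime("%d/%m/%Y")


def normalize_date(text: str) -> bool:
    """LESS than promised: the name suggests a normalized date,
    but the function only reports whether the input is valid."""
    return is_valid_date(text)


def is_invalid(text: str) -> bool:
    """The REVERSE of what is promised: returns True for VALID input."""
    return is_valid_date(text)


# Names that match behavior: one question per function.
def is_valid_date(text: str) -> bool:
    """Return True if text parses as an ISO date, and nothing more."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def to_day_month_year(text: str) -> str:
    """Convert an ISO date string to DD/MM/YYYY."""
    return date.fromisoformat(text).strftime("%d/%m/%Y")
```

The "more than promised" version is not just confusing but bug-prone: a caller who trusts the name and writes `if is_valid(s):` will accept every input, because the returned tuple, even `(False, None)`, is non-empty and therefore always truthy.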
And that’s not just one educator’s opinion; it’s backed by academic research. Hermans cited a 2016 paper, “Linguistic Antipatterns: What They Are and How Developers Perceive Them,” published in the journal Empirical Software Engineering. In general, an antipattern is a common response to a recurring programming problem that turns out to be a bad practice and should be avoided altogether.
In probing “linguistic antipatterns,” the researchers set out to identify specific patterns that impair a developer’s understanding of code. They ultimately collected 17 types of antipatterns from real-world open source projects, then asked developers whether they viewed them as poor practices. (The researchers also tracked how often the authors of the code went back and changed it after being contacted.)
The researchers identified similar naming problems, such as “getter” methods that do more than just get a variable’s value and “setter” methods that do more than set it. Hermans counts this paper among the scholarly works that define what the problems actually are.
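As a rough illustration of the getter and setter problems the paper describes, here is a hypothetical Python sketch. The classes and method names are invented for this example and are not taken from the study.

```python
class IdPool:
    """Hands out sequential integer IDs (hypothetical example)."""

    def __init__(self) -> None:
        self._next = 1

    def get_id(self) -> int:
        # Antipattern: "get" reads like a plain accessor, but each call
        # also advances the counter, so repeated calls return different values.
        current = self._next
        self._next += 1
        return current

    def peek_next_id(self) -> int:
        # A true accessor: reading it twice gives the same answer.
        return self._next

    def allocate_id(self) -> int:
        # Same behavior as get_id, but the verb admits that state changes.
        current = self._next
        self._next += 1
        return current


class Profile:
    """Hypothetical user profile with a setter that does too much."""

    def __init__(self) -> None:
        self.name = ""
        self.history = []

    def set_name(self, value: str) -> None:
        # Antipattern: "set" promises a plain assignment, but this method
        # also reformats the value and appends it to a change history.
        self.name = value.strip().title()
        self.history.append(self.name)
```

The fix in both cases is the same idea: either make the method do only what its name says, or pick a name (such as the hypothetical `allocate_id`) that owns up to the extra work.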
Hermans highlights another paper, “The Effect of Poor Source Code Lexicon and Readability on Developers’ Cognitive Load,” which warns that bad naming “impairs program comprehension and consequently increases the effort that developers must spend to maintain the software.”
These researchers measured the additional cognitive load with a brain-imaging technique and, by combining it with eye tracking, even mapped the results to specific identifiers in the source code.
They found that “the presence of linguistic antipatterns in source code significantly increases the developers’ cognitive load.”
“Their results show that ‘linguistic code smells’ actually increase cognitive loads,” she said. “Your brain has to work harder to process code that has these type of code smells. So that’s not what we want.”
Form and Substance
Yet beyond the actual words in a variable’s name, the way the name is written can also add cognitive overhead. For example, marking the start of each new word with a capital letter (so-called “camel case”) and separating words with underscores (so-called “snake case”) can both slow comprehension.
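To show that the two styles encode the same words differently, here is a small sketch that converts between them. The function names are hypothetical, and the regular expression is a simplifying assumption: it handles ordinary names like createdOn but does not try to cover every edge case, such as runs of capital letters.

```python
import re


def camel_to_snake(name: str) -> str:
    # Insert an underscore before each capital letter that directly follows
    # a lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_to_camel(name: str) -> str:
    # Keep the first word as-is and capitalize the first letter of each later word.
    first, *rest = name.split("_")
    return first + "".join(word[:1].upper() + word[1:] for word in rest)
```

`camel_to_snake("createdOn")` gives `"created_on"`, while `camel_to_snake("HTTPServer")` gives `"httpserver"`, the kind of edge case a team's style guide would have to settle one way or the other.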
Hermans cites a 2021 arXiv paper titled “How Developers Choose Names,” which includes a remarkable statistic: When 334 developers were asked to choose 47 variable names, the median probability that two of them would choose the same name was just 6.9%.
Because there are several obvious and common patterns for combining words into a name, plus many slight variations, “there are many reasonable options,” the researchers found, leading inevitably to the conclusion that “the probability that two developers would select the same name is low.”
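A statistic like this is easier to interpret with the arithmetic in hand. The sketch below assumes one plausible definition, not necessarily the paper's exact method: for a single naming task, the chance that two developers drawn at random (without replacement) picked the same name is the sum, over every distinct name, of the probability that both picked it; the median is then taken across tasks. The responses are invented for illustration and are not data from the study.

```python
from collections import Counter
from statistics import median


def same_name_probability(choices: list[str]) -> float:
    """Probability that two distinct respondents, drawn at random,
    gave the same name for one naming task."""
    n = len(choices)
    if n < 2:
        raise ValueError("need at least two responses")
    counts = Counter(choices)
    # A name chosen c times contributes c * (c - 1) ordered matching pairs,
    # out of n * (n - 1) possible ordered pairs of distinct respondents.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical responses for three naming tasks (not the study's data).
tasks = [
    ["max_users", "max_users", "user_limit", "maxUsers", "users_max"],
    ["start_time", "start_time", "start_time", "begin_time"],
    ["count", "n", "total", "num_items"],
]

probabilities = [same_name_probability(task) for task in tasks]
print(probabilities)
print(median(probabilities))
```

Note how a single popular choice (three of four respondents picking `start_time`) dominates one task, while a task where everyone picks something different scores zero; the median keeps either extreme from skewing the summary.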
Follow-up experiments by the same researchers found that developers chose much better names when they followed a model: first consciously deciding which concepts to include, then selecting the words that best represent those concepts. In addition, “Respondents who were coached with the model tended to use longer names with more concepts.”
“As a team, you can talk about ‘name molds.’ You can say, ‘Okay, what do we do? Do we always do the quantifier in the beginning, or at the end? What is our plan here?’” Hermans summarized.
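One way a team might pin down such a name mold is to write it down as code. The following is a hypothetical sketch, loosely following the concepts-then-words-then-name approach described above; the vocabulary, the `QUANTIFIERS` set, and the `build_name` helper are all invented for illustration and do not come from Hermans' talk or the paper.

```python
# A hypothetical team vocabulary: each concept always maps to the same word,
# so "maximum" is never spelled "maximum" in one place and "max" in another.
VOCABULARY = {
    "maximum": "max",
    "minimum": "min",
    "retries": "retries",
    "users": "users",
    "timeout": "timeout",
}

QUANTIFIERS = {"max", "min"}


def build_name(concepts: list[str], quantifier_first: bool = True) -> str:
    """Assemble a snake_case name from chosen concepts using the team's mold."""
    # Map each chosen concept to the team's agreed word.
    words = [VOCABULARY[concept] for concept in concepts]
    quantifiers = [w for w in words if w in QUANTIFIERS]
    others = [w for w in words if w not in QUANTIFIERS]
    # The mold settles Hermans' question: quantifier at the beginning or the end?
    ordered = quantifiers + others if quantifier_first else others + quantifiers
    return "_".join(ordered)
```

With the default mold, `build_name(["retries", "maximum"])` yields `"max_retries"`; a team that prefers the quantifier last would get `"retries_max"` instead. Either choice works, as long as everyone makes the same one.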
Hermans’ talk sparked an impromptu discussion on Hacker News, where working developers traded their own best practices for naming variables. Full-stack developer Nick Janetakis (who also teaches web app development) shared some of his own variable-naming preferences:
- If a variable only holds the values “true” or “false”, end its name with a question mark.
- If a variable holds a date, end its name with _on; if it holds a date with a time, end it with _at.
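Here is a minimal Python sketch of these conventions. Python does not allow a question mark in identifiers, so the sketch assumes an is_ or has_ prefix as the closest substitute; the `check_name` function and the sample record are hypothetical and simply flag names that break the conventions for the value they hold.

```python
from datetime import date, datetime


def check_name(name: str, value: object) -> bool:
    """Return True if the name follows the suffix conventions for its value."""
    if isinstance(value, bool):
        # Stand-in for a trailing "?", which Python identifiers cannot contain.
        return name.startswith(("is_", "has_"))
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return name.endswith("_at")
    if isinstance(value, date):
        return name.endswith("_on")
    return True  # No convention applies to other types.


record = {
    "is_published": True,
    "published_on": date(2021, 5, 4),
    "updated_at": datetime(2021, 5, 4, 14, 30),
    "created": datetime(2021, 5, 1, 9, 0),
}

for name, value in record.items():
    if not check_name(name, value):
        print(f"{name}: does not follow the convention")
```

Only `created` is flagged: it holds a full timestamp, so under this convention it would be `created_at`.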
“It’s the opposite of cognitive load,” Janetakis argues, “because you can glance at a name and know what it is without knowing more about it.”
And Amsterdam-based AI software developer Don Hopkins shared a preference for placing the most significant words at the beginning of variable names, so that when autocomplete lists names alphabetically, related ones are naturally grouped together. Hopkins also gets annoyed when developers don’t mark the beginning of a new word. “My name is Don, so every time I see a column called ‘createdon’ I think it’s a boolean flag that you can set true to create me. I wish the db designer would use snake case instead of mashing all the words together.”
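To see why word order matters for autocomplete, here is a small sketch that sorts the same hypothetical column names under two orderings, mimicking an alphabetized completion list. The names and the `completions` helper are invented for illustration; real editors rank suggestions in more elaborate ways.

```python
# Most significant word (the entity) last: related names scatter when sorted.
trailing = ["created_user", "name_user", "created_order", "total_order", "email_user"]

# Most significant word first: related names sort next to each other.
leading = ["user_created", "user_name", "order_created", "order_total", "user_email"]

print(sorted(trailing))
print(sorted(leading))


def completions(names: list[str], prefix: str) -> list[str]:
    # A toy autocomplete: every name that starts with what has been typed.
    return sorted(n for n in names if n.startswith(prefix))


print(completions(leading, "user_"))
print(completions(trailing, "user_"))
```

With the entity first, typing `user_` surfaces every user-related column at once; with the entity last, the same prefix finds nothing, and the user columns end up interleaved with order columns in the sorted list.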
Throughout the thread, Hopkins shared more of his least-favorite practices, such as needlessly ordering the words in a variable name so that they form a complete, grammatically correct sentence.
But what Hopkins really hates is when developers prefix every class, variable, or subroutine with the nearly meaningless name of the larger project or library. He calls this practice “smurfing,” apparently a reference to the blue cartoon characters who constantly reminded viewers of their identity by replacing words in their sentences with “smurf.”
Later, Hopkins also advised coders to avoid “Bill and Ted’s Excellent Postfix”: saving a very surfer-like “not” for the last word in a variable’s name.
Hopkins quips, “That’s a most totally bogus code smell, dude.”
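A short, hypothetical illustration of the trailing-"not" problem: the reader parses "is empty" and only reaches the reversal at the very end. The function names are invented, and the alternative shown (name the positive condition, negate at the point of use) is one common remedy rather than Hopkins' specific recommendation.

```python
def is_empty_not(text: str) -> bool:
    # Bogus: the meaning flips only at the last word of the name.
    return len(text) > 0


def has_text(text: str) -> bool:
    # Clearer: name the condition positively. Callers who need the
    # negation write `not has_text(x)`, so "not" is read first.
    return len(text) > 0


lines = ["hello", "", "world"]
blank_line_indexes = [i for i, line in enumerate(lines) if not has_text(line)]
print(blank_line_indexes)
```

Both functions compute exactly the same thing; only the second lets a reader understand a condition like `if not has_text(line):` in a single left-to-right pass.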