Variables are one of the most basic components of writing code. Students of any course about writing software are taught all about them on day one. Developers use & create variables every day. I think that variables are one of the best examples of Phil Karlton's rule of naming things.

"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton

Because variables are such a fundamental building block of software development, many developers tend to overlook the important role variable names play in keeping a codebase clear & maintainable.

Today I want to outline my thought process for naming variables. I have a couple of basic rules that go a long way to help me keep variable names from becoming confusing or misleading.


Most variable naming conventions can be broken down into three key styles.

  1. Abbreviated: x = 1;
  2. Data Usage: allowUserLogin = true;
  3. Data Classification: accountPassword = "super_secret;)";

Each style can be appropriate depending on the context in which it is used, but like most things, just because you can use it doesn't mean you should. Let's dive into each style to see what I mean.

Abbreviated

I know a lot of developers who think you should never use abbreviated variable names, but I think there are plenty of uses for them. It is important, however, to put some guard rails in place so we know when they should be used & when they should be avoided.

Abbreviated naming works best when representing short-lived or narrowly scoped variables. For loops traditionally use an abbreviated i to represent the loop's current index. This is useful for a couple of reasons.

Typically, the index variable is only referenced once or twice within the body of the loop. The abbreviated name allows us to quickly skim the references to i because the iterator variable isn't usually a part of the logic that needs to be focused on.

Within the context of a loop the meaning of i is easily understood by the reader. Common abbreviations like this are great for saving both the writer & the reader time because there is a shared understanding about the intent of the variable.

Abbreviations are great if the context allows for them but should be avoided for any variables that are long-lived or referenced more than once or twice.
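
As a rough illustration of that guard rail, here is a small TypeScript sketch. The function name & the data are made up for this example. The loop index stays abbreviated because it only lives for a few lines, while the running total outlives the loop & is handed back to the caller, so it gets a full name.

```typescript
// Hypothetical example: the function name and data are invented for illustration.
function sumOrderTotals(orderTotals: number[]): number {
    // Long-lived and returned to the caller, so it gets a descriptive name.
    let combinedTotal = 0;

    // Short-lived and narrowly scoped, so the conventional `i` is fine.
    for (let i = 0; i < orderTotals.length; i++) {
        combinedTotal += orderTotals[i];
    }

    return combinedTotal;
}

const combinedOrderTotal = sumOrderTotals([10, 20, 12]);
```

If the loop body grew to the point where i was referenced many times, that would be my cue to give it a longer name too.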

Data Usage

Data usage naming applies to any variable whose name indicates how the value is being used. These variables typically store the results of calculations or logic performed on one or more other data points. Lots of boolean & numeric values are going to fall into this category.

In my experience, usage-based variable names are the most commonly used in codebases. They are very useful in expressing a clear intent about why a variable exists. When you read the name it clearly says, "I am here to do X". There is no ambiguity around what it is doing or why it was created.
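
To make that concrete, here is a minimal TypeScript sketch of usage style names. The Cart shape, the free shipping threshold & the variable names are all assumptions invented for this illustration, not taken from a real system.

```typescript
// Hypothetical data shape, invented for this example.
interface Cart {
    itemTotal: number;
    couponDiscount: number;
}

// Assumed threshold, chosen only for illustration.
const FREE_SHIPPING_THRESHOLD = 50;

function describeCheckout(cart: Cart): { showFreeShippingBanner: boolean; amountToCharge: number } {
    // Each name states what the value is for, not what the value is.
    const showFreeShippingBanner = cart.itemTotal >= FREE_SHIPPING_THRESHOLD;
    const amountToCharge = cart.itemTotal - cart.couponDiscount;

    return { showFreeShippingBanner, amountToCharge };
}

const checkoutSummary = describeCheckout({ itemTotal: 60, couponDiscount: 10 });
```

Both names read well at the single place they are meant to be used, which is exactly the strength of this style.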

One of the downsides of this style of naming is that it is so limiting. Once you have named a variable in a way that expresses how it should be used you lose a lot of the reusability that variables offer.

Data Usage naming is great if you can be certain reuse isn't going to be an issue; however, I have never met a codebase that can make that guarantee. That isn't to say that usage-based naming should be avoided, but I tend to shy away from it in favor of classification style naming when I can.

Data Classification

This is the category that most variables should fall into. I say should because most developers tend to focus on how the variable is used instead of what the variable represents.

Using a name that represents what the data in the variable is offers quite a few benefits that might not be obvious.

Variables that use a classification naming structure are more easily reused. While Data Usage naming is useful, the very nature of that naming style limits how the variable can be used. When the variable is inevitably reused, it can seem out of place or even confusing to a developer who might not know that the data inside the variable can serve multiple purposes.

Let's look at an example.

var allowUserLogin = User.BillPaid && !User.Suspended;

if (allowUserLogin)
{
    loginManager.Login(User);
}

// Many lines later...

if (allowUserLogin && User.HasUnreadMessages)
{
    messageManager.SendSummaryEmail(User);
}

In the code above we reuse the allowUserLogin variable to see if we should send that user a summary email. In reality, there isn't any correlation between a user being able to receive a summary email & being able to log in. What the developer really cared about was the two data points the allowUserLogin variable represents: User.BillPaid & !User.Suspended. To be more clear, we only want to send a summary email to a user whose account is able to log in & check the messages.

What if, instead of using a name that represents what the data should be used for, we used a name that indicates what the data really is? What does knowing that a user has paid their bill & their account is not suspended tell us? It tells us that the user's account is in good standing & they are able to use the features of our application.

Let's revise our example with a variable name that follows the Data Classification style of naming instead of Data Usage.

var accountIsInGoodStanding = User.BillPaid && !User.Suspended;

if (accountIsInGoodStanding)
{
    loginManager.Login(User);
}

// Many lines later...

if (accountIsInGoodStanding && User.HasUnreadMessages)
{
    messageManager.SendSummaryEmail(User);
}

It is a very small change, but now our variable can be reused without confusion or fear of accidental breaking changes in the future.

The criteria behind the previous variable name, allowUserLogin, could change in the future in a way that accidentally breaks the SendSummaryEmail(User) statement. What if we implement lockout functionality after a certain number of failed login attempts? Then we would want to update our allowUserLogin assignment to include a new User.Locked property. This would mean that any accounts that are temporarily locked would also stop receiving their summary emails, which is most likely an undesired side effect.
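
The lockout scenario can be sketched in TypeScript as follows. The Account shape, the locked flag & both helper functions are hypothetical, made up to mirror the example above. The only point is to show how one edit to the login rule quietly changes the email rule.

```typescript
// Hypothetical account shape mirroring the User properties used above.
interface Account {
    billPaid: boolean;
    suspended: boolean;
    locked: boolean; // assumed new property added for the lockout feature
    hasUnreadMessages: boolean;
}

// Usage style: the email check borrows the login rule, so adding
// `!account.locked` for logins also blocks summary emails.
function decideWithUsageName(account: Account) {
    const allowUserLogin = account.billPaid && !account.suspended && !account.locked;
    return {
        login: allowUserLogin,
        sendSummaryEmail: allowUserLogin && account.hasUnreadMessages,
    };
}

// Classification style: the lockout rule is applied only where it belongs.
function decideWithClassificationName(account: Account) {
    const accountIsInGoodStanding = account.billPaid && !account.suspended;
    return {
        login: accountIsInGoodStanding && !account.locked,
        sendSummaryEmail: accountIsInGoodStanding && account.hasUnreadMessages,
    };
}

// A paid, unsuspended account that is temporarily locked out.
const lockedAccount: Account = { billPaid: true, suspended: false, locked: true, hasUnreadMessages: true };
const usageDecision = decideWithUsageName(lockedAccount);
const classificationDecision = decideWithClassificationName(lockedAccount);
```

For the locked account, both versions refuse the login, but only the classification version still sends the summary email.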

I strongly believe that most variables should follow the Data Classification naming style whenever possible. It naturally lends itself to code reuse & to clarity when the code is read by someone who might not have all the context about which data points can be repurposed.