How to use HashMap with Custom Keys (and Avoid Shooting Yourself in the Leg)

Written by artemsutulov | Published 2022/04/08
Tech Story Tags: java | backend | hashmap | immutability | immutable | junit | equality | hashcode

TLDRWhen writing complex data processing, we often find hash tables very useful. But when you decide to use your object as a key, it’s elementary to implement it the wrong way. The problem is about the way we write our objects, and how we write them. It seems impossible to make this mistake - modify a key of HashMap. However, one day it could happen. For example, a user's role gets new permission throughout a business logic. After adding it, `hashCode` will change, and the user's data is gone. It seems a common problem: don't modify it, and it will be okay.via the TL;DR App

When writing complex data processes, we often find hash tables very useful. But when you decide to use your object as a key, it’s easy to implement it the wrong way.

A little disclaimer: this article requires some specific knowledge of Java development, and I can’t cover all this stuff in this article. Being familiar with these is highly appreciated:

  • Immutability
  • HashMap.java
  • equals() / hashCode() best practices
  • Unit testing

A few words about HashMap

It consists of key->val entries → that are in buckets → that are in array. One bucket includes many entries. Here’s an example of what it looks like:

When the algorithm searches for a suitable key:

  1. Firstly, it searches for a bucket. It uses hashCode(). Roughly speaking, it takes hash, simplifies it in order not to go out of bounds of the array of buckets, and then tries to take the bucket by this index - bucket = table[hash & length],
  2. then it seeks the desired key, comparing entries by key.equals()

The problem

If you have a Number or a String as a key, you have nothing to worry about. However, sometimes you need your object as a key, for example, for future processing or simplifying the logic, or to name it.

With complex keys, when it comes to code, the first thing we do is override Equals and HashCodebecause it’s necessary to make this data structure work.

But are your keys immutable?

What will happen to this search algorithm if the key's hash is changed sometime after it has been used in a map? We can say goodbye to an entry like that. Most likely, we’ll never see it again.

Once, I spent hours trying to fix a bug like this until I debugged it and found out that my key’s field was modified deep inside a business logic by another developer.

It seems like a common problem: don’t modify it, and it will be okay. But it’s not. It’s about the way we write our objects.

Throughout my experience, most objects were made using Lombok or IDE boilerplate. It’s easy to use and saves plenty of time. One simple action generates Constructors, Getters / Setters, overridesequals, and hashCode. But those objects are mutable.

Let’s have a look at the example below. The user has userName and a bunch of roles. Each role has a name and some permissions, which this role can do.

User.java

@Data
@AllArgsConstructor
public class User {
    private String userName;
    private Set<Role> roles;
}

Role.java

@Data
@AllArgsConstructor
public class Role {
    private String name;
    private Set<String> permissions;
}

It seems impossible to make this mistake - modifying a key of HashMap. However, one day it could happen. For example, a user’s role gets new permission throughout a business logic. After adding it, hashCode will change, and the user’s data is gone.

Here’s a demonstration of that:

@Test
public void cantFindExistingDataByUser() {
    // given
    final var map = new HashMap<User, String>();
    final var role = new Role("admin", new HashSet<>());
    final var key = new User("name", Set.of(role));
    map.put(key, "some data");

    // when
    final var data = map.get(key);
    // Here we modify role's permissions, it can happen somewhere far away where we don't remember that User is a key in HashMap
    role.getPermissions().add("new permission");
    final var dataAfterSomeTime = map.get(key);

    // then
    assertThat(data).isEqualTo("some data");
    assertThat(dataAfterSomeTime).isNull();
}

Solution

As you can see, it’s effortless to break our map. The good news - we can avoid this forever if we use immutable objects. They don’t change with time, are side-effects free, and we can use them in a multi-threading environment.

There are plenty of ways to write immutable objects. I will show arguably a good one I prefer. We’ll use a Builder pattern and, of course, the final keyword on our fields. Remember that all fields classes must be immutable too. If we have a collection, it must be unmodifiable, and every field’s class must be immutable.

User.java

public class User {
    private final String userName;
    private final Set<Role> roles;

    public User(Builder builder) {
        this.userName = builder.userName;
        this.roles = builder.roles;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        User user = (User) o;
        return Objects.equals(userName, user.userName) && Objects.equals(roles, user.roles);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userName, roles);
    }

    public static class Builder {

        private String userName;
        private Set<Role> roles;

        public static Builder user() {
            return new Builder();
        }

        public User build() {
            return new User(this);
        }

        public Builder userName(String userName) {
            this.userName = userName;
            return this;
        }

        public Builder roles(Set<Role> roles) {
            this.roles = Collections.unmodifiableSet(roles);
            return this;
        }
    }
}

Role.java

public class Role {
    private final String name;
    private final Set<String> permissions;

    public Role(Builder builder) {
        this.name = builder.name;
        this.permissions = builder.permissions;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Role role = (Role) o;
        return Objects.equals(name, role.name) && Objects.equals(permissions, role.permissions);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, permissions);
    }

    public Set<String> getPermissions() {
        return permissions;
    }

    public static class Builder {

        private String name;
        private Set<String> permissions;

        public static Builder role() {
            return new Builder();
        }

        public Role build() {
            return new Role(this);
        }

        public Builder name(String name) {
            this.name = name;
            return this;
        }

        public Builder permissions(Set<String> permissions) {
            this.permissions = Collections.unmodifiableSet(permissions);
            return this;
        }
    }
}

And here’s an attempt to break this version of HashMap, as you can see, we cannot do that because it’s immutable and will never be changed.

@Test
public void alwaysCanFindExistingDataByUser() {
    // given
    final var map = new HashMap<User, String>();
    final var role = role().name("admin").permissions(new HashSet<>()).build();
    final var key = user().userName("name").roles(Set.of(role)).build();
    map.put(key, "some data");

    // when
    final var data = map.get(key);
    // Never can we modify an immutable field
    assertThatThrownBy(() -> role.getPermissions().add("new permission")).isInstanceOf(UnsupportedOperationException.class);
    final var dataAfterSomeTime = map.get(key);

    // then
    assertThat(data).isEqualTo("some data");
    assertThat(dataAfterSomeTime).isEqualTo("some data");
}

Conclusion

If you use HashMap, don’t forget to override equals() and hashCode() and do not use mutable objects as a key. Instead, make it immutable, because immutable objects: don’t change with time, are side-effects free, and are good in a multi-threading environment.

You can find a full working example here, on GitHub.


Written by artemsutulov | I'm a professional FullStack Software Engineer, currently working for Revolut as Software Engineer (Backend).
Published by HackerNoon on 2022/04/08