Further study on duplicate elements of HashSet

In mathematics, it is trivial to check duplicate elements because the properties of an element are simple. It may just contain numbers, letters, and colors and so on. However, it is complicated to check whether two elements (objects) are the “same” in computer programming.
In Java language, the Set interface is defined as “A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element”. The HashSet class implements the Set interface. How does it check? Does it really use equals method? Let’s have a look.
We may first think there’re two ways to check the equality of two objects: using “=” operation or using equals method. The former is checked by the addresses of two objects. If they refer to the same address, they are equal. The latter usually returns the same result as “=” operation, but many classes override this method so that we may get different result. To be mentioned, hashCode method is also used here. Let’s see what actually happens.
We write a class and override both equals method and hashCode method. The equals method can return true or false and the hashCode method may return the same or two different integers on two objects. So, there’re four combinations:
1. true same
2. true different
3. false same
4. false different
We create two instances of the class and put them into a HashSet. Then change the combination one by one to see how they influence the result of HashSet, and the corresponding results are listed below:
1. one element
2. two elements
3. two elements
4. two elements
Hence, we may conclude that two objects are the “same” if and only if hashCodes are equal and equals method return true.

Actually, this is part of the right answer.
Now, we look into the source of JDK to see the truth.

The add method in HashSet:

private transient HashMap<E,Object> map;
 
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

We know from documentation that “This class(HashSet) implements the Set interface, backed by a hash table (actually a HashMap instance)”. So, it invokes the put method in HashMap. We trace into it and find:

The put method in HashMap:

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
 
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

Pay attention to the key statement “if (e.hash == hash && ((k = e.key) == key || key.equals(k)))”. From this sentence, we can know the truth:
Two objects are the “same” when they satisfy both of the following conditions:
1. hashCode are the same
2. They refer to the same address or equals method returns true

You may look up the source of TreeSet or other sets to see how they check duplicate elements. They differ from each other.

Share
This entry was posted in Java Technology. Bookmark the permalink.

3 Responses to Further study on duplicate elements of HashSet

  1. hello says:

    didn’t understand what you mean by 4 combinations like
    “So, there’re four combinations:
    1. true same
    2. true different
    3. false same
    4. false different

    please explain.

    • Icyfish says:

      Because we’d like to see whether the equals method or the hashCode method will be used to check the equality of two elements. The equals method will return true or false and the hashCode will give us two different or same integers. So we combine the return values of these two methods to see which one(or both two) will affect the result.

  2. Ashokkumar says:

    Thanks good concept…

Leave a Reply

Your email address will not be published. Required fields are marked *