In this chapter, we studied the HashMap class of the Java Collections Framework,
for which key searches, insertions, and removals can be very fast, on average. This exceptional performance is due to hashing, the process of transforming a key into an
array index. A hashing algorithm must include a collision handler for the possibility
that two keys might hash to the same index. A widely used collision handler is chaining.
With chaining, the HashMap object is represented as an array of singly linked
lists. Each list contains the elements whose keys hashed to that index in the array.
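The chaining scheme described above can be sketched as follows. This is an illustrative class, not the actual HashMap source; the names ChainedTable, indexFor, insert, and contains are assumptions made for the example.

```java
import java.util.LinkedList;

// A minimal sketch of chaining: the table is an array of linked lists,
// and each list holds the keys that hashed to that index. (HashMap uses
// singly linked nodes internally; LinkedList is used here for brevity.)
public class ChainedTable {
    private LinkedList<String>[] table;

    @SuppressWarnings("unchecked")
    public ChainedTable(int capacity) {
        table = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++)
            table[i] = new LinkedList<>();
    }

    // Transform a key into an index in [0, table.length).
    // Masking with 0x7fffffff makes a negative hash code non-negative.
    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    public void insert(String key) {
        LinkedList<String> bucket = table[indexFor(key)];
        if (!bucket.contains(key))   // no duplicate keys
            bucket.add(key);
    }

    public boolean contains(String key) {
        // Only the one list at the key's index is searched.
        return table[indexFor(key)].contains(key);
    }
}
```

Because a search examines only the single list at the key's index, its cost depends on that list's length rather than on the total number of elements.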
Let n represent the number of elements currently in the table, and let m represent
the capacity of the table. The load factor is the maximum ratio of n to m before
rehashing takes place. It is also an estimate of the average size of each
list, assuming that the hash method scatters the keys uniformly throughout the table.
With that assumption—the Uniform Hashing Assumption—the average time for
successful and unsuccessful searches depends only on the ratio of n to m. The same
is true for insertions and removals. If we double the table size whenever the ratio of
n to m exceeds the load factor, then the size of each list, on average, will be less than
or equal to the load factor. This shows that with chained hashing, the average time
to insert, remove, or search is constant.
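The doubling criterion can be shown with a small numeric sketch. The 0.75 load factor matches java.util.HashMap's default; the method name needsRehash is an assumption made for this example.

```java
// A sketch of the resize test: rehash (and double the table size)
// once n / m exceeds the load factor.
public class LoadFactorDemo {
    static boolean needsRehash(int n, int m, double loadFactor) {
        return (double) n / m > loadFactor;
    }

    public static void main(String[] args) {
        int capacity = 16;
        double loadFactor = 0.75;
        // With 12 elements in 16 buckets, n/m = 0.75 exactly: no rehash yet.
        System.out.println(needsRehash(12, capacity, loadFactor)); // false
        // A 13th element pushes n/m to 0.8125, so the table doubles.
        System.out.println(needsRehash(13, capacity, loadFactor)); // true
    }
}
```

Since the ratio n/m never stays above the load factor, the average list length stays bounded by a constant, which is what makes the average cost of insert, remove, and search constant.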
A HashSet object is simply a HashMap in which each key has the same dummy
value. Almost all of the HashSet methods are one-liners that invoke the corresponding
HashMap method.
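The one-liner delegation pattern can be sketched as follows. SimpleHashSet and PRESENT are illustrative names, not the library's own code, though the real HashSet is implemented in essentially this way.

```java
import java.util.HashMap;

// A sketch of HashSet as a HashMap whose keys all map to one shared
// dummy value; every method is a one-liner that delegates to the map.
public class SimpleHashSet<E> {
    private static final Object PRESENT = new Object(); // the dummy value
    private HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e)      { return map.put(e, PRESENT) == null; }
    public boolean contains(E e) { return map.containsKey(e); }
    public boolean remove(E e)   { return map.remove(e) == PRESENT; }
    public int size()            { return map.size(); }
}
```

The add method returns true only when put returns null, that is, only when the element was not already in the set.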
An alternative to chaining is open addressing. When open addressing is used to
handle collisions, each location in the table consists of entries only; there are no
linked lists. If a key hashes to an index at which another key already resides, the
table is searched systematically until an open address, that is, an empty location,
is found. With an offset of 1, if the key hashes to index, the sequence searched is
index, index + 1, index + 2, . . . , table.length – 1, 0, 1, . . . , index – 1.
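The offset-of-1 probe sequence can be sketched as follows. LinearProbeTable and its method names are assumptions made for this example; deletion is omitted to keep the sketch short.

```java
// A sketch of open addressing with an offset of 1: probe from the home
// index, wrapping past the end of the table, until the key or an open
// address (an empty slot) is found.
public class LinearProbeTable {
    private String[] table;

    public LinearProbeTable(int capacity) { table = new String[capacity]; }

    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    public boolean insert(String key) {
        int index = indexFor(key);
        for (int i = 0; i < table.length; i++) {
            int probe = (index + i) % table.length; // wraps around to 0
            if (table[probe] == null || table[probe].equals(key)) {
                table[probe] = key;
                return true;
            }
        }
        return false; // table is full
    }

    public boolean contains(String key) {
        int index = indexFor(key);
        for (int i = 0; i < table.length; i++) {
            int probe = (index + i) % table.length;
            if (table[probe] == null)      // open address: search fails
                return false;
            if (table[probe].equals(key))  // key found
                return true;
        }
        return false;
    }
}
```

Note that an unsuccessful search stops as soon as it reaches an empty slot, since the key would have been placed there if it were in the table.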
The offset-of-1 collision handler is susceptible to primary clustering: the phenomenon
of having accelerating growth in the size of collision paths. Primary clustering
is avoided with double hashing: the offset is the (positivized) hash value divided by
table.length. If the Uniform Hashing Assumption holds and the table size is prime,
the average time for both successful and unsuccessful searches with double hashing
is constant.
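The double-hashing offset described above can be sketched as follows. The class and method names are illustrative; the minimum step of 1 is an added safeguard (a zero offset would never advance the probe), and the table size is assumed prime.

```java
// A sketch of double hashing: the offset is the positivized hash value
// divided by table.length, so different keys that share a home index
// usually follow different probe paths, avoiding primary clustering.
public class DoubleHashDemo {
    static int offset(int hash, int tableLength) {
        int positivized = hash & 0x7fffffff; // make the hash non-negative
        int step = positivized / tableLength;
        return step == 0 ? 1 : step;         // guarantee the probe advances
    }

    static int probe(int hash, int tableLength, int i) {
        int positivized = hash & 0x7fffffff;
        int home = positivized % tableLength;        // home index
        return (home + i * offset(hash, tableLength)) % tableLength;
    }

    public static void main(String[] args) {
        // For hash 100 and a prime table size of 13:
        // home index = 100 % 13 = 9, offset = 100 / 13 = 7,
        // so the probe sequence begins 9, 3, 10, ...
        System.out.println(probe(100, 13, 0));
        System.out.println(probe(100, 13, 1));
    }
}
```

Because the table size is prime, the offset is relatively prime to it, so the probe sequence visits every location before repeating.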