Chapter Overview

In this chapter, we studied the HashMap class from the Java Collections Framework, for which key searches, insertions, and removals can be very fast, on average. This exceptional performance is due to hashing, the process of transforming a key into an array index. A hashing algorithm must include a collision handler for the possibility that two keys hash to the same index. A widely used collision handler is chaining. With chaining, the HashMap object is represented as an array of singly linked lists; each list contains the elements whose keys hashed to that index in the array.
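The idea of chaining can be captured in a few lines. The following is a minimal sketch, not the actual java.util.HashMap source; the class and method names (ChainedTable, indexFor, and so on) are illustrative:

```java
import java.util.LinkedList;

// A sketch of chained hashing: the table is an array of linked lists,
// and each list holds the entries whose keys hashed to that index.
class ChainedTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // The hash step: transform the key into an array index.
    private int indexFor(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    void put(K key, V value) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // replace
        }
        buckets[indexFor(key)].add(new Entry<>(key, value));    // insert
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not in this bucket's list
    }
}
```

Searching, inserting, and removing each walk only the one list at the hashed index, which is why the average times stay small when the lists stay short.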

Let n represent the number of elements currently in the table, and let m represent the capacity of the table. The load factor is the maximum ratio of n to m allowed before rehashing takes place. The ratio n/m estimates the average size of each list, assuming that the hash method scatters the keys uniformly throughout the table. With that assumption (the Uniform Hashing Assumption), the average time for successful and unsuccessful searches depends only on the ratio of n to m; the same is true for insertions and removals. If we double the table size whenever the ratio of n to m exceeds the load factor, then the average size of each list will never exceed the load factor. It follows that with chained hashing, the average time to insert, remove, or search is constant.
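Both tuning knobs discussed above are exposed by the java.util.HashMap constructor that takes an initial capacity and a load factor. A small sketch (the names LoadFactorDemo and build are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates the HashMap(initialCapacity, loadFactor) constructor.
// With capacity m = 16 and load factor 0.75, the map rehashes into a
// larger table once more than 16 * 0.75 = 12 entries are present;
// rehashing is automatic and invisible to the caller.
class LoadFactorDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 0; i < 20; i++) {
            map.put("key" + i, i); // crosses the rehash threshold
        }
        return map;
    }
}
```

The default load factor of 0.75 is a compromise: a smaller value wastes space on empty buckets, while a larger value lengthens the average list.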

A HashSet object is simply a HashMap in which each key has the same dummy value. Almost all of the HashSet methods are one-liners that invoke the corresponding HashMap method.
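That design can be sketched in a few lines. This is not the actual java.util.HashSet source; the names MiniHashSet and PRESENT are illustrative:

```java
import java.util.HashMap;

// A set backed by a map whose values are all the same dummy object.
// Each method is a one-liner that delegates to the backing HashMap.
class MiniHashSet<E> {
    private static final Object PRESENT = new Object(); // shared dummy value
    private final HashMap<E, Object> map = new HashMap<>();

    boolean add(E e)      { return map.put(e, PRESENT) == null; }
    boolean contains(E e) { return map.containsKey(e); }
    boolean remove(E e)   { return map.remove(e) == PRESENT; }
    int size()            { return map.size(); }
}
```

Because the dummy value is shared, the set costs only one extra reference per table, and every set operation inherits the HashMap's average constant time.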

An alternative to chaining is open addressing. When open addressing is used to handle collisions, each location in the table holds at most one entry; there are no linked lists. If a key hashes to an index at which another key already resides, the table is searched systematically until an open address (that is, an empty location) is found. With an offset of 1, if the key hashes to index, the sequence searched is

index, index + 1, index + 2, . . ., table.length – 1, 0, 1, . . . , index - 1.
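The probe sequence above can be sketched as a search method. This is a simplified illustration, assuming a bare array of keys with no removals (the names LinearProbe and find are illustrative):

```java
// Open addressing with an offset of 1 (linear probing): probe index,
// index + 1, ..., wrapping around to 0, until the key or an empty
// slot (open address) is found.
class LinearProbe {
    static int find(Object[] keys, Object key) {
        int index = (key.hashCode() & 0x7fffffff) % keys.length;
        for (int i = 0; i < keys.length; i++) {
            int probe = (index + i) % keys.length;     // wrap around
            if (keys[probe] == null) return -1;        // open address: absent
            if (keys[probe].equals(key)) return probe; // found
        }
        return -1; // table full and key absent
    }
}
```

Insertion follows the same sequence but stops at the first open address and stores the key there.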

The offset-of-1 collision handler is susceptible to primary clustering: the phenomenon in which clusters of occupied locations grow at an accelerating rate, lengthening collision paths. Primary clustering is avoided with double hashing: the offset is the (positivized) hash value divided by table.length, so keys that collide at the same index usually follow different probe sequences. If the Uniform Hashing Assumption holds and the table size is prime, the average time for both successful and unsuccessful searches with double hashing is constant.
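A double-hashing search differs from the offset-of-1 version only in how the step between probes is computed. A minimal sketch, assuming a prime table length so that every slot is reachable (the names DoubleHash and find are illustrative):

```java
// Double hashing: the offset between probes is derived from the hash
// value itself, so two keys that collide at the same index rarely
// share a probe sequence, which prevents primary clustering.
class DoubleHash {
    static int find(Object[] keys, Object key) {
        int h = key.hashCode() & 0x7fffffff;        // positivized hash
        int index = h % keys.length;
        int offset = h / keys.length;               // per-key step size
        if (offset % keys.length == 0) offset = 1;  // never step by 0
        for (int i = 0; i < keys.length; i++) {
            int probe = (index + i * offset) % keys.length;
            if (keys[probe] == null) return -1;        // open address: absent
            if (keys[probe].equals(key)) return probe; // found
        }
        return -1; // table exhausted
    }
}
```

With a prime table length, the offset is never a multiple of the length (after the zero adjustment), so the probe sequence visits every slot before repeating.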







Collins Online Learning Center

Home > Chapter 14 > Chapter Overview