Chapter 3: Better Living Through Better Hashing

The following are the code listings for Chapter 3. [book.py]

Listing 3-1. Code to print readable calendar for any month and year
Listing 3-2. Convert word to an integer assuming base 26
Listing 3-3. Ineffective hashtable implementation
Listing 3-4. Open addressing implementation of Hashtable
Listing 3-5. LinkedEntry node structure to support linked list of (key, value) pairs
Listing 3-6. Separate chaining implementation of Hashtable
Listing 3-7. remove() function for separate chaining implementation of Hashtable
Listing 3-8. Determine load_factor and threshold when creating DynamicHashtable
Listing 3-9. Revised put() method initiates resize()
Listing 3-10. Resize method to dynamically grow hashtable storage for separate chaining
Listing 3-11. Perfect hashtable for ten words from Shakespeare
Listing 3-12. Partial listing of perfect hash function for English dictionary
Listing 3-13. Iterate over all entries in a hashtable using Python generator function
Listing 3-14. ValueBadHash has a horrible hash() function

Listing 3-1. Code to print readable calendar for any month and year

from datetime import date
import calendar

def print_month(month, year):
  idx = key_array.index(month)                ❶
  day = 1

  wd = date(year,idx + 1,day).weekday()       ❷
  wd = (wd + 1) % 7                           ❸
  end = month_length[idx]                     ❹
  if calendar.isleap(year) and idx == 1:      ❺
    end += 1

  print('{} {}'.format(month,year).center(20))
  print('Su Mo Tu We Th Fr Sa')
  print('   ' * wd, end='')                   ❻
  while day <= end:
    print('{:2d} '.format(day), end='')
    wd = (wd + 1) % 7                         ❼
    day += 1
    if wd == 0: print()                       ❽
  print()

❶ Find index to use for month_length, an integer from 0 to 11.
❷ Returns weekday for first day of given month, using 0 for Monday. Note date() uses idx + 1 since its month argument must be an integer from 1 to 2.
❸ Adjust to have Sunday be the weekday with a value of 0, instead of Monday.
❹ Determine length of the month corresponding to input parameter.
❺ In a leap year, February (index 1 when counting from 0) has 29 days.
❻ Add spaces in first week to start day 1 at right indentation.
❼ Advance day and weekday for next day.
❽ Insert line break before Sunday to start a new line.

Listing 3-2. Convert word to an integer assuming base 26

def base26(w):
  val = 0
  for ch in w.lower():                   ❶
    next_digit = ord(ch) - ord('a')      ❷
    val = 26*val + next_digit            ❸
  return val

❶Convert all characters to lowercase.
❷ Compute digit in next position.
❸ Accumulate total and return.

Listing 3-3. Ineffective hashtable implementation

class Hashtable:
  def __init__(self, M=10):
    self.table = [None] * M    ❶
    self.M = M

  def get(self, k):            ❷
    hc = hash(k) % self.M
    return self.table[hc].value if self.table[hc] else None

  def put(self, k, v):         ❸
    hc = hash(k) % self.M
    entry = self.table[hc]
    if entry:
      if entry.key == k:
        entry.value = v
      else:                    ❹
        raise RuntimeError('Key Collision: {} and {}'.format(k, entry.key))
    else:
      self.table[hc] = Entry(k, v)

❶ Allocate a table to hold M Entry objects.
❷ The get() function locates the entry associated with the hash code for k and returns its value, if present.
❸ The put() function locates the entry associated with the hash code for k, if present, and overwrites its value; otherwise it stores a new entry.
❹ A collision occurs when two different keys map to the same bucket identified by its hash code value.

Listing 3-4. Open addressing implementation of Hashtable

class Hashtable:
  def __init__(self, M=10):
    self.table = [None] * M
    self.M = M
    self.N = 0

  def get(self, k):
    hc = hash(k) % self.M           ❶
    while self.table[hc]:
      if self.table[hc].key == k:   ❷
        return self.table[hc].value
      hc = (hc + 1) % self.M        ❸
    return None                     ❹

  def put(self, k, v):
    hc = hash(k) % self.M           ❶
    while self.table[hc]:
      if self.table[hc].key == k:   ❺
        self.table[hc].value = v
        return
      hc = (hc + 1) % self.M        ❸

    if self.N >= self.M - 1:        ❻
      raise RuntimeError ('Table is Full.')

    self.table[hc] = Entry(k, v)    ❼
    self.N += 1

❶ Start at the first bucket where an entry whose key is k could be.
❷ If found, return the value associated with k.
❸ Otherwise advance to the next bucket, wrapping around to 0 as needed.
❹ If table[hc] is empty, you know k is not in table.
❺ If found, update the value associated with the Entry for k.
❻ Once you've established that k is not in table, throw RuntimeError if hc is the last empty bucket.
❼ Create new Entry in table[hc] and update N, the count of keys.

Listing 3-5. LinkedEntry node structure to support linked list of (key, value) pairs

class LinkedEntry:
  def __init__(self, k, v, rest=None):
    self.key = k
    self.value = v
    self.next = rest      ❶

❶ rest is an optional argument, which allows the newly created node to link directly to an existing list pointed to by rest.

Listing 3-6. Separate chaining implementation of Hashtable

class Hashtable:
  def __init__(self, M=10):
    self.table = [None] * M
    self.M = M
    self.N = 0

  def get(self, k):
    hc = hash(k) % self.M       ❶
    entry = self.table[hc]      ❷
    while entry:
      if entry.key == k:        ❸
        return entry.value
      entry = entry.next
    return None

  def put(self, k, v):
    hc = hash(k) % self.M       ❶
    entry = self.table[hc]      ❷
    while entry:
      if entry.key == k:        ❸
        entry.value = v         ❹
        return
      entry = entry.next
                                ❺
    self.table[hc] = LinkedEntry(k, v, self.table[hc])  
    self.N += 1

❶ Compute index position, hc, of linked list for hash code that could contain k.
❷ Start with first node in the linked list.
❸ Traverse next reference until you find entry whose key matches k.
❹ Overwrite value associated with k.
❺ Prepend new node for (k, v) in table[hc] and increment count, N.

Listing 3-7. remove() function for separate chaining implementation of Hashtable

def remove(self, k):
  hc = hash(k) % self.M
  entry = self.table[hc]             ❶
  prev = None
  while entry:                       ❷
    if entry.key == k:               ❸
      if prev:
        prev.next = entry.next       ❹
      else:
        self.table[hc] = entry.next  ❺

      self.N -= 1                    ❻
      return entry.value

    prev, entry = entry, entry.next  ❼

  return None

❶ self.table[hc] refers to the first entry in the linked list associated with hash code hc.
❷ Continue iterating as long as there are entries in this linked list.
❸ Locate entry to remove by comparing target k with the key field for entry.
❹ When found, if there is a prev reference, link around entry, which removes it from the list.
❺ If there is no prev reference, then entry is first. Set the linked list at self.table[hc] to point to the second node in the linked list.
❻ Decrement count of entries, N. It is common to return the value associated with the entry being removed.
❼ If key is not found, continue iteration by setting prev to entry and advancing entry to the next entry.

Listing 3-8. Determine load_factor and threshold when creating DynamicHashtable

class DynamicHashtable:
  def __init__(self, M=10):
    self.table = [None] * M
    self.M = M
    self.N = 0

    self.load_factor = 0.75
    self.threshold = min(M * self.load_factor, M–1) ❶

❶ Ensure for M ≤ 3 that threshold is no greater than M – 1.

Listing 3-9. Revised put() method initiates resize()

def put(self, k, v):
  hc = hash(k) % self.M
  entry = self.table[hc]
  while entry:
    if entry.key == k:
      entry.value = v
      return
    entry = entry.next

  self.table[hc] = LinkedEntry(k, v, self.table[hc])  ❶
  self.N += 1

  if self.N >= self.threshold:                        ❷
    self.resize(2*self.M + 1)                         ❸

❶ Prepend new entry to the table[hc] linked list chain.
❷ Check whether N meets or exceeds threshold for resizing.
❸ Resize storage array into new array that is twice original size plus one.

Listing 3-10. Resize method to dynamically grow hashtable storage for separate chaining

def resize(self, new_size):
  temp = DynamicHashtable(new_size)           ❶
  for n in self.table:
    while n:
      temp.put(n.key, n.value)                ❷
      n = n.next
  self.table = temp.table                     ❸
  self.M = temp.M                             ❹
  self.threshold = self.load_factor * self.M

❶ Construct temporary Hashtable of desired new size.
❷ For each node in the linked list for a given bucket, take all nodes and rehash each entry into temp.
❸ Grab the storage array from temp and use for our own.
❹ Be sure to update our M and threshold values.

Listing 3-11. Perfect hashtable for ten words from Shakespeare

G = [0, 8, 1, 4, 7, 10, 2, 0, 9, 11, 1, 5]

S1 = [9, 4, 8, 6, 6]
S2 = [2, 10, 6, 3, 5]

def hash_f(key, T):
  return sum(T[i % 5] * ord(c) for i, c in enumerate(key)) % 12

def perfect_hash(key):
  return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % 12

Listing 3-12. Partial listing of perfect hash function for English dictionary

S1 =  [394429, 442829, 389061, 136566, 537577, 558931, 481136,
       337378, 395026, 636436, 558331, 393947, 181052, 350962, 657918,
       442256, 656403, 479021, 184627, 409466, 359189, 548390, 241079, 140332]
S2 =  [14818, 548808, 42870, 468503, 590735, 445137, 97305,
       627438, 8414, 453622, 218266, 510448, 76449, 521137, 259248, 371823,
       577752, 34220, 325274, 162792, 528708, 545719, 333385, 14216]

def hash_f(key, T):
  return sum(T[i % 24] * ord(c) for i, c in enumerate(key)) % 667596

def perfect_hash(key):
  return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % 667596

Listing 3-13. Iterate over all entries in a hashtable using Python generator function

# Iterator for Open Addressing Hashtable
def __iter__(self):
  for entry in self.table:
    if entry:                         ❶
      yield (entry.key, entry.value)  ❷

# Iterator for Separate Chaining Hashtable
def __iter__(self):
  for entry in self.table:
    while entry:                      ❸
      yield (entry.key, entry.value)  ❷
      entry = entry.next              ❹

❶ Skip over table entries that are None.
❷ Generate a tuple containing the key and value using Python yield.
❸ As long as there is a linked list at this bucket, yield a tuple for each node.
❹ Set entry to the next entry in the linked list, or None if none are left.

Listing 3-14. ValueBadHash has a horrible hash() function

class ValueBadHash:
  def __init__(self, v):
    self.v = v

  def __hash__(self):
    return hash(self.v) % 4

  def __eq__(self, other):
    return (self.__class__ == other.__class__ and self.v == other.v)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chapter 3: Better Living Through Better Hashing

Listing 3-1. Code to print readable calendar for any month and year

Listing 3-2. Convert word to an integer assuming base 26

Listing 3-3. Ineffective hashtable implementation

Listing 3-4. Open addressing implementation of Hashtable

Listing 3-5. LinkedEntry node structure to support linked list of (key, value) pairs

Listing 3-6. Separate chaining implementation of Hashtable

Listing 3-7. remove() function for separate chaining implementation of Hashtable

Listing 3-8. Determine load_factor and threshold when creating DynamicHashtable

Listing 3-9. Revised put() method initiates resize()

Listing 3-10. Resize method to dynamically grow hashtable storage for separate chaining

Listing 3-11. Perfect hashtable for ten words from Shakespeare

Listing 3-12. Partial listing of perfect hash function for English dictionary

Listing 3-13. Iterate over all entries in a hashtable using Python generator function

Listing 3-14. ValueBadHash has a horrible hash() function

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally