-
Notifications
You must be signed in to change notification settings - Fork 116
Chapter 3: Better Living Through Better Hashing
The following are the code listings for Chapter 3. [book.py]
- Listing 3-1. Code to print readable calendar for any month and year
- Listing 3-2. Convert word to an integer assuming base 26
- Listing 3-3. Ineffective hashtable implementation
- Listing 3-4. Open addressing implementation of Hashtable
- Listing 3-5. LinkedEntry node structure to support linked list of (key, value) pairs
- Listing 3-6. Separate chaining implementation of Hashtable
- Listing 3-7. remove() function for separate chaining implementation of Hashtable
- Listing 3-8. Determine load_factor and threshold when creating DynamicHashtable
- Listing 3-9. Revised put() method initiates resize()
- Listing 3-10. Resize method to dynamically grow hashtable storage for separate chaining
- Listing 3-11. Perfect hashtable for ten words from Shakespeare
- Listing 3-12. Partial listing of perfect hash function for English dictionary
- Listing 3-13. Iterate over all entries in a hashtable using Python generator function
- Listing 3-14. ValueBadHash has a horrible hash() function
from datetime import date
import calendar
def print_month(month, year):
idx = key_array.index(month) ❶
day = 1
wd = date(year,idx + 1,day).weekday() ❷
wd = (wd + 1) % 7 ❸
end = month_length[idx] ❹
if calendar.isleap(year) and idx == 1: ❺
end += 1
print('{} {}'.format(month,year).center(20))
print('Su Mo Tu We Th Fr Sa')
print(' ' * wd, end='') ❻
while day <= end:
print('{:2d} '.format(day), end='')
wd = (wd + 1) % 7 ❼
day += 1
if wd == 0: print() ❽
print()❶ Find index to use for month_length, an integer from 0 to 11.
❷ Returns weekday for first day of given month, using 0 for Monday. Note date() uses idx + 1 since its month argument must be an integer from 1 to 2.
❸ Adjust to have Sunday be the weekday with a value of 0, instead of Monday.
❹ Determine length of the month corresponding to input parameter.
❺ In a leap year, February (index 1 when counting from 0) has 29 days.
❻ Add spaces in first week to start day 1 at right indentation.
❼ Advance day and weekday for next day.
❽ Insert line break before Sunday to start a new line.
def base26(w):
val = 0
for ch in w.lower(): ❶
next_digit = ord(ch) - ord('a') ❷
val = 26*val + next_digit ❸
return val❶Convert all characters to lowercase.
❷ Compute digit in next position.
❸ Accumulate total and return.
class Hashtable:
def __init__(self, M=10):
self.table = [None] * M ❶
self.M = M
def get(self, k): ❷
hc = hash(k) % self.M
return self.table[hc].value if self.table[hc] else None
def put(self, k, v): ❸
hc = hash(k) % self.M
entry = self.table[hc]
if entry:
if entry.key == k:
entry.value = v
else: ❹
raise RuntimeError('Key Collision: {} and {}'.format(k, entry.key))
else:
self.table[hc] = Entry(k, v)❶ Allocate a table to hold M Entry objects.
❷ The get() function locates the entry associated with the hash code for k and returns its value, if present.
❸ The put() function locates the entry associated with the hash code for k, if present, and overwrites its value; otherwise it stores a new entry.
❹ A collision occurs when two different keys map to the same bucket identified by its hash code value.
class Hashtable:
def __init__(self, M=10):
self.table = [None] * M
self.M = M
self.N = 0
def get(self, k):
hc = hash(k) % self.M ❶
while self.table[hc]:
if self.table[hc].key == k: ❷
return self.table[hc].value
hc = (hc + 1) % self.M ❸
return None ❹
def put(self, k, v):
hc = hash(k) % self.M ❶
while self.table[hc]:
if self.table[hc].key == k: ❺
self.table[hc].value = v
return
hc = (hc + 1) % self.M ❸
if self.N >= self.M - 1: ❻
raise RuntimeError ('Table is Full.')
self.table[hc] = Entry(k, v) ❼
self.N += 1❶ Start at the first bucket where an entry whose key is k could be.
❷ If found, return the value associated with k.
❸ Otherwise advance to the next bucket, wrapping around to 0 as needed.
❹ If table[hc] is empty, you know k is not in table.
❺ If found, update the value associated with the Entry for k.
❻ Once you've established that k is not in table, throw RuntimeError if hc is the last empty bucket.
❼ Create new Entry in table[hc] and update N, the count of keys.
class LinkedEntry:
def __init__(self, k, v, rest=None):
self.key = k
self.value = v
self.next = rest ❶❶ rest is an optional argument, which allows the newly created node to link directly to an existing list pointed to by rest.
class Hashtable:
def __init__(self, M=10):
self.table = [None] * M
self.M = M
self.N = 0
def get(self, k):
hc = hash(k) % self.M ❶
entry = self.table[hc] ❷
while entry:
if entry.key == k: ❸
return entry.value
entry = entry.next
return None
def put(self, k, v):
hc = hash(k) % self.M ❶
entry = self.table[hc] ❷
while entry:
if entry.key == k: ❸
entry.value = v ❹
return
entry = entry.next
❺
self.table[hc] = LinkedEntry(k, v, self.table[hc])
self.N += 1❶ Compute index position, hc, of linked list for hash code that could contain k.
❷ Start with first node in the linked list.
❸ Traverse next reference until you find entry whose key matches k.
❹ Overwrite value associated with k.
❺ Prepend new node for (k, v) in table[hc] and increment count, N.
def remove(self, k):
hc = hash(k) % self.M
entry = self.table[hc] ❶
prev = None
while entry: ❷
if entry.key == k: ❸
if prev:
prev.next = entry.next ❹
else:
self.table[hc] = entry.next ❺
self.N -= 1 ❻
return entry.value
prev, entry = entry, entry.next ❼
return None❶ self.table[hc] refers to the first entry in the linked list associated with hash code hc.
❷ Continue iterating as long as there are entries in this linked list.
❸ Locate entry to remove by comparing target k with the key field for entry.
❹ When found, if there is a prev reference, link around entry, which removes it from the list.
❺ If there is no prev reference, then entry is first. Set the linked list at self.table[hc] to point to the second node in the linked list.
❻ Decrement count of entries, N. It is common to return the value associated with the entry being removed.
❼ If key is not found, continue iteration by setting prev to entry and advancing entry to the next entry.
class DynamicHashtable:
def __init__(self, M=10):
self.table = [None] * M
self.M = M
self.N = 0
self.load_factor = 0.75
self.threshold = min(M * self.load_factor, M–1) ❶❶ Ensure for M ≤ 3 that threshold is no greater than M – 1.
def put(self, k, v):
hc = hash(k) % self.M
entry = self.table[hc]
while entry:
if entry.key == k:
entry.value = v
return
entry = entry.next
self.table[hc] = LinkedEntry(k, v, self.table[hc]) ❶
self.N += 1
if self.N >= self.threshold: ❷
self.resize(2*self.M + 1) ❸❶ Prepend new entry to the table[hc] linked list chain.
❷ Check whether N meets or exceeds threshold for resizing.
❸ Resize storage array into new array that is twice original size plus one.
def resize(self, new_size):
temp = DynamicHashtable(new_size) ❶
for n in self.table:
while n:
temp.put(n.key, n.value) ❷
n = n.next
self.table = temp.table ❸
self.M = temp.M ❹
self.threshold = self.load_factor * self.M❶ Construct temporary Hashtable of desired new size.
❷ For each node in the linked list for a given bucket, take all nodes and rehash each entry into temp.
❸ Grab the storage array from temp and use for our own.
❹ Be sure to update our M and threshold values.
G = [0, 8, 1, 4, 7, 10, 2, 0, 9, 11, 1, 5]
S1 = [9, 4, 8, 6, 6]
S2 = [2, 10, 6, 3, 5]
def hash_f(key, T):
return sum(T[i % 5] * ord(c) for i, c in enumerate(key)) % 12
def perfect_hash(key):
return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % 12S1 = [394429, 442829, 389061, 136566, 537577, 558931, 481136,
337378, 395026, 636436, 558331, 393947, 181052, 350962, 657918,
442256, 656403, 479021, 184627, 409466, 359189, 548390, 241079, 140332]
S2 = [14818, 548808, 42870, 468503, 590735, 445137, 97305,
627438, 8414, 453622, 218266, 510448, 76449, 521137, 259248, 371823,
577752, 34220, 325274, 162792, 528708, 545719, 333385, 14216]
def hash_f(key, T):
return sum(T[i % 24] * ord(c) for i, c in enumerate(key)) % 667596
def perfect_hash(key):
return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % 667596# Iterator for Open Addressing Hashtable
def __iter__(self):
for entry in self.table:
if entry: ❶
yield (entry.key, entry.value) ❷
# Iterator for Separate Chaining Hashtable
def __iter__(self):
for entry in self.table:
while entry: ❸
yield (entry.key, entry.value) ❷
entry = entry.next ❹❶ Skip over table entries that are None.
❷ Generate a tuple containing the key and value using Python yield.
❸ As long as there is a linked list at this bucket, yield a tuple for each node.
❹ Set entry to the next entry in the linked list, or None if none are left.
class ValueBadHash:
def __init__(self, v):
self.v = v
def __hash__(self):
return hash(self.v) % 4
def __eq__(self, other):
return (self.__class__ == other.__class__ and self.v == other.v)All content drawn from Learning Algorithms: A Programmer’s Guide to Writing Better Code (c) 2021