Hash Tables

Introduction: The "Magic" of Constant Time

If Arrays are the "contiguous" structure and Linked Lists are the "flexible" structure, Hash Tables are the associative structure. I would argue that Hash Tables (and their high-level abstraction, Dictionaries/Maps) are the single most useful data structure for real-world application logic.

They solve the fundamental problem: How do I find an item instantly without knowing its index?

The Definitions:

  • Dictionary (Map/Associative Array): The Abstract Data Type (ADT). It describes what it does: maps unique Keys to Values.

  • Hash Table: The data structure implementation. It describes how it works: using a hash function to compute an index.

The Mechanics

Under the hood, a Hash Table is actually just an Array. The "magic" lies in how we determine where to put the data.

  1. The Key: The input identifier (e.g., "username", "product_id").

  2. The Hash Function: A mathematical algorithm that takes the Key and converts it into a numerical integer (hash code).

  3. The Modulo Operation: We take that large integer and calculate Index = Hash_Code % Array_Size.

  4. The Bucket: The specific slot in the underlying array where the value lives.

The Collision Problem

What happens if Hash("apple") and Hash("banana") result in the same index? This is a Collision.

  • Chaining (Most Common): Each bucket in the array actually holds a Linked List. If a collision occurs, we just append the new item to the list in that bucket.

  • Open Addressing: We look for the next available slot in the array (Linear Probing).

Operations & Complexity Analysis

Operation

Average Case

Worst Case

Notes

Search (Lookup)

O(1)

O(n)

The "Holy Grail." Usually instant. Becomes O(n) if all keys collide (e.g., a bad hash function).

Insertion

O(1)

O(n)

Computes hash, goes to index, inserts. O(n) if resizing is triggered.

Deletion

O(1)

O(n)

Computes hash, goes to index, removes.

JavaScript Implementation

In JavaScript, we have two primary ways to use this structure: the legacy Object and the modern Map.

1. The Modern Standard: Map

The Map object holds key-value pairs and remembers the original insertion order of the keys.

2. Under the Hood (Conceptual Implementation)

To understand how it works, let's look at a simplified implementation handling collisions with chaining.

Expert Insight: The Hidden Costs

While Hash Tables are usually O(1), two factors can destroy their performance in production systems.

1. The Load Factor

The Load Factor is calculated as Total_Elements / Array_Size.

If you have an array of size 10 and you store 10 elements, your load factor is 1.0. As this number increases, the chance of collisions skyrockets.

The Fix: When the load factor hits a threshold (usually 0.7), the Hash Table must Resize. It doubles the size of the underlying array and Re-hashes every single element to new positions. This is an expensive O(n) operation.

2. Hash DoS Attacks

If an attacker knows your hash function, they can intentionally generate thousands of keys that hash to the exact same index (e.g., Hash(A) = 5, Hash(B) = 5...).

This turns your O(1) lookup into an O(n) linked list scan, causing your server CPU to spike to 100% and taking down your application.

The Defense: Modern languages (Python, Java, V8 JS) often use randomized hash seeds on process start to prevent this predictability.

3. Object vs Map in JS

For years, we abused JS Objects ({}) as dictionaries. This is dangerous because:

  • Prototype Pollution: An attacker can override toString or constructor keys.

  • Performance: Map is optimized for frequent additions and removals; Object is not.

  • Key Types: Object keys are always strings/symbols. Map keys can be anything (functions, objects, numbers).

Conclusion

Use Hash Tables (Maps) when you need fast lookups by a unique ID. They trade memory (sparse arrays) for speed.

  • Use Map for frequent updates and dynamic key collections.

  • Use Object strictly for static records or JSON data structures.

  • Always be aware that "Constant Time" O(1) is an average, not a guarantee.

Last updated