Study Guide: HashMap

# Study Guide: HashMap

*Understanding HashMaps as a unique collection of keys with associated values, with no specified order.*

## Conceptualizing HashMaps

Imagine a HashMap as a series of mailboxes. Each mailbox represents a unique key, and just like how no two mailboxes can have the same address, each key in a HashMap is unique. The order of these mailboxes doesn’t matter; what’s important is the address on the front, which corresponds to the key.

In the last Study Guide on [HashSets](https://www.reddit.com/r/leetcode/comments/1evpn4v/study_guide_hashset/), we discussed how HashSets allow for the storage of unique values with constant lookup time through hashing. The key distinction with HashMaps is the introduction of an additional concept: the key. This is advantageous as it enables us to categorize data into buckets based on these keys.

Key-value pairs are a powerful way to define relationships between data. For instance, consider counting character occurrences in a string. A HashMap<Char, Int> can effectively hold unique character keys and their corresponding count values. However, sometimes the relationship can be more complex, such as grouping anagrams together. (More details on this can be found in the "Designing the Key" section.)

## HashMaps vs HashSets

HashMaps and HashSets operate similarly, which is reflected in their comparable time complexities. Here are three fundamental terms to grasp:

- **KEY**: A unique identifier associated with each value.
- **VALUE**: The data linked to each key.
- **HASH**: A generated value based on the key, indicating where the key-value pair is stored in memory.

The primary difference lies in what is stored at each hash. In a HashSet, only a value is stored, while in a HashMap, an *object* is stored at each hash that includes both the key and value.

In various programming languages, HashMaps can be implemented differently. For example, in Java, a Node Object is typically used to store the hash, key, value, and a reference to the next node. This distinction is crucial: the hash points to an *object*, not just a value. For the sake of this discussion, we will refer to this object as "KEY-VALUE-OBJECT."

## Designing the Key

As stated earlier, **key-value pairs are excellent for defining relationships between data**. Let's delve deeper into this concept.

Consider a problem where you need to group anagrams. For example, the input `["act", "lemon", "cat", "bat", "tab", "arm"]` should yield `[["act", "cat"], ["lemon"], ["bat", "tab"], ["arm"]]`. Here, "act" and "cat" share the same letters, while "bat" and "tab" do as well.

To efficiently sort these words into buckets based on their letters, we can design a hash key that reflects the letters used. Here are two potential strategies:

1. **Sort the letters of each string**: This has a time complexity of `O(s log s)`, where s is the number of characters in the string. The sorted string becomes the key:
   - "act" -> "act", "cat" -> "act" | KEY = "act"
   - "bat" -> "abt", "tab" -> "abt" | KEY = "abt"

2. **Count the frequency of each character**: This can be done in linear time, `O(s)`. We can iterate over the string, count the characters, and construct a hash key:
   - "act" -> 1A1C1T, "cat" -> 1C1A1T | KEY = "act"
   - "bat" -> 1B1A1T, "tab" -> 1T1A1B | KEY = "tab"

By designing the hash key ourselves, we can efficiently sort our strings into respective buckets. The time complexity of key generation varies; Option 1 results in `O(s log s)`, while Option 2 offers `O(s)`. In some cases, you might achieve constant time `O(1)`, especially if grouping by string length.

**It is crucial to separate key generation from hash generation.** The key is stored in the HashMap, but it is not generated there. Thus, we do not factor in the time complexity of key generation when analyzing HashMaps.

## What is the Time Complexity?

### Important Points to Remember:

- **GenerateHash**: `O(1)`
  - The principles of hashing discussed in the HashSet study guide apply to HashMaps. Increased collisions lead to reduced efficiency.
  
- **Insert**: `O(1)`, but can be `O(n)` in specific situations.
  -
comments powered by Disqus