Advanced Data Structures in C: Understanding Hash Tables

In the realm of computer science, data structures are the building blocks that allow us to organize and manage data efficiently. Among the advanced data structures, hash tables stand out as a powerful tool for fast data retrieval and storage. In the C programming language, hash tables offer a way to map keys to values, providing constant-time average-case complexity for basic operations like insertion, deletion, and lookup. This blog post will delve into the fundamental concepts of hash tables in C, their usage methods, common practices, and best practices.

Table of Contents

  1. Fundamental Concepts of Hash Tables
  2. Hash Table Structure in C
  3. Hash Function
  4. Collision Handling
  5. Usage Methods and Code Examples
  6. Common Practices
  7. Best Practices
  8. Conclusion
  9. References

Fundamental Concepts of Hash Tables

A hash table is a data structure that uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. The basic idea is to map a key to a specific location in the table, allowing for quick access to the associated value.

Key-Value Pairs

Hash tables store data in the form of key - value pairs. Each key is unique within the hash table, and it is used to retrieve the corresponding value.

Hash Function

A hash function takes a key as input and returns an integer value, which is used as an index into the hash table. The goal of a good hash function is to distribute the keys uniformly across the table to minimize collisions.

Collision

A collision occurs when two different keys map to the same index in the hash table. Since multiple keys cannot occupy the same slot, collision handling techniques are required to manage these situations.

Hash Table Structure in C

We can represent a hash table in C using an array of linked lists. Each element in the array is a pointer to the head of a linked list, also known as a bucket.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Structure for a key - value pair
typedef struct KeyValuePair {
    char *key;
    int value;
    struct KeyValuePair *next;
} KeyValuePair;

// Structure for the hash table
typedef struct HashTable {
    int size;
    KeyValuePair **buckets;
} HashTable;

Hash Function

A simple hash function for strings in C can be implemented as follows:

unsigned int hash(char *key, int size) {
    unsigned int hashval = 0;
    for (int i = 0; key[i] != '\0'; i++) {
        hashval = key[i] + (hashval << 6) + (hashval << 16) - hashval;
    }
    return hashval % size;
}

Collision Handling

One common method to handle collisions is chaining. In chaining, each bucket in the hash table is a linked list. When a collision occurs, the new key - value pair is simply added to the linked list at the corresponding bucket.

// Insert a key - value pair into the hash table
void insert(HashTable *table, char *key, int value) {
    unsigned int index = hash(key, table->size);
    KeyValuePair *newPair = (KeyValuePair *)malloc(sizeof(KeyValuePair));
    newPair->key = strdup(key);
    newPair->value = value;
    newPair->next = table->buckets[index];
    table->buckets[index] = newPair;
}

// Lookup a key in the hash table
int lookup(HashTable *table, char *key) {
    unsigned int index = hash(key, table->size);
    KeyValuePair *current = table->buckets[index];
    while (current != NULL) {
        if (strcmp(current->key, key) == 0) {
            return current->value;
        }
        current = current->next;
    }
    return -1; // Key not found
}

Usage Methods and Code Examples

Here is a complete example of using a hash table in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Structure for a key - value pair
typedef struct KeyValuePair {
    char *key;
    int value;
    struct KeyValuePair *next;
} KeyValuePair;

// Structure for the hash table
typedef struct HashTable {
    int size;
    KeyValuePair **buckets;
} HashTable;

// Hash function
unsigned int hash(char *key, int size) {
    unsigned int hashval = 0;
    for (int i = 0; key[i] != '\0'; i++) {
        hashval = key[i] + (hashval << 6) + (hashval << 16) - hashval;
    }
    return hashval % size;
}

// Insert a key - value pair into the hash table
void insert(HashTable *table, char *key, int value) {
    unsigned int index = hash(key, table->size);
    KeyValuePair *newPair = (KeyValuePair *)malloc(sizeof(KeyValuePair));
    newPair->key = strdup(key);
    newPair->value = value;
    newPair->next = table->buckets[index];
    table->buckets[index] = newPair;
}

// Lookup a key in the hash table
int lookup(HashTable *table, char *key) {
    unsigned int index = hash(key, table->size);
    KeyValuePair *current = table->buckets[index];
    while (current != NULL) {
        if (strcmp(current->key, key) == 0) {
            return current->value;
        }
        current = current->next;
    }
    return -1; // Key not found
}

// Create a new hash table
HashTable *createHashTable(int size) {
    HashTable *table = (HashTable *)malloc(sizeof(HashTable));
    table->size = size;
    table->buckets = (KeyValuePair **)calloc(size, sizeof(KeyValuePair *));
    return table;
}

// Free the hash table
void freeHashTable(HashTable *table) {
    for (int i = 0; i < table->size; i++) {
        KeyValuePair *current = table->buckets[i];
        while (current != NULL) {
            KeyValuePair *temp = current;
            current = current->next;
            free(temp->key);
            free(temp);
        }
    }
    free(table->buckets);
    free(table);
}

int main() {
    HashTable *table = createHashTable(10);
    insert(table, "apple", 10);
    insert(table, "banana", 20);
    insert(table, "cherry", 30);

    int value = lookup(table, "banana");
    if (value != -1) {
        printf("The value of 'banana' is %d\n", value);
    } else {
        printf("Key not found\n");
    }

    freeHashTable(table);
    return 0;
}

Common Practices

  • Initializing the Hash Table: Always initialize the hash table with an appropriate size. A larger size can reduce the number of collisions, but it also consumes more memory.
  • Memory Management: When inserting key - value pairs, allocate memory for the key and the key - value pair itself. Remember to free the memory when the hash table is no longer needed.
  • Error Handling: Check for memory allocation failures when using malloc or calloc.

Best Practices

  • Choose a Good Hash Function: A good hash function should distribute keys uniformly across the hash table. There are many well - known hash functions available, and choosing the right one depends on the type of keys.
  • Resize the Hash Table: If the load factor (the ratio of the number of elements to the number of buckets) becomes too high, consider resizing the hash table to reduce the number of collisions.
  • Use Separate Compilation: For larger projects, separate the hash table implementation into a header file and a source file for better code organization.

Conclusion

Hash tables are a powerful data structure in C that provide fast data retrieval and storage. By understanding the fundamental concepts, implementing a proper hash function, and handling collisions effectively, we can use hash tables to solve a wide range of problems. Following common and best practices ensures that the hash table implementation is efficient, reliable, and maintainable.

References

  • “The C Programming Language” by Brian W. Kernighan and Dennis M. Ritchie
  • CLRS (Introduction to Algorithms) by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein