`
sunwinner
  • 浏览: 198002 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论

基础数据结构和算法十二:Hash table

阅读更多

 

Search algorithms that use hashing consist of two separate parts. The first part is to compute a hash function that transforms the search key into an array index. Ideally, different keys would map to different indices. This ideal is generally beyond our reach, so we have to face the possibility that two or more different keys may hash to the same array index. Thus, the second part of a hashing search is a collision-resolution process that deals with this situation. After describing ways to compute hash functions, we shall consider two different approaches to collision resolution: separate chaining and linear probing.

 

Hash functions The first problem that we face is the computation of the hash function, which transforms keys into array indices. If we have an array that can hold M key-value pairs, then we need a hash function that can transform any given key into an index into that array: an integer in the range [0, M–1]. We seek a hash function that both is easy to compute and uniformly distributes the keys: for each key, every integer between 0 and M – 1 should be equally likely (independently for every key). This ideal is somewhat mysterious; to understand hashing, it is worthwhile to begin by thinking carefully about how to implement such a function.

 

 

The hash function depends on the key type. Strictly speaking, we need a different hash function for each key type that we use. If the key involves a number, such as a social security number, we could start with that number; if the key involves a string, such as a person’s name, we need to convert the string into a number; and if the key has multiple parts, such as a mailing address, we need to combine the parts somehow. For many common types of keys, we can make use of default implementations provided by Java.

 

we have three primary requirements in implementing a good hash function for a given data type:

It should be consistent—equal keys must produce the same hash value.

It should be efficient to compute.

It should uniformly distribute the keys.

 

Satisfying these requirements simultaneously is a job for experts. As with many built-in capabilities, Java programmers who use hashing assume that hashCode() does the job, absent any evidence to the contrary.

 

 

Hashing with separate chaining A hash function converts keys into array indices. The second component of a hashing algorithm is collision resolution: a strategy for handling the case when two or more keys to be inserted hash to the same index. A straightforward and general approach to collision resolution is to build, for each of the M array indices, a linked list of the key-value pairs whose keys hash to that index. This method is known as separate chaining because items that collide are chained together in separate linked lists. The basic idea is to choose M to be sufficiently large that the lists are sufficiently short to enable efficient search through a two-step process: hash to find the list that could contain the key, then sequentially search through that list for the key.

public class SeparateChainingHashST<Key, Value> {
    private static final int INIT_CAPACITY = 4;

    // largest prime <= 2^i for i = 3 to 31
    // not currently used for doubling and shrinking
    // private static final int[] PRIMES = {
    //    7, 13, 31, 61, 127, 251, 509, 1021, 2039, 4093, 8191, 16381,
    //    32749, 65521, 131071, 262139, 524287, 1048573, 2097143, 4194301,
    //    8388593, 16777213, 33554393, 67108859, 134217689, 268435399,
    //    536870909, 1073741789, 2147483647
    // };

    private int N;                                // number of key-value pairs
    private int M;                                // hash table size
    private SequentialSearchST<Key, Value>[] st;  // array of linked-list symbol tables


    // create separate chaining hash table
    public SeparateChainingHashST() {
        this(INIT_CAPACITY);
    }

    // create separate chaining hash table with M lists
    public SeparateChainingHashST(int M) {
        this.M = M;
        st = (SequentialSearchST<Key, Value>[]) new SequentialSearchST[M];
        for (int i = 0; i < M; i++)
            st[i] = new SequentialSearchST<Key, Value>();
    }

    // resize the hash table to have the given number of chains b rehashing all of the keys
    private void resize(int chains) {
        SeparateChainingHashST<Key, Value> temp = new SeparateChainingHashST<Key, Value>(chains);
        for (int i = 0; i < M; i++) {
            for (Key key : st[i].keys()) {
                temp.put(key, st[i].get(key));
            }
        }
        this.M = temp.M;
        this.N = temp.N;
        this.st = temp.st;
    }

    // hash value between 0 and M-1
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    // return number of key-value pairs in symbol table
    public int size() {
        return N;
    }

    // is the symbol table empty?
    public boolean isEmpty() {
        return size() == 0;
    }

    // is the key in the symbol table?
    public boolean contains(Key key) {
        return get(key) != null;
    }

    // return value associated with key, null if no such key
    public Value get(Key key) {
        int i = hash(key);
        return st[i].get(key);
    }

    // insert key-value pair into the table
    public void put(Key key, Value val) {
        if (val == null) {
            delete(key);
            return;
        }

        // double table size if average length of list >= 10
        if (N >= 10 * M) resize(2 * M);

        int i = hash(key);
        if (!st[i].contains(key)) N++;
        st[i].put(key, val);
    }

    // delete key (and associated value) if key is in the table
    public void delete(Key key) {
        int i = hash(key);
        if (st[i].contains(key)) N--;
        st[i].delete(key);

        // halve table size if average length of list <= 1
        if (M > INIT_CAPACITY && N <= 2 * M) resize(M / 2);
    }

    // return keys in symbol table as an Iterable
    public Iterable<Key> keys() {
        Queue<Key> queue = new Queue<Key>();
        for (int i = 0; i < M; i++) {
            for (Key key : st[i].keys())
                queue.enqueue(key);
        }
        return queue;
    }
}
public class SequentialSearchST<Key, Value> {
    private int N;           // number of key-value pairs
    private Node first;      // the linked list of key-value pairs

    // a helper linked list data type
    private class Node {
        private Key key;
        private Value val;
        private Node next;

        public Node(Key key, Value val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    // return number of key-value pairs
    public int size() {
        return N;
    }

    // is the symbol table empty?
    public boolean isEmpty() {
        return size() == 0;
    }

    // does this symbol table contain the given key?
    public boolean contains(Key key) {
        return get(key) != null;
    }

    // return the value associated with the key, or null if the key is not present
    public Value get(Key key) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) return x.val;
        }
        return null;
    }

    // add a key-value pair, replacing old key-value pair if key is already present
    public void put(Key key, Value val) {
        if (val == null) {
            delete(key);
            return;
        }
        for (Node x = first; x != null; x = x.next)
            if (key.equals(x.key)) {
                x.val = val;
                return;
            }
        first = new Node(key, val, first);
        N++;
    }

    // remove key-value pair with given key (if it's in the table)
    public void delete(Key key) {
        first = delete(first, key);
    }

    // delete key in linked list beginning at Node x
    // warning: function call stack too large if table is large
    private Node delete(Node x, Key key) {
        if (x == null) return null;
        if (key.equals(x.key)) {
            N--;
            return x.next;
        }
        x.next = delete(x.next, key);
        return x;
    }


    // return all keys as an Iterable
    public Iterable<Key> keys() {
        Queue<Key> queue = new Queue<Key>();
        for (Node x = first; x != null; x = x.next)
            queue.enqueue(x.key);
        return queue;
    }
}

In a separate-chaining hash table with M lists and N keys,the probability that the number of keys in a list is within a small constant factor of N/M is extremely close to 1, the number of compares (equality tests) for search miss and insert is ~N/M.

 

 

Hashing with linear probing Another approach to implementing hashing is to store N key-value pairs in a hash table of size M > N, relying on empty entries in the table to help with collision resolution. Such methods are called open-addressing hashing methods.

The simplest open-addressing method is called linear probing: when there is a collision (when we hash to a table index that is already occupied with a key different from the search key), then we just check the next entry in the table (by incrementing the index). Linear probing is characterized by identifying three possible outcomes:

Key equal to search key: search hit

Empty position (null key at indexed position): search miss

Key not equal to search key: try next entry

We hash the key to a table index, check whether the search key matches the key there, and continue (incrementing the index, wrapping back to the beginning of the table if we reach the end) until finding either the search key or an empty table entry. It is customary to refer to the operation of determining whether or not a given table entry holds an item whose key is equal to the search key as a probe. We use the term interchangeably with the term compare that we have been using, even though some probes are tests for null.

The essential idea behind hashing with open addressing is this: rather than using memory space for references in linked lists, we use it for the empty entries in the hash table, which mark the ends of probe sequences. 

 

Deletion. How do we delete a key-value pair from a linear-probing table? If you think about the situation for a moment, you will see that setting the key’s table position to null will not work, because that might prematurely terminate the search for a key that was inserted into the table later. As an example, suppose that we try to delete C in this way in our trace example, then search for H. The

 

hash value for H is 4, but it sits at the end of the cluster, in position 7. If we set position 5 to null, then get() will not find H. As a consequence, we need to reinsert into the table all of the keys in the cluster to the right of the deleted key, this process is trickier than it might seem.

public class LinearProbingHashST<Key, Value> {
    private static final int INIT_CAPACITY = 4;

    private int N;           // number of key-value pairs in the symbol table
    private int M;           // size of linear probing table
    private Key[] keys;      // the keys
    private Value[] vals;    // the values


    // create an empty hash table - use 16 as default size
    public LinearProbingHashST() {
        this(INIT_CAPACITY);
    }

    // create linear proving hash table of given capacity
    public LinearProbingHashST(int capacity) {
        M = capacity;
        keys = (Key[]) new Object[M];
        vals = (Value[]) new Object[M];
    }

    // return the number of key-value pairs in the symbol table
    public int size() {
        return N;
    }

    // is the symbol table empty?
    public boolean isEmpty() {
        return size() == 0;
    }

    // does a key-value pair with the given key exist in the symbol table?
    public boolean contains(Key key) {
        return get(key) != null;
    }

    // hash function for keys - returns value between 0 and M-1
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    // resize the hash table to the given capacity by re-hashing all of the keys
    private void resize(int capacity) {
        LinearProbingHashST<Key, Value> temp = new LinearProbingHashST<Key, Value>(capacity);
        for (int i = 0; i < M; i++) {
            if (keys[i] != null) {
                temp.put(keys[i], vals[i]);
            }
        }
        keys = temp.keys;
        vals = temp.vals;
        M = temp.M;
    }

    // insert the key-value pair into the symbol table
    public void put(Key key, Value val) {
        if (val == null) delete(key);

        // double table size if 50% full
        if (N >= M / 2) resize(2 * M);

        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % M) {
            if (keys[i].equals(key)) {
                vals[i] = val;
                return;
            }
        }
        keys[i] = key;
        vals[i] = val;
        N++;
    }

    // return the value associated with the given key, null if no such value
    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(key))
                return vals[i];
        return null;
    }

    // delete the key (and associated value) from the symbol table
    public void delete(Key key) {
        if (!contains(key)) return;

        // find position i of key
        int i = hash(key);
        while (!key.equals(keys[i])) {
            i = (i + 1) % M;
        }

        // delete key and associated value
        keys[i] = null;
        vals[i] = null;

        // rehash all keys in same cluster
        i = (i + 1) % M;
        while (keys[i] != null) {
            // delete keys[i] an vals[i] and reinsert
            Key keyToRehash = keys[i];
            Value valToRehash = vals[i];
            keys[i] = null;
            vals[i] = null;
            N--;
            put(keyToRehash, valToRehash);
            i = (i + 1) % M;
        }

        N--;

        // halves size of array if it's 12.5% full or less
        if (N > 0 && N <= M / 8) resize(M / 2);

        assert check();
    }

    // return all of the keys as in Iterable
    public Iterable<Key> keys() {
        Queue<Key> queue = new Queue<Key>();
        for (int i = 0; i < M; i++)
            if (keys[i] != null) queue.enqueue(keys[i]);
        return queue;
    }

    // integrity check - don't check after each put() because
    // integrity not maintained during a delete()
    private boolean check() {

        // check that hash table is at most 50% full
        if (M < 2 * N) {
            System.err.println("Hash table size M = " + M + "; array size N = " + N);
            return false;
        }

        // check that each key in table can be found by get()
        for (int i = 0; i < M; i++) {
            if (keys[i] == null) continue;
            else if (get(keys[i]) != vals[i]) {
                System.err.println("get[" + keys[i] + "] = " + get(keys[i]) + "; vals[i] = " + vals[i]);
                return false;
            }
        }
        return true;
    }
}

 

 

分享到:
评论

相关推荐

    c++数据结构与算法实现

    TestCuckooHashTable.cpp: Test program for cuckoo hash tables (need to compile CuckooHashTable.cpp also) CaseInsensitiveHashTable.cpp: Case insensitive hash table from STL (Figure 5.23) BinaryHeap.h:...

    数据结构与算法分析java版

    数据结构与算法分析java版 Table of Contents Data Structures and Algorithms in Java - 4 Introduction - 7 Part I Chapter 1 - Overview - 11 Chapter 2 - Arrays - 29 Chapter 3 - Simple Sorting -...

    数据结构常用算法c++实现

    数据结构常用算法c++实现,程序目录如下: Array shuffle Prime test(trial division) Prime test(Miller-Rabin's method) 2D Array Arbitary Integer Linear congruential generator Maximum subarray problem Bit...

    数据结构与算法的知识点

    散列表(Hash table):散列表源自于散列函数(Hash function),其思想是如果在结构中存在关键字和T相等的记录,那么必定在F(T)的存储位置可以找到该记录,这样就可以不用进行比较操作而直接取得所查记录。

    算法:有关C#的流行算法,用于破解代码面试的数据结构和解决方案

    排序和数据结构算法 此存储库是一个C#库,具有已实现的排序算法,结构及其算法。 排序算法: 稳定,通用: 不稳定,通用: 非比较算法: -为整数实现 -为整数实现 -为整数实现 -为整数实现 为字符串实现 数据...

    12 HashTable.rar

    严蔚敏数据结构与算法 课本算法实现

    左程云leetcode-algorithm-and-data-structure:算法+数据结构=程序https://en.wikipedia

    数据结构和算法学习笔记 一、简介 1. 2. 3. 4. 5. 6. 二、数据结构 1. 二维数组(Array2D) 位数组(Bit Set) 静态数组(Fixed Size Array) 有序表(Ordered Array) 2. 队列(Queues) (后进先出) (先进先出)...

    白话算法之散列表(Hash Table)从理论到实用.doc

    白话算法之散列表(Hash Table)从理论到实用.doc

    20151910042-刘鹏-DSA思考题-10 - Graph Traversal and Hash Table1

    数据结构与算法思考题课程名称:数据结构与算法实验年级:2015级成绩:指导教师:陆正福姓名:刘鹏上机实践名称:Graph Traversal and Hash

    数据结构:八大数据结构分析.pdf

    数据结构:⼋⼤数据结构分析 数据结构分类 数据结构是指相互之间存在着⼀种或多种关系的数据元素的集合和该集合中数据元素之间的关系组成 。 常⽤的数据结构有:数组,栈,链表,队列,树,图,堆,散列表等,如图所...

    数据结构与算法分析(Java)

    AvlTree.java BinaryHeap.java BinarySearchTree.java BinomialQueue.java CuckooHashTableClassic.java CuckooHashTable.java DisjSets.java DisjSetsSlow.java Fig01_02.java Fig01_03.java Fig01_04.java Fig02_...

    StructPie:一组C库,用于实现数据结构和算法-开源

    Struct-Pie(结构派)是一组C共享库,用于实现数据结构和算法,以便可以轻松地将其使用/集成到C项目中。 LIFO和FIFO堆栈,二进制搜索树,优先级队列和哈希表已实现并包含在此程序包中。 将来的版本将具有许多其他...

    HASHbiao.rar_hash_哈西table_数据结构实验_文件查找

    自己做的一个哈西表算法,在数据结构实验上能用到,哈西表是进行文件查找的一个非常常用的算法,很重要!

    :link:常用的数据结构和算法

    :link:常用的数据结构和算法-源码

    stl数据结构.docx

    C++ STL (Standard Template Library标准模板库) 是通用类模板和算法的集合,它提供给程序员一些标准的数据结构的实现,称为容器,如 queues(队列)、lists(链表)、和 stacks(栈)等。 STL容器是由一些运用最广的一些...

    左程云leetcode-algorithm-and-data-structure:算法+数据结构=程序

    数据结构和算法学习笔记 一、简介 1. 2. 3. 4. 5. 6. 二、数据结构 1. 二维数组(Array2D) 位数组(Bit Set) 静态数组(Fixed Size Array) 有序表(Ordered Array) 2. 队列(Queues) (后进先出) (先进先出)...

    知名公司数据结构笔试题及答案

    hash table c. stack 6.一个二叉树的三种遍历方法的输出结果 7.链表按升序打印每打印完一个节点就将该节点从链表中删除 8.选择一种算法来整理出一个链接表。你为什么要选择这种方法?现在用o(n)时间来做。 ...

    数据结构与算法分析(Java英文版)

    算法的书籍Table of ContentsData Structures and Algorithms in Java - 4Introduction - 7Part IChapter 1 - Overview - 11Chapter 2 - Arrays - 29Chapter 3 - Simple Sorting - 63Part IIChapter 4 - Stacks and ...

    leetcode和oj-Data-Structures-and-Algorithms:数据结构与算法

    基本数据结构和算法 这些算法全部自己敲一遍: 动态数组 链表 双向链表 栈 队列 树 二叉树 二叉查找树 伸展树(splay tree 分裂树) 平衡二叉树AVL 红黑树 B树,B+,B* R树 Trie树(前缀树) 后缀树 最优二叉树(赫夫曼树) ...

Global site tag (gtag.js) - Google Analytics