`
leonzhx
  • 浏览: 769381 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论

Hash Tables and Application

阅读更多

 

1.  Purpose of Hash Tables : maintain a (possibly evolving) set of stuff.

    Supported Operations:

    a)  Insert: add new record

    b)  Delete: delete existing record

    c)  Lookup: check for a particular record

    All operations in O(1) time

 

2.  Application: De-Duplication

    Given: a “stream” of objects. (Linear scan of a huge file or objects arriving in real time)

    Goal: remove duplicates

    Solution: when new object x arrives -­- lookup x in hash table H if not found, Insert x into H

 

3.  Application: The 2-SUM Problem

    Input: unsorted array A of n integers. Target sum t.

    Goal: determine whether or not there are two numbers x,y in A with x + y = t

    Naive Solution: O(n^2) time via exhaustive search

    Better Solution: 1) sort A ( O(nlogn) time) 

                              2) for each x in A, look for t-x in A via binary search (O(nlogn) time) 

    Amazing: 1) insert all elements of A to hash table H (O(n) time) 

                   2) for each x in A, Lookup t-x in H ( O(n) time)

 

4.  High-Level Idea of Implementation:

  Setup: universe U (generally, really big)

  Goal: want to maintain evolving set S subset of U (generally, of reasonable size)

  Solution: 1) pick n = # of “buckets” with n ~ |S| (for simplicity assume |S| doesn’t vary much)

                 2) choose a hash function

                 3) use array A of length n, store x in A[h(x)]

 

5.  Resolving Collisions

   Collision: distinct x, y in U such that h(x) = h(y)

   Solution #1 : (separate) chaining 

            - keep linked list in each bucket

            - ­given a key/object x, perform Insert/Delete/Lookup in the list in A[h(x)]

   Solution #2 : open addressing (only one object per bucket)

            -­ Hash function now specifies probe sequence h1(x),h2(x),.. (keep trying till find open slot)

            -­ Examples : linear probing (look consecutively), double hashing

 

6.  Properties of a “Good” Hash function

  1) Should lead to good performance => i.e., should “spread data out” 

      (gold standard – completely random hashing)

  2) Should be easy to store/ very fast to evaluate.

 

7.  Quick-­and-Dirty Hash Functions:

     Object U --> (hash code) Integers --> (compression function like mod n function) buckets {0, 1, ..., n-1}

     How to choose n = # of buckets:

     a) Choose n to be a prime ( within constant factor of # of objects in table)

     b) Not too close to a power of 2

     c) Not too close to a power of 10

 

分享到:
评论

相关推荐

    Disk Based Hash Tables and Quantified Numbers (24th March, 2014)-计算机科学

    Disk Based Hash Tables and Quantified NumbersEdscott Wilson GarciaMarch 24, 2014AbstractThis paper describes the design of Disk Based Hash Tables: n–dimen- sional self–indexed data tables. The ...

    hash table spell checking

    Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...

    数据结构作业Hash表

    Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...

    C++ Data Structures and Algorithms

    Build enhanced applications by using hashtables, dictionaries, and sets Implement searching algorithms such as linear search, binary search, jump search, exponential search, and more Have a positive ...

    Disk-Based.Algorithms.for.Big.Data

    Chapter 8: Distributed Hash Tables Chapter 9: Large File Systems Chapter 10: NoSQL Storage Appendix A: Order Notation Appendix B: Assignment 1: Search Appendix C: Assignment 2: Indices Appendix D: ...

    微软内部资料-SQL性能优化3

    Contents Overview 1 Lesson 1: Concepts – Locks and Lock Manager 3 Lesson 2: Concepts – Batch and Transaction 31 Lesson 3: Concepts – Locks and...The hash value consists of all the key components and ...

    ARM® Compiler v5.06 for µVision® armasm User Guide

    4.26 Exception tables and Unwind tables 4.27 Assembly language changes after RVCT v2.1 5 Condition Codes 5.1 Conditional instructions 5.2 Conditional execution in ARM state 5.3 Conditional execution ...

    Clever Internet Suite (SRC) v9.1.0.0

    All these components make the application development process easy and clean. You can use these components separately from the main protocol components with any other library and even with your own ...

    python3.6.5参考手册 chm

    PEP 456: Secure and Interchangeable Hash Algorithm PEP 436: Argument Clinic Other Build and C API Changes Other Improvements Significant Optimizations Deprecated Deprecations in the Python API ...

    BobBuilder_app

    In this version I have done away with the b+tree and hash index in favor of my own MGIndex structure which for all intents and purposes is superior and the performance numbers speak for themselves....

    sqlmap (懂的入)

    exploiting and query syntax obviously depend upon the web application database management system back-end; * It recognizes valid queries by false ones based upon HTML output page hashes comparison ...

    Bochs - The cross platform IA-32 (x86) emulator

    - Bochs now can be compiled as native Windows x86-64 application (tested with Mingw gcc 4.5.1 and Microsoft Visual Studio Express 2010) - Added ability to configure CPUID stepping through .bochsrc....

    Springer Press- High performance Packet Switching Architectures(2007)

    1 Architectures of Internet Switches and Routers ............................................ 1 Xin Li, Lotfi Mhamdi, Jing Liu, Konghong Pun, and Mounir Hamdi 1.1 Introduction ...........................

    oracle学习文档 笔记 全面 深刻 详细 通俗易懂 doc word格式 清晰 连接字符串

     删除HKEY_LOCAL_MACHINE/SYSETM/CurrentControlSet/Services/Eventlog/application中所有以oracle开头的键。  删除HKEY_CLASSES_ROOT目录下所有以Ora、Oracle、Orcl或EnumOra为前缀的键。  删除HKEY_CURRENT_...

Global site tag (gtag.js) - Google Analytics