1. Purpose of Hash Tables : maintain a (possibly evolving) set of stuff.
Supported Operations:
a) Insert: add new record
b) Delete: delete existing record
c) Lookup: check for a particular record
All operations in O(1) time
2. Application: De-Duplication
Given: a “stream” of objects. (Linear scan of a huge file or objects arriving in real time)
Goal: remove duplicates
Solution: when new object x arrives -- lookup x in hash table H if not found, Insert x into H
3. Application: The 2-SUM Problem
Input: unsorted array A of n integers. Target sum t.
Goal: determine whether or not there are two numbers x,y in A with x + y = t
Naive Solution: O(n^2) time via exhaustive search
Better Solution: 1) sort A ( O(nlogn) time)
2) for each x in A, look for t-x in A via binary search (O(nlogn) time)
Amazing: 1) insert all elements of A to hash table H (O(n) time)
2) for each x in A, Lookup t-x in H ( O(n) time)
4. High-Level Idea of Implementation:
Setup: universe U (generally, really big)
Goal: want to maintain evolving set S subset of U (generally, of reasonable size)
Solution: 1) pick n = # of “buckets” with n ~ |S| (for simplicity assume |S| doesn’t vary much)
2) choose a hash function
3) use array A of length n, store x in A[h(x)]
5. Resolving Collisions
Collision: distinct x, y in U such that h(x) = h(y)
Solution #1 : (separate) chaining
- keep linked list in each bucket
- given a key/object x, perform Insert/Delete/Lookup in the list in A[h(x)]
Solution #2 : open addressing (only one object per bucket)
- Hash function now specifies probe sequence h1(x),h2(x),.. (keep trying till find open slot)
- Examples : linear probing (look consecutively), double hashing
6. Properties of a “Good” Hash function
1) Should lead to good performance => i.e., should “spread data out”
(gold standard – completely random hashing)
2) Should be easy to store/ very fast to evaluate.
7. Quick-and-Dirty Hash Functions:
Object U --> (hash code) Integers --> (compression function like mod n function) buckets {0, 1, ..., n-1}
How to choose n = # of buckets:
a) Choose n to be a prime ( within constant factor of # of objects in table)
b) Not too close to a power of 2
c) Not too close to a power of 10
相关推荐
Disk Based Hash Tables and Quantified NumbersEdscott Wilson GarciaMarch 24, 2014AbstractThis paper describes the design of Disk Based Hash Tables: n–dimen- sional self–indexed data tables. The ...
Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...
Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...
Build enhanced applications by using hashtables, dictionaries, and sets Implement searching algorithms such as linear search, binary search, jump search, exponential search, and more Have a positive ...
Chapter 8: Distributed Hash Tables Chapter 9: Large File Systems Chapter 10: NoSQL Storage Appendix A: Order Notation Appendix B: Assignment 1: Search Appendix C: Assignment 2: Indices Appendix D: ...
Contents Overview 1 Lesson 1: Concepts – Locks and Lock Manager 3 Lesson 2: Concepts – Batch and Transaction 31 Lesson 3: Concepts – Locks and...The hash value consists of all the key components and ...
4.26 Exception tables and Unwind tables 4.27 Assembly language changes after RVCT v2.1 5 Condition Codes 5.1 Conditional instructions 5.2 Conditional execution in ARM state 5.3 Conditional execution ...
All these components make the application development process easy and clean. You can use these components separately from the main protocol components with any other library and even with your own ...
PEP 456: Secure and Interchangeable Hash Algorithm PEP 436: Argument Clinic Other Build and C API Changes Other Improvements Significant Optimizations Deprecated Deprecations in the Python API ...
In this version I have done away with the b+tree and hash index in favor of my own MGIndex structure which for all intents and purposes is superior and the performance numbers speak for themselves....
exploiting and query syntax obviously depend upon the web application database management system back-end; * It recognizes valid queries by false ones based upon HTML output page hashes comparison ...
- Bochs now can be compiled as native Windows x86-64 application (tested with Mingw gcc 4.5.1 and Microsoft Visual Studio Express 2010) - Added ability to configure CPUID stepping through .bochsrc....
1 Architectures of Internet Switches and Routers ............................................ 1 Xin Li, Lotfi Mhamdi, Jing Liu, Konghong Pun, and Mounir Hamdi 1.1 Introduction ...........................
删除HKEY_LOCAL_MACHINE/SYSETM/CurrentControlSet/Services/Eventlog/application中所有以oracle开头的键。 删除HKEY_CLASSES_ROOT目录下所有以Ora、Oracle、Orcl或EnumOra为前缀的键。 删除HKEY_CURRENT_...