

Deciding When to Use a Hash Table
source link: https://www.codesd.com/item/deciding-when-to-use-a-hash-table.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Deciding When to Use a Hash Table
I was soving a competitive programming problem with the following requirements:
I had to maintain a list of unqiue 2d points (x,y), the number of unique points would be less than 500.
My idea was to store them in a hash table (C++ unordered set to be specific) and each time a node turned up i would lookup the table and if the node is not already there i would insert it.
I also know for a fact that i wouldn't be doing more than 500 lookups. So i saw some solutions simply searching through an array (unsorted) and checking if the node was already there before inserting.
My question is is there any reasonable way to guess when should i use a hash table over a manual search over keys without having to benchmark them?
My question is is there any reasonable way to guess when should i use a hash table over a manual search over keys without having to benchmark them?
I am guessing you are familiar with basic algorithmics & time complexity and C++ standard containers and know that with luck hash table access is O(1)
If the hash table code (or some balanced tree code, e.g. using std::map
- assuming there is an easy order on keys) is more readable, I would prefer it for that readability reason alone.
Otherwise, you might make some guess taking into account the approximate timing for various operations on a PC. BTW, the entire http:///norvig.com/21-days.html page is worth reading.
Basically, memory accesses are much more slow than everything else in the CPU. The CPU cache is extremely important. A typical memory access with cache fault requiring fetching data from DRAM modules is several hundreds times slower than some elementary arithmetic operation or machine instruction (e.g. adding two integers in registers).
In practice, it does not matter that much, as long as your data is tiny (e.g. less than a thousand elements), since in that case it is likely to sit in L2 cache.
Searching (linearly) in an array is really fast (since very cache friendly), up to several thousand of (small) elements.
IIRC, Herb Sutter mentions in some video that even inserting an element inside a vector is practically -but unintuitively- faster (taking into account the time needed to move slices) than inserting it into some balanced tree (or perhaps some other container, e.g. an hash table), up to a container size of several thousand small elements. This is on typical tablet, desktop or server microprocessor with a multimegabyte cache. YMMV.
If you really care that much, you cannot avoid benchmarking.
Notice that 500 pairs of integers is probably fitting into the L1 cache!
Recommend
-
24
I released a new project A Simple GPU Hash Table on Github. It is a simple GPU hash table capable of hundreds of mill...
-
13
Freezers as an analogy for hash table behavior While I was out of town on vacation recently, there was a cluster of advisories about malicious HTTP GET and POST requests. In short, crafting the right sort of request and directing...
-
12
Hash table attacks and overzealous analytics There's a hash table attack making the rounds this week. It involves what happens inside a bunch of web frameworks and other librarie...
-
8
1 Why we need partitioning Table Partitioning can resolve some performance issue caused by big tables, such as query latency. If the size of one table is reaching to the memory of the machine, it suggests that you should partition th...
-
6
Using a Date as a Hash Table Key advertisements How can I create a hash table object in JavaSript and use a date as the key? So far I've got t...
-
10
New issue Don't hash the key when searching in an empty table. #305
-
6
Build a Hash Table in Python With TDD Invented...
-
7
Copy link
-
4
The quick and practical "MSI" hash table August 08, 2022 I
-
10
What is Hash Table?An array that stores pointers to records corresponding to a given element. An entry in the hash table is NIL if no existing element has a hash function value equal to the index for the entry. In simple terms, we can sa...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK