
Is there anything faster than dict()?

I need a faster way to store and access around 3GB of k:v pairs, where k is a string or an integer and v is an np.array() that can have different shapes. Is there any object that is faster than the standard Python dict at storing and accessing such a table? For example, a pandas.DataFrame?

As far as I have understood, the Python dict is a quite fast implementation of a hash table. Is there anything better than that for my specific case?

    • Unless your code doesn't do anything else, I'd be quite surprised if dictionary access were your bottleneck. Do you have profiling information showing this is the problem?
    • I think dicts are pretty fast. Instead of looking for an alternative, consider optimizing the rest of your code :)
    • If your use case involved swapping -- if your data structure were larger than available RAM -- then there would be better answers, but it's not clear whether that's the case.
    • @alec_djinn: if your code only loops over the dict, it's easy to make it faster -- remove the loop! But if your code does something inside the loop (say printing, or finding the maximum of the value, or anything other than pass), and that takes longer than the dictionary access (and it almost certainly will), then improving dict access won't improve your net performance at all. At this point, you're going to have to show some code if you want any real advice.

No, there is nothing faster than a dictionary for this task, because the complexity of its indexing and even its membership checking is approximately O(1).

Once you save your items in a dictionary, you can access them in constant time, which means it's unlikely that your performance problem has anything to do with dictionary indexing. That said, you might still be able to make this process slightly faster by making some changes to your objects and their types, which can enable optimizations in the under-the-hood operations.

For example, if your strings (keys) are not very large, you can intern the lookup key and your dictionary's keys. Interning is caching objects in memory -- in Python's case, in a table of "interned" strings -- rather than creating each one as a separate object.

Python provides an intern() function in the sys module that you can use for this. As the documentation puts it:

Enter string in the table of "interned" strings and return the interned string -- which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...

and also:

If the keys in a dictionary are interned and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. That reduces the access time to the object.

Here is an example:

In [48]: import sys

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop

No, I don't think there is anything faster than dict. The average-case time complexity of its index lookup is O(1).

Operation     | Average Case | Amortized Worst Case
--------------|--------------|---------------------
Copy [2]      | O(n)         | O(n)
Get Item      | O(1)         | O(n)
Set Item [1]  | O(1)         | O(n)
Delete Item   | O(1)         | O(n)
Iteration [2] | O(n)         | O(n)

PS https://wiki.python.org/moin/TimeComplexity
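The operations in the table can be exercised with a small sketch (the key names here are arbitrary):

```python
# Build a dict mapping string keys to values.
d = {"k{}".format(i): i for i in range(1000)}

# Get Item: average O(1), independent of the dict's size.
assert d["k500"] == 500

# Set Item: average O(1).
d["new_key"] = -1
assert d["new_key"] == -1

# Delete Item: average O(1).
del d["new_key"]
assert "new_key" not in d

# Copy and Iteration are O(n): they must touch every entry.
copy = dict(d)
assert len(copy) == len(d) == 1000
assert sum(d.values()) == sum(range(1000))  # iteration visits all n items
```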


Since your keys are strings, you could think about storing them in a data structure like a trie. Even to store in and retrieve from a trie you need O(N), where N is the maximum length of a key. The same applies to the hash calculation, which computes a hash over the key; the hash is then used to find and store entries in the hash table. We often don't account for this hashing time.

You may give a trie a shot; it should have roughly equal performance, and may even be a little faster, depending on how the hash value is computed, e.g. something like

HASH[i] = (HASH[i-1] + key[i-1]*256^i % BUCKET_SIZE) % BUCKET_SIZE

where the 256^i factor is needed to reduce collisions.

You can try storing your data in a trie and see how it performs.
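A minimal sketch of such a trie for string keys (the class and method names here are illustrative, not a standard library API):

```python
class TrieNode:
    """One node per character; children maps a character to the next node."""
    __slots__ = ("children", "value", "has_value")

    def __init__(self):
        self.children = {}
        self.value = None
        self.has_value = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def set(self, key, value):
        # Walk (or create) one node per character: O(len(key)).
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
        node.has_value = True

    def get(self, key):
        # Follow one node per character: also O(len(key)).
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                raise KeyError(key)
        if not node.has_value:
            raise KeyError(key)
        return node.value


t = Trie()
t.set("mystr25", 25)
assert t.get("mystr25") == 25
```

Note that each lookup still costs O(len(key)) dictionary probes on the child maps, so in CPython this pure-Python version is unlikely to beat a single built-in dict lookup; it is mainly useful as a baseline for your own measurement.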

    • The first sentence is kind of misleading. Big-O notation isn't about speed in general; it's about how the computation time changes when the size of the input changes. So an O(1) operation could be terribly slow, and so could an O(n!) one.
    • True, but it doesn't mean it's going to be fast, or faster than O(anything), or even the fastest among all the possible Os. It's about how the amount of work changes when the input size changes.
    • ...which is to say: in the real world, constant factors matter. An O(1) process that takes a constant 1000ms will often be a worse choice than an O(n) process that takes 1ns per item of input.
    • O(1) means "constant time": whatever the input is, the computation time is the same. Sudoku solving is constant time if we just examine all possibilities, but it's far slower than many other algorithms.
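The point about constant factors can be illustrated with a small, deliberately contrived benchmark (both functions below are hypothetical, not from the original discussion):

```python
import timeit

def constant_but_slow(data):
    # O(1): a fixed amount of work regardless of len(data).
    return sum(range(10000))

def linear_but_cheap(data):
    # O(n): one cheap addition per element.
    total = 0
    for x in data:
        total += x
    return total

small = list(range(100))
t_const = timeit.timeit(lambda: constant_but_slow(small), number=1000)
t_linear = timeit.timeit(lambda: linear_but_cheap(small), number=1000)

# For this small input, the O(n) function wins; only as the input
# grows does the O(1) function's flat cost eventually pay off.
assert t_linear < t_const
```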
