Under the hood of Redis: Hash table (part 1)

If you already know why executing 'hset mySet foo bar' will take no less than 296 bytes of RAM, why Instagram's engineers do not use string keys, why it is always worth tuning hash-max-ziplist-entries/hash-max-ziplist-value, and why the data type underlying hashes is also part of list, sorted set and set, then do not read on. For everyone else I will try to explain. Understanding how hash tables are built and how they work in Redis is crucial when writing systems where saving memory matters.

What this article is about: what overhead Redis incurs to store a key, what ziplist and dict are, when and what they are used for, and how much memory they take; when a hash is stored in a ziplist, when in a dict, and what that gives us; which advice from trendy articles on Redis optimization should not be taken seriously, and why.
I want to apologize for splitting the material into two articles right away. Having finished the part about the dictionary, I realized that the second part, about ziplist, would turn out only a little shorter than the first. In the end, after testing it on colleagues, I decided to split it in two.

First we need to understand how the individual structures underlying Hashes work; then we will count the cost of storing data. It will be easier to follow the material if you already know what a redisObject is and how Redis stores strings.
There is a certain confusion: the hashes structure is often taken to be the hash table itself. Under the hood there is a separate dict type, which is the actual hash table, and the hashes structure is a hybrid of dict and ziplist. We will need a fairly long introduction to how hash tables are organized and how they work: after showing the article to several colleagues without this material, I became convinced that without it everything only gets more confusing.

In Redis you control which internal data type backs a particular hash key. The hash-max-ziplist-entries variable defines the maximum number of elements in a hash for which the REDIS_ENCODING_ZIPLIST encoding can be used. For example, with hash-max-ziplist-entries = 100 your hash will be represented as a ziplist as long as it contains no more than 100 elements; as soon as it grows beyond that, it is converted to REDIS_ENCODING_HT (dict). hash-max-ziplist-value works similarly: the ziplist is used as long as the length of every value of every individual hash field does not exceed hash-max-ziplist-value. The two parameters work as a pair, and the internal representation switches from ziplist to dict as soon as either of them is exceeded. By default, any new hash always starts with the ziplist encoding.
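
If you want to watch this switch happen on a live instance, here is a small sketch using the hiredis C client. Assumptions: a Redis server on localhost:6379 and the hiredis library installed; the key name 'h' is just an example.

#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    /* keep at most 2 entries in the ziplist encoding for this experiment */
    freeReplyObject(redisCommand(c, "CONFIG SET hash-max-ziplist-entries 2"));
    freeReplyObject(redisCommand(c, "DEL h"));

    freeReplyObject(redisCommand(c, "HSET h f1 v1"));
    redisReply *r = redisCommand(c, "OBJECT ENCODING h");
    printf("1 field:  %s\n", r->str);                 /* ziplist */
    freeReplyObject(r);

    freeReplyObject(redisCommand(c, "HSET h f2 v2"));
    freeReplyObject(redisCommand(c, "HSET h f3 v3")); /* 3 > 2: conversion */
    r = redisCommand(c, "OBJECT ENCODING h");
    printf("3 fields: %s\n", r->str);                 /* hashtable */
    freeReplyObject(r);

    redisFree(c);
    return 0;
}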

Dictionaries (dict) are a key structure of Redis, used not only for hashes but also for list, zset and set. It is the main structure Redis uses to store any key-value associations. A Redis dictionary is a classical hash table that supports insertion, replacement, deletion, lookup and fetching a random element. The whole dictionary implementation can be found in dict.h and dict.c in the Redis source.

Any hash table has a number of parameters that are crucial for understanding how efficiently it works and for choosing the right implementation. Besides the algorithmic complexity, important for any data structure, there is also a set of factors that depend directly on the usage conditions. The fill factor and the resizing strategy are especially interesting in this respect. The latter is particularly important for any system working in real time, since every available theoretical implementation always forces a choice between spending a lot of RAM and spending processor time. Redis uses so-called incremental resizing, although there are proposals to try linear hashing, monotonic keys, consistent hashing or resizing by copying, depending on the specifics of the hash.

The implementation of incremental resizing (the dictRehashMilliseconds and dictRehash functions in dict.c) boils down to the following:
  1. When resizing the table, we allocate a new hash table that is noticeably larger than the current one (and do not touch the old one). As with strings, Redis doubles the number of slots for the new table. On top of that, any requested size (including the initial one) is always rounded up to the next power of two: if you need 7 slots, 16 will be allocated. The minimum size of the new table is defined by the compile-time constant DICT_HT_INITIAL_SIZE (4 by default, defined in dict.h).
  2. During every read or write operation we look into both tables.
  3. All insertions go only into the new table.
  4. On every operation we move n elements from the old table to the new one.
  5. We delete the old table once all elements have been moved.

In the case of Redis, n is the number of keys the server managed to move in 1 millisecond, in steps of 1000; in other words, one rehashing step is at least 1000 elements. This matters because with keys holding long string values such a step can take considerably longer than 1 ms and stall the server. Another aspect is the high peak memory consumption during a table rehash, which shows up sharply on hashes with a large number of "long" keys. In our calculations we will consider tables outside of rehashing, remembering, however, that if a Redis instance contains one big table, there is a high probability of a denial of service when its size cannot be changed, for example if you rent a budget Redis on Heroku (only 25 MB of memory).
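
To make the scheme above more tangible, here is a toy model of the rehashing step in C. It is only an illustration of the idea, not the actual dict.c code: keys are plain longs, the "hash" is the key itself, and in a real rehash every lookup would additionally have to check both tables.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry { long key; struct entry *next; } entry;
typedef struct table { entry **slots; unsigned long size, used; } table;
typedef struct toydict { table t[2]; long rehashidx; } toydict; /* t[0] old, t[1] new */

static unsigned long slot_of(long key, unsigned long size) {
    return (unsigned long)key & (size - 1);        /* size is a power of two */
}

/* Allocate a new table twice as large; the old one is left untouched. */
static void start_rehash(toydict *d) {
    d->t[1].size = d->t[0].size * 2;
    d->t[1].slots = calloc(d->t[1].size, sizeof(entry *));
    d->t[1].used = 0;
    d->rehashidx = 0;
}

/* Move the chains of up to n slots from the old table to the new one. */
static void rehash_step(toydict *d, int n) {
    while (n-- && d->t[0].used > 0) {
        while (d->t[0].slots[d->rehashidx] == NULL) d->rehashidx++;
        entry *e = d->t[0].slots[d->rehashidx];
        while (e) {
            entry *next = e->next;
            unsigned long i = slot_of(e->key, d->t[1].size);
            e->next = d->t[1].slots[i];
            d->t[1].slots[i] = e;
            d->t[0].used--; d->t[1].used++;
            e = next;
        }
        d->t[0].slots[d->rehashidx++] = NULL;
    }
    if (d->t[0].used == 0) {                       /* done: drop the old table */
        free(d->t[0].slots);
        d->t[0] = d->t[1];
        memset(&d->t[1], 0, sizeof(table));
        d->rehashidx = -1;
    }
}

int main(void) {
    toydict d = { .rehashidx = -1 };
    d.t[0].size = 4;
    d.t[0].slots = calloc(4, sizeof(entry *));
    for (long k = 0; k < 8; k++) {                 /* 8 keys in 4 slots */
        entry *e = malloc(sizeof(entry));
        e->key = k;
        unsigned long i = slot_of(k, d.t[0].size);
        e->next = d.t[0].slots[i];
        d.t[0].slots[i] = e;
        d.t[0].used++;
    }
    start_rehash(&d);
    while (d.rehashidx != -1) rehash_step(&d, 1);  /* one slot per step */
    printf("new table: size=%lu used=%lu\n", d.t[0].size, d.t[0].used);
    return 0;
}

In Redis itself this step is driven both from regular commands (a little work on every operation) and from the server cron via dictRehashMilliseconds with a time budget.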

The fill factor (the dict_force_resize_ratio constant, equal to 5 by default) defines how sparse the dictionary may get and when Redis starts doubling the size of the current dictionary. The current fill factor is computed as the ratio of the total number of elements in the hash table to the number of slots. In other words, if we have only 4 slots and 24 elements (distributed among them), Redis will decide to double the size (24/4 > 5). A similar check happens on every access to any key of the dictionary. In addition, if the number of elements becomes equal to or exceeds the number of slots, Redis will also try to double the number of slots.
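
The sketch below restates that condition in C. It is an illustration only, not the actual dict.c code; the can_resize flag stands in for Redis temporarily disabling automatic resizing, for example while a background save is running.

#include <stdio.h>

#define FORCE_RESIZE_RATIO 5   /* mirrors dict_force_resize_ratio */

/* Expand when the table is at least full, and either resizing is allowed
 * or the elements-per-slot ratio got out of hand. */
static int should_expand(unsigned long used, unsigned long size, int can_resize) {
    return used >= size && (can_resize || used / size > FORCE_RESIZE_RATIO);
}

int main(void) {
    printf("%d\n", should_expand(24, 4, 0)); /* 1: 24/4 > 5, forced              */
    printf("%d\n", should_expand(5, 4, 1));  /* 1: used >= size                  */
    printf("%d\n", should_expand(5, 4, 0));  /* 0: resizing disabled, ratio fine */
    return 0;
}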

Let's look at these structures in the source code:
typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;


Each dictionary is built around three structures: dict, dictht and dictEntry. dict is the root structure; its ht field holds one or, during rehashing, two dictht tables with their lists of slots. In turn, each dictht slot points to a linked list of dictEntry. Each dictEntry stores a pointer to the key and the value (either as a pointer or directly as a double/uint64_t/int64_t). If you use a number as the value, dictEntry stores the number itself; if you store a string, a pointer to a redisObject with an sds string inside is stored instead.
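
To get a feel for the raw sizes, here is a standalone sanity check, assuming a 64-bit build; dictType is declared opaque because only a pointer to it is stored. Note that sizeof(dict) comes out a little larger than the back-of-the-envelope sums used below because of struct padding.

#include <stdio.h>
#include <stdint.h>

typedef struct dictType dictType;   /* opaque here: only a pointer is stored */

typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; double d; } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;
    unsigned long size, sizemask, used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx;
    int iterators;
} dict;

int main(void) {
    printf("dictEntry: %zu bytes\n", sizeof(dictEntry)); /* 24 */
    printf("dictht:    %zu bytes\n", sizeof(dictht));    /* 32 */
    printf("dict:      %zu bytes\n", sizeof(dict));      /* 96, padding included */
    return 0;
}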

When Redis accesses a specific key in an internal hash table:
  1. The hashing function finds the needed slot (the MurmurHash2 function is currently used for this).
  2. The entries of that slot are walked one by one (a chained list), comparing keys.
  3. The required operation is performed on the found dictEntry (with its key and value).
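
A side note on step 1: because the table size is always a power of two, the slot index can be taken with a bit mask instead of a modulo, and this is exactly what the sizemask field of dictht is for. A minimal illustration (not Redis code; the hash value here is just a made-up 64-bit number):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t hash = 0x9e3779b97f4a7c15ULL;   /* pretend this came from MurmurHash2 */
    unsigned long size = 16, sizemask = size - 1;
    /* both lines print the same slot index */
    printf("%lu\n", (unsigned long)(hash % size));
    printf("%lu\n", (unsigned long)(hash & sizemask));
    return 0;
}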

The help for HSET and HGET says these are O(1) operations. Trust that with care: most of the time your hset will cost O(n), and on big tables (more than 1000 elements) under an active write load hget will also cost O(n).
The answer to how much memory will actually be used depends on the operating system, the compiler, the type of your process and the allocator in use (jemalloc by default in Redis). I give all further calculations for Redis 3.0.5 built on a 64-bit server running CentOS 7.

Now we can count how much memory a call to 'hset mySet foo bar' will consume. The overhead of creating the empty dictionary:
  • dictEntry: 3 * size_of(pointer)
  • dictht: 3 * size_of(unsigned long) + size_of(pointer) = 24 + size_of(pointer)
  • dict: 2 * size_of(dictht) + 2 * size_of(int) + 2 * size_of(pointer) = 56 + 4 * size_of(pointer)

Putting it all together, we get a formula for the approximate overhead of storing n elements:
	56 + 2 * size_of(pointer)  +  3 * next_power(n) * size_of(pointer)
	\_______________________/    \___________________________________/
	       dict + dictht                      dictEntry
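
The same formula as a small C sketch, so it is easy to play with n. Assumptions: 8-byte pointers, the same rounding up to a power of two with a minimum of DICT_HT_INITIAL_SIZE, and allocator alignment as well as the keys and values themselves are ignored.

#include <stdio.h>

#define DICT_HT_INITIAL_SIZE 4

static unsigned long next_power(unsigned long n) {
    unsigned long size = DICT_HT_INITIAL_SIZE;
    while (size < n) size *= 2;                /* round up to a power of two */
    return size;
}

static unsigned long dict_overhead(unsigned long n) {
    unsigned long ptr = sizeof(void *);        /* 8 on a 64-bit build */
    return 56 + 2 * ptr + 3 * next_power(n) * ptr;
}

int main(void) {
    printf("n=1:       %lu bytes\n", dict_overhead(1));       /* 168 */
    printf("n=1000000: %lu bytes\n", dict_overhead(1000000)); /* ~24 MB */
    return 0;
}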


This formula does not account for the cost of storing the key and the value themselves. The key is always an sds string wrapped in a redisObject. The value, depending on its type, can be an sds string or a native number (integer or floating point). Let's check it for 'hset mySet foo bar', remembering that the minimum number of slots in a new dictionary equals DICT_HT_INITIAL_SIZE (4):
56 + 2 * 8 + 3 * 4 * 8 = 168 bytes of dictionary overhead; adding 2 * 56 bytes for the key and the value gives 280 bytes (which gets aligned up to about 296 bytes).

We check:
config set hash-max-ziplist-entries 0
+OK
config set hash-max-ziplist-value 1
+OK
hset mySet foo bar
:1
debug object mySet
+Value at:0x7f44894d6340 refcount:1 encoding:hashtable serializedlength:9 lru:5614464 lru_seconds_idle:6

What does info memory show at this point? It reports that 320 bytes were used, which is 24 bytes more than we expected. That memory is the cost of alignment during allocation.

How was the mySet key itself stored, and how does Redis find our hash by that name? At the heart of Redis lies the redisDb structure (the Redis database representation, in redis.h). It contains a single dict dictionary for all keys, exactly the same structure described above. This is important, because it gives an idea of the cost of storing the key space itself and will give us a basis for understanding the advice not to keep a large number of simple keys in a single instance.
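
The relevant part of redisDb looks roughly like this (abridged; see redis.h for the full definition):

typedef struct redisDb {
    dict *dict;     /* the keyspace: one shared dict for all keys of this DB */
    dict *expires;  /* keys with a TTL set */
    /* ... other fields omitted ... */
    int id;         /* database number */
} redisDb;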

Let's look at what the Instagram article says about storing hundreds of millions of simple keys: if you need to store many key-value pairs, do not use string keys, use a hash table.
We'll rewrite the gist with the test from the Instagram article in Lua, so that nothing beyond a running Redis is needed to check it:
flushall
+OK
info memory
$225
# Memory
used_memory:508408

eval "for i=0,1000000,1 do redis.call('set', i, i) end" 0   
$-1
info memory
$226
# Memory
used_memory:88578952

Storing 1,000,000 integer keys took us slightly more than 88 MB. Now let's store the same data in hash tables, evenly distributing the keys across 2,000 hashes:
flushall
+OK
info memory
$227
# Memory
used_memory:518496

eval "for i=0,1000000,1 do local bucket=math.floor(i/500); redis.call('hset', bucket, i, i) end" 0
$-1
info memory
$229
# Memory
used_memory:104407616

Storing the same data with integer fields took slightly more than 103 MB. However, as soon as we try to store, say, 10,000,000 keys, the situation changes sharply, namely ~970 MB versus ~165 MB (and the difference is almost entirely the overhead of storing the keys in the system hash table). So in general the rule "if you have hundreds of millions of keys, do not use string keys" is true.

Let's consider one more aspect. When describing this kind of optimization it is very often assumed that you have many keys of the form entity:identifier (for example 'set username:4566 RedisMan') and you are advised to switch to hash tables of the form bucket:id identifier value (for example 'hset usernames:300 4566 RedisMan'). It is important to understand that a partial substitution of concepts happens here: the sds string 'username:4566' turns into the field key 4566 with the REDIS_ENCODING_INT encoding. This means that instead of an sds string inside a redisObject a plain number is now used, so the minimum of 32 bytes for an sds string (after jemalloc alignment) turns into 4 bytes.

Let's force the ziplist encoding for our hash tables:
config set hash-max-ziplist-entries 1000                                           
+OK
config set hash-max-ziplist-value 1000000
+OK
flushall
+OK
eval "for i=0,1000000,1 do local b=math.floor(i/500); redis.call('hset', 'usernames:' ..b, i, i) end" 0
$-1
info memory
$228
# Memory
used_memory:16816432

Storing our data now takes only 16 MB, a saving of 72 MB (about 5 times less) in my example with 1,000,000 keys. Tempting? We will get to the explanation in the second part of the article.

As an interim conclusion, it is worth formulating several important takeaways for a loaded system that has to save memory:
  • Use numeric names for keys, values and hash fields wherever possible. Do not use prefixes/postfixes.
  • When designing systems that actively use Redis, proceed from the principle: one set of requirements to the data, one Redis instance. Heterogeneous data are hard to store together because of the hash-max-ziplist-entries/hash-max-ziplist-value settings and the need to tell keys apart without prefixes.
  • When replacing simple keys with groups of hash tables, remember that this optimization starts to pay off at around a million keys and above.
  • If your instance holds millions or hundreds of millions of keys, you incur huge costs for storing them in the system dictionary, plus a memory over-expenditure on the system dictionary's spare capacity. Say, for 100 million keys that is about 2.5 GB for the system dict alone, without your data.
  • If you stay under the hash-max-ziplist-entries/hash-max-ziplist-value limits and your data are stored in a ziplist instead of a dict, you can get memory savings measured in hundreds of percent, paying for it with higher CPU usage. More on that later.

This article is a translation of the original post at habrahabr.ru/post/271205/