In the first part I said that a hash table is a little bit of LIST, SET and SORTED SET. Judge for yourself: LIST is built on ziplist/linkedlist, SET on dict/intset, and SORTED SET on ziplist/skiplist. We have already covered the dictionary (dict), and in this second part of the article we will look at the ziplist, the second most widely used structure under the hood of Redis. We will also look at LIST: the second half of its "kitchen" is a straightforward implementation of a linked list. This will help us examine closely the frequently cited advice about optimizing hash tables by replacing them with lists. Let's count how much memory these structures cost in overhead and what price you pay for saving memory, and sum up the results of working with hash tables in the ziplist encoding.

Last time we finished on the note that 1,000,000 keys stored with ziplist took 16 MB of RAM, while the same data in dict required 104 MB (ziplist is 6 times smaller!). Let's figure out at what price:
Under the hood of Redis: Hash table (part 2) and List


So, a ziplist is a doubly linked list. Unlike a normal linked list, here the links in each node point both to the previous and to the next node in the list. With a doubly linked list you can move efficiently in either direction, from the head or from the tail. Deleting and shifting elements is also simpler, since the addresses of the neighbouring elements whose pointers reference the element being changed are readily available. The Redis developers position this implementation as memory-efficient. The list can store strings and integers, and numbers are stored as numbers, not as a redisObject holding a value. If you want to store the string "123", it will be stored as the number 123, not as the characters '1','2','3'.
In the 1.x branch of Redis the dictionary used zipmap instead of dict — a very simple (~140 lines) linked-list implementation optimized for memory, in which all operations take O(n). This structure is no longer used in Redis (although it is still in the source code and kept in working order), but a number of zipmap's ideas formed the basis of ziplist.

Keys and values are stored as consecutive elements of the list. Operations on the list amount to finding a key and then working with the value located in the element that follows it. In theory, inserts and updates execute in constant O(1). In practice, any such operation in the Redis implementation requires allocating and moving memory, and the real complexity depends directly on how much memory is already in use. Here we remember O(1) as the ideal case and O(n) as the worst.
+-------------+------+-----+-------+-----+-------+-----+
| Total Bytes | Tail | Len | Entry | ... | Entry | End | 
+-------------+------+-----+-------+-----+-------+-----+
                                ^  
                                | 
+-------------------+--------------+----------+------+-------------+----------+-------+
| Prev raw len size | Prev raw len | Len size |  Len | Header size | Encoding | Value | 
+-------------------+--------------+----------+------+-------------+----------+-------+

This is how the structure looks in the source code:
#define ZIPLIST_HEADER_SIZE  (sizeof(uint32_t)*2+sizeof(uint16_t))

typedef struct zlentry {
    unsigned int prevrawlensize, prevrawlen; /* bytes used to encode the previous entry's length, and that length */
    unsigned int lensize, len;               /* same pair for the current entry */
    unsigned int headersize;                 /* prevrawlensize + lensize */
    unsigned char encoding;                  /* value encoding: string or integer flavour */
    unsigned char *p;                        /* pointer to the raw entry */
} zlentry;


Let's count its size.
How much memory will actually be used depends on the operating system, the compiler, the type of your process and the allocator in use (jemalloc by default in Redis). I give all further calculations for redis 3.0.5 built on a 64-bit server running CentOS 7.

The header is simple: it is ZIPLIST_HEADER_SIZE (a constant defined in ziplist.c, equal to sizeof(uint32_t) * 2 + sizeof(uint16_t)) plus 1 byte for the end-of-list marker. An element takes 5 * sizeof(unsigned int) + 1 bytes, plus at least 1 byte for the value. The total overhead of storing n elements in this encoding (without the values) is:
12 + 21 * n

Each element begins with a fixed header consisting of several pieces of information. The first is the length of the previous element, used to walk backwards through the doubly linked list. The second is the encoding, with an optional length of the value in bytes. The previous-element length works as follows: if the length does not exceed 254 bytes (the constant ZIP_BIGLEN), 1 byte is used to store it. If the length is 254 or more, 5 bytes are used: the first byte is set to the ZIP_BIGLEN constant so that we know we have a long value, and the remaining 4 bytes store the length of the previous value. The third piece is a header that depends on the value contained in the node. If the value is a string, the first 2 bits of the header store the encoding type of the string length, followed by the string length itself. If the value is an integer, the first 2 bits are both set to one, and 2 more bits define the size of the integer stored in it.
Header interpretation, for those who want the details:
Layout               Bytes  What is stored
|00pppppp|           1      String of length <= 63 bytes (i.e. sds with REDIS_ENCODING_EMBSTR, see part one of the series)
|01pppppp|qqqqqqqq|  2      String of length <= 16383 bytes (14 bits)
|10______|<4 bytes>  5      String of length >= 16384 bytes
|11000000|           1      Integer, int16_t (2 bytes)
|11010000|           1      Integer, int32_t (4 bytes)
|11100000|           1      Integer, int64_t (8 bytes)
|11110000|           1      Signed integer that fits into 24 bits (3 bytes)
|11111110|           1      Signed integer that fits into 8 bits (1 byte)
|1111xxxx|           1      With xxxx between 0001 and 1101: an unsigned integer from 0 to 12 that fits into 4 bits. The encoded value is actually 1 to 13, since 0000 and 1111 are already taken, so 1 is always subtracted from the decoded value to get the real one.
|11111111|           1      End-of-list marker


Let's check this and look at the other side of the coin. We will use Lua, so you will need nothing but Redis to repeat this test yourself. First, let's see what happens when dict is used.
Let's execute:
multi
time
eval "for i=0,10000,1 do redis.call('hset', 'test', i, i) end" 0
time

and
multi
time
eval "for i=0,10000,1 do redis.call('hget', 'test', i) end" 0
time

Output from redis-cli:
config set hash-max-ziplist-entries 1                   
+OK
config set hash-max-ziplist-value 1000000
+OK
flushall
+OK
info memory 
$225
# Memory
used_memory:508536
multi
+OK
time
+QUEUED
eval "for i=0,10000,1 do redis.call('hset', 'test', i, i) end" 0
+QUEUED
time
+QUEUED
exec
*3
*2
$10
1449161639
$6
948389
$-1
*2
$10
1449161639
$6
967383
debug object test
+Value at:0x7fd9864d6470 refcount:1 encoding:hashtable serializedlength:59752 lru:6321063 lru_seconds_idle:23
info memory
$226
# Memory
used_memory:1025432
multi
+OK
time
+QUEUED
eval "for i=0,10000,1 do redis.call('hget', 'test', i) end" 0
+QUEUED
time
+QUEUED
exec
*3
*2
$10
1449161872
$6
834303
$-1
*2
$10
1449161872
$6
841819

flushall
+OK
info memory
$226
# Memory
used_memory:510152

config set hash-max-ziplist-entries 100000
+OK
multi
+OK
time
+QUEUED
eval "for i=0,10000,1 do redis.call('hset', 'test', i, i) end" 0
+QUEUED
time
+QUEUED
exec
*3
*2
$10
1449162574
$6
501852
$-1
*2
$10
1449162575
$6
212671

debug object test
+Value at:0x7fd9864d6510 refcount:1 encoding:ziplist serializedlength:59730 lru:6322040 lru_seconds_idle:19

info memory 
$226
# Memory
used_memory:592440
multi
+OK
time
+QUEUED
eval "for i=0,10000,1 do redis.call('hget', 'test', i) end" 0
+QUEUED
time
+QUEUED
exec
*3
*2
$10
1449162616
$6
269561
$-1
*2
$10
1449162616
$6
975149


With encoding:hashtable (dict) we spent ~516 KB of memory and 18.9 ms to save 10,000 values, and 7.5 ms to read them back. With encoding:ziplist we got ~81 KB of memory, 710 ms to save and 705 ms to read. For a test of 10,000 records the result is:
a 6× win in memory at the cost of a 37.5× drop in write speed and a 94× drop in read speed.

It is important to understand that the performance degradation is not linear, and at 1,000,000 elements you simply risk never seeing the results. Who would put 10,000 elements into a ziplist? Unfortunately, that is one of the first recommendations from most consultants. When is the game still worth the candle? I would say that as long as the number of elements lies in the range of 1 to 3,500 you can choose it, remembering that ziplist always wins on memory by 6× or more. Beyond that, measure on your real data, but it will no longer have anything to do with loaded real-time systems. Here is what happens to read/write performance depending on hash size for dict and ziplist (gist with the test):
Why is that? The cost of an insert, a change of element length, or a deletion in a ziplist is terrible: it is either a realloc (all the work falls on the allocator's shoulders) or a full cascade update of the list from element n + 1 relative to the changed element all the way to the end of the list. A cascade update means many small reallocs, memmoves and memcpys (see __ziplistCascadeUpdate in ziplist.c).

Why is it important to talk about LIST in an article about HASH? Because of one very important piece of advice about optimizing structured data. It seems to have come from DataDog; I find it hard to say exactly. In translation it sounds like this (original):
You store user data in Redis, for example 'hmset user:123 id 123 firstname Ivan lastname Ivanov location Tomsk twitter ivashka'. Now, if you create another user, you waste memory on the field names: id, firstname, lastname, and so on. If there are many such users, that is a lot of wasted memory (we already know how to count it: 384 bytes per user for this set of fields).

So, if:
  1. You have many objects, say 50,000 and up.
  2. Your objects have a regular structure (in effect, always all the fields).

then you can borrow the concept of named tuples from Python — a read-only linear list — and build a hash table "by hand" around it. Roughly speaking, "firstname" is the value at index 0 of the list, "lastname" at index 1, and so on. Then creating a user looks like 'lpush user:123 Ivan Ivanov Tomsk ivashka'.

And it is very good advice. Obviously the list (LIST) will help you save a great deal: a minimum of 2×.
As with HASH, the list-max-ziplist-entries/list-max-ziplist-value settings control which internal data type will represent a particular list key. For example, with list-max-ziplist-entries = 100 your LIST will be represented as REDIS_ENCODING_ZIPLIST while it has fewer than 100 elements. As soon as there are more, it will be converted to REDIS_ENCODING_LINKEDLIST. The list-max-ziplist-value setting works just like hash-max-ziplist-value (see part one).

We have already dealt with ziplist, so let's look at REDIS_ENCODING_LINKEDLIST. In Redis it is a very simple (~270 lines of code) unsorted doubly linked list (everything just as in Wikipedia), with all its merits and drawbacks:
typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;

Nothing complicated: every LIST in this encoding is 5 pointers and 1 unsigned long, so the fixed overhead is 5 * sizeof(pointer) + 8 bytes. And each node is 3 pointers. The bookkeeping overhead of storing n elements in this encoding is:
5 * sizeof(pointer) + 8 + n * 3 * sizeof(pointer)

The linkedlist implementation has no reallocs — only allocation of exactly the fragment you need; the values are plain redisObjects without any tricks or optimizations. In terms of bookkeeping alone, the memory overhead difference between ziplist and linkedlist is about 15%. If you count the service data plus the values, the difference is severalfold (2× and up), depending on the value type. So for a list of 10,000 elements consisting only of numbers, you will spend about 41 KB of memory with ziplist and about 360 KB of real memory with linkedlist:
config set list-max-ziplist-entries 1
+OK
flushall
+OK
info memory
$226
# Memory
used_memory:511544
eval "for i=0,10000,1 do redis.call('lpush', 'test', i) end" 0
$-1
debug object test
+Value at:0x7fd9864d6530 refcount:1 encoding:linkedlist serializedlength:29877 lru:6387397 lru_seconds_idle:5
info memory
$226
# Memory
used_memory:832024
config set list-max-ziplist-entries 10000000
+OK
flushall
+OK
info memory
$226
# Memory
used_memory:553144
eval "for i=0,10000,1 do redis.call('lpush', 'test', i) end" 0
$-1
info memory
$226
# Memory
used_memory:594376
debug object test
+Value at:0x7fd9864d65e0 refcount:1 encoding:ziplist serializedlength:79681 lru:6387467 lru_seconds_idle:16

Let's look at the performance difference when writing and when fetching an element via LPOP (read and delete). Why not LRANGE? Its complexity is documented as O(S+N) and is the same constant factor for both list implementations. With LPOP things are not as simple as the documentation suggests: the complexity is listed as O(1). But what happens if I need to read the whole list sequentially?
What is wrong with the read speed when using ziplist? Every LPOP is an extraction of an element from the head of the list followed by a full cascade update. By the way, if we use RPOP instead of LPOP, the situation does not change much (hello to realloc from the list update function in ziplist.c). Why does this matter? The RPOPLPUSH/BRPOPLPUSH commands are a popular solution for building queues on top of Redis (for example Sidekiq, Resque). When such a queue in the ziplist encoding grows large (several thousand values), fetching a single element is no longer constant-time and the system starts to "run a fever".

In closing, it is time to draw a few conclusions:
  • You will not get away with using ziplist in hash tables with a large number of values (from 1,000 up) if write performance at high volume still matters to you.
  • If the data in your hash table has a regular structure, forget about hash tables and move to storing the data in lists.
  • Whatever encoding you use, Redis is ideal for numbers, acceptable for strings up to 63 bytes long, and questionable for storing longer strings.
  • If your system has many lists, remember that while they are small (under list-max-ziplist-entries) they use little memory and performance is generally acceptable, but as soon as they start to grow, memory consumption can jump sharply (by 2× and more), and the encoding conversion takes considerable time (a rebuild with a sequential insert plus deletion).
  • Be careful with the list representation (the list-max-* settings) if you use lists to build queues or do active reads/writes with deletion. Put differently: if you use Redis to build queues on top of lists, set list-max-ziplist-entries = 1 (you will spend only a little more memory anyway).
  • Redis never returns already-allocated memory to the system. Take into account the overhead of the service information and the resizing strategy: under active writes, this feature can badly fragment memory and make you spend up to 2× more RAM than you expect. This is especially important when you run N instances of Redis on one physical server.
  • If you need to store data of diverse sizes and access rates on one Redis instance, consider extending the Redis code a little and moving the list-max-* settings to a per-key basis instead of per-server.
  • The encodings of the same data type can differ between master and slave, which lets you meet your requirements more flexibly: for example, fast but memory-hungry reads on the master and slower, more memory-frugal storage on the slave, or the other way round.
  • The overhead of ziplist is minimal; it is cheaper to store strings in a ziplist than in any other structure (a zlentry overhead of only 21 bytes per string, while the traditional redisObject + sds representation of a string is 56 bytes).


This article is a translation of the original post at habrahabr.ru/post/272089/