As such, there's really no "standard" benchmark that will inform you about the best technology to use for your application. Only your requirements, your data, and your infrastructure can tell you what you need to know.
First, a little philosophy. NoSQL surrounds us and there is no escaping it (not that anyone really wanted to). Let's leave the deeper reasons outside the scope of this text and note only that the trend shows up not just in the emergence of new NoSQL solutions and the development of old ones. Another facet is the mixing of opposites, namely support for storing schema-less data in traditional relational databases. This gray area, where the relational storage model meets everything else, hides a dizzying number of possibilities. But, as always, you need to find the balance that suits your data. That can be difficult, primarily because you have to compare things that are hard to compare, for example the performance of a NoSQL solution against that of a traditional database. This short note makes such an attempt and compares the performance of jsonb in PostgreSQL with json in MySQL and bson in MongoDB.
What on earth is going on?
Brief news from the field:
- PostgreSQL 9.4: the new jsonb data type, whose support will be expanded somewhat in the upcoming PostgreSQL 9.5
- MySQL 5.7.7: the new json data type
and a number of other examples that I will cover next time. Notably, these data types store json in binary rather than textual form, which makes working with it much faster. The basic functionality is the same everywhere, since the requirements are obvious: create, select, update, delete. The most ancient, almost caveman, desire in this situation is to run a series of benchmarks. PostgreSQL and MySQL were chosen because their json support is implemented very similarly (and they are in the same weight class), and MongoDB as the old-timer of the NoSQL world. The work done by EnterpriseDB is somewhat outdated in this respect, but it can serve as the first step on a thousand-mile road. For now, the goal of this road is not to show who is faster or slower under artificial conditions, but to try to give a neutral assessment and get feedback.
Input data and some details
pg_nosql_benchmark from EnterpriseDB takes a fairly straightforward approach: first it generates a given volume of data of various kinds with slight variations, then it writes that data to the database under test and runs selects against it.
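The generation step can be sketched roughly as follows. This is not the generator from pg_nosql_benchmark: the field names, the value ranges and the share of records with an extra nested field are all assumptions chosen only to illustrate "similar documents with slight variations".

```python
import json
import random


def generate_documents(count, seed=0):
    """Yield JSON strings whose shape varies slightly from record to record."""
    rng = random.Random(seed)  # fixed seed so every database gets the same data
    for i in range(count):
        # Hypothetical fields; the real benchmark uses its own document layout.
        doc = {
            "id": i,
            "name": f"product-{i}",
            "brand": rng.choice(["ACME", "Globex", "Initech"]),
            "price": round(rng.uniform(1, 500), 2),
        }
        # Roughly a third of the records carry an extra nested object.
        if rng.random() < 0.33:
            doc["limits"] = {
                "voice": rng.randint(0, 1000),
                "data": rng.randint(0, 50),
            }
        yield json.dumps(doc)


docs = list(generate_documents(1000))
```

Seeding the generator matters more than the exact fields: each database under test should load an identical data set, otherwise differences in timing may come from the data rather than from the engine.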
It has no support for MySQL, so that had to be implemented along the lines of the existing PostgreSQL code. There is only one subtlety at this stage, and it concerns indexes: MySQL does not implement indexing of json directly, so you have to create virtual columns and index those instead. In addition, I was puzzled to find that for MongoDB some of the generated documents exceed 4096 bytes and do not fit in the mongo shell buffer, i.e. they are simply discarded. As a hack, the inserts were executed from a js file (which also has to be split into several chunks, since a single file cannot exceed 2 GB).
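To make the virtual-column workaround concrete, here is a minimal sketch that builds the MySQL 5.7 statements for exposing one json field as an indexed virtual column. The table name `json_tables`, the column name `data`, the field `name` and the helper itself are hypothetical; the statements actually used in the modified benchmark may differ.

```python
def virtual_column_ddl(table, json_column, field, sql_type="VARCHAR(255)"):
    """Build MySQL 5.7 DDL that exposes one JSON field as an indexed virtual column.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    virtual = f"{field}_virtual"
    path = f"$.{field}"
    # The generated column extracts the field from the JSON document on the fly.
    add_column = (
        f"ALTER TABLE {table} ADD COLUMN {virtual} {sql_type} "
        f"GENERATED ALWAYS AS "
        f"(JSON_UNQUOTE(JSON_EXTRACT({json_column}, '{path}'))) VIRTUAL"
    )
    # The index is created on the generated column, not on the JSON column.
    add_index = f"CREATE INDEX idx_{table}_{field} ON {table} ({virtual})"
    return [add_column, add_index]


statements = virtual_column_ddl("json_tables", "data", "name")
for statement in statements:
    print(statement)
```

Queries then need to filter on the generated column (or on the same expression) for the index to be considered, which is one reason the MySQL numbers are not directly comparable with a GIN index over the whole jsonb document.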
With all these changes in place, the tests were run for the following configurations:
- PostgreSQL 9.5 beta1, gin
- PostgreSQL 9.5 beta1, jsonb_path_ops
- PostgreSQL 9.5 beta1, jsquery
- MySQL 5.7.9
- MongoDB 3.2.0, WiredTiger storage engine
- MongoDB 3.2.0, MMAPv1 storage engine
Each of them was deployed on a separate m4.xlarge instance running Ubuntu 14.04 x64 with default settings, and the tests were run with 1,000,000 records. For the jsquery tests, read the readme and do not forget to install bison, flex, libpq-dev and even postgresql-server-dev-9.5. The results are saved to a json file, which can be visualized with matplotlib (see here).
All charts of query run time are in seconds, and all charts of size are in megabytes. In both cases, accordingly, a lower value means better performance.
Another change relative to the original pg_nosql_benchmark code was the addition of update tests. Here MongoDB was the clear leader, most likely because in PostgreSQL and MySQL updating even a single value currently means rewriting the whole field.
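The difference is visible in the shape of the update requests themselves. The sketch below describes the same single-field update for the three systems; it is an illustration, not the benchmark's update test. The helper, the table `json_tables`, the column `data` and the `id` key used for filtering are assumptions, while `jsonb_set` (PostgreSQL 9.5), `JSON_SET` (MySQL 5.7) and the `$set` operator (MongoDB) are the real building blocks.

```python
import json


def single_field_updates(table, column, field, value, row_id):
    """Describe the same one-field update for PostgreSQL 9.5, MySQL 5.7 and MongoDB.

    SQL statements use DB-API "%s" placeholders; table and column names are
    interpolated directly and must come from trusted input.
    """
    return {
        # jsonb_set returns a new jsonb value that is assigned to the whole column.
        "postgresql": (
            f"UPDATE {table} SET {column} = "
            f"jsonb_set({column}, %s, %s::jsonb) WHERE id = %s",
            ("{" + field + "}", json.dumps(value), row_id),
        ),
        # JSON_SET likewise produces a new document that replaces the stored one.
        "mysql": (
            f"UPDATE {table} SET {column} = "
            f"JSON_SET({column}, %s, %s) WHERE id = %s",
            ("$." + field, value, row_id),
        ),
        # MongoDB addresses the field through the $set update operator.
        "mongodb": ({"id": row_id}, {"$set": {field: value}}),
    }


updates = single_field_updates("json_tables", "data", "name", "new name", 1)
```

In both SQL variants the function result is assigned back to the entire json column, which matches the observation that a one-value change still rewrites the field.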
I have a bad feeling about this
Performance measurement is a delicate topic, especially in this case. Nothing described above should be considered a full-fledged, complete benchmark; it is only a first step toward understanding the current situation, something like food for thought. We are currently running tests with ycsb and, with luck, will also compare the performance of cluster configurations. I will also be glad to receive any constructive suggestions, ideas and corrections (since I may well have missed something).
This article is a translation of the original post at habrahabr.ru/post/274313/