This article is a translation. The original, by Rob Conery, is here.

As many of you know, Postgres supports JSON as a data storage type, and with the 9.4 release, Postgres now supports storing JSON as jsonb, a binary format.

This is great news for anyone who wants to move beyond simple "JSON as text" storage. jsonb now supports indexing with a GIN index, and also has special query operators that can take advantage of that index.
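
To make that concrete, here's a minimal sketch (the books table and its data are purely hypothetical) of a jsonb column, a GIN index, and the @> containment operator that can use it:

create table books(
  id serial primary key,
  data jsonb not null
);

-- jsonb_path_ops builds a compact index that supports the @> operator
create index idx_books_data on books using GIN(data jsonb_path_ops);

-- this containment query can use the GIN index
select * from books where data @> '{"author" : "Tolkien"}';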

Who cares?


Discovering jsonb in Postgres and seeing what it's capable of has been fun. Which, in a way, is the problem: it's all discovery and speculation, and to get real work done, that's not enough.

Other systems (such as RethinkDB) ship with a huge amount of built-in functionality to help you save documents, query those documents, and optimize things. Postgres has some interesting capabilities in this direction too, but the query-writing experience out of the box is a bit thin, to be honest.

Let's look at this query:

select document_field -> 'my_key' from my_docs
where document_field @> '{"some_key" : "some_value"}';

It hints at the strangeness of the moment when JSON meets Postgres: it's all strings. SQL obviously can't parse JSON, so you have to format it as a string, which in turn means that working with JSON directly in SQL is painful. Of course, a good query-building tool eases the problem somewhat... but it's still there.
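
For example (a small sketch, separate from the table above), every JSON value crosses into SQL as a quoted string that gets cast to jsonb:

-- the document is just a quoted string cast to jsonb
select '{"my_key" : 42}'::jsonb -> 'my_key';   -- returns jsonb: 42
select '{"my_key" : 42}'::jsonb ->> 'my_key';  -- returns text: '42'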

Moreover, document storage is pretty open-ended. Do you use a single jsonb field? Or several fields within a larger table structure? That's entirely up to you, which is nice, but too much freedom of choice can also be paralyzing.
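
For instance, both of these layouts are perfectly valid (the table and column names here are hypothetical):

-- everything lives in one jsonb blob
create table docs_a(
  id serial primary key,
  body jsonb not null
);

-- or a hybrid: relational columns plus a jsonb field for the loose bits
create table docs_b(
  id serial primary key,
  email varchar(255) not null,
  profile jsonb
);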

So why care at all? If you want a document-oriented database, use a document-oriented database. I agree with that... but there's one rather compelling reason to use Postgres instead (at least for me)...


Postgres is ACID-compliant. That means you can count on it to write your data and, in all likelihood, not lose it.

Beyond that, Postgres is a relational database, which means that if you ever want to move toward a stricter schema over time, you can. There are plenty of reasons to choose Postgres; for now, let's assume the choice is made and it's time to get to work with documents and jsonb.
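
As a rough sketch of what that later migration might look like, using the my_docs table from the query above, you could promote a document key into a typed column:

-- pull a key out of the document into a real, typed column
alter table my_docs add column some_key varchar(255);
update my_docs set some_key = document_field ->> 'some_key';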

A Better API


Personally, I'd like to see more functions that support the idea of working with documents. Right now we have built-in tools for dealing with the JSON types, but nothing that supports a higher level of abstraction.

That doesn't mean we can't build such an API ourselves... which is exactly what I did. Here goes...

A Document-Oriented Table


I want to store documents in a table that also carries metadata, as well as some extra ways to work with the data, namely Full Text Search.

The table structure could vary, so why not build this abstraction ourselves! Let's start with this:

create table my_docs(
  id serial primary key,
  body jsonb not null,                            -- the document itself
  search tsvector,                                -- full text search data
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

There's some duplication here. The document itself is stored in the body field, including an id that is also stored as the primary key (which we need, since this is still Postgres after all). I'm fine with the duplication, though, for the following reasons:

  • This API belongs to me, so I can make sure everything stays in sync
  • This is how the document-oriented systems do it

Saving a Document


What I'd like from a save_document function:

  • Create tables on the fly
  • Create the appropriate indexes
  • Set the timestamps and a search field (for full text indexing)

We can get there by writing our own save_document function, and just for fun I'll use PLV8: JavaScript in the database. I'll actually create two functions: one that builds my funky table, the other that saves the document.

The first, create_document_table:

create function create_document_table(name varchar, out boolean)
as $$
  var sql = "create table " + name + "(" + 
    "id serial primary key," + 
    "body jsonb not null," + 
    "search tsvector," + 
    "created_at timestamptz default now() not null," + 
    "updated_at timestamptz default now() not null);";

  plv8.execute(sql);
  // index the newly created table, not a hardcoded one, since tables are made on the fly
  plv8.execute("create index idx_" + name + " on " + name + " using GIN(body jsonb_path_ops)");
  plv8.execute("create index idx_" + name + "_search on " + name + " using GIN(search)");
  return true;
$$ language plv8;

This function creates the table and the appropriate indexes: one for the jsonb field in our document table, the other for the tsvector used by full text search. Note that I'm building the SQL strings on the fly and executing them with plv8.execute; that's how you get things done with JavaScript in Postgres.
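
If you want to call it directly (though save_document, below, will normally do it for you), it's just a select; the 'customers' table name here is only an example:

select create_document_table('customers');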

Next, let's create our save_document function:

create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  // inside PLV8 the jsonb argument arrives as a string, so parse it first
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];

  // create the document table on the fly if it doesn't exist yet
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", doc_string, id);
  }else{
    // insert first, then write the generated id back into the document body
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *", JSON.stringify(doc), id);
  }

  return result[0] ? result[0].body : null;

$$ language plv8;

I'm sure this function looks a little strange, but if you read it line by line it starts to make sense. So why the call to JSON.parse()?

It's because the Postgres jsonb type isn't JSON here; it's a string. Outside our PLV8 block it's still Postgres-land, and Postgres works with JSON as a string (storing it in jsonb in a binary format). So when our document arrives in our function, it arrives as a string that has to be parsed if we want to work with it as a JSON object in JavaScript.

On insert, notice that I have to sync the document's id with the primary key that was generated. It's a little clunky, but it works well.

Finally, notice that for the initial insert as well as the update, doc_string is what gets passed to plv8.execute. Again, that's because JSON values have to be handled as strings in Postgres.

This can be genuinely confusing. If I pass in doc (our JSON.parsed object) instead, plv8 converts it to [Object object]. Which is weird.

Moreover, if I try to return a JavaScript object from this function (say, our doc variable), I get an error saying it's an invalid format for the jsonb type. Which is baffling.

In the end I simply return the data from the query result, which is a string, believe it or not, and I can hand it straight back as the result. It's worth noting that everything plv8.execute returns comes back as elements you can work with like JavaScript objects.
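
Here's a throwaway sketch (not part of the code above) showing the shape of those results:

create function show_result_shape() returns void
as $$
  var rows = plv8.execute("select 1 as answer, 'hello' as greeting");
  // each row is a plain javascript object keyed by column name
  plv8.elog(NOTICE, rows[0].answer, rows[0].greeting);
$$ language plv8;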

Result


It works really well! And fast. If you want to try it yourself, install the PLV8 extension and then run:

create extension plv8;
select * from save_document('test_run', '{"name" : "Test"}');

You should see a new table and a new record in that table.
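
To double-check from plain SQL, a simple follow-up query will show the record:

select id, body, created_at from test_run;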


Plans for the future


In the next article I'll add a few more features, namely:

  • Automatic updates to the search field
  • Bulk inserts of multiple documents using arrays

It's a good start!

This article is a translation of the original post at habrahabr.ru/post/272395/