In the first part of this series I built a decent save function, along with another function that creates schemaless, document-oriented tables on the fly. They work fine and do what they should, but there's a lot more we can do. Specifically: I want full-text search indexed on the fly, and the ability to save many documents in a single transaction.

Let's do it.

Full-text search


Our document table has a search field of type tsvector, which is indexed with a GIN index for speed. I want to update that field every time I save a document, and I don't want a lot of API noise when I do it.
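
As a refresher, here's roughly what the document table from part one looks like (the column names are inferred from how they're used below; see part one for the real definition):

create table customer_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz default now() not null,
  updated_at timestamptz default now() not null
);

-- the GIN index that keeps tsvector lookups fast
create index idx_customer_docs_search on customer_docs using GIN(search);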

For that, I'm going to lean on a bit of convention.

Usually, when building a full-text index, you're storing fields with pretty predictable names. Such as:
  • A first or last name, perhaps an email address
  • The name or description of something
  • Address information

At save time, I'd like to check the document for any keys I might want to index, and then write their values to the search field. This can be done with a function I call update_search:
create function update_search(tbl varchar, id int)
returns boolean
as $$
  //get the record
  var found = plv8.execute("select body from " + tbl + " where id=$1", [id])[0];
  if(found){
    var doc = JSON.parse(found.body);
    var searchFields = ["name","email","first","first_name",
                        "last","last_name","description","title",
                        "street", "city", "state", "zip"];
    var searchVals = [];
    for(var key in doc){
      if(searchFields.indexOf(key.toLowerCase()) > -1){
        searchVals.push(doc[key]);
      }
    }

    if(searchVals.length > 0){
      //concatenate the hits and store them as a tsvector
      var updateSql = "update " + tbl + " set search = to_tsvector($1) where id=$2";
      plv8.execute(updateSql, [searchVals.join(" "), id]);
    }
    return true;
  }else{
    return false;
  }

$$ language plv8;

Once again I'm using JavaScript (PLV8) for this, and I pull the document out by its ID. Then I loop over all its keys, checking whether any of them are ones I'd want to store, and if so, I push them onto an array.

If that array has any hits, I concatenate them and save them to the document's search field using the built-in Postgres function to_tsvector, which takes plain text and turns it into indexable values.
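
If you haven't seen to_tsvector before, here's the general idea (output shown as a comment; the exact lexemes depend on the text search configuration):

select to_tsvector('english', 'the quick brown fox');
-- 'brown':3 'fox':4 'quick':2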

That's it! Running this, we get the following:

[image: the result of running update_search]
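
In case the screenshot doesn't survive, the call looks something like this (assuming a customer_docs table with a document at id 1):

select update_search('customer_docs', 1);
-- returns true, and the row's search column now holds the indexed tsvector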

Even better: I can now just call it at the end of my save_document function, and it will run transactionally every time I save something:

create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", [tbl])[0];

  if(!exists){
    plv8.execute("select create_document_table($1);", [tbl]);
  }

  if(id){
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", [doc_string, id]);
  }else{
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", [doc_string]);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", [JSON.stringify(doc), id]);
  }

  //run the search indexer ('perform' is PL/pgSQL-only, so use a plain select)
  plv8.execute("select update_search($1, $2);", [tbl, id]);
  return result[0] ? result[0].body : null;

$$ language plv8;
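
A quick sanity check (the table name and data are just examples):

select save_document('customer_docs', '{"name" : "Larry"}');
-- returns the saved body with an id stamped on it, e.g. {"name": "Larry", "id": 1}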

Saving many documents


Right now I can pass a single document to save_document, but I'd like to be able to hand it an array as well. I can do that by checking the argument's type and then looping:

create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  var doc = JSON.parse(doc_string);

  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", [tbl])[0];
  if(!exists){
    plv8.execute("select create_document_table($1);", [tbl]);
  }

  //function that executes our SQL statement
  var executeSql = function(theDoc){
    var result = null;
    var id = theDoc.id;
    var toSave = JSON.stringify(theDoc);

    if(id){
      result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", [toSave, id]);
    }else{
      result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", [toSave]);

      id = result[0].id;
      //put the id back on the document
      theDoc.id = id;
      //resave it
      result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", [JSON.stringify(theDoc), id]);
    }
    plv8.execute("select update_search($1,$2);", [tbl, id]);
    return result ? result[0].body : null;
  };
  var out = null;

  //was an array passed in?
  if(doc instanceof Array){
    for(var i = 0; i < doc.length; i++){
      executeSql(doc[i]);
    }
    //just report back how many documents were saved
    out = JSON.stringify({count : i, success : true});
  }else{
    out = executeSql(doc);
  }
  return out;
$$ language plv8;

The nice thing about working in JavaScript here is that the logic for this kind of routine is pretty straightforward (as opposed to PLPGSQL). I pulled the entire save process out into its own function (it's all just JavaScript, after all) so I can avoid duplicating it.

Then I check whether the incoming argument is an array. If it is, I loop over its members and call executeSql for each one.

If it's not an array, I just do what I did before and return the whole document. The result:

[image: saving an array of documents returns a count]
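
Roughly, the array case looks like this (example data again):

select save_document('customer_docs',
  '[{"name" : "Larry"}, {"name" : "Susie"}]');
-- returns {"count":2,"success":true}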

Great! The best part is that all of this happens inside a single transaction. I love it!

Node weirdness


If only it worked as nicely from Node! I've tried this from both .NET and Node; everything simply works from .NET (strangely enough) using the Npgsql library. From Node, not so much.

To put it briefly: the node_pg driver does a very strange conversion when it sees an array as an input parameter. Consider the following:

var pg = require("pg");
var assert = require("assert");
var connectionString = "postgres://localhost/mydb"; //adjust for your setup

var run = function (sql, params, next) {
  pg.connect(connectionString, function (err, db, done) {
    //throw if there's a connection error
    assert.ok(err === null, err);

    db.query(sql, params, function (err, result) {
      //we have the results, release the connection
      done();
      pg.end();
      if(err){
        next(err, null);
      }else{
        next(null, result.rows);
      }
    });
  });
};

run("select * from save_document($1, $2)", ['customer_docs', {name : "Larry"}], function(err,res){
  //works just fine
}

This is typical Node/PG code. At the very end, the run function is set up to call my save_document function and hand it some data. When PG sees an object as an input parameter, it turns it into a string and the save goes through normally.

But if you send in an array…

run("select * from save_document($1, $2)", 
         ['customer_docs', [{name : "Larry"}, {name : "Susie"}], 
         function(err,res){
  //crashes hard
}

I get back an error telling me I have invalid JSON. The error message (from Postgres) shows the badly formed JSON:

{"{name : "Larry"}, ...}

Which… yes, is awful. I've tried to work out what's happening, but it simply looks like the node_pg driver unwinds the outer array, perhaps by calling Underscore's flatten method. I don't know. To get around it, you need to change your call to the following:

run("select * from save_document($1, $2)", 
         ['customer_docs', JSON.stringify([{name : "Larry"}, {name : "Susie"}]), 
         function(err,res){
  //Works fine
}
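
My best guess (and it is a guess) is that node-postgres treats a JavaScript array parameter as a Postgres array and serializes it using array-literal syntax ({...}) rather than as JSON, which would explain the malformed value above. Stringifying the array yourself sidesteps that conversion entirely.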

Onward!


The save routine is now nice and smooth, and that makes me happy. In the next article I'll tune up the indexing, and build out a proper full-text search function.

This article is a translation of the original post at habrahabr.ru/post/272411/.
