3 years ago
Archives and the spiral of history

The rapid growth of the volume of information generated around the world is mentioned at every turn. It usually comes up when the talk turns to network infrastructure, user content, search technologies and many other things. The same is true of the corporate segment: the volume of stored information is growing many times over in most organizations. According to a Forrester Research report, about 85% of the data in enterprise systems is static content that will never change again. Various requirements of authorities and regulators oblige organizations to keep different kinds of information for several years (for example, information on all clients and completed transactions). As a result, businesses have to spend considerable funds on storing this information, investing in server fleets, storage systems, software and so on.

Another consequence of the growth in stored data is the desire of many companies to be able to analyze and search the entire volume of accumulated information. From a certain point, this task turns into a Big Data processing problem. As a result, many companies look for solutions better suited to storing and working with such arrays of information.

The example of Nokia, which recently sold its mobile division to Microsoft, is a curious one. Under the terms of the contract, the Finns had to hand over the division's entire information archive to the new owner. Given the large volume of data, Nokia approached the task creatively: it acquired a compact archive storage system, loaded all the necessary information into its database, and then simply shipped the whole system to Microsoft.

Speaking of the growth of stored information, data from outdated applications also deserves mention. As information systems are upgraded, working environments change, new software packages are introduced and database structures are rebuilt. As a result, a large array of information accumulates in a form that the organization no longer uses. But since these data often still have to remain available, additional resources are spent for years on maintaining equipment and software that is no longer relevant.

About myths


Today many are inclined to regard archive systems as a minor or irrelevant technology. For example, backup is thought to be a successful replacement for archiving. In practice, the two concepts are not interchangeable. Unlike a backup, an archive is designed to preserve information without excessive duplication; it allows data to be structured and indexed, and provides searchable access to them, with optional encryption and the application of various policies. In addition, moving static data into an archive reduces the load on applications and makes it possible to get by with cheaper server clusters and storage systems.
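
The difference from a backup can be illustrated with a minimal sketch. It does not describe how any particular archive product works: it is a toy in-memory archive that stores each distinct document only once (keyed by a SHA-256 hash of its content) and keeps a simple word index for search. The class name `ToyArchive`, its methods and the sample documents are all hypothetical.

```python
import hashlib
from collections import defaultdict


class ToyArchive:
    """Hypothetical in-memory archive: deduplicated storage plus a word index."""

    def __init__(self):
        self.blobs = {}                # content hash -> document text
        self.names = {}                # document name -> content hash
        self.index = defaultdict(set)  # lowercase word -> set of document names

    def put(self, name, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Identical content is stored only once, however many names point to it.
        self.blobs.setdefault(digest, text)
        self.names[name] = digest
        for word in text.lower().split():
            self.index[word].add(name)
        return digest

    def search(self, word):
        """Return the sorted names of documents containing the given word."""
        return sorted(self.index.get(word.lower(), set()))


archive = ToyArchive()
archive.put("contract_2013.txt", "Supply contract with client Alpha")
archive.put("contract_copy.txt", "Supply contract with client Alpha")
archive.put("invoice_17.txt", "Invoice for client Beta")

stored_copies = len(archive.blobs)  # number of distinct contents actually stored
hits = archive.search("client")     # names of documents mentioning "client"
```

A backup of the same three files would hold three copies; here the two identical contracts share one stored blob, and every document stays findable through the index.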

Another popular belief is that an archive is a chaotic pile of information that reflects the history of the company and is useless for current and future business tasks. However, we have already mentioned above the trend of analyzing the data on company performance accumulated over its existence. According to a Gartner forecast, by 2017 about 75% of organizations will use their own archive as a primary information source. Today about 10% of organizations do so.

The next prejudice against archiving is connected with the desire to hide some inconvenient information from the regulator; after all, it is much easier to find something in an archive. However, this situation has a flip side: the penalties for failing to provide information requested by the regulator can run to millions of dollars, which is many times more than the cost of creating an archive.

Speaking of costs, there is an opinion that archiving is an expensive luxury. In practice, however, archive systems yield noticeable savings, thanks to cheaper storage media, lower support costs and better performance of the main production systems. It is also worth recalling that 2014 set a record for the number of information leaks, and the volume of stolen data grew by 78% compared with 2013. Reputational and legal costs can turn out to be far higher than the cost of using an archive system with data encryption.

Finally, one more argument against creating archives is the opinion that ECM platforms provide the same functionality. But there are a number of differences. First, an archive is designed to work with structured and unstructured data at the same time, and it is optimized for storing billions of records and documents. Second, as noted above, storing data in an archive is cheaper thanks to the move to cheaper media, the smaller size of backup copies and the release of resources on the production system.

The modern archive


A modern archive system addresses five main tasks:
  • Preserving data for future use.
  • Providing users with continuous access to the stored data.
  • Ensuring confidentiality of access.
  • Reducing the load on production systems by moving static data into the archive.
  • Applying data retention policies.
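
As an illustration of the last item, a retention policy can be thought of as a rule that says how long a record of a given category must be kept. The sketch below is only an assumption about how such a rule might look; the category names, the retention periods and both function names are hypothetical and are not taken from any regulation or product.

```python
from datetime import date

# Hypothetical retention periods in years per record category (illustrative values).
RETENTION_YEARS = {"transaction": 5, "client_profile": 7}


def retention_end(category, archived_on):
    """Return the date until which a record of this category must be kept."""
    years = RETENTION_YEARS[category]
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; use 28 February.
        return archived_on.replace(year=archived_on.year + years, day=28)


def may_dispose(category, archived_on, today):
    """A record may be disposed of only after its retention period has ended."""
    return today > retention_end(category, archived_on)
```

For example, `may_dispose("transaction", date(2010, 3, 1), date(2015, 3, 2))` is true under these assumed periods, while the same call with `date(2015, 3, 1)` is false because the retention period ends on that day.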

Another important property of an archive is that structured and unstructured information is stored in a single database. Naturally, the database has to be deployed on a separate, horizontally scalable storage system so that the archive can be expanded without serious consequences as data volumes grow.

EMC InfoArchive can be used as such a solution. It is an integrated product that combines a storage system with a software platform for archiving and encryption. InfoArchive is also useful when legacy data from diverse systems and in different formats has to be stored, as well as for data lake analysis tasks. A "data lake" here means a repository holding a very large volume of raw data in its original formats, without any hierarchical structure.

Depending on the specific conditions (the amounts of structured and unstructured data; the presence and makeup of the legacy systems supported in the company; the need for analytical tools; the creation of cloud services, and so on), InfoArchive can be built on EMC Isilon, DataDomain, Atmos or Centera. The EMC Documentum Dynamic Delivery Services (DDS) database, which is based on xDB and uses a number of international standards, including the open XML and OAIS (Open Archival Information System) standards, is deployed on the selected storage system.

A distinctive feature of InfoArchive is that all data has to be transferred to the system either as SIP (Submission Information Package) packages, in line with the OAIS standard, or as plain XML structures if the customer does not require OAIS compliance. All information stored in InfoArchive can also be made available through JDBC for subsequent use or for recovery in the original application.

Data in a relational database are represented as linked tables. When the user requests some information, the application queries the tables, aggregates the answers it receives and presents the result to the user.
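
A minimal sketch of this pattern, using an in-memory SQLite database from the Python standard library. The table names, columns and sample rows are assumptions made up for the illustration and do not come from any real application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two linked tables: each transaction refers to a client by its id.
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES clients(id),
        amount REAL
    );
    INSERT INTO clients VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO transactions VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0);
""")

# The application joins the tables and aggregates the answer for the user:
# number of transactions and total amount per client.
rows = conn.execute("""
    SELECT c.name, COUNT(t.id), SUM(t.amount)
    FROM clients AS c
    JOIN transactions AS t ON t.client_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
conn.close()
```

The point of the sketch is that the answer shown to the user exists only as the result of a query across several tables, so it depends on the application and schema that produced it.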

XML files are used to store, organize and transfer structured information and the metadata of unstructured information. This makes it possible to create an archive that integrates data from separate applications. InfoArchive makes it possible to search across all stored data, apply retention policies, encrypt on the fly and control access to particular data items and data sets. Regardless of the size of the archive, only one DBMS is used.
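
The idea of one XML store fed by several applications can be sketched as follows. This is not the InfoArchive SIP format or its API: the element and attribute names (`archive`, `record`, `source`, `client` and so on) and the helper `to_xml` are hypothetical, chosen only to show records from different systems being searched with a single query.

```python
import xml.etree.ElementTree as ET


def to_xml(source_app, record_id, fields):
    """Wrap one record from a source application in a hypothetical XML element."""
    record = ET.Element("record", {"source": source_app, "id": str(record_id)})
    for name, value in fields.items():
        ET.SubElement(record, name).text = str(value)
    return record


# Records that originated in two different (imaginary) applications.
archive = ET.Element("archive")
archive.append(to_xml("crm", 1, {"client": "Alpha", "status": "closed"}))
archive.append(to_xml("billing", 7, {"client": "Alpha", "amount": "150.00"}))
archive.append(to_xml("billing", 8, {"client": "Beta", "amount": "70.00"}))

# One search over everything stored, regardless of the source application.
matches = [
    (r.get("source"), r.get("id"))
    for r in archive.findall("record")
    if r.findtext("client") == "Alpha"
]

xml_text = ET.tostring(archive, encoding="unicode")
```

Here `matches` lists the records for client Alpha from both the CRM and the billing data, which is the kind of cross-application search the paragraph above describes.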

System performance depends on the number and configuration of the storage systems, as well as on the configuration of the platform. For example, at some customers InfoArchive performance when receiving structured data reaches 2 million records per hour (up to 60 Gb/hour). The system can process up to 15,000 search queries per hour; searching for a document takes 0.5 seconds on average, and searching for a record 2.5 seconds.
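
To make the hourly figures easier to compare with per-second numbers, they can be converted with simple arithmetic. The snippet below only restates the two figures quoted above; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600

records_per_hour = 2_000_000  # structured-data throughput figure quoted above
queries_per_hour = 15_000     # search-query figure quoted above

records_per_second = records_per_hour / SECONDS_PER_HOUR
queries_per_second = queries_per_hour / SECONDS_PER_HOUR

print(f"{records_per_second:.0f} records/s, {queries_per_second:.2f} queries/s")
```

That is roughly 556 records and about 4 search queries per second at the quoted peak rates.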

Access policies and encryption by means of EMC RSA KeyManager are used for data security. InfoArchive can also be integrated with other encryption systems.

Conclusion


Today archive systems are implemented first of all in companies that face the most acute growth in the volume of data that has to be stored and provided at the request of regulators: above all the financial sector, the telecommunications industry, utilities and the public sector. And, as our company's practice shows, mid-size companies that are actively trying to strengthen their market positions are showing interest in archives more and more often. This is clear evidence that the information era has arrived.

This article is a translation of the original post at habrahabr.ru/post/262673/