The methods used to protect data from loss are determined both by the volume of information and by the devices on which it is stored. Both are constantly evolving.
That is why supporters of the traditional approach to backup have long argued with those looking for new data protection methods that better fit the architecture of the storage systems in use. The dispute has recently escalated because the variety of storage system types has grown sharply. Some of them require a change in the approach to the usual tasks of operation and availability, including backup, which is what I want to talk about here.
The traditional approach is not really about saving a backup copy to tape. It is about keeping a backup copy in a storage system intended specifically for backup. Whether that is a disk-based backup system or tape makes no difference. The traditional approach means that there is a production system storing the most up-to-date production data, and a separate system where copies of that data are saved regularly. For this purpose EMC offers, in particular, the Data Domain storage system.
Scheme of a traditional backup.
A backup server that centrally manages this process is also usually used. It also restores information if the primary data is damaged or lost. The Networker software product has proven itself well in this segment. It is a time-tested data protection tool that satisfies a large number of regulatory requirements. And thanks to its software clients for applications, Networker integrates very well with them, for which many deservedly love it.
The traditional approach is contrasted with replication by means of the storage systems themselves. In this case production data is replicated (mirrored) on the same storage system or, better, on another one. To create a set of recovery points, snapshots are regularly taken from this mirror, and there can be several hundred of them. Thanks to its convenience, this approach has gradually gained popularity in recent years. Most often it was chosen by:
- administrators of NetApp systems, where snapshots are built deep into the architecture,
- those who paid little attention to regulatory requirements on removable media,
- and those who were not concerned with ideal integration with applications.
On the whole, the fight between these two approaches went on, but without much heat. In my opinion, the reason is that these two worlds overlap very little. My thoughts on this subject, by the way, are here: http://denserov.com/2013/01/30/bu-vs-rep/.
I believe the world of backup began to change fundamentally when storage systems of new types appeared and quickly began to gain ground.
In the context of the changing approach to backup, I see three main new types of storage systems.
- Hybrid storage systems using two- and three-tier storage (SSD/SAS/NL-SAS): systems such as VNX, VMAX and their analogs.
- Massive scale-out storage systems: Isilon, ScaleIO, Elastic Cloud Storage and their analogs.
- Systems with deduplication of production data: for example, XtremIO and, in certain cases, VNX.
Many have probably noticed long ago that these three types of storage systems need a special approach to backup. Below I will set out my thoughts on backup for hybrid systems, as the most widespread, and will cover the remaining two types in future publications.
Hybrid systems are now the most widespread storage systems in the enterprise segment. The architecture of most of them is the result of the evolution of traditional storage systems, which were, as a rule, combined with traditional backup tools (EMC Networker and its analogs). Traditional storage systems mutated into hybrid ones, while the approach to backup remained largely the same. So what is the problem with backup on these systems, if everything there is traditional?
A typical hybrid storage system contains fast, medium and slow media.
Some have already guessed that the problem lies in the lowest storage tier, to which the "coldest" data "sinks". To save money, this tier is sometimes built from a very small number of large-capacity disks (NL-SAS, 7200 rpm, from 1 to 6 TB).
Never, never, never use NL-SAS disks larger than 2 TB for automatic tiering in hybrid systems! Do not give in to the temptation. Here is why.
It is a mistake to think that if you have designed the system for capacity and performance, backup will somehow sort itself out.
When designing three-tier systems, remember that data sometimes has to be pulled quickly from the bottom tier too, for various tasks ranging from backup to generating reports.
With sad regularity one comes across hybrid storage systems designed for databases of a hundred terabytes. Sometimes even MS SQL databases grow to that size because electronic documents are stored in the DB. At the same time nobody estimates how long a backup will take, and the requirements specification calls for tiered storage with a bottom tier of 6 TB disks. Sometimes even 5400 rpm ones, to make it cheaper.
Let's consider a small hybrid system with a disk set that is far from the worst:
- 16 disks 200GB SSD in RAID 5(7+1)
- 32 disks 600GB 15000 rpm in RAID 1
- 8 disks 2000GB 7200 rpm in RAID6 (6+2)
Usable capacity in decimal GB (taking RAID into account):
- SSD — 2800 GB
- SAS 15000 rpm — 9600 GB
- NL-SAS 7200 rpm — 24000 GB
TOTAL: 36,400 GB.
Raw performance (disk count times per-disk IOPS):
- SSD 16 * 5000 = 80000 IOPS
- SAS 15000 rpm 32 * 180 = 5760 IOPS
- NL-SAS 7200 rpm 8 * 80 = 640 IOPS
TOTAL: 86,400 IOPS.
The capacity and performance of such a system on only 56 disks turn out to be impressive. Today all of this can fit into one 60-disk shelf or, at worst, into four 15-disk shelves.
A comparable "flat" system without auto-tiering would consist of 480 disks at 15000 rpm, would consume many times more electricity and would take up many times more space. The economic gain of the "hybrid" approach is obvious.
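The figure of 480 disks follows from dividing the total IOPS of the hybrid system by the per-disk IOPS used for the 15000 rpm tier. A minimal sketch of that arithmetic; the function name is hypothetical and RAID write penalties are ignored, as in the article's own rough figures:

```python
import math

# Figures taken from the example in the text.
TARGET_IOPS = 86_400       # total IOPS of the hybrid system
IOPS_PER_15K_DISK = 180    # per-disk figure used for the 15000 rpm SAS tier


def flat_disk_count(target_iops: int, iops_per_disk: int) -> int:
    """Number of identical disks needed to reach target_iops (no RAID penalty)."""
    return math.ceil(target_iops / iops_per_disk)


print(flat_disk_count(TARGET_IOPS, IOPS_PER_15K_DISK))  # 480
```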
We assume here, of course, that the system was designed correctly for its operational load profile and that hot and cold data have "spread out" across it correctly (although solving that fascinating problem is the subject of a separate and very interesting article).
Now I would like to look at how such a "successfully designed hybrid system" will be backed up. Let's assume a traditional backup is used, that is, a full backup on Friday evening followed by a series of incrementals, repeated every week.
As a zeroth approximation, let's estimate how long a full backup of 36 TB from the hybrid system will take in the absence of production load.
The calculation assumes that there are no other bottlenecks during the backup, and that neither the storage controllers nor the backup system itself prevents the disks from delivering data for the backup.
From the illustration we see that the bottom storage tier "ate" 94% of the backup time and took 7 days. That is unacceptable. In fact, this is a consequence of using tiered storage. Application performance is determined by the speed and size of the top tier, while backup (and recovery!) speed is determined by the speed and size of the bottom tier. Does this mean that hybrid systems are poorly suited to real life? Some will say "yes". But from the point of view of storage efficiency their use gives very considerable benefits. So we simply need to rethink the rules for operating them and follow common sense.
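An estimate of this kind can be roughly reproduced in a few lines. The sketch below is not the author's calculation: it assumes a 64 KiB I/O size (the article does not state one) and that the tiers are read one after another at their full aggregate IOPS. Under those assumptions the bottom tier takes several days and dominates the total, which is the same picture the figure describes, although the exact percentages depend on the assumptions.

```python
BLOCK_SIZE_BYTES = 64 * 1024  # assumption: 64 KiB per I/O, not stated in the article

# (tier name, capacity in decimal GB, aggregate tier IOPS) from the example above
TIERS = [
    ("SSD", 2_800, 80_000),
    ("SAS 15000 rpm", 9_600, 5_760),
    ("NL-SAS 7200 rpm", 24_000, 640),
]


def read_time_seconds(capacity_gb, iops, block_size=BLOCK_SIZE_BYTES):
    """Time to read a whole tier if every I/O moves block_size bytes."""
    throughput_bytes_per_s = iops * block_size
    return capacity_gb * 1e9 / throughput_bytes_per_s


def backup_estimate(tiers):
    """Return per-tier read times and their sum (tiers read sequentially)."""
    times = {name: read_time_seconds(cap, iops) for name, cap, iops in tiers}
    return times, sum(times.values())


times, total = backup_estimate(TIERS)
for name, seconds in times.items():
    print(f"{name}: {seconds / 3600:.1f} h ({seconds / total:.0%} of total)")
print(f"Total: {total / 86400:.1f} days")
```

Changing `BLOCK_SIZE_BYTES` scales all three times equally, so the share of the bottom tier stays the same; only the tier capacities and IOPS change the proportions.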
I see the following possible solutions (or combinations of them) for hybrid systems:
The first. Give up full backups of large data volumes. Switch to synthetic full backups and spread backups out over time, that is, reduce the volume of data transferred in a given interval. The advantage of this option is its relatively low cost.
The big drawback is the time of a full recovery. Here it no longer matters whether the backup was synthetic or not. The data has to be written back in full, including onto the bottom tier. That adds about a week of downtime on top of a day's worth of data loss.
The second. Try to make the bottom tier less slow. It can be built from 1 TB disks, which cost about the same per GB as 2 TB disks. Additional spindles can be provisioned in the bottom tier to ensure an acceptable backup time. This has to be thought through in advance, that is, both increase the speed at which data is read out for backup and plan for the speed of a potential recovery.
This is slightly more expensive than the first option but can be many times faster. The first and second options can be combined freely.
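Sizing the bottom tier for a backup window can be sketched as a simple inverse of the backup time estimate. The 48-hour window, the 64 KiB I/O size and the function name below are illustrative assumptions, and the result counts only data-bearing spindles, without RAID parity disks:

```python
import math


def spindles_for_window(capacity_gb, window_hours, iops_per_disk,
                        block_size=64 * 1024):
    """Disks needed to read capacity_gb within window_hours.

    Assumes every I/O moves block_size bytes and the disks are the only
    bottleneck, as in the zeroth-approximation estimate in the text.
    """
    required_bytes_per_s = capacity_gb * 1e9 / (window_hours * 3600)
    per_disk_bytes_per_s = iops_per_disk * block_size
    return math.ceil(required_bytes_per_s / per_disk_bytes_per_s)


# Hypothetical target: read the 24,000 GB bottom tier over a 48-hour weekend
# using 7200 rpm disks at 80 IOPS each.
print(spindles_for_window(24_000, 48, 80))  # 27
```

Compared with the 8 disks in the example configuration, this shows why the number of spindles in the bottom tier, and not only its capacity, has to be planned in advance.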
The third. Avoid 100% utilization of the third storage tier by tiered volumes. It can be filled with live production data not to 100% but to 15-20%, with the other 80% given over to, for example, archive data, which can be backed up less often. This is the fundamental difference between archive and operational data. To some it may still sound strange, but an archive and a backup are fundamentally different things. An archive is long-term storage of unchanging data that is moved there from operational storage. A backup is a regularly created copy of operational data kept for possible recovery. In other words, archiving is a move operation and backup is a copy operation. More about archiving can be read on the EMC website.
Combination of archive and backup.
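The move-versus-copy distinction can be illustrated at the file level. This is only an analogy built on Python's standard library, with hypothetical directory and file names; real archiving and backup products work on volumes, databases and policies rather than single files:

```python
import shutil
import tempfile
from pathlib import Path


def archive(src: Path, archive_dir: Path) -> Path:
    """Archiving: move the file, so it leaves operational storage."""
    return Path(shutil.move(str(src), str(archive_dir / src.name)))


def backup(src: Path, backup_dir: Path) -> Path:
    """Backup: copy the file, so the original stays in operational storage."""
    return Path(shutil.copy2(src, backup_dir / src.name))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    operational = root / "operational"
    archive_dir = root / "archive"
    backup_dir = root / "backup"
    for directory in (operational, archive_dir, backup_dir):
        directory.mkdir()

    old = operational / "old_report.txt"    # unchanging data, archive candidate
    live = operational / "live_orders.txt"  # operational data, backup candidate
    old.write_text("closed quarter")
    live.write_text("open orders")

    archive(old, archive_dir)
    backup(live, backup_dir)

    # Only the live file remains in operational storage.
    print(sorted(p.name for p in operational.iterdir()))  # ['live_orders.txt']
```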
The fourth. If none of the above improves the situation radically, then it may be worth reconsidering the basic approach to backup. Backup systems with source-side deduplication have existed for a long time, for example Avamar or VMware Data Protection Advanced. This technology very effectively reduces the volume of data transferred between the client and the backup server. Another alternative is continuous data replication with snapshots on another storage system, for example by means of RecoverPoint.
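The general idea of source-side deduplication can be shown with a toy model: the client hashes its data in chunks and sends only the chunks the server does not already hold. This is a generic illustration, not the algorithm of Avamar or any other product; the class, the function names and the fixed 4 KiB chunk size are all assumptions made for the sketch.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; real products may chunk differently


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into consecutive chunks of at most size bytes."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class BackupServer:
    """Toy in-memory server that stores chunks keyed by their SHA-256 digest."""

    def __init__(self):
        self.store = {}

    def missing(self, digests):
        return {d for d in digests if d not in self.store}

    def upload(self, digest, chunk):
        self.store[digest] = chunk


def dedup_backup(data: bytes, server: BackupServer) -> int:
    """Send only chunks the server lacks; return the number of bytes sent."""
    by_digest = {hashlib.sha256(c).hexdigest(): c for c in chunks(data)}
    sent = 0
    for digest in server.missing(by_digest):
        server.upload(digest, by_digest[digest])
        sent += len(by_digest[digest])
    return sent


server = BackupServer()
day1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # four distinct chunks
day2 = day1 + bytes([9]) * CHUNK_SIZE                       # one new chunk appended

print(dedup_backup(day1, server))  # 16384: first backup sends everything
print(dedup_backup(day2, server))  # 4096: only the new chunk is sent
```

The second run transfers a fifth of the data even though the client holds 20 KiB, which is the effect the paragraph describes: less data read for transfer and less traffic to the backup server.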
As you can see, when designing a hybrid storage system you have to take backup into account, even when storage and backup are separate projects.
Now let's imagine that we have a system with automated tiered storage and we back it up successfully. Moreover, recovery speed was taken into account during design, so it raises no serious questions either.
The application runs on the hybrid storage system and performance is sufficient. Recall, however, that in modern environments most of the load falls on only a small portion of the data.
The question is: how fast will the application run after a successful recovery from a backup made half a day ago? Nobody guarantees that during recovery from the backup system the data blocks will be placed on media in strict accordance with their "temperature" before the failure. And considering that the "hottest" data lives not in the storage system cache but in the RAM of the DB and application servers, which after a recovery from backup also have to "warm up" all their memory from scratch, the load on the disk system will be even higher than usual. Not a pleasant situation.
Since hybrid storage systems have become an almost standard approach to storage, the question of potential recovery after a failure has, I think, come up more than once for administrators in operation, and will come up again.
While people deal with backup daily, a full-scale recovery happens, I think, a hundred times less often. So when the scenario considered here is encountered in practice, it turns out to be an unpleasant but quite predictable event. Perhaps it is worth expecting it and planning for it in advance?
But since backup design is, after all, about copying and recovering data rather than about guaranteed performance after recovery, this subject most often stays somewhere on the edge of architects' attention. It is also unclear which budget should finance it: backup or storage? Perhaps that is why I have not yet come across clear recommendations on this point for architecture planning. Probably many believe that the main thing in a recovery is to get recovered, and the rest is minor detail.
It should be noted here that there are known cases where a successful data recovery from backup did not lead to a successful start of the application. Hundreds and thousands of users literally "piled onto" a storage system that had not warmed up, successfully bringing it down, putting the application server into a coma, and sometimes even damaging data in the process. Similar situations arise even on ordinary storage systems, not to mention hybrid ones.
So remember that recovery from a full backup is a last resort. If you do resort to it, do not expect applications to return to full capacity instantly. Besides, in most cases you will still have to resolve the organizational issues connected with rolling data back to an earlier point in time. But that does not take the question of technical provision off the agenda.
What can I advise for solving these problems, which are characteristic of all disk systems, hybrid ones included?
The first: consider how the storage system can be warmed up. The algorithms of some hybrid storage systems use Flash as an extension of cache memory and can react to application demands within minutes, while others need at least three days after a recovery from backup to "lift" data from slow disks to fast ones. So if the hybrid system does not react very quickly, it is important not to overdo it with slow disks. They can significantly limit the ability of the system to warm up under load.
The second: backup to an external backup system is always a last resort, a "Plan B" for when nothing else has worked (see the other article). So, to minimize uncertainty about application performance after data recovery, I would use operational recovery tools that allow data to be recovered in a more targeted, block-by-block way.
This means both replication at the storage system level (clones, snapshots) and replication at the application level (Data Guard, etc.). Such an approach not only minimizes recovery time and data loss but also spares administrators' nerves thanks to more predictable performance after recovery. In other words, it is best to combine backup to external devices with data replication, understanding what each is intended for.
The third: if the first two points are taken into account but you want more, you can use two hybrid systems working as a synchronous, disaster-tolerant cluster pair, which provides not only storage of identical copies of the data but also completely identical placement of that data across the storage tiers. Even if one of the storage systems is put out of action entirely, there will be no data loss or application outage, and after the faulty system is restored the cluster can synchronize the data and its tier placement on both systems.
In general, I share the opinion that a good backup has to be integrated not only with the application level but also with the production storage system, and be able both to back up and to restore individual data blocks, similar to snapshots. It seems that roughly this trend is currently visible in the backup market.
In these reflections I deliberately stuck to the protection and recovery of hybrid storage systems. Flash systems will be discussed in one of the following articles. Stay tuned!
This article is a translation of the original post at habrahabr.ru/post/269933/