- 93% of companies that lost their data center for 10 days or more because of a disaster went bankrupt within a year (National Archives and Records Administration in Washington).
- Every week, 140,000 hard disks fail in the USA (Mozy Online Backup).
- 75% of companies have no disaster recovery solution in place (Forrester Research, Inc.).
- 34% of companies do not test their backups.
- 77% of those who do test have found unreadable media in their libraries.
In my previous posts (one and two) I wrote about organizational measures that speed up and simplify the recovery of IT systems, and of the business processes that depend on them, in an emergency.
Now let's talk about technical solutions that help with this. Their cost ranges from several thousand to hundreds of thousands of dollars.
High availability and disaster recovery
High availability (HA) and disaster recovery (DR) solutions are very often confused. First of all, when we talk about business continuity, we mean a backup site; in IT terms, a backup data center. Business continuity is not about backing up to a tape library in the next rack (although that is very important too). It is about the company's main building burning down and still being able to resume operations within hours or days by redeploying at a new location:
| High availability | Disaster recovery |
|---|---|
| A solution within a single data center | Involves one or more remote data centers |
| Recovery time under 30 minutes | Recovery can take hours or even days |
| Zero or near-zero data loss | Data loss can amount to many hours |
| Requires quarterly testing | Requires annual testing |
A cold reserve means there is a server room to which equipment can be delivered and deployed. For recovery, you either plan to buy the hardware or keep it stored in a warehouse. Bear in mind that most systems are built to order, and quickly finding dozens of servers, storage systems, switches and so on is no trivial task. As an alternative to stockpiling equipment yourself, you can arrange for your suppliers to keep the most important or hardest-to-find equipment in their warehouses. Telecommunication lines should already be brought into the premises, but the contract with the provider is normally signed only after the decision to activate the "cold" data center has been made. Restoring operations in such a data center after a catastrophic failure of the main site can easily take several weeks. Make sure your company can survive those weeks without IT and not lose its business (through license revocation or an irrecoverable cash-flow gap, for example); I wrote about this earlier. To be honest, I would not recommend this kind of redundancy to anyone. Perhaps I overestimate the role of IT in some companies' business.
A warm reserve means an alternate site where Internet and WAN links are live and the core network and computing infrastructure are running. It is always "weaker" than the main site in computing power, and some equipment may be missing. The most important thing is that the site always holds an up-to-date backup copy of the data. The "old-fashioned" way is to ship backup tapes there regularly. The modern way is to replicate backups over the network from the main data center. Deduplicated backups can be transferred promptly even over a "thin" link between data centers.
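To illustrate why deduplication makes backup replication feasible over a thin link, here is a minimal sketch assuming fixed-size 4 KiB chunks identified by SHA-256 digests (real products often use variable-size chunking). `ChunkStore`, `replicate_backup` and `restore` are hypothetical names, not any vendor's API. Only chunks the backup site does not already hold cross the link.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; many products use variable-size chunking


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a backup image into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical content-addressed store at the backup site: digest -> chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        return {d for d in digests if d not in self.chunks}


def replicate_backup(data: bytes, remote: ChunkStore) -> tuple[list[str], int]:
    """Send only chunks the remote site lacks; return the recipe and the bytes sent."""
    chunks = split_chunks(data)
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    need = remote.missing(digests)
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in need:
            remote.chunks[digest] = chunk
            need.discard(digest)  # a chunk repeated inside this backup is sent only once
            sent += len(chunk)
    return digests, sent


def restore(recipe: list[str], remote: ChunkStore) -> bytes:
    """Rebuild a backup image at the backup site from its list of chunk digests."""
    return b"".join(remote.chunks[d] for d in recipe)


if __name__ == "__main__":
    site_b = ChunkStore()
    # 100 distinct 4 KiB chunks stand in for a full backup image
    monday = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))
    _, first = replicate_backup(monday, site_b)
    # The next day only one chunk has changed
    tuesday = monday[:50 * CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + monday[51 * CHUNK_SIZE:]
    recipe, second = replicate_backup(tuesday, site_b)
    print(first, second)  # 409600 4096
    assert restore(recipe, site_b) == tuesday
```

The first backup still ships every unique chunk; subsequent ones ship only what changed.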
A hot reserve is the choice of the big players, supporting IT systems whose downtime of even a few hours would cost the company huge losses. It has all the equipment needed to run the IT systems at full capacity. Such a site is usually built around a storage system to which data from the main data center is mirrored synchronously or asynchronously. For a hot reserve to justify the money invested in it when the moment comes, regular test switchovers must be carried out, and the settings and OS versions of servers on the main and backup sites must be kept synchronized at all times, manually or automatically.
The downside of hot and warm reserves is expensive equipment standing idle while waiting for a disaster. The way out is a distributed data center strategy. Here two (or more) sites are equivalent: most applications can run on either of them. This puts the capacity of all the equipment to work and allows load balancing. On the other hand, the requirements for automating the transfer of IT services between data centers rise considerably. If both data centers are "in production", the business is entitled to expect that during an anticipated load peak on one application, that application can be quickly moved to the less busy data center. Such data centers most often use synchronous replication between storage systems, but a small lag (within a few minutes) is also possible.
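The "move it to the less busy data center" decision can be sketched as a simple placement rule. This is a hypothetical illustration: `pick_site` and the load/capacity units are assumptions, and it presumes the application's data is already replicated to every candidate site.

```python
from typing import Optional


def pick_site(app_load: float, sites: dict[str, tuple[float, float]]) -> Optional[str]:
    """Return the data center with the most spare capacity that can absorb app_load.

    sites maps a site name to (current_load, capacity) in the same hypothetical units;
    current_load is assumed to exclude the application being moved.
    """
    best, best_headroom = None, -1.0
    for name, (load, capacity) in sites.items():
        headroom = capacity - load
        if headroom >= app_load and headroom > best_headroom:
            best, best_headroom = name, headroom
    return best


if __name__ == "__main__":
    sites = {"DC1": (90, 100), "DC2": (40, 100)}
    print(pick_site(30, sites))  # DC2
    print(pick_site(70, sites))  # None: no site can take the peak
```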
Three magic words
Before moving on to the technologies that make IT services disaster-tolerant, let me recall the three "magic" words that determine the cost of any DR solution: RTO, RPO, RCO.
- RTO (Recovery Time Objective): the time within which the IT system must be recovered
- RPO (Recovery Point Objective): how much data may be lost during disaster recovery
- RCO (Recovery Capacity Objective): what share of the load the backup system must handle. It can be measured as a percentage, in IT system transactions, or in other units.
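The three objectives can be written down as a simple check of a candidate solution against the business's targets. This is a minimal sketch: `Objectives`, `DrOption` and `unmet` are hypothetical names, and the figures in the usage example are made-up illustrations, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Objectives:
    rto_minutes: float   # maximum acceptable time to bring the IT system back
    rpo_minutes: float   # maximum acceptable window of lost data
    rco_percent: float   # minimum share of production load the reserve must carry


@dataclass
class DrOption:
    name: str
    recovery_minutes: float          # expected time to restore service
    data_loss_minutes: float         # replication lag or backup interval
    reserve_capacity_percent: float  # capacity of the reserve relative to production


def unmet(option: DrOption, target: Objectives) -> list[str]:
    """List which of RTO/RPO/RCO the option fails to meet."""
    gaps = []
    if option.recovery_minutes > target.rto_minutes:
        gaps.append("RTO")
    if option.data_loss_minutes > target.rpo_minutes:
        gaps.append("RPO")
    if option.reserve_capacity_percent < target.rco_percent:
        gaps.append("RCO")
    return gaps


if __name__ == "__main__":
    target = Objectives(rto_minutes=240, rpo_minutes=15, rco_percent=50)
    warm = DrOption("warm reserve, nightly backup replication", 8 * 60, 24 * 60, 40)
    hot = DrOption("hot reserve, synchronous replication", 30, 0, 100)
    print(unmet(warm, target))  # ['RTO', 'RPO', 'RCO']
    print(unmet(hot, target))   # []
```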
The first dividing line we can draw across the whole range of IT solutions for disaster tolerance is whether or not they provide zero RPO. Zero data loss on failure is achieved with synchronous replication. It is most often done at the storage level, but it can also be implemented at the DBMS or server level (with advanced LVMs). In the storage case, the server does not receive acknowledgement that a write succeeded until its storage system has passed the write to the second storage system it is paired with and received confirmation that the write completed successfully.
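The acknowledgement ordering described above can be sketched in a few lines. `Array` and `SyncMirror` are hypothetical names; what a real array does when the link is lost (fail the write or suspend replication and carry on) is a vendor-specific policy, and this sketch simply withholds the acknowledgement.

```python
class Array:
    """Hypothetical storage array: a map of block address -> data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.reachable = True

    def write(self, lba: int, data: bytes) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[lba] = data


class SyncMirror:
    """Synchronous replication: the host is acked only once both arrays hold the write."""

    def __init__(self, local: Array, remote: Array) -> None:
        self.local = local
        self.remote = remote

    def host_write(self, lba: int, data: bytes) -> bool:
        self.local.write(lba, data)
        try:
            # In a real array this is a round trip over the inter-site link,
            # so the latency of every write grows with distance.
            self.remote.write(lba, data)
        except ConnectionError:
            return False  # no ack: the write is not yet protected on the second site
        return True  # ack: the block exists on both sites, hence RPO is zero


if __name__ == "__main__":
    dc1, dc2 = Array("DC1"), Array("DC2")
    mirror = SyncMirror(dc1, dc2)
    print(mirror.host_write(0, b"order #1"))  # True
    dc2.reachable = False
    print(mirror.host_write(1, b"order #2"))  # False
```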
Practically every midrange storage system, and some entry-level systems from well-known vendors, can do synchronous replication. Licenses for synchronous replication on "simple" storage systems start at several thousand dollars. Server-level replication software for 2-3 servers costs about the same. If you do not have a running backup data center, don't forget to add the cost of buying the backup equipment.
An RPO of a few minutes can be provided by asynchronous replication at the storage level, by the server's volume management software (LVM, Logical Volume Manager), or by the DBMS. A standby database copy remains one of the most popular DR solutions to this day. The "log shipping" functionality, as it is called in DBMS terminology, is usually not licensed separately by the vendor. If your database is licensed, replicate to your heart's content. Asynchronous replication for servers and storage costs about the same as synchronous replication; see the previous paragraph.
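Why log shipping yields an RPO of minutes rather than zero can be sketched as follows: the primary commits without waiting, and whatever was committed since the last shipment is lost if the primary dies. `Primary`, `Standby`, `ship_logs` and `lost_on_failover` are hypothetical names, not a DBMS API.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    data: dict[str, int] = field(default_factory=dict)
    log: list[tuple[int, str, int]] = field(default_factory=list)  # (sequence, key, value)

    def commit(self, key: str, value: int) -> None:
        # Asynchronous: the commit returns at once; nothing waits for the standby
        self.data[key] = value
        self.log.append((len(self.log) + 1, key, value))


@dataclass
class Standby:
    data: dict[str, int] = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, entries: list[tuple[int, str, int]]) -> None:
        for seq, key, value in entries:
            if seq > self.applied_seq:  # skip entries that were already applied
                self.data[key] = value
                self.applied_seq = seq


def ship_logs(primary: Primary, standby: Standby) -> int:
    """Copy and apply every log entry the standby has not seen yet (run periodically)."""
    pending = [entry for entry in primary.log if entry[0] > standby.applied_seq]
    standby.apply(pending)
    return len(pending)


def lost_on_failover(primary: Primary, standby: Standby) -> int:
    """Transactions lost if the primary dies now: committed but not yet shipped."""
    return len(primary.log) - standby.applied_seq


if __name__ == "__main__":
    p, s = Primary(), Standby()
    p.commit("a", 1)
    p.commit("b", 2)
    print(ship_logs(p, s))         # 2
    p.commit("c", 3)
    print(lost_on_failover(p, s))  # 1
```

The shipping interval effectively becomes the RPO.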
For an RPO of several hours, the usual approach is to replicate backups from one site to the other. Most disk libraries can do this, and so can some backup software. As I already said, deduplication helps a lot here. You will not only put less load on the link when transferring backups but also do it much faster: each transferred backup will take tens or hundreds of times less time than without deduplication. On the other hand, remember that with deduplication the first backup still has to transfer the whole mass of unique data into the system. You will see the "real" effect of deduplication only after a weekly backup cycle. The same applies to synchronizing disk libraries. If the estimated transfer time over your inter-data-center link is several days or even weeks (which can also cost a lot), it makes sense to first install the second library next to the first one, synchronize them, and then move it to the backup data center.
Figure: Synchronization of backup copies between data centers
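The decision about whether to seed the second library over the network or ship it is simple arithmetic. In this sketch the 80% link efficiency, the 2-day threshold, and the 20 TB / 100 Mbit/s figures are illustrative assumptions; `transfer_days` and `plan_seed` are hypothetical helpers.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push data_tb terabytes over a link_mbps link at the given efficiency."""
    bits = data_tb * 1e12 * 8
    usable_bps = link_mbps * 1e6 * efficiency  # assumption: ~80% of nominal bandwidth is usable
    return bits / usable_bps / SECONDS_PER_DAY


def plan_seed(data_tb: float, link_mbps: float, max_days: float = 2.0) -> str:
    """'ship' the library physically if the initial sync would take longer than max_days."""
    return "ship" if transfer_days(data_tb, link_mbps) > max_days else "network"


if __name__ == "__main__":
    print(f"{transfer_days(20, 100):.1f} days")       # 23.1 days for the first full copy
    print(f"{transfer_days(0.2, 100) * 24:.1f} h")    # 5.6 h for a 200 GB deduplicated delta
    print(plan_seed(20, 100))                         # ship
```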
When the goal is to minimize recovery time (RTO), the process should be documented and automated as much as possible. One of the best and most versatile solutions is an HA cluster with geographically separated sites. Such solutions are most often built on storage replication, but other options are possible too. Leading products in this area, for example Symantec Veritas Cluster, include modules for working with storage systems that switch the replication direction when a service has to be restarted on the backup site. For less advanced clusters (for example, Microsoft Cluster Services, which is built into Windows), the major storage vendors (IBM, EMC, HP) offer add-ons that turn an ordinary HA cluster into a disaster-tolerant one.
Figure: Geographically distributed cluster
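The failover sequence such a cluster automates, including reversing the replication direction, can be sketched as an ordered list of steps. This is a hypothetical model (`ReplicatedService`, `fail_over`), not the interface of Veritas Cluster or any vendor add-on.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedService:
    name: str
    active_site: str
    standby_site: str
    replication: tuple[str, str]  # (source site, target site)


def fail_over(svc: ReplicatedService, primary_reachable: bool) -> list[str]:
    """Move svc to its standby site and reverse replication; return the steps taken."""
    steps = []
    old, new = svc.active_site, svc.standby_site
    if primary_reachable:
        steps.append(f"stop {svc.name} on {old}")  # planned switchover: clean shutdown first
    steps.append(f"promote replica volumes on {new} to read-write")
    steps.append(f"start {svc.name} on {new}")
    svc.active_site, svc.standby_site = new, old
    svc.replication = (new, old)  # the former primary becomes the replication target
    if primary_reachable:
        steps.append(f"resume replication {new} -> {old}")
    else:
        steps.append(f"replication {new} -> {old} pending: resync when {old} returns")
    return steps


if __name__ == "__main__":
    erp = ReplicatedService("erp", "DC1", "DC2", ("DC1", "DC2"))
    for step in fail_over(erp, primary_reachable=False):
        print(step)
```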
Few people think about an interesting property of most data replication solutions: they are "single-shot". On the backup site you can bring up only one state of the data. If the system fails to start with this data for some reason, we move on to plan "B", which most often means restoring from a backup with significant data loss. Of the technologies I have listed, the only exception is replication of those same backups. The answer here is the class of Continuous Data Protection (CDP) solutions. The idea is that every write coming from the server is timestamped and saved in a dedicated journal volume on the backup site. When recovering the system, you can pick any point from this journal and bring up the state not only at the moment of failure, when the data was already corrupted, but also a few seconds earlier. Such solutions protect against internal threats such as users deleting data. Storage replication does not care what it transfers: an empty volume or your most critical database. With CDP you can choose the moment right before the information was deleted and recover to it. CDP systems typically cost tens of thousands of dollars. One of the most successful examples, in my opinion, is EMC RecoverPoint.
Figure: Diagram of a RecoverPoint-based solution
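The journal idea behind CDP can be sketched as a timestamped write log that is replayed up to any chosen point in time. `CdpJournal` is a hypothetical name; replaying from a baseline is a simplification assumed here, not how RecoverPoint or any specific product stores its journal.

```python
import bisect


class CdpJournal:
    """Hypothetical CDP journal: a baseline image plus every write with its timestamp."""

    def __init__(self, baseline: dict[int, bytes]) -> None:
        self.baseline = dict(baseline)
        self.times: list[float] = []
        self.writes: list[tuple[int, bytes]] = []

    def record(self, t: float, lba: int, data: bytes) -> None:
        if self.times and t < self.times[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self.times.append(t)
        self.writes.append((lba, data))

    def image_at(self, t: float) -> dict[int, bytes]:
        """Volume contents as of time t: baseline plus all writes with timestamp <= t."""
        image = dict(self.baseline)
        n = bisect.bisect_right(self.times, t)
        for lba, data in self.writes[:n]:
            image[lba] = data
        return image


if __name__ == "__main__":
    journal = CdpJournal({0: b"orders"})
    journal.record(10.0, 1, b"invoices")
    journal.record(20.0, 0, b"\x00" * 6)  # at t=20 a user wipes block 0
    print(journal.image_at(19.9))     # {0: b'orders', 1: b'invoices'}
    print(journal.image_at(25.0)[0])  # b'\x00\x00\x00\x00\x00\x00'
```

Plain replication would only ever offer the last image; the journal lets you pick the state just before the deletion.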
Storage virtualization systems have recently been gaining popularity. Besides their basic function, combining arrays from different vendors into a single resource pool, they can help a great deal in building a distributed data center. The essence of storage virtualization is an intermediate layer of controllers between the servers and the storage systems through which all traffic passes. Volumes from the storage systems are presented not directly to the servers but to these virtualizers, which in turn present them to the hosts. The virtualization layer can replicate data between different storage systems and often offers more advanced features as well: snapshots, tiered storage, etc. Yet it is the most basic function of virtualizers that matters most for DR. If we have two storage systems in different data centers connected by an optical link, we take volumes from each of them and build a "mirror" at the virtualizer level. As a result we get a single virtual volume spanning both data centers and visible to the servers. If these servers are virtual, Live Migration of virtual machines works across sites and can move workloads between data centers on the fly; users will not notice a thing.
The complete loss of a data center is then handled automatically by an ordinary HA cluster within a few minutes. Virtualization of geographically separated storage probably provides the shortest recovery time for the majority of applications. For DBMSs there is the unmatched Oracle RAC and its analogs, but their cost gives one pause. SAN virtualization is not cheap either: for small storage volumes the solution can cost less than $100K, but in most cases the price is higher. In my opinion, the most proven solution is IBM SAN Volume Controller (SVC), and the most technically advanced is EMC VPLEX.
By the way, if not all of your applications live in a virtual environment yet, it is worth designing the backup data center for them on virtual machines. First, it will be much cheaper; second, once you have done it for the backup site, you are not far from migrating the main systems under a hypervisor as well…
Competition in the data center outsourcing market has made renting rack space in a provider's data center more attractive than building and maintaining your own backup center. If you host a virtual infrastructure there, you will save considerably on rent. But outsourced data centers are no longer the cutting edge either. Better still is to build the backup infrastructure in the "cloud" from the start. Data synchronization with the main systems can then be provided by server-level replication (there is an excellent family of solutions, DoubleTake from Vision Solutions).
The last but very important point that must not be forgotten when designing a disaster-tolerant IT infrastructure is users' workstations. A database that is back up does not mean the business process is restored: users must be able to do their work. Even a fully equipped backup office with powered-off computers for key employees is not the ideal solution. A person's lost workstation may have held reference materials, macros and so on, without which full-fledged work is impossible. For the users most important to the company, moving to virtual desktops (VDI) looks reasonable. Then no data is stored on the workstation (whether an ordinary PC or a fashionable "thin" client); it serves only as a terminal for reaching Windows XP or Windows 7 running in a virtual machine in the data center. Access to such a desktop is easy to arrange from home or from any computer in the branch network. For example, if you have several buildings and one of them becomes inaccessible, key users can come to a neighboring office and sit at the workstations of "less key" employees. They simply log in, get to their virtual machine, and the company is back in business!
In conclusion, here are the main questions to ask when evaluating a DR solution:
- What failures does it protect against?
- What RPO/RTO/RCO does it provide?
- How much does it cost?
- How complex is it to maintain?
There are countless disaster-tolerant solutions, both off-the-shelf and ones you can practically build with your own hands. Please share in the comments what you use and how these solutions have served you. If any of the systems described above, or their analogs, have worked for you, tell us how soundly you slept while they protected your IT systems.
This article is a translation of the original post at habrahabr.ru/post/143877/