Investigating a performance problem and searching for a fix is something many of us know firsthand. There are plenty of tools for visualizing and parsing I/O statistics. Automated intelligent analysis delivered as an online service is now gaining momentum.
In this post I want to walk through the analysis of a storage performance problem using one such service (Mitrend) and suggest ways to solve it. I think this example makes an interesting case study that may be useful to a wide range of IT readers.
So, a customer asked EMC to look at the performance of a VNX5500 hybrid storage system deployed in their SAN. The VMware servers connected to the array run pretty much everything, from infrastructure services to file shares and database servers. The express assessment was prompted by complaints that applications deployed on the servers connected to the VNX were hanging.
For preprocessing I used the freely available Mitrend service.
A detailed description of the service is beyond the scope of this post, so I invite anyone who wants to learn more to visit its website and have a look.
Mitrend takes files with I/O statistics from the system under study as input, builds charts for the most commonly requested metrics, and performs preliminary analytics whose results are used below.
One example of such analytics is the heat map, which shows how heavily the different components of the system are loaded at different points in time. In essence it is a schematic picture of the system and its components, with a load chart drawn inside each of them. A quick overall look at it reveals potential trouble spots. In this case the trouble spot is clearly the write cache. Here is the diagram:
Write cache utilization stays at a high level, with regular spikes into the red zone (above 90%).
This is a typical symptom of performance problems, a kind of "high fever". We need to find out what leads to this situation and plan a solution.
The disks, processors, I/O ports and disk bus are not heavily loaded. So it is a little strange that the write cache is "clogged".
Now let's look at the disks in more detail. For clarity, I outlined disks of different types with colored lines and added a legend underneath. In the analysis file itself this is clear even without the legend.
Let's look more closely at what the disk subsystem consists of. There are three 200 GB flash drives: two are configured as FAST Cache with a usable capacity of 183 GB, and the third is a hot spare. In other words, a very reliable mirrored flash cache with a hot spare. Its efficiency can be seen in the diagram below:
The system also has five 900 GB disks that are not used at all. These are the system disks, and by habit people try not to touch them because of a belief that using them causes performance problems. My opinion is that they can be used if it is done thoughtfully. Performance problems usually have entirely different causes.
Usually, disks of different types are combined into hybrid pools so that the system itself decides where best to place data (using FAST VP). In this case, however, the specialists who performed the implementation did not trust it with that responsibility and strictly separated data by disk type. So the disks are divided into two separate groups, Pool 0 and Pool 1. This was done to isolate them from a performance standpoint, so that non-critical applications would not affect those that need speed.
Pool 0 (RAID5) is intended for critical application servers and consists of SAS 10k disks.
Pool 1 (RAID6) holds user file shares and assorted environments that are undemanding in terms of performance. It consists of NL-SAS 7.2k disks.
The report on disk groups shows that FAST Cache is disabled on Pool 1.
A conversation with the customer clarified that this was done to prioritize resources for the performance-critical Pool 0.
Interestingly, despite this the complaints come from the applications using Pool 0, whose disks are almost idle. Moreover, 80% of all read operations and 91% of all write operations in this pool are serviced by FAST Cache.
That is, despite the tremendous efficiency of FAST Cache, the applications have problems. Why? To move forward, let's look at the LUNs and how load is distributed across them.
It turns out that the three most heavily loaded LUNs reside on the slow NL-SAS disks in RAID6. Yet there are no complaints about them. Conversations with users showed that they are extremely happy with how fast their file servers became after the move to VNX.
The complaints concern the LUNs in Pool 0 (green in the diagram above). Specifically, these are the LUNs numbered 0 through 8, listed in the table below.
If we now look at LUN utilization, we see that the LUNs from Pool 0 are used rather lightly. In the diagram below the LUN numbers are shown along the horizontal axis, so it is easy to identify which LUNs are "ours". The most loaded of them is only 40% busy.
The system works "well on average". The average response time of the volumes is within 10 ms. This is the proverbial "average temperature across the hospital": fine overall, but it says nothing about individual patients.
Given how low the load on the problem LUNs is, we can conclude that the problems are caused by their competition for some shared resource.
Let's look at how the system cache works. Reads from cache are very effective.
Analysis of the write cache shows that its utilization stays within the configured 60-80% range, with periodic spikes to 90% and above. That is not very good.
Let's see how often the system has to resort to extreme measures to flush the cache down to an acceptable level.
This means that the system cannot keep up with write bursts. The system settings can be changed by shifting the upper and lower watermarks to more comfortable levels, 30-50% for example. But that is like bringing down a patient's fever: it should be done only after making a diagnosis and finding the root cause. Now let's look at the pools and try to understand what causes the forced cache flushes.
We see that forced flushes occur regularly on both disk pools. On Pool 0 they are extremely rare (isolated cases), whereas on Pool 1 the situation is severe (tens and hundreds of events per hour). But we are interested in Pool 0. Everything is fine there, isn't it?
We are now close to the answer. But before moving on, a brief digression is needed to explain how VNX manages the fill level of its write cache. It is shown below.
In normal mode the system keeps the cache between two boundaries, the High and Low watermarks.
The low watermark is the threshold below which the write cache is not flushed, because the data it holds may be needed for reads or may be overwritten. In addition, the VNX write cache by design holds on to a certain number of data blocks in the hope that they can be coalesced with other, adjacent blocks before being written to the physical disks. This reduces the load on the back end.
The high watermark is the threshold that triggers flushing of the write cache to disk. When High Watermark Flushing mode kicks in, data is flushed from cache to disk down to the low watermark, after which the system returns to standby mode.
We do not want the cache to fill to 100%, because then there would be no room for new writes. So the high watermark is kept at a safe distance from 100%. Usually 80% is fine, but it can be lower. Everything depends on the nature of the load.
If the cache does fill to 100%, the system switches from High Watermark Flush mode to a forced cache flush, or Forced Flush.
Forced Flush mode seriously affects all write operations on the array. New data is written with additional delay: to write a block to the array, room must first be freed from old data using the LRU (Least Recently Used) algorithm, and so on.
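The watermark logic described above can be illustrated with a toy simulation. This is only a rough sketch and not the actual VNX algorithm: the function names, the step-based model and all numbers (fill expressed in percent of cache size, the `flush_rate` values, the shape of the burst) are assumptions made up for illustration.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"              # no watermark flushing in progress
    HIGH_WATERMARK = "hwm"     # flushing from the high down to the low watermark
    FORCED = "forced"          # cache hit 100%, new writes wait for free space


def simulate(incoming, flush_rate, low=60.0, high=80.0):
    """Return [(fill_percent, Mode), ...], one entry per time step.

    incoming   -- writes arriving per step, in percent of cache size (hypothetical)
    flush_rate -- percent of cache the back-end disks can absorb per step (hypothetical)
    """
    fill, mode, trace = low, Mode.IDLE, []
    for writes in incoming:
        fill += writes
        if fill >= 100.0:
            # Cache is full: excess writes stall and a forced flush starts.
            fill, mode = 100.0, Mode.FORCED
        elif mode is Mode.FORCED or fill > high:
            mode = Mode.HIGH_WATERMARK
        if mode is not Mode.IDLE:
            fill = max(low, fill - flush_rate)
        trace.append((fill, mode))
        if fill <= low:
            mode = Mode.IDLE  # reached the low watermark, stop flushing
    return trace


def forced_steps(trace):
    """Count the steps during which the cache was in Forced Flush mode."""
    return sum(1 for _, mode in trace if mode is Mode.FORCED)


burst = [2] * 5 + [15] * 4 + [2] * 5   # quiet period, short write burst, quiet period
print("fast back end:", forced_steps(simulate(burst, flush_rate=12)))
print("slow back end:", forced_steps(simulate(burst, flush_rate=3)))
```

With these made-up numbers the same burst is absorbed by high watermark flushing alone when the back end drains quickly, while a slow back end lets the cache reach 100% and enter Forced Flush.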
Let's return to our situation. Obviously the slow Pool 1 is the weak link from the write cache's point of view. Data destined for the slow RAID6 disks lingers in cache longer than it should, and when it comes to a Forced Flush, it takes too long to reach the physical disks.
Note that Pool 0 uses FAST Cache, and most of its requests are serviced by flash drives. That holds until a Forced Flush begins, at which point the response time of flash starts to depend on how quickly data is flushed to NL-SAS. Very likely the weak link has been found. How correct this conclusion is should be shown by testing the hypothesis in practice.
How then can we explain the "suspect's" alibi, the low load on the NL-SAS disks? Load values are averages over an interval, and here the statistics collection interval was 10 minutes. It is quite possible that a short write burst occurred within that time and caused a brief "hang" of applications, while the load averaged over 10 minutes turned out not that high. Since we found where the largest number of Forced Flushes occurs, there can be no doubt about this pool's "guilt".
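To see how an interval average can hide a short burst, here is a small sketch. The per-second utilization figures are invented for illustration and do not come from the customer's system; the helper name `summarize` is likewise hypothetical.

```python
from statistics import mean


def summarize(samples):
    """Return (interval average, peak) for a list of utilization samples in percent."""
    return mean(samples), max(samples)


# Hypothetical per-second disk utilization over one 10-minute interval (600 samples):
# a steady 10% background load with a single 20-second burst at 100%.
samples = [10.0] * 600
for second in range(300, 320):
    samples[second] = 100.0

average, peak = summarize(samples)
print(f"10-minute average: {average:.1f}%, peak: {peak:.1f}%")
```

In this example the disks are saturated for 20 seconds, yet the 10-minute average comes out at only 13%, which is what a report with 10-minute granularity would show.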
What can be done about it?
The implementation itself contains planning errors, because an old approach to configuration was applied to a system with a new-generation architecture. Talking to the customer revealed that the reason lay in previously adopted standards that were not reviewed at planning time. But since the system is already in production and cannot be rebuilt, we have to look for solutions among online reconfigurations so as not to interrupt the applications.
I found at least three measures that can be taken either separately or together, complementing each other. I list them in order of implementation complexity.
- To let the array absorb periodic load bursts, lower the Low/High watermarks to 30/50 and see how well the bursts are handled. Ideally, write cache fill during bursts should not reach 90%.
- Enable FAST Cache on Pool 1. The most frequently updated data will move from the slow disks to SSD. Flushing the write cache to SSD is significantly faster, which will reduce the likelihood of a Forced Flush.
- Create a RAID10 group on the free 900 GB SAS 10k disks (4 of them) and migrate the most frequently updated LUNs from Pool 1 to it. Disable write cache on the newly created RAID group.
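The effect of the first measure can be estimated with simple arithmetic: the space between the high watermark and 100% is the headroom available to absorb a burst. The sketch below is a deliberate simplification that assumes the burst begins with the cache sitting at the high watermark and that both rates are constant; the cache size and the throughput figures are hypothetical.

```python
def seconds_until_full(cache_mb, high_pct, burst_mb_s, flush_mb_s):
    """Estimate how long a burst can last before the write cache reaches 100%.

    Assumes the cache is at the high watermark when the burst starts and that
    the incoming and flush rates stay constant (a simplification).
    """
    net_mb_s = burst_mb_s - flush_mb_s
    if net_mb_s <= 0:
        return float("inf")  # the back end keeps up, the cache never fills
    headroom_mb = cache_mb * (100 - high_pct) / 100
    return headroom_mb / net_mb_s


# Hypothetical figures: 1000 MB write cache, 300 MB/s burst, 100 MB/s flush rate.
for high in (80, 50):
    t = seconds_until_full(1000, high, burst_mb_s=300, flush_mb_s=100)
    print(f"high watermark {high}%: cache full after {t:.1f} s")
```

Under these assumptions, moving the high watermark from 80% to 50% extends the time before a Forced Flush from 1.0 s to 2.5 s, simply because the headroom grows from 20% to 50% of the cache.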
There are other optimization methods too, but I deliberately tried not to complicate this example in order to show one possible approach more compactly.
These measures are a good place to start, since all the listed changes are reversible and can be applied or rolled back in any order.
Further study of the system's behavior may yield other useful conclusions.
Intelligent storage systems have rich built-in functionality for both performance analysis and tuning. However, detailed manual analysis and tuning are quite labor-intensive tasks, which we have only touched on superficially in this post. Administrators usually do not have enough time to study and optimize an array thoroughly. With dynamic workloads and increasingly complex IT infrastructures, a move to a new level of development and automation is required.
A whole range of technologies at all levels is now being developed to address these tasks, from more convenient and faster performance analysis to new intelligent, self-optimizing systems.
Here are just a few examples:
1. Mitrend: automated analysis of IT infrastructure from different vendors, freely available to everyone.
2. Automated storage tiering and SSD caching: FAST VP and FAST Cache.
3. In next-generation systems, the adaptive cache of VNX2 with intelligent per-LUN auto-tuning of the data flush rate (see the whitepaper, p. 13).
This article is a translation of the original post at habrahabr.ru/post/271799/