
1 year, 10 months ago
EMC ViPR 2.1: data management for the "third platform"

ViPR: an element of the software-defined data center


ViPR does for storage roughly what VMware did for servers: it abstracts resources, groups them into pools, and automates the infrastructure. Through VMware APIs, the storage pools created in EMC ViPR are presented to VMware vSphere as a simple array. The ViPR controller also integrates with the VMware vStorage APIs for Storage Awareness (VASA), with vCOps, and with the VMware SDDC management and orchestration tools, vCloud Automation Center and vCenter Operations Manager. As a result, storage in ViPR can be managed as an independent entity that is presented as such in Microsoft and OpenStack virtual environments as well as within a VMware software-defined data center.

The main goals in developing EMC ViPR were to simplify and reduce the cost of managing existing heterogeneous storage infrastructures, and to provide a simple system for managing and accessing data in distributed clustered file systems (for example, those based on Hadoop clusters) and in cloud environments.
The basic functionality of EMC ViPR is freely available, at no cost and with no time limit on use. It is provided by two components: ViPR Controller and SolutionPack (M&R, monitoring and reporting). The ViPR Services components, which provide object, block, and HDFS access, are licensed separately. Deploying ViPR with the basic functionality takes just one VMware ESX virtual machine with two processors. ViPR SolutionPack requires four more virtual processors.

ViPR can be deployed not only on EMC hardware but also on servers from third-party vendors. The platform can be used both to manage storage infrastructure and to manage data placed on Hadoop clusters. In the latter case ViPR is additionally deployed as an agent on a separate node.

EMC ViPR is intended for cloud environments and service providers, and for enterprise customers who are moving to the IT as a Service model and building an internal cloud with web access. ViPR is built on a globally distributed architecture, which avoids moving large volumes of data over the network. The platform scales horizontally as the number of devices and the volume of data grow, eliminates the single point of failure, and makes it possible to build an environment with fully autonomous management and resource provisioning.

Management layer


The ViPR Controller software is intended to simplify management of storage infrastructure (including heterogeneous infrastructure) at both the local and global levels. Compared with a classic storage virtualizer, ViPR Controller is an out-of-band solution: it stores no data itself, no data stream passes through it, and it is in fact neither a storage system nor a storage virtualizer. ViPR Controller handles only the management (administration) of storage pools and the related services. Storage pools are created and then assigned to applications through a self-service portal.

ViPR Controller can significantly improve automation, in particular by reducing administrative overhead, because it virtualizes the underlying storage infrastructure. Storage management functions such as provisioning and migration are abstracted so that different storage arrays can be managed as a single pool of resources from one console.

Each pool is bound to the corresponding arrays, data protection features, technology settings, and so on. Each pool then corresponds to a defined service level.

Once created, storage pools are made available to applications through the self-service portal. There, users can browse the catalog of storage services and select the resources best suited to their tasks.
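The pool-and-catalog idea described here can be illustrated with a small Python sketch. It is a toy model only and not the ViPR API: the class names, the array names, and the "pick the array with the most free space" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """A physical storage array (names and fields are hypothetical)."""
    name: str
    access: str         # "block", "file" or "object"
    service_level: str  # e.g. "gold", "silver"
    free_gb: int


@dataclass
class VirtualPool:
    """Arrays that share an access type and a service level."""
    access: str
    service_level: str
    arrays: list = field(default_factory=list)

    def provision(self, size_gb):
        """Reserve size_gb on the member array with the most free space."""
        candidates = [a for a in self.arrays if a.free_gb >= size_gb]
        if not candidates:
            raise ValueError("no array in the pool can hold the request")
        target = max(candidates, key=lambda a: a.free_gb)
        target.free_gb -= size_gb
        return target.name


def build_catalog(arrays):
    """Group arrays into virtual pools keyed by (access, service_level)."""
    catalog = {}
    for array in arrays:
        key = (array.access, array.service_level)
        pool = catalog.setdefault(key, VirtualPool(*key))
        pool.arrays.append(array)
    return catalog


# Hypothetical inventory: two block arrays and one file array.
catalog = build_catalog([
    Array("array-a", "block", "gold", 500),
    Array("array-b", "block", "gold", 800),
    Array("array-c", "file", "silver", 1000),
])

# The consumer asks for a service (block, gold), not for a specific array.
print(catalog[("block", "gold")].provision(600))
```

The point of the sketch is the separation of concerns: the consumer sees only service levels, while the mapping to physical arrays stays inside the pool.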

For most traditional storage infrastructures, EMC ViPR provides only the management layer, which discovers storage, creates virtual storage pools, and assigns those pools to applications. All data traffic continues to be handled at the array level.

ViPR Controller supports all types of data access: block, file, object, and access to Hadoop clusters (storage based on the HDFS distributed file system), over protocols such as iSCSI, NFS, and REST. At the block level ViPR can work with SAN zoning (Brocade and Cisco switches).

The new ViPR Controller version adds support for commodity disks and for a large number of third-party storage arrays, either natively or through a pluggable OpenStack Cinder module. The full list of natively supported systems includes EMC solutions, Hitachi Data Systems (AMS 2100, USP-V, HUS VM and VSP), and NetApp FAS (7-mode only), as well as commodity storage systems. With OpenStack Cinder installed, ViPR also supports arrays from Dell, HP, and IBM. In effect, ViPR gained support for most of the storage systems available on the market: Dell EqualLogic, HDS (HUS), HP 3PAR (StoreServ), HP Lefthand (StoreVirtual), Huawei T/Dorado, IBM DS8000, IBM Storwize Family/SVC, IBM XIV, LVM (Reference), NetApp, Nexenta, Solaris (ZFS), SolidFire, Zadara Storage, and others. The single dashboard in ViPR 2.0 makes it possible to automate and standardize management of the existing storage infrastructure while adding support for new, policy-based infrastructure.

In addition, the new version adds support for commodity disks and for block data services based on EMC ScaleIO. ViPR Controller 2.0 also supports converged infrastructure based on VCE Vblock Systems.

Support for EMC arrays was expanded through improved integration with, and administration of, EMC VPLEX, EMC RecoverPoint, SRDF, and Data Domain. New capabilities include data management across several platforms, thanks to geo-scaling storage functions that provide data access, integrity, and protection. Multi-tenancy has been extended to support geographically distributed storage systems that scale to hundreds of clients across several sites in a single namespace. This means that ViPR object data services can now span several sites, offering geo-replication and geo-distribution for a new level of efficiency and performance. ViPR object data services also offer additional features for regulatory compliance, as well as support for the EMC Centera CAS (Content Addressable Storage) API. As a result, EMC Centera users can keep using the long-term retention features available in their applications on any platform supported by ViPR, without changing existing software.

Because ViPR Controller is freely available, it is fair to say that EMC is moving its SRM solutions toward greater openness and accessibility.

Event monitoring


ViPR SolutionPack (Reporting and Monitoring) offers a number of capabilities. For example, it can visualize storage resource utilization trends by service level and by virtual storage pool (VSP), broken down by virtual storage array (VSA). It can also visualize VSA usage trends by service level and storage resource usage trends by tenant. In addition, the system monitors ViPR events (warnings, errors, and so on) and presents them for a given time period.

Data layer


For traditional file- and block-based workloads, the EMC ViPR platform steps aside and leaves the data-layer role to the underlying array where the data resides. Most application workloads in a data center follow this model and, according to EMC, such workloads will grow by roughly 70% by 2016. At the same time, new application workloads are emerging that often work with huge data volumes and streams and serve thousands or millions of users. These are the so-called "third platform" technologies, tied to the spread of big data, mobile systems, social networks, and cloud services. They generate thousands of times more information than their predecessors and demand new storage infrastructures.

These new applications call for an entirely new architecture. The requirement for massive scalability forces a simpler approach to storage infrastructure: object storage. Access methods change as well: traditional protocols such as NFS and iSCSI give way to new ones such as HDFS, known as the storage foundation of Hadoop. To support these new architectures, the EMC ViPR platform implements object data services.

ViPR object data services provide access through HDFS and through REST-based APIs compatible with Amazon S3 and OpenStack Swift, so applications written against those APIs work with hardly any trouble. They also support existing EMC Atmos, EMC VNX, and EMC Isilon arrays as the persistence layer, along with third-party arrays and a solution based on commodity servers. At present the list includes about 20 storage system product lines.

ViPR "sees" objects as files, which delivers the performance characteristic of file access and avoids the latency inherent in object storage. In addition, the ViPR HDFS data service allows analytics to run in place across the entire heterogeneous storage environment. As a result, the extremely labor- and resource-intensive problem of managing heterogeneous storage environments goes away by itself.

The solution eases the transition to the "third platform" by enabling consistent, fully automated management of both classic and new storage infrastructures. It also integrates with the higher-level management and orchestration tools offered by VMware, OpenStack, and Microsoft, so the storage system fits seamlessly into data center workflows and business processes.


ViPR HDFS data service


Apache Hadoop is a set of utilities, libraries, and a framework for developing and running distributed programs on clusters of hundreds or thousands of nodes. It consists of several modules. Hadoop Distributed File System (HDFS) is a distributed file system that stores data on commodity servers and provides high aggregate throughput across the whole cluster. Hadoop YARN (Yet Another Resource Negotiator) is the resource management platform responsible for managing computing resources in clusters and allocating them to user applications. Hadoop MapReduce is a programming model for processing large volumes of data. The Hadoop ecosystem comprises Apache projects such as Pig, Hive, Sqoop, Flume, Oozie, Spark, HBase, and ZooKeeper, which add value to the project and make it easier to use.

Architecture of ViPR HDFS Data Service


The main HDFS components are the NameNode and the DataNode. The NameNode is the central element of HDFS and serves as the metadata server for the file system. HDFS is controlled through a dedicated NameNode server that hosts the file system index, and a secondary NameNode that can generate snapshots of the NameNode's in-memory structures, which helps prevent file system corruption and reduces data loss. In HDFS, individual files are split into blocks of fixed size. These blocks are stored on one or more DataNodes in the cluster, and the NameNode keeps track of where each block is located. DataNodes serve read and write requests as directed by the NameNode.
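As a rough illustration of fixed-size block splitting, the Python sketch below computes the block boundaries for a file of a given size. The 128 MiB block size is an assumption (it is the usual Hadoop 2.x default and is configurable), the function name is hypothetical, and this is not HDFS code.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed block size; configurable in a real cluster


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block.

    Every block has length block_size except possibly the last one,
    which holds whatever remains of the file.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300 MiB file needs three blocks: two full ones and a 44 MiB remainder.
for offset, length in split_into_blocks(300 * MIB):
    print(offset // MIB, length // MIB)
```

Each of these blocks is a separate entry that the NameNode has to track, which is why the number of files and blocks matters for its memory footprint.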

Apache Hadoop YARN is the cluster management technology that is the key feature of second-generation Hadoop. It is described as a highly scalable distributed operating system for big data applications. YARN combines a centralized resource manager, which arbitrates how applications use the resources of the Hadoop system, with per-node agents (NodeManagers) that monitor the processing of operations on individual cluster nodes. Separating HDFS from MapReduce by means of YARN makes the Hadoop environment better suited to production (transactional) applications that cannot wait for batch jobs to finish.

It should be noted that the native implementation of Hadoop has a number of limitations: limits on namespace size and cluster performance, low file system reliability, support for only one protocol, high storage costs, inefficient handling of small files, an outdated architecture, and a lack of enterprise-grade features and multi-tenancy. Let us look at each of these.

The HDFS namespace is managed by a single server and held in its memory. Its size is limited by the memory available on the NameNode, and file system performance is in turn limited by NameNode performance.

Before Hadoop 2.x, the NameNode was a single point of failure: a NameNode failure made the whole cluster unavailable. A High Availability option was recently added to HDFS, but it has limitations: the hot standby NameNode does not actively process requests, and supporting the standby NameNode requires additional hardware.
The native implementation of HDFS supports only one protocol for data access. Object and file access methods are not supported.

By default HDFS replicates every data block three times. This triples the raw capacity required (two extra copies of everything), which is extremely wasteful for use cases such as archiving.
HDFS is inefficient at handling large numbers of small files, because the metadata for every file in the file system must be held in the memory of a single server, the NameNode. For example, one million files consume about 3 GB of RAM.
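The figure quoted here works out to roughly 3 KB of NameNode memory per file. Taking that number at face value, and assuming memory grows linearly with file count (a simplification), a short Python estimate shows how quickly small files exhaust a single server. The function names and the sample values are illustrative assumptions.

```python
GB = 10**9
# Derived from the article's figure: about 3 GB per 1,000,000 files.
BYTES_PER_FILE = 3_000


def namenode_memory_gb(file_count):
    """Estimated NameNode memory (GB) needed to track file_count files."""
    return file_count * BYTES_PER_FILE / GB


def max_files(memory_gb):
    """Estimated number of files that fit in memory_gb of NameNode memory."""
    return memory_gb * GB // BYTES_PER_FILE


print(namenode_memory_gb(1_000_000))    # the article's example
print(namenode_memory_gb(100_000_000))  # a hundred times more files
print(max_files(64))                    # ceiling for a 64 GB NameNode
```

Under this linear model the ceiling depends on the number of files, not on the volume of data, so adding DataNodes does nothing to raise it.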

Because HDFS was designed nearly 10 years ago, it was built for unreliable consumer-grade magnetic hard drives and outdated network infrastructure (1GbE). It assumed that the bottleneck was the network rather than the disk, which no longer holds for modern infrastructures.

The HDFS file system lacks enterprise-grade features such as geo-distribution, disaster recovery, snapshots, deduplication, and parameter control. It also lacks multi-tenancy features that could guarantee isolation of data and performance for multiple companies. The result is a sprawl of isolated clusters with low utilization.

The ViPR HDFS data service removes these limitations and brings a Hadoop cluster as close as possible to enterprise requirements, regardless of whether it runs on file arrays, on ECS, or on both. It is a Hadoop Compatible File System (HCFS) that allows applications written for Hadoop 2.2 to run on file arrays and/or on EMC ECS (Elastic Cloud Storage) managed by ViPR Controller. When the ViPR HDFS client is installed on each node of the cluster, all requests on a node are handled by the ViPR HDFS data service client (a JAR), and the native components are no longer used. The ViPR HDFS data service improves the efficiency, performance, and reliability of Hadoop while providing a number of additional benefits.

The ECS appliance scales easily to petabyte and exabyte sizes. The ViPR data services/ECS architecture allows performance and storage capacity to be scaled independently of each other. ECS provides access within a single platform through several object APIs as well as HDFS, which makes life easier for application developers. Geo-distributed data protection keeps information safe through site failures and disasters. Because the data is strongly consistent, applications can access it through any ECS site, regardless of where the latest data was written.

Erasure coding provides storage efficiency without compromising data protection or access. The ECS storage engine implements a Reed-Solomon 12/4 erasure coding scheme, in which a block is split into 12 data fragments and 4 coding fragments. The resulting 16 fragments are distributed across the nodes at the local site. The storage engine can reconstruct the whole block from a minimum of 12 fragments. ViPR data services/ECS is also designed to handle large numbers of both small and large files. Using a technique called box-carting, ECS can execute a large number of user transactions concurrently with very little latency, which lets it support workloads with high transaction rates. ECS is also efficient with very large files: all nodes can process write requests for the same object at the same time, and each node can write to a set of three disks.
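The storage efficiency of the 12/4 scheme can be checked with simple arithmetic. The Python sketch below compares its raw-capacity overhead with the triple replication that HDFS uses by default. The function names and the 900 TB example are assumptions for illustration, and the code does not implement Reed-Solomon itself.

```python
from fractions import Fraction


def replication_overhead(copies):
    """Raw bytes stored per usable byte when keeping full copies."""
    return Fraction(copies)


def erasure_overhead(data_fragments, coding_fragments):
    """Raw bytes stored per usable byte for a data+coding fragment scheme."""
    return Fraction(data_fragments + coding_fragments, data_fragments)


def raw_tb(usable_tb, overhead):
    """Raw capacity (TB) needed to hold usable_tb of data."""
    return float(usable_tb * overhead)


ec = erasure_overhead(12, 4)    # 16 fragments stored for every 12 of data
rep = replication_overhead(3)   # HDFS default: three full copies

print(ec)                # overhead factor as a fraction
print(raw_tb(900, ec))   # raw TB for 900 TB with 12/4 erasure coding
print(raw_tb(900, rep))  # raw TB for 900 TB with triple replication
```

Under these assumptions, 900 TB of data needs about 1,200 TB of raw capacity with 12/4 coding versus 2,700 TB with three replicas, while a block still survives the loss of up to 4 of its 16 fragments.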

It is also worth noting that the ViPR HDFS data service makes it possible to choose several Hadoop vendors and integrate them so that they share services.

Expanded suites for software-defined storage

Significant changes also came to EMC's software-defined storage suites, ViPR SRM and Service Assurance (SA) Suite. The updated suites give a clearer view of complex environments with equipment from different suppliers. In addition to supporting a wide range of EMC and third-party platforms, ViPR SRM provides improved integration with ViPR and VPLEX, which gives organizations new options for allocating costs between business units when implementing the IT as a Service model beyond SLAs. Improvements to ViPR SRM also include expanded management of virtual storage from the ViPR console. SA Suite 9.3 adds integration with VMware NSX, which provides deep visibility into compute and network infrastructure across physical and virtual environments.

The ViPR product family implements two basic functions: virtualization of resource management and data access for cloud infrastructures. The solutions are aimed primarily at the large infrastructures of big data centers.

If the task is to automate the provisioning of disk resources for virtual machines and to track changes in the configuration of the environment, ViPR Controller is a solution that automates work with storage systems from any vendor. When a virtual machine is created in any virtualization environment, the disks it needs are provisioned along with it. Resource provisioning and usage can be monitored centrally with ViPR SRM, which also supports storage systems from many vendors. ViPR is built so that an environment of any size can be managed and monitored by parallelizing the work across multiple virtual machines. Improving data center efficiency no longer requires an expensive hardware virtualizer placed in the data path, which adds latency to the environment and slows down applications.

ViPR Data Services makes it possible to build managed cloud storage resources of any type (object, file, block) on ordinary servers with local disks. The solution scales impressively and was designed with the leasing of cloud storage resources in mind.

With ViPR Controller, this type of storage can be integrated into a data center where traditional storage systems from different vendors are already in use. Management virtualization creates a single consolidated pool for allocating resources from servers with local disks (DAS), storage area network (SAN) systems, and network-attached storage (NAS) systems.


For questions, contact emc@muk.ua.

It should be noted that EMC solutions are now available through the group of companies in Moldova, Georgia, Azerbaijan, and Kazakhstan: a distribution contract covering these countries was signed recently.

MUK-Service: all types of IT repair, including warranty and non-warranty repair, spare parts sales, and contract service

This article is a translation of the original post at habrahabr.ru/post/271037/