Building our own fault-tolerant cloud based on OpenNebula with Ceph, MariaDB Galera Cluster and OpenvSwitch

This time I would like to describe how to set all of this up, each component in turn, so that in the end you get your own scalable, fault-tolerant cloud based on OpenNebula. In this article I will cover the following points:


The topics are interesting on their own, so even if you do not care about the end goal, the setup of some individual component may still be useful to you.




Small introduction



So what will we get in the end?


After reading this article you will be able to deploy your own flexible, scalable and, on top of that, fault-tolerant cloud based on OpenNebula. What do these words mean? Let's break them down:

  • Scalable: you do not have to rebuild your cloud in order to grow it. At any time you can expand the storage simply by adding extra hard drives to the ceph pool, and you can just as easily configure a new node and add it to the cluster.

  • Flexible: the OpenNebula motto is "Flexible Enterprise Cloud Made Simple". OpenNebula is very easy to learn and at the same time very flexible: it is not hard to get to grips with, and if necessary you can write your own module for it, since the whole system is built to be simple and modular.

  • Fault-tolerant: if a hard drive fails, the cluster rebuilds itself to keep the required number of replicas of your data. If one node fails, you do not lose control, and the cloud keeps running until you fix the problem.



What do we need for this?



  • I will describe the installation on 3 nodes, but in your case there can be as many as you need.
    You can also install OpenNebula on a single node, but then you will not be able to build a failover cluster, and your installation from this manual will boil down to just installing OpenNebula and, for example, OpenvSwitch.
    By the way, you can also install CentOS on ZFS by following my previous article (not for production) and configure OpenNebula on ZFS using the ZFS driver I wrote.

  • Also, a 10G network is highly desirable for Ceph to perform well. Otherwise there is no point in setting up a separate cache pool, since the throughput of your network will be lower than the write speed of a pool built even from a single HDD.

  • CentOS 7 is installed on all nodes.

  • Each node also contains:
    • 2 x 256GB SSD, for the cache pool
    • 3 x 6TB HDD, for the main pool
    • Enough RAM for Ceph to run (roughly 1GB of RAM per 1TB of data)
    • The CPU and RAM resources needed for the cloud itself, which we will use to run virtual machines

  • I should also add that installing and running almost all of these components requires SELinux to be disabled. So it is disabled on all three nodes:
    sed -i s/SELINUX=enforcing/SELINUX=disabled/g /etc/selinux/config
    setenforce 0
    

  • The EPEL repository is installed on each node:
    yum install epel-release
    


Cluster scheme


To make sense of everything that follows, here is a rough scheme of our future cluster:
[cluster scheme diagram]

And a table with the characteristics of each node:
Hostname            kvm1              kvm2              kvm3
Network interface   enp1              enp1              enp1
IP address          192.168.100.201   192.168.100.202   192.168.100.203
HDD                 sdb               sdb               sdb
HDD                 sdc               sdc               sdc
HDD                 sdd               sdd               sdd
SSD                 sde               sde               sde
SSD                 sdf               sdf               sdf


That's it, now we can start the setup! Let's begin with creating the storage.



Ceph


Ceph has been written about on Habr more than once; there is, for example, an article that I recommend reading.

Here I will describe how to set up ceph to store RBD (RADOS Block Device) block devices for our virtual machines, and also how to set up a cache pool to speed up I/O operations on it.

So we have three nodes: kvm1, kvm2 and kvm3. Each of them has 2 SSDs and 3 HDDs. On these disks we will bring up two pools: the main one on the HDDs and a caching one on the SSDs. In the end we should get something like this:
[pool layout diagram]

Preparation



The installation is performed with ceph-deploy, which means it is done from a so-called admin server.
Any machine with ceph-deploy and an ssh client installed can serve as the admin server; in our case one of the nodes, kvm1, will play that role.

We need a ceph user on each node, and it must be able to ssh between nodes without a password and to run any command via sudo, also without a password.

On each node we execute:

sudo useradd -d /home/ceph -m ceph
sudo passwd ceph
sudo echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
sudo chmod 0440 /etc/sudoers.d/ceph


Now we log in to kvm1.

Now we will generate a key and copy it to the other nodes:
sudo ssh-keygen -f /home/ceph/.ssh/id_rsa
sudo cat /home/ceph/.ssh/id_rsa.pub >> /home/ceph/.ssh/authorized_keys
sudo chown -R ceph:users /home/ceph/.ssh
for i in 2 3; do
    scp /home/ceph/.ssh/* ceph@kvm$i:/home/ceph/.ssh/
done
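
Before moving on, it is worth verifying from kvm1 that both passwordless ssh and passwordless sudo really work for the ceph user, for example:

ssh ceph@kvm2 'sudo whoami'   # should print "root" without asking for any password
ssh ceph@kvm3 'sudo whoami'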


Installation


We add the key, install the ceph repository, and install ceph-deploy from it:

sudo rpm --import 'https://download.ceph.com/keys/release.asc'
sudo yum -y localinstall http://download.ceph.com/rpm/el7/noarch/ceph-release-1-1.el7.noarch.rpm
sudo yum install -y ceph-deploy


OK, now we switch to the ceph user and create a folder where we will keep the ceph configs and keys.
sudo su - ceph
mkdir ceph-admin
cd ceph-admin


Now we will install ceph on all our nodes:
ceph-deploy install kvm{1,2,3}


Now we will create a cluster:
ceph-deploy new kvm{1,2,3}


Let's create the monitors and gather the keys:
ceph-deploy mon create kvm{1,2,3}
ceph-deploy gatherkeys kvm{1,2,3}


Now, following our initial scheme, we will prepare the disks and start the OSD daemons:
# Wipe the disks
ceph-deploy disk zap kvm{1,2,3}:sd{b,c,d,e,f} 
# SSD-disks
ceph-deploy osd create kvm{1,2,3}:sd{e,f}
# HDD-disks
ceph-deploy osd create kvm{1,2,3}:sd{b,c,d}


Let's see what we have got:
ceph osd tree
output
ID WEIGHT  TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 3.00000 root default                                    
-2 1.00000     host kvm1                                               
   0 1.00000         osd.0                 up  1.00000          1.00000 
   1 1.00000         osd.1                 up  1.00000          1.00000 
   6 1.00000         osd.6                 up  1.00000          1.00000 
   7 1.00000         osd.7                 up  1.00000          1.00000 
   8 1.00000         osd.8                 up  1.00000          1.00000 
-3 1.00000     host kvm2                                           
   2 1.00000         osd.2                 up  1.00000          1.00000 
   3 1.00000         osd.3                 up  1.00000          1.00000 
   9 1.00000         osd.9                 up  1.00000          1.00000 
  10 1.00000         osd.10                up  1.00000          1.00000 
  11 1.00000         osd.11                up  1.00000          1.00000 
-4 1.00000     host kvm3                                     
   4 1.00000         osd.4                 up  1.00000          1.00000 
   5 1.00000         osd.5                 up  1.00000          1.00000 
  12 1.00000         osd.12                up  1.00000          1.00000 
  13 1.00000         osd.13                up  1.00000          1.00000 
  14 1.00000         osd.14                up  1.00000          1.00000 


We check the cluster status:
ceph -s


Setting up the cache pool


So now we have a fully functional ceph cluster.
Let's configure a caching pool for it. To start with, we need to edit the CRUSH map to define the rules by which the data will be distributed: the cache pool should live only on the SSD disks, and the main pool only on the HDDs.

First we need to stop ceph from updating the map automatically; we add to ceph.conf:
osd_crush_update_on_start = false


And push it out to our nodes:
ceph-deploy admin kvm{1,2,3}


Let's dump the current map and decompile it into text format:
ceph osd getcrushmap -o map.running
crushtool -d map.running -o map.decompile


Let's bring it to the following form:

map.decompile
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host kvm1-ssd-cache {
	id -2		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
}
host kvm2-ssd-cache {
	id -3		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
        item osd.2 weight 1.000
        item osd.3 weight 1.000
}
host kvm3-ssd-cache {
	id -4		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
        item osd.4 weight 1.000
        item osd.5 weight 1.000
}
host kvm1-hdd {
	id -102		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
        item osd.6 weight 1.000
        item osd.7 weight 1.000
        item osd.8 weight 1.000
}
host kvm2-hdd {
	id -103		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
        item osd.9 weight 1.000
        item osd.10 weight 1.000
        item osd.11 weight 1.000
}
host kvm3-hdd {
	id -104		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
        item osd.12 weight 1.000
        item osd.13 weight 1.000
        item osd.14 weight 1.000
}
root ssd-cache {
	id -1		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
	item kvm1-ssd-cache weight 1.000
	item kvm2-ssd-cache weight 1.000
	item kvm3-ssd-cache weight 1.000
}
root hdd {
	id -100		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
	item kvm1-hdd weight 1.000
	item kvm2-hdd weight 1.000
	item kvm3-hdd weight 1.000
}

# rules
rule ssd-cache {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take ssd-cache
	step chooseleaf firstn 0 type host
	step emit
}

rule hdd {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take hdd
	step chooseleaf firstn 0 type host
	step emit
}# end crush map


You may notice that instead of one root I have made two, one for hdd and one for ssd; the same goes for the rules and for each host.
When editing the map by hand, be extremely careful and do not mix up the ids!

Now we will compile it and load it back:
crushtool -c map.decompile -o map.new
ceph osd setcrushmap -i map.new


Let's see what we have got:
ceph osd tree
output
ID   WEIGHT  TYPE NAME                UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-100 3.00000 root hdd                                                   
-102 1.00000     host kvm1-hdd                                         
   6 1.00000         osd.6                 up  1.00000          1.00000 
   7 1.00000         osd.7                 up  1.00000          1.00000 
   8 1.00000         osd.8                 up  1.00000          1.00000 
-103 1.00000     host kvm2-hdd                                         
   9 1.00000         osd.9                 up  1.00000          1.00000 
  10 1.00000         osd.10                up  1.00000          1.00000 
  11 1.00000         osd.11                up  1.00000          1.00000 
-104 1.00000     host kvm3-hdd                                         
  12 1.00000         osd.12                up  1.00000          1.00000 
  13 1.00000         osd.13                up  1.00000          1.00000 
  14 1.00000         osd.14                up  1.00000          1.00000 
  -1 3.00000 root ssd-cache                                             
  -2 1.00000     host kvm1-ssd-cache                                   
   0 1.00000         osd.0                 up  1.00000          1.00000 
   1 1.00000         osd.1                 up  1.00000          1.00000 
  -3 1.00000     host kvm2-ssd-cache                                   
   2 1.00000         osd.2                 up  1.00000          1.00000 
   3 1.00000         osd.3                 up  1.00000          1.00000 
  -4 1.00000     host kvm3-ssd-cache                                   
   4 1.00000         osd.4                 up  1.00000          1.00000 
   5 1.00000         osd.5                 up  1.00000          1.00000 


Now we will describe our configuration in ceph.conf, in particular the entries for the monitors and OSDs.

I ended up with the following config:

ceph.conf
[global]
fsid = 586df1be-40c5-4389-99ab-342bd78566c3
mon_initial_members = kvm1, kvm2, kvm3
mon_host = 192.168.100.201,192.168.100.202,192.168.100.203
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_crush_update_on_start = false

[mon.kvm1]
host = kvm1
mon_addr = 192.168.100.201:6789
mon-clock-drift-allowed = 0.5

[mon.kvm2]
host = kvm2
mon_addr = 192.168.100.202:6789
mon-clock-drift-allowed = 0.5

[mon.kvm3]
host = kvm3
mon_addr = 192.168.100.203:6789
mon-clock-drift-allowed = 0.5

[client.admin]
keyring = /etc/ceph/ceph.client.admin.keyring

[osd.0]
host = kvm1

[osd.1]
host = kvm1

[osd.2]
host = kvm2

[osd.3]
host = kvm2

[osd.4]
host = kvm3

[osd.5]
host = kvm3

[osd.6]
host = kvm1

[osd.7]
host = kvm1

[osd.8]
host = kvm1

[osd.9]
host = kvm2

[osd.10]
host = kvm2

[osd.11]
host = kvm2

[osd.12]
host = kvm3

[osd.13]
host = kvm3

[osd.14]
host = kvm3


And distribute it to our hosts:
ceph-deploy admin kvm{1,2,3}


We check the cluster status:
ceph -s


Creating the pools


To create the pools, we need to calculate the right number of PGs (Placement Groups); the CRUSH algorithm needs them. The formula is:
            (OSDs * 100)
Total PGs = ------------
              Replicas
rounded up to the nearest power of 2.

That is, in our case, with just one pool on SSD and one pool on HDD and a replica count of 2, the calculation works out as follows:
HDD pool pg = 9*100/2 = 450 [rounded up] = 512
SSD pool pg = 6*100/2 = 300 [rounded up] = 512

If several pools are planned for the same root, the resulting value should be divided by the number of pools.
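
For convenience, the same calculation can be expressed as a small shell function (a throwaway helper, not part of any tool; the numbers are the ones used in this article):

pg_count() {  # usage: pg_count <number of OSDs> <replica count>
    local raw=$(( $1 * 100 / $2 )) pg=1
    while [ $pg -lt $raw ]; do pg=$(( pg * 2 )); done   # round up to the nearest power of 2
    echo $pg
}
pg_count 9 2   # HDD pool -> 512
pg_count 6 2   # SSD pool -> 512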

We create the pools and set size 2 for them (the replica count: data written to the pool will be duplicated on different disks) and min_size 1 (the minimum number of replicas that must be written before a write operation is acknowledged).

ceph osd pool create ssd-cache 512
ceph osd pool set ssd-cache min_size 1
ceph osd pool set ssd-cache size 2
ceph osd pool create one 512
ceph osd pool set one min_size 1
ceph osd pool set one size 2
The one pool, obviously, will be used to store OpenNebula images.

We assign the CRUSH rules to our pools:
ceph osd pool set ssd-cache crush_ruleset 0
ceph osd pool set one crush_ruleset 1
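
Just to be sure the rules were picked up, you can query them back:
ceph osd pool get ssd-cache crush_ruleset
ceph osd pool get one crush_ruleset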


We configure writes to the one pool to go through our cache pool:
ceph osd tier add one ssd-cache
ceph osd tier cache-mode ssd-cache writeback
ceph osd tier set-overlay one ssd-cache


Ceph uses 2 main operations for cleaning the cache:
  • Flushing: the agent identifies objects that have cooled down and writes them back to the storage pool
  • Evicting: the agent selects objects, starting with the oldest, and evicts them from the cache into the storage pool

A so-called Bloom filter is used to determine which objects are "hot".

We configure our cache settings:
# Enable the bloom filter
ceph osd pool set ssd-cache hit_set_type bloom
# How many hits an object needs before it is considered hot
ceph osd pool set ssd-cache hit_set_count 4
# How long an object is considered hot
ceph osd pool set ssd-cache hit_set_period 1200


We also configure:
# How many bytes the cache may hold before the cache cleaning mechanism kicks in
ceph osd pool set ssd-cache target_max_bytes 200000000000
# Fill ratio of the cache at which the flushing operation starts
ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4
# Fill ratio of the cache at which the evicting operation starts
ceph osd pool set ssd-cache cache_target_full_ratio 0.8 
# Minimum amount of time before an object may be flushed
ceph osd pool set ssd-cache cache_min_flush_age 300 
# Minimum amount of time before an object may be evicted
ceph osd pool set ssd-cache cache_min_evict_age 300 
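
Any of these values can be read back later to double-check them, for example:
ceph osd pool get ssd-cache hit_set_period
ceph osd pool get ssd-cache target_max_bytes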


Keys



Let's create a user called one and generate a key for it:
ceph auth get-or-create client.one mon 'allow r' osd 'allow rw pool=ssd-cache' -o /etc/ceph/ceph.client.one.keyring

Since it will never write directly to the main pool, we grant it rights only on the ssd-cache pool.
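
Later on, libvirt on each node will need this key as a secret when attaching Ceph disks; the OpenNebula datastore section below refers to it as CEPH_SECRET. A minimal sketch of registering it, assuming you keep the client.one user created above (the secret.xml file name and the generated UUID are just examples; reuse your UUID as CEPH_SECRET):

UUID=$(uuidgen)   # reuse this value as CEPH_SECRET in the OpenNebula datastore below
cat << EOT > secret.xml
<secret ephemeral='no' private='no'>
  <uuid>$UUID</uuid>
  <usage type='ceph'><name>client.one secret</name></usage>
</secret>
EOT
virsh secret-define --file secret.xml
virsh secret-set-value --secret $UUID --base64 $(ceph auth get-key client.one)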

With that, the Ceph setup can be considered complete.



MariaDB Galera Cluster



Now we will configure a fault-tolerant MySQL database on our nodes, in which we will also store the configuration of our data center.
MariaDB Galera Cluster is a MariaDB cluster with master-master replication that uses the galera library for synchronization.
On top of that, it is quite easy to set up:

Installation



On all nodes:
Let's add the repository:
cat << EOT > /etc/yum.repos.d/mariadb.repo
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
EOT


And install the server:
yum install MariaDB-Galera-server MariaDB-client rsync galera


Let's start the daemon and perform the initial setup:
service mysql start
chkconfig mysql on
mysql_secure_installation


We configure the cluster:



On each node we will create a user for replication:
mysql -p
GRANT USAGE ON *.* to sst_user@'%' IDENTIFIED BY 'PASS';
GRANT ALL PRIVILEGES on *.* to sst_user@'%';
FLUSH PRIVILEGES;
exit
service mysql stop


Let's bring the /etc/my.cnf config to the following form:
For kvm1:
cat << EOT > /etc/my.cnf
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.100.202,192.168.100.203"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='192.168.100.201' # setup real node ip
wsrep_node_name='kvm1' #  setup real node name
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:PASS
EOT


By analogy with kvm1, we write the configs for the other nodes:
For kvm2
cat << EOT > /etc/my.cnf
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.100.201,192.168.100.203"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='192.168.100.202' # setup real node ip
wsrep_node_name='kvm2' #  setup real node name
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:PASS
EOT
For kvm3
cat << EOT > /etc/my.cnf
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.100.201,192.168.100.202"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='192.168.100.203' # setup real node ip
wsrep_node_name='kvm3' #  setup real node name
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:PASS
EOT


Ready. Time to start our cluster; on the first node we run:
/etc/init.d/mysql start --wsrep-new-cluster


On the other nodes:
/etc/init.d/mysql start


Let's check our cluster; on each node we run:
mysql -p
SHOW STATUS LIKE 'wsrep%';

Example output
+------------------------------+----------------------------------------------------------------+
| Variable_name                | Value                                                          |
+------------------------------+----------------------------------------------------------------+
| wsrep_local_state_uuid       | 5b32cb2c-39df-11e5-b26b-6e85dd52910e                           |
| wsrep_protocol_version       | 7                                                              |
| wsrep_last_committed         | 4200745                                                        |
| wsrep_replicated             | 978815                                                         |
| wsrep_replicated_bytes       | 4842987031                                                     |
| wsrep_repl_keys              | 3294690                                                        |
| wsrep_repl_keys_bytes        | 48870270                                                       |
| wsrep_repl_data_bytes        | 4717590703                                                     |
| wsrep_repl_other_bytes       | 0                                                              |
| wsrep_received               | 7785                                                           |
| wsrep_received_bytes         | 62814                                                          |
| wsrep_local_commits          | 978814                                                         |
| wsrep_local_cert_failures    | 0                                                              |
| wsrep_local_replays          | 0                                                              |
| wsrep_local_send_queue       | 0                                                              |
| wsrep_local_send_queue_max   | 2                                                              |
| wsrep_local_send_queue_min   | 0                                                              |
| wsrep_local_send_queue_avg   | 0.002781                                                       |
| wsrep_local_recv_queue       | 0                                                              |
| wsrep_local_recv_queue_max   | 2                                                              |
| wsrep_local_recv_queue_min   | 0                                                              |
| wsrep_local_recv_queue_avg   | 0.002954                                                       |
| wsrep_local_cached_downto    | 4174040                                                        |
| wsrep_flow_control_paused_ns | 0                                                              |
| wsrep_flow_control_paused    | 0.000000                                                       |
| wsrep_flow_control_sent      | 0                                                              |
| wsrep_flow_control_recv      | 0                                                              |
| wsrep_cert_deps_distance     | 40.254320                                                      |
| wsrep_apply_oooe             | 0.004932                                                       |
| wsrep_apply_oool             | 0.000000                                                       |
| wsrep_apply_window           | 1.004932                                                       |
| wsrep_commit_oooe            | 0.000000                                                       |
| wsrep_commit_oool            | 0.000000                                                       |
| wsrep_commit_window          | 1.000000                                                       |
| wsrep_local_state            | 4                                                              |
| wsrep_local_state_comment    | Synced                                                         |
| wsrep_cert_index_size        | 43                                                             |
| wsrep_causal_reads           | 0                                                              |
| wsrep_cert_interval          | 0.023937                                                       |
| wsrep_incoming_addresses     | 192.168.100.202:3306,192.168.100.201:3306,192.168.100.203:3306 |
| wsrep_evs_delayed            |                                                                |
| wsrep_evs_evict_list         |                                                                |
| wsrep_evs_repl_latency       | 0/0/0/0/0                                                      |
| wsrep_evs_state              | OPERATIONAL                                                    |
| wsrep_gcomm_uuid             | 91e4b4f9-62cc-11e5-9422-2b8fd270e336                           |
| wsrep_cluster_conf_id        | 0                                                              |
| wsrep_cluster_size           | 3                                                              |
| wsrep_cluster_state_uuid     | 5b32cb2c-39df-11e5-b26b-6e85dd52910e                           |
| wsrep_cluster_status         | Primary                                                        |
| wsrep_connected              | ON                                                             |
| wsrep_local_bf_aborts        | 0                                                              |
| wsrep_local_index            | 1                                                              |
| wsrep_provider_name          | Galera                                                         |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                              |
| wsrep_provider_version       | 25.3.9(r3387)                                                  |
| wsrep_ready                  | ON                                                             |
| wsrep_thread_count           | 2                                                              |
+------------------------------+----------------------------------------------------------------+

That's all. Simple, isn't it?

Attention: if all your nodes are shut down at the same time, MySQL will not come up by itself. You will have to pick the node with the most up-to-date data and start the daemon on it with the --wsrep-new-cluster option, so that the other nodes can replicate the data from it.
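
A rough sketch of how to do this (assuming a Galera version that writes grastate.dat; the exact fields may differ slightly between versions):

# On every node, look at the last committed transaction number:
cat /var/lib/mysql/grastate.dat   # the node with the highest seqno holds the freshest data
# On that node only, bootstrap a new cluster:
/etc/init.d/mysql start --wsrep-new-cluster
# Then start MySQL normally on the remaining nodes:
/etc/init.d/mysql start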



OpenvSwitch



There is an article about OpenvSwitch that I recommend reading.

Installation


Since OpenvSwitch is not in the standard CentOS packages, we will compile and install it ourselves.

First, let's install all the necessary dependencies:
yum -y install wget openssl-devel gcc make python-devel openssl-devel kernel-devel graphviz kernel-debug-devel autoconf automake rpm-build redhat-rpm-config libtool


To compile OpenvSwitch we will create an ovs user and log in as it; the following operations are performed under that user.
adduser ovs
su - ovs


Let's download the sources, disable openvswitch-kmod as recommended by n40lab, and compile them.
mkdir -p ~/rpmbuild/SOURCES
wget http://openvswitch.org/releases/openvswitch-2.3.2.tar.gz
cp openvswitch-2.3.2.tar.gz ~/rpmbuild/SOURCES/
tar xfz openvswitch-2.3.2.tar.gz
sed 's/openvswitch-kmod, //g' openvswitch-2.3.2/rhel/openvswitch.spec > openvswitch-2.3.2/rhel/openvswitch_no_kmod.spec
rpmbuild -bb --nocheck ~/openvswitch-2.3.2/rhel/openvswitch_no_kmod.spec
exit


Let's create a folder for the configs:
mkdir /etc/openvswitch


Let's install the RPM package we have built:
yum localinstall /home/ovs/rpmbuild/RPMS/x86_64/openvswitch-2.3.2-1.x86_64.rpm


Let's start the daemon:
systemctl start openvswitch.service
chkconfig openvswitch on


Creating the bridge



Now we will configure the network bridge to which the ports will be added:

ovs-vsctl add-br ovs-br0
ovs-vsctl add-port ovs-br0 enp1
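
At this point it is worth checking that the bridge and the port were actually created:
ovs-vsctl show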


Let's fix the configs of our interfaces so they come up automatically:

/etc/sysconfig/network-scripts/ifcfg-enp1
DEVICE="enp1"
NM_CONTROLLED="yes"
ONBOOT="yes"
IPV6INIT=no
TYPE="OVSPort"
DEVICETYPE="OVSIntPort"
OVS_BRIDGE=ovs-br0


/etc/sysconfig/network-scripts/ifcfg-ovs-br0

For kvm1:
DEVICE="ovs-br0"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="OVSBridge"
BOOTPROTO="static"
IPADDR="192.168.100.201"
NETMASK="255.255.255.0"
HOTPLUG="no"

For kvm2
DEVICE="ovs-br0"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="OVSBridge"
BOOTPROTO="static"
IPADDR="192.168.100.202"
NETMASK="255.255.255.0"
HOTPLUG="no"

For kvm3
DEVICE="ovs-br0"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="OVSBridge"
BOOTPROTO="static"
IPADDR="192.168.100.203"
NETMASK="255.255.255.0"
HOTPLUG="no"

Let's restart the network; everything should come up:
systemctl restart network




OpenNebula



Installation


And now it is time to install OpenNebula.

On all nodes:

Let's add the OpenNebula repository:
cat << EOT > /etc/yum.repos.d/opennebula.repo
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/4.14/CentOS/7/x86_64/
enabled=1
gpgcheck=0
EOT


Let's install the OpenNebula server, its Sunstone web interface and the node package:
yum install -y opennebula-server opennebula-sunstone opennebula-node-kvm 


Let's run the interactive script that installs the necessary gems on our system:
 /usr/share/one/install_gems


Node setup



Each node now has a oneadmin user; it must be allowed to ssh between nodes without a password and to execute any command via sudo without a password, just as we did with the ceph user.

On each node we execute:

passwd oneadmin
sudo echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/one
sudo chmod 0440 /etc/sudoers.d/one


Let's start the Libvirt and MessageBus services:
systemctl start messagebus.service libvirtd.service
systemctl enable messagebus.service libvirtd.service


Now we log in to kvm1.

Now we will generate a key and copy it to the other nodes:
sudo ssh-keygen -f /var/lib/one/.ssh/id_rsa
sudo cat /var/lib/one/.ssh/id_rsa.pub >> /var/lib/one/.ssh/authorized_keys
sudo chown -R oneadmin: /var/lib/one/.ssh
for i in 2 3; do
    scp /var/lib/one/.ssh/* oneadmin@kvm$i:/var/lib/one/.ssh/
done


On each node we execute:

Let's allow Sunstone to listen on any IP, not just localhost:
sed -i 's/host:\ 127\.0\.0\.1/host:\ 0\.0\.0\.0/g' /etc/one/sunstone-server.conf


DB setup



We log in to kvm1.

Let's create a database for OpenNebula:
mysql -p
create database opennebula;
GRANT USAGE ON opennebula.* to oneadmin@'%' IDENTIFIED BY 'PASS';
GRANT ALL PRIVILEGES on opennebula.* to oneadmin@'%';
FLUSH PRIVILEGES;


Now we will migrate the database from sqlite to mysql:

Let's download the sqlite3-to-mysql.py script:
curl -O http://www.redmine.org/attachments/download/6239/sqlite3-to-mysql.py


Let's convert the database and import it:
sqlite3 /var/lib/one/one.db .dump | ./sqlite3-to-mysql.py > mysql.sql   
mysql -u oneadmin -pPASS opennebula < mysql.sql
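
A quick sanity check that the data actually landed in MySQL (using the oneadmin DB user created above):
mysql -u oneadmin -pPASS opennebula -e 'SHOW TABLES;'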


Now we tell OpenNebula to connect to our DB by editing the /etc/one/oned.conf config:

Let's replace
DB = [ backend = "sqlite" ]

with:
DB = [ backend = "mysql",
     server  = "localhost",
     port    = 0,
     user    = "oneadmin",
     passwd  = "PASS",
     db_name = "opennebula" ]


Let's copy it to the other nodes:
for i in 2 3; do
    scp /etc/one/oned.conf one@kvm$i:/etc/one/oned.conf
done


We also have to copy the oneadmin authorization key to the other nodes, since all management of the OpenNebula cluster is done under it.
for i in 2 3; do
    scp /var/lib/one/.one/one_auth one@kvm$i:/var/lib/one/.one/one_auth
done


Check


Now, on each node, let's try to start the OpenNebula services and check whether they work:

We start them:
systemctl start opennebula opennebula-sunstone

  • We check that the web interface answers at http://node:9869
  • We check the logs for errors (/var/log/one/oned.log, /var/log/one/sched.log, /var/log/one/sunstone.log); a quick CLI check is also shown below.
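
For example, a minimal sanity check from the command line, run as the oneadmin user:
su - oneadmin -c 'onevm list'
su - oneadmin -c 'onehost list'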

If everything is fine, we stop them again:
systemctl stop opennebula opennebula-sunstone




Setting up the failover cluster



It is time to configure the OpenNebula HA cluster.
For reasons unclear to me, pcs conflicts with OpenNebula, so we will use pacemaker, corosync and crmsh instead.

On all nodes:

Let's disable autostart of the OpenNebula daemons:
systemctl disable opennebula opennebula-sunstone opennebula-novnc


Let's add a repository:
cat << EOT > /etc/yum.repos.d/network\:ha-clustering\:Stable.repo
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-7)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/repodata/repomd.xml.key
enabled=1
EOT


Let's install the necessary packages:
yum install corosync pacemaker crmsh resource-agents -y


On kvm1:

Let's edit /etc/corosync/corosync.conf and bring it to the following form:
corosync.conf
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.100.0
                mcastaddr: 226.94.1.1
                mcastport: 4000
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
quorum {
        provider: corosync_votequorum
}
service {
name: pacemaker
ver: 1
}
nodelist {
        node {
                ring0_addr: kvm1
                nodeid: 1
        }
        node {
                ring0_addr: kvm2
                nodeid: 2
        }
        node {
                ring0_addr: kvm3
                nodeid: 3
        }
}


Let's generate keys:
cd /etc/corosync
corosync-keygen


Let's copy the config and keys to the other nodes:
for i in 2 3; do
    scp /etc/corosync/{corosync.conf,authkey} one@kvm$i:/etc/corosync
done


And start the HA services:
systemctl start pacemaker corosync
systemctl enable pacemaker corosync


Let's check:
crm status

Output
Last updated: Mon Nov 16 15:02:03 2015
Last change: Fri Sep 25 16:36:31 2015
Stack: corosync
Current DC: kvm1 (1) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
0 Resources configured
Online: [ kvm1 kvm2 kvm3 ]

Let's disable STONITH (the mechanism for fencing a faulty node):
crm configure property stonith-enabled=false

If you have only two nodes, disable quorum to avoid a split-brain situation:
crm configure property no-quorum-policy=stop


Now we will create the resources:
crm
configure
primitive ClusterIP ocf:heartbeat:IPaddr2 params ip="192.168.100.200" cidr_netmask="24" op monitor interval="30s"
primitive opennebula_p systemd:opennebula \
op monitor interval=60s timeout=20s \
op start interval="0" timeout="120s" \
op stop  interval="0" timeout="120s" 
primitive opennebula-sunstone_p systemd:opennebula-sunstone \
op monitor interval=60s timeout=20s \
op start interval="0" timeout="120s" \
op stop  interval="0" timeout="120s" 
primitive opennebula-novnc_p systemd:opennebula-novnc \
op monitor interval=60s timeout=20s \
op start interval="0" timeout="120s" \
op stop  interval="0" timeout="120s" 
group Opennebula_HA ClusterIP opennebula_p opennebula-sunstone_p  opennebula-novnc_p
exit


With these commands we created a virtual IP (192.168.100.200), added our three services to the HA cluster and combined them into the Opennebula_HA group.

Let's check:
crm status

Output
Last updated: Mon Nov 16 15:02:03 2015
Last change: Fri Sep 25 16:36:31 2015
Stack: corosync
Current DC: kvm1 (1) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
4 Resources configured


Online: [ kvm1 kvm2 kvm3 ]

 Resource Group: Opennebula_HA
     ClusterIP	(ocf::heartbeat:IPaddr2):	Started kvm1 
     opennebula_p	(systemd:opennebula):	Started kvm1 
     opennebula-sunstone_p	(systemd:opennebula-sunstone):	Started kvm1 
     opennebula-novnc_p	(systemd:opennebula-novnc):	Started kvm1
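
If you like, you can also simulate a node failure to make sure the group really migrates (a quick optional test, assuming crmsh as installed above):
crm node standby kvm1   # the Opennebula_HA group and the virtual IP should move to another node
crm status
crm node online kvm1    # bring the node back into the cluster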




OpenNebula setup


The installation is complete; all that is left is to add our nodes, storage and virtual networks to a cluster.

The web interface will always be available at http://192.168.100.200:9869
login: oneadmin
the password is in /var/lib/one/.one/one_auth

  • Create a cluster
  • Add a node
  • Add your virtual network:
    cat << EOT > ovs.net
    NAME="main"
    BRIDGE="ovs-br0"
    DNS="192.168.100.1"
    GATEWAY="192.168.100.1"
    NETWORK_ADDRESS="192.168.100.0"
    NETWORK_MASK="255.255.255.0"
    VLAN="NO"
    VLAN_ID=""
    EOT
    
    onevnet create ovs.net
    

  • Add your Ceph storage:
    cat << EOT > rbd.conf
    NAME = "cephds"
    DS_MAD = ceph
    TM_MAD = ceph
    DISK_TYPE = RBD
    POOL_NAME = one
    BRIDGE_LIST ="192.168.100.201 192.168.100.202 192.168.100.203"
    CEPH_HOST ="192.168.100.201:6789 192.168.100.202:6789 192.168.100.203:6789"
    CEPH_SECRET ="cfb34c4b-d95c-4abc-a4cc-f8a2ae532cb5" #uuid key, looked at libvirt authentication for ceph
    CEPH_USER = oneadmin
    
    onedatastore create rbd.conf

  • Add the node, the network and your datastores to the created cluster via the web interface


HA VM


Now, if you also want High Availability for your virtual machines, then, following the official documentation, just add to /etc/one/oned.conf:
HOST_HOOK = [
    name      = "error",
    on        = "ERROR",
    command   = "ft/host_error.rb",
    arguments = "$ID -m -p 5",
    remote    = "no" ]

And copy it to the other nodes:
for i in 2 3; do
    scp /etc/one/oned.conf one@kvm$i:/etc/one/oned.conf
done








P.S. If you notice any flaws or mistakes, please write to me in a private message.

This article is a translation of the original post at habrahabr.ru/post/270187/