In this article by Mitja Resman, author of the book CentOS High Availability, we will learn cluster resource management on CentOS 6 with the RGManager cluster resource manager. We will learn where to find information about the cluster resources supported by RGManager and the details of cluster resource configuration. We will also learn how to add, delete, and reconfigure resources and services in the cluster, and how to start, stop, and relocate services from one cluster node to another. By the end of this article, your cluster will be configured to run and provide end users with a service.
Working with RGManager
When we work with RGManager, the cluster resources are configured within the /etc/cluster/cluster.conf CMAN configuration file. RGManager has a dedicated section in this file: it begins with the <rm> tag and ends with the </rm> tag, following the usual XML syntax, and everything between the two tags belongs to RGManager.
The RGManager section must be defined within the <cluster> section of the CMAN configuration file but not within the <clusternodes> or <fencedevices> sections. We will be able to review the exact configuration syntax from the example configuration file provided in the next paragraphs.
The following elements can be used within the <rm> RGManager tag:
- Failover Domain: (tag: <failoverdomains></failoverdomains>): A failover domain is a set of cluster nodes that are eligible to run a specific cluster service in the event of a cluster node failure. More than one failover domain can be configured, with different rules applied for different cluster services.
- Global Resources: (tag: <resources></resources>): Global cluster resources are configured once and then referenced by name when configuring cluster services, which simplifies cluster service configuration.
- Cluster Service: (tag: <service></service>): A cluster service usually combines more than one resource to provide a service to end users. The order of resources provided within a cluster service is important because it defines the resource start and stop order.
The most used and important RGManager command-line tools are as follows:
- clustat: The clustat command provides status information about the cluster, cluster nodes, and cluster services.
- clusvcadm: The clusvcadm command provides cluster service management commands such as start, stop, disable, enable, relocate, and others.
By default, RGManager logs its messages to syslog, that is, to the /var/log/messages file. If the logfile parameter in the Corosync configuration file's logging section is configured, information related to RGManager will be logged in the location specified by the logfile parameter. The default RGManager log file is named rgmanager.log.
Let’s start with the details of RGManager configuration.
Configuring failover domains
The <rm> tag in the CMAN configuration file usually begins with the definition of a failover domain, but configuring a failover domain is not required for normal operation of the cluster.
A failover domain is a set of cluster nodes with configured failover rules. The failover domain is attached to the cluster service configuration; in the event of a cluster node failure, the configured cluster service’s failover domain rules are applied.
Failover domains are configured within the <rm> RGManager tag. The failover domain configuration begins with the <failoverdomains> tag and ends with the </failoverdomains> tag. Within the <failoverdomains> tag, you can specify one or more failover domains in the following form:
<failoverdomain failoverdomainname failoverdomain_options> </failoverdomain>
The failoverdomainname parameter is a unique name provided for the failover domain in the form of name="desired_name".
The failoverdomain_options options are the rules that we apply to the failover domain.
The following rules can be configured for a failover domain:
- Unrestricted: (parameter: restricted="0"): With this failover domain configuration, the failover domain members are preferred, but the cluster service can run on any available cluster node.
- Restricted: (parameter: restricted="1"): This failover domain configuration restricts a cluster service to run only on the failover domain members you configure.
- Ordered: (parameter: ordered="1"): This failover domain configuration allows you to configure a preference order for cluster nodes. In the event of cluster node failure, the preference order is taken into account. The order of the listed cluster nodes is important because it is also the priority order.
- Unordered: (parameter: ordered="0"): This failover domain configuration applies no preference order; any of the configured failover domain nodes can run the cluster service.
- Failback: (parameter: nofailback="0"): This failover domain configuration enables failback for the cluster service. This means the cluster service will fail back to the originating cluster node once that cluster node is operational again.
- Nofailback: (parameter: nofailback="1"): This failover domain configuration disables the failback of the cluster service to the originating cluster node once it is operational again.
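To make the restricted and ordered rules more concrete, the following Python sketch models which cluster nodes remain eligible to run a service. It is a simplified illustration and not RGManager's actual placement code: the candidate_nodes function and its parameters are hypothetical names, the listed order of the members is treated as the priority order as described in the list above, and the nofailback rule is left out.

```python
def candidate_nodes(online, members, restricted=False, ordered=False):
    """Return the nodes eligible to run a service, most preferred first.

    online:     set of node names that are currently up
    members:    failover domain nodes in the order they are listed
    restricted: True models restricted="1"
    ordered:    True models ordered="1"
    """
    # Failover domain members that are up; the listed order is kept.
    preferred = [node for node in members if node in online]
    if not ordered:
        # No preference among members; sort only to get a stable result.
        preferred.sort()
    # An unrestricted domain may fall back to nodes outside the domain.
    fallback = [] if restricted else sorted(set(online) - set(members))
    return preferred + fallback
```

For example, candidate_nodes({"node-2", "node-3"}, ["node-1", "node-2"], restricted=True, ordered=True) returns ["node-2"], because node-1 is down and node-3 is not a member of the restricted domain.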
Within the <failoverdomain> tag, the desired cluster nodes are configured with a <failoverdomainnode> tag in the following form:
<failoverdomainnode nodename/>
The nodename parameter is the cluster node name as provided in the <clusternode> tag of the CMAN configuration file.
You can add the following simple failover domain configuration to your existing CMAN configuration file. In the following screenshot, you can see the CMAN configuration file with a simple failover domain configuration.
The previous example shows a failover domain named simple with no failback, no ordering, and no restrictions configured. All three cluster nodes are listed as failover domain nodes.
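The screenshot is not reproduced in this text, so the following Python sketch embeds a configuration of the kind described and checks it against the rule that every failover domain node must match a <clusternode> name. The cluster name, the node IDs, the node-2 node name, and the config_version value are assumptions made for illustration, and nofailback="1" is only one reading of "no failback". The undefined_domain_nodes helper is a hypothetical name and is not part of the cluster software.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf; fencing and other sections are omitted.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="2" name="cluster">
  <clusternodes>
    <clusternode name="node-1" nodeid="1"/>
    <clusternode name="node-2" nodeid="2"/>
    <clusternode name="node-3" nodeid="3"/>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="simple" nofailback="1" ordered="0" restricted="0">
        <failoverdomainnode name="node-1"/>
        <failoverdomainnode name="node-2"/>
        <failoverdomainnode name="node-3"/>
      </failoverdomain>
    </failoverdomains>
  </rm>
</cluster>
"""


def undefined_domain_nodes(conf_xml):
    """Return (domain, node) pairs whose node is not a declared cluster node."""
    root = ET.fromstring(conf_xml)
    known = {n.get("name") for n in root.findall("./clusternodes/clusternode")}
    missing = []
    for domain in root.findall("./rm/failoverdomains/failoverdomain"):
        for member in domain.findall("failoverdomainnode"):
            if member.get("name") not in known:
                missing.append((domain.get("name"), member.get("name")))
    return missing
```

An empty result from undefined_domain_nodes(CLUSTER_CONF) means that every failover domain node refers to a declared cluster node. It does not replace the validation of the configuration file on the cluster itself.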
Note that it is important to increase the config_version parameter in the second line of the CMAN configuration file on every change to the file.
Once you have configured the failover domain, you need to validate the cluster configuration file. A valid CMAN configuration is required for normal operation of the cluster. On CentOS 6, the configuration file can be validated with the ccs_config_validate command. If the validation of the cluster configuration file fails, recheck the configuration file for typos. In the following screenshot, you can see the command used to check the CMAN configuration file for errors:
Note that, if a specific cluster node is not online, the configuration file will have to be transferred manually and the cluster stack software will have to be restarted to catch up once it comes back online.
Once your configuration is validated, you can propagate it to the other cluster nodes; with CMAN, this is typically done with the cman_tool version -r command. In this screenshot, you can see the CMAN configuration file propagation command used on the node-1 cluster node:
For successful CMAN configuration file distribution to the other cluster nodes, the CMAN configuration file’s config_version parameter number must be increased.
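Forgetting to increase config_version is an easy mistake to make when editing the file by hand. As a minimal sketch, the following hypothetical Python helper increases the config_version attribute of the <cluster> tag by one. It works on the file content as text so that the rest of the file stays untouched; reading and writing /etc/cluster/cluster.conf is left to the caller.

```python
import re

# Matches the config_version attribute inside the opening <cluster ...> tag.
# The \b after "cluster" keeps the pattern from matching <clusternodes>.
_VERSION = re.compile(r'(<cluster\b[^>]*\bconfig_version=")(\d+)(")')


def bump_config_version(conf_text):
    """Return conf_text with the <cluster> config_version increased by one."""
    def increase(match):
        return match.group(1) + str(int(match.group(2)) + 1) + match.group(3)

    new_text, count = _VERSION.subn(increase, conf_text, count=1)
    if count != 1:
        raise ValueError("no config_version attribute found on <cluster>")
    return new_text
```

The result should still be validated before it is propagated to the other cluster nodes.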
You can confirm that the configuration file was successfully distributed by issuing the ccs_config_dump command on any of the other cluster nodes and comparing the XML output.
Adding cluster resources and services
The difference between cluster resources and cluster services is that a cluster service is a service built from one or more cluster resources. A configured cluster resource is prepared to be used within a cluster service. When you are configuring a cluster service, you reference a configured cluster resource by its unique name.
Resources
Cluster resources are defined within the <rm> RGManager tag of the CMAN configuration file. They begin with the <resources> tag and end with the </resources> tag. Within the <resources> tag, all cluster resources supported by RGManager can be configured.
Cluster resources are configured with resource scripts, and all RGManager-supported resource scripts are located in the /usr/share/cluster directory along with the cluster resource metadata information required to configure a cluster resource. For some cluster resources, the metadata information is listed within the cluster resource scripts, while others have separate cluster resource metadata files.
RGManager reads metadata from the scripts while validating the CMAN configuration file. Therefore, knowing the metadata information is the best way to correctly define and configure a cluster resource.
The basic syntax used to configure a cluster resource is as follows:
<resource_agent_name resource_options/>
The resource_agent_name parameter is the resource agent name, defined as name in the cluster resource metadata information. The resource_options parameter stands for the configurable options of the cluster resource, as listed in the cluster resource metadata information.
If you want to configure an IP address cluster resource, you should first review the metadata information of the IP address cluster resource, which is available in the /usr/share/cluster/ip.sh script file.
The syntax used to define an IP address cluster resource is as follows:
<ip ip_address_options/>
We can configure a simple IPv4 IP address, such as 192.168.88.50, and bind it to the eth1 network interface by adding the following line to the CMAN configuration:
<ip address="192.168.88.50" family="IPv4" prefer_interface="eth1"/>
The address option is the IP address you want to configure. The family option is the address protocol family. The prefer_interface option binds the IP address to the specific network interface.
By reviewing the IP address resource metadata information, we can see that a few additional options are configurable and well explained:
- monitor_link
- nfslock
- sleeptime
- disable_rdisc
If you want to configure an Apache web server cluster resource, you should first review the Apache web server resource’s metadata information in the /usr/share/cluster/apache.metadata metadata file.
The syntax used to define an Apache web server cluster resource is as follows:
<apache apache_web_server_options/>
We can configure a simple Apache web server cluster resource by adding the following line to the CMAN configuration file:
<apache name="apache" server_root="/etc/httpd" config_file="conf/httpd.conf" shutdown_wait="60"/>
The name parameter is the unique name provided for the apache cluster resource.
The server_root option provides the Apache installation location. If no server_root option is provided, the default value is /etc/httpd.
The config_file option is the path to the main Apache web server configuration file, relative to the server_root directory. If no config_file option is provided, the default value is conf/httpd.conf.
The shutdown_wait option is the number of seconds to wait for the service to shut down correctly.
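How the two path options and their defaults combine can be shown with a short Python sketch. The apache_config_path function is a hypothetical helper, not part of the resource agent, and the handling of an absolute config_file value is an assumption of this sketch rather than documented agent behavior.

```python
import posixpath


def apache_config_path(server_root="/etc/httpd", config_file="conf/httpd.conf"):
    """Return the full path of the main Apache configuration file."""
    # posixpath.join returns config_file unchanged if it is already absolute.
    return posixpath.join(server_root, config_file)
```

With the default values, apache_config_path() returns /etc/httpd/conf/httpd.conf.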
By reviewing the Apache web server resource metadata, you can see that a few additional options are configurable and well explained:
- httpd
- httpd_options
- service_name
We can add the IP address and Apache web server cluster resources to the example configuration we are building, as follows.
<resources>
  <ip address="192.168.10.50" family="IPv4" prefer_interface="eth1"/>
  <apache name="apache" server_root="/etc/httpd" config_file="conf/httpd.conf" shutdown_wait="60"/>
</resources>
Do not forget to increase the config_version parameter number.
Make sure that you validate the cluster configuration file with every change. In the following screenshot, you can see the command used to validate the CMAN configuration:
After we’ve validated our configuration, we can distribute the cluster configuration file to other nodes. In this screenshot, you can see the command used to distribute the CMAN configuration file from the node-1 cluster node to other cluster nodes:
Services
The cluster services are defined within the <rm> RGManager tag of the CMAN configuration file after the cluster resources tag. They begin with the <service> tag and end with the </service> tag.
The syntax used to define a service is as follows:
<service service_options> </service>
The resources within a cluster service reference the globally configured cluster resources. The order of the cluster resources configured within the cluster service is important because it is also the resource start order. The syntax for cluster resource configuration within the cluster service is as follows:
<service service_options>
  <resource_agent_name ref="referenced_cluster_resource_name"/>
</service>
The service options can be the following:
- Autostart: (parameter: autostart="1"): This parameter starts the service when RGManager starts. By default, RGManager starts all services when it is started and quorum is present.
- Noautostart: (parameter: autostart="0"): This parameter disables the automatic start of the service when RGManager starts.
- Restart recovery: (parameter: recovery="restart"): This is RGManager's default recovery policy. On failure, RGManager will restart the service on the same cluster node. If the service restart fails, RGManager will relocate the service to another operational cluster node.
- Relocate recovery: (parameter: recovery="relocate"): On failure, RGManager will try to start the service on other operational cluster nodes.
- Disable recovery: (parameter: recovery="disable"): On failure, RGManager will place the service in the disabled state.
- Restart disable recovery: (parameter: recovery="restart-disable"): On failure, RGManager will try to restart the service on the same cluster node. If the restart fails, it will place the service in the disabled state.
Additional restart policy extensions are available, as follows:
- Maximum restarts: (parameter: max_restarts="N", where N is the desired integer value): This parameter specifies the maximum number of service restarts before additional recovery policy actions are taken.
- Restart expire time: (parameter: restart_expire_time="N", where N is the desired integer value in seconds): This parameter configures how long a restart event is remembered.
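The interaction between the two restart policy extensions can be sketched in Python as follows. This is a simplified model written for illustration and not RGManager's implementation: the next_action function is a hypothetical name, and the exact boundary behavior (for example, whether a restart that is exactly restart_expire_time seconds old still counts) is an assumption.

```python
def next_action(restart_times, now, max_restarts, restart_expire_time):
    """Decide whether a failed service is restarted again or escalated.

    restart_times:       times (in seconds) of earlier restarts of the service
    now:                 time (in seconds) of the current failure
    max_restarts:        models the max_restarts parameter
    restart_expire_time: models the restart_expire_time parameter
    """
    # Only restarts younger than restart_expire_time are still remembered.
    remembered = [t for t in restart_times if now - t < restart_expire_time]
    if len(remembered) < max_restarts:
        return "restart"
    # Too many recent restarts: hand over to the next recovery action.
    return "escalate"
```

With max_restarts=3 and restart_expire_time=300, three restarts at 10, 20, and 30 seconds followed by a failure at 40 seconds lead to escalation, while the same failure at 400 seconds leads to another restart because the earlier restarts have expired.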
We can configure a web server cluster service with respect to the configured IP address and Apache web server resources with the following CMAN configuration file syntax:
<service name="webserver" autostart="1" recovery="relocate">
  <ip ref="192.168.88.50"/>
  <apache ref="apache"/>
</service>
A minimal configuration of a web server cluster service requires a cluster IP address and an Apache web server resource.
- The name parameter defines a unique name for the web server cluster service.
- The autostart parameter defines an automatic start of the webserver cluster service on RGManager startup.
- The recovery parameter configures the restart of the web server cluster service on other cluster nodes in the event of failure.
We can add the web server cluster service to the example CMAN configuration file we are building, as follows.
<resources>
  <ip address="192.168.10.50" family="IPv4" prefer_interface="eth1"/>
  <apache name="apache" server_root="/etc/httpd" config_file="conf/httpd.conf" shutdown_wait="60"/>
</resources>
<service name="webserver" autostart="1" recovery="relocate">
  <ip ref="192.168.10.50"/>
  <apache ref="apache"/>
</service>
Do not forget to increase the config_version parameter.
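Because a service refers to global resources only by reference, a mistyped ref value is easy to miss. The following Python sketch, which is not part of RGManager, checks that every ref inside a service matches a configured global resource. It assumes, based on the examples in this article, that an ip resource is referenced by its address and every other resource by its name; the unresolved_refs function and the KEY_ATTRIBUTE table are hypothetical names, and only resources directly inside a service are checked.

```python
import xml.etree.ElementTree as ET

# The <rm> section of the example configuration built in this article.
RM_SECTION = """<rm>
  <resources>
    <ip address="192.168.10.50" family="IPv4" prefer_interface="eth1"/>
    <apache name="apache" server_root="/etc/httpd" config_file="conf/httpd.conf" shutdown_wait="60"/>
  </resources>
  <service name="webserver" autostart="1" recovery="relocate">
    <ip ref="192.168.10.50"/>
    <apache ref="apache"/>
  </service>
</rm>
"""

# Attribute that identifies a global resource; "name" is assumed otherwise.
KEY_ATTRIBUTE = {"ip": "address"}


def unresolved_refs(rm_xml):
    """Return (service, resource type, ref) triples with no matching resource."""
    rm = ET.fromstring(rm_xml)
    defined = set()
    for resource in rm.findall("./resources/*"):
        key = KEY_ATTRIBUTE.get(resource.tag, "name")
        defined.add((resource.tag, resource.get(key)))
    missing = []
    for service in rm.findall("./service"):
        for child in service:
            ref = child.get("ref")
            if ref is not None and (child.tag, ref) not in defined:
                missing.append((service.get("name"), child.tag, ref))
    return missing
```

An empty list from unresolved_refs(RM_SECTION) means that all references resolve. A service that referenced 192.168.88.50 while the global resource was configured as 192.168.10.50 would be reported.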
Make sure you validate the cluster configuration file with every change. In the following screenshot, we can see the command used to validate the CMAN configuration:
After you’ve validated your configuration, you can distribute the cluster configuration file to other nodes. In this screenshot, we can see the command used to distribute the CMAN configuration file from the node-1 cluster node to other cluster nodes:
With the final distribution of the cluster configuration, a cluster service is configured and RGManager starts the cluster service called webserver. You can use the clustat command to check whether the web server cluster service was successfully started and also which cluster node it is running on. In the following screenshot, you can see the clustat command issued on the node-1 cluster node:
The columns of the clustat output have the following meaning:
- Service Name: This column defines the name of the service as configured in the CMAN configuration file.
- Owner: This column lists the node the service is running on or was last running on.
- State: This column provides information about the status of the service.
Managing cluster services
Once you have configured the cluster services as you like, you must learn how to manage them. We can manage cluster services with the clusvcadm command and additional parameters. The syntax of the clusvcadm command is as follows:
clusvcadm [parameter]
With the clusvcadm command, you can perform the following actions:
- Disable service (syntax: clusvcadm -d <service_name>): This stops the cluster service and puts it into the disabled state. This is the only permitted operation if the service in question is in the failed state.
- Start service (syntax: clusvcadm -e <service_name> -m <cluster_node>): This starts a non-running cluster service. The optional -m parameter specifies the cluster node you would like to start the service on.
- Relocate service (syntax: clusvcadm -r <service_name> -m <cluster_node>): This stops the cluster service and starts it on a different cluster node as provided with the -m parameter.
- Migrate service (syntax: clusvcadm -M <service_name> -m <cluster_node>): This migrates the service to the cluster node provided with the -m parameter. Note that this applies only to virtual machine live migrations.
- Restart service (syntax: clusvcadm -R <service_name>): This stops and starts a cluster service on the same cluster node.
- Stop service (syntax: clusvcadm -s <service_name>): This stops the cluster service and keeps it on the current cluster node in the stopped state.
- Freeze service (syntax: clusvcadm -Z <service_name>): This keeps the cluster service running on the current cluster node but disables service status checks and service failover in the event of a cluster node failure.
- Unfreeze service (syntax: clusvcadm -U <service_name>): This takes the cluster service out of the frozen state and enables service status checks and failover.
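The actions listed above can be collected in a small Python helper that builds the clusvcadm argument list for a given action. This is a sketch for scripting purposes only: the clusvcadm_args function and the action names are hypothetical, and the flags are taken from the list above. Nothing is executed here; on a cluster node, the resulting list could be passed to subprocess.run.

```python
# Action names are made up for this sketch; flags come from the list above.
FLAGS = {
    "disable": "-d",
    "start": "-e",
    "relocate": "-r",
    "migrate": "-M",
    "restart": "-R",
    "stop": "-s",
    "freeze": "-Z",
    "unfreeze": "-U",
}

# Actions that accept a target cluster node through the -m parameter.
TAKES_NODE = {"start", "relocate", "migrate"}


def clusvcadm_args(action, service, node=None):
    """Build the argument list for a clusvcadm call without running it."""
    if action not in FLAGS:
        raise ValueError("unknown action: " + action)
    if node is not None and action not in TAKES_NODE:
        raise ValueError(action + " does not take a cluster node")
    args = ["clusvcadm", FLAGS[action], service]
    if node is not None:
        args += ["-m", node]
    return args
```

For example, clusvcadm_args("relocate", "webserver", "node-3") returns ["clusvcadm", "-r", "webserver", "-m", "node-3"].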
We can continue with the previous example and relocate the webserver cluster service from the node-1 cluster node, where it is currently running, to the node-3 cluster node. To relocate a cluster service, the clusvcadm command must be used with the relocate service parameter, that is, clusvcadm -r webserver -m node-3. In the following screenshot, we can see the command issued to relocate the webserver cluster service to the node-3 cluster node:
The clusvcadm command is the cluster service command used to administer and manage cluster services.
The -r webserver parameter specifies that we want to relocate the cluster service named webserver.
The -m node-3 parameter specifies the cluster node we want to relocate the cluster service to.
Once the cluster service relocation command completes, the webserver cluster service will be relocated to the node-3 cluster node. The clustat command shows that the webserver service is now running on the node-3 cluster node. In this screenshot, we can see that the webserver cluster service was successfully relocated to the node-3 cluster node:
We can easily stop the webserver cluster service by issuing the clusvcadm -s webserver command. In the following screenshot, we can see the command used to stop the webserver cluster service:
The -s webserver parameter specifies that we want to stop the cluster service named webserver. Another look at the clustat output should show that the webserver cluster service has stopped; it also shows that the last owner of the webserver cluster service is the node-3 cluster node.
In this screenshot, we can see the output of the clustat command, showing that the webserver cluster service is stopped and was last running on the node-3 cluster node:
If we want to start the webserver cluster service on the node-1 cluster node, we can do this by issuing the clusvcadm -e webserver -m node-1 command. In the following screenshot, we can see the command used to start the webserver cluster service on the node-1 cluster node:
The -e webserver parameter specifies that we want to start the webserver cluster service.
The -m node-1 parameter specifies that the webserver cluster service should be started on the node-1 cluster node. As expected, another look at the clustat output should show that the webserver cluster service has started on the node-1 cluster node.
In this screenshot, you can see the output of the clustat command, showing that the webserver cluster service is running on the node-1 cluster node:
Removing cluster resources and services
Removing cluster resources and services is the reverse of adding them. Resources and services are removed by editing the CMAN configuration file and removing the lines that define the resources or services you would like to remove. When removing cluster resources, it is important to verify that the resources are not being used within any of the configured or running cluster services. As always, when editing the CMAN configuration file, the config_version parameter must be increased. Once the CMAN configuration file is edited, you must run the CMAN configuration validation check for errors. When the CMAN configuration file validation succeeds, you can distribute it to all other cluster nodes.
The procedure for removing cluster resources and services is as follows:
- Remove the desired cluster resources and services and increase the config_version number.
- Validate the CMAN configuration file.
- Distribute the CMAN configuration file to all other nodes.
We can proceed to remove the webserver cluster service from our example cluster configuration. Edit the CMAN configuration file and remove the webserver cluster service definition.
Remember to increase the config_version number. Validate your cluster configuration with every CMAN configuration file change.
In this screenshot, we can see the command used to validate the CMAN configuration:
When your cluster configuration is valid, you can distribute the CMAN configuration file to all other cluster nodes. In the following screenshot, we can see the command used to distribute the CMAN configuration file from the node-1 cluster node to other cluster nodes:
Once the cluster configuration is distributed to all cluster nodes, the webserver cluster service will be stopped and removed. The clustat command shows no service configured and running. In the following screenshot, we can see that the output of the clustat command shows no cluster service called webserver existing in the cluster:
Summary
In this article, we learned how to add and remove cluster failover domains, cluster resources, and cluster services. We also learned how to start, stop, and relocate cluster services from one cluster node to another, and how to remove cluster resources and services from a running cluster configuration.