Mastering vRealize Operations Manager(Second Edition)

To HA or not to HA?

You need a good understanding of the geographical layout of the infrastructure you want to monitor, including remote offices and data centers.

You should also consider the number and type of objects you want to collect data for, including the number and type of adapters installed. Finally, decide whether you want to enable high availability.

Based on that information, you can determine which types of nodes you need to deploy in your vRealize Operations cluster to meet those requirements.

As you may remember from the previous chapter, you can deploy vRealize Operations on a single node, or on multiple nodes to create a multi-node cluster for scalability, availability, or both.

Every node in the cluster is assigned a role:

  • Master
  • Master replica
  • Data
  • Remote collector

vRealize Operations supports high availability by enabling a master replica node for the vRealize Operations master node.
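The node roles and the HA rule above can be captured in a small planning model. This is a minimal sketch, not a vRealize Operations API: `Role`, `ClusterPlan`, and the node names are hypothetical. It assumes a cluster has exactly one master and at most one master replica, and it treats HA as enabled exactly when a master replica is present.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The four node roles a cluster node can be assigned."""
    MASTER = "master"
    MASTER_REPLICA = "master replica"
    DATA = "data"
    REMOTE_COLLECTOR = "remote collector"


@dataclass
class ClusterPlan:
    """Hypothetical planning model: maps node name -> assigned role."""
    nodes: dict[str, Role]

    def count(self, role: Role) -> int:
        return sum(1 for r in self.nodes.values() if r is role)

    @property
    def ha_enabled(self) -> bool:
        # In this model, HA is on exactly when a master replica exists.
        return self.count(Role.MASTER_REPLICA) == 1

    def validate(self) -> list[str]:
        """Return a list of problems with the plan (empty if it looks valid)."""
        problems = []
        if self.count(Role.MASTER) != 1:
            problems.append("exactly one master node is required")
        if self.count(Role.MASTER_REPLICA) > 1:
            problems.append("at most one master replica node is allowed")
        return problems


plan = ClusterPlan({
    "vrops-01": Role.MASTER,
    "vrops-02": Role.MASTER_REPLICA,
    "vrops-03": Role.DATA,
    "vrops-rc-01": Role.REMOTE_COLLECTOR,
})
print(plan.ha_enabled, plan.validate())  # True []
```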

Although enabling HA is simple enough, it should not be done without proper consideration.

As mentioned earlier, both cache and persistence data are sharded per resource, not per metric or adapter. As such, when a data node is unavailable, not only can its metrics not be viewed or used for analytics, but new metrics for resources on the affected node are also discarded, even if the adapter collector is operational or has failed over. This fact alone might tempt administrators to simply enable HA by default, considering how easy it is to do so.
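To make per-resource sharding concrete, the following sketch assigns every resource to one data node and drops incoming samples whose owning node is down. It is an illustration only: the SHA-256 placement, `owner`, `ingest`, and the node and resource names are hypothetical and do not reflect how vRealize Operations actually distributes data. What it does show is that all metrics of a resource share one node, so they are lost together.

```python
import hashlib

NODES = ["data-01", "data-02", "data-03"]  # hypothetical data node names


def owner(resource_id: str, nodes: list[str]) -> str:
    """Map a resource (not a metric or adapter) to one node via a stable hash."""
    digest = hashlib.sha256(resource_id.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def ingest(samples, nodes, down):
    """Split (resource, metric, value) samples into kept and discarded lists.

    Without HA, a sample is discarded whenever the single node owning its
    resource is down, even though the collector is still delivering data.
    """
    kept, discarded = [], []
    for sample in samples:
        resource_id = sample[0]
        bucket = discarded if owner(resource_id, nodes) in down else kept
        bucket.append(sample)
    return kept, discarded


samples = [
    ("vm-101", "cpu.usage", 42.0),
    ("vm-101", "mem.usage", 63.5),
    ("host-7", "cpu.usage", 18.2),
]
failed = {owner("vm-101", NODES)}
kept, discarded = ingest(samples, NODES, failed)
# Both vm-101 metrics land in `discarded` because sharding is per resource.
print([s[:2] for s in discarded])
```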

Each time you enable high availability for the vRealize Operations cluster, a rebalance process takes place in the background, which can take a long time. A rebalance is also necessary if you lose the master or master replica node. You can take advantage of vSphere features, such as DRS anti-affinity rules, to make sure nodes do not reside on the same vSphere host. vRealize Operations HA ensures you can recover your data when a single node is lost. If more than one node is lost, the loss of data is irreversible.
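Why one lost node is recoverable but two are not can be checked by brute force. The sketch below assumes a simplified HA layout in which each resource has exactly one primary copy and one replica on two different nodes, placed round-robin; `place_with_replica`, `lost_resources`, and `survives_any` are hypothetical helpers, not product behavior. It enumerates every combination of k failed nodes and reports whether any resource lost all of its copies.

```python
from itertools import combinations


def place_with_replica(resources, nodes):
    """Hypothetical HA layout: primary and replica on two different nodes.

    Requires at least two nodes so that primary != replica.
    """
    placement = {}
    for i, resource in enumerate(resources):
        primary = nodes[i % len(nodes)]
        replica = nodes[(i + 1) % len(nodes)]
        placement[resource] = (primary, replica)
    return placement


def lost_resources(placement, failed):
    """Resources whose every copy sat on a failed node."""
    return [r for r, copies in placement.items() if set(copies) <= set(failed)]


def survives_any(placement, nodes, k):
    """True if no combination of k simultaneous node failures loses data."""
    return all(not lost_resources(placement, failed)
               for failed in combinations(nodes, k))


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_with_replica([f"vm-{n}" for n in range(20)], nodes)
print(survives_any(placement, nodes, 1))  # True
print(survives_any(placement, nodes, 2))  # False
```

With two copies per resource, any pair of failed nodes that holds both copies of some resource makes that data unrecoverable, which matches the single-node tolerance described above.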

Do not stretch your vRealize Operations cluster nodes across different geographical zones or logical data centers, even if they are on the same LAN or subnet; this configuration is not supported.

Although HA is very easy to enable, you must ensure that your cluster is sized appropriately to handle the increased load. Because HA duplicates all data stored in both the GemFire cache and persistence layers, it essentially doubles the load on the system. We discuss sizing considerations in detail later in this chapter.
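The doubling effect can be expressed as back-of-the-envelope arithmetic. In this sketch, `nodes_required` is a hypothetical helper and `objects_per_node` is an assumed per-node capacity, not an official figure; real numbers should come from VMware's sizing guidance. It simply counts every object twice when HA is on and enforces the two-node minimum that a master plus master replica implies.

```python
import math


def nodes_required(objects: int, objects_per_node: int, ha: bool) -> int:
    """Rough node count for an object inventory.

    objects_per_node is a hypothetical capacity figure. With HA, every object
    is stored twice, so the effective load doubles.
    """
    effective_objects = objects * (2 if ha else 1)
    needed = max(1, math.ceil(effective_objects / objects_per_node))
    # HA needs at least a master and a master replica.
    return max(needed, 2) if ha else needed


print(nodes_required(12000, 5000, ha=False))  # 3
print(nodes_required(12000, 5000, ha=True))   # 5
```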

It is also important to note that vRealize Operations should not be deployed in a vSphere cluster where the number of vRealize Operations nodes is greater than the number of hosts in the underlying vSphere cluster. There is little point in enabling HA in vRealize Operations if more than one node resides on the same vSphere host at the same time, because a single host failure would then take down two nodes at once.
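The node-versus-host rule is a pigeonhole argument: if there are more nodes than hosts, at least two nodes must share a host. A minimal pre-deployment check, using a hypothetical `ha_layout_warning` helper, might look like this:

```python
from typing import Optional


def ha_layout_warning(vrops_nodes: int, vsphere_hosts: int) -> Optional[str]:
    """Return a warning if some host must run two vRealize Operations nodes.

    By the pigeonhole principle, more nodes than hosts means at least two
    nodes share a host, so one host failure can take out two nodes.
    """
    if vrops_nodes > vsphere_hosts:
        return (f"{vrops_nodes} nodes on {vsphere_hosts} hosts: at least two "
                "nodes will share a host, so a single host failure can take "
                "down two nodes")
    return None


print(ha_layout_warning(4, 3))
```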

After deploying all your vRealize Operations nodes and enabling HA, create a DRS rule to keep all nodes on separate vSphere hosts under normal operation. You can achieve this with a DRS Separate Virtual Machines (anti-affinity) rule, or with a Virtual Machines to Hosts affinity rule.
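Once the rule is in place, it can be useful to verify the actual layout, for example after maintenance. This sketch takes a hypothetical inventory mapping each node VM to its current host (how you export that from vCenter is outside its scope) and reports any host running more than one node. An empty result means the layout matches the intent of a Separate Virtual Machines rule.

```python
from collections import defaultdict


def colocated_nodes(placement: dict[str, str]) -> dict[str, list[str]]:
    """Return hosts that currently run more than one vRealize Operations node.

    placement maps node VM name -> vSphere host name (hypothetical inventory).
    """
    by_host = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


placement = {
    "vrops-01": "esxi-01",
    "vrops-02": "esxi-02",
    "vrops-03": "esxi-02",
}
print(colocated_nodes(placement))  # {'esxi-02': ['vrops-02', 'vrops-03']}
```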

Last, but not least, enabling HA ensures that vRealize Operations can tolerate the loss of only a single cluster node without loss of data. Adding more cluster nodes does not increase the number of node failures vRealize Operations can tolerate.
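A quick simulation illustrates why cluster size does not change this limit. It assumes the same simplified model as the rest of this discussion (one primary plus one replica per resource, spread round-robin across nodes), which is a hypothetical layout rather than the product's real placement. For each cluster size, it finds the largest number of simultaneous node failures that never loses a resource.

```python
from itertools import combinations


def max_tolerated_failures(node_count: int, resources: int = 100) -> int:
    """Largest k such that no k simultaneous node failures lose data.

    Hypothetical layout: resource i has copies on nodes i and i+1 (mod n).
    """
    copies = [{i % node_count, (i + 1) % node_count} for i in range(resources)]
    k = 0
    while k + 1 <= node_count:
        if any(any(c <= set(failed) for c in copies)
               for failed in combinations(range(node_count), k + 1)):
            break
        k += 1
    return k


for n in range(3, 9):
    print(n, max_tolerated_failures(n))  # tolerance stays at 1 node
```

Adding nodes spreads the data more thinly, but with only two copies of each resource there is always some pair of nodes holding both copies of something.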