<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adyesha -Desire For Knowledge&#187; cluster installation</title>
	<atom:link href="http://adyesha.com/tag/cluster-installation/feed/" rel="self" type="application/rss+xml" />
	<link>http://adyesha.com</link>
	<description>Share Your knowledge! Meet Your Desire!</description>
	<lastBuildDate>Thu, 29 Jul 2010 15:49:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Server cluster Concepts</title>
		<link>http://adyesha.com/2009/08/server-cluster-concepts/</link>
		<comments>http://adyesha.com/2009/08/server-cluster-concepts/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 15:35:46 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Cluster]]></category>
		<category><![CDATA[Computer & Internet]]></category>
		<category><![CDATA[2003 server]]></category>
		<category><![CDATA[active directory cluster]]></category>
		<category><![CDATA[active directory server]]></category>
		<category><![CDATA[application server]]></category>
		<category><![CDATA[backup server]]></category>
		<category><![CDATA[base windows]]></category>
		<category><![CDATA[cluadmin]]></category>
		<category><![CDATA[cluster 2003]]></category>
		<category><![CDATA[cluster application]]></category>
		<category><![CDATA[cluster applications]]></category>
		<category><![CDATA[cluster architecture]]></category>
		<category><![CDATA[cluster configuration]]></category>
		<category><![CDATA[cluster configurations]]></category>
		<category><![CDATA[cluster documentation]]></category>
		<category><![CDATA[cluster environment]]></category>
		<category><![CDATA[cluster exe]]></category>
		<category><![CDATA[cluster failover]]></category>
		<category><![CDATA[cluster guide]]></category>
		<category><![CDATA[cluster hardware]]></category>
		<category><![CDATA[cluster installation]]></category>
		<category><![CDATA[cluster management]]></category>
		<category><![CDATA[cluster microsoft]]></category>
		<category><![CDATA[cluster performance]]></category>
		<category><![CDATA[cluster replication]]></category>
		<category><![CDATA[cluster servers]]></category>
		<category><![CDATA[cluster support]]></category>
		<category><![CDATA[cluster windows 2000]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[clusters]]></category>
		<category><![CDATA[command line cluster]]></category>
		<category><![CDATA[compatibility list]]></category>
		<category><![CDATA[computer cluster]]></category>
		<category><![CDATA[configuration server]]></category>
		<category><![CDATA[database cluster]]></category>
		<category><![CDATA[database server]]></category>
		<category><![CDATA[external storage]]></category>
		<category><![CDATA[fibre channel arbitrated loop]]></category>
		<category><![CDATA[hardware architectures]]></category>
		<category><![CDATA[hardware cluster]]></category>
		<category><![CDATA[hardware compatibility test]]></category>
		<category><![CDATA[high availability cluster]]></category>
		<category><![CDATA[load balancing]]></category>
		<category><![CDATA[majority node set cluster]]></category>
		<category><![CDATA[microsoft cluster]]></category>
		<category><![CDATA[microsoft cluster service]]></category>
		<category><![CDATA[microsoft hardware]]></category>
		<category><![CDATA[microsoft server]]></category>
		<category><![CDATA[node clusters]]></category>
		<category><![CDATA[quorum resource]]></category>
		<category><![CDATA[raid adapter]]></category>
		<category><![CDATA[raid storage]]></category>
		<category><![CDATA[redundant array of inexpensive disks]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[server cluster]]></category>
		<category><![CDATA[server cluster requirements]]></category>
		<category><![CDATA[server clustering]]></category>
		<category><![CDATA[server clusters]]></category>
		<category><![CDATA[server linux]]></category>
		<category><![CDATA[server management]]></category>
		<category><![CDATA[server setup]]></category>
		<category><![CDATA[server windows 2003]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[setup cluster]]></category>
		<category><![CDATA[storage configuration]]></category>
		<category><![CDATA[two network cards]]></category>
		<category><![CDATA[ubuntu cluster]]></category>
		<category><![CDATA[unix cluster]]></category>
		<category><![CDATA[web application server]]></category>
		<category><![CDATA[websphere]]></category>
		<category><![CDATA[websphere cluster]]></category>
		<category><![CDATA[windows 2003 cluster]]></category>
		<category><![CDATA[windows cluster]]></category>
		<category><![CDATA[windows operating system]]></category>
		<category><![CDATA[windows server]]></category>

		<guid isPermaLink="false">http://adyesha.com/?p=116</guid>
		<description><![CDATA[<div>Server cluster Concepts (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)</div>
<p><strong>Q.</strong> What hardware do you need to build a Server cluster?</p>
<p><strong>A.</strong> The most important criterion for Server cluster hardware is that it be included in a validated Cluster configuration on the Microsoft Hardware Compatibility List (HCL), indicating it has passed the Microsoft Cluster Hardware Compatibility Test. All qualified solutions appear on the Microsoft HCL (<a href="http://go.microsoft.com/fwlink/?linkid=67738">http://go.microsoft.com/fwlink/?linkid=67738</a>). Only cluster solutions listed on the HCL are supported by Microsoft.</p>
<p>In general, the criteria for building a server cluster include the following:</p>
<ul>
<li><strong>Servers:</strong> Two or more PCI-based machines running one of the operating system releases that support Server clusters (see below). Server clusters can run on all hardware architectures supported by the base Windows operating system; however, you cannot mix 32-bit and 64-bit architectures in the same cluster.</li>
<li><strong>Storage:</strong> Each server needs to be attached to one or more shared, external storage buses that are separate from the bus containing the system disk, the startup disk, or the pagefile disk. Applications and data are stored on one or more disks attached to the shared bus. There must be enough storage capacity on the shared cluster buses for all of the applications running in the cluster environment. This shared storage configuration allows applications to fail over between servers in the cluster.<br />
Microsoft recommends hardware Redundant Array of Inexpensive Disks (RAID) for all cluster disks to eliminate disk drives as a potential single point of failure. This means using a RAID storage unit, a host-based RAID adapter that implements RAID across disks, etc.<br />
SCSI is supported for 2-node cluster configurations only. Fibre channel arbitrated loop is supported for 2-node clusters only. Microsoft recommends using fibre channel switched fabrics for clusters of more than two nodes.</li>
<li><strong>Network:</strong> Each server needs at least two network cards. Typically, one is the public network and the other is a private network between the two nodes. A static IP address is needed for each group of applications that move as a unit between nodes. Server clusters can project the identity of multiple servers from a single cluster by using multiple IP addresses and computer names: this is known as a <em>virtual server</em>.</li>
</ul>
<p><strong>Q.</strong> What is a cluster resource?</p>
<p><strong>A.</strong> A cluster resource is the lowest level unit of management in a Server cluster. A resource represents a physical object or an instance of running code. For example, a physical disk, an IP address, an MSMQ queue, and a COM object are all considered to be resources. From a management perspective, resources can be independently started and stopped, and each is monitored to ensure that it is healthy.</p>
<p>Server clusters can monitor any arbitrary resource type. This is possible because Server clusters define a resource plug-in model. Each resource type has an associated resource plug-in or <em>resource dll</em> that is used to start, stop and provide health information that is specific to the resource type. For example, starting and stopping SQL Server is different from starting and stopping a physical disk. The resource dll takes care of the differences. Application developers and system administrators can build new resource dlls for their applications that can be registered with the cluster service.</p>
<p>Server clusters provide some generic plug-ins, known as <em>Generic Service</em> and <em>Generic Application</em>, that can be used to make existing applications cluster-aware very quickly. With Windows Server 2003, a <em>Generic Script</em> resource plug-in was added that allows the resource dll to be written in any scripting language supported by the Windows operating system.</p>
<p><strong>Q.</strong> What is a resource dependency?</p>
<p><strong>A.</strong> A complete application actually consists of multiple pieces or multiple resources; some pieces are code and others are physical resources required by the application. The resources are related in different ways; for example, an application that writes to a disk cannot come online until the disk is online. If the disk fails, then, by definition, the application cannot continue to run since it writes to the disk. Resource dependencies can be defined by the application developer or system administrator to capture these relationships. Resource dependencies define the order in which resources are brought online and control how failures are propagated to the various pieces of the application.</p>
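<p>As a rough illustration of how dependencies determine start order and failure propagation, here is a minimal Python sketch. It is not the cluster service's own algorithm: the resource names and both helper functions are hypothetical, and the ordering simply uses a topological sort from the standard library.</p>

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each key depends on the resources in its value set.
dependencies = {
    "Physical Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Physical Disk", "Network Name"},
}

def online_order(deps):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

def affected_by_failure(deps, failed):
    """Return the resources that directly or indirectly depend on `failed`."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for resource, needs in deps.items():
            if resource in affected:
                continue
            if failed in needs or not needs.isdisjoint(affected):
                affected.add(resource)
                changed = True
    return affected

order = online_order(dependencies)
print(order)
print(affected_by_failure(dependencies, "IP Address"))
```

<p>In this sketch the file share always comes online after the disk and the network name, and a failed IP address takes the network name and the file share down with it.</p>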
<p><strong>Q.</strong> What is a resource group?</p>
<p><strong>A.</strong> A resource group is a collection of one or more resources that are managed and monitored as a single unit. A resource group can be started or stopped. If a resource group is started, each resource in the group is started (taking into account any start order defined by the dependencies between resources in the group). If a resource group is stopped, all of the resources in the group are stopped. Dependencies between resources cannot span a group. In other words, the set of resources within a group is an autonomous unit that can be started and stopped independently from any other group. A group is a single, indivisible unit that is hosted on one server in a Server cluster at any point in time and it is the unit of failover.</p>
<p><strong>Q.</strong> Can I have dependencies between resources in different groups?</p>
<p><strong>A.</strong> No, resource dependencies are confined to a single group.</p>
<p><strong>Q.</strong> What is a virtual server?</p>
<p><strong>A.</strong> A virtual server is a resource group that contains an IP address resource and a network name resource. When an application is hosted in a virtual server, the application can be accessed by clients using the IP address or network name in that resource group. As the resource group fails over across the cluster, the IP address and network name remain the same, so the client is unaware of the physical location of the application and continues to work if one of the servers in the cluster fails.</p>
<p><strong>Q.</strong> How can I take advantage of extensibility features of ISA Server?</p>
<p><strong>A.</strong> A number of third-party vendors offer solutions such as virus detection, content filtering, site categorization, reporting, and administration. Customers and developers also have the ability to create their own extensions to ISA Server. ISA Server includes a comprehensive software development kit for developing tools that build on ISA Server firewall, caching, and management features.</p>
<p><strong>Q.</strong> What is failover?</p>
<p><strong>A.</strong> Server clusters monitor the health of the nodes in the cluster and the resources in the cluster. In the event of a server failure, the cluster software re-starts the failed server&#8217;s workload on one or more of the remaining servers. If an individual resource or application fails (but the server does not), Server clusters will typically try to re-start the application on the same server; if that fails, it moves the application&#8217;s resources and re-starts it on the other server. The process of detecting failures and restarting the application on another server in the cluster is known as <em>failover</em>.</p>
<p>The cluster administrator can set various recovery policies such as whether or not to re-start an application on the same server, and whether or not to automatically &#8220;failback&#8221; (re-balance) workloads when a failed server comes back online.</p>
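<p>The restart-then-failover behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <code>try_restart</code> is a hypothetical callback standing in for whatever brings the application online, and the <code>max_restarts</code> limit is an invented policy knob, not a documented cluster setting.</p>

```python
def recover(current_node, live_nodes, try_restart, max_restarts=3):
    """Decide where a failed application should run next.

    try_restart(node) is a hypothetical callback that returns True when
    the application comes back up on that node.
    """
    if current_node in live_nodes:
        for _ in range(max_restarts):
            if try_restart(current_node):
                return current_node  # local restart succeeded
    # The node is down or local restarts were exhausted: fail over.
    for node in live_nodes:
        if node != current_node and try_restart(node):
            return node
    return None  # no node could host the application


# The application cannot restart on node A, so it ends up on node B.
print(recover("A", ["A", "B"], lambda node: node == "B"))
```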
<p><strong>Q.</strong> Is failover transparent to users?</p>
<p><strong>A.</strong> Server clusters do not require any special software on client computers, so the user experience during failover depends on the nature of the client side of their client-server application. Client reconnection can be made transparent, because the Server clusters software has restarted the applications, file shares, etc. at exactly the same IP address.</p>
<p>If a client is using &#8220;state-less&#8221; connections such as a standard browser connection, then the client would be unaware of a failover if it occurred between server requests. If a failure occurs while a client is connected to the failed resource, then the client will receive whatever standard notification is provided by the client side of their application when the server side becomes unavailable. This might be, for example, the standard &#8220;Abort, Retry, or Cancel?&#8221; prompt you get when using the Windows Explorer to download a file at the time a server or network goes down. In this case, client reconnection is not automatic (the user must choose &#8220;Retry&#8221;), but the user is fully informed of what is happening and has a simple, well-understood method of re-establishing contact with the server. Of course, in the meantime, the cluster service is busily re-starting the service or application so that, when the user chooses &#8220;Retry&#8221;, it re-appears as if it never went away.</p>
<p><strong>Q.</strong> What is failback?</p>
<p><strong>A.</strong> In the event of the failure of a server in a cluster, the applications and resources are failed over to another node in the cluster. When the failed node rejoins the cluster (after a reboot, for example), that node is again free to be used by applications. A cluster administrator can set policies on resources and resource groups that allow an application to automatically move back to a node if it becomes available, thus automatically taking advantage of a node rejoining the cluster. These policies are known as <em>failback</em> policies. You should take care when defining automatic failback policies because, depending on the application, automatically moving an application that was working just fine may have undesirable consequences for the clients using it.</p>
<p><strong>Q.</strong> When an application restarts after failover, does it restore the application state at the time of failure?</p>
<p><strong>A.</strong> No, Server clusters provide a fast crash restart mechanism. When an application is failed over and restarted, the application is restarted from scratch. Any persistent data written out to a database or to files is available to the application, but any in-memory state that the application had before the failover is lost.</p>
<p><strong>Q.</strong> At what level does failover exist?</p>
<p><strong>A.</strong> At the resource group level.</p>
<p><strong>Q.</strong> What is a Quorum Resource and how does it help Server clusters provide high availability?</p>
<p><strong>A.</strong> Server clusters require a quorum resource to function. The quorum resource, like any other resource, can be owned by only one server at a time, and servers can negotiate for its ownership. Negotiating for the quorum resource allows Server clusters to avoid &#8220;split-brain&#8221; situations in which servers are active and each thinks the other servers are down. This can happen when, for example, the cluster interconnect is lost and network response time is problematic. The quorum resource is also used to store the definitive copy of the cluster configuration so that, regardless of any sequence of failures, the cluster configuration always remains consistent.</p>
<p><strong>Q.</strong> What is active/active versus active/passive?</p>
<p><strong>A.</strong> Active/Active and Active/Passive are terms used to describe how applications are deployed in a cluster. Unfortunately, they mean different things to different people and so the terms tend to cause confusion.</p>
<p>From the perspective of a single application or database:</p>
<ul>
<li> Active/Active means that the same application or pieces of the same service can be run concurrently on different nodes in the cluster. For example, SQL Server 2000 can be configured such that the database is partitioned and each node runs a single instance of the database. SQL Server provides the notion of views to provide a single image of the entire database.</li>
<li> Active/Passive means that only one node in the cluster can be hosting the given application. For example, a single file share is active/passive. Any given file share can only be hosted on one node at a time.</li>
</ul>
<p>From the perspective of a set of instances of an application or service:</p>
<ul>
<li> Active/Active means that different instances of the same application can be running concurrently on different cluster nodes. For example, each node in a cluster can be running SQL Server against a different database. A single cluster can support many file shares that are hosted on the nodes in a cluster concurrently.</li>
<li> Active/Passive means that only one instance of a service can be running anywhere in the cluster. For example, there must only be a single instance of the DHCP service running in the cluster at any point in time.</li>
</ul>
<p>From the perspective of the cluster:</p>
<ul>
<li> Active/Active means that all nodes in the cluster are running applications. These may be multiple instances of the same application or different applications (for example, in a 2-node cluster, WINS may be running on one node and DHCP may be running on the other node).</li>
<li> Active/Passive means that one of the cluster nodes is spare and not being used to host applications.</li>
</ul>
<p>Server clusters support all of these different combinations; the terms are really about how specific applications or sets of applications are deployed.</p>
<p>With the advent of more than two servers in a cluster, starting with Windows 2000 Datacenter, the term active/active is confusing because there may be four servers. When there are multiple servers, the set of options available for deployment becomes more flexible, allowing different configurations such as N+I.</p>
<p><strong>Q.</strong> How do I benefit from more than two nodes in a cluster?</p>
<p><strong>A.</strong> Failover is the mechanism that single-instance applications and the individual partitions of a partitioned application typically employ for high availability (the term Pack has been coined to describe a highly available, single-instance application or partition).</p>
<p>In a 2-node cluster, defining failover policies is trivial. If one node fails, the only option is to failover to the remaining node. As the size of a cluster increases, different failover policies are possible and each one has different characteristics.</p>
<p><strong>Failover Pairs</strong></p>
<p>In a large cluster, failover policies can be defined such that each application is set to failover between two nodes. The simple example below shows two applications App1 and App2 in a 4-node cluster.</p>
<p><img src="http://i.technet.microsoft.com/cc781023.d3606a01-202b-4c98-af93-f8c852007d63%28de-de%29.gif" alt="d3606a01-202b-4c98-af93-f8c852007d63" /></p>
<p><strong>Figure 1: Failover pairs</strong></p>
<p>This configuration has pros and cons:</p>
<table border="0">
<tbody>
<tr>
<td>Pro</td>
<td>Good for clusters that are supporting heavy-weight<span>1</span> applications such as databases. This configuration ensures that in the event of failure, two applications will not be hosted on the same node.</td>
</tr>
<tr>
<td>Pro</td>
<td>Very easy to plan capacity. Each node is sized based on the application that it will need to host (just like a 2-node cluster hosting one application).</td>
</tr>
<tr>
<td>Pro</td>
<td>Effect of a node failure on availability and performance of the system is very easy to determine.</td>
</tr>
<tr>
<td>Pro</td>
<td>Get the flexibility of a larger cluster. In the event that a node is taken out for maintenance, the buddy for a given application can be changed dynamically (may end up with standby policy below).</td>
</tr>
<tr>
<td>Con</td>
<td>In simple configurations such as the one above, only 50% of the capacity of the cluster is in use.</td>
</tr>
<tr>
<td>Con</td>
<td>Administrator intervention may be required in the event of multiple failures.</td>
</tr>
</tbody>
</table>
<p><span>1</span> A heavy-weight application is one that consumes a significant number of system resources such as CPU, memory or IO bandwidth.</p>
<p>Failover pairs are supported by server clusters on all versions of Windows by limiting the <em>possible owner list</em> for each resource to a given pair of nodes.</p>
<p><strong>Hot-Standby Server</strong></p>
<p>To reduce the overhead of failover pairs, the spare node for each pair may be consolidated into a single node, providing a hot standby server that is capable of picking up the work in the event of a failure.</p>
<p><img src="http://i.technet.microsoft.com/cc781023.5dcf9736-6f00-4737-bcd7-e872dab93657%28de-de%29.gif" alt="5dcf9736-6f00-4737-bcd7-e872dab93657" /></p>
<p><strong>Figure 2: Standby Server</strong></p>
<p>This configuration has pros and cons:</p>
<table border="0">
<tbody>
<tr>
<td>Pro</td>
<td>Good for clusters that are supporting heavy-weight applications such as databases. This configuration ensures that in the event of a single failure, two applications will not be hosted on the same node.</td>
</tr>
<tr>
<td>Pro</td>
<td>Very easy to plan capacity. Each node is sized based on the application that it will need to host, the spare is sized to be the maximum of the other nodes.</td>
</tr>
<tr>
<td>Pro</td>
<td>Effect of a node failure on availability and performance of the system is very easy to determine.</td>
</tr>
<tr>
<td>Con</td>
<td>Configuration is targeted towards a single point of failure.</td>
</tr>
<tr>
<td>Con</td>
<td>Does not really handle multiple failures well. This may be an issue during scheduled maintenance where the spare may be in use.</td>
</tr>
</tbody>
</table>
<p>Server clusters support standby servers today using a combination of the possible owners list and the preferred owners list. The preferred node should be set to the node that the application will run on by default and the possible owners for a given resource should be set to the preferred node and the spare node.</p>
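<p>To make the owner-list arrangement concrete, here is a small Python sketch. The node names, the dictionary layout, and both function names are hypothetical; the sketch only models the lists described above and does not call any cluster API.</p>

```python
def standby_owner_lists(active_nodes, spare):
    """Build owner lists for a hot-standby layout.

    Each application prefers its home node and may also run on the spare.
    """
    return {
        node: {"preferred_owners": [node], "possible_owners": [node, spare]}
        for node in active_nodes
    }


def spare_capacity(node_capacities):
    """Size the spare to match the largest active node."""
    return max(node_capacities.values())


# Hypothetical 4-node cluster: three active nodes and one spare.
owners = standby_owner_lists(["Node1", "Node2", "Node3"], spare="Node4")
print(owners["Node2"])
```

<p>Because the spare appears in every possible owners list, a second failure while the spare is already in use leaves the affected application with nowhere to go, which is the weakness noted in the table above.</p>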
<p><strong>N+I</strong></p>
<p>Standby server works well for 4-node clusters in some configurations; however, its ability to handle multiple failures is limited. N+I configurations are an extension of the standby server concept where there are N nodes hosting applications and I spare nodes.</p>
<p><img src="http://i.technet.microsoft.com/cc781023.f58ca427-f41a-48ee-a859-62d441c75912%28de-de%29.gif" alt="f58ca427-f41a-48ee-a859-62d441c75912" /></p>
<p><strong>Figure 3: N+I Spare node configuration</strong></p>
<p>This configuration has pros and cons:</p>
<table border="0">
<tbody>
<tr>
<td>Pro</td>
<td>Good for clusters that are supporting heavy-weight applications such as databases or Exchange. This configuration ensures that in the event of a failure, an application instance will failover to a spare node, not one that is already in use.</td>
</tr>
<tr>
<td>Pro</td>
<td>Very easy to plan capacity. Each node is sized based on the application that it will need to host.</td>
</tr>
<tr>
<td>Pro</td>
<td>Effect of a node failure on availability and performance of the system is very easy to determine.</td>
</tr>
<tr>
<td>Pro</td>
<td>Configuration works well for multiple failures.</td>
</tr>
<tr>
<td>Con</td>
<td>Does not really handle multiple applications running in the same cluster well. This policy is best suited to applications running on a dedicated cluster.</td>
</tr>
</tbody>
</table>
<p>Server clusters support N+I scenarios in the Windows Server 2003 release using a cluster group public property, <strong>AntiAffinityClassNames</strong>. This property can contain an arbitrary string of characters. In the event of a failover, if a group being failed over has a non-empty string in the <strong>AntiAffinityClassNames</strong> property, the failover manager will check all other nodes. If there are any nodes in the possible owners list for the resource that are NOT hosting a group with the same value in <strong>AntiAffinityClassNames</strong>, then those nodes are considered a good target for failover. If all nodes in the cluster are hosting groups that contain the same value in the <strong>AntiAffinityClassNames</strong> property, then the preferred node list is used to select a failover target.</p>
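<p>The selection rule above can be sketched in Python as follows. This is an illustrative model, not the failover manager's code: the data layout and the function name <code>pick_target</code> are hypothetical, and where several nodes qualify the sketch simply takes the first one, which the description above does not specify.</p>

```python
def pick_target(group, hosted, possible_owners, preferred_owners, failed_node):
    """Choose a failover target for `group`.

    group  -- dict with "name" and "anti_affinity" (a string, may be empty)
    hosted -- dict mapping node name to the list of groups it hosts
    """
    candidates = [n for n in possible_owners if n != failed_node]
    tag = group["anti_affinity"]
    if tag:
        # Nodes that host no group carrying the same anti-affinity value.
        free = [
            n for n in candidates
            if all(g["anti_affinity"] != tag for g in hosted.get(n, []))
        ]
        if free:
            return free[0]  # assumption: first qualifying node wins
    # Every candidate hosts the same value (or none is set): use preferred list.
    for node in preferred_owners:
        if node in candidates:
            return node
    return candidates[0] if candidates else None


evs1 = {"name": "EVS1", "anti_affinity": "Exchange"}
evs2 = {"name": "EVS2", "anti_affinity": "Exchange"}
hosted = {"N1": [evs1], "N2": [evs2], "N3": []}
print(pick_target(evs1, hosted, ["N1", "N2", "N3"], ["N1", "N2", "N3"], "N1"))
```

<p>Here the group lands on the idle node N3 rather than on N2, which already hosts a group with the same value.</p>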
<p><strong>Failover Ring </strong></p>
<p>Failover rings allow each node in the cluster to run an application instance. In the event of a failure, the application on the failed node is moved to the next node in sequence.</p>
<p><img src="http://i.technet.microsoft.com/cc781023.b3d9ec24-a915-4373-a12a-8d066cd7e47a%28de-de%29.gif" alt="b3d9ec24-a915-4373-a12a-8d066cd7e47a" /></p>
<p><strong>Figure 4: Failover Ring</strong></p>
<p>This configuration has pros and cons:</p>
<table border="0">
<tbody>
<tr>
<td>Pro</td>
<td>Good for clusters that are supporting several small application instances where the capacity of any node is large enough to support several at the same time.</td>
</tr>
<tr>
<td>Pro</td>
<td>Effect on performance of a node failure is easy to predict.</td>
</tr>
<tr>
<td>Pro</td>
<td>Easy to plan capacity for a single failure.</td>
</tr>
<tr>
<td>Con</td>
<td>Configuration does not work well for all cases of multiple failures. If Node 1 fails, Node 2 will host two application instances and Nodes 3 and 4 will each host one application instance. If Node 2 then fails, Node 3 will be hosting three application instances and Node 4 will be hosting one instance.</td>
</tr>
<tr>
<td>Con</td>
<td>Not well suited to heavy-weight applications since multiple instances may end up being hosted on the same node even if there are lightly-loaded nodes.</td>
</tr>
</tbody>
</table>
<p>Failover rings are supported by server clusters on the Windows Server 2003 release. This is done by defining the order of failover for a given group using the preferred owner list. A node order should be chosen and then the preferred node list should be set up with each group starting at a different node.</p>
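<p>A minimal Python sketch of this setup follows. The group and node names are hypothetical, and the wrap-around in <code>next_in_ring</code> is an assumption made for the illustration; the sketch models the preferred owner lists only and does not touch a real cluster.</p>

```python
def ring_preferred_lists(nodes):
    """Give each group a preferred owner list that starts at a different node."""
    return {
        f"Group{i + 1}": nodes[i:] + nodes[:i]
        for i in range(len(nodes))
    }


def next_in_ring(preferred, failed_node, live_nodes):
    """Return the first live node after `failed_node` in the group's list."""
    start = preferred.index(failed_node)
    for offset in range(1, len(preferred)):
        node = preferred[(start + offset) % len(preferred)]
        if node in live_nodes:
            return node
    return None


lists = ring_preferred_lists(["Node1", "Node2", "Node3", "Node4"])
for name, preferred in lists.items():
    print(name, preferred)
```

<p>With these lists, a group that started on Node1 moves to Node2 when Node1 fails and then to Node3 if Node2 also fails, which reproduces the uneven load described in the table above.</p>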
<p><strong>Random</strong></p>
<p>In large clusters or even 4-node clusters that are running several applications, defining specific failover targets or policies for each application instance can be extremely cumbersome and error prone. The best policy in some cases is to allow the target to be chosen at random, with a statistical probability that this will spread the load around the cluster in the event of a failure.</p>
<p>This configuration has pros and cons:</p>
<table border="0">
<tbody>
<tr>
<td>Pro</td>
<td>Good for clusters that are supporting several small application instances where the capacity of any node is large enough to support several at the same time.</td>
</tr>
<tr>
<td>Pro</td>
<td>Does not require an administrator to decide where any given application should failover to.</td>
</tr>
<tr>
<td>Pro</td>
<td>Provided that there are sufficient applications or the applications are partitioned finely enough, this provides a good mechanism to statistically load balance the applications across the cluster in the event of a failure.</td>
</tr>
<tr>
<td>Pro</td>
<td>Configuration works well for multiple failures.</td>
</tr>
<tr>
<td>Pro</td>
<td>Very well tuned to handling multiple applications or many instances of the same application running in the same cluster.</td>
</tr>
<tr>
<td>Con</td>
<td>Can be difficult to plan capacity. There is no real guarantee that the load will be balanced across the cluster.</td>
</tr>
<tr>
<td>Con</td>
<td>Effect on performance of a node failure is not easy to predict.</td>
</tr>
<tr>
<td>Con</td>
<td>Not well suited to heavy-weight applications since multiple instances may end up being hosted on the same node even if there are lightly-loaded nodes.</td>
</tr>
</tbody>
</table>
<p>The Windows Server 2003 release of server clusters randomizes the failover target in the event of node failure. Each resource group that has an empty preferred owners list will be failed over to a random node in the cluster in the event that the node currently hosting it fails.</p>
<p><strong>Customized control</strong></p>
<p>There are some cases where specific nodes may be preferred for a given application instance.</p>
<p>This configuration has pros and cons:</p>
<table border="0">
<tbody>
<tr>
<td>Pro</td>
<td>Administrator has full control over what happens when a failure occurs.</td>
</tr>
<tr>
<td>Pro</td>
<td>Capacity planning is easy, since failure scenarios are predictable.</td>
</tr>
<tr>
<td>Con</td>
<td>With many applications running in a cluster, defining a good policy for failures can be extremely complex.</td>
</tr>
<tr>
<td>Con</td>
<td>Very hard to plan for multiple cascaded failures.</td>
</tr>
</tbody>
</table>
<p>Server clusters provide full control over the order of failover using the preferred node list feature. The full semantics of the preferred node list can be defined as:</p>
<table border="0">
<tbody>
<tr>
<th> Preferred Node List</th>
<th> Move group to best possible node, initiated by administrator</th>
<th> Failover due to node or group failure</th>
</tr>
<tr>
<td>Contains all nodes in cluster</td>
<td>Group is moved to highest node in preferred node list that is up and running in the cluster.</td>
<td><strong>Group is moved to the next node on the preferred node list.</strong></td>
</tr>
<tr>
<td>Contains a subset of the nodes in the cluster</td>
<td>Group is moved to highest node in preferred node list that is up and running in the cluster.<br />
If no nodes in the preferred node list are up and running, the group is moved to a random node.</td>
<td><strong>Group is moved to the next node on the preferred node list.</strong><br />
<strong>If the node that was hosting the group is the last on the list or was not in the preferred node list, the group is moved to a random node.</strong></td>
</tr>
<tr>
<td><strong>Empty</strong></td>
<td><strong>Group is moved to a random node.</strong></td>
<td><strong>Group is moved to a random node.</strong></td>
</tr>
</tbody>
</table>
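<p>The failure column of the table can be expressed as a short Python sketch. It is a model of the table, not the cluster service's implementation: the function name is hypothetical, skipping nodes that are down is an assumption, and because the table does not say what happens when the host is last on a list that contains every node, the sketch falls back to a random node in that case too.</p>

```python
import random


def failover_target(preferred, failed_node, live_nodes, rng=random):
    """Pick a failover target following the failure column of the table."""
    survivors = [n for n in live_nodes if n != failed_node]
    if not survivors:
        return None
    if failed_node in preferred:
        position = preferred.index(failed_node)
        # Walk the rest of the preferred list, skipping nodes that are down.
        for node in preferred[position + 1:]:
            if node in survivors:
                return node
    # Empty list, host not on the list, or host was last on the list.
    return rng.choice(survivors)


print(failover_target(["A", "B", "C"], "A", ["A", "B", "C"]))
```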
<p><strong>Q.</strong> How many resources can be hosted in a cluster?</p>
<p><strong>A.</strong> The theoretical limit for the number of resources in a cluster is 1,674; however, you should be aware that the cluster service periodically polls the resources to ensure they are alive. As the number of resources increases, the overhead of this polling also increases.</p>






Related posts:<ol><li><a href='http://adyesha.com/2009/08/windows-cluster-architecture/' rel='bookmark' title='Permanent Link: Windows Cluster Architecture'>Windows Cluster Architecture</a></li>
<li><a href='http://adyesha.com/2009/08/deploying-exchange-server-2003-in-a-cluster/' rel='bookmark' title='Permanent Link: Deploying Exchange Server 2003 in a Cluster'>Deploying Exchange Server 2003 in a Cluster</a></li>
<li><a href='http://adyesha.com/2009/08/how-to-configure-clustered-iis-virtual-servers-on-windows-server/' rel='bookmark' title='Permanent Link: How to configure clustered IIS virtual servers on Windows Server'>How to configure clustered IIS virtual servers on Windows Server</a></li>
</ol>]]></description>
		<wfw:commentRss>http://adyesha.com/2009/08/server-cluster-concepts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Windows Cluster Architecture</title>
		<link>http://adyesha.com/2009/08/windows-cluster-architecture/</link>
		<comments>http://adyesha.com/2009/08/windows-cluster-architecture/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 14:44:48 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Cluster]]></category>
		<category><![CDATA[Computer & Internet]]></category>
		<category><![CDATA[2000 cluster]]></category>
		<category><![CDATA[2000 windows]]></category>
		<category><![CDATA[additions]]></category>
		<category><![CDATA[beowulf cluster]]></category>
		<category><![CDATA[cluster 2003]]></category>
		<category><![CDATA[cluster applications]]></category>
		<category><![CDATA[cluster architecture]]></category>
		<category><![CDATA[cluster computer]]></category>
		<category><![CDATA[cluster computers]]></category>
		<category><![CDATA[cluster computing]]></category>
		<category><![CDATA[cluster configuration]]></category>
		<category><![CDATA[cluster failover]]></category>
		<category><![CDATA[cluster installation]]></category>
		<category><![CDATA[cluster node]]></category>
		<category><![CDATA[cluster nodes]]></category>
		<category><![CDATA[cluster one]]></category>
		<category><![CDATA[cluster operations]]></category>
		<category><![CDATA[cluster performance]]></category>
		<category><![CDATA[cluster replication]]></category>
		<category><![CDATA[cluster resources]]></category>
		<category><![CDATA[cluster server]]></category>
		<category><![CDATA[cluster server 2003]]></category>
		<category><![CDATA[cluster service]]></category>
		<category><![CDATA[cluster software]]></category>
		<category><![CDATA[cluster storage]]></category>
		<category><![CDATA[cluster sun]]></category>
		<category><![CDATA[cluster technology]]></category>
		<category><![CDATA[cluster training]]></category>
		<category><![CDATA[clustered]]></category>
		<category><![CDATA[clustered server]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[clustering exchange]]></category>
		<category><![CDATA[clustering software]]></category>
		<category><![CDATA[clustering technology]]></category>
		<category><![CDATA[clusters]]></category>
		<category><![CDATA[common resources]]></category>
		<category><![CDATA[communication interfaces]]></category>
		<category><![CDATA[computer clusters]]></category>
		<category><![CDATA[data replication]]></category>
		<category><![CDATA[database cluster]]></category>
		<category><![CDATA[development cluster]]></category>
		<category><![CDATA[disk cluster]]></category>
		<category><![CDATA[exchange 2003 cluster]]></category>
		<category><![CDATA[exchange cluster]]></category>
		<category><![CDATA[exchange server cluster]]></category>
		<category><![CDATA[external data storage]]></category>
		<category><![CDATA[failover]]></category>
		<category><![CDATA[grid cluster]]></category>
		<category><![CDATA[grid computing cluster]]></category>
		<category><![CDATA[group moves]]></category>
		<category><![CDATA[hardware cluster]]></category>
		<category><![CDATA[hardware devices]]></category>
		<category><![CDATA[high availability cluster]]></category>
		<category><![CDATA[high performance computing]]></category>
		<category><![CDATA[individual resources]]></category>
		<category><![CDATA[install cluster]]></category>
		<category><![CDATA[install exchange 2003 cluster]]></category>
		<category><![CDATA[load balancing]]></category>
		<category><![CDATA[load balancing cluster]]></category>
		<category><![CDATA[logical unit]]></category>
		<category><![CDATA[microsoft cluster]]></category>
		<category><![CDATA[microsoft cluster server]]></category>
		<category><![CDATA[microsoft cluster service]]></category>
		<category><![CDATA[microsoft windows nt server]]></category>
		<category><![CDATA[network names]]></category>
		<category><![CDATA[open cluster]]></category>
		<category><![CDATA[physical hardware]]></category>
		<category><![CDATA[quorum resource]]></category>
		<category><![CDATA[resource dlls]]></category>
		<category><![CDATA[samba cluster]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[server cluster]]></category>
		<category><![CDATA[server cluster architecture]]></category>
		<category><![CDATA[server clustering]]></category>
		<category><![CDATA[server clusters]]></category>
		<category><![CDATA[server high availability]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[servers cluster]]></category>
		<category><![CDATA[service cluster]]></category>
		<category><![CDATA[setup cluster]]></category>
		<category><![CDATA[sql cluster]]></category>
		<category><![CDATA[sql server cluster]]></category>
		<category><![CDATA[storage arrays]]></category>
		<category><![CDATA[virtual server cluster]]></category>
		<category><![CDATA[web cluster]]></category>
		<category><![CDATA[windows 2003]]></category>
		<category><![CDATA[windows 2003 architecture]]></category>
		<category><![CDATA[windows 2003 clustering]]></category>
		<category><![CDATA[windows cluster]]></category>
		<category><![CDATA[windows cluster backup]]></category>
		<category><![CDATA[windows cluster group]]></category>
		<category><![CDATA[windows cluster heartbeat]]></category>
		<category><![CDATA[windows cluster iis]]></category>
		<category><![CDATA[windows cluster manager]]></category>
		<category><![CDATA[windows cluster msdtc]]></category>
		<category><![CDATA[windows cluster network]]></category>
		<category><![CDATA[windows cluster quorum]]></category>
		<category><![CDATA[windows cluster requirements]]></category>
		<category><![CDATA[windows cluster resource]]></category>
		<category><![CDATA[windows cluster resources]]></category>
		<category><![CDATA[windows cluster san]]></category>
		<category><![CDATA[windows cluster services]]></category>
		<category><![CDATA[windows cluster software]]></category>
		<category><![CDATA[windows cluster virtual]]></category>
		<category><![CDATA[windows clustering]]></category>
		<category><![CDATA[windows configuration]]></category>
		<category><![CDATA[windows high availability]]></category>
		<category><![CDATA[windows installation]]></category>
		<category><![CDATA[windows installing]]></category>
		<category><![CDATA[windows microsoft]]></category>
		<category><![CDATA[windows server]]></category>
		<category><![CDATA[windows server 2003]]></category>
		<category><![CDATA[windows server 2003 architecture]]></category>
		<category><![CDATA[windows server architecture]]></category>
		<category><![CDATA[windows service architecture]]></category>
		<category><![CDATA[windows setup]]></category>

		<guid isPermaLink="false">http://adyesha.com/?p=112</guid>
		<description><![CDATA[<p><strong>Microsoft Cluster Server </strong>(MSCS) in Microsoft Windows NT Server 4.0 Enterprise Edition was the first server cluster technology offered by Microsoft. Individual servers that compose a cluster are referred to as nodes. A Cluster service is a collection of components on each node that perform cluster-specific tasks. Hardware and software components in the cluster that are managed by the Cluster service are referred to as resources. Server clusters provide the instrumentation mechanism for managing resources through resource DLLs, which define resource abstractions (in other words, they abstract a clustered resource from a specific physical node, enabling the resource to move from one node to another), communication interfaces, and management operations.</p>
<p>Resources are elements in a cluster that are:</p>
<ul>
<li> Brought online (in service) and taken offline (out of service)</li>
<li> Managed in a server cluster</li>
<li> Owned by only one node at a time</li>
</ul>
<p>A resource group is a collection of resources, managed by the Cluster service as a single, logical unit. This logical unit is often referred to as a failover unit, because the entire group moves as a single unit between nodes. Resources and cluster elements are grouped logically according to the resources added to a resource group. When a Cluster service operation is performed on a resource group, the operation affects all individual resources contained in the group. Typically, a resource group is created that contains the individual resources required by the clustered program.</p>
<p>Cluster resources may include physical hardware devices, such as disk drives and network cards, and logical items such as IP addresses, network names, and application components.</p>
<p>Clusters also include common resources, such as external data storage arrays and private cluster networks. Common resources are accessible by each node in the cluster. One common resource is the quorum resource, which plays a critical role in cluster operations. The quorum resource must be accessible for all node operations, including forming, joining or modifying a cluster.</p>
<p><strong>Server Clusters</strong><br />
Windows Server 2003 Enterprise Edition provides two types of cluster technologies for use with Exchange Server 2003 Enterprise Edition. The first is the Cluster service, which provides failover support for back-end mailbox servers that require a high level of availability. The second is Network Load Balancing (NLB), which complements server clusters by supporting highly available and scalable clusters of front-end Exchange protocol virtual servers (for example, HTTP, IMAP4, and POP3).</p>
<p>Server clusters use a shared-nothing model. Model types define how servers in a cluster manage and use local and common cluster devices and resources. In the shared-nothing cluster, each server owns and manages its local devices. Devices common to the cluster, such as common disk arrays and connection media, are selectively owned and managed by only one node at a time.</p>
<p>Server clusters use standard Windows drivers to connect to local storage devices and media. Server clusters support multiple connection media for the external common devices, which must be accessible by all servers in the cluster. External storage devices support standard PCI–based SCSI connections, SCSI over Fibre Channel, and SCSI bus with multiple initiators. Fibre connections are SCSI devices that are hosted on a Fibre Channel bus, instead of on a SCSI bus.</p>
<p>The following figure illustrates the components of a two-node server cluster composed of servers running Windows Server 2003 Enterprise Edition, with shared storage device connections using SCSI or SCSI over Fibre Channel.<br />
<strong>Sample two-node Windows cluster</strong><br />
<img src="http://i.technet.microsoft.com/Bb124225.fe1e275f-ae17-433d-a305-14dd3a8c405a%28en-us,EXCHG.65%29.gif" alt="Sample two-node Windows cluster" /></p>
<div>
<div><strong>Server Cluster Architecture</strong></div>
<p>Server clusters are designed as separate, isolated sets of components, which work closely together with Windows Server 2003. Modifications to the operating system are enabled when the Cluster service is installed. These modifications include the following:</p>
<ul>
<li> Support for dynamic creation and deletion of network names and addresses</li>
<li> Modifications to the file system, to enable closing open files during disk drive dismounts</li>
<li> Modifications to the storage subsystem, to enable sharing disks and volumes among multiple nodes</li>
</ul>
<p>Apart from these and other minor modifications, a server running the Windows Cluster service runs identically to a server that is not running the Windows Cluster service.</p>
<p>The Cluster service is at the core of server clusters. It comprises multiple functional units, including Node Manager, Failover Manager, Database Manager, Global Update Manager, Checkpoint Manager, Log Manager, Event Log Replication Manager, and Backup/Restore Manager.</p></div>
<div><strong>Cluster Service Components</strong></div>
<p>The Cluster service runs on Windows Server 2003 Enterprise Edition, using network drivers, device drivers, and resource instrumentation processes specifically designed for server clusters and their component processes. The Cluster service includes the following components:</p>
<ul>
<li> <strong>Checkpoint Manager</strong> This component saves application registry keys in a cluster directory stored on the quorum resource. To make sure that the Cluster service can recover from a resource failure, Checkpoint Manager checks registry keys when a resource is brought online and writes checkpoint data to the quorum resource when a resource goes offline. Checkpoint Manager also supports resources with application-specific registry trees that are instantiated at the cluster node where the resource comes online. A resource can have one or more registry trees associated with it. When the resource is online, Checkpoint Manager monitors changes to these registry trees. If Checkpoint Manager detects changes, it writes the registry tree to a file on the owner node of the resource and then transfers that file to the owner node of the quorum resource. Checkpoint Manager batches these transfers so that frequent changes to registry trees do not place too heavy a load on the Cluster service.</li>
<li> <strong>Database Manager</strong> Database Manager maintains cluster configuration information about all physical and logical entities in a cluster. These entities include the cluster itself, cluster node membership, resource groups, resource types, and descriptions of specific resources, such as disks and IP addresses.<br />
Persistent and volatile information stored in the configuration database tracks the current and desired state of a cluster. Each instance of Database Manager running on each node in the cluster cooperates to maintain consistent configuration information across the cluster and to ensure consistency of the configuration database copies on all nodes.<br />
Database Manager also provides an interface for use by other Cluster components, such as Failover Manager and Node Manager. This interface is similar to the registry interface of Microsoft Win32 APIs. However, the Database Manager interface writes changes made to cluster entities in both the registry and in the quorum resource.<br />
Database Manager supports transactional updates of the cluster registry hive and only presents interfaces to internal Cluster service components. Failover Manager and Node Manager typically use this transactional support to get replicated transactions. The Cluster API presents all Database Manager functions to clients, with the exception of transactional support functions. For additional information on the Cluster API, see <a href="http://go.microsoft.com/fwlink/?LinkId=33142" target="_blank"> Cluster API</a> on MSDN.</li>
</ul>
<table border="0">
<tbody>
<tr>
<th>Note:</th>
</tr>
<tr>
<td>The application registry key data and changes are recorded by Checkpoint Manager in quorum log files, in the quorum resource.</td>
</tr>
</tbody>
</table>
<div>
<div style="display: block;">
<ul>
<li> <strong>Event Service</strong> Event Service serves as a switchboard, sending events to and from applications, and to the Cluster service components on each node. The Event Processor component of the Event Service helps Cluster service components to disseminate information about important events to all other components. The Event Processor component supports the Cluster API event mechanism. It also performs miscellaneous services, such as delivering signal events to cluster-aware applications and maintaining cluster objects.</li>
<li> <strong>Event Log Replication Manager</strong> The Event Log Replication Manager replicates event log entries from one node to all other nodes in the cluster. By default, the Cluster service interacts with the Windows Event Log service in the cluster to replicate event log entries to all cluster nodes. When the Cluster service starts on the node, it invokes a private API in the local Event Log service and requests that the Event Log service bind to the Cluster service. The Event Log service then binds to the CLUSAPI interface by using local remote procedure calls (RPCs). When the Event Log service receives an event to be logged, it logs it locally, drops the event into a persistent batch queue, and schedules a timer thread to run within the next 20 seconds if no timer thread is already active. When the timer thread fires, it processes the batch queue and sends the events, as one consolidated buffer, to the Cluster API interface to which the Event Log service was previously bound. The Cluster API interface then sends the events to the Cluster service.<br />
After the Cluster service receives batched events from the Event Log service, it drops the events into a local outgoing queue and returns from the RPC. The event broadcaster thread, in the Cluster service, then processes this queue and sends the events, using the intra-cluster RPC, to all active cluster nodes. The server side API then drops the events into an incoming queue. An event log writer thread then processes this queue and requests, through a private RPC, that the local Event Log service write the events locally.<br />
The Cluster service uses lightweight remote procedure call (LRPC) to invoke the Event Log service&#8217;s private RPC interfaces. The Event Log service also uses LRPCs to invoke the Cluster API interface and then request that the Cluster service replicate events.</li>
<li> <strong>Failover Manager</strong> Failover Manager performs resource management and initiates appropriate actions, such as startup, restart, and failover. Failover Manager stops and starts resources, manages resource dependencies, and initiates failover of resource groups. To perform these actions, Failover Manager receives resource and system state information from Resource Monitors and cluster nodes.<br />
Failover Manager also decides which nodes in the cluster should own which resource group. When resource group arbitration finishes, nodes that own an individual resource group return control of the resources in the resource group to Node Manager. If a node cannot handle a failure of one of its resource groups, Failover Managers on each node work together to reassign ownership of the resource group.<br />
If a resource fails, Failover Manager restarts the resource or takes the resource offline together with its dependent resources. If Failover Manager takes the resource offline, it indicates that the ownership of the resource will be moved to another node. The resource is then restarted, under the ownership of the new node. This is referred to as failover, as explained in the section &#8220;Cluster Failover&#8221; later in this topic.</li>
<li> <strong>Global Update Manager</strong> Global Update Manager provides the global update service that is used by cluster components. Global Update Manager is used by internal cluster components, such as Failover Manager, Node Manager, and Database Manager, to replicate changes to the cluster database across nodes. Global Update Manager updates are typically initiated as a result of a Cluster API call. When a Global Update Manager update is initiated at a client node, it first requests a locker node to obtain a global lock. If the lock is not available, the client waits for one to become available.<br />
When the lock is available, the locker grants the lock to the client, and issues the update locally (on the locker node). The client then issues the update to all other healthy nodes, including itself. If an update succeeds on the locker, but fails on some other node, that node will be removed from the current cluster membership. If the update fails on the locker node itself, the locker merely returns the failure to the client.</li>
<li> <strong>Log Manager</strong> Log Manager writes changes to recovery logs that are stored on the quorum resource. Log Manager, together with Checkpoint Manager, ensures that the recovery log on the quorum resource contains the most recent configuration data and change checkpoints. If one or more cluster nodes are down, configuration changes can still be made to the remaining nodes. While these nodes are down, Database Manager uses Log Manager to log configuration changes to the quorum resource.<br />
When failed nodes return to service, they read the location of the quorum resource from their local cluster registry hives. Because the hive data could be stale, mechanisms are in place to detect invalid quorum resources read from a stale cluster configuration database. Database Manager then requests that Log Manager update the local copy of the cluster hive, using the checkpoint file in the quorum resource. The log file is then replayed in the quorum disk, starting from the checkpoint log sequence number. The result is a completely updated cluster hive. Cluster hive snapshots are taken whenever the quorum log is reset and once every four hours.</li>
<li> <strong>Membership Manager </strong>Membership Manager monitors cluster membership and the health of all nodes in the cluster. Membership Manager (also referred to as the Regroup Engine) maintains a consistent view of which cluster nodes are currently up or down. The core of the Membership Manager component is a regroup algorithm that is invoked whenever there is evidence that one or more nodes failed. At the completion of the algorithm, all participating nodes reach identical conclusions on the new cluster membership.</li>
<li> <strong>Node Manager</strong> Node Manager assigns resource group ownership to nodes, based on group preference lists and node availability. Node Manager runs on each node and maintains a local list of nodes that belong to the cluster. Periodically, Node Manager sends messages, named heartbeats, to its counterparts running on other nodes in the cluster to detect node failures. All nodes in the cluster must have exactly the same view of cluster membership.<br />
If a cluster node detects a communication failure with another cluster node, it transmits a multicast message to the entire cluster. This regroup event causes all members to verify their view of the current cluster membership. During the regroup event, the Cluster service prevents write operations to any disk devices common to all nodes in the cluster until the membership stabilizes. If an instance of Node Manager on an individual node does not respond, the node is removed from the cluster, and its active resource groups are moved to another active node. To make this change, Node Manager identifies the possible owners (nodes) that may own individual resources and the node on which a resource group prefers to run. Node Manager then selects the node and moves the resource group. In a two-node cluster, Node Manager simply moves resource groups from a failed node to the remaining node. In a cluster composed of three or more nodes, Node Manager selectively distributes resource groups among the remaining nodes.<br />
Node Manager also acts as a gatekeeper, allowing joiner nodes into the cluster and processing requests to add or evict a node.</li>
<li> <strong>Resource Monitor</strong> Resource Monitor verifies the health of each cluster resource by using callbacks to resource DLLs. Resource Monitors run in a separate process and communicate with the Cluster service through RPCs. This protects the Cluster service from failures of individual cluster resources.<br />
Resource Monitors provide the communication interface between resource DLLs and the Cluster service. When the Cluster service must obtain data from a resource, Resource Monitor receives the request and forwards it to the appropriate resource DLL. Conversely, when a resource DLL must report its status or notify the Cluster service of an event, Resource Monitor forwards the information from the resource to the Cluster service.<br />
The Resource Monitor process (RESRCMON.EXE) is a child process of the Cluster service process (CLUSSVC.EXE). Resource Monitor loads resource DLLs that monitor cluster resources in its process space. Loading the resource DLLs in a process separate from the Cluster service process helps to isolate faults. Multiple Resource Monitors can be instantiated at the same time.<br />
Each Resource Monitor functions as an LRPC server for the Cluster service process. When the Cluster service receives a Cluster API call that requires talking to a resource DLL, it uses the LRPC interface to invoke the Resource Monitor RPC. To receive responses from Resource Monitor, the Cluster service creates one notification thread per Resource Monitor process. This notification thread invokes an RPC that is located permanently in Resource Monitor. The thread acquires notifications when they are generated. The thread is released only when Resource Monitor fails or when the thread is manually stopped by a shutdown command from the Cluster service.<br />
Resource Monitor does not maintain a persistent state on its own. It retains a limited, in-memory state of the resources, but all of its initial state information is supplied by the Cluster service. Resource Monitor communicates with the resource DLLs through well-defined entry points that the DLLs must present. Resource Monitor completes the following operations on its own:
<ul>
<li> It polls resource DLLs through the IsAlive and LooksAlive entry points or, alternatively, checks failure events signaled by resource DLLs.</li>
<li> It spawns timer threads to monitor pending timeouts of resource DLLs whose Online or Offline entry points return ERROR_IO_PENDING.</li>
<li> It detects crashes of the Cluster service and shuts down the resources.</li>
</ul>
Its other actions occur as a result of operations requested by the Cluster service through the RPC interface. No hang detection is performed by the Cluster service. The Cluster service does, however, monitor crashes, and it restarts a monitor if it detects a process crash.<br />
The Cluster service and Resource Monitor process share a memory-mapped section backed by the paging file. The handle to the section is passed to Resource Monitor at Resource Monitor startup. Resource Monitor then duplicates the handle and records the entry point number and resource name into this section immediately before calling a resource DLL entry point. If Resource Monitor crashes, the Cluster service reads the shared section to detect the resource and the entry point that caused the crash.</li>
<li> <strong>Backup/Restore Manager</strong> Backup/Restore Manager works with Failover Manager and Database Manager to back up or restore the quorum log file and all checkpoint files. The Cluster service uses the BackupClusterDatabase API for database backup. First, the BackupClusterDatabase API contacts the Failover Manager layer. The Failover Manager layer forwards the request to the node that currently owns the quorum resource. That node then invokes Database Manager, which makes a backup of the quorum log file and all checkpoint files.<br />
The Cluster service also registers itself at startup as a backup writer with Volume Shadow Copy service. When a backup client invokes the Volume Shadow Copy service to perform a system state backup, it also invokes the Cluster service, through a series of entry point calls, to perform the cluster database backup. The server code in the Cluster service invokes the Failover Manager to perform the backup, and the rest of the operation occurs via the BackupClusterDatabase API.<br />
The Cluster service uses the RestoreClusterDatabase API to restore the cluster database from a backup path. This API can only be invoked locally from one of the cluster nodes. When the RestoreClusterDatabase API is invoked, it stops the Cluster service, restores the cluster database from the backup, sets a registry value that contains the backup path, and then restarts the Cluster service. On startup, the Cluster service detects that a restore is requested and restores the cluster database from the backup path to the quorum resource.</li>
</ul>
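<p>The Global Update Manager behavior described in the list above (update the locker node first, then the other nodes, and drop any node on which the update fails) can be sketched in a few lines of Python. This is a simplified, single-threaded illustration under stated assumptions: the <code>Node</code> class, its <code>fail_updates</code> switch, and <code>global_update</code> are hypothetical names, and acquiring the global lock is omitted because nothing runs concurrently here.</p>

```python
class Node:
    """A cluster node holding its own copy of the cluster database."""

    def __init__(self, name, fail_updates=False):
        self.name = name
        self.fail_updates = fail_updates  # hypothetical fault-injection switch
        self.database = {}

    def apply(self, key, value):
        """Apply one update locally; return True on success."""
        if self.fail_updates:
            return False
        self.database[key] = value
        return True


def global_update(members, locker, key, value):
    """Apply one update on the locker first, then on every other member.

    Returns (succeeded, surviving_members).
    """
    # Step 1: the locker issues the update locally; a failure here is
    # simply reported back and the membership is left unchanged.
    if not locker.apply(key, value):
        return False, list(members)
    # Step 2: the update goes to the remaining nodes; a node on which it
    # fails is removed from the current membership.
    survivors = [n for n in members if n is locker or n.apply(key, value)]
    return True, survivors
```

<p>In this sketch, a node that rejects an update after the locker has accepted it no longer appears in the returned membership, which keeps every remaining copy of the database identical.</p>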
</div>
</div>
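<p>The recovery path described for Log Manager (start from the checkpoint file, then replay the quorum log from the checkpoint log sequence number) follows a common checkpoint-and-replay pattern. The Python sketch below illustrates that pattern only. The record layout <code>(lsn, key, value)</code> and the function name are assumptions made for the example, as is the choice to treat records at or below the checkpoint sequence number as already contained in the snapshot.</p>

```python
def rebuild_hive(checkpoint, checkpoint_lsn, log_records):
    """Return a hive rebuilt from a checkpoint snapshot plus later log records."""
    hive = dict(checkpoint)  # start from the snapshot, leaving the input untouched
    # Replay in log-sequence order so that later changes win.
    for lsn, key, value in sorted(log_records, key=lambda record: record[0]):
        if lsn > checkpoint_lsn:  # assumption: earlier records are already in the snapshot
            hive[key] = value
    return hive
```

<p>A node that was down while an owner changed would, in this model, load the old snapshot and then pick up the new owner from the replayed records.</p>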
<div><strong>Cluster Failover</strong></div>
<p>Failover can occur automatically because of an unplanned hardware or software failure, or it can be initiated manually by an administrator. The algorithm and behavior in both situations are almost identical. However, in a manually initiated failover, resources are shut down in an orderly way, whereas in an unplanned failover they are shut down suddenly and disruptively (for example, when the power goes out or a crucial hardware component fails).</p>
<p>When an entire node in a cluster fails, its resource groups transfer to one or more available nodes in the cluster. Automatic failover is similar to planned administrative reassignment of resource ownership. However, it is more complicated, because the orderly steps of a planned shutdown might be interrupted or might not have occurred at all. Therefore, extra steps are required to evaluate the state of the cluster at the time of failure.</p>
<p>When your network experiences an automatic failover, it is important to determine which groups were running on the failed node and which nodes should take ownership of the various resource groups. All nodes in the cluster that are capable of hosting the resource groups negotiate for ownership. This negotiation is based on node capabilities, current load, application feedback, the node preference list, or the use of the AntiAffinityClassNames property, which is discussed in <a href="http://technet.microsoft.com/en-us/library/bb124140%28EXCHG.65%29.aspx">Cluster-Specific Configurations</a>. When negotiation of the resource group is completed, all nodes in the cluster update their databases and track which node owns the resource group.</p>
<p>In clusters with more than two nodes, the node preference list for each resource group can specify a preferred server, plus one or more prioritized alternatives. This enables cascading failover, in which a resource group can survive multiple server failures, each time cascading, or failing over to the next server on its node preference list.</p>
<p>An alternative to cascading failover is commonly called N+I failover. This method establishes the node preference lists for all cluster groups. The node preference list identifies the standby cluster nodes to which resources are moved at the first failover. The standby nodes are servers in the cluster that are mostly idle or that have workloads that can be easily pre-empted if a failed server&#8217;s workload must be moved to the standby node.</p>
<p>Cascading failover assumes that every other server in the cluster has some excess capacity and can absorb a portion of any other failed server&#8217;s workload. N+I failover assumes that the +I standby servers hold the excess capacity and are the primary recipients of a failed server&#8217;s workload.</p>
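<p>The difference between the two policies comes down to how each group&#8217;s node preference list is built. The Python sketch below is a simplified illustration with hypothetical function names. As a simplification, it picks the first node on the list that is still up, which matches &#8220;the next server on the list&#8221; when servers fail in list order.</p>

```python
def cascading_list(home, nodes):
    """Preferred owner first, then every other node as a prioritized alternative."""
    return [home] + [n for n in nodes if n != home]


def n_plus_i_list(home, standby_nodes):
    """Preferred owner first, then only the designated standby nodes."""
    return [home] + [n for n in standby_nodes if n != home]


def next_owner(preference_list, up_nodes):
    """First node on the list that is still up, or None if none is."""
    for node in preference_list:
        if node in up_nodes:
            return node
    return None
```

<p>With four nodes and N4 as the only standby, a group whose home is N2 has the cascading list N2, N1, N3, N4 but the N+I list N2, N4. If N2 and N4 are both down, only the cascading list still yields an owner.</p>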
<p><span> </span></p>
<div><strong>Cluster Failback</strong></div>
<p>When a node comes back online, Failover Manager can decide to move one or more resource groups back to the recovered node. This is referred to as failback. A resource group can fail back to a recovered or restarted node only if its properties define a preferred owner. Resource groups for which the recovered or restarted node is the preferred owner are moved from the current owner to the recovered or restarted node.</p>
<p>Failback properties of a resource group can include the hours of the day during which failback is allowed and a limit on the number of times failback is attempted. This enables the Cluster service to prevent failback of resource groups during peak processing times or to nodes that have not been correctly recovered or restarted.</p>
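<p>A failback decision of this kind can be sketched as follows. The class, field, and function names are hypothetical and do not correspond to actual Cluster service property names. The sketch assumes an hour-of-day window that may wrap past midnight and a simple attempt counter.</p>

```python
from dataclasses import dataclass


@dataclass
class FailbackPolicy:
    preferred_owner: str
    window_start_hour: int  # first hour (0-23) in which failback is allowed
    window_end_hour: int    # first hour (0-23) in which it is no longer allowed
    max_attempts: int       # stop trying after this many failback attempts


def hour_in_window(hour, start, end):
    """Return True if hour falls inside [start, end), wrapping past midnight."""
    if end > start:
        return hour >= start and end > hour
    # Window wraps past midnight (for example 22 to 6). An equal start and
    # end is treated here as an all-day window.
    return hour >= start or end > hour


def should_fail_back(policy, recovered_node, current_owner, hour, attempts_so_far):
    """Decide whether a group should move back to the node that just recovered."""
    if recovered_node != policy.preferred_owner:
        return False  # only the preferred owner gets the group back
    if current_owner == recovered_node:
        return False  # nothing to move
    if attempts_so_far >= policy.max_attempts:
        return False  # the node keeps failing, so stop retrying
    return hour_in_window(hour, policy.window_start_hour, policy.window_end_hour)
```

<p>For example, a policy with a window from 22 to 6 allows failback at 23:00 and 02:00 but blocks it at noon, which keeps the move out of peak processing hours.</p>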
<p><span> </span></p>
<div>
<div><strong>Cluster Quorum</strong></div>
<div><p>Each cluster has a special resource referred to as the quorum resource. A quorum resource can be any resource that does the following:</p>
<ul>
<li> Provides a means for arbitration leading to membership and cluster state decisions</li>
<li> Provides physical storage to hold configuration information</li>
</ul>
<p>A quorum log is a configuration database for the entire server cluster. The quorum log contains cluster configuration information, such as the servers that are part of the cluster, the resources that are installed in the cluster, and the state of those resources (for example, online or offline).</p>
<p>The quorum is important in a cluster for the following two reasons:</p>
<ul>
<li> <strong>Consistency</strong> A cluster is made up of multiple physical servers acting as a single virtual server. It is critical that each of the physical servers has a consistent view of the cluster configuration. The quorum acts as the definitive repository for all configuration information relating to the cluster. If the Cluster service is unable to access and read the quorum, it cannot start.</li>
<li> <strong>Tie-breaking</strong> The quorum is used as a tie-breaker to avoid split-cluster scenarios. A split-cluster scenario occurs when all network communication links between two or more cluster nodes fail. If this occurs, the cluster might be split into two or more partitions that cannot communicate with each other. The quorum ensures that cluster resources are brought online on one node only. It does this by allowing the partition that owns the quorum to continue, while the other partitions are evicted from the cluster.</li>
</ul>
</div>
</div>
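<p>The tie-breaking rule can be illustrated with a short sketch. The function names and the representation of partitions as sets of node names are assumptions made for the example. The only rule taken from the text is that the partition owning the quorum continues while the others are evicted.</p>

```python
def surviving_partition(partitions, quorum_owner):
    """Return the partition that contains the quorum owner, or None."""
    for partition in partitions:
        if quorum_owner in partition:
            return partition
    return None


def evicted_nodes(partitions, quorum_owner):
    """Return, in sorted order, every node outside the surviving partition."""
    survivor = surviving_partition(partitions, quorum_owner)
    return sorted(
        node
        for partition in partitions
        if partition is not survivor
        for node in partition
    )
```

<p>Note that under this rule the surviving partition is decided by quorum ownership, not by size: if node C owns the quorum and loses contact with nodes A and B, C continues and A and B are evicted.</p>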
<p><span> </span></p>
<div>
<div><strong>Standard Quorum</strong></div>
<p>As mentioned earlier in this section, the quorum is a configuration database for the Cluster service that is stored in the quorum log file. A standard quorum uses a quorum log file, located on a disk hosted in the shared storage array, which is accessible by all members of the cluster.</p>
<p>Each member connects to the shared storage using SCSI or Fibre Channel. Storage is made up of external hard disks (usually configured as RAID disks) or a SAN, in which logical slices of the SAN are presented as physical disks.</p></div>
<table style="height: 112px;" border="0" width="594">
<tbody>
<tr>
<th>Note:</th>
</tr>
<tr>
<td>It is important that the quorum uses a physical disk resource, rather than a disk partition, because the entire physical disk resource is moved during failover. Furthermore, it is possible to configure server clusters to use the local hard disk on a server to store the quorum. This type of implementation, referred to as a lone wolf cluster, is supported only for testing and development purposes. Lone wolf clusters should not be used to cluster Exchange 2003 in a production environment because, being singular, they are incapable of providing failover.</td>
</tr>
</tbody>
</table>
<p><span> </span></p>
<div>
<div><strong>Majority Node Set Quorums</strong></div>
<div><p>From a server cluster perspective, a majority node set (MNS) quorum is a single quorum resource. The data is stored by default on the local disk of each node in the cluster. The MNS resource makes sure that the cluster configuration data, stored on the MNS resource, is consistent across different disks. The MNS implementation provided in Windows Server 2003 uses a directory on each node&#8217;s local disk to store the quorum data. If the configuration of the cluster changes, that change is reflected across each node&#8217;s local disk. The change is considered committed, or made persistent, only if it is written to a majority of the nodes, that is, to at least (Number of nodes/2) + 1 nodes, with the division rounded down.</p>
<p>The MNS quorum makes sure that most nodes have an up-to-date copy of the data. The Cluster service starts up and brings resources online only if a majority of the nodes that are configured as part of the cluster are up and are running the Cluster service. If the MNS quorum determines that a majority does not exist, the cluster is considered not to have quorum, and the Cluster service waits in a restart loop until more nodes try to join. When a majority or quorum of nodes is available, the Cluster service starts and brings the resources online. Because the up-to-date configuration is written to a majority of the nodes, regardless of node failures, the cluster always guarantees that it has the most current configuration at startup.</p>
<p>If a cluster failure occurs, or if the cluster somehow enters a split-cluster scenario, all partitions that do not contain a majority of nodes are taken offline. This ensures that if there is a partition running that contains a majority of the nodes, it can safely start any resources that are not running on that partition, because it is the only partition in the cluster that is running resources.</p>
<p>Because shared disk quorum clusters and MNS quorum clusters behave differently, you must choose carefully between the two models. For example, if you have only two nodes in your cluster, the MNS model is not recommended. In this instance, failure of one node leads to failure of the entire cluster, because the single remaining node cannot form a majority.</p>
<p>Majority node set (MNS) quorums are available in Windows Server 2003 Enterprise Edition and Windows Server 2003 Datacenter Edition clusters. The only benefit that MNS clusters provide for Exchange clusters is to eliminate the need for a dedicated disk in the shared storage array on which to store the quorum resource.</p></div>
</div>
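<p>The majority arithmetic behind these statements is small enough to check directly. The sketch below uses hypothetical function names and assumes integer division, so a majority is (number of configured nodes / 2, rounded down) + 1.</p>

```python
def majority(configured_nodes):
    """Smallest number of nodes that forms a majority of the configured set."""
    return configured_nodes // 2 + 1


def has_quorum(configured_nodes, running_nodes):
    """Return True if enough nodes are running for the cluster to have quorum."""
    return running_nodes >= majority(configured_nodes)


def tolerated_failures(configured_nodes):
    """Number of nodes that can fail while a majority still remains."""
    return configured_nodes - majority(configured_nodes)
```

<p>For a two-node cluster the majority is two, so no failure can be tolerated, which is why the MNS model is not recommended there. A three-node cluster tolerates one failure and a five-node cluster tolerates two.</p>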
<p><span> </span></p>
<div>
<div><strong>Cluster Resources</strong></div>
<div><p>The Cluster service manages all resource objects using Resource Monitors and resource DLLs. The Resource Monitor interface provides a standard communication interface that enables the Cluster service to initiate resource management commands and obtain resource status data. The Resource Monitor obtains actual command functions and data through resource DLLs. The Cluster service uses resource DLLs to bring resources online, manage their interaction with other resources in the cluster, and monitor their health.</p>
<p>To enable resource management, a resource DLL uses a few simple resource interfaces and properties. Resource Monitor loads a particular resource DLL in its address space, as privileged code running under the SYSTEM account. The SYSTEM account (that is, LocalSystem) is a security principal account that represents the operating system. The Cluster service, which runs under a user security context, uses the SYSTEM account to perform security functions within the operating system.</p>
<p>When resources depend on the availability of other resources to function, these dependencies can be defined by the resource DLL. When a resource is dependent on other resources, the Cluster service brings the dependent resource online only after it brings the resources on which it depends online in the correct sequence.</p>
<p>Resources are taken offline in a similar manner. The Cluster service takes resources offline only after any dependent resources are taken offline. This prevents introducing circular dependencies when loading resources.</p>
<p>Each resource DLL can also define the type of computer and device connection required by the resource. For example, a disk resource may require ownership only by a node that is physically connected to the disk device. Local restart policies and desired actions during failover events can also be defined in the resource DLL.</p></div>
</div>
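<p>The online and offline ordering described above amounts to a topological sort of the dependency graph. The following sketch uses Python&#8217;s standard graphlib module (Python 3.9 or later). The resource names and the dependency map are hypothetical, and this is not how the Cluster service itself is implemented.</p>

```python
from graphlib import TopologicalSorter

# Hypothetical resource group: each resource maps to the resources it
# depends on, which must be online before it can start.
DEPENDENCIES = {
    "Physical Disk": [],
    "IP Address": [],
    "Network Name": ["IP Address"],
    "Application": ["Physical Disk", "Network Name"],
}


def online_order(dependencies):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def offline_order(dependencies):
    """Return the reverse order, so dependents go offline first."""
    return list(reversed(online_order(dependencies)))
```

<p>With this map, the application is brought online last and taken offline first. TopologicalSorter raises graphlib.CycleError if the map contains a circular dependency, so such a configuration is rejected rather than loaded.</p>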
<p><span> </span></p>
<div>
<div><strong>Cluster Administration</strong></div>
<div>Clusters are managed using Cluster Administrator, a graphical tool for maintenance, monitoring, and failover administration. Administrative tasks can also be performed from the command line with the Cluster.exe tool. Server clusters also provide an automation interface. This interface can be used to create custom scripting tools for administering cluster resources, nodes, and the cluster itself. Applications and administration tools, such as Cluster Administrator, can access this interface using RPCs, whether the tool is running on a node in the cluster or on an external computer.</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Cluster Formation and Operation</strong></div>
<div>When the Cluster service is installed and running on a server, the server can participate in a cluster. Cluster operations reduce single points of failure and enable high availability of clustered resources. The following sections briefly describe node behavior during cluster creation and operation.</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Creating a Cluster</strong></div>
<div><p>Server clusters include a cluster installation utility that is used to install the cluster software on a server and create a new cluster. To create a new cluster, the utility is run on the computer selected as the first member of the cluster. This first step defines the new cluster by establishing a cluster name, and creating the cluster database and initial cluster membership list.</p>
<p>The next step in creating a cluster is to add the common data storage devices that will be available to all members of the cluster. This establishes the new cluster with a single node and its own local data storage devices and cluster common resources (generally disk or data storage and connection media resources).</p>
<p>The final step in creating a cluster is to run the installation utility on each additional computer that will be a member of the cluster. As each new node is added to the cluster, it automatically receives a copy of the existing cluster database from the original member of the cluster. When a node joins or forms a cluster, the Cluster service updates the node&#8217;s private copy of the configuration database.</p></div>
</div>
<p><span> </span></p>
<div>
<div><strong>Forming a Cluster</strong></div>
<div><p>A server can form a cluster if it is running the Cluster service and cannot locate other nodes in the cluster. To form the cluster, a node must be able to acquire exclusive ownership of the quorum resource.</p>
<p>When a cluster is formed, the first node in the cluster contains the cluster configuration database. As each additional node joins the cluster, it receives and maintains its own local copy of the cluster configuration database. The quorum resource stores the most current version of the configuration database as recovery logs. The logs contain node-independent cluster configuration and state data.</p>
<p>During cluster operations, the Cluster service uses the quorum recovery logs to do the following:</p>
<ul>
<li> Guarantee that only one set of active nodes is allowed to form a cluster</li>
<li> Enable a node to form a cluster only if it can gain control of the quorum resource</li>
<li> Allow a node to join or remain in an existing cluster only if it can communicate with the node that controls the quorum resource</li>
</ul>
<p>When a cluster is formed, each node in the cluster can be in one of three distinct states. These states are recorded by Event Processor (described below) and replicated by Event Log Manager to other nodes in the cluster. The three Cluster service states are as follows:</p>
<ul>
<li> <strong>Offline</strong> The node is not an active member of the cluster. The node and its Cluster service might or might not be running.</li>
<li> <strong>Online</strong> The node is an active member of the cluster. It adheres to cluster database updates, contributes input into the quorum algorithm, maintains cluster network and storage heartbeats, and can own and run resource groups.</li>
<li> <strong>Paused</strong> The node is an active member of the cluster. The node adheres to cluster database updates, contributes input into the quorum algorithm, and maintains network and storage heartbeats, but it cannot accept resource groups. It can support only those resource groups that it currently owns. The paused state enables maintenance to be performed. Online and paused states are treated as equivalent states by the majority of the server cluster components.</li>
</ul>
</div>
</div>
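<p>The three node states and what each one permits can be captured in a small enumeration. The type and function names are hypothetical. The rules encoded are only those stated in the list above.</p>

```python
from enum import Enum


class NodeState(Enum):
    OFFLINE = "offline"
    ONLINE = "online"
    PAUSED = "paused"


def is_active_member(state):
    """Online and paused nodes are both active members of the cluster."""
    return state in (NodeState.ONLINE, NodeState.PAUSED)


def can_accept_groups(state):
    """Only an online node can take ownership of additional resource groups."""
    return state is NodeState.ONLINE


def can_run_owned_groups(state):
    """A paused node keeps running the resource groups it already owns."""
    return is_active_member(state)
```

<p>The sketch makes the maintenance use of the paused state visible: a paused node still counts as a member and keeps its current groups, but nothing new is moved onto it.</p>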
<p><span> </span></p>
<div>
<div><strong>Joining a Cluster</strong></div>
<div><p>To join an existing cluster, a server must be running the Cluster service, and it must successfully locate another node in the cluster. After finding another node in the cluster, the joining server must be authenticated for membership in the cluster and must receive a replicated copy of the cluster configuration database.</p>
<p>The process of joining an existing cluster begins when Windows Service Control Manager starts the Cluster service on the node. During the startup process, the Cluster service configures and mounts the node&#8217;s local data devices. It does not attempt to bring the common cluster data devices online as nodes, because the existing cluster might be using the devices.</p>
<p>To locate other nodes, a discovery process is started. When the node discovers any member of the cluster, it performs an authentication sequence. The first cluster member authenticates the new node and returns a status of success if the new node is successfully authenticated. If authentication is not successful, as when a joining node is not recognized as a cluster member or has an invalid account password, the request to join the cluster is denied.</p>
<p>After successful authentication, the first node online in the cluster checks the copy of the configuration database of the joining node. If it is out-of-date, the cluster node sends the joining server an updated copy of the database. After receiving the replicated database, the node joining the cluster can use it to find shared resources and bring them online as needed.</p></div>
</div>
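<p>The join sequence (authenticate, then refresh a stale database) can be sketched as below. Everything here is illustrative: the Member class, the integer version counter, and the credentials_ok flag are hypothetical stand-ins for the real authentication and replication mechanisms, which the text does not detail.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    db_version: int  # hypothetical counter: a higher value means a newer database
    database: dict = field(default_factory=dict)


def join_cluster(existing_member, joiner, known_members, credentials_ok):
    """Admit the joiner if it authenticates, refreshing its database if stale.

    Returns True when the join succeeds and False when it is denied.
    """
    # Authentication: the joiner must be a recognized cluster member and
    # must present a valid account password.
    if joiner.name not in known_members or not credentials_ok:
        return False
    # Database check: send an updated copy only when the joiner is out of date.
    if existing_member.db_version > joiner.db_version:
        joiner.database = dict(existing_member.database)
        joiner.db_version = existing_member.db_version
    return True
```

<p>After a successful join, the joining node holds the same configuration as the existing member and can use it to find shared resources.</p>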
<p><span> </span></p>
<div>
<div><strong>Leaving a Cluster</strong></div>
<div><p>A node can leave a cluster when it shuts down or when the Cluster service is stopped. However, a node can also be evicted from a cluster when the node fails to perform cluster operations (such as failure to commit an update to the cluster configuration database).</p>
<p>When a node leaves a cluster, as in a planned shutdown, it sends a ClusterExit message to all other members of the cluster, notifying them that it is leaving. The node does not wait for any responses and immediately proceeds to shut down resources and close all cluster connections. Because the remaining nodes receive this exit message, they do not perform the regroup process to reestablish cluster membership that occurs when a node unexpectedly fails or network communications stop.</p></div>
</div>
<p><span> </span></p>
<div>
<div><strong>Failure Detection</strong></div>
<div><p>Failure detection and prevention are key benefits provided by server clusters. When a node or application in a cluster fails, server clusters can respond by restarting the failed application or distributing the work from the failed system to remaining nodes in the cluster. Server cluster failure detection and prevention include bi-directional failover, application failover, parallel recovery, and automatic failback.</p>
<p>When the Cluster service detects failures of individual resources or an entire node, it dynamically moves and restarts application, data, and file resources on an available, healthy server in the cluster. This allows resources such as databases, file shares, and applications to remain highly available to users and to client applications.</p>
<p>Server clusters are designed with two different failure detection mechanisms:</p>
<ul>
<li> <strong>Heartbeats for detecting node failures</strong> Periodically, each node exchanges User Datagram Protocol (UDP)-based messages with other nodes in the cluster over the private cluster network. These messages are referred to as the heartbeat. The heartbeat exchange enables each node to check the availability of other nodes and their resources. If a server fails to respond to a heartbeat exchange, the surviving servers initiate failover processes, including ownership arbitration for resources and applications owned by the failed server. Arbitration is performed using a challenge and defense protocol. The node that appears to have failed is given a time window to demonstrate, in any one of several ways, that it is still running correctly and can communicate with the surviving nodes. If the node is unable to respond, it is removed from the cluster. Failure to respond to a heartbeat message can be caused by several events, such as computer failure, network interface failure, network failure, or even periods of unusually high activity. Typically, when all nodes are communicating, the Configuration Database Manager sends global configuration database updates to each node. When a heartbeat exchange failure occurs, Log Manager saves configuration database changes to the quorum resource. This ensures that remaining nodes can access the most recent cluster configuration and local node registry data during the recovery processes.<br />
The failure detection algorithm is very conservative. If the cause of the heartbeat response failure is temporary, it is best to avoid the potential disruption a failover might cause. However, there is no way to know whether the node will respond in another millisecond, or if it suffered a catastrophic failure. Therefore, a failover is initiated after a timeout period.</li>
<li> <strong>Resource Monitor and resource DLLs for detecting resource failures</strong> Failover Manager and Resource Monitor work together to detect and recover from resource failures. Resource Monitors keep track of resource status by using the resource DLLs to periodically poll resources. Polling involves two steps, a cursory LooksAlive query and a longer, more definitive, IsAlive query. When Resource Monitor detects a resource failure, it notifies Failover Manager and continues to monitor the resource.<br />
Failover Manager maintains resources and resource group status. It also performs recovery when a resource fails and invokes Resource Monitors in response to user actions or failures.<br />
After a resource failure is detected, Failover Manager performs recovery actions that include restarting a resource and its dependent resources, or moving the entire resource group to another node. The recovery action that is taken is determined by resource and resource group properties, in addition to node availability.<br />
During failover, the resource group is treated as the unit of failover. This ensures that resource dependencies are correctly recovered. When a resource recovers from a failure, Resource Monitor notifies Failover Manager. Failover Manager then performs automatic failback of the resource group, based on the configuration of the resource group failback properties.</li>
</ul>
</div>
</div>
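<p>Both detection mechanisms can be sketched in a few lines. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, timestamps are passed in as plain numbers of seconds, and the rule that a failed LooksAlive check triggers an immediate IsAlive check is an assumption of the sketch and not something the text above specifies.</p>

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each node and flag silent nodes."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def record_heartbeat(self, node, now):
        """Remember when a heartbeat from this node was last received."""
        self.last_seen[node] = now

    def suspected_nodes(self, now):
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def check_resource(looks_alive, is_alive, thorough_check_due):
    """Poll one resource: cursory check first, definitive check when needed.

    looks_alive and is_alive are callables that return True or False.
    """
    if thorough_check_due or not looks_alive():
        return is_alive()
    return True
```

<p>The timeout reflects the conservative trade-off described above: a node is only suspected after it has been silent for the whole period, so a brief pause does not trigger a failover.</p>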
<p><strong>Microsoft Cluster Server </strong>(MSCS) in Microsoft Windows NT Server 4.0 Enterprise Edition was the first server cluster technology offered by Microsoft. Individual servers that compose a cluster are referred to as nodes. A Cluster service is a collection of components on each node that perform cluster-specific tasks. Hardware and software components in the cluster that are managed by the Cluster service are referred to as resources. Server clusters provide the instrumentation mechanism for managing resources through resource DLLs, which define resource abstractions (in other words, they abstract a clustered resource from a specific physical node, enabling the resource to move from one node to another), communication interfaces, and management operations.</p>
<p>Resources are elements in a cluster that are:</p>
<ul>
<li> Brought online (in service) and taken offline (out of service)</li>
<li> Managed in a server cluster</li>
<li> Owned by only one node at a time</li>
</ul>
<p>A resource group is a collection of resources, managed by the Cluster service as a single, logical unit. This logical unit is often referred to as a failover unit, because the entire group moves as a single unit between nodes. Resources and cluster elements are grouped logically according to the resources added to a resource group. When a Cluster service operation is performed on a resource group, the operation affects all individual resources contained in the group. Typically, a resource group is created that contains the individual resources required by the clustered program.</p>
<p>Cluster resources may include physical hardware devices, such as disk drives and network cards, and logical items such as IP addresses, network names, and application components.</p>
<p>Clusters also include common resources, such as external data storage arrays and private cluster networks. Common resources are accessible by each node in the cluster. One common resource is the quorum resource, which plays a critical role in cluster operations. The quorum resource must be accessible for all node operations, including forming, joining or modifying a cluster.</p>
<p><strong>Server Clusters</strong><br />
Windows Server 2003 Enterprise Edition provides two types of cluster technologies for use with Exchange Server 2003 Enterprise Edition. The first is the Cluster service, which provides failover support for back-end mailbox servers that require a high level of availability. The second is Network Load Balancing (NLB), which complements server clusters by supporting highly available and scalable clusters of front-end Exchange protocol virtual servers (for example, HTTP, IMAP4, and POP3).</p>
<p>Server clusters use a shared-nothing model. Model types define how servers in a cluster manage and use local and common cluster devices and resources. In the shared-nothing cluster, each server owns and manages its local devices. Devices common to the cluster, such as common disk arrays and connection media, are selectively owned and managed by only one node at a time.</p>
<p>Server clusters use standard Windows drivers to connect to local storage devices and media. Server clusters support multiple connection media for the external common devices, which must be accessible by all servers in the cluster. External storage devices support standard PCI–based SCSI connections, SCSI over Fibre Channel, and SCSI bus with multiple initiators. Fibre connections are SCSI devices that are hosted on a Fibre Channel bus, instead of on a SCSI bus.</p>
<p>The following figure illustrates components of a two-node server cluster, which is composed of servers running Windows Server 2003 Enterprise Edition, with shared storage device connections using SCSI or SCSI over Fibre Channel.<br />
<strong>Sample two-node Windows cluster</strong><br />
<img src="http://i.technet.microsoft.com/Bb124225.fe1e275f-ae17-433d-a305-14dd3a8c405a%28en-us,EXCHG.65%29.gif" alt="Bb124225.fe1e275f-ae17-433d-a305-14dd3a8c405a(en-us,EXCHG.65).gif" /><span> </span></p>
<div>
<div><strong>Server Cluster Architecture</strong></div>
<p>Server clusters are designed as separate, isolated sets of components, which work closely together with Windows Server 2003. Modifications to the operating system are enabled when the Cluster service is installed. These modifications include the following:</p>
<ul>
<li> Support for dynamic creation and deletion of network names and addresses</li>
<li> Modifications to the file system, to enable closing open files during disk drive dismounts</li>
<li> Modifications to the storage subsystem, to enable sharing disks and volumes among multiple nodes</li>
</ul>
<p>Apart from these and other minor modifications, a server running the Windows Cluster service runs identically to a server that is not running the Windows Cluster service.</p>
<p>The Cluster service is at the core of server clusters. It comprises multiple functional units, including Node Manager, Failover Manager, Database Manager, Global Update Manager, Checkpoint Manager, Log Manager, Event Log Replication Manager, and Backup/Restore Manager.</p></div>
<p><span> </span></p>
<div><strong>Cluster Service Components</strong></div>
<p>The Cluster service runs on Windows Server 2003 Enterprise Edition, using network drivers, device drivers, and resource instrumentation processes specifically designed for server clusters and their component processes. The Cluster service includes the following components:</p>
<ul>
<li> <strong>Checkpoint Manager</strong> This component saves application registry keys in a cluster directory stored on the quorum resource. To make sure that the Cluster service can recover from a resource failure, Checkpoint Manager checks registry keys when a resource is brought online and writes checkpoint data to the quorum resource when a resource goes offline. Checkpoint Manager also supports resources with application-specific registry trees that are instantiated at the cluster node where the resource comes online. A resource can have one or more registry trees associated with it. When the resource is online, Checkpoint Manager monitors changes to these registry trees. If Checkpoint Manager detects changes, it dumps the registry tree to a file on the owner node of the resource and then transfers that file to the owner node of the quorum resource. Checkpoint Manager performs batch transfers, so that frequent changes to registry trees do not place too heavy a load on the Cluster service.</li>
<li> <strong>Database Manager</strong> Database Manager maintains cluster configuration information about all physical and logical entities in a cluster. These entities include the cluster itself, cluster node membership, resource groups, resource types, and descriptions of specific resources, such as disks and IP addresses.<br />
Persistent and volatile information stored in the configuration database tracks the current and desired state of a cluster. Each instance of Database Manager running on each node in the cluster cooperates to maintain consistent configuration information across the cluster and to ensure consistency of the configuration database copies on all nodes.<br />
Database Manager also provides an interface for use by other Cluster components, such as Failover Manager and Node Manager. This interface is similar to the registry interface of Microsoft Win32 APIs. However, the Database Manager interface writes changes made to cluster entities in both the registry and in the quorum resource.<br />
Database Manager supports transactional updates of the cluster registry hive and only presents interfaces to internal Cluster service components. Failover Manager and Node Manager typically use this transactional support to get replicated transactions. The Cluster API presents all Database Manager functions to clients, with the exception of transactional support functions. For additional information on the Cluster API, see <a href="http://go.microsoft.com/fwlink/?LinkId=33142" target="_blank"> Cluster API</a> on MSDN.</li>
</ul>
<table border="0">
<tbody>
<tr>
<th>Note:</th>
</tr>
<tr>
<td>The application registry key data and changes are recorded by Checkpoint Manager in quorum log files, in the quorum resource.</td>
</tr>
</tbody>
</table>
<p><span> </span></p>
<div>
<div style="display: block;">
<ul>
<li> <strong>Event Service</strong> Event Service serves as a switchboard, sending events to and from applications, and to the Cluster service components on each node. The Event Processor component of the Event Service helps Cluster service components to disseminate information about important events to all other components. The Event Processor component supports the Cluster API event mechanism. It also performs miscellaneous services, such as delivering signal events to cluster-aware applications and maintaining cluster objects.</li>
<li> <strong>Event Log Replication Manager</strong> The Event Log Replication Manager replicates event log entries from one node to all other nodes in the cluster. By default, the Cluster service interacts with the Windows Event Log service in the cluster to replicate event log entries to all cluster nodes. When the Cluster service starts on the node, it invokes a private API in the local Event Log service and requests that the Event Log service bind to the Cluster service. The Event Log service then binds to the CLUSAPI interface by using local remote procedure calls (RPCs). When the Event Log service receives an event to be logged, it logs it locally, drops the event into a persistent batch queue, and schedules a timer thread to run within the next 20 seconds, if no timer thread is already active. When the timer thread fires, it processes the batch queue and sends the events, as one consolidated buffer, to the Cluster API interface to which the Event Log service was previously bound. The Cluster API interface then sends the events to the Cluster service.<br />
After the Cluster service receives batched events from the Event Log service, it drops the events into a local outgoing queue and returns from the RPC. The event broadcaster thread, in the Cluster service, then processes this queue and sends the events, using the intra-cluster RPC, to all active cluster nodes. The server side API then drops the events into an incoming queue. An event log writer thread then processes this queue and requests, through a private RPC, that the local Event Log service write the events locally.<br />
The Cluster service uses lightweight remote procedure call (LRPC) to invoke the Event Log service&#8217;s private RPC interfaces. The Event Log service also uses LRPCs to invoke the Cluster API interface and then request that the Cluster service replicate events.</li>
<li> <strong>Failover Manager</strong> Failover Manager performs resource management and initiates appropriate actions, such as startup, restart, and failover. Failover Manager stops and starts resources, manages resource dependencies, and initiates failover of resource groups. To perform these actions, Failover Manager receives resource and system state information from Resource Monitors and cluster nodes.<br />
Failover Manager also decides which nodes in the cluster should own which resource group. When resource group arbitration finishes, nodes that own an individual resource group return control of the resources in the resource group to Node Manager. If a node cannot handle a failure of one of its resource groups, Failover Managers on each node work together to reassign ownership of the resource group.<br />
If a resource fails, Failover Manager restarts the resource or takes the resource offline together with its dependent resources. If Failover Manager takes the resource offline, it indicates that the ownership of the resource will be moved to another node. The resource is then restarted, under the ownership of the new node. This is referred to as failover, as explained in the section &#8220;Cluster Failover&#8221; later in this topic.</li>
<li> <strong>Global Update Manager</strong> Global Update Manager provides the global update service that is used by cluster components. Global Update Manager is used by internal cluster components, such as Failover Manager, Node Manager, and Database Manager, to replicate changes to the cluster database across nodes. Global Update Manager updates are typically initiated as a result of a Cluster API call. When a Global Update Manager update is initiated at a client node, it first requests a locker node to obtain a global lock. If the lock is not available, the client waits for one to become available.<br />
When the lock is available, the locker grants the lock to the client, and issues the update locally (on the locker node). The client then issues the update to all other healthy nodes, including itself. If an update succeeds on the locker, but fails on some other node, that node will be removed from the current cluster membership. If the update fails on the locker node itself, the locker merely returns the failure to the client.</li>
<li> <strong>Log Manager</strong> Log Manager writes changes to recovery logs that are stored on the quorum resource. Log Manager, together with Checkpoint Manager, ensures that the recovery log on the quorum resource contains the most recent configuration data and change checkpoints. If one or more cluster nodes are down, configuration changes can still be made to the remaining nodes. While these nodes are down, Database Manager uses Log Manager to log configuration changes to the quorum resource.<br />
When failed nodes return to service, they read the location of the quorum resource from their local cluster registry hives. Because the hive data could be stale, mechanisms are in place to detect invalid quorum resources read from a stale cluster configuration database. Database Manager then requests that Log Manager update the local copy of the cluster hive, using the checkpoint file in the quorum resource. The log file is then replayed in the quorum disk, starting from the checkpoint log sequence number. The result is a completely updated cluster hive. Cluster hive snapshots are taken whenever the quorum log is reset and once every four hours.</li>
<li> <strong>Membership Manager </strong>Membership Manager monitors cluster membership and the health of all nodes in the cluster. Membership Manager (also referred to as the Regroup Engine) maintains a consistent view of which cluster nodes are currently up or down. The core of the Membership Manager component is a regroup algorithm that is invoked whenever there is evidence that one or more nodes failed. At the completion of the algorithm, all participating nodes reach identical conclusions on the new cluster membership.</li>
<li> <strong>Node Manager</strong> Node Manager assigns resource group ownership to nodes, based on group preference lists and node availability. Node Manager runs on each node and maintains a local list of nodes that belong to the cluster. Periodically, Node Manager sends messages, named heartbeats, to its counterparts running on other nodes in the cluster to detect node failures. All nodes in the cluster must have exactly the same view of cluster membership.<br />
If a cluster node detects a communication failure with another cluster node, it transmits a multicast message to the entire cluster. This regroup event causes all members to verify their view of the current cluster membership. During the regroup event, the Cluster service prevents write operations to any disk devices common to all nodes in the cluster, until the membership stabilizes. If an instance of Node Manager on an individual node does not respond, the node is removed from the cluster, and its active resource groups are moved to another active node. To make this change, Node Manager identifies possible owners (nodes) that may own individual resources and the node on which a resource group prefers to run. Node Manager then selects the node and moves the resource group. In a two-node cluster, Node Manager simply moves resource groups from a failed node to the remaining node. In a cluster comprised of three or more nodes, Node Manager selectively distributes resource groups among the remaining nodes.<br />
Node Manager also acts as a gatekeeper, allowing joiner nodes into the cluster and processing requests to add or evict a node.</li>
<li> <strong>Resource Monitor</strong> Resource Monitor verifies the health of each cluster resource by using callbacks to resource DLLs. Resource Monitors run in a separate process and communicate with the Cluster service through RPCs. This protects the Cluster service from failures of individual cluster resources.<br />
Resource Monitors provide the communication interface between resource DLLs and the Cluster service. When the Cluster service must obtain data from a resource, Resource Monitor receives the request and forwards it to the appropriate resource DLL. Conversely, when a resource DLL must report its status or notify the Cluster service of an event, Resource Monitor forwards the information from the resource to the Cluster service.<br />
The Resource Monitor process (RESRCMON.EXE) is a child process of the Cluster service process (CLUSSVC.EXE). Resource Monitor loads resource DLLs that monitor cluster resources in its process space. Loading the resource DLLs in a process separate from the Cluster service process helps to isolate faults. Multiple Resource Monitors can be instantiated at the same time.<br />
Each Resource Monitor functions as an LRPC server for the Cluster service process. When the Cluster service receives a Cluster API call that requires talking to a resource DLL, it uses the LRPC interface to invoke the Resource Monitor RPC. To receive responses from Resource Monitor, the Cluster service creates one notification thread per Resource Monitor process. This notification thread invokes an RPC that is located permanently in Resource Monitor. The thread acquires notifications when they are generated. The thread is released only when Resource Monitor fails or when the thread is manually stopped by a shutdown command from the Cluster service.<br />
Resource Monitor does not maintain a persistent state on its own. It retains a limited, in-memory state of the resources, but all of its initial state information is supplied by the Cluster service. Resource Monitor communicates with the resource DLLs through well-defined entry points that the DLLs must present. Resource Monitor completes the following operations on its own:</p>
<ul>
<li> It polls resource DLLs through the IsAlive and LooksAlive entry points, alternately checking failure events signaled by resource DLLs.</li>
<li> To monitor pending timeouts of resource DLLs, it spawns timer threads that return ERROR_IO_PENDING from the DLL&#8217;s Online or Offline entry points.</li>
<li> It detects crashes of the Cluster service and shuts down the resources.</li>
</ul>
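<p>The polling behavior in the first item above can be sketched roughly as follows. This is a simplified Python model, not the actual Resource Monitor code: the <code>Resource</code> class, the <code>poll</code> function, and the rule of running a full IsAlive check every few cycles are assumptions made for illustration. Real resource DLLs expose LooksAlive and IsAlive as native entry points.</p>

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Resource:
    """Hypothetical stand-in for a resource DLL's health entry points."""
    name: str
    looks_alive: Callable[[], bool]  # cheap, cursory check
    is_alive: Callable[[], bool]     # slower, definitive check


def poll(resource: Resource, cycles: int, is_alive_every: int = 3) -> List[str]:
    """Poll a resource and return a log of the checks performed.

    Every cycle runs LooksAlive. Every `is_alive_every`-th cycle, or any cycle
    in which LooksAlive reports a problem, runs the definitive IsAlive check.
    Polling stops at the first IsAlive failure.
    """
    log = []
    for cycle in range(1, cycles + 1):
        suspicious = not resource.looks_alive()
        if suspicious or cycle % is_alive_every == 0:
            if resource.is_alive():
                log.append(f"cycle {cycle}: IsAlive ok")
            else:
                log.append(f"cycle {cycle}: IsAlive FAILED")
                break  # a real monitor would now notify the failover logic
        else:
            log.append(f"cycle {cycle}: LooksAlive ok")
    return log
```

<p>Keeping the cheap check on the hot path and reserving the expensive check for periodic or suspicious cycles is the design idea this sketch is meant to show.</p>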
<p>Its other actions occur as a result of operations requested by the Cluster service through the RPC interface. The Cluster service performs no hang detection on Resource Monitor. It does, however, monitor for crashes, and it restarts a Resource Monitor if it detects that the process has crashed.<br />
The Cluster service and Resource Monitor process share a memory-mapped section backed by the paging file. The handle to the section is passed to Resource Monitor at Resource Monitor startup. Resource Monitor then duplicates the handle and records the entry point number and resource name into this section immediately before calling a resource DLL entry point. If Resource Monitor crashes, the Cluster service reads the shared section to detect the resource and the entry point that caused the crash.</li>
<li> <strong>Backup/Restore Manager</strong> Backup/Restore Manager works with Failover Manager and Database Manager to back up or restore the quorum log file and all checkpoint files. The Cluster service uses the BackupClusterDatabase API for database backup. First, the BackupClusterDatabase API contacts the Failover Manager layer. The Failover Manager layer forwards the request to the node that currently owns the quorum resource. That node then invokes Database Manager, which makes a backup of the quorum log file and all checkpoint files.<br />
The Cluster service also registers itself at startup as a backup writer with Volume Shadow Copy service. When a backup client invokes the Volume Shadow Copy service to perform a system state backup, it also invokes the Cluster service, through a series of entry point calls, to perform the cluster database backup. The server code in the Cluster service invokes the Failover Manager to perform the backup, and the rest of the operation occurs via the BackupClusterDatabase API.<br />
The Cluster service uses the RestoreClusterDatabase API to restore the cluster database from a backup path. This API can only be invoked locally from one of the cluster nodes. When the RestoreClusterDatabase API is invoked, it stops the Cluster service, restores the cluster database from the backup, sets a registry value that contains the backup path, and then re-starts the Cluster service. On startup, the Cluster service detects that a restore is requested and restores the cluster database from the backup path to the quorum resource.</li>
</ul>
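<p>The Global Update Manager sequence described in the list above (obtain the global lock, update the locker node first, then the remaining nodes) might be modeled as in the following Python sketch. The <code>Node</code> class and <code>global_update</code> function are hypothetical names, and a single in-process <code>threading.Lock</code> stands in for the cluster-wide lock, so this only illustrates the ordering and the failure rules, not the real protocol.</p>

```python
import threading
from typing import Dict, List, Set, Tuple


class Node:
    """Hypothetical cluster node holding a local copy of the cluster database."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.database: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> bool:
        """Apply an update locally; an unhealthy node reports failure."""
        if not self.healthy:
            return False
        self.database[key] = value
        return True


def global_update(locker: Node, members: List[Node], lock: threading.Lock,
                  key: str, value: str) -> Tuple[bool, Set[str]]:
    """Return (succeeded, names of nodes removed from membership)."""
    with lock:  # the client waits here until the global lock is available
        if not locker.apply(key, value):
            return False, set()  # a locker failure is simply reported back
        evicted = set()
        for node in members:
            if node is locker:
                continue  # the locker was already updated above
            if not node.apply(key, value):
                evicted.add(node.name)  # a node that missed the update is dropped
        return True, evicted
```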
</div>
</div>
<p><span> </span></p>
<div><strong>Cluster Failover</strong></div>
<p>Failover can occur automatically because of an unplanned hardware or software failure, or it can occur as the result of manual initiation by an administrator. The algorithm and behavior in both situations are almost identical. However, in a manually initiated failover, resources are shut down in an orderly way, whereas in unplanned failovers, resources are shut down in a sudden and disruptive way (for example, the power goes out, or a crucial hardware component fails).</p>
<p>When an entire node in a cluster fails, its resource groups transfer to one or more available nodes in the cluster. Automatic failover is similar to planned administrative reassignment of resource ownership. However, it is more complicated, because the orderly steps of a planned shutdown might be interrupted or might not have occurred at all. Therefore, extra steps are required to evaluate the state of the cluster at the time of failure.</p>
<p>When your network experiences an automatic failover, it is important to determine what groups were running on the failed node and which nodes should take ownership of the various resource groups. All nodes in the cluster that are capable of hosting the resource groups negotiate for ownership. This negotiation is based on node capabilities, current load, application feedback, the node preference list, or the use of the AntiAffinityClassNames property, which is discussed in the <a href="http://technet.microsoft.com/en-us/library/bb124140%28EXCHG.65%29.aspx"> Cluster-Specific Configurations</a>. When negotiation of the resource group is completed, all nodes in the cluster update their databases and track which node owns the resource group.</p>
<p>In clusters with more than two nodes, the node preference list for each resource group can specify a preferred server, plus one or more prioritized alternatives. This enables cascading failover, in which a resource group can survive multiple server failures, each time cascading, or failing over to the next server on its node preference list.</p>
<p>An alternative to this automatic cascading failover is commonly called N+I failover. This method establishes the node preference lists for all cluster groups. The node preference list identifies the standby cluster nodes to which resources are moved at the first failover. The standby nodes are servers in the cluster that are mostly idle or that have workloads that can be easily pre-empted if a failed server&#8217;s workload must be moved to the standby node.</p>
<p>Cascading failover assumes that every other server in the cluster has some excess capacity and can absorb a portion of any other failed server&#8217;s workload. N+I failover assumes that the +I standby servers are the primary recipients of excess capacity.</p>
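<p>Both models come down to walking a resource group&#8217;s node preference list until an available node is found. The Python sketch below is a minimal illustration under that assumption. The function name <code>next_owner</code> and the node names are made up, and the real negotiation also weighs load, application feedback, and other properties that are not modeled here.</p>

```python
from typing import List, Optional, Set


def next_owner(preference_list: List[str], available: Set[str]) -> Optional[str]:
    """Return the first node on the preference list that is still available."""
    for node in preference_list:
        if node in available:
            return node
    return None  # no listed node can host the group
```

<p>With a list such as <code>["NODE1", "NODE2", "NODE3"]</code>, repeated failures cascade the group from NODE1 to NODE2 to NODE3. In an N+I layout, the standby nodes would simply appear directly after the primary owner in each group&#8217;s list.</p>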
<p><span> </span></p>
<div><strong>Cluster Failback</strong></div>
<p>When a node comes back online, Failover Manager can decide to move one or more resource groups back to the recovered node. This is referred to as failback. The properties of a resource group must have a preferred owner defined to fail back to a recovered or restarted node. Resource groups for which the recovered or restarted node is the preferred owner are moved from the current owner to the recovered or restarted node.</p>
<p>Failback properties of a resource group can include the hours of the day during which failback is allowed and a limit on the number of times failback is attempted. This enables the Cluster service to prevent failback of resource groups during peak processing times or to nodes that have not been correctly recovered or restarted.</p>
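<p>As a rough illustration of how these properties combine, the following Python sketch decides whether a group should fail back to a node that has just recovered. The <code>FailbackPolicy</code> fields are hypothetical names chosen for this example and are not the actual cluster property names.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailbackPolicy:
    """Hypothetical failback settings for one resource group."""
    preferred_owner: Optional[str]  # None means the group never fails back
    window_start: int               # first hour (0-23) in which failback is allowed
    window_end: int                 # hour (0-23) at which the window closes
    max_attempts: int               # give up after this many attempts


def in_window(hour: int, start: int, end: int) -> bool:
    """True if `hour` is inside [start, end), handling windows that wrap midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def should_fail_back(policy: FailbackPolicy, recovered_node: str,
                     hour: int, attempts_so_far: int) -> bool:
    """Decide whether a group should move back to a node that just recovered."""
    return (policy.preferred_owner == recovered_node
            and attempts_so_far < policy.max_attempts
            and in_window(hour, policy.window_start, policy.window_end))
```

<p>For example, a policy with a window from 22 to 4 allows failback at 23:00 or 02:00 but not at noon, which keeps the move out of peak processing hours.</p>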
<p><span> </span></p>
<div>
<div><strong>Cluster Quorum</strong></div>
<div>Each cluster has a special resource referred to as the quorum resource. A quorum resource can be any resource that does the following:</p>
<ul>
<li> Provides a means for arbitration leading to membership and cluster state decisions</li>
<li> Provides physical storage to hold configuration information</li>
</ul>
<p>A quorum log is a configuration database for the entire server cluster. The quorum log contains cluster configuration information, such as the servers that are part of the cluster, the resources that are installed in the cluster, and the state of those resources (for example, online or offline).</p>
<p>The quorum is important in a cluster for the following two reasons:</p>
<ul>
<li> <strong>Consistency</strong> A cluster is made up of multiple physical servers acting as a single virtual server. It is critical that each of the physical servers has a consistent view of the cluster configuration. The quorum acts as the definitive repository for all configuration information relating to the cluster. If the Cluster service is unable to access and read the quorum, it cannot start.</li>
<li> <strong>Tie-breaking</strong> The quorum is used as a tie-breaker to avoid split-cluster scenarios. A split-cluster scenario occurs when all network communication links between two or more cluster nodes fail. If this occurs, the cluster might be split into two or more partitions that cannot communicate with each other. The quorum ensures that cluster resources are brought online on one node only. It does this by allowing the partition that owns the quorum to continue, while the other partitions are evicted from the cluster.</li>
</ul>
</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Standard Quorum</strong></div>
<p>As mentioned earlier in this section, the quorum is a configuration database for the Cluster service that is stored in the quorum log file. A standard quorum uses a quorum log file, located on a disk hosted in the shared storage array, which is accessible by all members of the cluster.</p>
<p>Each member connects to the shared storage using SCSI or Fibre Channel. Storage is made up of external hard disks (usually configured as RAID disks) or a SAN, in which logical slices of the SAN are presented as physical disks.</p></div>
<table style="height: 112px;" border="0" width="594">
<tbody>
<tr>
<th>Note:</th>
</tr>
<tr>
<td>It is important that the quorum uses a physical disk resource, rather than a disk partition, because the entire physical disk resource is moved during failover. Furthermore, it is possible to configure server clusters to use the local hard disk on a server to store the quorum. This type of implementation, referred to as a lone wolf cluster, is supported only for testing and development purposes. Lone wolf clusters should not be used to cluster Exchange 2003 in a production environment because, being singular, they are incapable of providing failover.</td>
</tr>
</tbody>
</table>
<p><span> </span></p>
<div>
<div><strong>Majority Node Set Quorums</strong></div>
<div>From a server cluster perspective, a majority node set (MNS) quorum is a single quorum resource. The data is stored by default on the local disk of each node in the cluster. The MNS resource makes sure that the cluster configuration data, stored on the MNS resource, is consistent across different disks. The MNS implementation provided in Windows Server 2003 uses a directory on each node&#8217;s local disk to store the quorum data. If the configuration of the cluster changes, that change is reflected across each node&#8217;s local disk. The change is considered committed, or made persistent, only if it is made to a majority of the nodes, that is, to (number of nodes / 2) + 1 nodes, with the division rounded down.</p>
<p>The MNS quorum makes sure that most nodes have an up-to-date copy of the data. The Cluster service starts up and brings resources online only if a majority of the nodes that are configured as part of the cluster are up and are running the Cluster service. If the MNS quorum determines that a majority does not exist, the cluster is considered not to have quorum, and the Cluster service waits in a restart loop until more nodes try to join. When a majority or quorum of nodes is available, the Cluster service starts and brings the resources online. Because the up-to-date configuration is written to a majority of the nodes, regardless of node failures, the cluster always guarantees that it has the most current configuration at startup.</p>
<p>If a cluster failure occurs, or if the cluster somehow enters a split-cluster scenario, all partitions that do not contain a majority of nodes are taken offline. This ensures that if there is a partition running that contains a majority of the nodes, it can safely start any resources that are not running on that partition, because it is the only partition in the cluster that is running resources.</p>
<p>Because of the differences in the way the shared disk quorum clusters behave compared to MNS quorum clusters, you must consider carefully when deciding which model to use. For example, if you have only two nodes in your cluster, the MNS model is not recommended. In this instance, failure of one node leads to failure of the entire cluster, because a majority of nodes is impossible.</p>
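<p>The majority arithmetic behind these rules can be worked through in a few lines of Python. This is only a sketch: the function names are invented for illustration, and partitions are modeled as plain sets of node names.</p>

```python
from typing import List, Optional, Set


def majority(configured_nodes: int) -> int:
    """Nodes required for quorum: (number of nodes / 2) + 1, with integer division."""
    return configured_nodes // 2 + 1


def tolerated_failures(configured_nodes: int) -> int:
    """How many nodes can fail while the cluster keeps quorum."""
    return configured_nodes - majority(configured_nodes)


def surviving_partition(partitions: List[Set[str]],
                        configured_nodes: int) -> Optional[Set[str]]:
    """Return the partition holding a majority of configured nodes, if any."""
    needed = majority(configured_nodes)
    for partition in partitions:
        if len(partition) >= needed:
            return partition
    return None  # no partition has quorum, so all of them go offline
```

<p>For a two-node cluster, <code>majority(2)</code> is 2 and <code>tolerated_failures(2)</code> is 0, which is why the MNS model is not recommended there. A five-node cluster needs three nodes and tolerates two failures.</p>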
<p>Majority node set (MNS) quorums are available in Windows Server 2003 Enterprise Edition and Windows Server 2003 Datacenter Edition clusters. The only benefit that MNS clusters provide for Exchange clusters is to eliminate the need for a dedicated disk in the shared storage array on which to store the quorum resource.</p></div>
</div>
<p><span> </span></p>
<div>
<div><strong>Cluster Resources</strong></div>
<div>The Cluster service manages all resource objects using Resource Monitors and resource DLLs. The Resource Monitor interface provides a standard communication interface that enables the Cluster service to initiate resource management commands and obtain resource status data. The Resource Monitor obtains actual command functions and data through resource DLLs. The Cluster service uses resource DLLs to bring resources online, manage their interaction with other resources in the cluster, and monitor their health. To enable resource management, a resource DLL uses a few simple resource interfaces and properties. Resource Monitor loads a particular resource DLL in its address space, as privileged code running under the SYSTEM account. The SYSTEM account (that is, LocalSystem) is a security principal account that represents the operating system. The Cluster service, which runs under a user security context, uses the SYSTEM account to perform security functions within the operating system.</p>
<p>When resources depend on the availability of other resources to function, these dependencies can be defined by the resource DLL. When a resource is dependent on other resources, the Cluster service brings the dependent resource online only after it brings the resources on which it depends online in the correct sequence.</p>
<p>Resources are taken offline in a similar manner. The Cluster service takes resources offline only after any dependent resources are taken offline. This prevents introducing circular dependencies when loading resources.</p>
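<p>The online and offline ordering described above is a topological ordering of the dependency graph. The Python sketch below shows the idea using the standard library&#8217;s <code>graphlib</code> module (Python 3.9 or later). The function names and the resource names in the usage note are illustrative assumptions, not output from a real cluster.</p>

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def online_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order in which to bring resources online: dependencies come first.

    `depends_on` maps each resource to the set of resources it depends on.
    TopologicalSorter raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order in which to take resources offline: dependents go first."""
    return list(reversed(online_order(depends_on)))
```

<p>Given a Network Name that depends on an IP Address, and an application resource that depends on both the Network Name and a Physical Disk, <code>online_order</code> places the IP Address and the disk before the application, and <code>offline_order</code> takes the application offline first.</p>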
<p>Each resource DLL can also define the type of computer and device connection required by the resource. For example, a disk resource may require ownership only by a node that is physically connected to the disk device. Local restart policies and desired actions during failover events can also be defined in the resource DLL.</p></div>
</div>
<p><span> </span></p>
<div>
<div><strong>Cluster Administration</strong></div>
<div>Clusters are managed using Cluster Administrator, a graphical administration tool, or the Cluster.exe command-line tool. Both can be used to perform maintenance, monitoring, and failover administration. Server clusters also provide an automation interface. This interface can be used to create custom scripting tools for administering cluster resources, nodes, and the cluster itself. Applications and administration tools, such as Cluster Administrator, can access this interface using RPCs, whether the tool is running on a node in the cluster or on an external computer.</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Cluster Formation and Operation</strong></div>
<div>When the Cluster service is installed and running on a server, the server can participate in a cluster. Cluster operations reduce single points of failure and enable high availability of clustered resources. The following sections briefly describe node behavior during cluster creation and operation.</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Creating a Cluster</strong></div>
<div>Server clusters include a cluster installation utility that is used to install the cluster software on a server and create a new cluster. To create a new cluster, the utility is run on the computer selected as the first member of the cluster. This first step defines the new cluster by establishing a cluster name, and creating the cluster database and initial cluster membership list. The next step in creating a cluster is to add the common data storage devices that will be available to all members of the cluster. This establishes the new cluster with a single node and its own local data storage devices and cluster common resources (generally disk or data storage and connection media resources).</p>
<p>The final step in creating a cluster is to run the installation utility on each additional computer that will be a member of the cluster. As each new node is added to the cluster, it automatically receives a copy of the existing cluster database from the original member of the cluster. When a node joins or forms a cluster, the Cluster service updates the node&#8217;s private copy of the configuration database.</p></div>
</div>
<p><span> </span></p>
<div>
<div><strong>Forming a Cluster</strong></div>
<div>A server can form a cluster if it is running the Cluster service and cannot locate other nodes in the cluster. To form the cluster, a node must be able to acquire exclusive ownership of the quorum resource. When a cluster is formed, the first node in the cluster contains the cluster configuration database. As each additional node joins the cluster, it receives and maintains its own local copy of the cluster configuration database. The quorum resource stores the most current version of the configuration database as recovery logs. The logs contain node-independent cluster configuration and state data.</p>
<p>During cluster operations, the Cluster service uses the quorum recovery logs to do the following:</p>
<ul>
<li> Guarantee that only one set of active nodes is allowed to form a cluster</li>
<li> Enable a node to form a cluster only if it can gain control of the quorum resource</li>
<li> Allow a node to join or remain in an existing cluster only if it can communicate with the node that controls the quorum resource</li>
</ul>
<p>When a cluster is formed, each node in the cluster can be in one of three distinct states. These states are recorded by Event Processor (described below) and replicated by Event Log Manager to other nodes in the cluster. The three Cluster service states are as follows:</p>
<ul>
<li> <strong>Offline</strong> The node is not an active member of the cluster. The node and its Cluster service might or might not be running.</li>
<li> <strong>Online</strong> The node is an active member of the cluster. It adheres to cluster database updates, contributes input into the quorum algorithm, maintains cluster network and storage heartbeats, and can own and run resource groups.</li>
<li> <strong>Paused</strong> The node is an active member of the cluster. The node adheres to cluster database updates, contributes input into the quorum algorithm, and maintains network and storage heartbeats, but it cannot accept resource groups. It can support only those resource groups that it currently owns. The paused state enables maintenance to be performed. Online and paused states are treated as equivalent states by the majority of the server cluster components.</li>
</ul>
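<p>The three states and the difference between online and paused can be summarized in a small Python sketch. The enum and helper names are hypothetical and only encode the rules stated in the list above.</p>

```python
from enum import Enum


class NodeState(Enum):
    OFFLINE = "offline"
    ONLINE = "online"
    PAUSED = "paused"


def is_active_member(state: NodeState) -> bool:
    """Online and paused nodes both take part in updates, quorum, and heartbeats."""
    return state in (NodeState.ONLINE, NodeState.PAUSED)


def can_accept_groups(state: NodeState) -> bool:
    """Only an online node may take ownership of additional resource groups."""
    return state is NodeState.ONLINE
```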
</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Joining a Cluster</strong></div>
<div>To join an existing cluster, a server must be running the Cluster service, and it must successfully locate another node in the cluster. After finding another node in the cluster, the joining server must be authenticated for membership in the cluster and must receive a replicated copy of the cluster configuration database. The process of joining an existing cluster begins when Windows Service Control Manager starts the Cluster service on the node. During the startup process, the Cluster service configures and mounts the node&#8217;s local data devices. It does not attempt to bring the common cluster data devices online as nodes, because the existing cluster might be using the devices.</p>
<p>To locate other nodes, a discovery process is started. When the node discovers any member of the cluster, it performs an authentication sequence. The first cluster member authenticates the new node and returns a status of success if the new node is successfully authenticated. If authentication is not successful, as when a joining node is not recognized as a cluster member or has an invalid account password, the request to join the cluster is denied.</p>
<p>After successful authentication, the first node online in the cluster checks the copy of the configuration database of the joining node. If it is out-of-date, the cluster node sends the joining server an updated copy of the database. After receiving the replicated database, the node joining the cluster can use it to find shared resources and bring them online as needed.</p></div>
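<p>The join sequence above (authenticate, compare database copies, replicate if stale) can be sketched in Python as follows. Everything here is a simplification for illustration: the <code>Member</code> class, the integer <code>db_version</code>, and the plain password comparison are assumptions, and real authentication relies on the Cluster service account rather than a shared string.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Member:
    """Hypothetical node with a versioned copy of the configuration database."""
    name: str
    password: str
    db_version: int = 0
    database: Dict[str, str] = field(default_factory=dict)


def join_cluster(sponsor: Member, known_members: Set[str],
                 cluster_password: str, joiner: Member) -> bool:
    """Admit `joiner` through `sponsor`; return False if authentication fails."""
    # Step 1: the joiner must be a recognized member with valid credentials.
    if joiner.name not in known_members or joiner.password != cluster_password:
        return False
    # Step 2: replace a stale database with the sponsor's current copy.
    if joiner.db_version < sponsor.db_version:
        joiner.database = dict(sponsor.database)
        joiner.db_version = sponsor.db_version
    return True
```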
</div>
<p><span> </span></p>
<div>
<div><strong>Leaving a Cluster</strong></div>
<div>A node can leave a cluster when it shuts down or when the Cluster service is stopped. However, a node can also be evicted from a cluster when the node fails to perform cluster operations (such as failure to commit an update to the cluster configuration database). When a node leaves a cluster, as in a planned shutdown, it sends a ClusterExit message to all other members of the cluster, notifying them that it is leaving. The node does not wait for any responses and immediately proceeds to shut down resources and close all cluster connections. Because the remaining nodes receive this exit message, they do not perform the regroup process to reestablish cluster membership that occurs when a node unexpectedly fails or network communications stop.</div>
</div>
<p><span> </span></p>
<div>
<div><strong>Failure Detection</strong></div>
<div>Failure detection and prevention are key benefits provided by server clusters. When a node or application in a cluster fails, server clusters can respond by restarting the failed application or distributing the work from the failed system to remaining nodes in the cluster. Server cluster failure detection and prevention include bi-directional failover, application failover, parallel recovery, and automatic failback. When the Cluster service detects failures of individual resources or an entire node, it dynamically moves and restarts application, data, and file resources on an available, healthy server in the cluster. This allows resources such as databases, file shares, and applications to remain highly available to users and to client applications.</p>
<p>Server clusters are designed with two different failure detection mechanisms:</p>
<ul>
<li> <strong>Heartbeats for detecting node failures</strong> Periodically, each node exchanges User Datagram Protocol (UDP)-based messages with other nodes in the cluster over the private cluster network. These messages are referred to as the heartbeat. The heartbeat exchange enables each node to check the availability of other nodes and their resources. If a server fails to respond to a heartbeat exchange, the surviving servers initiate failover processes, including ownership arbitration for resources and applications owned by the failed server. Arbitration is performed using a challenge and defense protocol. The node that appears to have failed is given a time window to demonstrate, in any one of several ways, that it is still running correctly and can communicate with the surviving nodes. If the node is unable to respond, it is removed from the cluster. Failure to respond to a heartbeat message can be caused by several events, such as computer failure, network interface failure, network failure, or even periods of unusually high activity. Typically, when all nodes are communicating, the Configuration Database Manager sends global configuration database updates to each node. When a heartbeat exchange failure occurs, Log Manager saves configuration database changes to the quorum resource. This ensures that remaining nodes can access the most recent cluster configuration and local node registry data during the recovery processes.<br />
The failure detection algorithm is very conservative. If the cause of the heartbeat response failure is temporary, it is best to avoid the potential disruption a failover might cause. However, there is no way to know whether the node will respond in another millisecond, or if it suffered a catastrophic failure. Therefore, a failover is initiated after a timeout period.</li>
<li> <strong>Resource Monitor and resource DLLs for detecting resource failures</strong> Failover Manager and Resource Monitor work together to detect and recover from resource failures. Resource Monitors keep track of resource status by using the resource DLLs to periodically poll resources. Polling involves two steps, a cursory LooksAlive query and a longer, more definitive, IsAlive query. When Resource Monitor detects a resource failure, it notifies Failover Manager and continues to monitor the resource.<br />
Failover Manager maintains resources and resource group status. It also performs recovery when a resource fails and invokes Resource Monitors in response to user actions or failures.<br />
After a resource failure is detected, Failover Manager performs recovery actions that include restarting a resource and its dependent resources, or moving the entire resource group to another node. The recovery action that is taken is determined by resource and resource group properties, in addition to node availability.<br />
During failover, the resource group is treated as the unit of failover. This ensures that resource dependencies are correctly recovered. When a resource recovers from a failure, Resource Monitor notifies Failover Manager. Failover Manager then performs automatic failback of the resource group, based on the configuration of the resource group failback properties.</li>
</ul>
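<p>The heartbeat mechanism in the first item above amounts to a timeout on the last message received from each node. The Python sketch below illustrates that idea only. The function name, the one-second interval, and the allowance of five missed heartbeats are placeholder assumptions, not documented Cluster service values, and the challenge and defense step is not modeled.</p>

```python
from typing import Dict, List


def suspected_failures(last_heartbeat: Dict[str, float], now: float,
                       interval: float = 1.0, missed_allowed: int = 5) -> List[str]:
    """Return, in sorted order, nodes whose last heartbeat is older than the timeout.

    `last_heartbeat` maps a node name to the time (in seconds) its most recent
    heartbeat was received. A node is suspected once more than
    `interval * missed_allowed` seconds have passed without a heartbeat.
    """
    timeout = interval * missed_allowed
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > timeout)
```

<p>A longer timeout makes detection more conservative, which matches the trade-off described above: a brief stall should not trigger a disruptive failover, but a node that stays silent past the timeout has to be treated as failed.</p>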
</div>
</div>



]]></description>
		<wfw:commentRss>http://adyesha.com/2009/08/windows-cluster-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
