Wednesday, 26 November 2008

Resource Guide for Using Microsoft NLB with ISA Server 2006 Enterprise Edition

Of the many questions posted on ISA Server forums, a recurring subject that seems to cause confusion is that of using Microsoft Network Load Balancing (NLB) with ISA Server Enterprise Edition.

Please Note: As we are still (only just) in an ISA Server world and not a Forefront TMG one, the discussions here are based upon using a Windows Server 2003 platform for ISA Server, as opposed to Windows Server 2008. Therefore, all NLB discussions are relevant to this platform only, and do not cater for any enhancements that have been made in Windows Server 2008. More detail on these enhancements can be found here.

Based upon some gruelling questions on the forum recently and the result of asking Microsoft some direct questions, I thought it would be useful to try and create a blog entry that captures some of the key concepts, questions and answers in order to provide an overall resource guide. I am by no means an NLB guru, but hopefully I can summarise some of the common questions for ISA folk new to NLB, and provide my own personal slant on several NLB related issues.

Although not directly associated with NLB, the subject of intra-array communications is also covered within this article as NLB can have a direct impact on this service. Even when not using NLB, I would still personally recommend implementing a dedicated network interface card (NIC) into each array member. This will ensure that all intra-array communications are given sufficient bandwidth and contention ratios from a performance perspective, and probably more importantly, isolated from all other networks for the highest level of security for this type of 'private' communication. More on this later...

Rather than re-hashing a lot of already great documentation that has been provided by Microsoft (and others) it is useful to make reference to the following existing articles which provide some excellent information:

Network Load Balancing: Frequently Asked Questions for Windows 2000 and Windows Server 2003

Network Load Balancing: Security Best Practices for Windows 2000 and Windows Server 2003

Network Load Balancing Integration Concepts for Microsoft Internet Security and Acceleration (ISA) Server 2006

Network Load Balancing in ISA Server 2004 Enterprise Edition

Using NLB with ISA Server, Part 1: How Network Load Balancing Works

Using NLB with ISA Server Part 2: Layer 2 Fun with Unicast and Multicast Modes

Using NLB with ISA Server Part 3: Configuring NLB Array Parameters

One of the key aims of this article is to try and make some of the theory a little more understandable, hopefully adding my own experience with designing and deploying solutions that provide ISA Server high-availability. To give this article a FAQ like feel, I have provided the information as a series of common questions I have seen, and also the sort of questions I have asked myself whilst trying to get to grips with the technology.

Question 1: Do I need to use dedicated NICs in each ISA Server array member to allow for inter-host communication when using NLB?

This isn't quite a simple yes or no answer, so please bear with me! :)

If you are not running Windows Server 2003 SP1/SP2 and plan to use the default unicast NLB mode, then you will need dedicated NICs in each array member to support inter-host communications.

Historically (prior to Windows Server 2003 Service Pack 1 that is) it was necessary to implement dedicated NICs on each NLB enabled array member, as ISA Server only supported the use of unicast mode. This prerequisite arose because in Microsoft Windows Server 2003 NLB, network load-balanced hosts that operate in unicast mode cannot communicate with each other. This behaviour occurs because NLB makes the Media Access Control (MAC) address the same on array members. Therefore, the network redirector never actually sends any packets to the other NLB hosts. Consequently, without dedicated NICs, communication between array members was just not possible...

With the introduction of Windows Server 2003 SP1, support for communication between NLB nodes configured for unicast mode was added by way of a UnicastInterHostCommSupport registry value. This is discussed in more detail within the following knowledgebase article: Unicast NLB nodes cannot communicate over an NLB-enabled network adaptor in Windows Server 2003

If you are running any version of Windows Server 2003 (RTM/SP1/SP2) with multicast NLB, then you will not need dedicated NICs in each array member to support inter-host communications.
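The two rules above can be condensed into a small decision helper. This is only a sketch of the logic as described in this answer; the function name and the 0/1/2 encoding of the service pack level are assumptions made for illustration, and it covers only the NLB inter-host constraint, not the separate performance and security recommendation for intra-array traffic.

```python
def needs_dedicated_nic_for_interhost(service_pack: int, nlb_mode: str) -> bool:
    """Return True when NLB alone forces a dedicated NIC for inter-host traffic.

    service_pack: 0 for Windows Server 2003 RTM, 1 for SP1, 2 for SP2.
    nlb_mode: "unicast" or "multicast".
    """
    mode = nlb_mode.lower()
    if mode not in ("unicast", "multicast"):
        raise ValueError(f"unknown NLB mode: {nlb_mode!r}")
    if service_pack not in (0, 1, 2):
        raise ValueError(f"unknown service pack level: {service_pack!r}")

    if mode == "multicast":
        # Multicast hosts keep their own MAC address, so they can always
        # reach each other over the NLB enabled network.
        return False

    # Unicast hosts share one MAC address. Only SP1 and later offer the
    # UnicastInterHostCommSupport registry value to work around this.
    return service_pack < 1


if __name__ == "__main__":
    for sp in (0, 1, 2):
        for mode in ("unicast", "multicast"):
            print(sp, mode, needs_dedicated_nic_for_interhost(sp, mode))
```

A result of False only means NLB itself does not force the extra NIC.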

In addition to the changes in SP1, Microsoft now supports the use of multicast NLB with ISA Server 2006 Enterprise Edition, as covered in the following knowledgebase article: An update enables multicast operations for ISA Server integrated NLB. Although the necessary code update was originally provided as part of a hotfix package, it is now included within ISA Server 2006 SP1 and is therefore more commonly implemented as part of that service pack. Further information on enabling multicast mode can be found in one of my previous blog entries here.

Now, here's the rub! Considering all these changes, it would appear that it is no longer necessary to utilise dedicated NICs per array member for intra-array communications when enabling NLB, provided you choose the correct service pack level or NLB operational mode. However, it's never that simple: it is still recommended, for performance and security reasons, to utilise dedicated NICs if you can, but this recommendation is for intra-array communications and not NLB. This recommendation is covered within the ISA Server 2006 Security Guide and confirmed in my recent questions to Microsoft.

Therefore, the use of NLB is actually irrelevant as you should be using dedicated NICs in each array member anyhow to satisfy the Microsoft recommended performance and security needs for intra-array communications. However, you may forego this advice (some do, some don't, mileages vary) so the above information could still be of use with specific relevance to NLB and is discussed for completeness.

Question 2: If using NLB unicast mode, what impact does this have on the ISA Server network connectivity design?

In unicast mode (the default ISA integrated NLB operational mode) NLB induces switch flooding, by design, so that packets sent to the VIP address(es) are relayed to all the cluster hosts. Switch flooding is part of the NLB strategy of obtaining the best throughput for any specific load of client requests. However, if the NLB interfaces share the switch with other (noncluster) computers, switch flooding can add to the other computers' network overhead by including them in the flooding, and consequently have a detrimental effect on network and/or server performance.

The obvious solution to this problem is to isolate the NLB hosts so that the inherent switch flooding mechanism only affects cluster nodes, as opposed to other noncluster computers on the same network (broadcast domain). This can be achieved by placing the NLB interfaces into their own LAN or virtual LAN, thereby creating an isolated network for NLB related communications.

Another option to avoid flooding noncluster computers is to place a network hub between the switch and the NLB interfaces, and then disable the MaskSourceMAC feature that is inherent to unicast mode. However, in reality this option provides a poor solution with obvious limitations. I can't believe Microsoft support still appears to tell people to do this :(

My personal preference is to create dedicated VLANs to isolate NLB enabled interfaces, as required. More detail on this area will be provided in Question 4.

Question 3: If using NLB multicast mode, what impact does this have on the ISA Server network connectivity design?

Although multicast is often used to remove unicast mode limitations like switch flooding, this operational mode can also cause switch flooding. As with unicast mode, this can be solved by placing the NLB interfaces into their own LAN or virtual LAN, thereby creating an isolated network across which to pass multicast traffic. If this is not possible, the switch ports to which the NLB enabled interfaces are attached can also be mapped to the NLB cluster MAC address via static entries in the Content-Addressable Memory (CAM) table of the switch. This ensures that the switch is aware of exactly which switch ports are 'NLB enabled' and hence eliminates the need to flood all ports.

If you have the network hardware to support it, the recommended design is to use the 'multicast with IGMP' operational mode and then configure appropriate network devices to support IGMP snooping. This combination aims to restrain multicast traffic when used in a switched network without the use of dedicated VLANs.

By default, a LAN switch floods multicast traffic within the broadcast domain, and this can consume a lot of bandwidth if many multicast servers are sending streams to the same segment. With IGMP snooping, the switch intercepts IGMP messages from the host itself and updates its MAC table accordingly. As discussed above, without snooping, it is necessary to manually configure static multicast Content-Addressable Memory (CAM) entries in order to avoid flooding the subnet with multicast traffic. This is an administrative burden, however, and is not a dynamic solution like IGMP snooping. Consequently, multicast with IGMP is seen as a much more elegant solution, assuming you have the network hardware to support it...

So, at first glance multicast appears to solve some of the unicast limitations, but it also introduces new challenges that need to be considered. As discussed in more detail in Question 4, the key one is that some switches/routers cannot make the necessary ARP mapping, and it is then necessary to add static ARP entries to solve common multicast related issues.

Question 4: What needs to be considered when connecting NLB enabled interfaces to Layer 2 and Layer 3 switches?

Layer 2 Switches

NLB is 'Layer 2 Switch aware' and assumes that NLB interfaces are connected to a Layer 2 device by default. This default configuration uses a feature called MaskSourceMAC to ensure that the switch is unable to learn the original source MAC address of each NLB host. This way, it cannot learn that the NLB traffic (the NLB cluster MAC address to be precise) should be associated with a specific individual switch port.

If we consider unicast mode first...

If the switch is unable to associate a MAC address with a particular port (because it has been masked) it will have to send the data to all switch ports; thereby ensuring that all NLB hosts can process the traffic.

If you look at the substitute source MAC address that is used, you will notice that it is similar to the original MAC address, but the first two fields are replaced as follows:

02-[Host ID including zero]-[Original MAC address values]

Consequently, following the pattern above, an NLB host with a host ID of 3 and a MAC address of 00-19-BB-3C-29-08 has a substituted source MAC address of 02-03-BB-3C-29-08.


This actually makes spotting NLB enabled hosts pretty easy when looking at switches, routers or network tracing/sniffing software. Some good information on NLB substitute MAC addresses is also available in the blog of Russ Kaufman (a Microsoft HA MVP), found here.
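As a rough illustration of that 'spotting' step, the sketch below checks whether a MAC address taken from a switch table or capture matches the 02-[Host ID] prefix described above. The function name is hypothetical, and the 1 to 32 host ID range is an assumption based on the NLB limit of 32 hosts per cluster. Treat a match as a hint only, since any locally administered address can begin with 02.

```python
import re

MAX_NLB_HOSTS = 32  # assumption: NLB host IDs run from 1 to 32

_MAC_PATTERN = re.compile(r"(?:[0-9A-Fa-f]{2}[-:]){5}[0-9A-Fa-f]{2}")


def nlb_host_id_from_masked_mac(mac: str):
    """Return the NLB host ID if mac looks like a substitute source MAC, else None."""
    mac = mac.strip()
    if not _MAC_PATTERN.fullmatch(mac):
        raise ValueError(f"not a MAC address: {mac!r}")

    octets = [int(part, 16) for part in re.split(r"[-:]", mac)]
    first, second = octets[0], octets[1]

    # Substitute source MACs start with 02, followed by the host ID.
    if first == 0x02 and 1 <= second <= MAX_NLB_HOSTS:
        return second
    return None


if __name__ == "__main__":
    for candidate in ("02-03-BB-3C-29-08", "00-19-BB-3C-29-08"):
        print(candidate, nlb_host_id_from_masked_mac(candidate))
```

Running a list of learned MAC addresses through a check like this quickly shows which switch ports have NLB hosts behind them.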

If we next consider multicast mode...

When you mask the source MAC address, the ARP response from an NLB host has a substitute source MAC address in the Ethernet frame, but contains the correct NLB cluster MAC address in the ARP header. Some Layer 3 switches and routers are (not surprisingly!) confused by this type of response and cannot make this ARP mapping automatically. Hence, in this scenario it is necessary to create a static ARP entry on the affected switch/router which maps the NLB virtual IP address to the NLB cluster MAC address. This static ARP entry removes the need for the network device to use the Address Resolution Protocol (ARP) to determine the virtual IP's MAC address, as it has already been defined statically.

Layer 3 Switches

As Layer 3 switches provide routing capabilities, it is important to configure these devices carefully to interoperate with Microsoft NLB. This can be achieved by creating VLANs that operate in Layer 2 mode and then connecting NLB enabled interfaces to ports which are associated with this special 'Layer 2 mode' VLAN. Once configured this way, the VLAN will then function as discussed in the previous 'Layer 2 Switches' section (above).

Question 5: Why would you use NLB multicast mode as opposed to the default NLB unicast mode?

I have heard a few people say that they feel that multicast mode is less problematic, but even so, I am not sure I would recommend multicast mode as the default deployment method, especially if you can easily satisfy the requirements for unicast mode. The key reason for this is the administrative overhead of needing to add static ARP entries for Layer 3 devices when using multicast NLB. In order to prevent network devices from associating VIPs with the MAC address of an individual node, this static ARP entry associates the VIP with the MAC address of the NLB cluster to ensure correct load balancing and failover operations. For a small network, the effort involved in these static ARP entries may be negligible, but I am not sure that this solution scales well if you have a complex network and/or utilise a large number of VIPs, as the entries will need to be made on multiple network devices and a static ARP entry is required for every individual VIP in use.

As discussed in Question 3 above, the use of multicast doesn't necessarily eliminate switch flooding, which is unhelpful when people move away from unicast specifically for this very reason.

I think a better way to answer the original question is to ask "What is wrong with using the default method of unicast mode?" If the answer to this question is "Not sure..." or simply "Nothing..." then give unicast mode a try before you start thinking about enabling multicast mode! If you run into problems, multicast should hopefully be there to save the day; it's just important that you realise that this operational mode also has its own limitations and complications that you need to consider...

Anyhow, onto some of the official answers:

Microsoft Answer: The reasons to use one method or another are mainly related to the switches and routers you use. From my experience I have seen many cases where customers need a specific method because they already have the network hardware which supports it and they do not want to spend more money on upgrading hardware.

And finally, my own view:

My Answer: If you cannot meet the recommended requirements of using the default unicast mode, or your existing switches/routers do not support it, or you just don't like unicast for some reason, multicast mode support in ISA now provides additional flexibility to solve problems that you may encounter with unicast mode in your organisation. To summarise, if unicast works for you, stick with it, if not, multicast is now achievable, and more importantly, a supported option, if and when needed!

Russ Kaufman also has some good information on this subject here.

Question 6: When using dedicated NICs for the intra-array communications, does this network connection also carry the NLB heartbeat traffic (Ethertype 0x886F packets) or does this still only occur on NLB enabled interfaces?

From what I have seen, the use of dedicated NICs for intra-array communications seems to make people believe that this connection is also being used as the conduit for NLB heartbeat traffic. It sounds like this would make sense, but in reality, this is just not how NLB works.

A quick network sniff with NetMon or Wireshark on NLB enabled interfaces should also confirm this is the case...
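If you have raw frames exported from a capture, the heartbeat check can be sketched in a few lines. This is a minimal example that assumes untagged Ethernet II frames (no 802.1Q VLAN tag in front of the type field); the function name is hypothetical, and only the 0x886F Ethertype comes from the question above.

```python
import struct

NLB_HEARTBEAT_ETHERTYPE = 0x886F  # Ethertype quoted in the question above
ETHERNET_HEADER_LENGTH = 14       # 6 bytes dst MAC + 6 bytes src MAC + 2 bytes type


def is_nlb_heartbeat(frame: bytes) -> bool:
    """Return True if an untagged Ethernet II frame carries the NLB heartbeat Ethertype."""
    if len(frame) < ETHERNET_HEADER_LENGTH:
        return False
    # The Ethertype is a big-endian 16-bit value at byte offset 12.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ethertype == NLB_HEARTBEAT_ETHERTYPE


if __name__ == "__main__":
    # Synthetic frame: zeroed MAC addresses, heartbeat Ethertype, dummy payload.
    sample = bytes(12) + b"\x88\x6f" + bytes(46)
    print(is_nlb_heartbeat(sample))
```

In Wireshark itself, the display filter eth.type == 0x886f should show the same frames. Apply it to captures from both the NLB enabled interfaces and the dedicated intra-array interface to compare.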

Anyhow, onto some of the official answers:

Microsoft Answer 1: NLB heartbeats occur only over NLB enabled interfaces. That is an NLB fact, irrespective of ISA Server integrated NLB.

Microsoft Answer 2: AFAIK NLB does not allow exchanging heart beats through different NICs and even if it does, ISA definitely doesn’t configure NLB to utilise it.

And finally, my own view:

My Answer: Forget thinking about the intra-array connection as an 'NLB heartbeat' connection, it isn't! The dedicated intra-array connection is an isolated, secure network over which to pass messages between array members, these include:

  • ADAM replication and ISA synchronisation with ADAM (only if the CSS role is installed on array members).

  • ISA level heartbeat (over HTTP) to determine availability of array members. This is done in order to correctly handle and forward CARP requests.

  • CARP traffic, where one member forwards HTTP requests to another member for cache retrieval purposes.

Therefore, placing this type of communication on an isolated network sure makes sense to me! :)

Question 7: Can you explain the following statement from Technet? “When NLB is enabled, it synchronizes array members by using pure Ethernet protocol communication. This low-level traffic is not protected by ISA Server. To help secure that traffic, we strongly recommend that you place a Layer-3 router between the Internet and the NLB-enabled array. This Layer-3 router will not allow the low-level Ethernet protocol to pass, thereby helping protect the array from potentially malicious Ethernet traffic from the Internet that could disrupt the operation of NLB.”

Before I got my head around which actual interfaces were used to exchange NLB heartbeat packets in a multi-homed scenario, I never really understood this statement, which I had come across several times.

Anyhow, onto some of the official answers:

Microsoft Answer 1: NLB heartbeats and any other NLB handshake communication are low level, ISA doesn’t see it. A malicious user who can access your interfaces can interfere with this handshake. By isolating the NLB enabled interfaces with a layer-3 router you prevent malicious traffic from interfering with NLB operation. This recommendation is not specific to ISA, the same recommendation is just as valid for something like an NLB enabled web server.

And finally, my own view:

My Answer: From Question 6, we now know that NLB heartbeats occur on the actual NLB enabled interfaces themselves, and consequently this Technet statement makes sense, especially if read slowly :) (see further clarification in my next question).

Question 8: Does the use of the word 'Internet' in Question 7 above actually imply any potentially untrusted network? Consequently, should all NLB enabled interfaces be separated from untrusted networks using a Layer 3 device?

This seems an obvious follow on question, as the days of fearing the Internet as the only source of untrusted or malicious intent are very much over. Consequently, the concept of NLB interference is actually valid for any NLB enabled network.

Anyhow, onto some of the official answers:

Microsoft Answer 1: Yes. Any NLB enabled interface should be isolated from untrusted traffic to prevent malicious interference with NLB packets.

And finally, my own view:

My Answer: I think this Technet statement should possibly be reworded, as we cannot assume that the Internet is the only source of malicious attack and potentially all NLB enabled interfaces should be protected accordingly. At the simplest level, this could be achieved through the use of dedicated Layer 2 VLANs for each NLB enabled network, hosted on a Layer 3 switch, which is then configured to route between the VLANs as necessary.

Question 9: If I have installed the Configuration Storage Server (CSS) role on my ISA Servers, will CSS functionality be impacted by enabling NLB?

Yes, if you are using Windows based authentication for intra-array credentials.

In the event that the CSS role has been installed on the actual array members (although this is not recommended as best practice) it is necessary to make several changes before NLB can be enabled on any of the array networks. These changes are necessary due to the use of Kerberos based authentication between array members and the CSS role which necessitates the use of correct Service Principal Names (SPNs). If these SPN changes are not made, a request to connect from an array member to the Configuration Storage server may fail.

The required configuration to regain normal operation is provided in the Multiple Network Adapters and NLB section of this document Network Load Balancing Integration Concepts for Microsoft Internet Security and Acceleration (ISA) Server 2006

Question 10: I have enabled ISA integrated NLB using multicast mode. What do I need to consider to ensure client devices on remote networks separated by a firewall or router will function correctly?

As discussed in Questions 4/5, the use of multicast mode can introduce the need to add static ARP entries to routing (Layer 3) devices to ensure they are able to correctly utilise the NLB cluster MAC address, as opposed to relying on the usual ARP response. The syntax for adding static ARP entries varies between vendors, but a couple of examples that I am personally aware of are provided below:

For example, on a Cisco device, a static ARP entry can be added with the following syntax:

arp [NLB Virtual IP Address] [NLB Cluster MAC Address] ARPA

Whilst on a NetScreen device, a static ARP entry can be added with the following syntax:

set arp [NLB Virtual IP Address] [NLB Cluster MAC Address]
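With many VIPs, these entries are easier to generate than to type. The sketch below fills in the Cisco form shown above. Note that the default cluster MAC conventions it uses (02-BF plus the IP octets for unicast, 03-BF plus the IP octets for multicast, and 01-00-5E-7F plus the last two IP octets for multicast with IGMP) are not stated in this article; they are the commonly documented NLB defaults, so always verify against the cluster MAC address your array actually reports. The function names and IP addresses are hypothetical.

```python
import ipaddress


def nlb_default_cluster_mac(primary_vip: str, mode: str = "multicast") -> bytes:
    """Derive the default NLB cluster MAC from the primary cluster IP (assumed conventions)."""
    ip = ipaddress.IPv4Address(primary_vip).packed
    if mode == "unicast":
        return bytes([0x02, 0xBF]) + ip
    if mode == "multicast":
        return bytes([0x03, 0xBF]) + ip
    if mode == "igmp":
        return bytes([0x01, 0x00, 0x5E, 0x7F]) + ip[2:]
    raise ValueError(f"unknown NLB mode: {mode!r}")


def cisco_static_arp(vip: str, cluster_mac: bytes) -> str:
    """Build a Cisco-style static ARP line mapping one VIP to the cluster MAC."""
    if len(cluster_mac) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    ipaddress.IPv4Address(vip)  # validates the VIP, raises ValueError if malformed
    digits = cluster_mac.hex()
    dotted = ".".join(digits[i:i + 4] for i in range(0, 12, 4))
    return f"arp {vip} {dotted} ARPA"


if __name__ == "__main__":
    mac = nlb_default_cluster_mac("192.168.1.10", "multicast")
    # One entry per VIP, all pointing at the same cluster MAC.
    for vip in ("192.168.1.10", "192.168.1.11"):
        print(cisco_static_arp(vip, mac))
```

The cluster MAC is passed in separately from each VIP because every VIP on the network maps to the same cluster MAC, so one line is generated per VIP.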

Question 11: Can you provide an example design which covers the key elements to consider for an ISA integrated NLB design?

This is not an easy question to answer, as many factors can affect the overall ISA Server high-availability design. However, I have provided an example architecture diagram below, upon which to highlight areas of consideration. This is by no means a full 'best practice' design but covers a lot of 'good practice' in my opinion.

The design is based upon a two-tier perimeter network that uses a two-node ISA Server Enterprise Edition array, in addition to a couple of hardware front/edge firewalls. The key role for the front firewall in this design is to offload processing of unwanted, 'noisy' traffic from the ISA Servers and also provide ISP fault tolerance as ISA Server 2006 is not able to provide this functionality at this time. I am not going to say much else about the overall architecture design, as this article is about NLB and not firewall/perimeter network design :)

Key NLB Design Features

  • The use of Layer 3 switches throughout ensures that NLB interfaces are protected against malicious interference with NLB related packets, as discussed in Questions 7/8.

  • As per ISA Server best practice, only the external interfaces should be configured with a default gateway. Consequently, it is necessary to configure static routes on the ISA Server internal interfaces to allow communication with internal hosts which exist behind/outside of the PRIVISA VLAN. As the Layer 2 VLAN does not have any form of gateway address by default, it is necessary to provide some form of gateway so that appropriate static routes can be created on each array member. In terms of Cisco hardware, this is achieved using a feature called a Switch Virtual Interface (SVI). With this feature configured, each array member can then be configured to use this SVI IP address as the gateway IP address when defining 'route add' statements for communication with internal hosts and networks.

  • The ISA Server primary VIP on the external interface is used by the front firewalls (Front FW) as the primary route for all inbound traffic. This ensures that all inbound traffic is load balanced across both array members and still available in the event that one of the array members fails.

  • The ISA Server primary VIP on the internal interface is used by the core switches as the primary route for all outbound traffic. This ensures that all outbound traffic is load balanced across both array members and still available in the event that one of the array members fails.

  • The ISA Server primary VIP on the DMZ interfaces is used by the hosts in those networks as the primary route for all outbound traffic. This ensures that all outbound traffic is load balanced across both array members and still available in the event that one of the array members fails.

  • Each array member is connected to separate switches (denoted by SW1 and SW2 references) to ensure that a single switch failure will only result in the loss of a single array member, as opposed to the entire array. Furthermore, each array member is homed to the same switch for all interfaces, as the loss of a single interface will cause ISA to remove the array member from the NLB cluster as discussed in Question 13.
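To make the static route point above a little more concrete, the sketch below generates persistent Windows 'route add' statements that send each internal network via the SVI address. The networks, the SVI address and the function name are all made up for illustration; substitute the values from your own design and run the resulting commands on each array member.

```python
import ipaddress


def route_add_commands(internal_networks, svi_gateway):
    """Build persistent Windows route statements for networks behind the SVI gateway."""
    gateway = ipaddress.IPv4Address(svi_gateway)
    commands = []
    for cidr in internal_networks:
        network = ipaddress.IPv4Network(cidr)
        # -p makes the route persistent across reboots.
        commands.append(
            f"route -p add {network.network_address} mask {network.netmask} {gateway}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical internal networks reached through a hypothetical SVI address.
    for line in route_add_commands(["10.10.0.0/16", "10.20.30.0/24"], "192.168.50.1"):
        print(line)
```

Generating the statements from a single list helps keep the routes identical on every array member, which matters when traffic can arrive on any of them.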

Question 12: Does using NLB prevent me from using other network technologies like NIC teaming and VLAN tagging?

Yes, unfortunately NLB and other network technologies like NIC teaming and VLAN tagging are often mutually exclusive.

Some vendors claim that NIC teaming and NLB are supported (HP for example) but only if multicast mode is used, as opposed to unicast. However, I have never been able to actually get this to work with ISA integrated NLB, even when ISA Server has been specifically configured to utilise multicast mode. My testing resulted in the same 'NLB cannot apply local configuration' errors with multicast mode as I experienced with unicast mode. If someone has this working, I would be very grateful to hear the solution!

Russ Kaufman provides a similar viewpoint here.

Question 13: When I disconnect any NLB enabled NIC, the entire array member is removed from the cluster and consequently no longer accepts connections - Is this normal?

I believe this behaviour is by design, as the loss of any NLB enabled interface results in the NLB service being stopped. This seems to be a sensible approach if you consider the fact that ISA Server has no understanding of which interfaces are critical and which are expendable. In addition, it may be that traffic could enter the firewall on one interface but not be able to exit the firewall on a failed interface. Hence, in order to preserve a suitable state of stability and operational confidence, the entire array member is removed from the NLB cluster until the problem can be rectified.

Based upon this knowledge, the network architecture needs to be carefully considered and designed to work alongside ISA integrated NLB. For example, connecting all array members into a single switch would render the entire array inoperable in the event of switch failure. Therefore, a high-availability network design is also required, and ideally array members should be distributed across network hardware to ensure that failure of a network component only affects a single array member or, in the worst case, a small subset of array members, as opposed to the entire array.
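A quick way to sanity check a cabling plan against this behaviour is to model which switch each NLB enabled interface plugs into, and then ask which array members drop out when a given switch fails. The sketch below assumes that every interface listed is NLB enabled and that losing any one of them removes the member from the cluster, as described above; the member, interface and switch names are hypothetical.

```python
def members_lost(topology, failed_switch):
    """Return the array members removed from the NLB cluster when a switch fails.

    topology maps member name -> {interface name: switch name}. A member is
    lost as soon as any one of its NLB enabled interfaces loses its switch.
    """
    return sorted(
        member
        for member, interfaces in topology.items()
        if failed_switch in interfaces.values()
    )


if __name__ == "__main__":
    # Each member homed to a single switch for all of its interfaces.
    same_switch = {
        "ISA1": {"external": "SW1", "internal": "SW1", "dmz": "SW1"},
        "ISA2": {"external": "SW2", "internal": "SW2", "dmz": "SW2"},
    }
    # Interfaces of each member spread across both switches.
    cross_homed = {
        "ISA1": {"external": "SW1", "internal": "SW2"},
        "ISA2": {"external": "SW2", "internal": "SW1"},
    }
    print(members_lost(same_switch, "SW1"))
    print(members_lost(cross_homed, "SW1"))
```

Spreading one member's interfaces across several switches can look more resilient on paper. With this behaviour it means that a single switch failure can take out every member at once.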

Right, I think that is a pretty good braindump for now, which hopefully answers a few of the most common questions I often come across. I hope this resource guide proves useful, and as always, please feel free to comment either way :)