SQL Server 2019 Failover Cluster on EC2 – Lessons Learned
Not as straightforward as hoped
Due to collation and framework restrictions, the recommended way to set up a SQL cluster on AWS—via RDS—was not an option. Instead, we opted for a SQL Failover Cluster running on EC2 Windows Server instances. The AWS reference setup is mostly accurate and helpful, but one crucial tweak was needed to get the cluster up and running.
TL;DR
Do not create secondary ENIs for the IPs required by Windows Failover Cluster and SQL Failover Cluster. Instead, predefine the secondary IPs yourself and attach them to the primary network interface before installing the clusters.
Prerequisites
- AMI used: ami-0927dabc35e7e1948 - Windows Server 2019 with SQL 2019 Standard
- EC2 Instance Type: t3.xlarge
The tweak: Secondary IPs instead of secondary ENIs
SQL Server Failover Cluster Instances (FCI) operate on a Windows Server Failover Cluster (WSFC). Each node in the cluster requires a dedicated IP address for network communication. The AWS guide suggests creating secondary Elastic Network Interfaces (ENIs) to accommodate these IPs. However, in our setup, secondary ENIs caused issues, preventing proper cluster communication and failover behavior. Either the Windows refused to use the IP Address for the cluster as it was already in use at a network adapter or the cluster broke when attaching the secondary ENI with a cluster IP.
What Worked
Instead of adding secondary ENIs, I manually assigned secondary IPs to the primary network interface (eth0) of each EC2 instance before installing the failover cluster. Here’s how:
- Ensure Network Configuration is correct: The instance needs to be joined to an AD domain and the network adapter (in Windows)/primary network interface (in AWS) should be set to use Active Directory (AD) DNS servers.
- Predefine Secondary IPs: Depending on the subnet where a cluster node will reside, define which available IP of the subnet will be used. You will need two IPs, one for the Windows Failover cluster and one for the SQL Failover cluster.
- Assign Additional Private IPs: Edit the primary network interface settings of the running instance and assign the defined secondary IPs.
Try setting up the Windows Failover Cluster, like mentioned in the AWS reference setup, with powershell static addresses. If the creation of the cluster fails, you can go ahead create the cluster with standard settings, just ommit the block about static IP addresses in the powershell command or use the Failover Cluster Manager GUI. If the cluster was created without static IP addresses, manually remove the IP addresses created during standard installation and assign the secondary IP addresses you want to use for the cluster via powershell or, what worked well for me, through the Failover Cluster Manager.
Windows will then assign these secondary IP addresses to the primary network interface of the respective node and, as AWS already knows about the IP assignment, traffic will flow. The same will happen for the IP addresses used in the subsequent SQL Failover cluster installation. After the successful installation each nodes network adapter should show three IP addresses.
Additional Considerations
SQL Service Account
Ensure that the SQL Server Agent and SQL Server Database Engine service accounts have the necessary permissions in Active Directory. These accounts require:
- Permissions to create and manage cluster objects in the AD
- The ability to log on as a service.
Troubleshooting Installation Blocks
When adding a node to a deployed SQL failover cluster, I encountered an issue where credentials for service accounts have not been properly loaded. Thus the installation could not proceed. If you have fields in the Service Account that are misconfigured, you can manually override the installation using the following command (in the SQL setup directory):
...
setup.exe /Action=AddNode /INSTANCENAME=MSSQLSERVER /SQLSVCACCOUNT="aws\admin" /SQLSVCPASSWORD="YourPassword" /AGTSVCACCOUNT="aws\admin" /AGTSVCPASSWORD="YourPassword"
...
Performance Considerations
For optimal performance:
- Consider using NVMe-based EC2 instances, such as the m5 or i3 instance families. These provide significantly better disk I/O performance, especially for temporary tables, compared to traditional EBS-backed instances.
- Ensure that FSx for NetApp ONTAP is configured with sufficient throughput capacity to handle SQL Server workload demands.
Conclusion
Setting up a SQL Server Failover Cluster on EC2 is certainly possible, but not without its challenges. By manually assigning secondary IPs to the primary network interface before installation, you can avoid unnecessary complications and ensure a smoother deployment process.
Additionally, ensuring that:
- The instance is AD-joined before configuring the cluster,
- The primary network adapter uses AD DNS servers, and
- SQL Server service accounts have appropriate permissions
will help prevent common pitfalls.
By following these best practices, your SQL Server Failover Cluster should be up and running efficiently on EC2 in no time. If you need support doing your transition seamlessly, use our experience and do not hesitate to contact tecRacer at https://www.tecracer.com/kontakt/. Happy clustering!
— Patrik