Sergio Martinez · Apr 8, 2016

References about failover architecture with members in different datacenters


I need to respond to an RFP in which it will count as an extra to have a failover system where each failover member is located in a different datacenter, with the datacenters separated by more than 100-200 miles. We have no previous experience with mirroring. Of course, we have our application running at other sites, but we lack the experience with mirroring (and know nothing in particular about failover mirror requirements) to tell which specifications the communication link between the datacenters should have.

Is there any similar architecture out there? I know that performance will also depend on the application itself, but some real numbers based on real experience would give us a better idea of what we might need.





Hopefully someone will chime in with real-life numbers, but I thought it would be helpful to take you through the principles at play to guide your thinking...

1. With any mirror configuration that goes over a WAN (for failover or just DR), you need to ensure sufficient bandwidth to transfer journals over the network at the peak rate of journal creation. This is application- and load-specific of course, so it is derived from measuring a reference system running that application. It's important to base this on the peak journal creation rate, not the average, leaving plenty of room for spikes, additional growth, etc.

2016.1 introduces network compression for journal transfer, which can substantially reduce the bandwidth required (by 70% or more for typical journal contents). Although it can add some computational latency to the latency you'd consider in #2 below, if you're already going to use SSL encryption, compression may actually save some latency compared to SSL encryption alone. See the documentation on Journal data compression.

2. With failover members in different data centers, latency can be a factor for certain application events. Specifically, it's a factor when an application uses synchronous commit mode transactions or the journal Sync() API to ensure that a particular update is durably committed. That requires a synchronous round trip to the backup, which of course incurs the network latency between the two sites. This is discussed under Network latency considerations.

3. You'll need a strategy for IP redirection when failover occurs. For an intro to the subject, read Mirroring Configurations For Dual Data Centers and Geographically Separated Disaster Recovery. Then see Mark Bolinsky's excellent article here on the community.

4. You'll need a location for the arbiter that is in neither of the two data centers, as discussed in Locating the Arbiter to Optimize Mirror Availability.
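
To make points 1 and 2 a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions, not a sizing tool: the function names are made up for this post, the 50% headroom default is an arbitrary placeholder, and the compression saving is simply the "70% or more" figure mentioned above. The latency part uses only the physical propagation delay of light in fiber (roughly 200 km per millisecond), so it gives a lower bound; real round-trip times will be higher because of routing, equipment and indirect cable paths.

```python
MILES_TO_KM = 1.609344
# Assumption: light in optical fiber covers roughly 200 km per millisecond
# (about two thirds of its speed in a vacuum).
FIBER_KM_PER_MS = 200.0


def required_bandwidth_mbps(peak_journal_mb_per_s, compression_saving=0.0, headroom=0.5):
    """Link bandwidth in megabits/s for a measured PEAK journal rate in megabytes/s.

    compression_saving: fraction of bytes removed by compression (0.7 means 70%).
    headroom: extra fraction reserved for spikes and growth (hypothetical default).
    """
    if not 0.0 <= compression_saving < 1.0:
        raise ValueError("compression_saving must be in [0, 1)")
    on_wire_mb_per_s = peak_journal_mb_per_s * (1.0 - compression_saving)
    return on_wire_mb_per_s * 8.0 * (1.0 + headroom)


def min_round_trip_ms(distance_miles):
    """Lower bound on network round-trip time from propagation delay alone."""
    return 2.0 * distance_miles * MILES_TO_KM / FIBER_KM_PER_MS


def max_serial_sync_commits_per_s(rtt_ms):
    """Upper bound on synchronous commits per second for ONE process that
    waits for each round trip to the backup before issuing the next commit."""
    return 1000.0 / rtt_ms


if __name__ == "__main__":
    # The 10 MB/s peak below is a made-up input, not a measurement.
    print(required_bandwidth_mbps(10.0, compression_saving=0.7))
    rtt = min_round_trip_ms(200)
    print(rtt, max_serial_sync_commits_per_s(rtt))
```

With these example inputs, a 10 MB/s journal peak with 70% compression and 50% headroom works out to about 36 Mbit/s, and 200 miles of separation gives a propagation-only round trip of about 3.2 ms. The real inputs have to come from measuring your own reference system and the actual link.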