I am currently reading this: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group, and I have a hard time understanding the automatic failover policy:
By default, a failover group is configured with an automatic failover policy. The SQL Database service triggers failover after the failure is detected and the grace period has expired. The system must verify that the outage cannot be mitigated by the built-in high availability infrastructure of the SQL Database service due to the scale of the impact. If you want to control the failover workflow from the application, you can turn off automatic failover.
When defining the failover group in an ARM template:
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"type": "Microsoft.Sql/servers",
"kind": "v12.0",
"name": "[variables('sqlServerPrimaryName')]",
"apiVersion": "2014-04-01-preview",
"location": "[parameters('location')]",
"properties": {
"administratorLogin": "[parameters('sqlServerPrimaryAdminUsername')]",
"administratorLoginPassword": "[parameters('sqlServerPrimaryAdminPassword')]",
"version": "12.0"
},
"resources": [
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"apiVersion": "2015-05-01-preview",
"type": "failoverGroups",
"name": "[variables('sqlFailoverGroupName')]",
"properties": {
"serverName": "[variables('sqlServerPrimaryName')]",
"partnerServers": [
{
"id": "[resourceId('Microsoft.Sql/servers/', variables('sqlServerSecondaryName'))]"
}
],
"readWriteEndpoint": {
"failoverPolicy": "Automatic",
"failoverWithDataLossGracePeriodMinutes": 60
},
"readOnlyEndpoint": {
"failoverPolicy": "Disabled"
},
"databases": [
"[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]"
]
},
"dependsOn": [
"[variables('sqlServerPrimaryName')]",
"[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]",
"[resourceId('Microsoft.Sql/servers', variables('sqlServerSecondaryName'))]"
]
},
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"name": "[variables('sqlDatabaseName')]",
"type": "databases",
"apiVersion": "2014-04-01-preview",
"location": "[parameters('location')]",
"dependsOn": [
"[variables('sqlServerPrimaryName')]"
],
"properties": {
"edition": "[variables('sqlDatabaseEdition')]",
"requestedServiceObjectiveName": "[variables('sqlDatabaseServiceObjective')]"
}
}
]
},
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"type": "Microsoft.Sql/servers",
"kind": "v12.0",
"name": "[variables('sqlServerSecondaryName')]",
"apiVersion": "2014-04-01-preview",
"location": "[variables('sqlServerSecondaryRegion')]",
"properties": {
"administratorLogin": "[parameters('sqlServerSecondaryAdminUsername')]",
"administratorLoginPassword": "[parameters('sqlServerSecondaryAdminPassword')]",
"version": "12.0"
}
}
I specify the readWriteEndpoint like this:
"readWriteEndpoint": {
"failoverPolicy": "Automatic",
"failoverWithDataLossGracePeriodMinutes": 60
}
With a failoverWithDataLossGracePeriodMinutes set to 60 minutes.
What does this mean? I cannot find a clear answer anywhere. Does it mean that:
- When an outage is happening in my primary region where my primary database resides, the read/write endpoint points to the primary and only after 60 minutes it fails over to my secondary, which becomes the new primary. In the 60 minutes, the only way to read my data is to use the readOnlyEndpoint directly? OR
- My read/write endpoint is turned instantly, if they somehow can detect that there was no data to be synced
I think it boils down to: do I have to manually make the failover, if I detect an outage, if I don't care about data loss, but I want to be able to write to my database?
Bonus question: is the reason why the grace period is present because there can be unsynced data on the primary, that will be overwritten, or tossed away, if the secondary becomes the new primary (if i switch manually)?
Sorry, I can't keep it to only one question. I have read a lot and I really need to know this.