We have an Azure SQL database. Up until a few weeks ago, we were set at 10 DTUs (S0). Recently, we've gotten more SQL timeout errors, prompting us to increase our DTUs to 50 (S2). We get the errors less frequently, but still on occasion. When we get these timeouts, we see spikes on the Resource graph hitting 100%. Drilling into that, it's generally Data I/O operations that are making it spike. But when we check Query Performance Insight, none of the listed queries show that they're using that much resources.
Another thing to note is that our database has grown steadily in size. It is now about 19 GB, and the majority (18 GB) of that comes from one table that has a lot of long JSON strings in it. The timeout errors generally do happen on a certain query that has several joins, but they do not interact with the table with the long strings.
We tested making a copy of the database and removing all the long strings, and it did not get any timeouts at 10 DTU, but performed the same as the database with all the long strings at 50 DTU as far as load times.
We have rebuilt our indexes and, though it helped, we continue to experience timeout errors.
Given that the query that gets timeouts is not touching the table with long strings, could the table with long strings still be the culprit for DTU usage? Would it have to do with SQL caching? Could the long strings be hogging the cache and causing a lot of data I/O? (They are accessed fairly frequently too.)