Assume that we're developing an Instagram with billions of users. We shard photo tables in multiple databases(in different sharding instances/servers/devices), and in photo tables we have a createdAt column. Now a user opens the home tab in the app, the app should show the most recent 20 photos(order by createdAt desc) globally(not locally) across the photo tables in multiple databases. How the SQL query should be?
We must shard the photo table because billions of users would make hundred billions of photos. We can't store and serve hundred billions of photos in one table in one database in one server.
Say we have 100 database servers, one possible solution is that querying select id from photo order by createdAt desc limit 20 over the photo tables over 100 database servers. Then in our backend, we get 20*100 = 2000 photo rows, and sort them by createdAt in backend(Node.js, Java, Python, etc) and return only first 20 rows.