For the better part of the last year my company has been slicing up a monolith and building new products on the principles of a (micro)service architecture. This is all fine: it gives us great flexibility in keeping UI and backend logic separate, and it reduces the number of dependencies.
BUT!
There is an important part of our business that has a growing headache as a result of this, namely reporting.
Since we make sure that there is no data replication (or shared business logic) between services, each service knows only its own data. If another service really needs to keep a reference to that data, it does so through IDs (entity linking, essentially). While that is otherwise great, it's not great for reporting.
Our business often needs to create ad-hoc reports about specific cases involving our customers. In the 'old days' you wrote a simple SQL query that joined a couple of database tables and pulled out whatever you needed, but that is not possible with decoupled services. And that is a problem as the business sees it.
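To make the difference concrete, here is a minimal sketch of the same report done both ways. Everything in it is made up for illustration: the `customers`/`orders` tables, the `customer_service` and `order_service` stand-ins (plain Python structures in place of real service calls), and the report itself. It is not our actual schema.

```python
import sqlite3

# Monolith: one shared database, so an ad-hoc report is a single JOIN.
# Table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 7.5);
""")
monolith_report = db.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY c.id
""").fetchall()

# Decoupled services: each one owns its data, and orders only carry a
# customer ID. These structures stand in for two separate service APIs.
customer_service = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
order_service = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 5.0},
    {"id": 12, "customer_id": 2, "total": 7.5},
]

def service_report():
    """Join the two services' data by ID in application code."""
    totals = {}
    for order in order_service:  # would be a call to the order service
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["total"]
    # One lookup per customer ID against the customer service.
    return [(customer_service[cid]["name"], total)
            for cid, total in sorted(totals.items())]
```

Both paths produce the same rows, but the second one has to be written, deployed and maintained as code, which is exactly what makes ad-hoc reporting painful.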
I am personally not a fan of replicating data in the back end for reporting purposes, as that tends to grow into a nightmare of its own (which it already is, even in our legacy monoliths). So this problem is really not about legacy monoliths versus modern microservices, but about data dependencies in general.
Have you faced issues like this and, if so, how did you solve them?
EDIT:
We have been discussing a few potential solutions in-house, but none of them is actually good, and I have not yet found an answer that solves the issue at large scale.
Good old replicate-everything-and-let-BI-people-figure-it-out is what is still used to this day. Since the old monolith times the BI/data-warehouse team has made duplicates of all databases. The same practice is more inconvenient with microservices, but it is still done for every microservice that uses a database. This is not good for various reasons and comes with the shared sandbox cancer you can expect.
Build a separate microservice, or a set of microservices, meant for producing specific reports. Each of them connects to the set of microservices that carry the relevant data and builds the report as expected. However, this introduces tighter coupling and can be incredibly complicated and slow with large datasets.
Build a separate microservice, or a set of microservices, each with a database that is replicated from the other services' databases in the background. This is problematic because team databases become coupled, data is replicated directly, and there is a strong dependency on the database technology being used.
Have each service send out events to RabbitMQ that BI services would pick up, fetching additional data if needed. This sounds by far the best to me, but it is also by far the most complex to implement, as all services need to start publishing relevant data. It is what I would personally choose at present, from a very abstract level, that is.
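To illustrate that last, event-based option, here is a rough sketch of what I have in mind. Note the assumptions: `InMemoryBroker` is only a stand-in for RabbitMQ so the example runs on its own, and the routing keys (`customer.created`, `order.placed`), the event fields and the `ReportingProjection` class are all hypothetical names, not something we have built.

```python
import json
from collections import defaultdict

class InMemoryBroker:
    """Stand-in for RabbitMQ: hands each published message to its subscribers."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, routing_key, handler):
        self.subscribers[routing_key].append(handler)

    def publish(self, routing_key, payload):
        body = json.dumps(payload)  # events travel as serialized messages
        for handler in self.subscribers[routing_key]:
            handler(json.loads(body))

class ReportingProjection:
    """BI-side consumer that builds its own denormalized read model."""
    def __init__(self, broker):
        self.customer_names = {}
        self.order_totals = defaultdict(float)
        broker.subscribe("customer.created", self.on_customer_created)
        broker.subscribe("order.placed", self.on_order_placed)

    def on_customer_created(self, event):
        self.customer_names[event["customer_id"]] = event["name"]

    def on_order_placed(self, event):
        self.order_totals[event["customer_id"]] += event["total"]

    def revenue_per_customer(self):
        # Customers we never received an event for are reported by ID; this is
        # where a BI service might fetch additional data from the owning service.
        return {self.customer_names.get(cid, f"unknown:{cid}"): total
                for cid, total in sorted(self.order_totals.items())}

broker = InMemoryBroker()
bi = ReportingProjection(broker)
# The customer and order services each publish only their own facts.
broker.publish("customer.created", {"customer_id": 1, "name": "Alice"})
broker.publish("order.placed", {"customer_id": 1, "total": 25.0})
broker.publish("order.placed", {"customer_id": 1, "total": 5.0})
broker.publish("order.placed", {"customer_id": 2, "total": 7.5})
revenue = bi.revenue_per_customer()
```

The services stay unaware of reporting: they only publish events, and the BI side decides what to keep. The cost is visible even in this toy version, since every service has to publish every fact a report might ever need.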