1
votes

I am designing a review analysis platform with a microservices architecture.

The application works as follows:

  • All product reviews are retrieved from ecommerce-site-a (site-a) as an Excel file.
  • The reviews are uploaded to the system from that Excel file.
  • An analysis agent can list all reviews, edit some of them, and delete or approve them.
  • An analysis agent can export all reviews for site-a.
  • Automated regexp-based checks are applied to each review on upload and on editing.

I have 3 microservices.

  • Reviews: Responsible for review CRUD operations plus operations such as approve/reject.
  • Validations: Responsible for defining validation rules and applying them to reviews.
  • Export/Import: The export service exports huge files for a given site name (like site-a).

The problem is that at some point the validation service needs to get all reviews for site-a, apply the validation rules, and generate errors if there are any. I know that sharing database schemas and entities breaks the microservices architecture.

One possible solution is

  • Whenever the validation service needs the reviews for a site, it sends a request to the gateway, the gateway forwards the request to the Reviews service, and the response is returned to the validation service.

Two possible drawbacks of this approach are

  • The validation service has to know about the gateway. Does that introduce a dependency?
  • If I have 1 billion reviews for a site, getting all of them via a REST request may be a problem (or not, since I can make paginated requests from the validation service to the gateway).
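A minimal sketch of the paginated option, assuming the Reviews service exposes some page-based read operation. The `fetch_page` callable, its signature, and the in-memory `fake_fetch_page` stand-in are hypothetical and only illustrate the loop. A real client would issue an HTTP request per page instead.

```python
from typing import Callable, Dict, Iterator, List

Review = Dict[str, str]
# Hypothetical signature: (site, page_number, page_size) -> list of reviews.
FetchPage = Callable[[str, int, int], List[Review]]


def iter_reviews(fetch_page: FetchPage, site: str, page_size: int = 500) -> Iterator[Review]:
    """Yield reviews for a site page by page; stop at the first short page."""
    page = 0
    while True:
        batch = fetch_page(site, page, page_size)
        yield from batch
        if len(batch) < page_size:
            return
        page += 1


# In-memory stand-in for the Reviews service (illustration only).
_FAKE_STORE = [
    {"id": str(i), "site": "site-a", "text": f"review {i}"} for i in range(1250)
]


def fake_fetch_page(site: str, page: int, size: int) -> List[Review]:
    rows = [r for r in _FAKE_STORE if r["site"] == site]
    return rows[page * size:(page + 1) * size]
```

Only one page is held in memory at a time, so the total number of reviews does not affect the memory footprint of the caller. If reviews can be edited or deleted while paging is in progress, a cursor keyed on a stable id is usually safer than page offsets.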

So what is the best practice for sharing huge amounts of data between microservices without

  • sharing entities
  • and duplicating data?

I have read a lot about using message queues, but I think that in my case a message queue is not a good way to share gigabytes of data.


Edit 1: Instead of sharing entities, could a data store with a REST API be a solution? Assuming I am using MongoDB, instead of sharing my entity objects between microservices, I could use a REST interface for MongoDB (http://restheart.org/) and query the data whenever needed.

2
You can try enterprise integration patterns. I don't know of a pattern that solves this exact use case, but one of them should cover it. - k1133

2 Answers

10
votes

Your problem here is not "sharing huge data" but rather the boundaries you chose when splitting your microservices.

I can tell from your requirements that the 3 microservices you chose (Reviews, Validations, Import/Export) actually operate on the same context and business domain, which is Reviews.

I would encourage you to reconsider your design and treat Reviews as a single microservice that handles all review operations and logic as a black box.
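As a rough illustration of that idea, here is a sketch of one service that owns storage, validation, and export, so validation never has to pull reviews across a service boundary. The class name, method names, and the single regex rule are hypothetical, and the in-memory dictionary stands in for whatever database the service would actually use.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Review:
    id: str
    site: str
    text: str
    errors: List[str] = field(default_factory=list)


class ReviewsService:
    """Single service owning storage, validation, and export of reviews."""

    def __init__(self) -> None:
        self._reviews: Dict[str, Review] = {}
        # Hypothetical rule: a match marks the review as invalid.
        self._rules = {"contains_url": re.compile(r"https?://", re.IGNORECASE)}

    def _validate(self, review: Review) -> None:
        # Validation runs in-process, next to the data it checks.
        review.errors = [
            name for name, rx in self._rules.items() if rx.search(review.text)
        ]

    def save(self, review: Review) -> Review:
        """Create or update a review; validation is applied on every save."""
        self._validate(review)
        self._reviews[review.id] = review
        return review

    def export(self, site: str) -> List[Review]:
        """Return all stored reviews for the given site."""
        return [r for r in self._reviews.values() if r.site == site]
```

Callers see only `save` and `export`; how and when validation happens stays an internal detail of the service.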

0
votes

I assume that reviews are independent from each other and that validating a review therefore requires only that review, and no other reviews.

You don't want to share entities, which rules out things like shared databases, Hadoop clusters, or data stores like Redis. You also don't want to duplicate data, which rules out plain file copies and trigger-based replication at the database level.

In summary, I'd say your aim should be a stream. Let the Validator request everything Reviews has about site-a, not as one bulk transfer but as a stream of single reviews or small batches of reviews.

The Validator can now process the reviews one after the other, with constant memory and processor consumption. To gain performance, you can run multiple instances of the Validator that pull different, disjoint pieces of the stream at the same time. Similarly, you can run multiple instances of the Reviews microservice if a single one cannot answer the pulls fast enough.
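One way to give each Validator instance a disjoint piece, sketched under the assumption that Reviews offers a page-based read: instance i of n pulls pages i, i + n, i + 2n, and so on. The `fetch_page` callable, its signature, and the in-memory stand-in are hypothetical. Other partitioning schemes, such as id ranges, would work as well.

```python
from typing import Callable, Dict, Iterator, List

Review = Dict[str, str]
# Hypothetical signature: (site, page_number, page_size) -> list of reviews.
FetchPage = Callable[[str, int, int], List[Review]]


def iter_partition(
    fetch_page: FetchPage,
    site: str,
    worker_index: int,
    worker_count: int,
    page_size: int = 500,
) -> Iterator[Review]:
    """Yield only the pages assigned to this worker (index, index + count, ...)."""
    page = worker_index
    while True:
        batch = fetch_page(site, page, page_size)
        yield from batch
        if len(batch) < page_size:  # short or empty page: end of data
            return
        page += worker_count


# In-memory stand-in for the Reviews service (illustration only).
_STORE = [{"id": str(i), "site": "site-a", "text": f"review {i}"} for i in range(1250)]


def fake_fetch_page(site: str, page: int, size: int) -> List[Review]:
    rows = [r for r in _STORE if r["site"] == site]
    return rows[page * size:(page + 1) * size]
```

Each worker holds at most one page in memory, and no review is handed to two workers as long as the underlying ordering stays stable while the pages are being read.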

The Validator does not persist this stream. It produces only the errors and stores or sends them somewhere, which should fulfill your no-sharing, no-duplication requirements.
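A minimal sketch of such a Validator, assuming each review arrives as a dictionary with `id` and `text` keys and that a regex match means a rule violation. The two rules and all names here are hypothetical placeholders for whatever rules the Validations service defines.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple, Sequence


class Rule(NamedTuple):
    name: str
    pattern: re.Pattern  # a match means the review violates the rule


class ValidationError(NamedTuple):
    review_id: str
    rule: str


# Hypothetical example rules.
RULES = (
    Rule("contains_url", re.compile(r"https?://", re.IGNORECASE)),
    Rule("repeated_char", re.compile(r"(.)\1{4,}")),
)


def validate_stream(
    reviews: Iterable[Dict[str, str]], rules: Sequence[Rule] = RULES
) -> Iterator[ValidationError]:
    """Consume reviews one at a time and yield only the errors found."""
    for review in reviews:
        for rule in rules:
            if rule.pattern.search(review["text"]):
                yield ValidationError(review["id"], rule.name)
```

Because `validate_stream` is a generator over an iterable, it keeps no copy of the reviews. Only the emitted errors need to be stored or forwarded.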