
I am writing a Node.js backend which will make a series of HTTP requests based on category, aggregate the results, and return them to a mobile client.

For example, the following metadata will need to be saved in a data store:

key: category1
value: {
  name: 'Cat 1 Name',
  requestUrls: [
    {name: 'Request url 1', url: 'http://reqfoo1/'},
    {name: 'Request url 2', url: 'http://reqfoo2/'},
    {name: 'Request url 3', url: 'http://reqfoo3/'},
    {name: 'Request url 4', url: 'http://reqfoo4/'},
    ....
    {name: 'Request url 50', url: 'http://reqfoo50/'}
  ]
}

key: category2
value: {
  name: 'Cat 2 Name',
  requestUrls: [
    {name: 'Request url 1', url: 'http://reqbar1/'},
    {name: 'Request url 2', url: 'http://reqbar2/'},
    {name: 'Request url 3', url: 'http://reqbar3/'},
    {name: 'Request url 4', url: 'http://reqbar4/'},
    ....
    {name: 'Request url 50', url: 'http://reqbar50/'}
  ]
}

Since no categories will ever share the same URLs, it probably makes sense to store this in a key/value data store so I can quickly access the URLs to call. There will be about 1500 categories, each with about 30-50 URLs to hit.

Backend Service: API GET getAllDataByCategory(catId) will

  • query the metadata, pulling the value for a specific category

  • asynchronously iterate through requestUrls and make an HTTP request for each URL

  • aggregate all results and return a concatenated result back to the client. (aggregate of about 1000 total items)
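
The three steps above could be sketched roughly like this. This is a minimal sketch, not a full implementation: `lookupMetadata` and `fetchUrl` are hypothetical stand-ins for whatever data-store client and HTTP client end up being used.

```javascript
// Sketch of getAllDataByCategory. `lookupMetadata` and `fetchUrl` are
// hypothetical stand-ins for the data-store query and the HTTP client.
async function getAllDataByCategory(catId, lookupMetadata, fetchUrl) {
  // 1. Pull the metadata value for this category.
  const meta = await lookupMetadata(catId);

  // 2. Fire all 30-50 requests concurrently rather than one at a time.
  const results = await Promise.all(
    meta.requestUrls.map(entry => fetchUrl(entry.url))
  );

  // 3. Concatenate the raw XML payloads into one response body.
  return results.join('\n');
}
```

The key point is `Promise.all`: with 30-50 outbound requests per category, issuing them concurrently keeps the total latency close to the slowest single request instead of the sum of all of them.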

(NOTE: since all URLs return XML, I still haven't decided whether to do the XML parsing to a common model on the server or the client. I am leaning toward just returning the aggregated XML string from the API and parsing in the client, since parsing is a CPU-bound task and would block in Node.js.)

Given that:

  • The metadata (category data/urls) will only change about twice a month

  • I do not need to store the aggregated http results in a data store, but would like to cache them

  • The http result cache should be invalidated after 1 minute.

  • My backend will be deployed in Amazon Web Services (EC2)

What is a good solution for A) storing the metadata, and B) caching the results from all HTTP requests?

  1. Metadata storage - Which data store is sufficient for storing the category metadata? Can I just use Postgres to store key/value data? Or should I use Mongo or DynamoDB? Or should I even use Redis persistence to store this?

  2. Given that I'm storing the metadata in a store, would it be faster to cache the metadata in Redis rather than querying the DB for it on each request? Remember this data will only change twice a month. Could I use Redis to both cache AND store the metadata?

  3. What is the best caching mechanism for storing the aggregated XML (or JSON, if I decide to parse in the backend)? The cache will need to be invalidated every minute, and I can store it as key/value pairs such as below. Some people have said Varnish is good for this use case (although I'm not caching files); others say Redis/memcached is good for this (I prefer Redis over memcached).

    key: category1 value: data for item1.....item 1000
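
The 1-minute invalidation maps directly onto a TTL on each key (in Redis this is just `SETEX category1 60 <data>`, and expiry is handled for you). As an in-memory illustration of the same semantics, assuming nothing beyond plain Node:

```javascript
// Minimal in-memory illustration of the 1-minute cache semantics.
// With Redis this would be SETEX <key> 60 <value>; this sketch just
// shows the expiry logic the cache needs to provide.
const cache = new Map();

function cacheSet(key, value, ttlMs = 60 * 1000) {
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
}

function cacheGet(key) {
  const entry = cache.get(key);
  if (!entry) return null;             // never cached: a miss
  if (Date.now() > entry.expiresAt) {  // older than one minute: also a miss
    cache.delete(key);
    return null;
  }
  return entry.value;
}
```

A `null` from `cacheGet` means "go do the 30-50 HTTP requests and re-populate"; a hit means all 500 requests in that minute get the same aggregated blob.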

This is basically what I am trying to achieve:

  1. User requests data for category 1 on client

  2. Client makes request to API getAllDataByCategory(cat1) to server

  3. First look for cached aggregate response data for cat1 and, if available and not expired, return it to the client

  4. If no cached response data is available, look up the metadata for cat1 (check the metadata cache first, then the DB if the metadata is not cached)

  5. Call all URLs from metadata, aggregate results, cache data, then return to client

  6. If there are 500 client requests per minute, all of them are served from the cache until the minute has expired.
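
Steps 3-6 are a standard cache-aside pattern. Sketched below under the assumption that `resultCache`, `metadataCache`, `db`, and `fetchAndAggregate` are hypothetical helpers with the obvious async get/set interfaces:

```javascript
// Cache-aside flow for steps 3-6. Every dependency here (resultCache,
// metadataCache, db, fetchAndAggregate) is a hypothetical stand-in.
async function handleRequest(catId, deps) {
  const { resultCache, metadataCache, db, fetchAndAggregate } = deps;

  // Step 3: serve the aggregated response from cache while it is fresh.
  const cached = await resultCache.get(catId);
  if (cached !== null) return cached;

  // Step 4: metadata cache first, then the database.
  let meta = await metadataCache.get(catId);
  if (meta === null) {
    meta = await db.getCategory(catId);
    await metadataCache.set(catId, meta); // metadata changes ~twice a month
  }

  // Steps 5-6: call all URLs, aggregate, cache for the next minute.
  const aggregated = await fetchAndAggregate(meta.requestUrls);
  await resultCache.set(catId, aggregated); // 60s TTL in the real cache
  return aggregated;
}
```

With this shape, only the first request after expiry pays for the 30-50 upstream calls; the other requests in that minute read the cached aggregate.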

I am prepared to research the heck out of anything I need to, but if anyone can give me some direction on where to go, or tell me which technology has worked well for them in a similar case, I would greatly appreciate it. Keep in mind I am using AWS, and they offer ElastiCache (Redis/memcached), so maybe this is the way to go?

Thanks!!


1 Answer


ElastiCache/Redis should be plenty to handle both A) storing the metadata and B) caching the results of all HTTP requests. KISS.