2
votes

My company has a service-oriented architecture. My app's GraphQL server therefore has to call out to other services to fullfill the data requests from the frontend.

Let's imagine my GraphQL schema defines the type User. The data for this type comes from two sources:

  1. A user account service that exposes a REST endpoint for fetching a user's username, age, and friends.
  2. A SQL database used just by my app to store User-related data that is only relevant to my app: favoriteFood, favoriteSport.

Let's assume that the user account service's endpoint automatically returns the username and age, but you have to pass the query parameter friends=true in order to retrieve the friends data because that is an expensive operation.

Given that background, the following query presents a couple optimization challenges in the getUser resolver:

query GetUser {
  getUser {
    username
    favoriteFood
  }
}

Challenge #1 When the getUser resolver makes the request to the user account service, how does it know whether or not it needs to ask for the friends data as well?

Challenge #2 When the resolver queries my app's database for additional user data, how does it know which fields to retrieve from the database?

The only solution I can find to both challenges is to inspect the query in the resolver via the fourth info argument that the resolver receives. This will allow it to find out whether friends should be requested in the REST call to the user account service, and it will be able to build the correct SELECT query to retrieve the needed data from my app's database.

Is this the correct approach? It seems like a use-case that GraphQL implementations must be running into all the time and therefore I'd expect to encounter a widely accepted solution. However, I haven't found many articles that address this, nor does a widely used NPM module appear to exist (graphql-parse-resolve-info is part of PostGraphile but only has ~12k weekly downloads, while graphql-fields has ~18.5k weekly downloads).

I'm therefore concerned that I'm missing something fundamental about how this should be done. Am I? Or is inspecting the info argument the correct way to solve these optimization challenges? In case it matters, I am using Apollo Server.

1

1 Answers

5
votes

If you want to modify your resolver based on the requested selection set, there's really only one way to do that and that's to parse the AST of the requested query. In my experience, graphql-parse-resolve-info is the most complete solution for making that parsing less painful.

I imagine this isn't as common of an issue as you'd think because I imagine most folks fall into one of two groups:

  • Users of frameworks or libraries like Postgraphile, Hasaura, Prisma, Join Monster, etc. which take care of optimizations like these for you (at least on the database side).
  • Users who are not concerned about overfetching on the server-side and just request all columns regardless of the selection set.

In the latter case, fields that represent associations are given their own resolvers, so those subsequent calls to the database won't be fired unless they are actually requested. Data Loader is then used to help batch all these extra calls to the database. Ditto for fields that end up calling some other data source, like a REST API.

In this particular case, Data Loader would not be much help to you. The best approach is to have a single resolver for getUser that fetches the user details from the database and the REST endpoint. You can then, as you're already planning, adjust those calls (or skip them altogether) based on the requested fields. This can be cumbersome, but will work as expected.

The alternative to this approach is to simple fetch everything, but use caching to reduce the number of calls to your database and REST API. This way, you'll fetch the complete user each time, but you'll do so from memory unless the cache is invalidated or expires. This is more memory-intensive, and cache invalidation is always tricky, but it does simply your resolver logic significantly.