53
votes

I'm struggling to apply RESTful principles to a new web application I'm working on. In particular, I'm stuck on the idea that, to be RESTful, each HTTP request should carry enough information by itself for its recipient to process it, in keeping with the stateless nature of HTTP.

The application allows users to search for medications. The search accepts filters as input, for example: return discontinued medicines, include complementary therapies, etc. In total there are around 30 filters that can be applied.

Additionally, patient details can be entered, including the patient's age, gender, current medications, etc.

To be Restful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?

5 Answers

85
votes

The "Filter As Resource" is a perfect tack for this.

You can PUT the filter definition to the filter resource, and it can return the filter ID.

PUT is idempotent, so even if the filter is already there, you just need to detect that you've seen the filter before, so you can return the proper ID for the filter.

Then, you can add a filter parameter to your other requests, and they can grab the filter to use for the queries.

GET /medications?filter=1234&page=4&pagesize=20

I would run the raw filters through some sort of canonicalization process, just to have a normalized set, so that, e.g. filter "firstname=Bob lastname=Eubanks" is identical to "lastname=Eubanks firstname=Bob". That's just me though.
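As a rough illustration of that canonicalization step, here is a minimal Python sketch. It assumes a filter arrives as whitespace-separated name=value terms, that field names are case-insensitive, and that values contain no spaces; the function name canonicalize is hypothetical and none of this comes from the original application.

```python
def canonicalize(raw_filter: str) -> str:
    """Return a normalized form of a whitespace-separated 'name=value' filter."""
    pairs = []
    for term in raw_filter.split():
        name, _, value = term.partition("=")
        # Assumption: field names are case-insensitive, values are kept as given.
        pairs.append((name.lower(), value))
    # Sorting removes any dependence on the order the terms were supplied in.
    pairs.sort()
    return " ".join(f"{name}={value}" for name, value in pairs)


a = canonicalize("firstname=Bob lastname=Eubanks")
b = canonicalize("lastname=Eubanks firstname=Bob")
print(a == b)  # both spellings normalize to the same string
```

A real filter language with quoting or nested expressions would need a proper parser, but the idea is the same: parse, normalize, then serialize in one fixed order.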

The only real concern is that, as time goes on, you may need to obsolete some filters. You can simply error out the request should someone make a request with a missing or obsolete filter.
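One way to "error out" could look like the sketch below. The two lookup tables and the resolve_filter function are hypothetical stand-ins for whatever storage you use; the choice of 404 for an unknown filter and 410 Gone for a retired one is my assumption, not something the approach dictates.

```python
from http import HTTPStatus

# Hypothetical data: live filters by ID, plus IDs that have been obsoleted.
FILTERS = {"1234": {"discontinued": True}}
RETIRED = {"0042"}


def resolve_filter(filter_id: str):
    """Return (status, payload) for a filter ID supplied with a query."""
    if filter_id in RETIRED:
        # The filter existed once but was deliberately withdrawn.
        return HTTPStatus.GONE, {"error": f"filter {filter_id} has been retired"}
    if filter_id not in FILTERS:
        return HTTPStatus.NOT_FOUND, {"error": f"filter {filter_id} does not exist"}
    return HTTPStatus.OK, FILTERS[filter_id]
```

Distinguishing the two cases lets a client tell "I mistyped the ID" apart from "I need to PUT my filter definition again".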

Edit answering question...

Let's start with the fundamentals.

Simply, you want to specify a filter for use in queries, but these filters are (potentially) involved and complicated. If it were as simple as /medications/1234, this wouldn't be a problem.

Effectively, you always need to send the filter to the query. The question is how to represent that filter.

The fundamental issue with things like sessions in REST systems is that they're typically managed "out of band". When you, say, go and create a medication, you PUT or POST to the medications resource, and you get a reference back to that medication.

With a session, you would (typically) get back a cookie, or perhaps some other token to represent that session. If your PUT to the medications resource created a session also, then, in truth, your request created two resources: a medication, and a session.

Unfortunately, when you use something like a cookie, and you require that cookie for your request, the resource name is no longer the true representation of the resource. Now it's the resource name (the URL), and the cookie.

So, if I do a GET on the resource named /medications/search, and the cookie represents a session, and that session happens to have a filter in it, you can see how in effect, that resource name, /medications/search, isn't really useful at all. I don't have all of the information I need to make effective use, because of the side effect of the cookie and the session and the filter therein.

Now, you could perhaps rewrite the name: /medications/search?session=ABC123, effectively embedding the cookie in the resource name.

But now you run into the typical contract of sessions, notably that they're short-lived. So, that named resource is less useful long term; not useless, just less useful. Right now, this query gives me interesting data. Tomorrow? Probably not. I'll get some nasty error about the session being gone.

The other problem is that sessions typically are not managed as a resource. For example, they're usually a side effect, vs explicitly managed via GET/PUT/DELETE. Sessions are also the "garbage heap" of web app state. In this case, we're just kind of hoping that the session is properly populated with what is needed for this request. We actually don't really know. Again, it's a side effect.

Now, let's turn it on its head a little bit. Let's use /medications/search?filter=ABC123.

Obviously, casually, this looks identical. We just changed the name from 'session' to 'filter'. But, as discussed, Filters, in this case, ARE a "first class resource". They need to be created, managed, etc. the same as a medication, a JPEG, or any other resource in your system. This is the key distinction.

Certainly, you could treat "sessions" as a first class resource, creating them, putting stuff in them directly, etc. But you can see how, at least from a clarity point of view, a "first class" session isn't really a good abstraction for this case. Using a session, it's like going to the cleaners and handing over your entire purse or briefcase. "Yeah, the ticket is in there somewhere, dig out what you want, give me my clothes", especially compared to something explicit like a filter.

So, you can see how, at 30,000 feet, there's not a lot of difference in the case between a filter and a session. But when you zoom in, they're quite different.

With the filter resource, you can choose to make them a persistent thing forever and ever. You can expire them, you can do whatever you want. Sessions tend to have preconceived semantics: short-lived, lasting for the duration of the connection, etc. Filters can have any semantics you want. They're completely separate from what comes with a session.

If I were doing this, how would I work with filters?

I would assume that I really don't care about the content of a filter. Specifically, I doubt I would ever query for "all filters that search by first name". At this juncture, it seems like uninteresting information, so I won't design around it.

Next, I would normalize the filters, like I mentioned above. Make sure that equivalent filters truly are equivalent. You can do this by sorting the expressions, ensuring fieldnames are all uppercase, or whatever.

Then, I would store the filter as an XML or JSON document, whichever is more comfortable/appropriate for the application. I would give each filter a unique key (naturally), but I would also store a hash for the actual document with the filter.

I would do this to be able to quickly find if the filter is already stored. Since I'm normalizing it, I "know" that the XML (say) for logically equivalent filters would be identical. So, when someone goes to PUT, or insert a new filter, I would do a check on the hash to see if it has been stored before. I may well get back more than one (hashes can collide, of course), so I'll need to check the actual XML payloads to see whether they match.

If the filters match, I return a reference to the existing filter. If not, I'd create a new one and return that.
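Putting the last few steps together, a minimal in-memory sketch might look like this. It assumes filters are JSON-serializable dicts, uses SHA-256 as the hash and random UUIDs as keys, and the FilterStore class and its method names are hypothetical; a real system would keep this in a database.

```python
import hashlib
import json
import uuid


class FilterStore:
    def __init__(self):
        self._by_id = {}    # filter ID -> canonical JSON document
        self._by_hash = {}  # SHA-256 hex digest -> list of filter IDs

    @staticmethod
    def _canonical(definition: dict) -> str:
        # Sorted keys and fixed separators give equivalent filters identical text.
        return json.dumps(definition, sort_keys=True, separators=(",", ":"))

    def put(self, definition: dict) -> str:
        """Store the filter if it is new; return the ID of the stored copy."""
        doc = self._canonical(definition)
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        # Several IDs may share a digest, so compare the actual payloads.
        for filter_id in self._by_hash.get(digest, []):
            if self._by_id[filter_id] == doc:
                return filter_id
        filter_id = uuid.uuid4().hex
        self._by_id[filter_id] = doc
        self._by_hash.setdefault(digest, []).append(filter_id)
        return filter_id

    def get(self, filter_id: str) -> dict:
        return json.loads(self._by_id[filter_id])
```

Calling put twice with logically equivalent filters returns the same ID, which is what keeps the PUT idempotent from the client's point of view.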

I also would not allow a filter UPDATE/POST. Since I'm handing out references to these filters, I would make them immutable so the references can remain valid. If I wanted a filter by "role", say, the "get all expired medications filter", then I would create a "named filter" resource that associates a name with a filter instance, so that the actual filter data can change but the name remains the same.
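To make the split between immutable filters and re-pointable names concrete, here is a small sketch. The IDs, the "expired-medications" name, and the helper functions are all made up for illustration; MappingProxyType is only used to give a read-only view of each stored filter.

```python
from types import MappingProxyType

# Immutable filter instances, keyed by ID (hypothetical data).
filters = {
    "f1": MappingProxyType({"status": "expired"}),
    "f2": MappingProxyType({"status": "expired", "include_recalled": True}),
}

# Named filters: a stable name that points at one filter instance.
named_filters = {"expired-medications": "f1"}


def resolve_named(name: str):
    """Follow a name to the filter instance it currently points at."""
    return filters[named_filters[name]]


def repoint(name: str, filter_id: str) -> None:
    """Change which filter a name refers to; the filters themselves never change."""
    if filter_id not in filters:
        raise KeyError(filter_id)
    named_filters[name] = filter_id
```

Anyone holding the ID f1 keeps getting exactly the filter they were given, while clients that ask for the name pick up the new definition after a repoint.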

Mind, also, that during creation, you're in a race condition (two requests trying to make the same filter), so you would have to account for that. If your system has a high filter volume, this could be a potential bottleneck.

Hope this clarifies the issue for you.

13
votes

To be Restful, should all this information be included with every request?

No. If it looks like your server is sending (or receiving) too much information, chances are that there are one or more resources you haven't yet identified.

The first and most important step in designing a RESTful system is to identify and name your resources. How would you do that for your system?

From your description, here's one possible set of resources:

  • User - a user of the system (maybe a doctor or patient (?) - Role might need to be exposed as a resource here)
  • Medication - the stuff in the bottle, but it also might represent the kind of bottle (quantity and contents), or it might represent a particular bottle - depending on if you're a pharmacy or just a help desk.
  • Disease - the condition for which a Patient might want to take a Medication.
  • Patient - a person who might take a Medication
  • Recommendation - a Medication that might be beneficial to a Patient based on a Disease they suffer from.

Then you could look for relationships among resources:

  • User has and belongs to many Roles
  • Medication has and belongs to many Diseases
  • Disease has many Recommendations.
  • Patient has and belongs to many Medications and Diseases (poor chap)
  • Patient has many Recommendations
  • Recommendation has one Patient and has one Disease

The specifics are probably not right for your particular problem, but the idea is simple: create a network of relationships among your resources.

At this point it might be helpful to think about URI structure, although keep in mind that REST APIs must be hypertext-driven:

# view all Recommendations for the patient
GET http://server.com/patients/{patient}/recommendations

# view all Recommendations for a Medication
GET http://server.com/medications/{medication}/recommendations

# add a new Recommendation for a Patient
PUT http://server.com/patients/{patient}/recommendations

Because this is REST, you'll spend most of your time defining the media types used to transfer representations of your resources between client and server.

By exposing more resources, you can cut down on the amount of data that needs to be transferred during each request. Also notice there are no query parameters in the URIs. The server can be as stateful as it needs to be to keep track of it all, and each request can be fully self-contained.

10
votes

REST is for APIs, not (typical) applications. Don't try to wedge a fundamentally stateful interaction into a stateless model just because you read about it on Wikipedia.

To be Restful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?

The size of parameters is usually insignificant compared to the size of resources the server sends. If you're using such large parameters that they are a network burden, place them on the server once and then use them as resources.

There are no significant restrictions on URL length -- if your server has such a limit, upgrade it. It's probably years old and chock-full of security vulnerabilities anyway.

5
votes

No, all of that does not have to be in every request.

Each resource (medication, patient history, etc.) should have a canonical URI that uniquely identifies it. In some applications (e.g., Rails-based ones) this will be something like "/patients/1234" or "/drugs/5678", but the URL format is unimportant.

A client that has previously obtained the URI for a resource (such as from a search, or from a link embedded in another resource) can retrieve it using this URI.

0
votes

Are you working on a RESTful API that other apps will use to search your data? Or are you building an end-user-focused web application where users will log in and perform these searches?

If your users are logging in, then you're already stateful, as you'll have some type of session cookie to maintain the logged-in state. I would go ahead and create a session object that contains all the search filters. If a user hasn't set any filters, then this object will be empty.

Here's a great blog post about using GET vs POST. It mentions a URL length limit set by Internet Explorer of 2,048 characters, so you want to use POST for long requests.

http://carsonified.com/blog/dev/the-definitive-guide-to-get-vs-post/
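If you go this route, the client can decide per request. The sketch below is only an illustration: plan_request is a hypothetical helper, the parameter names are invented, and it simply treats the 2,048-character figure mentioned above as the cutoff.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2048  # the Internet Explorer limit mentioned above


def plan_request(base_url: str, params: dict):
    """Return (method, url, body) for a search request."""
    # Sorting keeps the query string stable for the same set of filters.
    query = urlencode(sorted(params.items()))
    url = f"{base_url}?{query}"
    if len(url) <= MAX_URL_LENGTH:
        return "GET", url, None
    # Too long for the URL: send the same encoded parameters as a form body.
    return "POST", base_url, query
```

With a handful of filters this yields a plain GET; only a request that would overflow the limit falls back to POST.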