Thursday, 2 March 2017

A REST API is more than just REST

After a bunch of years doing REST-y APIs, I thought I'd drop some thoughts here.

Firstly, there are three types of API, and only one of them is really discussed by Roy Fielding, and even that has certain issues. Possibly someone will show me the error of my ways. If so, bring it on.

The three types are: CRUD type, transactional type and query type.


CRUD type data is pretty much what REST grew from; after all, the web it describes began as a system for publishing scientific documents with hyperlinks. That abstraction informs pretty much everything. It does, however, have limits that are important.

REST overlays the document publishing idea with the constraints of massive distribution scale, in a mostly read, occasional write style. It is document oriented, so has no need of transactions. It has some generic provisions for out-of-band parameters using headers.

It is extremely successful in this niche, and can be extended to the other types with a bit of effort.

The key problem it faces, in my experience, is the wrangling of the verbs which approximate the semantics of CRUD, but given the architectural constraints, are not quite as simple as they look. This is soluble, but involves a lot of referring to documentation and some argument.

A particular problem that is less soluble is the assumption that the representation and the underlying data are equivalent, with limited side-effects. This is manageable, but means that the PUT verb, which assumes a total replacement of existing data, can't always do this; for various reasons, all of the data associated with a representation may not be available, and may not be derivable. This is usually manageable for reference data (which are like documents), but not so much for transactional data.
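To make the PUT problem concrete, here is a minimal sketch. Everything in it is made up for illustration: the record, the field names and the idea that `audit_user` is never exposed to clients are my assumptions, not any particular framework's behaviour.

```python
# Hypothetical stored record: the API exposes only some of these fields.
stored = {"id": 7, "name": "socks", "colour": "white", "audit_user": "batch-42"}

# Assumption for this sketch: audit_user is never sent to clients.
EXPOSED_FIELDS = ("id", "name", "colour")


def to_representation(record):
    """Project the stored record onto the fields the client is allowed to see."""
    return {k: record[k] for k in EXPOSED_FIELDS if k in record}


def put_replace(record, representation):
    """Strict PUT: the representation becomes the whole record."""
    return dict(representation)


def put_merge(record, representation):
    """Pragmatic PUT: replace the exposed fields, keep what the client never saw."""
    hidden = {k: v for k, v in record.items() if k not in EXPOSED_FIELDS}
    return {**hidden, **representation}


# The client GETs the representation, edits it and PUTs it back.
client_copy = to_representation(stored)
client_copy["colour"] = "blue"

strict = put_replace(stored, client_copy)
merged = put_merge(stored, client_copy)

print("audit_user" in strict)  # False: total replacement lost the hidden field
print(merged["audit_user"])    # batch-42: kept, but this is no longer a pure PUT
```

Neither version is satisfying: the strict one loses data the client could not have supplied, and the merging one quietly stops being a total replacement.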

So, use all the verbs if it is like a CRUD operation on a document, and your representation can write through to the underlying data, and you don't have too much metadata (custom headers!).

Well, maybe not PATCH, unless you are a masochist.

Action-y REST

Transactional type data is intrinsically Verb-y/action oriented. Classical REST is not designed for this, and requires some mental gymnastics.

A transaction is almost always somewhat ephemeral, and involves side-effects (stuff is supposed to happen). It is common to noun-ify it, and there is always HATEOAS, though the use of GET hyperlinks to achieve side-effects is, in my opinion, just a poor solution which confuses things. A subject for another day perhaps.

Making an action into a noun is the equivalent of sending (POST-ing!) a letter. It is message oriented. As such we can use POST to achieve it with a reasonably good semantic match. For example it is common to see an "order" endpoint, which is both verb and noun! Or we just add "message" on the end, so "dye socks" becomes "dye socks message". This is far superior to dodgy "doStuff" SOAP end-points where the intent is hidden in the message.
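As a rough sketch of the "dye socks message" idea: the class below is an in-memory stand-in, not a real HTTP server, and the path `/dye-socks-messages`, the field names and the choice of a 201 response are all my own assumptions.

```python
import itertools


class MessageResource:
    """In-memory stand-in for a message-style endpoint (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self._messages = {}
        self._ids = itertools.count(1)

    def post(self, body):
        """Accept the message and return (status, headers)."""
        message_id = next(self._ids)
        self._messages[message_id] = dict(body)  # keep the message exactly as sent
        # The real side-effect (dyeing the socks) would be triggered here.
        return 201, {"Location": f"{self.path}/{message_id}"}

    def get(self, message_id):
        """Return (status, original message); the message is never mutated."""
        if message_id not in self._messages:
            return 404, None
        return 200, dict(self._messages[message_id])


dye_socks = MessageResource("/dye-socks-messages")
status, headers = dye_socks.post({"sock_id": 12, "colour": "red"})

print(status, headers["Location"])  # 201 /dye-socks-messages/1
print(dye_socks.get(1))             # (200, {'sock_id': 12, 'colour': 'red'})
```

The intent lives in the resource name, and the body is just the letter being posted.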

Architecturally there is a problem with getting an asynchronous result: polling is either efficient or effective, but not both. This is intrinsic to the client-server basis and cannot be wished away. There is significant complexity around this, and the various solutions using place-holder links and response codes are unpleasant to use. I part ways from classical REST here. Sorry Roy.

The natural approach is to have a 2-way web. WebSockets are a means to do this, though there are some browser-based approximations like Server-Sent Events (SSE) which can be used for out-of-band events. Using any of these, events can be communicated to the client, and they can then respond, avoiding the polling loop. Polling loops lead to performance pain. Avoid wherever possible.
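The shape of the push approach can be sketched without any network at all. This is an in-process simulation only: the `asyncio.Queue` plays the role of the open socket, and the names `worker` and `client` and the event fields are invented for the example.

```python
import asyncio


async def worker(events: asyncio.Queue) -> None:
    """Stand-in for the server: does the work, then pushes one event."""
    await asyncio.sleep(0.05)  # pretend the transaction takes a while
    await events.put({"message_id": 1, "status": "done"})


async def client() -> dict:
    """Stand-in for the client: waits for the event instead of polling."""
    events: asyncio.Queue = asyncio.Queue()  # plays the role of the open socket
    task = asyncio.create_task(worker(events))
    event = await events.get()  # wakes once, when the result actually exists
    await task
    return event


result = asyncio.run(client())
print(result)  # {'message_id': 1, 'status': 'done'}
```

The client makes no repeated requests and does not have to guess a poll interval; it is simply told when there is something to know.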

PUT, DELETE, etc. are almost always inappropriate. GET is appropriate to retrieve the original message, which should remain unchanged, i.e. don't overload the document with status. That is for the response code. It may be appropriate to supply the derived values, e.g. the status and/or result links, though see above for concerns around polling. Use it only where a 2-way solution is not available, and consider using a generic event channel with a short cache timeout, correct GET version headers and an efficient HEAD implementation at the edge.

Query type REST

Report type data is generated at the time of the query, so is ephemeral, may have variant structure, and often requires rich query semantics.

The problem here is that GET really doesn't like to deal with proper queries. And by proper, I mean like SQL queries, where the output is some derivation of the underlying data. That immediately makes a PUT irrelevant in the standard meaning: there is no document to replace. POST would also seem irrelevant, unless we are seeing it as a synchronous transaction, which we will see very shortly.

GET is the obvious verb, but here is the problem: GET only has a URI. It doesn't have a usable body (the specification gives a GET body no defined semantics). This forces us to encode a query in the URI. This is highly unnatural and has significant limits, e.g. you can't provide a long list of IDs for "joining", as is very common in microservice architectures.

This brings us back to POST. POST allows us to pass anything we like as a message, and we also have the options of synchronous and asynchronous semantics. That avoids the tremendous coupling between the query and the syntax of REST. This is why SOAP used POST everywhere. SOAP has many issues, but this is not one of them.
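A quick way to see the difference is to build the same query both ways. The endpoint path, the number of IDs and the little JSON query shape below are all invented for the example; the point is only the contrast between a flattened URI and a structured body.

```python
import json
from urllib.parse import urlencode

# Hypothetical: 2,000 six-digit IDs to "join" against.
ids = list(range(100000, 102000))

# GET style: the query has to be flattened into the URI.
get_url = "/reports/orders?" + urlencode({"id": ids}, doseq=True)

# POST style: the same query travels as a structured message body.
post_body = json.dumps({"select": ["id", "total"], "where": {"id": {"in": ids}}})

print(get_url[:46])    # /reports/orders?id=100000&id=100001&id=100002&
print(len(get_url))    # tens of thousands of characters in the URI
print(len(post_body))  # similar size, but in the body, where size is routine
```

The body version also keeps its structure (a select list, a where clause) instead of relying on a naming convention for query parameters. Many servers and proxies limit URI length to a few thousand characters, so the GET version may simply be rejected.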

This doesn't stop us from creating a document that can be "GET" later, like a memoised cache of a result, or a batch run report. I am perfectly happy to use a GET style where appropriate, but in the server-to-server context, it is less useful. In the flexible data browser mode also, it is less useful.

PUT/DELETE are unlikely to be needed.

Use POST for everything?

Where REST gets confusing, go for POST. For everything else, use simple REST.

Not so hard really, and will make life a lot simpler. I know there are circumstances where this is a little difficult, but why waste cycles on "the one true way" when it is really easy to just do the easy thing? You'll know you are there when you are looking hard at headers for content negotiation on a server-side API.

So, classify your REST calls into the three buckets, ask if it is simple, or complicated, and go for the most effective approach.

Good luck!

Tuesday, 21 February 2017

Blueprints for success

Well it's been a while, in my echoey end of the internet. To business. Due to "stochastic externalities" I have some time on my hands. Start-ups. Meh.

So to make myself pretty for my next job, I'm reviewing all things cloud etc, and it strikes me, the IT world really hasn't moved on much. What I mean is, we are still crafting a lot of systems, carefully carving out individual solutions to solved problems. We are (re)inventing various abstractions, which is nice, but the libraries and tools are over-complicated, badly documented, etc etc. This is, no doubt, a function of my own lack of familiarity, but then this is the very point I am making.

For example, security of applications/APIs. This is an entirely solved problem, and indeed standard solutions are available, but they have the same problems they always did, specifically they deal in authentication and authorisation of functions, not data. Furthermore, the obvious approach, which is a nested authentication gateway at every service boundary (as distinct from just the edge of the entire system), is not implemented in a simple or obvious way. Given federation, and the vicious complexity and risks around security, I would expect a standard black-box, OWASP/PCI/blah compliant, certified good architecture to be available.

Maybe it is, and I've missed it.

So, I'm going to start by looking at AWS Cognito, and trying to extend it to the "micro-services" world, particularly the idea that applications need to be self-sufficient to some degree rather than relying purely on boundary security. Maybe later I'll broaden this to data handling in micro-services generally (TL;DR: it is not well described).

Then there are message-oriented approaches and why they rule, and therefore, why HTTP does not. Even Netflix is getting this now. Which brings us to SOAP versus JSON, and why, after 15 years or so, I'm looking for something better than either.

Blog entries should be short and often. Any tips from the wider internet world are welcome.

Oh, and anyone wanting to give me a job, get in touch, I'm available!

Wednesday, 5 March 2014

Dumb clients are dumb, intelligent networks are dumber

Best film ever!
So you need process A to talk to process B, but you want asynchronicity, reliability, decoupling blah blah, and you go to the interweb, and they say "Message Queue!" a million times.

Now since SOA was just a marketing term for EAI back in the bad old days of the late '90s and early '00s, this was valid. Clients really were dumb, because they'd never had to talk to anyone else before, and they weren't about to change now.