Friday, September 30, 2011

Access Control for Your RESTful API

Problem

When your RESTful API gains popularity and different types of client applications (browsers, mobile kits, desktop UIs, other programs) start consuming it, it becomes apparent that they are all using your interface differently. Moreover, they all should have different set of resources available for them. Some applications only need to read couple of resources while others need read/write access to most of your resources. At that point, coming up with a flexible access control scheme becomes crucial. You could try to identify the application that is consuming your interface and based on that only allow access to certain resources, but that would be too easy to break. The only secure way of doing the access control, is to check the authorization of each request. This means tying the access control to the logged in user. However, even then there are usually different types of users (groups with different permissions) accessing the interface.

Let's take an online book store as an example. The API supports three different user groups:

  • Administrator has access to every bit of information supported by the API 
  • Merchant may add and remove books from the store
  • Buyer can place orders, cancel them and modify her account details
We also have to deal with more fine grained access control situations when, for example clients fetch a list of orders from the system. What should be requested and what should be returned? They both can do GET request on /order/ resource but most likely the result will look different for Merchant and Buyer. So, even if the request looks the same, Merchant will see all orders made to her shop whereas User will only see her own orders. Granted, we could alleviate this by providing different URLs for each list (e.g. /shop/123/order/ and /user/jedi/order/) but usually you end up in a situation where you need to filter the results based on user's authorization anyway.

Even more complicated access control rules are required when you actually need to filter out some of the properties from representations. An example of this might be when a client requests user information (GET on /user/jedi/) and some of that information -- such as SSN or credit card number -- should be hidden from Merchant but visible to the User.

Solution

What I've come up with so far, is what I called 4-tier filtering. This access control scheme has the following tiers:
  1. Filter resources -- possibility to hide resources from a user
  2. Filter methods -- possibility to define which methods (post/get/put/delete) are allowed for a user
  3. Filter resultset -- possibility to filter the result of a collection type* resource
  4. Filter properties -- possibility to leave out some of the properties (of the representation)
* by collection type resource I mean resources that return list of elements (e.g. /order/).

Implementation

Not many REST-frameworks take special interest in fine grained and flexible configuration of access control (please, post links in comments if you can suggest some). Of the 4 tiers defined above, the first two are fairly easy to implement generically while tiers 3 and 4 are much more difficult. What I've been using for the first two tiers so far is a Django middleware -- codenamed Bulldog -- that I implemented. I should mention that there's nothing Python or Django specific in this approach; it should be applicable in any environment (I'm actually in the process of doing the same thing in node.js).

Bulldog combines the first and second tiers of the access control in a way we have come to know from relational databases. This requires having users, groups and permissions. The implementation utilizes django's built-in support for these entities. Bulldog defines four permissions (CRUD) for each resource that the given interface supports. These permissions are then assigned to users and groups. A user automatically receives all permissions from the group she belongs to. So, in the permission table we have permissions like (these are automatically populated by the middleware):

resource_order_post
resource_order_get
resource_order_put
resource_order_delete
resource_user_post
....

The format being resource_<resource name>_<method>. The resource_ prefix is used for making a distinction between permissions dealing with access control tiers one and two and all other permissions. User may have any combination of these resources. That is, she may be able to GET order, but not UPDATE it and she may be able to DELETE a user but not POST (create) it.

By default Bulldog denies all access to any resource. Only if the user has been granted with any of the resource_* permissions, she is allowed to access those resources. This being a django middleware the incoming HTTP request never sees the REST framework let alone the API implementation if it doesn't have proper authorization.

Next

I haven't yet figured out what's the best way to implement tiers 3 and 4 generically but obviously I'll need a different set of permissions and the implementation will probably need to grant everything by default (as opposed to denying everything).

In theory, tier 3 could also be implemented as a middleware by filtering the resultset before returning it to the client. In practice however, this would be terrible waste of CPU and memory because client would potentially end up doing a full table scan into memory and then filtering out most of the records. More likely, access control for tier 3 will have to be presented using a specification pattern and that specification instance is passed along with the request. Later on, a repository or another source of data can use the specification to filter the data.

Now, in order to implement tier 4, the REST framework should provide means to define representations using for example, JSON schema or similar. In that definition, I could add annotations (required permissions) for the properties that are subject to access control.

I'll give this some more thought and write a followup with (hopefully) some code...

Wednesday, September 21, 2011

Oracle instant client and cx_Oracle on OS X Lion

In case you are doing Python development on a Mac and connecting to an Oracle database, there's a good chance that you've already run into the segfault (Segmentation fault: 11) screen. First, I thought it was something to do with the cx_Oracle that I had just updated but it turns out that the 64-bit version of the Oracle instant client is busted on the OS X Lion platform.

Only way around this is to use the 32-bit version of the instant client instead. The way you do this is that you download and install the 32-bit instant client (basic-lite and sdk) from Oracle, use Python in 32 bit mode and install cx_Oracle.

  1. instant client comes with pretty decent installation instructions, so just follow them (set three env vars and create the symlink)
  2. to run python in 32 bit mode you have two options (this is all explained in Python's man page):
    1. % defaults write com.apple.versioner.python Prefer-32-Bit -bool yes
    2. % export VERSIONER_PYTHON_PREFER_32_BIT=yes
  3. remove the old cx_Oracle by simply removing the .egg under /Library/Python/2.7/site-packages/. So, for example: % sudo rm /Library/Python/2.7/site-packages/cx_Oracle-5.1-py2.7-macosx-10.7-intel.egg
  4. lastly say: sudo -E easy_install cx_Oracle

These instructions assume you're installing cx_Oracle globally to your system, hence sudo. I actually tried going through this process in a virtualenv but it didn't work. However, I only tried once so maybe I missed something.. and I had to do a global install anyway so I haven't bothered with the virtualenv for now. Will try that again later.

(In case you run into problems, you might want to reboot after the installation because I think OS X leaves some libraries into memory and when you switch back and forth with different versions of libraries, you may actually end up using a different library than what you think.)