Using Google App Engine With Amazon Web Services (Idle Words)

08.13.2008

(My non-technical readers should pull the ripcord here)

Sometimes it can be handy to duct-tape Google App Engine (GAE) to other web tools, such as Amazon's S3 (storage) or SQS (message queue) services.

For example, I have been building a little search engine that can import and index a complete list of bookmarks from a del.icio.us account. Depending on how many bookmarks the account contains, this import can take a few dozen seconds.

Since GAE doesn't allow you to run background tasks, the browser will hang while this import runs. If it takes too much time (which can happen for large collections of bookmarks), the import risks being killed by the GAE hosting environment. But even if it finishes before being killed, the user is stuck looking at what appears to be an unresponsive browser page for however long it takes to complete.

To avoid this problem, I have rigged the upload form handler in my app to store the user's uploaded bookmarks file in S3 and then put a message on an SQS queue. A faraway worker process (living in a cloud on an EC2 server) polls this queue, dutifully retrieves the file, does its indexing magic and then uploads the bookmarks into the user's account using GAE's bulk loading API. While this is happening in the background, the user can continue to interact with the web application as usual. After a few seconds, imported bookmarks begin to appear in the user's account, and within a few minutes the account is up to date.
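The shape of that hand-off can be sketched in a few lines. This is only an illustration of the pattern, not the code from my app: a dict stands in for S3, an in-process `queue.Queue` stands in for SQS, another dict stands in for the user's account, and every name here (`handle_upload`, `worker_step`, `blob_store`, and so on) is made up for the example. It is written for Python 3.

```python
import json
import queue
import uuid

# Hypothetical in-memory stand-ins, so the sketch runs without any AWS or GAE account.
blob_store = {}            # plays the role of S3: key -> uploaded bytes
job_queue = queue.Queue()  # plays the role of the SQS queue
accounts = {}              # plays the role of the app's datastore: user -> bookmarks


def handle_upload(user, file_bytes):
    """Form handler: stash the file, enqueue a small message, return right away."""
    key = "imports/%s/%s" % (user, uuid.uuid4().hex)
    blob_store[key] = file_bytes
    job_queue.put(json.dumps({"user": user, "key": key}))
    return key


def worker_step():
    """One polling pass of the worker. Returns False if there was nothing to do."""
    try:
        message = job_queue.get_nowait()
    except queue.Empty:
        return False
    job = json.loads(message)
    data = blob_store.pop(job["key"])  # retrieve (and clean up) the uploaded file
    # The "indexing magic" is reduced here to one bookmark per non-blank line.
    bookmarks = [line for line in data.decode("utf-8").splitlines() if line.strip()]
    accounts.setdefault(job["user"], []).extend(bookmarks)
    return True
```

The point of the split is that `handle_upload` does only cheap work, so the request returns quickly, while `worker_step` can take as long as it likes on a machine that has no request deadline.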

This workers + queue setup is a very common way of handling asynchronous tasks in web apps, but setting up communication between GAE and Amazon web services can be tricky due to security restrictions in Google's Python runtime. In particular, any Python module that wraps a socket, including urllib, is disallowed. GAE instead requires that you use its custom URL loader, the urlfetch API. This means that the standard SQS and S3 Python modules provided by Amazon won't work without some modifications.
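To give a feel for what such a modification involves, here is a minimal sketch of a fetch helper that routes requests through GAE's `urlfetch.fetch` rather than urllib. The helper name `fetch_url` and its `fetch` parameter are assumptions made for this example (the parameter exists only so the function can be exercised outside GAE with a stand-in); it is not taken from Amazon's modules.

```python
def fetch_url(url, method="GET", body=None, headers=None, fetch=None):
    """Issue an HTTP request via GAE's URL fetcher and return (status, content).

    `fetch` is a hypothetical injection point: inside GAE it defaults to
    google.appengine.api.urlfetch.fetch; elsewhere a stand-in can be passed.
    """
    if fetch is None:
        # Imported lazily because this package only exists in the GAE runtime.
        from google.appengine.api import urlfetch
        fetch = urlfetch.fetch
    response = fetch(url, payload=body, method=method, headers=headers or {})
    return response.status_code, response.content
```

A library that funnels all of its HTTP traffic through one function like this needs only that function changed to run on GAE.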

I've put together versions of both modules that are usable from within GAE. The module for talking to S3 is a simple patch of Amazon's boilerplate module to use GAE's URL fetcher instead of urllib. The SQSUrlBuilder module is a factory for generating properly signed queue-manipulation URLs.
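The general idea behind a signed queue-manipulation URL can be sketched as follows. This is not the SQSUrlBuilder module itself. It is a small illustration in the style of AWS Signature Version 2 query signing (sorted, percent-encoded parameters signed with HMAC-SHA256), written for Python 3. The function name `sign_query_url`, the example host and the credentials are all hypothetical, and AWS has since moved to Signature Version 4, so treat it as a sketch of the technique rather than a working client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def _encode(value):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(str(value), safe="-_.~")


def sign_query_url(queue_url, params, access_key, secret_key, timestamp):
    """Return queue_url plus a signed query string (Signature Version 2 style)."""
    parts = urlsplit(queue_url)
    path = parts.path or "/"
    all_params = dict(params)
    all_params.update({
        "AWSAccessKeyId": access_key,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
        "Timestamp": timestamp,
    })
    # Canonical query string: parameters sorted by name, then encoded.
    canonical = "&".join(
        "%s=%s" % (_encode(name), _encode(value))
        for name, value in sorted(all_params.items())
    )
    # The signature covers the verb, host, path and canonical query string.
    string_to_sign = "\n".join(["GET", parts.netloc.lower(), path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "%s://%s%s?%s&Signature=%s" % (
        parts.scheme, parts.netloc, path, canonical, _encode(signature))
```

Because the result is just a URL string, it can be handed to whatever HTTP fetcher the hosting environment allows, which is what makes this approach a good fit for GAE.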

Your Host

Maciej Cegłowski
maciej @ ceglowski.com

Threat

Please ask permission before reprinting full-text posts or I will crush you.