Tuesday, October 25, 2011

Serving-up GeoJSON while having a REST on the GeoCouch

Wow, it's been a while.  Between having my tonsils removed (which is not pleasant as an adult), finishing up with my employer of 11 years and starting a new job, things have been rather hectic over the last few months.  Still, I have found time to put together the third post on the GeoJSON adventure that I have been dabbling with for the last 6 months or so.


I set out on this GeoJSON journey to investigate the definition as a spatial format specification, and to see what I could get for free in terms of interoperability, and I have been pleasantly surprised by the number of applications that support the format.


In my first post on Tile5 rendering GeoJSON I was interested mainly in the map rendering side of things and what I needed to do to massage my data into GeoJSON to take advantage of that rendering.


In my second post on Leaflet rendering GeoJSON passed from ExpressJS/NodeJS I delved more into the interoperability of GeoJSON between Leaflet on the client, and the javascript application server serving data from PostGIS, which was able to return query results in GeoJSON format.


In this third post I have taken the next step: storing my GeoJSON data in a NoSQL database, CouchDB, and exposing it directly from the database to the client.


What is CouchDB?


CouchDB is a document database which, unlike a relational database, has no schema, allowing you to store structured objects against a key.


CouchDB provides a RESTful api exposing data in JSON, which made me wonder whether I could use CouchDB to store GeoJSON structured objects and execute queries directly from the client, without any "middle-man" application server logic.


The interesting thing about queries in CouchDB is that records are not in a table as such.  Instead you define views, each of which uses a MapReduce function to build an index of documents.  A view can then be filtered by the key values that its function emitted.
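
To make the idea of a view concrete, here is a minimal sketch of the kind of map function CouchDB stores in a design document.  The locality property and the sample documents are hypothetical and are not taken from the cadastral data, and the small emit stand-in exists only so that the sketch runs outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs outside the database.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function as it might appear in a design document's "views" property.
// "locality" is a hypothetical property name used only for illustration.
const mapByLocality = function (doc) {
  if (doc.properties && doc.properties.locality) {
    emit(doc.properties.locality, doc._id);
  }
};

// Hypothetical documents; the second has no locality, so it is not indexed.
const docs = [
  { _id: 'a', properties: { locality: 'Toowoomba' } },
  { _id: 'b', properties: {} },
  { _id: 'c', properties: { locality: 'Brisbane' } }
];
docs.forEach(function (doc) { mapByLocality(doc); });

// CouchDB keeps view rows sorted by key; mimic that here.
rows.sort(function (x, y) { return x.key < y.key ? -1 : x.key > y.key ? 1 : 0; });
console.log(rows);
```

Only documents for which the function calls emit end up in the index, which is why a view doubles as a filter.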


In the case of my investigations, I wanted to use the same data set that I have used previously, being the cadastral layer for the state of Queensland, which represents 2.5 million land parcels over an area spanning 2200km x 1700km.  The ability to index and select small extents of records is therefore very important, and a traditional index is not sophisticated enough for making spatial queries.


What is GeoCouch?


As I investigated CouchDB further I found that there is an extension to CouchDB called GeoCouch created by Volker Mische.  The purpose of GeoCouch is to provide spatial views in CouchDB, utilising a spatial index structured as an R-Tree.


This interested me a lot because it provided me with a solution for querying my spatial data.


My Application


As a recap, for those that have not read my first or second post on GeoJSON, the objective of my application is to provide a simple mapping application with a base map layer, overlaid with cadastral data that is selected by the extent of the map currently being viewed.


To achieve this objective I wanted to use CouchDB to store my data in GeoJSON format, and serve it directly to my Leaflet based client application without having to use an application server to arrange the data retrieved from the database, as per the solution in my second post.


Setting Up CouchDB


The easiest way for me to install an instance of CouchDB with GeoCouch already built-in was to install Couchbase.  Couchbase comes in a number of flavours, from the ultra-scalable Membase Server right down to a mobile device capable version.  In my case I installed the Couchbase Single Server.  I downloaded and installed the 1.1.2 deb package using:


sudo dpkg -i couchbase-single-server-community_x86_1.1.2.deb


In my case the service did not start after install, so I started it at the command line using:


sudo /etc/init.d/couchbase-server start


You can verify that Couchbase is installed and executing by opening the url http://127.0.0.1:5984 in a browser (on the server machine), which will respond with a welcome message and the version number in a JSON response.


Kicking back on the Futon


The Couchbase Single Server comes with an application called Futon which is used for managing it.  Futon can be opened in a browser window with the url http://127.0.0.1:5984/_utils.


The opening screen shows the overview of the server, with a listing of the databases on the server.  Initially there will only be one database - _users.


You can create your database by clicking the Create Database link.  My database was called spatial.


Futon can be used for a number of maintenance functions, including creating Views, which I will discuss more of further on.


Loading the Data


Obviously, after creating my database the next step was to load my data into Couchbase.  On the same machine I had the PostGIS database that contained my cadastral dataset, so I wrote a simple application in NodeJS to load the data from PostGIS into Couchbase.


Why NodeJS?  Well I wanted to have a look at the javascript frameworks that could access CouchDB, and I could reuse some of the code I used to perform queries on the PostGIS database for my second post on GeoJSON.


The library I used to access the CouchDB database was Cradle.  Cradle provides basic database access for reading and saving documents, as well as a number of administrative and maintenance functions.


My data loading application steps through the PostGIS data 1000 records at a time, and saves each block of 1000 documents to the CouchDB database.  Since NodeJS is asynchronous, the program is made to operate sequentially by calling the main loadPostgisRecords function recursively from the deepest closure, after Cradle has saved the block of 1000 documents.
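
The control flow described above can be sketched as follows.  This is not the original loader: fetchBatch and saveBatch are hypothetical placeholders, where in a real loader fetchBatch would wrap a paged PostGIS query and saveBatch would wrap a bulk save through Cradle.  Here they are faked in memory so the sketch runs on its own.

```javascript
// Load records in sequential batches using callback-style asynchronous functions.
// fetchBatch(offset, limit, cb) and saveBatch(docs, cb) are hypothetical helpers.
function loadPostgisRecords(offset, batchSize, fetchBatch, saveBatch, done) {
  fetchBatch(offset, batchSize, function (fetchErr, docs) {
    if (fetchErr) { return done(fetchErr); }
    if (docs.length === 0) { return done(null, offset); } // nothing left to load
    saveBatch(docs, function (saveErr) {
      if (saveErr) { return done(saveErr); }
      // Recurse from the innermost callback so the next batch only starts
      // once the previous one has been saved.
      loadPostgisRecords(offset + docs.length, batchSize, fetchBatch, saveBatch, done);
    });
  });
}

// In-memory stand-ins for the database calls (assumptions for the demo only).
const source = [];
for (let i = 0; i < 2500; i++) { source.push({ _id: String(i) }); }
const savedBatchSizes = [];
let totalLoaded = null;

loadPostgisRecords(
  0,
  1000,
  function (offset, limit, cb) { cb(null, source.slice(offset, offset + limit)); },
  function (docs, cb) { savedBatchSizes.push(docs.length); cb(null); },
  function (err, total) { totalLoaded = err ? null : total; }
);
console.log(savedBatchSizes, totalLoaded);
```

Recursing from the save callback keeps only one batch in flight at a time, which trades throughput for a predictable memory footprint.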



It took around 30 minutes for my 2.5 million records to load, which isn't too bad considering it took about 20 minutes in PostGIS.  My PostGIS table is approximately 950MB in size, while my Couchbase database, with just the data, is 3.7GB, so there is a fairly large difference there.


Creating the Spatial View


Once the data is loaded in Couchbase, the next step is to index the spatial features.  To do this we create a spatial view.  I hit a few snags with this step due to the documentation not being really obvious, and because I was using Futon to create the view, and probably because it was a little new to me generally, but the steps below show how to create a spatial view using Futon.




In Futon, click the name of the database in the table list on the overview page.  The resulting screen will by default show the first 10 documents in the database.



 In the top-right part of the screen, expand the View drop down and select Temporary view....  This will display the View designer page, showing a pane for defining the Map and Reduce functions.




Delete the default function defined in the Map pane, and click the Save As... button.  Choose a name for the design document that the view will be saved into, and a name for the view.  When the Save button is clicked, an error will be displayed due to the missing Map function.  This was done for a reason which I will discuss soon.  Regardless of the error, the design document will be created, with the view.






Expand the View drop down again and select Design documents.  This will display the list of design documents in the database.




Choose the document that was just created.  The resulting screen will show the structure of the document, including a property called views, which will show a structured item of the name defined in the save screen for the view.




The GeoCouch documentation specifies that a spatial view does not live inside a views property, but inside a property called spatial.  So, rename the views property to spatial.


The spatial view keeps the name that was entered in the save screen, but its structure differs from a normal view.  A normal view is an object with a child property called map, which holds the function.  A spatial view has no map child: the view name itself is the property that holds the function.  In my example, the saved view is called state_1_sp and has a property called map with a blank string value, which is the function that was not saved.  This should instead be a property called state_1_sp whose string value is the spatial view function.




The spatial view must emit a GeoJSON geometry as the key, which in the case of my data is the geometry property, because each of my documents is a GeoJSON Feature.  I decided to just return the id as the value.  The result from the index is a list of rows, each with a property called geometry, which in essence is the same as a GeoJSON Feature.  All that I would need to do is add a type property with a string value of Feature to each row for Leaflet to render the data as a FeatureCollection.
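
As a rough sketch, the finished design document could look like the object built here.  The design document name main is a hypothetical choice, and state_1_sp is the view name from my example.  The emit function is supplied by GeoCouch when the function runs inside the database, so it is deliberately not defined in this block.

```javascript
// Spatial function: emit the GeoJSON geometry as the key and the document id
// as the value. emit() is provided by GeoCouch inside the database.
const spatialFn = function (doc) {
  if (doc.geometry) {
    emit(doc.geometry, doc._id);
  }
};

// Design document with a "spatial" property instead of "views".
// "main" is a hypothetical design document name.
const designDoc = {
  _id: '_design/main',
  spatial: {
    // The view name maps straight to the function source, with no "map" child.
    state_1_sp: spatialFn.toString()
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```

Storing the function as a string is what Futon shows when the document is edited by hand; building it from a real function simply avoids escaping quotes manually.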





When the spatial view has been defined in the document, save the document.  Nothing will happen in Futon at this point because it is not aware of spatial views.  To start generating the spatial view, open another browser window and request the following url:


http://127.0.0.1:5984/<database>/_design/<design_document>/_spatial/<spatial_view>


This will cause GeoCouch to create the spatial view before trying to return the results.  NOTE: the spatial view for my dataset took around 6 hours to create, and took up 85GB in space, which is significantly larger and took longer to create than the index in PostGIS which is about 200MB.


I recommended above saving the initial view without a function.  The reason is that I originally misunderstood the documentation and ended up creating a normal view with a prefix of "spatial", which resulted in Futon creating a typical View with the Map function.  That took 35 hours to complete and took up approx 130GB.  By removing the Map function before saving, I was able to get around Futon attempting to create the View.


Once the Spatial View is created the features within a particular extent can be retrieved using the bbox query string argument appended to the url above - http://127.0.0.1:5984/<database>/_design/<design_document>/_spatial/<spatial_view>?bbox=<lower_left_long>,<lower_left_lat>,<upper_right_long>,<upper_right_lat>
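
A small helper for assembling that url might look like the sketch below.  The function name, the shape of the bounds object and the sample coordinates are assumptions for illustration, as is the design document name main.

```javascript
// Build a GeoCouch bbox query url from its parts.
// bounds is a plain object: { west, south, east, north } in degrees.
function spatialQueryUrl(baseUrl, database, designDoc, view, bounds) {
  // GeoCouch expects lower-left long/lat followed by upper-right long/lat.
  const bbox = [bounds.west, bounds.south, bounds.east, bounds.north].join(',');
  return baseUrl +
    '/' + encodeURIComponent(database) +
    '/_design/' + encodeURIComponent(designDoc) +
    '/_spatial/' + encodeURIComponent(view) +
    '?bbox=' + bbox;
}

// Hypothetical extent, roughly around Brisbane.
const url = spatialQueryUrl(
  'http://127.0.0.1:5984', 'spatial', 'main', 'state_1_sp',
  { west: 152.9, south: -27.6, east: 153.1, north: -27.4 }
);
console.log(url);
```

In a Leaflet client the bounds could be filled from map.getBounds(), using the lng and lat of getSouthWest() and getNorthEast().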


Calling Couchbase from the Leaflet Application


My solution for this example is almost identical to the second post I did on GeoJSON where Leaflet is used to render GeoJSON retrieved from the server as the map is panned.  The difference in this case is that I am calling the REST interface for GeoCouch to retrieve the required spatial data as GeoJSON using the url defined above.



The results of a GeoCouch bbox query almost match the structure of a FeatureCollection as defined in the GeoJSON specification, except that there are a few other properties on the root object and on the row item objects.  The rows property of the root object can be seen as synonymous with the features property of a FeatureCollection.  Each item in the rows array needs a "type" property with the value "Feature"; its geometry property is already exactly the same as the geometry property of a Feature.  With these simple alterations, the structure can be passed to Leaflet to render the shapes.
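
The alteration can be sketched as a small conversion function.  The sample result is hypothetical and only assumes that each row carries id, geometry and value properties.  Copying the emitted value into properties is my own choice here, made so that each Feature has the properties member the GeoJSON specification expects.

```javascript
// Convert a GeoCouch spatial view result into a GeoJSON FeatureCollection.
function toFeatureCollection(result) {
  return {
    type: 'FeatureCollection',
    features: (result.rows || []).map(function (row) {
      return {
        type: 'Feature',
        id: row.id,
        geometry: row.geometry,
        // Keep whatever the spatial view emitted as the value.
        properties: { value: row.value }
      };
    })
  };
}

// Hypothetical result with a single row.
const sampleResult = {
  rows: [
    {
      id: 'parcel-1',
      geometry: { type: 'Point', coordinates: [153.02, -27.47] },
      value: 'parcel-1'
    }
  ]
};
const collection = toFeatureCollection(sampleResult);
console.log(JSON.stringify(collection));
```

Building new objects rather than mutating the rows leaves the extra GeoCouch properties behind, so only valid GeoJSON reaches the map layer.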



The problem I did strike with this application is that the Couchbase instance is considered a different domain to the web server hosting my Leaflet based html page.  Because of this, calls to Couchbase would succeed but pass back no data.  This was due to the cross domain scripting restrictions (the same-origin policy) in the browser.


To resolve this I switched the jQuery ajax call to use JSONP, but I still had problems with the request.  It took a little digging, but I found that Couchbase does not allow JSONP requests by default.  This needs to be switched on by clicking on Configuration, scrolling to the httpd section, and changing allow_jsonp to true.
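
For reference, the JSONP variant of the call can be sketched as a settings object for jQuery's ajax function.  The helper name is hypothetical, and the sketch assumes jQuery's default behaviour of adding a callback query parameter for JSONP requests.

```javascript
// Build jQuery ajax settings for a JSONP request to the spatial view.
function jsonpSettings(url, onData) {
  return {
    url: url,
    dataType: 'jsonp', // jQuery appends a callback parameter for JSONP
    success: onData
  };
}

// Hypothetical url; in the browser this object would be passed to $.ajax(...).
const received = [];
const settings = jsonpSettings(
  'http://127.0.0.1:5984/spatial/_design/main/_spatial/state_1_sp?bbox=152.9,-27.6,153.1,-27.4',
  function (result) { received.push(result); }
);
console.log(settings.dataType);
```

Keeping the settings in a plain function makes it easy to switch back to an ordinary JSON request if the page and the database are later served from the same domain.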




Once that change was saved, the application worked as desired, although the performance from Couchbase is a lot slower compared with the previous solution using ExpressJS/NodeJS and PostGIS, taking around 8 seconds in Couchbase to perform operations that would take 0.5 of a second in PostGIS.


Conclusion


Looking back at the evolution of my investigations into GeoJSON it has been interesting to see the depth of interoperability I was able to achieve.  When I first began looking I had no idea that I would be able to store my geometry, as an object in a database, and return it as data straight to the browser for rendering.


While the size/time to create spatial indexes in CouchDB is much larger/longer than PostGIS, I think it is a platform that will improve over time.


One of the things to remember is that my dataset is a large contiguous layer of data that spans a very large area and possibly isn't really suited for high performing visual rendering of dynamically retrieved data.


CouchDB has other benefits, such as the distributed architecture that allows it to scale out.  Couchbase also has a mobile solution, which when combined with the master-master replication scheme could enable some compelling mobile solutions.


It would be interesting to investigate Couchbase using a Membase server spanning multiple machines, and see if the spatial indexing improves with the parallel index lookup.

4 comments:

  1. Hi Todd,

    thank you so much for this blog post. It is such a great tutorial on how to get started with GeoCouch, when all you have is some data in PostGIS.

    I also like that you mention the limitations, such as the huge size of the indexes. It's true it's an area where GeoCouch needs to and will improve in the future.

    Cheers,
    Volker

  2. trying to create spatial view and i get this error:

    {"error":"{{badmatch,{ok,4802315}},\n [{vtree_bulk,omt_write_tree,4},\n {vtree_bulk,omt_write_tree,4},\n {vtree_bulk,omt_write_tree,2},\n {vtree_bulk,bulk_load,4},\n {vtree,add_remove,5},\n {couch_spatial_updater,'-write_changes/4-fun-2-',5},\n {lists,zipwith,3},\n {couch_spatial_updater,write_changes,4}]}","reason":"{gen_server,call,[<0.17996.4>,{request_group,596919},infinity]}"}

    ---

    i meticulously reviewed the steps and content.

    here's the value of state_1_sp:
    { "state_1_sp": "function(doc) { if (doc.geometry) { emit(doc.geometry, { \"id\": doc._id }); } } " }

    and my url:
    http://127.0.0.1:5984/parcels/_design/main/_spatial/state_1_sp

    thanks for your blog posts - i've been learning a lot!

  3. Hi, its a really helpful blog and got it how to get started with GeoCouch, have u done anything further... can u post some other blogs and post on GeoCouch please...

  4. TopoJSON can help relieve some of the issues with file sizes and make the stack more efficient overall. As TopoJSON and GeoCouch evolve, this might be a very powerful solution that is preferred over PostGIS for many scenarios.
