The JDB Library
Over the years, I've written a lot of code, but no piece of code has kept my interest like the B-tree implementation I first wrote for a college file structures course. At Oklahoma State University, building a working B-tree was a bit of a rite of passage for all Computer Science students. Sad as it may be, few were able to accomplish the feat. But me? I was hooked. As I recall, I coded solid for a couple of days to build the initial B-tree without delete support and spent another day implementing delete. As it turned out, I messed up one rotation in delete, but that only left a page in an underflow condition; the keys in the tree were never out of order.
During the initial R&D phase for Shepherd, while trying to build a data store for the directory, I experimented with Sybase SQL Anywhere, DB2, and even OpenLDAP. Unfortunately, none of them measured up to the combination of speed, ease of installation, and portability I wanted. Sybase was easy to install and came with royalty-free runtime licenses, but for directory data, it was slow. DB2, speed demon that it is, was just too expensive and too painful to install in this situation. OpenLDAP worked okay, but it forced Shepherd users into running some kind of Unix box since it wasn't portable to OS/2. At the time, the primary target platform was OS/2, so OpenLDAP wasn't an option. With no other viable alternatives, I set out to write my own data store.
Creative in naming as always, I called it the JDB library. The initial version drew on what I liked about Sybase SQL Anywhere and what I disliked about xBase databases. All data was stored in a single file. The goal with this approach was simplicity. Having supported xBase databases in past projects, I knew that getting someone to zip up a set of 30 .dbf and .cdx files was a serious pain. Even though Shepherd users would be more technically adept, it still would be easier to ask for just one file. While this early version of JDB worked, it suffered from its lack of support for transactions and logging. Power outages and software crashes frequently destroyed the database, requiring a rebuild from an LDIF import file.
So, a major rewrite took place to add transactions and logging and clean up the code. The end result was a faster JDB library with a more professional disposition. Over the past few years, I've been slowly debugging the code so that I can talk about uptime in months instead of weeks. Just last month, with the help of Mac OS X and its development tools, I found two bugs that could corrupt portions of the B+Trees within the data file. Now, I am cautiously optimistic that our uptime will soon be measured in years.
How does JDB work, you ask? Glad you did. It may not be as elegant as some would like, but it gets the job done.
JDB does not work with a row or column metaphor. Instead, it works on data as a whole. As such, it might be more appropriately called an object database. Instead of rows, it includes an interface with a simple set of read/write functions not unlike Java's Serializable interface. Applications wishing to use JDB as a data store implement these two methods for any class they wish to store in JDB and then use the API to create the database, create tables, store data, commit, rollback, etc. There is no SQL interface, but one could easily be added by creating a "row" data type with the appropriate supporting SQL language tools.
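To make the read/write idea concrete, here is a minimal sketch in Python. None of it is JDB's actual API: the Storable interface, the Person class, the ToyStore class, and every method name are hypothetical and exist only for illustration. The store is a plain in-memory dictionary with no B+Tree, no log, and no data file, so it shows only the shape of the programming model (implement two methods, then create a table, store, commit, or roll back), not how the real engine works.

```python
import io
import struct
from abc import ABC, abstractmethod


class Storable(ABC):
    """Hypothetical two-method interface: one method writes, one reads."""

    @abstractmethod
    def write(self, out: io.BytesIO) -> None:
        """Serialize this object to the stream."""

    @classmethod
    @abstractmethod
    def read(cls, inp: io.BytesIO) -> "Storable":
        """Rebuild an object from the stream."""


class Person(Storable):
    """Example application class that opts in to storage."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def write(self, out: io.BytesIO) -> None:
        raw = self.name.encode("utf-8")
        out.write(struct.pack(">I", len(raw)))  # length prefix for the name
        out.write(raw)
        out.write(struct.pack(">I", self.age))

    @classmethod
    def read(cls, inp: io.BytesIO) -> "Person":
        (length,) = struct.unpack(">I", inp.read(4))
        name = inp.read(length).decode("utf-8")
        (age,) = struct.unpack(">I", inp.read(4))
        return cls(name, age)


class ToyStore:
    """In-memory stand-in for a transactional object store (assumption)."""

    def __init__(self) -> None:
        self._committed = {}  # table name -> {key: serialized bytes}
        self._pending = {}    # (table, key) -> serialized bytes, uncommitted

    def create_table(self, table: str) -> None:
        self._committed.setdefault(table, {})

    def put(self, table: str, key: str, obj: Storable) -> None:
        if table not in self._committed:
            raise KeyError(f"no such table: {table}")
        buf = io.BytesIO()
        obj.write(buf)  # the store only ever sees bytes
        self._pending[(table, key)] = buf.getvalue()

    def get(self, table: str, key: str, cls):
        # Uncommitted writes are visible to the writer until rollback.
        raw = self._pending.get((table, key))
        if raw is None:
            raw = self._committed[table].get(key)
        return None if raw is None else cls.read(io.BytesIO(raw))

    def commit(self) -> None:
        for (table, key), raw in self._pending.items():
            self._committed[table][key] = raw
        self._pending.clear()

    def rollback(self) -> None:
        self._pending.clear()


if __name__ == "__main__":
    store = ToyStore()
    store.create_table("people")
    store.put("people", "jk", Person("Jason", 30))
    store.rollback()  # discards the uncommitted write
    print(store.get("people", "jk", Person))  # None
    store.put("people", "jk", Person("Jason", 30))
    store.commit()
    print(store.get("people", "jk", Person).name)  # Jason
```

The point of the sketch is that the store never needs to know about fields or columns. It hands each object a stream and keeps whatever bytes come back, which is why a "row" type could be layered on top later without changing the storage layer.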
So here I sit with a transactional storage engine with commit, rollback, and crash recovery capabilities. The question I keep asking myself is what should I do with it, if anything? Should I implement a MySQL storage engine and go up against InnoDB? Given the recent uncertainty, it doesn't seem like a bad idea. Should I open source it as a general library and see if anyone has any interest in it? Unfortunately, either of those options carries with it the weight of time required to comb through source for release and potentially remove lower-level libraries we don't wish to open. It's not a competitive or licensing issue in those cases but more of a code quality issue. Some of our core library dates back to my first few years of programming. As such, the design of the code is very poor and would reflect badly on the company if open sourced.
If you have any ideas, please let us know.
Posted at 12:54AM Dec 14, 2006 by Jason Koeninger in General | Comments[1]
Posted by Chad Koinm on January 09, 2007 at 03:43 PM CST #