Mobile Sync Musings

Due to the proliferation of smartphone platforms like iOS and Android, Mobile Sync(hronization) is a topic that is coming up more and more.  Though the nomenclature is different, mobile sync is basically just a specialization of data replication.  I tend to think of replication as being always connected, like two database servers on the Internet.  When I think of Mobile Sync, however, I think of a disconnected scenario between a mobile client and a server.  Further, the mobile client usually syncs only a subset of the server data keyed by the user's identity.

 

Over the years, I have had the opportunity to write a few different replication systems along with one true mobile sync solution.  Though they all have worked and performed as designed, all of my solutions have suffered from shortcomings either due to their original specification or from a particular element of their implementation.  In this article, I am going to review my implementations and their shortcomings, some 3rd party options I have researched, and my Mobile Sync plans moving forward.

 

Shepherd Replication Agent

 

I first implemented replication in Shepherd, a directory-based TCP server framework I developed and used internally for several years (actually, it is still used in JAG's E-NDMS, much to my chagrin).  After playing a file copying game with Shepherd's database between our POP3/SMTP and FTP servers for a few weeks, I buckled down and wrote a master-slave replication system based largely around writing every successful data change to an external log for processing by the replication agent.  With a few less-than-artful backflips in the code, the replication agent running on the master would connect to the slaves and execute the appropriate commands to create matching data.  The logging mechanism provided modest fault tolerance across multiple slave systems.
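
The log-and-replay idea can be sketched roughly as follows.  This is a minimal illustration and not Shepherd's actual code:  the Node, Master, and ReplicationAgent names are hypothetical, the "external log" is just an in-memory list, and the per-slave offset is an assumption about how an agent might track its progress.

```python
class Node:
    """A toy data store: a dict of key -> value."""

    def __init__(self):
        self.data = {}

    def apply(self, op, key, value=None):
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)


class Master(Node):
    """Applies a change locally, then appends it to the replication log."""

    def __init__(self):
        super().__init__()
        self.log = []  # stands in for the external log (assumption)

    def write(self, op, key, value=None):
        self.apply(op, key, value)
        self.log.append((op, key, value))  # logged only after the change succeeds


class ReplicationAgent:
    """Replays unprocessed log entries to each slave, one offset per slave."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self.offsets = [0] * len(slaves)

    def run_once(self):
        for i, slave in enumerate(self.slaves):
            for op, key, value in self.master.log[self.offsets[i]:]:
                slave.apply(op, key, value)
                # Applying the change and advancing the offset are two
                # separate steps; nothing here makes them atomic.
                self.offsets[i] += 1
```

Because each slave has its own offset, a slave that is down simply falls behind and catches up on the next run.  The two-step apply-then-advance in run_once is also where a crash could leave a change replayed twice or not at all.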

 

Because Shepherd's replication service did not work in a transactional context, its replication agent occasionally produced extra records on the slaves or failed to replicate some records from master to slave.  Since there was no state comparison system, the data mismatches often went undetected until reported by end users.  Even worse, as it was running POP3 and SMTP services on redundant mail servers, the failures seemed almost random.  Over time, I was forced to create export/import utilities and setup scripts to forcefully bring the servers into sync on a weekly basis.
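
For illustration, the kind of state comparison Shepherd lacked might look something like the sketch below.  It is not something I built for Shepherd; the function names are hypothetical, and it assumes each server's records can be read into a dict with string keys.

```python
import hashlib

_MISSING = object()  # sentinel so a stored None differs from "no record"


def table_digest(rows):
    """Return a SHA-256 digest of the rows that does not depend on dict order."""
    h = hashlib.sha256()
    for key, value in sorted(rows.items()):
        h.update(repr((key, value)).encode("utf-8"))
    return h.hexdigest()


def find_mismatches(master_rows, slave_rows):
    """Return the keys that are missing, extra, or different on the slave."""
    keys = set(master_rows) | set(slave_rows)
    return sorted(
        k for k in keys
        if master_rows.get(k, _MISSING) != slave_rows.get(k, _MISSING)
    )
```

Comparing digests is a cheap first check; only when they differ does find_mismatches need to walk the records to see which ones drifted.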

 

HOSA Conference Management System (CMS)

 

The development of HOSA's synchronization service was a bit backwards, but it more closely resembled a Mobile Sync scenario.  The HOSA CMS began as a standalone Java application.  Actually, to be completely honest, it began as a DOS application, graduated to 16-bit Windows, and finally moved over to Java (yes, I'm old).  Users would collect conference registrations on paper, enter them into the system, and then use the system to process registration, billing, scheduling, and tabulation.  It eventually morphed into a largely online system, but it retained its need to be a disconnected, standalone system for those conferences where Internet access was slow or impractical.

 

If I knew then what the system would become, I might have approached the sync process differently, but the online integration requirements crept up slowly.  HOSA first wanted to implement online registration for the national conference.  Bringing in state-level users and running state conferences came later.  As a result, I had a lot of control of the human process, and was able to simplify the requirements by downloading data into the application only once at the close of conference registration.

 

When HOSA then moved toward implementing state conference registration, time was short and delivering a stable system was of the utmost importance.  There were also a variety of technical issues confronting me in the new Java implementation.  Consequently, I merely adapted the one-time, one-way sync already developed for the national conference.  Eventually, that proved impractical, and the algorithm was further adjusted to permit additional syncs for new data only.  More recently, a schema change and optional system component caused me to implement yet another, separate sync service to handle online testing data.

 

Needless to say, I wish I had known the eventual feature set and designed sync correctly from the outset, but the steady progression of the system's features and scope led to a less than optimal implementation.

 

DianHua Dictionary

 

Though it operates under a separate company name, DianHua Dictionary is my personal project.  It is a Chinese-English dictionary for the iPhone and iPod touch originally released in August 2008.  At first, it was just a dictionary without much additional functionality, but over time, I added study tools to the system.  Adding study tools meant tracking user data like word lists, study results, and search history. At the time I released those tools, iCloud and iTunes file access didn't exist.  I saw many developers lose users because their software failed to provide an external means of accessing data.

 

When I implemented word lists and flashcards, I wanted to make sure my users always had access to their data and that they could get that data out of the device and into other software like Anki for studying.  This meant introducing a true Mobile Sync solution.  I set what I considered fairly demanding design goals:

 

  • Multi-device Sync:  Data could be updated on any mobile device or at the DianHuaDictionary.com website.
  • Disconnected:  One of the important parts of DianHua Dictionary to me was the ability to run offline.  With data roaming charges being what they are for foreign travel, I did not want to require the presence of an Internet connection.  Sync occurred only as a result of user-initiated action, so the server and mobile databases could diverge greatly between syncs.
  • No iOS Bugs:  In 2009-2010, I saw a lot of developers take it on the chin when they had a show-stopping bug in an application and had to wait out the App Store approval process.  My design required the iOS app to be as bulletproof as possible, which meant isolating all sync logic on the server.
  • Single Server Database:  To avoid a management nightmare, I wanted to aggregate all DianHua users' data into a single PostgreSQL server database.
  • Dynamic Sync Logic Changes:  I am not the type to roll out a new WAR to Tomcat without extensive testing.  While I may use some agile methods in my development work, I remain a bit more measured in my release process.  Knowing my own proclivities, I decided the sync logic needed to be externalized and scripted as much as possible.

 

Unlike Shepherd or the HOSA CMS, these requirements immediately sent me to work on a state machine.  The iOS app had the dumbest logic possible:  read all data, spit out XML, transfer XML to the server, process XML received from the server.  The server then checked the user's data against its own copy of that data and resolved any conflicts between the two systems.  It did this primarily through a set of stored procedures in the PostgreSQL database.  Any changes that needed to be made on the mobile device were sent back to the client once the process completed.
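
The shape of that server-side step can be sketched as below.  This is only an illustration under loud assumptions:  the real logic lived in PostgreSQL stored procedures and exchanged XML, whereas this sketch uses plain dicts, a hypothetical reconcile function, and a simple "newest modification wins" rule that I am using purely as an example of a conflict policy.

```python
def reconcile(server, client):
    """Merge a client's full snapshot into the server's copy of the data.

    Both arguments map record id -> (modified, value).  The server dict is
    updated in place.  Returns the (record_id, modified, value) changes the
    client must apply afterwards.

    ASSUMPTION: the newer 'modified' stamp wins, and ties keep the server copy.
    """
    client_changes = []
    for rid in sorted(set(server) | set(client)):
        s, c = server.get(rid), client.get(rid)
        if c is None or (s is not None and s[0] >= c[0]):
            winner = s  # server-only record, or server copy is at least as new
        else:
            winner = c  # client-only record, or client copy is newer
        server[rid] = winner
        if winner != c:
            client_changes.append((rid, *winner))
    return client_changes
```

The client stays dumb in this arrangement:  it uploads everything it has and applies whatever list comes back.  Deletions are not handled here; they would need something like tombstone records so that a deleted item is not mistaken for a new one on the other side.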

 

Even though I spent a significant amount of time debugging sync issues, they were all confined to the server and the vast majority were found in the easily updated stored procedures.  The only fix required by the iOS portion of the sync process was a network timeout in the HTTP library used for the web service calls.

 

Needless to say, I was pleased with the results.  That is, until I wanted to add some new features for the subsequent release.  I had failed to design the Mobile Sync algorithm with any tolerance of database schema changes, and that lack of foresight has handcuffed me ever since.

 

3rd Party Options

 

As I plan for new releases of DianHua Dictionary and HOSA CMS with full Mobile Sync capabilities, I have been reviewing a variety of 3rd party alternatives.  I have narrowed my basic requirements to the following:  multi-device, disconnected, and schema change tolerant.

 

The first and most obvious choice is iCloud.  iCloud is a fairly recent addition to iOS and doesn't support other platforms.  As a result, it's not an option for the HOSA CMS, but it is a potentially viable option for DianHua Dictionary.  Based on my understanding of it, iCloud can sync whole files, or it can sync data based on Apple's Core Data much like it does with Contacts on iOS.  Unfortunately, there are some challenges with using it in DianHua:

 

  • The data is confined to the user's iOS/iCloud setup.  Like many of my users, I like to create my word lists on DianHuaDictionary.com (faster and easier than working on an iPhone) and study them on the device.  iCloud doesn't provide any method I know of to pass that data out to my server.
  • iCloud requires iOS 5.x so users on older devices are left out.

 

The second product I reviewed was CouchDB.  CouchDB has an extremely impressive replication system for setting up pools of identical data stores, and there has been a lot of work done in 2011-12 to bring that replication capability to mobile devices.  CouchDB also stores data as documents without a formal schema which means schema changes are a non-issue.

 

After some experimentation on IrisCouch, I decided to give CouchDB a try.  It took some digging around GitHub and Google Groups, but I finally decided to use the bleeding-edge TouchDB implementation of Couch on iOS.  Instead of an Erlang/JavaScript-based system like Couchbase Mobile (big and slow), TouchDB used a SQLite data store to emulate CouchDB's web API along with a full implementation of the replication service.

 

It didn't take long to get a working version of DianHua Dictionary up and running with TouchDB, and initially, I was extremely pleased.  You can find tweets from me singing TouchDB's praises.  Unfortunately, I have run into some challenges since then:

 

  • System Administration:  The most practical implementation at this point in time would require one CouchDB database on the server per user.  There are ways to do filtered replication, but based on what I was reading, one database per user sounded better for my needs.  Though intimidating from a system administration standpoint, it was still doable.
  • Platform Stability:  The mobile implementations are changing rapidly.  Between when I first looked at CouchDB and Couchbase Mobile and when I attempted an implementation, Couchbase Mobile had essentially been shelved in favor of TouchDB (barely alpha quality).  TouchDB development progressed quickly and then slowed right after my implementation was complete.  It then shifted over to Couchbase's Syncpoint product (which I believe includes TouchDB as part of its core).
  • Backward Compatibility:  A few technical issues in the TouchDB API require iOS 5.x and would require some changes to the TouchDB source to try to achieve 4.x compatibility.
  • Performance:  I had to aggregate documents in a somewhat unnatural way because it was taking 20 minutes to complete a relatively small sync (~5,000 small documents).  It worked well and was logical enough until I started trying to add attachments.  At that point, my document design really needed to go back to what it was when I had the performance problem.  Granted, once you sync that first time, CouchDB can be very fast, but I couldn't see users with pretty modest word lists waiting that long for their data to sync.

 

The final product I reviewed was Sybase Mobilink.  For those of you who are younger and grew up in the SQLite, MySQL, and PostgreSQL days, you may not have heard of Sybase.  Back when most of us were cutting our teeth on xBase class libraries (7 miles, uphill, in the snow) and cursing the frequent file corruption (think MySQL ISAM tables), Sybase was producing SQL Anywhere, a single-file, lightweight, transactional, SQL database.  After I had spent a lot of time on IBM's DB2 enduring all of its SQL limitations, SQL Anywhere was a welcome relief, and before budget limitations and the emergence of PostgreSQL short-circuited it, I was seriously considering a wholesale change of all of our systems from IBM's DB2 UDB to Sybase Adaptive Server.

 

To my surprise, one of my many Google searches for Mobile Sync solutions popped up Sybase Mobilink.  Based on the documentation I read, Mobilink is exactly what I want from a Mobile Sync setup.  It uses a lightweight database on mobile devices and any one of several database engines on the server side (unfortunately, no PostgreSQL, but MySQL if you like that sort of thing).  The sync engine supports scripted conflict resolution schemes as well as schema upgrades, and it can aggregate all end-user data into a single server database.

 

Unfortunately, Sybase is rather proud of its Mobilink product.  When I last used SQL Anywhere in a distributed application, they had a royalty-free developer license for a reasonable price.  Now they appear to want $30 per Mobilink client.  Did I mention DianHua Dictionary is free?

 

Moving Forward

 

Having spent way too much time on Mobile Sync as it is, my plan now is to build my own, self-contained system capable of supporting both desktop Java and iOS clients running on SQLite databases.  The server will run PostgreSQL and, at least initially, aggregate data under user accounts.  Eventually, I might look to role-based aggregation or even scripted filtering, but for now, I just need user-oriented data synchronization.  The bells and whistles can come later, but schema upgrades and backward compatibility will be core requirements.
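
As one possible starting point for the schema upgrade requirement, a client's SQLite database can carry its own schema version in SQLite's built-in user_version pragma.  The sketch below is an assumption about how that might look, not a finished design:  the MIGRATIONS list, the word_list table, and the upgrade function are all hypothetical.

```python
import sqlite3

# Hypothetical migrations: the statement at index i upgrades version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE word_list (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE word_list ADD COLUMN modified INTEGER NOT NULL DEFAULT 0",
]


def upgrade(conn):
    """Apply any pending migrations and return the resulting schema version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for target in range(version, len(MIGRATIONS)):
        conn.execute(MIGRATIONS[target])
        # Record progress after each step so a rerun resumes where it stopped.
        conn.execute(f"PRAGMA user_version = {target + 1}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Calling upgrade on a brand-new database walks through every migration, while calling it on an up-to-date one does nothing.  A client could also send this version number along with each sync request so that the server knows which schema it is talking to, though that part is only an idea at this stage.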

 

If you're looking for a Mobile Sync solution, I hope you find the information presented here useful, and by all means, if you know of a better solution, please let me know.

 

 

 

 

 
