Monday, May 2, 2011

MACID Haskell persistent data store overhauled

Over the past few weeks Lemmih has been working on a major overhaul of MACID. MACID is a persistent data store explicitly designed for use with Haskell. It can store arbitrary Haskell data structures, and queries are written in straightforward Haskell (no funky DSLs). It is thread-safe and supports the classic ACID properties. Unplug your machine without losing your data.

The homepage for the new MACID library can be found here.

There are two new libraries, acid-state and safecopy, which will replace happstack-data and happstack-state in Happstack 7.

safecopy provides versioned binary serialization of datatypes with version migration. It adds this functionality on top of the cereal library.

acid-state provides ACID transactions on top of safecopy.

What's New


The new MACID is like the old MACID, but better. The fundamental concepts are the same, but it is a lot cleaner, more robust and less magical.

The rewrite of MACID addresses many long-standing wish-list items, including:

MACID now a separate project


acid-state and safecopy are now free-standing libraries, with no references to or dependencies on anything Happstack. They have their own maintainer, homepage, source repository, etc. So now you can use MACID even if you don't use Happstack. (This has actually been true for a while, but now it is a lot more obvious.)

If all you want is versioned binary serialization and migration, then you can use the safecopy library on its own.

update / query take an explicit state handle now


In happstack-state, query and update accessed the state via a global IORef. This meant that you could only have one instance of MACID (one ACID store) per application. It also weakened type-checking, resulting in runtime errors that should have been avoidable.

Now update and query take an explicit handle.
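
As a rough illustration of what an explicit handle makes possible, here is a minimal sketch that opens two independent stores in one program. It assumes the API found in later acid-state releases (openLocalStateFrom, update, query, closeAcidState, makeAcidic); the names in the initial release may differ. The Counter type, the increment and currentValue events, the bumpAndPeek helper, and the state/hits and state/errors directories are all made up for this example.

```haskell
{-# LANGUAGE DeriveDataTypeable, TemplateHaskell, TypeFamilies #-}
module Main where

import Control.Monad.Reader (ask)
import Control.Monad.State  (get, put)
import Data.Acid            (AcidState, Query, Update, closeAcidState,
                             makeAcidic, openLocalStateFrom, query, update)
import Data.SafeCopy        (base, deriveSafeCopy)
import Data.Typeable        (Typeable)

-- Hypothetical state type: a single persistent counter.
newtype Counter = Counter Int
  deriving (Show, Typeable)

$(deriveSafeCopy 0 'base ''Counter)

-- Update event: bump the counter and return the new value.
increment :: Update Counter Int
increment = do
  Counter n <- get
  put (Counter (n + 1))
  return (n + 1)

-- Query event: read the counter without changing it.
currentValue :: Query Counter Int
currentValue = do
  Counter n <- ask
  return n

-- Generates the Increment and CurrentValue event types.
$(makeAcidic ''Counter ['increment, 'currentValue])

-- The store is passed in explicitly; nothing is looked up globally.
bumpAndPeek :: AcidState Counter -> IO (Int, Int)
bumpAndPeek store = do
  new  <- update store Increment
  seen <- query store CurrentValue
  return (new, seen)

main :: IO ()
main = do
  -- Two separate stores in one application (directory names are assumptions).
  hits   <- openLocalStateFrom "state/hits"   (Counter 0)
  errors <- openLocalStateFrom "state/errors" (Counter 0)
  bumpAndPeek hits   >>= print
  bumpAndPeek errors >>= print
  closeAcidState hits
  closeAcidState errors
```

Because the state persists on disk, the printed numbers depend on how many times the program has been run before.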

No more Component class


happstack-state has a Component class, which is mostly worthless. It gives the promise of power, but doesn't really deliver anything. It is just extra boilerplate for the most part. This class is gone entirely in acid-state.

Less boilerplate for serialization instances


In happstack-data, creating a serialization instance for a type required you to call $(deriveSerialize ''Foo) and also to write an instance Version Foo. With safecopy, you only need one line: $(deriveSafeCopy 0 'base ''Foo)

In happstack-data there were three type classes, Serialize, Version, and Migrate. This has been simplified down to just two in safecopy: SafeCopy and Migrate.
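
To make the two-class design concrete, here is a small sketch of a versioned type with a migration, assuming the safecopy API as it exists in later releases (deriveSafeCopy, base, extension, Migrate with the MigrateFrom type family, safePut, safeGet) together with runPut and runGet from cereal. The FooV0 and Foo types and the oldBytes and readNewest helpers are hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE TemplateHaskell, TypeFamilies #-}
module Main where

import Data.ByteString (ByteString)
import Data.SafeCopy   (Migrate (..), base, deriveSafeCopy, extension,
                        safeGet, safePut)
import Data.Serialize  (runGet, runPut)

-- Hypothetical first version of a record.
data FooV0 = FooV0 String
  deriving (Eq, Show)

-- Hypothetical second version, which adds an Int field.
data Foo = Foo String Int
  deriving (Eq, Show)

-- Version 0 is the base of the migration chain.
$(deriveSafeCopy 0 'base ''FooV0)

-- How to turn an old value into a new one.
instance Migrate Foo where
  type MigrateFrom Foo = FooV0
  migrate (FooV0 name) = Foo name 0

-- Version 1 extends version 0 via the Migrate instance above.
$(deriveSafeCopy 1 'extension ''Foo)

-- Bytes as they would have been written by the old version of the program.
oldBytes :: ByteString
oldBytes = runPut (safePut (FooV0 "hello"))

-- Reading at the newest type; older versions are migrated on the way in.
readNewest :: ByteString -> Either String Foo
readNewest = runGet safeGet

main :: IO ()
main = print (readNewest oldBytes)
```

The point of the sketch is that code reading the data only ever mentions the newest type; the version tag written by safePut tells safeGet which older decoder and which migrate steps to apply.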

Better Safety


happstack-state has a number of corner cases that it does not handle gracefully. For example, bad things happen if update events call fail or error. acid-state handles these correctly and includes a test suite specifically for testing these types of failures.

Additionally, acid-state is a lot better at flushing data to disk.

Performance


Initial performance testing shows that acid-state is fast. A simple data store that holds an integer which is incremented by an update event was able to achieve 13,300 updates per second.

Still Coming


There is still a bunch of work to come.

acid-state currently does not support Windows. This is because the code to safely flush data to disk requires the posix library. Windows support can be added via some #ifdefs. If you are familiar with Windows and file IO and want to help, your contribution will be greatly appreciated.

Migrating data from happstack-state to acid-state is certainly feasible. The process is not yet documented, but it will be.

There are also major improvements to IxSet planned.

And, of course, replication and sharding. Fortunately, it will be easier to implement these on top of the new acid-state code base.

Get Started Today


acid-state is usable today! You can install it from hackage. There are code examples here and here. If you are starting a new MACID based project, you are encouraged to consider acid-state.

To use happstack-ixset you will currently need to add this extra instance to your code:
import Control.Applicative             ((<$>))
import Data.SafeCopy
import Data.Typeable                   (Typeable)
import Happstack.Data.IxSet            (IxSet)
import qualified Happstack.Data.IxSet  as IxSet

instance (SafeCopy a, Ord a, Typeable a, IxSet.Indexable a) => SafeCopy (IxSet a) where
    putCopy ixSet = contain $ safePut (IxSet.toList ixSet)
    getCopy = contain $ IxSet.fromList <$> safeGet

Once IxSet is factored out of happstack into a separate library, that instance will be provided automatically.

It is safe to mix happstack-state and acid-state in the same application if you want to use acid-state in an existing application.

More updates to come as things develop!

1 comment:

  1. I'd prefer if people referred to the project as acid-state.