Monday, November 15, 2010

ANN: happstack-heist now available

Thanks to cdsmith's blog post, and Happstack's recent update to mtl-2, I am pleased to announce happstack-heist.

Detailed documentation on using Heist with Happstack is available in the Happstack Crash Course.

Happstack is a flexible Haskell Web Framework with many supporting optional components.

Heist is an XML templating engine. The static portions of your templates are written in XML files which are loaded by the server at runtime. This makes it easy to modify the templates without having to recompile and restart the server. It also makes it easy to work with template designers who do not know Haskell.

The dynamic portions of the templates are generated in Haskell and are spliced into the templates. This means you have the full expressive power of Haskell at your disposal for the generated portions of the templates. That is a lot nicer than trying to use something like XSLT (though Happstack does support that as well).

Happstack offers a wide variety of templating solutions including BlazeHtml, HSP, Hamlet, and HStringTemplate. But Heist fills a nice hole in the spectrum, and we are pleased to be able to offer it now.

It is available in darcs and on hackage. It has been tested against Happstack from darcs, but should work against Happstack stable as well.

If there are any bugs or improvements you would like to see, let us know!

- jeremy

Wednesday, October 20, 2010

Recompile your Haskell-based templates faster than you can hit F5.

There are two main classes of templating solutions in Happstack:

1. DSLs/libraries such as BlazeHtml, HSP, Hamlet, etc, where your templates are written in Haskell and compiled at compile time.

2. Libraries like heist, HStringTemplate, etc, where your templates are written in some external template file and read at runtime by the server.

Each method has strengths and weaknesses -- and so each project needs to pick the solution that works best for them.

For my projects I love using HSP. I like having the full expressive power of Haskell in my templates, and the added safety that the type checker provides. But I hate having to recompile, relink, and restart my app server dozens and dozens of times when I am developing my templates. And, so it is with great pleasure that I present the triumphant return of happstack-plugins!

happstack-plugins


happstack-plugins leverages the recently revived plugins package so that individual page templates can be automatically recompiled and reloaded into a running happstack application. happstack-plugins uses hinotify to watch the Haskell source files containing your page templates. Whenever you save changes, the page is automatically recompiled and reloaded into the running server. Typically this happens fast enough that by the time you switch to the browser and hit reload, the updated page is already available.

You can see a demo of happstack-plugins in action here:



How to use happstack-plugins



Using happstack-plugins is very straightforward. First you need to install the happstack-plugins library, which is currently only available in the happstack darcs repository:


darcs get http://patch-tag.com/r/mae/happstack


For best performance you should put each page template in its own module so that it can be recompiled and reloaded faster.

The templates themselves require no special modifications. Here is a simple helloPage template:

> module HelloPage where
>
> import Happstack.Server
>
> helloPage :: String -> ServerPart Response
> helloPage noun = ok (toResponse $ "hello, " ++ noun)

This template takes a single String argument and returns a text/plain page which says, "hello, <string>". We could just as well use BlazeHTML, HSP, etc, but using String keeps this example short and simple.

As I mentioned, there is nothing new going on here; it is just a normal happstack ServerPart.

The interesting changes are in the Main module. There are only 3 simple changes required to support templates. But first, some boring stuff at the top of the module:



> {-# LANGUAGE CPP, TemplateHaskell #-}
> module Main where
>
> import Control.Monad (msum)
> import Happstack.Server



1. Here we #ifdef some module imports. These two modules provide the same interface. The Dynamic version actually does page recompilation and reloading. The Static version just links things in the normal way. This makes it easy to use dynamic loading during development but static linking for the live server by simply defining or undefining PLUGINS.



> #ifdef PLUGINS
> import Happstack.Server.Plugins.Dynamic
> #else
> import Happstack.Server.Plugins.Static
> #endif
> import HelloPage






2. In main we call initPlugins which starts the recompiler/reloader and hinotify. If you import Happstack.Server.Plugins.Static, initPlugins is a 'noop', so we do not have to add any extra #ifdefs.



> main :: IO ()
> main =
>   do ph <- initPlugins
>      simpleHTTP nullConf $ pages ph



3. Here is where we actually specify a template to load dynamically:



> pages :: PluginHandle -> ServerPart Response
> pages ph =
>   msum [ $(withServerPart 'helloPage) ph $ \helloPage ->
>            (helloPage "hello")
>        ]



Normally we would just have:


> pages :: PluginHandle -> ServerPart Response
> pages ph =
>   msum [ helloPage "world"
>        ]


So the new part is the template haskell function withServerPart which effectively takes three arguments:

1. the name of the symbol to dynamically load
2. the PluginHandle which initPlugins returned
3. a function which will use the loaded symbol

so, withServerPart effectively has the type:



> withServerPart :: (MonadIO m, ServerMonad m) => Name -> PluginHandle -> (a -> m b) -> m b



Even though we are dynamically reloading the page at runtime, the compiler will still check that the types are correct when we compile the main application.

If we change helloPage "hello" to helloPage 1 and try to build Main.lhs, we will get the following error:



Main.lhs:50:28:
No instance for (Num String)
arising from the literal `1' at Main.lhs:50:28
Possible fix: add an instance declaration for (Num String)
In the first argument of `helloPage', namely `1'
In the expression: (helloPage 1)
In the second argument of `($)', namely
`\ helloPage -> (helloPage 1)'
Failed, modules loaded: HelloPage.



What's left to do?



There are two big features on the TODO list. If you think happstack-plugins is cool, I encourage you to work on them!

1. The underlying plugins library is broken when it comes to hierarchical modules. Ideally I would put all the pages in Pages.*. For example Pages.HelloPage. But, that does not work. As a hack, you can modify System.Plugins.Make.build and comment out output in the let flags = ... declaration. This fixes hierarchical modules, but requires you to run your app with its working directory set to the root directory of your project. That is fine for happstack app development, but not an ideal solution for all users of the plugins library. If someone could fix hierarchical module support in plugins, that would be great for everyone.

2. hinotify is only supported under Linux. However, it should not be that hard to make hinotify support optional (via a compile time flag). Without hinotify, we would just do a quick stat() every time the template is invoked and see if a recompilation is needed. When a compilation is needed, you will have to wait for that page to recompile and reload -- but it will still be much faster than rebuilding and restarting the whole server.
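
The stat()-based fallback described in item 2 could be sketched roughly as follows. This is only an illustration, not code from happstack-plugins: the names isStale and needsRecompile are hypothetical, and it assumes the directory and time packages that ship with GHC. The idea is to remember the modification time seen at the last check and report a change when the file's current modification time differs.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Time.Clock (UTCTime)
import System.Directory (getModificationTime)

-- | Pure decision (hypothetical helper): a source is stale when we have
-- never seen it, or when its timestamp differs from the remembered one.
isStale :: Eq t => Maybe t -> t -> Bool
isStale Nothing     _   = True
isStale (Just seen) now = seen /= now

-- | Check the file's modification time against the remembered one and
-- update the remembered value. Returns True when a recompile is needed.
needsRecompile :: IORef (Maybe UTCTime) -> FilePath -> IO Bool
needsRecompile ref path = do
  mtime    <- getModificationTime path
  lastSeen <- readIORef ref
  let stale = isStale lastSeen mtime
  if stale then writeIORef ref (Just mtime) else return ()
  return stale

main :: IO ()
main = do
  ref    <- newIORef Nothing
  -- "." is used only so that the sketch runs anywhere; a real caller
  -- would pass the path of the template's source file.
  first  <- needsRecompile ref "."
  second <- needsRecompile ref "."
  print (first, second)
```

The first check reports a change because nothing has been seen yet; later checks only report a change after the file is touched again.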

Wednesday, October 13, 2010

Is the RqData monad still needed?

cdsmith recently asked if RqData is really needed in Happstack. The answer is, "no, but it is still useful sometimes."


I can say "no" with certainty because in the darcs version of Happstack, it is already optional.


The new and improved RqData


Functions like look now work in any monad which is an instance of HasRqData:



> look :: (Functor m, Monad m, HasRqData m) => String -> m String


Since there is a HasRqData instance for ServerPart, we effectively have the function:



> look :: String -> ServerPart String


Here is an example of using look without having to jump through any hoops:



> module Main where
>
> import Happstack.Server (ServerPart, look, nullConf, simpleHTTP, ok)
>
> helloPart :: ServerPart String
> helloPart =
>   do greeting <- look "greeting"
>      noun <- look "noun"
>      ok $ greeting ++ ", " ++ noun
>
> main :: IO ()
> main = simpleHTTP nullConf $ helloPart


Now if we visit http://localhost:8000/?greeting=hello&noun=rqdata, we will get the message hello, rqdata


Sweet!


But why keep RqData around?


Using look in the ServerPart monad is simple. But when it fails, it just calls mzero. That can be very frustrating if you are debugging your forms or debugging calls to your web service API. Instead of an error telling you what parameter was missing, you simply get a generic 404 error.


Using the RqData monad/applicative functor gives you the option to provide detailed error messages when something goes wrong:



> module Main where
>
> import Control.Applicative ((<$>), (<*>))
> import Happstack.Server (ServerPart, badRequest, nullConf, ok, simpleHTTP)
> import Happstack.Server.RqData (RqData, look, getDataFn)
>
> helloRq :: RqData (String, String)
> helloRq =
>   (,) <$> look "greeting" <*> look "noun"
>
> helloPart :: ServerPart String
> helloPart =
>   do r <- getDataFn helloRq
>      case r of
>        (Left e) ->
>          badRequest $ unlines e
>        (Right (greet, noun)) ->
>          ok $ greet ++ ", " ++ noun
>
> main :: IO ()
> main = simpleHTTP nullConf $ helloPart

If you visit http://localhost:8000/?greeting=hello&noun=world, you will get the familiar greeting hello, world.

But if you leave off the query parameters http://localhost:8000/, you will get a list of errors:


Parameter not found: greeting
Parameter not found: noun

This is really nice when you are debugging your code.


Now with more composability!


Since RqData and ServerPart are instances of Applicative and Alternative you can now reuse many functions from those libraries. For example, if a query parameter is optional, you can simply write:



>     do greet <- optional $ look "greeting"
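
To make the behaviour of optional concrete, here is a small self-contained sketch that uses Maybe as a stand-in for ServerPart (both are instances of Alternative). The helper names lookParam and greetingOrDefault are made up for this illustration and are not part of Happstack; only optional comes from Control.Applicative.

```haskell
import Control.Applicative (optional)
import Data.Maybe (fromMaybe)

-- Stand-in for `look`: Maybe plays the role of a computation that can fail.
lookParam :: [(String, String)] -> String -> Maybe String
lookParam params key = lookup key params

-- "greeting" is optional and falls back to a default; "noun" is required.
greetingOrDefault :: [(String, String)] -> Maybe String
greetingOrDefault params = do
  greet <- optional (lookParam params "greeting")  -- never fails, yields a Maybe
  noun  <- lookParam params "noun"                 -- fails if "noun" is missing
  return (fromMaybe "hello" greet ++ ", " ++ noun)

main :: IO ()
main = do
  print (greetingOrDefault [("noun", "world")])                     -- Just "hello, world"
  print (greetingOrDefault [("greeting", "hi"), ("noun", "world")]) -- Just "hi, world"
  print (greetingOrDefault [])                                      -- Nothing
```

The missing optional parameter does not abort the whole computation; only the missing required one does.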

There is also a new combinator checkRq which can be used to validate query parameters, or to convert a query parameter to another type:



> checkRq :: (Monad m, HasRqData m) => m a -> (a -> Either String b) -> m b
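
As an illustration of the second argument, here is a sketch of a validation function with the shape a -> Either String b. The name parseAge and the "age" parameter are hypothetical; given the signature above, it would presumably be used as look "age" `checkRq` parseAge, but the sketch itself only depends on the Prelude so it can be tried in isolation.

```haskell
-- | Convert a query parameter to a non-negative Int, or explain why not.
-- The Left value is the error message that would be reported.
parseAge :: String -> Either String Int
parseAge str =
  case reads str of
    [(n, "")]
      | n >= 0    -> Right n
      | otherwise -> Left ("age must not be negative: " ++ str)
    _ -> Left ("age is not a number: " ++ str)

main :: IO ()
main = mapM_ (print . parseAge) ["42", "-3", "abc"]
```

The same function both converts the parameter to another type and checks a condition on it, which are the two uses mentioned above.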

If you are curious be sure to check out the Happstack Crash Course where the new RqData module is documented in detail with many working examples.


I would love to hear feedback on the new and improved RqData module, and any suggestions for improvement!


Also, be on the look out for a future blog post about the RqData Arrow. :)

Monday, July 19, 2010

Changes to request body and RqData in head

I have just pushed some patches which affect the way the Request body and RqData are handled in happstack 0.6. This contains user visible changes which will affect you if you:



  • Use RqData

  • Directly use the rqBody field in Request

  • Directly use the rqInput field in Request

  • Directly work with the Input type

  • Allow file uploads


Some of the changes fix bugs (design flaws), and others are for new features and functionality. The non-compatible API changes are pretty small, so it should be easy to port your code. It basically comes down to:



  1. getDataFn, withDataFn, etc take an extra argument of the type BodyPolicy

  2. getDataFn, withDataFn, etc return Either [String] a instead of Maybe a

  3. the inputValue field of the Input type is now Either FilePath L.ByteString instead of L.ByteString

  4. you have to explicitly import the module Happstack.Server.RqData


In this post I will describe what motivated these changes. I am also hoping to get feedback on these changes before we release 0.6, since it will be less painful to make further changes now.


the Request body and space usage


In the old code the Request type stores the request body as a simple lazy ByteString:


> newtype RqBody = Body { unBody :: L.ByteString } deriving (Read,Show,Typeable)
>
> data Request = Request { ...
>                        , rqBody :: RqBody
>                        }

This feels nice, because it is a simple, pure value. Unfortunately, it is really not a great idea in practice. The request body does not initially require any space, because it is an unevaluated lazy ByteString. But the ServerPart holds the Request in its environment, and that means the garbage collector cannot free the RqBody as you evaluate it. If the request body contained gigabytes of data, that could be disastrous.


The solution in Happstack 0.6 is to use an MVar to hold the request body:


> 
> data Request = Request { ...
>                        , rqBody :: MVar RqBody
>                        }

Instead of using rqBody directly, it is better to use takeRequestBody, so that your code will not break if we switch to IORef or something else.


> takeRequestBody :: Request -> IO (Maybe RqBody)
> takeRequestBody rq = tryTakeMVar (rqBody rq)

Now, when you process the RqBody the Request will not be holding onto it, so the garbage collector can free it (assuming your code does not hold onto it and introduce a new space leak).


This does have a drawback, however. A ServerPart can call mzero at any time, and processing will move on to the next ServerPart. However, if you have already taken the RqBody then the next ServerPart may be missing critical data it needs. But, if we left the RqBody intact, that would result in the space leak. I think that in practice, if a ServerPart made enough progress that it started consuming the RqBody and then failed, it is unlikely that another ServerPart would succeed and need the RqBody. If another ServerPart succeeds, it is probably just a 404 Not Found handler or something similar, which does not need the request body. So it seems like it is better to have the default behavior be the more space friendly solution.
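

The "first taker gets everything" behaviour can be seen in isolation with a plain MVar. This sketch uses only Control.Concurrent.MVar from base, with a String standing in for the request body; it does not use the Happstack Request type.

```haskell
import Control.Concurrent.MVar (newMVar, tryTakeMVar)

-- Two consumers try to take the same body; only the first one gets it.
demo :: IO (Maybe String, Maybe String)
demo = do
  body   <- newMVar "request body"
  first  <- tryTakeMVar body  -- empties the MVar
  second <- tryTakeMVar body  -- finds it empty, does not block
  return (first, second)

main :: IO ()
main = demo >>= print  -- (Just "request body",Nothing)
```

Because tryTakeMVar never blocks, a later handler simply sees Nothing instead of hanging.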


We will also provide peekRequestBody and/or putRequestBody functions so that you can opt to leave the request body intact. It is up to you to be sensible about using them.


BodyInput and space usage


In RqData, the cookies, QUERY_STRING, and request body (when appropriate) are parsed into a [(String, Input)], where String is the name of the key, and Input is the value.


In Happstack 0.6, Input will be the type:


> data Input = Input
>     { inputValue :: Either FilePath L.ByteString
>     , inputFilename :: Maybe FilePath
>     , inputContentType :: ContentType
>     } deriving (Show,Read,Typeable)

In Happstack 0.5 the inputValue is simply a L.ByteString. Once again, this seems fine at first. After all, the inputValues are lazy ByteStrings, so we can process them lazily, right? Well, not quite. In the unprocessed request body, the key/value pairs are laid out like this:



key1
value1
key2
value2
key3
value3
key4
value4
...

If we were to consume the key/value pairs in a sequential manner, then we would be ok. But, generally we want to use functions which can look up a specific key. Imagine we want to look up key4. In order to do that we have to first read in all the preceding key/value pairs. If we knew we only cared about key4 then we could just toss the rest. But with the monadic RqData code we don't know that. (A future post will talk about an arrow based alternative where we do know that). So, we have to store all the key/value pairs in case we want to look up key1 after key4.


In Happstack 0.5, we store all those values in RAM. But, some of those values might be (huge) files. That clearly isn't going to work. So we once again trade off a bit of simplicity/elegance for the practical matter of not having unlimited amounts of RAM. Instead we store some values in RAM and some values on the disk. How do we decide what goes where? That brings us to BodyPolicy.


BodyPolicy


When parsing the request body, we need some way to decide what values should be stored in RAM and what values should be saved to disk. Additionally, we want to impose limits on how much data can be stored in either location. If a user decides to post the contents of /dev/random you are likely to want to cut them off at some point. However, the specific values for the quotas are application specific. In fact, they may be specific to the particular form that is being processed. For example, an admin user might have higher quotas than a regular user.


The answers to these questions are provided by the BodyPolicy, which looks like:


> data BodyPolicy
>     = BodyPolicy { inputWorker :: Int64 -> Int64 -> Int64 -> InputWorker
>                  , maxDisk :: Int64 -- ^ maximum bytes to save to disk (files)
>                  , maxRAM :: Int64 -- ^ maximum bytes to hold in RAM
>                  , maxHeader :: Int64 -- ^ maximum header size (this only affects headers in the multipart/form-data)
>                  }

The inputWorker is the function that actually decides where values should be saved, and implements the quotas. Its Int64 arguments are the quotas for the disk, the RAM, and the headers, which don't really get saved but which can temporarily take up space. The next three fields are the values to pass to the inputWorker.
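

To give a feel for the kind of decision involved, here is a deliberately simplified sketch of a quota check. It is not the real InputWorker: the types Placement and Quota and the function place are invented for this illustration, and it assumes the simple rule that uploaded files go to disk while all other values stay in RAM.

```haskell
import Data.Int (Int64)

-- Where a value ends up (hypothetical type, for illustration only).
data Placement = InRAM | OnDisk | QuotaExceeded
  deriving (Eq, Show)

-- Remaining quota in bytes for each storage location.
data Quota = Quota { diskLeft :: Int64, ramLeft :: Int64 }
  deriving (Eq, Show)

-- | Decide where a value of the given size goes and charge it against
-- the matching quota. Files go to disk, everything else to RAM.
place :: Bool -> Int64 -> Quota -> (Placement, Quota)
place isFile size q
  | isFile, size <= diskLeft q    = (OnDisk, q { diskLeft = diskLeft q - size })
  | not isFile, size <= ramLeft q = (InRAM,  q { ramLeft  = ramLeft q - size })
  | otherwise                     = (QuotaExceeded, q)

main :: IO ()
main = do
  let q0       = Quota { diskLeft = 1000, ramLeft = 100 }
      (p1, q1) = place True 600 q0   -- a file that fits on disk
      (p2, q2) = place False 80 q1   -- a form field that fits in RAM
      (p3, _)  = place True 600 q2   -- a second file that exceeds the disk quota
  print [p1, p2, p3]  -- [OnDisk,InRAM,QuotaExceeded]
```

Threading the remaining quota through each decision is what lets the limits apply to the request as a whole rather than to each value separately.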


In most cases, you do not need to write your own inputWorker. It is sufficient to use the defaultBodyPolicy:

> defaultBodyPolicy :: FilePath -> Int64 -> Int64 -> Int64 -> BodyPolicy

The first argument is the directory to store temporary files in, and the next three arguments are the quota values. I am not going to cover defaultBodyPolicy in detail in this post. But it is well documented in the Happstack Crash Course.


Improvements to RqData


The new RqData module also includes a number of new features.


There is now an Applicative functor instance for RqData. The applicative functor instance accumulates errors. This means if you try to look up multiple invalid keys, the error message will report all the missing values, not just the first one. This is nice when you are debugging your code, and is also nice if you provide a web service (REST API, etc) and want to provide your API users with detailed error messages instead of "Invalid Request".
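

The error-accumulating behaviour can be illustrated with a tiny stand-alone applicative. This is a sketch of the general technique and not the actual RqData implementation: the Check type and the lookupParam helper are made up for this example.

```haskell
-- A computation that either fails with a list of errors or succeeds.
newtype Check a = Check { runCheck :: Either [String] a }

instance Functor Check where
  fmap f (Check r) = Check (fmap f r)

instance Applicative Check where
  pure = Check . Right
  -- When both sides fail, keep the errors from both.
  Check (Left e1) <*> Check (Left e2) = Check (Left (e1 ++ e2))
  Check (Left e1) <*> Check (Right _) = Check (Left e1)
  Check (Right _) <*> Check (Left e2) = Check (Left e2)
  Check (Right f) <*> Check (Right a) = Check (Right (f a))

-- Look up a key in an association list, failing with a message if absent.
lookupParam :: [(String, String)] -> String -> Check String
lookupParam env key =
  Check (maybe (Left ["Parameter not found: " ++ key]) Right (lookup key env))

greetingAndNoun :: [(String, String)] -> Either [String] (String, String)
greetingAndNoun env =
  runCheck ((,) <$> lookupParam env "greeting" <*> lookupParam env "noun")

main :: IO ()
main = do
  print (greetingAndNoun [("greeting", "hello"), ("noun", "world")])
  -- Right ("hello","world")
  print (greetingAndNoun [])
  -- Left ["Parameter not found: greeting","Parameter not found: noun"]
```

A monadic version could not do this, because the second lookup would never run once the first one failed.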


We now provide two filters (body and queryString) which limit the scope of the look* functions to either the request body or the QUERY_STRING.


A new function lookFile is provided to assist with handling file uploads.


A new function checkRq is provided to help you convert request parameters to Haskell types, or to check that a value meets some conditions.


Summary


This post gives some of the background on the changes to how we handle the request body and form data. To actually see what the changes look like in practice, you should check out the RqData section in the Happstack Crash Course. It gives detailed examples of all the features and changes I talked about in this post. I have also updated the haddock documentation in darcs.


I would love to hear your opinions. Do you love the changes? Hate the changes? Have better ideas about how to solve the problems? In terms of handling the raw request body, I believe both Yesod and Snap use the same basic approach -- the first handler to try to use the request body gets the whole thing, and everyone else gets nothing. (And they provide ways to put the request body back if you want to.)

Sunday, July 11, 2010

sendfile 0.7.1

I have just uploaded sendfile 0.7.1 to hackage.

The sendfile library exposes zero-copy sendfile functionality in a portable way. If a platform does not support sendfile, a fallback implementation in Haskell is provided. It currently has zero-copy support for Linux, Darwin, FreeBSD, and Windows.

The sendfile functionality typically reduces CPU-load and (possibly) increases IO throughput.

The new release of sendfile adds the ability to hook into the send loop. This is useful if you want to tickle timeouts or update a progress bar while the file is being sent.

This turned out to be rather tricky because each platform implements sendfile a little differently. But, the point of the sendfile library is to provide a unified interface so that other developers do not have to know any of the platform specific details.

The solution in 0.7.1 is to use a simple, specialized iteratee. Each pass of the sendfile loop can end in one of three states:

(1) The requested number of bytes for that iteration was sent
successfully, and there are more bytes left to send.

(2) Some (possibly 0) bytes were sent, but the file descriptor
would now block if more bytes were written. There are more bytes
left to send.

(3) All the bytes were sent, and there is nothing left to send.

We handle these three cases by using a type with three
constructors:

data Iter
    = Sent Int64 (IO Iter)
    | WouldBlock Int64 Fd (IO Iter)
    | Done Int64

All three constructors provide an Int64 which represents the
number of bytes sent for that particular iteration. (Not the total
byte count).

The Sent and WouldBlock constructors provide IO Iter as their
final argument. Running this IO action will send the next block of
data.

The WouldBlock constructor also provides the Fd for the output
socket. You should not send any more data until the Fd would not
block. The easiest way to do that is to use threadWaitWrite to
suspend the thread until the Fd is available.

A very simple function to drive the Iter might look like:

runIter :: IO Iter -> IO ()
runIter iter =
    do r <- iter
       case r of
         (Done _n) -> return ()
         (Sent _n cont) -> runIter cont
         (WouldBlock _n fd cont) ->
             do threadWaitWrite fd
                runIter cont

You would use it as the first argument to a *IterWith function, e.g.

sendFileIterWith runIter outputSocket "/path/to/file" (2^16)

If we want to do something fancier, such as update timeouts or a progress bar, we can do it in a custom runIter function. If we are using a non-standard I/O manager, we might be able to suspend the thread via a call other than threadWaitWrite.
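
As a rough sketch of such a custom driver, the following variant keeps a running total of bytes sent and hands it to a callback after every iteration. The name runIterWithProgress is made up for this example, and the Iter type is copied locally from the definition above so that the sketch compiles without the sendfile package; real code would use the type exported by the library instead.

```haskell
import Control.Concurrent (threadWaitWrite)
import Data.Int (Int64)
import System.Posix.Types (Fd)

-- Local copy of the Iter type shown above (an assumption made so that
-- this sketch is self-contained).
data Iter
    = Sent Int64 (IO Iter)
    | WouldBlock Int64 Fd (IO Iter)
    | Done Int64

-- | Drive the Iter to completion, calling the callback with the total
-- number of bytes sent so far after each iteration. Returns the total.
runIterWithProgress :: (Int64 -> IO ()) -> IO Iter -> IO Int64
runIterWithProgress report = go 0
  where
    go total iter = do
      r <- iter
      case r of
        Done n -> do
          report (total + n)
          return (total + n)
        Sent n cont -> do
          report (total + n)
          go (total + n) cont
        WouldBlock n fd cont -> do
          report (total + n)
          threadWaitWrite fd  -- wait until the socket is writable again
          go (total + n) cont

main :: IO ()
main = do
  -- A fake send loop standing in for a real file transfer.
  let fake = return (Sent 10 (return (Sent 5 (return (Done 3)))))
  total <- runIterWithProgress (\n -> putStrLn ("sent so far: " ++ show n)) fake
  print total  -- 18
```

A timeout-tickling version would look the same, with the callback resetting a timer instead of printing.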

What Next?


The new version of sendfile will be used to improve the timeout handling in the Haskell web framework, Happstack.

It would be nice if the sendfile library could export a low-level function like:

sendfile :: Fd -> Fd -> Int64 -> Int64 -> IO (Bool, Int64)

It would take the output socket, an input file descriptor, an offset, and a length, and return the number of bytes written and whether the output socket blocked.

Unfortunately, it is not possible to provide a portable implementation of this sendfile function. That would require functions which can operate directly on the Fds. But those functions live in the unix package, which is not portable.

Another non-solution is to have a module like, Network.Socket.SendFile.LowLevel which is only exported on the platforms which provide a low-level sendfile implementation. However, it is my understanding that this is not really allowed by the cabal policy because there would be no way to specify that you require a version of the sendfile library that exports .LowLevel.

So, I believe a more correct solution is to create a *new* package, sendfile-lowlevel, which exports Network.Socket.SendFile.LowLevel. This assumes that there is some way to mark that a package is only available on certain platforms. However, I am not sure if that can be done.

Hopefully the new API provides enough flexibility that there is no need for an even lower-level API to be exposed. If you think you need something lower-level, let me know, and let's see if we can work something out.