I have just pushed some patches which affect the way the Request body and RqData are handled in happstack 0.6. This contains user-visible changes which will affect you if you:
- Use RqData
- Directly use the rqBody field in Request
- Directly use the rqInput field in Request
- Directly work with the Input type
- Allow file uploads
Some of the changes fix bugs (design flaws), and others are for new features and functionality. The non-compatible API changes are pretty small, so it should be easy to port your code. It basically comes down to:
- getDataFn, withDataFn, etc take an extra argument of the type BodyPolicy
- getDataFn, withDataFn, etc return Either [String] a instead of Maybe a
- the inputValue field of the Input type is now Either FilePath L.ByteString instead of L.ByteString
- you have to explicitly import the module Happstack.Server.RqData
In this post I will describe what motivated these changes. I am also hoping to get feedback on these changes before we release 0.6, since it will be less painful to make further changes now.
the Request body and space usage
In the old code the Request type stores the request body as a simple lazy ByteString:
> newtype RqBody = Body { unBody :: L.ByteString } deriving (Read,Show,Typeable)
>
> data Request = Request { ...
> , rqBody :: RqBody
> }
This feels nice, because it is a simple, pure value. Unfortunately, it is really not a great idea in practice. The request body does not initially require any space, because it is an unevaluated lazy ByteString. But the ServerPart holds the Request in its environment, and that means the garbage collector cannot free the RqBody as you evaluate it. If the request body contained gigabytes of data, that could be disastrous.
The solution in Happstack 0.6 is to use an MVar to hold the request body:
>
> data Request = Request { ...
> , rqBody :: MVar RqBody
> }
Instead of using rqBody directly, it is better to use takeRequestBody, so that your code will not break if we switch to IORef or something else.
> takeRequestBody :: Request -> IO (Maybe RqBody)
> takeRequestBody rq = tryTakeMVar (rqBody rq)
Now, when you process the RqBody, the Request will not be holding onto it, so the garbage collector can free it (assuming your code does not hold onto it and introduce a new space leak).
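The take-once behaviour can be seen in a small standalone sketch. The types below are toy stand-ins (a String replaces the lazy ByteString, and Request has only the one field); only takeRequestBody matches the definition shown above.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, tryTakeMVar)

-- Toy stand-ins for the real types; String replaces the lazy ByteString.
newtype RqBody = Body { unBody :: String }

newtype Request = Request { rqBody :: MVar RqBody }

-- Same definition as in the post: the first call empties the MVar.
takeRequestBody :: Request -> IO (Maybe RqBody)
takeRequestBody rq = tryTakeMVar (rqBody rq)

main :: IO ()
main = do
  rq     <- Request <$> newMVar (Body "name=alice")
  first  <- takeRequestBody rq
  second <- takeRequestBody rq
  print (fmap unBody first)   -- Just "name=alice"
  print (fmap unBody second)  -- Nothing
```

After the first call the Request no longer references the body, so the only remaining references are the ones your own code holds.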
This does have a drawback, however. A ServerPart can call mzero at any time, and processing will move on to the next ServerPart. However, if you have already taken the RqBody, then the next ServerPart may be missing critical data it needs. But if we left the RqBody intact, that would result in the space leak. I think that in practice, if a ServerPart made enough progress that it started consuming the RqBody and then failed, it is unlikely that another ServerPart would succeed and need the RqBody. If another ServerPart succeeds, it is probably just a 404 Not Found handler or something similar, which does not need the request body. So it seems better to have the default behavior be the more space-friendly solution.
We will also provide peekRequestBody and/or putRequestBody functions so that you can opt to leave the request body intact. It is up to you to be sensible about using them.
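Since these functions are only planned, the following is a guess at what they could look like on top of the MVar, not their actual definitions. The signatures of peekRequestBody and putRequestBody here are assumptions, and the types are the same toy stand-ins as before.

```haskell
import Control.Concurrent.MVar
  (MVar, newMVar, tryPutMVar, tryReadMVar, tryTakeMVar)

-- Toy stand-ins for the real types.
newtype RqBody = Body { unBody :: String }

newtype Request = Request { rqBody :: MVar RqBody }

takeRequestBody :: Request -> IO (Maybe RqBody)
takeRequestBody = tryTakeMVar . rqBody

-- Hypothetical: look at the body without removing it.
peekRequestBody :: Request -> IO (Maybe RqBody)
peekRequestBody = tryReadMVar . rqBody

-- Hypothetical: put a body back; False if one is already present.
putRequestBody :: Request -> RqBody -> IO Bool
putRequestBody rq = tryPutMVar (rqBody rq)

main :: IO ()
main = do
  rq       <- Request <$> newMVar (Body "name=alice")
  peeked   <- peekRequestBody rq
  taken    <- takeRequestBody rq
  gone     <- takeRequestBody rq
  restored <- maybe (return False) (putRequestBody rq) taken
  again    <- takeRequestBody rq
  print (fmap unBody peeked)  -- Just "name=alice"
  print (fmap unBody taken)   -- Just "name=alice"
  print (fmap unBody gone)    -- Nothing
  print restored              -- True
  print (fmap unBody again)   -- Just "name=alice"
```

Peeking or putting the body back keeps it reachable from the Request, which brings back exactly the space behaviour the MVar was meant to avoid.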
BodyInput and space usage
In RqData, the cookies, QUERY_STRING, and request body (when appropriate) are parsed into a [(String, Input)], where String is the name of the key, and Input is the value.
In Happstack 0.6, Input will be the type:
> data Input = Input
> { inputValue :: Either FilePath L.ByteString
> , inputFilename :: Maybe FilePath
> , inputContentType :: ContentType
> } deriving (Show,Read,Typeable)
In Happstack 0.5 the inputValue is simply a L.ByteString. Once again, this seems fine at first. After all, the inputValues are lazy ByteStrings, so we can process them lazily, right? Well, not quite. In the unprocessed request body, the key/value pairs are laid out like this:
key1
value1
key2
value2
key3
value3
key4
value4
...
If we were to consume the key/value pairs in a sequential manner, then we would be ok. But generally we want to use functions which can look up a specific key. Imagine we want to look up key4. In order to do that we have to first read in all the preceding key/value pairs. If we knew we only cared about key4, then we could just toss the rest. But with the monadic RqData code we don't know that. (A future post will talk about an arrow-based alternative where we do know that.) So, we have to store all the key/value pairs in case we want to look up key1 after key4.
In Happstack 0.5, we store all those values in RAM. But some of those values might be (huge) files. That clearly isn't going to work. So we once again trade off a bit of simplicity/elegance for the practical matter of not having unlimited amounts of RAM. Instead we store some values in RAM and some values on the disk. How do we decide what goes where? That brings us to BodyPolicy.
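Code that consumes an Input now has to handle both storage locations. As a rough sketch, a helper like the following could hide the difference. The Input type is a simplified copy of the one above (a String stands in for ContentType), and inputBytes is a hypothetical helper, not part of Happstack.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Simplified copy of the Input type; String stands in for ContentType.
data Input = Input
  { inputValue       :: Either FilePath L.ByteString
  , inputFilename    :: Maybe FilePath
  , inputContentType :: String
  }

-- Hypothetical helper: fetch the bytes wherever they were stored.
-- Left means the value was saved to a file, Right means it is in RAM.
inputBytes :: Input -> IO L.ByteString
inputBytes input = either L.readFile return (inputValue input)

main :: IO ()
main = do
  let inRam = Input (Right (L.pack "alice")) Nothing "text/plain"
  bytes <- inputBytes inRam
  L.putStrLn bytes  -- prints alice
```

For large uploads you would usually work with the FilePath directly (for example, by moving the file) rather than reading it back in.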
BodyPolicy
When parsing the request body, we need some way to decide what values should be stored in RAM and what values should be saved to disk. Additionally, we want to impose limits on how much data can be stored in either location. If a user decides to post the contents of /dev/random you are likely to want to cut them off at some point. However, the specific values for the quotas are application specific. In fact, they may be specific to the particular form that is being processed. For example, an admin user might have higher quotas than a regular user.
The answers to these questions are provided by the BodyPolicy, which looks like:
> data BodyPolicy
> = BodyPolicy { inputWorker :: Int64 -> Int64 -> Int64 -> InputWorker
> , maxDisk :: Int64 -- ^ maximum bytes to save to disk (files)
> , maxRAM :: Int64 -- ^ maximum bytes to hold in RAM
> , maxHeader :: Int64 -- ^ maximum header size (this only affects headers in the multipart/form-data)
> }
The inputWorker
is the function that actually decides where values should be saved, and implements the quotas. Its Int64
arguments are the quotas for the disk, ram, and other headers which don't really get saved, but which can temporarily take up space. The next three fields are the values to pass to the inputWorker
.
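To make the idea of a quota-enforcing placement decision concrete, here is a deliberately simplified model. It does not use the real InputWorker type; Placement, Budget, and place are hypothetical names, the rule "files go to disk, everything else stays in RAM" is an assumption, and the header quota is left out.

```haskell
import Data.Int (Int64)

-- Where a value ends up (hypothetical type, not part of Happstack).
data Placement = InRAM | OnDisk | OverQuota
  deriving (Show, Eq)

-- Bytes still available in each location.
data Budget = Budget { diskLeft :: Int64, ramLeft :: Int64 }
  deriving Show

-- Assumed rule: file uploads go to disk, other values stay in RAM,
-- and anything that does not fit in its quota is rejected.
place :: Budget -> Bool -> Int64 -> (Placement, Budget)
place b isFile size
  | isFile, size <= diskLeft b    = (OnDisk, b { diskLeft = diskLeft b - size })
  | not isFile, size <= ramLeft b = (InRAM,  b { ramLeft  = ramLeft b - size })
  | otherwise                     = (OverQuota, b)

main :: IO ()
main = do
  let b0       = Budget { diskLeft = 1000, ramLeft = 100 }
      (p1, b1) = place b0 False 40   -- small form field
      (p2, b2) = place b1 True 800   -- file that fits on disk
      (p3, _)  = place b2 True 800   -- second file exceeds the disk quota
  print [p1, p2, p3]  -- [InRAM,OnDisk,OverQuota]
```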
In most cases, you do not need to write your own inputWorker. It is sufficient to use the defaultBodyPolicy:
> defaultBodyPolicy :: FilePath -> Int64 -> Int64 -> Int64 -> BodyPolicy
The first argument is the directory to store temporary files in, and the next three arguments are the quota values. I am not going to cover defaultBodyPolicy in detail in this post, but it is well documented in the Happstack Crash Course.
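As a quick illustration, a policy could be built like this. The directory and the quota values are made up for the example, and the order of the quotas (disk, RAM, headers) is assumed to follow the field order of BodyPolicy shown above.

```haskell
module UploadPolicy where

import Happstack.Server.RqData (BodyPolicy, defaultBodyPolicy)

-- Illustrative values only: temporary files go in /tmp/, with up to
-- 10 MB on disk, 64 KB in RAM, and 64 KB for multipart headers.
uploadPolicy :: BodyPolicy
uploadPolicy =
  defaultBodyPolicy "/tmp/" (10 * 1024 * 1024) (64 * 1024) (64 * 1024)
```

Because the quotas are ordinary arguments, a form that accepts large uploads can use a more generous policy than the rest of the application.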
Improvements to RqData
The new RqData module also includes a number of new features.
There is now an Applicative functor instance for RqData. The applicative functor instance accumulates errors. This means if you try to look up multiple invalid keys, the error message will report all the missing values, not just the first one. This is nice when you are debugging your code, and is also nice if you provide a web service (REST API, etc) and want to provide your API users with detailed error messages instead of "Invalid Request".
We now provide two filters (body and queryString) which limit the scope of the look* functions to either the request body or the QUERY_STRING.
A new function lookFile is provided to assist with handling file uploads.
A new function checkRq is provided to help you convert request parameters to Haskell types, or to check that a value meets some conditions.
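The kind of conversion checkRq is meant for can be sketched in plain Haskell. The check function below is a toy analogue operating on Either [String] a rather than on RqData, and parseAge is a hypothetical validator; see the Crash Course for the real checkRq.

```haskell
import Text.Read (readMaybe)

-- Stand-in for a lookup result, using the new Either [String] a shape.
type Result a = Either [String] a

-- Toy analogue of checkRq: run a validator on a successful lookup and
-- turn its error message into a lookup failure.
check :: Result a -> (a -> Either String b) -> Result b
check (Left errs) _ = Left errs
check (Right x)   f = either (\e -> Left [e]) Right (f x)

-- Hypothetical validator: convert a parameter to Int and check a condition.
parseAge :: String -> Either String Int
parseAge s = case readMaybe s of
  Just n | n >= 0    -> Right n
         | otherwise -> Left ("age must be non-negative: " ++ s)
  Nothing            -> Left ("age is not a number: " ++ s)

main :: IO ()
main = do
  print (check (Right "42") parseAge)   -- Right 42
  print (check (Right "abc") parseAge)  -- Left ["age is not a number: abc"]
  print (check (Left ["missing key: age"]) parseAge)
  -- Left ["missing key: age"]
```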
Summary
This post gives some of the background on the changes to how we handle the request body and form data. To actually see what the changes look like in practice, you should check out the RqData section in the Happstack Crash Course. It gives detailed examples of all the features and changes I talked about in this post. I have also updated the haddock documentation in darcs.
I would love to hear your opinions. Do you love the changes? Hate the changes? Have better ideas about how to solve the problems? In terms of handling the raw request body, I believe both Yesod and Snap use the same basic approach: the first handler to try to use the request body gets the whole thing, and everyone else gets nothing. (And they provide ways to put the request body back if you want to.)