The case of the non-standard non-PSGI unbuffered input
It all started with a question. Ain't it always the case? A question, a simple little question?
I was at my desk at work, hacking along as usual, when a message popped up. Suspicious, since I'm unlisted, I probed the lines carefully. The message was from a dear friend, who had been wondering about a special use case for a web framework. Not just any web framework, but the web framework that I loved. Dancer, her name.
He had been stuck with the interesting requirement of writing a web application to upload files. These were no ordinary files, but rather very large ones. The question was whether it was possible, using Dancer, to upload just the header of a file, verify its type, and upon that verification allow or disallow further uploading.
After a short examination, it was clear that Dancer buffers input files before giving you access to them, as many other frameworks do. We left it at that.
Still, the idea of unbuffered input streams and the intriguing question he had raised stayed at the back of my mind, unflinching. When I got home I set out to look into this peculiar concept. I started going over the PSGI specs.
The PSGI Standard:
Dancer, as you might already know, is based on a standard: the PSGI standard, which dictates how servers and applications, most likely web frameworks, should interact. It defines what a server can do, not how it does it, and how a user (in our case: Dancer) will be able to access said features.
PSGI clearly states that servers provide you with an IO::Handle-like object for input, 'psgi.input', so you can read input one chunk at a time, if you wish. A proof of concept was far too easy to write. If a server sets the environment key 'psgix.input.buffered' to false, its input is unbuffered. Truly, this is what I was looking for! All I needed was to find a web server that supported unbuffered input. Apparently, FCGI and uwsgi in Perl provide unbuffered input.
However, on my journey to find out which servers on CPAN support it, I had to look into which server supports which feature, and decided to make a pretty little table out of it. The code can actually be found on GitHub. Back to my story though...
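For the curious, here is roughly what such a proof of concept might look like. It is a minimal sketch, not the code I actually wrote: a bare PSGI application (no Dancer involved) that reads only the first bytes from 'psgi.input' and decides on the spot. The PNG signature check and the $HEADER_BYTES limit are assumptions made up for the illustration, and the buffering flag is looked up under the key the PSGI specification lists among its extensions, 'psgix.input.buffered'.

```perl
use strict;
use warnings;
use IO::Handle;    # makes ->read available on plain filehandles

# Hypothetical values for this sketch: accept only PNG uploads.
my $HEADER_BYTES = 8;
my $PNG_MAGIC    = "\x89PNG\r\n\x1a\n";

# A PSGI application is a code reference that receives the environment hash.
my $app = sub {
    my $env = shift;

    # True means the server has already buffered the whole request body.
    my $buffered = $env->{'psgix.input.buffered'} ? 1 : 0;

    # Read just the first few bytes of the body from the input handle.
    my $input  = $env->{'psgi.input'};
    my $header = '';
    while ( length($header) < $HEADER_BYTES ) {
        my $chunk;
        my $got = $input->read( $chunk, $HEADER_BYTES - length $header );
        last unless $got;    # 0 at end of input, undef on error
        $header .= $chunk;
    }

    # Wrong file type: answer right away without reading any further.
    return [ 415, [ 'Content-Type' => 'text/plain' ], ["Not a PNG\n"] ]
        unless $header eq $PNG_MAGIC;

    # A real application would go on reading the rest of the body here.
    return [ 200, [ 'Content-Type' => 'text/plain' ],
        ["Header accepted (buffered=$buffered)\n"] ];
};
```

Whether an early answer really spares you the rest of the upload depends on the server: with buffered input the whole file has already arrived by the time the application gets to look at it.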
The Tornado solution
My friend's coworker, a Python enthusiast, one might say, had taken a stab at handling this problem. His solution was as hackish as it was intelligent. He had decided to override the methods in the Tornado web server that get the input stream, and to introspect there and decide whether to continue uploading the file or not.
This didn't sit right with me. One might say it was because of Python, and perhaps that's true, but there was something else about it. As I continued to ponder it, the reason suddenly emerged: it was standard-free. Tornado isn't WSGI standard. What does that mean? Two problematic situations:
1. Not having standards means that you're bound to an implementation. This is tricky, since implementations are susceptible to change while interfaces are not. If the server changes its implementation, you have to update your code, which means your code has to be aware of different server versions and keep up with them. If this had been done using a standard, you would work at the interface level, and that doesn't really change.
2. Hacking on the web server means you're bound to that web server. Suppose this web server doesn't work for your OS, or it's slower there, or you're using an old version and get into system packages dependency hell, or you want to run it on a different machine now, or you simply want to switch to a different web server. None of those will work, because you're now attached to that one web server. If this had been done using a standard, you would only care about servers that support the aspects of the standard that you're using. Change the OS, change machine, change web server [to another that supports your features as well], it's all good.
I've been thinking a bit about how to implement this in Dancer cleanly. Apparently we already read from the handle one chunk at a time, so theoretically, if we add callback methods that get the handle and are allowed to return a response, we just might be able to add streamed input relatively easily. You might just see a new branch sprouting for this.
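To make the callback idea a little more concrete, here is one way it could look in plain Perl. This is not Dancer's API, nor anything that exists in Dancer today: read_with_callback and its arguments are hypothetical names, and I'm assuming the callback is handed each chunk (plus a running byte count) rather than the handle itself. The callback stays silent to keep the upload going, or returns a response to cut it short.

```perl
use strict;
use warnings;
use IO::Handle;    # makes ->read available on plain filehandles

# Hypothetical helper: feed the request body to $on_chunk one chunk at a time.
# If the callback returns a defined value, stop reading and hand that value
# back as the response. Otherwise keep going until the input runs dry.
sub read_with_callback {
    my ( $input, $chunk_size, $on_chunk ) = @_;

    my $total = 0;
    while (1) {
        my $chunk;
        my $got = $input->read( $chunk, $chunk_size );
        last unless $got;    # 0 at end of input, undef on error
        $total += $got;

        my $response = $on_chunk->( $chunk, $total );
        return $response if defined $response;
    }

    return;    # whole body consumed, nobody objected
}
```

A callback that inspects the first chunk and returns a PSGI-style response such as [ 415, [ 'Content-Type' => 'text/plain' ], ["Nope\n"] ] would end the loop after a single read, which is exactly the behaviour my friend was after.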
A solution from another mother:
Miyagawa pointed me to Perlbal. It seems like this could be solved using it. However, this exercise is left to the reader. :)