
Handling of big file uploads directly to disk

This work is based on the example at http://www.cherrypy.org/wiki/FileUpload, adapted to CherryPy version 3.0.

Main differences

  • Filter replaced by a tool that disables CherryPy's request body processing
  • Default timeouts changed (response timeout raised to 1 hour, server socket timeout to 60 s)
  • Default request body size limit (100 MB) removed
  • Temporary file used by cgi.FieldStorage changed to tempfile.NamedTemporaryFile, so the uploaded file can be given its final name with a hard link instead of being copied after the HTTP upload; for big files this saves both time and disk space.
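The last point can be illustrated outside CherryPy with a small stdlib sketch (Python 3 on a POSIX system with hard-link support). Data is written to a tempfile.NamedTemporaryFile, os.link gives the same inode a second, permanent name, and closing the temporary file removes only the temporary name, so no bytes are copied. The function name save_without_copy is hypothetical, and creating the temporary file with dir= in the destination directory is an assumption added here, because hard links cannot cross filesystems.

```python
import os
import tempfile


def save_without_copy(data, dest_dir, dest_name):
    """Hypothetical helper: write data to a named temporary file, then
    give it its final name with a hard link instead of copying it."""
    dest_path = os.path.join(dest_dir, dest_name)
    # create the temp file in dest_dir so both names are on the same filesystem
    with tempfile.NamedTemporaryFile(dir=dest_dir) as tmp:
        tmp.write(data)
        tmp.flush()
        # hard link: dest_path now points at the same inode, nothing is copied
        os.link(tmp.name, dest_path)
    # leaving the with-block closes tmp, which unlinks only the temporary name
    return dest_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = save_without_copy(b"hello", d, "upload.bin")
        with open(path, "rb") as f:
            print(f.read())  # b'hello'
        print(os.listdir(d))  # ['upload.bin']
```

The same reasoning explains why the upload handler below links rather than renames: the temporary file object still expects to delete its own name when it is closed.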

The code

#!/usr/bin/python2.4

import cherrypy
import cgi
import tempfile
import os


__author__ = "Ex Vito"




class myFieldStorage(cgi.FieldStorage):

        """Our version uses a named temporary file instead of the default
        unnamed one; because the file stays visible (named), we can create a
        second hard link to it after the upload is done, thus avoiding the
        overhead of copying it to the destination filename."""

        def make_file(self, binary=None):

                return tempfile.NamedTemporaryFile()




def noBodyProcess():

        """Sets cherrypy.request.process_request_body = False, giving
        us direct control of the file upload destination. By default
        CherryPy processes the request body itself; we parse it with our
        own FieldStorage instead and direct the file to disk."""

        cherrypy.request.process_request_body = False



cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', noBodyProcess)





class fileUpload:

        """fileUpload cherrypy application"""

        @cherrypy.expose
        def index(self):

                """Simplest possible HTML file upload form. Note that the encoding
                type must be multipart/form-data."""

                html = """
                        <html>
                        <body>
                                <form action="upload" method="post" enctype="multipart/form-data">
                                        File: <input type="file" name="theFile"/> <br/>
                                        <input type="submit"/>
                                </form>
                        </body>
                        </html>
                        """

                return html



        @cherrypy.expose
        @cherrypy.tools.noBodyProcess()
        def upload(self, theFile=None):

                """upload action

                We use our variation of cgi.FieldStorage to parse the MIME
                encoded HTML form data containing the file."""


                # the file transfer can take a long time; by default cherrypy
                # limits responses to 300s; we increase it to 1h

                cherrypy.response.timeout = 3600


                # convert the header keys to lower case

                lcHDRS = {}
                for key in cherrypy.request.headers.keys():
                        lcHDRS[key.lower()] = cherrypy.request.headers[key]


                # at this point we could limit the upload on content-length...
                # incomingBytes = int(lcHDRS['content-length'])


                # create our version of cgi.FieldStorage to parse the MIME encoded
                # form data where the file is contained

                formFields = myFieldStorage(fp=cherrypy.request.rfile, headers=lcHDRS, environ={'REQUEST_METHOD':'POST'}, keep_blank_values=True)


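                # we now create a 2nd link to the file, using the submitted
                # filename; renaming would lead to a failure, because the
                # NamedTemporaryFile used by our version of cgi.FieldStorage
                # explicitly deletes its original name when it is closed.
                # os.link only works if the temporary directory and /tmp are
                # on the same filesystem.

                theFile = formFields['theFile']
                # keep only the base name: some browsers send the full client
                # path, and a name such as '../x' must not escape /tmp
                destName = os.path.basename(theFile.filename.replace('\\', '/'))
                os.link(theFile.file.name, os.path.join('/tmp', destName))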


                # reply to the user

                return "ok, got it filename='%s'" % theFile.filename




# remove any limit on the request body size; CherryPy's default is 100 MB
# (maybe we should just increase it?)

cherrypy.server.max_request_body_size = 0


# increase the server socket timeout to 60 s to be more tolerant of
# poor-quality client-server connections (CherryPy's default is 10 s)

cherrypy.server.socket_timeout = 60


# ok, let's start the server

cherrypy.quickstart(fileUpload())
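The handler only hints at a Content-Length check (the commented-out incomingBytes line), and the server-wide limit is switched off. A minimal sketch of such a check, assuming a dict with lower-case header names like lcHDRS, could look like this; the names check_content_length, UploadTooLarge and the 2 GiB cap are hypothetical. Inside the real handler one would presumably turn UploadTooLarge into cherrypy.HTTPError(413) before calling myFieldStorage. Note that Content-Length is only what the client declares, and chunked requests carry no such header.

```python
# Hypothetical cap; the code above removes CherryPy's limit entirely.
MAX_UPLOAD_BYTES = 2 * 1024 ** 3  # 2 GiB


class UploadTooLarge(Exception):
    """Raised when the declared body size exceeds the allowed maximum."""


def check_content_length(lc_headers, max_bytes=MAX_UPLOAD_BYTES):
    """Return the declared request body size from a dict with lower-case keys.

    Raises ValueError if the header is missing or not a non-negative integer,
    and UploadTooLarge if it exceeds max_bytes.
    """
    raw = lc_headers.get("content-length")
    if raw is None:
        raise ValueError("request has no Content-Length header")
    incoming = int(raw)  # int() itself raises ValueError on non-numeric text
    if incoming < 0:
        raise ValueError("negative Content-Length: %d" % incoming)
    if incoming > max_bytes:
        raise UploadTooLarge("%d bytes exceeds the %d byte limit" % (incoming, max_bytes))
    return incoming


if __name__ == "__main__":
    print(check_content_length({"content-length": "1048576"}))  # 1048576
    try:
        check_content_length({"content-length": str(3 * 1024 ** 3)})
    except UploadTooLarge as exc:
        print("rejected:", exc)
```

Checking the header before parsing lets the server refuse an oversized upload without first writing gigabytes to disk.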


Possible Improvements

  • Maybe we don't need to lower-case the header names for the cgi.FieldStorage invocation?
  • os.link fails with OSError (EEXIST) if the destination name already exists; this case is not handled yet.
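One possible way to handle the name collision is to retry with a numbered suffix. The helper link_unique and its name-N.ext scheme are hypothetical, not part of the original code. It relies on link() refusing to overwrite an existing name (EEXIST), which makes "check and create" a single atomic step, unlike testing os.path.exists first. It also strips any client-side directory part from the submitted filename.

```python
import errno
import os
import tempfile


def link_unique(src, dest_dir, filename, max_tries=1000):
    """Hypothetical helper: hard-link src into dest_dir as filename, or as
    name-1.ext, name-2.ext, ... if that name is taken. Returns the new path."""
    # keep only the base name; the client may send a full path or '../'
    name = os.path.basename(filename.replace("\\", "/"))
    if not name:
        raise ValueError("empty filename")
    base, ext = os.path.splitext(name)
    for i in range(max_tries):
        candidate = name if i == 0 else "%s-%d%s" % (base, i, ext)
        dest = os.path.join(dest_dir, candidate)
        try:
            os.link(src, dest)  # fails with EEXIST instead of overwriting
            return dest
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise  # e.g. EXDEV when dest_dir is on another filesystem
    raise OSError(errno.EEXIST, "no free name for %r in %s" % (name, dest_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "upload.tmp")
        with open(src, "wb") as f:
            f.write(b"data")
        print(os.path.basename(link_unique(src, d, "report.pdf")))  # report.pdf
        print(os.path.basename(link_unique(src, d, "report.pdf")))  # report-1.pdf
```

In the upload handler, the os.link call could be replaced by something like link_unique(theFile.file.name, '/tmp', theFile.filename), returning the chosen name to the user.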

Final Notes

My Python and CherryPy experience is limited. You are welcome to improve and/or correct the code and style.
