Handling of big file uploads directly to disk
This work is based on the example at http://www.cherrypy.org/wiki/FileUpload, adapted to CherryPy 3.0.
Main differences
- The filter is replaced by a tool that disables CherryPy's request body processing
- Default timeouts changed: the response timeout is raised to 1 hour and the server socket timeout to 60s
- Default request body size limit removed (CherryPy's default is 100MB)
- The temporary file used by cgi.FieldStorage is now a tempfile.NamedTemporaryFile, so the uploaded file can be hard-linked to its destination instead of copied after the HTTP upload; with big files this matters for both speed and disk space.
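The last point relies on hard links. While the NamedTemporaryFile is still open, os.link gives its data a second name. When the temporary file is closed and its original name is deleted, the data survives under the new name and no bytes are copied. The Python 3 sketch below shows only that mechanism and does not use CherryPy. It assumes a POSIX filesystem. The helper name save_via_hard_link and the destination path are illustrative. Unlike the code further down, it creates the temporary file in the destination directory, because hard links cannot cross filesystems.

```python
import os
import tempfile


def save_via_hard_link(data, dest_path):
    """Write data to a named temporary file, then hard-link it to dest_path.

    Hypothetical helper. Assumes a POSIX filesystem; the temporary file is
    created next to dest_path because hard links cannot cross filesystems.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    with tempfile.NamedTemporaryFile(dir=dest_dir) as tmp:
        tmp.write(data)
        tmp.flush()
        # A second directory entry for the same inode: no data is copied.
        os.link(tmp.name, dest_path)
    # Leaving the with-block closes tmp and deletes only its temporary
    # name; dest_path still refers to the written data.
    return dest_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = save_via_hard_link(b"hello", os.path.join(d, "upload.bin"))
        with open(target, "rb") as f:
            print(f.read())  # prints b'hello'
```

Closing the temporary file and then renaming it would not work. NamedTemporaryFile deletes its own name on close, which is why the original code links instead of renaming.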
The code
#!/usr/bin/python2.4
import cherrypy
import cgi
import tempfile
import os

__author__ = "Ex Vito"


class myFieldStorage(cgi.FieldStorage):
    """Our version uses a named temporary file instead of the default
    non-named file; keeping it visible (named) allows us to create a 2nd
    link after the upload is done, thus avoiding the overhead of making
    a copy to the destination filename."""

    def make_file(self, binary=None):
        return tempfile.NamedTemporaryFile()


def noBodyProcess():
    """Sets cherrypy.request.process_request_body = False, giving us
    direct control of the file upload destination. By default cherrypy
    loads it to memory; we are directing it to disk."""
    cherrypy.request.process_request_body = False

cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', noBodyProcess)


class fileUpload:
    """fileUpload cherrypy application"""

    @cherrypy.expose
    def index(self):
        """Simplest possible HTML file upload form. Note that the encoding
        type must be multipart/form-data."""
        return """
            <html>
            <body>
                <form action="upload" method="post" enctype="multipart/form-data">
                    File: <input type="file" name="theFile"/> <br/>
                    <input type="submit"/>
                </form>
            </body>
            </html>
            """

    @cherrypy.expose
    @cherrypy.tools.noBodyProcess()
    def upload(self, theFile=None):
        """upload action

        We use our variation of cgi.FieldStorage to parse the MIME
        encoded HTML form data containing the file."""

        # the file transfer can take a long time; by default cherrypy
        # limits responses to 300s; we increase it to 1h
        cherrypy.response.timeout = 3600

        # convert the header keys to lower case
        lcHDRS = {}
        for key, val in cherrypy.request.headers.iteritems():
            lcHDRS[key.lower()] = val

        # at this point we could limit the upload on content-length...
        # incomingBytes = int(lcHDRS['content-length'])

        # create our version of cgi.FieldStorage to parse the MIME encoded
        # form data where the file is contained
        formFields = myFieldStorage(fp=cherrypy.request.rfile,
                                    headers=lcHDRS,
                                    environ={'REQUEST_METHOD': 'POST'},
                                    keep_blank_values=True)

        # we now create a 2nd link to the file, using the submitted
        # filename; if we renamed, there would be a failure because
        # the NamedTemporaryFile, used by our version of cgi.FieldStorage,
        # explicitly deletes the original filename
        theFile = formFields['theFile']
        # keep only the last path component of the client-supplied name,
        # so a name such as "../../etc/passwd" cannot escape /tmp
        destName = os.path.basename(theFile.filename)
        os.link(theFile.file.name, os.path.join('/tmp', destName))

        return "ok, got it filename='%s'" % destName


# remove any limit on the request body size; cherrypy's default is 100MB
# (maybe we should just increase it?)
cherrypy.server.max_request_body_size = 0

# increase server socket timeout to 60s; we are more tolerant of bad
# quality client-server connections (cherrypy's default is 10s)
cherrypy.server.socket_timeout = 60

cherrypy.quickstart(fileUpload())
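The commented-out line in upload() suggests rejecting oversized uploads before parsing the body. Below is a minimal Python 3 sketch of that check. check_content_length is a hypothetical helper, and treating max_bytes = 0 as "no limit" simply mirrors the max_request_body_size = 0 setting above. Inside the handler you might call it on lcHDRS and turn a ValueError into cherrypy.HTTPError(413).

```python
def check_content_length(headers, max_bytes):
    """Return the declared Content-Length, or raise ValueError.

    Hypothetical helper. `headers` is assumed to be a dict with lower-cased
    keys, like lcHDRS in upload(). max_bytes == 0 disables the limit.
    """
    raw = headers.get("content-length")
    if raw is None:
        # e.g. a chunked upload; this sketch simply rejects it
        raise ValueError("missing Content-Length header")
    try:
        length = int(raw)
    except ValueError:
        raise ValueError("invalid Content-Length: %r" % (raw,)) from None
    if length < 0:
        raise ValueError("negative Content-Length")
    if max_bytes and length > max_bytes:
        raise ValueError("upload of %d bytes exceeds limit of %d"
                         % (length, max_bytes))
    return length


if __name__ == "__main__":
    print(check_content_length({"content-length": "1024"}, 2048))  # prints 1024
```

The header only states what the client claims. A client can still send more or fewer bytes, so this check is an early rejection and not a hard guarantee.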
Possible Improvements
- Maybe we don't need to lower-case the headers for the cgi.FieldStorage invocation?
- os.link will fail if the destination name already exists; this should be handled somehow
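One possible way to handle the name collision is sketched below in Python 3. It is not part of the original code. link_unique is a hypothetical helper, and the "name-1.ext" numbering scheme is an assumption. It also strips directory components from the client-supplied name, including Windows-style paths that some clients send.

```python
import errno
import os
import tempfile


def link_unique(src, dest_dir, filename, max_tries=1000):
    """Hard-link src into dest_dir, picking a free name.

    Hypothetical helper: tries "name.ext", then "name-1.ext", "name-2.ext", ...
    and returns the path that was actually created.
    """
    # Keep only the last path component; some clients send a full
    # Windows path, so treat backslashes as separators too.
    base = os.path.basename(filename.replace("\\", "/"))
    if base in ("", ".", ".."):
        base = "upload"
    stem, ext = os.path.splitext(base)
    for i in range(max_tries):
        candidate = base if i == 0 else "%s-%d%s" % (stem, i, ext)
        path = os.path.join(dest_dir, candidate)
        try:
            os.link(src, path)
            return path
        except FileExistsError:
            continue  # name taken; try the next suffix
    raise FileExistsError(errno.EEXIST, "no free name for %r" % base, dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "upload.tmp")
        with open(src, "wb") as f:
            f.write(b"data")
        print(link_unique(src, d, "report.txt"))           # <d>/report.txt
        print(link_unique(src, d, "C:\\docs\\report.txt"))  # <d>/report-1.txt
```

The helper does not check os.path.exists first. It lets os.link fail and then tries the next name, so two concurrent uploads cannot both claim the same name between a check and the link.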
Final Notes
My Python and CherryPy experience is limited. You are welcome to improve and/or correct the code and style.

