Uploading files via a Taffy REST API in CFML

8 March 2019

We've found ourselves doing quite a bit of REST API building of late, which for the most part involves setting up endpoints to receive and return JSON.

However, there's one area of functionality where the data being received doesn't naturally serialize into the JSON format: file uploads.

Base64 encoding

One approach is to Base64 encode the binary to a string and include that in your JSON body. But this has a cost in terms of performance and file size: Base64 represents every 3 bytes as 4 characters, so the encoded file is roughly a third larger than the original.
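To put a number on that overhead, here is a minimal sketch of the arithmetic for standard padded Base64. The function names base64Length and base64Overhead are made up for this illustration.

```javascript
// Length in characters of the padded Base64 encoding of `byteLength` bytes.
// Every 3 input bytes become 4 output characters; a partial final group
// is padded out to 4 characters with "=".
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Relative size increase: a value of about 0.33 means roughly a third larger.
function base64Overhead(byteLength) {
  return base64Length(byteLength) / byteLength - 1;
}

const fiveMB = 5 * 1024 * 1024;
console.log("Encoded length of a 5MB file:", base64Length(fiveMB));
console.log("Overhead:", (base64Overhead(fiveMB) * 100).toFixed(1) + "%");
```

This only counts the encoded characters. The JSON wrapper around them, and the memory needed to hold and decode the string on the server, come on top.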

Multipart/form-data

Another option is to upload the file in exactly the same way as you probably do routinely in non-API applications: using multipart/form-data, which lets you post the binary and metadata (such as title) as form fields. Having uploaded files this way for years, this was certainly my first thought, and it's also the approach suggested in the documentation for Taffy, our REST framework of choice.

This will work, but as Phil Sturgeon points out, what happens behind the scenes is pretty messy (see his and Adam's examples of the multi-part boundary segments). But Phil's main objection is that, given the API context, it's not JSON.
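To give a feel for those boundary segments, the sketch below assembles a small multipart/form-data body by hand. It is only an illustration of the wire format: in practice the browser builds this for you, and the function name, boundary string, field names and file content here are all invented for the example.

```javascript
// Build a multipart/form-data body by hand to show what goes over the wire.
// `fields` is a plain object of text fields; `file` is a hypothetical
// { fieldName, filename, type, content } description with string content.
function buildMultipartBody(boundary, fields, file) {
  const CRLF = "\r\n";
  const parts = [];
  // One segment per text field, each introduced by "--" + boundary
  for (const [name, value] of Object.entries(fields)) {
    parts.push(
      "--" + boundary + CRLF +
      'Content-Disposition: form-data; name="' + name + '"' + CRLF +
      CRLF +
      value + CRLF
    );
  }
  // The file segment also carries a filename and its own Content-Type
  parts.push(
    "--" + boundary + CRLF +
    'Content-Disposition: form-data; name="' + file.fieldName + '"; filename="' + file.filename + '"' + CRLF +
    "Content-Type: " + file.type + CRLF +
    CRLF +
    file.content + CRLF
  );
  // The closing delimiter has two extra hyphens after the boundary
  parts.push("--" + boundary + "--" + CRLF);
  return parts.join("");
}

const body = buildMultipartBody(
  "----ExampleBoundary123",
  { title: "Annual report" },
  { fieldName: "upload", filename: "report.pdf", type: "application/pdf", content: "%PDF-1.4 ..." }
);
console.log(body);
```

The matching request would declare the boundary in its header, along the lines of Content-Type: multipart/form-data; boundary=----ExampleBoundary123.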

Separate metadata/binary requests

Phil's preferred approach, also favoured by Google, is to make two requests: the first to send the file's metadata as JSON, and the second the raw binary.

Not only is this "cleaner" in terms of what gets sent, it also allows you to make checks on the file's size before the binary is uploaded.

Let's see how we could implement this in CFML, using a simple HTML/JavaScript page as the client.

IMPORTANT: Allowing uploads to your server obviously has security implications. What follows is only intended as a guide and you should ensure you have put in place security measures appropriate to your context.

There are 4 basic steps.

  1. Client: POST the metadata for a new file.
  2. API: Check the metadata, create the record and return an endpoint location for the binary upload.
  3. Client: PUT the binary to the returned URL location.
  4. API: Check the uploaded binary and associate it with the record.

Client calls

As this is an API, the client could be anything which "speaks" HTTP. For this post I've set up a very simple HTML form (view the full code as a Gist) which uses AJAX to make the requests.

(Screenshot: a simple upload form)

When a file is chosen and the form submitted, two AJAX requests will be made.

The first sends 3 items of metadata to the API: 1) the title of the file as entered in the form, 2) the name of the chosen file, and 3) its size in bytes.

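var sendMetaData = function( title, file ){
    var request1 = new XMLHttpRequest();
    var url = "api/upload/file";
    // Asynchronous request: synchronous XHR on the main thread is deprecated
    request1.open( "POST", url, true );
    // Declare the body as JSON so the API's JSON deserializer handles it
    request1.setRequestHeader( "Content-Type", "application/json" );
    request1.onreadystatechange = function(){
      // The API answers 201 Created with the upload URL in the Location header
      if( this.readyState == 4 && this.status == 201 ){
        sendFile( file, this.getResponseHeader( "location" ) );
      }
    };
    // Keys must match the argument names of the API's post() method
    request1.send( JSON.stringify( { title: title, filename: file.name, size: file.size } ) );
};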

The second request is sent to the URL in the Location header returned by the API. It uses the PUT method and contains only the binary file in the body. The location URL will include an ID parameter linking the request to the previously sent metadata.

var sendFile = function( file, url ){
    var request2 = new XMLHttpRequest();
    request2.open( "PUT", url, true );
    // The body is the raw file; the browser takes the Content-Type from file.type
    request2.send( file );
};

(Note: as the requests are being made by a web browser in this case, the content-length header will be sent automatically, and for the PUT request the browser sets the content-type header from the file's type.)
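For comparison, the same two requests could be written with fetch and async/await. This is a sketch rather than the code from the Gist: uploadFile and the injectable fetchImpl parameter are names invented here, and it assumes the API answers the metadata POST with a 201 status and a Location header, and expects a filename key, as the post() method in this article does.

```javascript
// Two-step upload using fetch. `fetchImpl` is injectable so the flow can be
// exercised without a server; in a browser the global fetch is used.
async function uploadFile(title, file, fetchImpl = fetch) {
  // Step 1: send the metadata as JSON
  const metaResponse = await fetchImpl("api/upload/file", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: title, filename: file.name, size: file.size })
  });
  if (metaResponse.status !== 201) {
    throw new Error("Metadata rejected with status " + metaResponse.status);
  }
  // Step 2: PUT the raw binary to the URL the API handed back
  const uploadUrl = metaResponse.headers.get("Location");
  if (!uploadUrl) {
    throw new Error("No Location header in the metadata response");
  }
  // With a File body the browser derives the Content-Type from file.type
  const fileResponse = await fetchImpl(uploadUrl, { method: "PUT", body: file });
  if (!fileResponse.ok) {
    throw new Error("Upload failed with status " + fileResponse.status);
  }
  return fileResponse;
}
```

In a form's submit handler this might be called as uploadFile( titleInput.value, fileInput.files[ 0 ] ), where titleInput and fileInput stand for whatever form elements the page uses, with errors caught and shown to the user.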

The Taffy API

Taffy makes it really easy to set up REST endpoints by creating a component for each resource and defining functions within it to handle each type of request, i.e. GET, POST etc.

Within our /resources/upload/file.cfc we need to define two methods to handle the client requests: post() and put().

Post()

This endpoint expects just the title, filename and size of the file to be sent as JSON in the request body.

public function post( required string title, required string filename, required numeric size ){
    // Reject files exceeding the 5MB size limit (size is in bytes) before they are uploaded
    if( arguments.size > 5 * 1024 * 1024 )
        return rep( { error: "File size exceeds 5MB limit" } ).withStatus( 400 );
    // Create and save a new file object with the passed metadata however you would normally do this
    var file = New File( ID=CreateUUID() );
    file.populate( title=arguments.title, filename=arguments.filename, size=arguments.size );
    file.save();
    // Send back a Location header which includes the file ID
    return noData()
        .withHeaders( { location: "/api/upload/file?ID=#file.getID()#" } )
        .withStatus( 201, "Created" );
}

In this example we've set a limit of 5MB on uploads, and if the file is over that limit we can decline immediately at this point without having to upload and check the binary.

If the size is ok, then we create an object record for the file with a new ID, which we include as a query parameter in the URL sent back to the client as a Location header.
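The client can run the same kind of check before it even sends the metadata, which saves a round trip for obviously unacceptable files. The sketch below assumes the limit is 5MB expressed in bytes. MAX_UPLOAD_BYTES and checkMetadata are hypothetical names, and the API's own check remains the one that counts.

```javascript
// Assumed to mirror the server-side limit; the API remains the authority.
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;

// Returns an array of problems; an empty array means the metadata looks sendable.
function checkMetadata(metadata) {
  const problems = [];
  if (typeof metadata.title !== "string" || metadata.title.trim() === "") {
    problems.push("A title is required");
  }
  if (typeof metadata.filename !== "string" || metadata.filename === "") {
    problems.push("A filename is required");
  }
  if (!Number.isInteger(metadata.size) || metadata.size <= 0) {
    problems.push("The size must be a positive whole number of bytes");
  } else if (metadata.size > MAX_UPLOAD_BYTES) {
    problems.push("File size exceeds 5MB limit");
  }
  return problems;
}

// Example: a 6MB file is flagged before any request is made
console.log(checkMetadata({ title: "Report", filename: "report.pdf", size: 6 * 1024 * 1024 }));
```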

Bypassing Taffy

Before looking at the put() handler we need to address a problem with Taffy: it doesn't support binary content in the request body. By default it expects JSON. It does have a mechanism for defining other types of content such as XML using Custom Deserializers, but the content must be deserializable to a struct, meaning binary content will fail.

We can work around this by writing a simple deserializer which returns a dummy struct to stop the framework from throwing errors, and then handling the binary content directly in our resource method.

binaryDeserializer.cfc

component extends="taffy.core.nativeJsonDeserializer"{
 
    // Dummy deserializer to intercept requests with binary body content
    any function getFromBinary( body )
        taffy_mime="application/pdf,application/msword" /* List of acceptable binary mime types */
    {
        return {};//Dummy data to prevent Taffy errors. We will handle the binary directly in the resource.
    }
 
}

Note that we need to define the mime types we are willing to accept as binary uploads in the taffy_mime attribute of the function metadata. Any other mime types will be handled by the default JSON deserializer and obviously fail. From a security perspective this "whitelist" approach is what we want.

(As a side note, it would be useful to be able to define the acceptable mime types in our deserializer dynamically.)

Put()

This method handles the actual binary upload, bypassing Taffy and extracting it directly from the request.

public function put(){
    // Check the file ID has been sent as a url parameter
    if( !arguments.KeyExists( "ID" ) )
        return rep( { error: "Missing file ID" } ).withStatus( 400 );
    // Load the previously saved file object to match it up with the binary
    var file = loadFileByID( arguments.ID );
    // if the ID is invalid
    if( IsNull( file ) )
        return rep( { error: "Invalid file ID" } ).withStatus( 404 );
    // Check the uploaded file's size per the content-length header against the recorded metadata
    var headers = GetHTTPRequestData().headers;
    if( headers[ "content-length" ] != file.getSize() )
        return rep( { error: "The file size doesn't match your metadata" } ).withStatus( 400 );
    // Get the binary from the request, bypassing Taffy
    var binary = GetHTTPRequestData().content;
    // Check it's actually a binary
    if( !IsBinary( binary ) )
        return rep( { error: "No binary found in the request body" } ).withStatus( 400 );
    // Write the binary to the file system
    FileWrite( file.getPath(), binary );
    // Double check the actual file size against the recorded metadata
    if( GetFileInfo( file.getPath() ).size != file.getSize() ){
        FileDelete( file.getPath() );
        return rep( { error: "The file size doesn't match your metadata" } ).withStatus( 400 );
    }
    // Check the mime type matches the header value
    var actualMimeType = FileGetMimeType( file.getPath(), true );
    if( actualMimeType != headers[ "content-type" ] ){
        FileDelete( file.getPath() );
        return rep( { error: "The binary file mime type does not match the content-type in the request header" } ).withStatus( 400 );
    }
    // All checks passed. Do any further processing if required then return a successful response
    return rep( { message: "File successfully uploaded", fileID: file.getID() } );
}

Although we checked the file size in the previously sent metadata, and Taffy has made sure the mime type in the header of this request is acceptable, our put() method repeats those checks against the file itself, because the size metadata and the content-length and content-type headers could all be forged.
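To illustrate why the declared content-type alone proves nothing, here is a conceptual sketch of a content-based check: it looks at the first bytes of the data for the "%PDF-" signature that PDF files start with. It is written in JavaScript purely for illustration, looksLikePdf is a made-up name, and it is not a description of what FileGetMimeType() does internally.

```javascript
// Returns true when the bytes start with the PDF signature "%PDF-".
function looksLikePdf(bytes) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) {
    return false;
  }
  return signature.every((value, index) => bytes[index] === value);
}

// A request can claim application/pdf in its header while carrying anything
const claimedPdf = new TextEncoder().encode("<script>alert(1)</script>");
const realPdfStart = new TextEncoder().encode("%PDF-1.7 ...");
console.log(looksLikePdf(claimedPdf), looksLikePdf(realPdfStart));
```

A signature check like this is only a first filter: a file can start with the right bytes and still be malicious, so it complements rather than replaces your other security measures.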

Never trust client input.

Conclusion

File uploads in CFML are relatively straightforward using the "traditional" multipart/form-data method, and in comparison this alternative approach might seem complex.

But there are clear benefits to the two-step approach in terms of screening out unacceptable files before they consume bandwidth, and in terms of data format consistency, especially when used within an API.

Finally, Phil points out a further potential advantage in separating the binary upload into a separate request:

What's cool about this approach, is that URL could be part of your main API, or it could be a totally different service. It could be a direct-to-S3 URL, or some Go service, or anything.

Binary transfers can be a resource hog, so redirecting them to a dedicated service could remove that burden from your API and thereby improve its performance.
