
Processing large spreadsheet files efficiently in CFML

19 February 2025

It's ten years since the first release of the Lucee Spreadsheet library, as it was called back then.

I wrote it initially to address what seemed to me sub-par support for spreadsheet handling in Railo/Lucee, having just migrated from Adobe ColdFusion (ACF). It has since evolved into Spreadsheet CFML, which provides a wide range of spreadsheet and CSV handling capabilities to both engines.

However, a couple of years ago I added a new feature for reading large files which I couldn't get to work on ACF due to an issue with a third-party Java library.

readLargeFile() now works with ColdFusion

I'm pleased to say that with the current version of the dependency (and version 4.4.0 of the spreadsheet-cfml library), that issue seems to have gone away, meaning streamed reading of large XLSX files is now available with ACF as well as Lucee.

An even more efficient option for large XLSX files

While readLargeFile() works well by avoiding loading the entire spreadsheet file into memory, it still requires a certain amount of buffering space to build and return the full contents of the file as a variable to your CFML code, normally a query object (but optionally as CSV or HTML).

Another more recent addition to the library was readCsv(), which included an option not to return any data at all, but instead to process each row as it is read using a passed function/closure. This is an even more efficient way of dealing with large CSV files, and the latest version 4.5.0 of the library offers the same approach for XLSX files via a new processLargeFile() method.
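For comparison, the CSV equivalent looks something like this (a sketch based on the readCsv() builder; check the library docs for the exact option names):

```cfml
spreadsheet = new spreadsheet.Spreadsheet();
spreadsheet.readCsv( "c:/temp/large.csv" )
  .withRowProcessor( function( row, rowNumber ){
    // "row" is an array of the row's values: process it here
  } )
  .execute();
```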

How it works

Using a "builder" syntax you point to the file you want to process and provide a CFML function that will run on every row as it's streamed in by the library.

myProcessingFunction = function( row, rowNumber ){
  // insert row into DB etc
};
spreadsheet = new spreadsheet.Spreadsheet();
path = "c:/temp/large.xlsx";
spreadsheet.processLargeFile( path )
  .withRowProcessor( myProcessingFunction )
  .execute();

So instead of reading the file into a query and then looping over the complete recordset as it sits in memory, we're applying our custom processing directly as the file is being read, avoiding the need for the additional loop or holding the resulting data in memory.

You can do whatever you want with the streamed row data, such as insert it directly into a database, or pull out particular rows, all without worrying about running out of JVM heap space.
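For example, a row processor that inserts each row straight into a database might look like the following sketch (the table and column names are hypothetical, and I'm assuming each row arrives as an array of cell values):

```cfml
myProcessingFunction = function( row, rowNumber ){
  // bind the first two cell values into a parameterized insert
  queryExecute(
    "INSERT INTO staging_import ( first_name, last_name ) VALUES ( :firstName, :lastName )",
    { firstName: row[ 1 ], lastName: row[ 2 ] }
  );
};
```

Because each row is discarded once your function returns, memory use stays flat no matter how many rows the file contains.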

Of course you're still free to build a full resultset if you wish, for example an array of arrays (something readLargeFile() can't do).

myArray = [];
myProcessingFunction = function( row, rowNumber ){
  myArray.Append( row );
};
spreadsheet = new spreadsheet.Spreadsheet();
path = "c:/temp/large.xlsx";
spreadsheet.processLargeFile( path )
  .withRowProcessor( myProcessingFunction )
  .execute();

Skipping options

Your processing function could also skip rows you don't want to process (perhaps by checking the rowNumber argument), but the library has a couple of options to make common scenarios simpler.
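A manual skip inside the processor might look like this sketch (the sampling logic is just an illustration):

```cfml
myProcessingFunction = function( row, rowNumber ){
  // for instance, only process every 100th row
  if( ( rowNumber mod 100 ) != 0 )
    return;
  // ...process the row
};
```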

Firstly, if your spreadsheet uses the first row for column headers, you can have it ignored as follows:

spreadsheet.processLargeFile( path )
  .withRowProcessor( myProcessingFunction )
  .withFirstRowIsHeader()
  .execute();

Secondly, if the data you want to access doesn't start until several rows down, you can skip a given number of rows:

spreadsheet.processLargeFile( path )
  .withRowProcessor( myProcessingFunction )
  .withSkipFirstRows( 5 )
  .execute();

Other options

Check out the docs for other options which include providing a password, specifying a particular sheet, and adjusting the streaming parameters.
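As a sketch, combining a couple of those options might look like this (I'm assuming the builder method names here match those of readLargeFile(); the password and sheet name are hypothetical, so verify against the docs):

```cfml
spreadsheet.processLargeFile( path )
  .withRowProcessor( myProcessingFunction )
  .withPassword( "mySecret" )
  .withSheetName( "Data" )
  .execute();
```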

Spreadsheet CFML (GitHub)
