Using Lucee's simple parallel processing appropriately
Ben Nadel has been busy recently writing up his experiences having switched from ColdFusion 10 to Lucee 5. Among the excellent posts he has produced is one detailing a somewhat disappointing experiment with the parallel processing option Lucee offers with its higher order iteration methods, where in some cases performance appeared to degrade rather than improve.
Like Ben, I was quite excited about Lucee's easy parallelism when I first read about it, but my first real-world use of it was not just disappointing, it was disastrous. When I attempted to use it to build the results for an auto-complete search, the server promptly crashed (fortunately in dev).
Without going into the forensic detail for which Ben is famous, I'd done some quick and dirty (QnD) testing which also showed the parallel option to be significantly slower than serial iteration. The following example tests serial vs parallel string processing:
// Create an array of strings to process
strings = [];
loop times=1000{
	strings.Append( " please trim me " );
};
// Define our processing closure
processingFunction = function( item ){ item.Trim() };
// Test serial iteration
stopwatch variable="ms"{
	strings.Each( processingFunction );
};
dump( var=ms, label="Serial" );
// Test parallel iteration
stopwatch variable="ms"{
	strings.Each(
		processingFunction
		,true //execute in parallel using up to the default 20 threads
	);
};
dump( var=ms, label="Parallel" );
Typical results on my dev machine are:
Serial: 2ms,
Parallel: 120ms.
Parallel processing in this case seems to be around 60 times slower!
Fast is slow, slow is fast
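The reason for this is that parallel processing is not suited to sets of tasks each of which is quick to execute: the overhead of creating and coordinating the threads outweighs any time saved by running the tasks concurrently.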
In his post Ben references a presentation by Lucee expert Gert Franz which warns of this and defines "quick to execute" as 5ms or less.
If we change the processing function in our QnD test so that each operation is guaranteed to take at least 10ms, we get very different typical results:
// Create an array of strings to process
strings = [];
loop times=1000{
	strings.Append( "zzzzzzz" );
};
// Define our processing closure
processingFunction = function( item ){ Sleep( 10 ) };
// Test serial iteration
stopwatch variable="ms"{
	strings.Each( processingFunction );
};
dump( var=ms, label="Serial" );
// Test parallel iteration
stopwatch variable="ms"{
	strings.Each(
		processingFunction
		,true //execute in parallel using up to the default 20 threads
	);
};
dump( var=ms, label="Parallel" );
Serial: 10,500ms,
Parallel: 900ms.
With each iteration taking at least 10ms, the results are flipped and parallel processing is now 10 times faster.
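As a rough sketch of how the 5ms rule of thumb might be applied in code, the helper below times the closure against a handful of items and only recommends parallel mode if the average cost exceeds the threshold. The function name `shouldRunInParallel` and its `thresholdMs` and `sampleSize` arguments are my own inventions for illustration, not part of Lucee. Note that the sampled items end up being processed twice, so this only makes sense for closures that are safe to repeat.

```cfml
// Hypothetical helper (not a Lucee built-in): samples the task to estimate
// whether each item is slow enough to justify the overhead of threads
function shouldRunInParallel( required array items, required function task, numeric thresholdMs=5, numeric sampleSize=5 ){
	var sampleCount = Min( arguments.sampleSize, arguments.items.Len() );
	if( sampleCount == 0 ){
		return false; // nothing to process, so nothing to gain
	}
	var start = GetTickCount();
	for( var i = 1; i <= sampleCount; i++ ){
		arguments.task( arguments.items[ i ] );
	}
	// Average milliseconds per sampled item
	var meanMs = ( GetTickCount() - start ) / sampleCount;
	return meanMs > arguments.thresholdMs;
}

// Usage with the test data from above
strings = [];
loop times=1000{
	strings.Append( "zzzzzzz" );
};
processingFunction = function( item ){ Sleep( 10 ) };
strings.Each(
	processingFunction
	,shouldRunInParallel( strings, processingFunction ) // true here, as each item takes about 10ms
);
```

Sampling is crude, and a few slow outliers can skew the average, so treat the result as a hint rather than a guarantee.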
Practical uses
With this in mind, here are a few examples of operations from my own apps which have shown a clear improvement in execution time with parallel processing enabled.
1. Database dumps
As part of our backup routines we run a scheduled task from Lucee to execute regular database dumps. There are around a dozen databases of varying sizes and the job would typically take between 18 and 25 seconds to complete. Running the dumps in parallel has cut request times down to between 5 and 10 seconds, which is how long it would take to process the largest database on its own.
2. RSS feed merging
An RSS aggregation function consumes around 20 feeds which may contain duplicate items and merges them into a single, de-duplicated feed. In serial mode, typical execution time would be around 5 seconds. In parallel this was cut to just 500 milliseconds.
3. Log file retrieval
Another regular maintenance job involves downloading log files from a remote server via SFTP, which in serial mode was typically taking up to 15 seconds to complete. Parallel execution of the SFTP operations has reduced the total time to around 3 seconds.
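To give a flavour of the pattern these jobs share, here is a minimal sketch loosely modelled on the feed merging case. It is not my production code: `fetchFeed` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the URLs and `guid` keys are made up. The slow, independent fetches run in parallel via `Map()`, while the de-duplication, which touches shared variables, is left to run serially once all the threads have returned.

```cfml
// Hypothetical stand-in for a slow network fetch: returns the items of one feed
fetchFeed = function( feedUrl ){
	Sleep( 50 ); // simulate network latency
	return [
		{ guid: "shared-item", title: "Appears in every feed" }
		,{ guid: feedUrl & "/item-1", title: "Only in " & feedUrl }
	];
};
// Made-up feed addresses for illustration
feedUrls = [
	"https://example.com/a.xml"
	,"https://example.com/b.xml"
	,"https://example.com/c.xml"
];
// Fetch in parallel: each closure call only returns a value
feeds = feedUrls.Map(
	fetchFeed
	,true //execute in parallel
);
// Merge serially, keeping the first occurrence of each guid
seen = {};
uniqueItems = [];
for( feedItems in feeds ){
	for( item in feedItems ){
		if( !seen.KeyExists( item.guid ) ){
			seen[ item.guid ] = true;
			uniqueItems.Append( item );
		}
	}
}
dump( var=uniqueItems, label="Merged feed" );
```

Keeping the parallel closures free of writes to shared arrays or structs sidesteps any questions about concurrent modification, at the cost of a cheap serial pass at the end.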
Safe thread maximum
For some reason, however, initial testing of the SFTP log retrieval using parallel processing failed with a Java NullPointerException (NPE). Most of the operations appeared to have completed but the request consistently ended with this exception. I was able to address it by limiting the threads used to the number of processors available on the machine (in our case 8), overriding the default 20 thread maximum. Lucee provides a handy environment-specific variable for this:
items.Each(
	downloadlogFileFunction // processing closure
	,true // parallel processing
	,server.system.environment.number_of_processors // max number of threads to use
);
Use judiciously
The lesson here is to guard against over-enthusiasm for shiny new features which appear to promise quick performance wins across the board. Parallel iteration is wonderfully simple to implement in Lucee and can result in some significant efficiencies. But each use case should be carefully assessed first to make sure it's suitable.