Getting started with Elasticsearch using Lucee Part 1

27 September 2018

The ease with which a search function could be set up within an application was one of the many appealing features of ColdFusion during the years we used it.

As this tutorial from 2012 explains, with just 3 tags - <cfcollection>, <cfindex>, and <cfsearch> - you can create, populate and query an index so that the result data is available to your web app.

By the time we moved from CF to Lucee a few years ago, however, we'd already stopped using the built-in search capabilities in favour of Elasticsearch (ES), a standalone Lucene-based product which offers a more modern and flexible way of handling search through its RESTful, JSON-based API. We've also found it to be significantly faster.

There are numerous pre-built ES client libraries for various server languages, including CFML. In this post though, I'll run through some basic CFML for creating, populating, searching and deleting an ES index using Lucee.

1. Install Elasticsearch

There are detailed instructions on how to get ES running in various environments. I prefer the "zip approach" which involves copying files and running a few simple commands. These include setting ES up as a Windows service.

If all goes well you should be able to load the following URL into a browser and get a JSON response from your ES server:

http://localhost:9200

To get ES to do things we will need to issue HTTP requests in Lucee using this base URL.
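
Before going further, it can help to confirm from Lucee itself that the server is reachable. The following is a minimal sketch rather than a definitive check: esVersionFrom() is a hypothetical helper name introduced here, and it assumes the JSON returned by the base URL includes a version.number value.

```cfml
// Hypothetical helper: extracts the server version from the root endpoint's JSON
function esVersionFrom( required string fileContent ){
	if( !IsJSON( fileContent ) )
		return "";
	var info = DeserializeJSON( fileContent );
	// Assumption: the response contains a version struct with a "number" key
	if( IsStruct( info ) && StructKeyExists( info, "version" ) && IsStruct( info.version ) )
		return info.version.number ?: "";
	return "";
}

try{
	// A short timeout stops the page hanging if ES isn't running
	http url="http://localhost:9200" result="result" timeout=5;
	version = esVersionFrom( result.fileContent );
	WriteOutput( Len( version ) ? "Connected to Elasticsearch #version#" : "Unexpected response from ES" );
}
catch( any e ){
	WriteOutput( "Could not reach ES: " & EncodeForHTML( e.message ) );
}
```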

2. Create an index

There's no exact equivalent in ES to <cfcollection action="Create" collection="MyCollection" path="c:\mysite\"> since ES will create a new index (the equivalent of a collection) automatically the first time you index a document into one that doesn't yet exist. You needn't worry about paths as ES handles physical data storage internally.

But it is important to define the structure of the index including its fields, just as you would for a database table. This is done by telling ES the mappings you want. There are many options when it comes to mappings but here we'll just define a simple index mapping to hold blog posts with three searchable text fields: title, summary and body (note: the key/ID field is handled automatically).

The request will be sent to ES as JSON, but we don't need to concatenate JSON strings, which would be tedious. Instead we can build the request body as a CFML structure, then convert it to JSON before sending it as a PUT request.

// Define the mapping: three text fields, each analysed as English
requestBody = {
	mappings: {
		properties: {
			title: { type: "text", analyzer: "english" }
			,summary: { type: "text", analyzer: "english" }
			,body: { type: "text", analyzer: "english" }
		}
	}
};
//Create an index called "blogposts" with the above mapping
http url="http://localhost:9200/blogposts" method="PUT" result="result"{
	httpParam type="header" name="Content-Type" value="application/json";
	httpParam type="body" value=SerializeJSON( requestBody );
};
Dump( result );

You can look at the dumped HTTP result to see if there were any errors.
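
Rather than reading the whole dump, you could reduce the result to a single line. This is only a sketch under a few assumptions: summariseEsResult() is a hypothetical helper, it relies on Lucee exposing a numeric status_code key in the http result, and it assumes ES describes failures under an "error" key in the response body.

```cfml
// Hypothetical helper: turns a Lucee http result into a one-line summary
function summariseEsResult( required struct httpResult ){
	// Assumption: Lucee provides the numeric HTTP status as status_code
	var status = httpResult.status_code;
	var body = IsJSON( httpResult.fileContent ) ? DeserializeJSON( httpResult.fileContent ) : {};
	// Assumption: ES describes a failure under an "error" key in the body
	if( IsStruct( body ) && StructKeyExists( body, "error" ) ){
		// The error may be a struct with a "reason" or, occasionally, a plain string
		var reason = IsStruct( body.error ) ? ( body.error.reason ?: "no reason given" ) : body.error;
		return "Failed (#status#): #reason#";
	}
	return ( status >= 200 && status < 300 ) ? "OK (#status#)" : "Unexpected status #status#";
}
```

Calling WriteOutput( EncodeForHTML( summariseEsResult( result ) ) ) straight after the create request should then tell you at a glance whether the index was accepted, for example if you run the request a second time against an index that already exists.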

To check the mapping definition, you can make the following request:

http url="http://localhost:9200/blogposts/_mapping/" result="result";
Dump( DeserializeJSON( result.fileContent ) );

3. Populate an index

Having created our index, we can now populate it from a standard database query.

data = QueryExecute( "SELECT ID, title, summary, body FROM posts WHERE datePublished IS NOT NULL" );

To add or update a single record, we just specify its key/ID in the URL and send the column name/value pairs as JSON, again using the PUT HTTP method.

row1 = data.rowData( 1 );
requestBody = {
	title: row1.title
	,summary: row1.summary
	,body: row1.body
};
http url="http://localhost:9200/blogposts/_doc/#row1.ID#" method="PUT" result="result"{
	httpParam type="header" name="Content-Type" value="application/json";
	httpParam type="body" value=SerializeJSON( requestBody );
}
Dump( DeserializeJSON( result.fileContent ) );

Adding or updating multiple records at once can be done in a single request using the _bulk API - ideal for loading your initial data. This requires 2 lines of JSON per record: the first containing the key/ID, the second the data to be indexed. In this case we need to assemble the JSON lines separately as each one, including the last, must end with a newline character (which would be missing if we just used SerializeJSON() at the end).

Note that the HTTP method to use is POST.

// savecontent is a fast, memory efficient way of building strings in CFML
savecontent variable="requestBody"{
	for( row in data ){
		keyRow = { index: { _id: row.ID } };
		dataRow = {
			title: row.title
			,summary: row.summary
			,body: row.body
		};
		WriteOutput( SerializeJson( keyRow ) & Chr( 10 ) );
		WriteOutput( SerializeJson( dataRow ) & Chr( 10 ) );
	}
};
http url="http://localhost:9200/blogposts/_bulk" method="POST" {
	httpParam type="header" name="Content-Type" value="application/json";
	httpParam type="body" value=requestBody;
}
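
A bulk request can succeed as a whole while individual records are rejected, so it is worth inspecting the response. The sketch below assumes you add result="result" to the bulk request above and that the response follows the usual _bulk shape: a top-level "errors" flag and an "items" array with one entry per action. bulkFailures() is a hypothetical helper name.

```cfml
// Hypothetical helper: lists the documents a _bulk request failed to index
function bulkFailures( required struct bulkResponse ){
	var failures = [];
	// "errors" is false when every action succeeded, so there is nothing to scan
	if( !( bulkResponse.errors ?: false ) )
		return failures;
	for( var item in bulkResponse.items ){
		// Each item is keyed by its action name; this post only uses "index"
		var outcome = item.index;
		if( StructKeyExists( outcome, "error" ) )
			failures.append( { id: outcome._id, status: outcome.status, reason: outcome.error.reason } );
	}
	return failures;
}
```

With the result captured, Dump( bulkFailures( DeserializeJSON( result.fileContent ) ) ) would show an empty array when everything was indexed, or one struct per rejected record otherwise.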

To check the status of the index, including how many documents it contains, try outputting the result of this request in an HTML "pre" tag:

http url="http://localhost:9200/_cat/indices/blogposts?v" result="result";
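
The output step might look something like this. It is a small sketch, with asPre() being a hypothetical helper name; the "pre" tag preserves the column alignment of the plain-text response, and the content is escaped before being written to the page.

```cfml
// Hypothetical helper: wraps plain text in a pre tag, escaping any markup
function asPre( required string text ){
	return "<pre>" & EncodeForHTML( text ) & "</pre>";
}

try{
	// The v parameter asks ES to include a row of column headings
	http url="http://localhost:9200/_cat/indices/blogposts?v" result="result";
	WriteOutput( asPre( result.fileContent ) );
}
catch( any e ){
	WriteOutput( "Could not reach ES: " & EncodeForHTML( e.message ) );
}
```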

4. Search an index

With our index set up, we can now try a very basic search for the keyword "simplicity".

http url="http://localhost:9200/blogposts/_search?q=simplicity" result="result";
Dump( DeserializeJson( result.filecontent ) );

This will return a struct containing the results of the search along with useful metadata.

[Screenshot: search results returned from ES as a struct]
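
As a rough illustration of navigating that struct, the sketch below flattens the matches into a simple array. It assumes the usual ES response shape, in which the matching documents sit in hits.hits and each carries _id, _score and _source keys; hitsToRows() is a hypothetical helper name.

```cfml
// Hypothetical helper: flattens an ES search response into an array of structs
function hitsToRows( required struct searchResponse ){
	var rows = [];
	// Assumption: matching documents are listed under hits.hits
	for( var hit in searchResponse.hits.hits ){
		rows.append( {
			id: hit._id
			,score: hit._score
			// _source holds the fields we indexed; title may be absent
			,title: hit._source.title ?: ""
		} );
	}
	return rows;
}
```

For example, Dump( hitsToRows( DeserializeJSON( result.fileContent ) ) ) after the search request above would list the ID, relevance score and title of each matching post.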

I'll look in a bit more detail at searching and how to handle the results within a Lucee app in Part 2.

5. Delete a record

Removing an individual record from the index is as simple as specifying its REST URL and using the DELETE HTTP method.

http url="http://localhost:9200/blogposts/_doc/1" method="DELETE";
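
If you need to know whether anything was actually removed, you can capture the response by adding result="result" to the request. The sketch below assumes the response body carries a "result" key set to "deleted" on success (and something else, such as "not_found", when the ID wasn't in the index); wasDeleted() is a hypothetical helper name.

```cfml
// Hypothetical helper: reports whether a delete response removed a document
function wasDeleted( required string fileContent ){
	if( !IsJSON( fileContent ) )
		return false;
	var response = DeserializeJSON( fileContent );
	// Assumption: ES sets "result" to "deleted" when the document was removed
	return IsStruct( response ) && ( response.result ?: "" ) == "deleted";
}
```

You would then call wasDeleted( result.fileContent ) after the DELETE request.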

6. Delete an index

Dropping the entire index is just as easy.

http url="http://localhost:9200/blogposts" method="DELETE";

In Part 2, we'll look more closely at searching an ES index using a basic search term query and how to work with the results in Lucee.
