nGram filtering for Elasticsearch in Drupal 8

Custom nGram filters for Elasticsearch using Drupal 8 and Search API
Fri, July 27th, 2018
blueoakinteractive

A powerful content search can be built in Drupal 8 using the Search API and Elasticsearch Connector modules. Out of the box, you can select which entities, fields, and properties are indexed into an Elasticsearch index. You can also tailor the filters and analyzers for each field from the admin interface under the "Processors" tab. These filters can strip HTML, ignore case, stem, tokenize, and boost, to name a few. Each filter or analyzer has its own configuration for how it applies to each field in your index.

Elasticsearch Connector and Search API configuration

The out-of-the-box configurations provided by these modules are usually all you need for a robust search experience. However, if you need finer-grained control over the filters or need to implement a custom filter, you can create an event subscriber in a custom module. In Drupal 8, event subscribers provide an improved way to alter or react to objects before, during, or after they are processed. I encourage you to visit "Drupal 8: Hooks, Events, and Event Subscribers" by Jonathan Daggerhart if you'd like to learn more.

Elasticsearch Connector advanced customization

The Elasticsearch Connector provides several events that you can subscribe to, but there is limited documentation on them at the moment. These events currently include:

  • PrepareMappingEvent::PREPARE_MAPPING - Allows alteration of the mapping config.
  • PrepareIndexEvent::PREPARE_INDEX - Allows alteration of the index config.
  • PrepareIndexMappingEvent::PREPARE_INDEX_MAPPING - Allows alteration of the index mapping.
  • PrepareSearchQueryEvent::PREPARE_QUERY - Allows alteration of the search query.
  • BuildSearchParamsEvent::BUILD_QUERY - Allows alteration of the search parameters.

Custom filters and analyzers

We needed to apply an nGram filter to a Drupal Commerce product variation SKU so that a partial SKU would match the entire SKU. For example, when searching for D'Addario mandolin strings, the options include SKUs like EJ74, EJ75, EJM75C, etc. To find all of the "EJ" series products, you need an nGram filter with a minimum gram length of 2. The same filter also finds all of the "75" variants.
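To make the effect of the gram lengths concrete, here is a minimal plain-PHP sketch that lists the character n-grams of a single SKU. The sku_ngrams() function is a hypothetical helper written for this illustration; it is not part of Elasticsearch or of either Drupal module, and the order in which Elasticsearch emits its tokens may differ.

```php
<?php

/**
 * Hypothetical helper: lists the character n-grams of a single token.
 *
 * Illustration only. Elasticsearch builds these tokens itself at index time.
 */
function sku_ngrams(string $token, int $min = 2, int $max = 6): array {
  $grams = [];
  $length = strlen($token);
  // For each gram size, slide a window of that size across the token.
  for ($size = $min; $size <= $max; $size++) {
    for ($start = 0; $start + $size <= $length; $start++) {
      $grams[] = substr($token, $start, $size);
    }
  }
  return $grams;
}

// Prints: EJ, J7, 74, EJ7, J74, EJ74
print implode(', ', sku_ngrams('EJ74')) . PHP_EOL;
```

With a minimum length of 3, neither "EJ" nor "75" would appear among the grams, which is why the minimum of 2 matters for this use case.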

The following example creates tokens from 2 to 6 characters long on a Commerce product variation SKU for partial string matching. By reacting to the PrepareIndexEvent, we can create a custom nGram filter and an analyzer that uses it. Then, by subscribing to the PrepareMappingEvent, we can assign that analyzer to the fields that should use it.

Register the event subscriber class as a service in your module's my_module.services.yml file.

services:
  my_module.elasticsearch_connector:
    class: Drupal\my_module\EventSubscriber\ElasticsearchConnector
    arguments: []
    tags:
      - { name: event_subscriber }

Create the event subscriber class in src/EventSubscriber/ElasticsearchConnector.php.

<?php

namespace Drupal\my_module\EventSubscriber;

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\EventDispatcher\Event;
use Drupal\elasticsearch_connector\Event\PrepareMappingEvent;
use Drupal\elasticsearch_connector\Event\BuildSearchParamsEvent;
use Drupal\elasticsearch_connector\Event\PrepareIndexEvent;
use Drupal\elasticsearch_connector\Event\PrepareSearchQueryEvent;
use Drupal\elasticsearch_connector\Event\PrepareIndexMappingEvent;

/**
 * Subscribes to Elasticsearch Connector events.
 */
class ElasticsearchConnector implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents() {

    // Define event listeners for all of the Elasticsearch
    // Connector events.
    $events[PrepareMappingEvent::PREPARE_MAPPING] = ['prepareMapping'];
    $events[PrepareIndexMappingEvent::PREPARE_INDEX_MAPPING] = ['prepareIndexMapping'];
    $events[PrepareIndexEvent::PREPARE_INDEX] = ['prepareIndex'];
    $events[PrepareSearchQueryEvent::PREPARE_QUERY] = ['searchQuery'];
    $events[BuildSearchParamsEvent::BUILD_QUERY] = ['buildSearch'];

    return $events;
  }

  /**
   * Prepare index event.
   *
   * @param \Drupal\elasticsearch_connector\Event\PrepareIndexEvent $event
   *   The event carrying the index name and index config.
   */
  public function prepareIndex(PrepareIndexEvent $event) {
    // Define a custom nGram analyzer for the `product_index` index.
    // Change this condition to match the name of your index.
    if ($event->getIndexName() == 'elasticsearch_index_data_product_index') {
      $config = $event->getIndexConfig();
      $config['body']['settings']['analysis'] = [
        'filter' => [
          'custom_ngram' => [
            'type' => 'ngram',
            // A minimum of 2 lets short fragments such as "EJ" or "75" match.
            'min_gram' => 2,
            'max_gram' => 6,
          ],
        ],
        'analyzer' => [
          'sku_ngram_analyzer' => [
            'type' => 'custom',
            'filter' => [
              'custom_ngram',
            ],
            'tokenizer' => 'standard',
          ],
        ],
      ];
      $event->setIndexConfig($config);
    }
  }

  /**
   * Prepare mapping event.
   *
   * @param \Drupal\elasticsearch_connector\Event\PrepareMappingEvent $event
   *   The event carrying the field and its mapping config.
   */
  public function prepareMapping(PrepareMappingEvent $event) {
    $field_identifier = $event->getMappingField()->getFieldIdentifier();

    // Use the custom sku_ngram_analyzer for the sku field.
    if ($field_identifier == 'sku') {
      $config = $event->getMappingConfig();
      $config['analyzer'] = 'sku_ngram_analyzer';
      $event->setMappingConfig($config);
    }

  }

  /**
   * Prepare index mapping event.
   *
   * @param \Symfony\Component\EventDispatcher\Event $event
   */
  public function prepareIndexMapping(Event $event) {
    // Not currently used.
  }

  /**
   * Respond to search query event.
   *
   * @param \Symfony\Component\EventDispatcher\Event $event
   */
  public function searchQuery(Event $event) {
    // Not currently used.
  }

  /**
   * Respond to the buildSearch event.
   *
   * @param \Symfony\Component\EventDispatcher\Event $event
   */
  public function buildSearch(Event $event) {
    // Not currently used.
  }

}
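One design note on prepareIndex(): assigning the whole analysis array replaces any analysis settings that are already in the index config. If that is a concern in your setup, the custom settings could be merged in instead. The sketch below shows the idea on a plain array; merge_analysis_settings() is a hypothetical helper, and the sample config (including the existing_analyzer entry and the number_of_shards value) is made up for illustration rather than taken from the module.

```php
<?php

/**
 * Hypothetical helper: merges custom analysis settings into an index config.
 */
function merge_analysis_settings(array $config, array $analysis): array {
  $existing = $config['body']['settings']['analysis'] ?? [];
  // Keep the existing named filters/analyzers and add or override ours.
  $config['body']['settings']['analysis'] = array_replace_recursive($existing, $analysis);
  return $config;
}

// Sample config with a pre-existing analyzer (assumed for illustration only).
$config = [
  'body' => [
    'settings' => [
      'number_of_shards' => 1,
      'analysis' => [
        'analyzer' => [
          'existing_analyzer' => ['type' => 'custom', 'tokenizer' => 'keyword'],
        ],
      ],
    ],
  ],
];

$config = merge_analysis_settings($config, [
  'filter' => [
    'custom_ngram' => ['type' => 'ngram', 'min_gram' => 2, 'max_gram' => 6],
  ],
  'analyzer' => [
    'sku_ngram_analyzer' => [
      'type' => 'custom',
      'tokenizer' => 'standard',
      'filter' => ['custom_ngram'],
    ],
  ],
]);

// Prints: existing_analyzer, sku_ngram_analyzer
print implode(', ', array_keys($config['body']['settings']['analysis']['analyzer'])) . PHP_EOL;
```

Inside the subscriber, the same merge would sit between getIndexConfig() and setIndexConfig().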

Additional Resources

More information can be found in the links below. Be sure to reference the same version of Elasticsearch documentation that your server is running.