Solving Concurrency Issues

The problem arises when we want to allow more than one person to rename files or directories at the same time. Imagine that you rename the /clinton directory, which contains hundreds of thousands of files. Meanwhile, another user renames the single file /clinton/projects/elasticsearch/README.txt. That user’s change, although it started after yours, will probably finish first.

One of two things will happen:

  • You have decided to use version numbers, in which case your mass rename will fail with a version conflict when it hits the renamed README.asciidoc file.
  • You didn’t use versioning, and your changes will overwrite the changes from the other user.
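The two outcomes can be illustrated with a minimal sketch. It assumes a toy in-memory store rather than Elasticsearch itself: ToyStore, VersionConflict, the expected_version parameter, and the /renamed target directory are all hypothetical names invented for the illustration, and README.asciidoc is assumed to be the name the other user chose.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version number."""


class ToyStore:
    """Hypothetical in-memory stand-in for a store with per-document versions."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # With versioning, reject the write if the document changed since it was read.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(doc_id)
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = ToyStore()
store.put("readme", {"path": "/clinton/projects/elasticsearch/README.txt"})

# Your mass rename reads the document and remembers its version.
version_seen, _ = store.get("readme")

# Before your rename reaches this file, the other user renames it.
store.put(
    "readme",
    {"path": "/clinton/projects/elasticsearch/README.asciidoc"},
    expected_version=version_seen,
)

# Outcome 1: with version numbers, your stale write is rejected.
your_body = {"path": "/renamed/projects/elasticsearch/README.txt"}
try:
    store.put("readme", your_body, expected_version=version_seen)
    outcome_with_versions = "written"
except VersionConflict:
    outcome_with_versions = "conflict"
path_after_conflict = store.get("readme")[1]["path"]

# Outcome 2: without versioning, your write silently replaces theirs.
store.put("readme", your_body)
path_after_overwrite = store.get("readme")[1]["path"]
```

In the versioned case the other user's rename survives but your mass rename stops partway through. In the unversioned case your rename completes but the other user's change is lost.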

The problem is that Elasticsearch does not support ACID transactions. A change to an individual document is ACIDic, but a change that involves multiple documents is not.

If your main data store is a relational database, and Elasticsearch is simply being used as a search engine or as a way to improve performance, make your changes in the database first and replicate those changes to Elasticsearch after they have succeeded. This way, you benefit from the ACID transactions available in the database, and all changes to Elasticsearch happen in the right order. Concurrency is dealt with in the relational database.
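The database-first pattern might look like the following sketch. It assumes SQLite as the relational store and uses a plain dictionary, search_index, as a hypothetical stand-in for the Elasticsearch index. The files table, the rename_directory function, and the /renamed target are likewise invented for the illustration.

```python
import sqlite3

search_index = {}  # hypothetical stand-in for the Elasticsearch index


def rename_directory(conn, old_prefix, new_prefix):
    """Rename every path under old_prefix in one transaction, then replicate."""
    prefix = old_prefix + "/"
    with conn:  # commits on success, rolls back if an exception is raised
        rows = conn.execute(
            "SELECT id, path FROM files WHERE substr(path, 1, ?) = ?",
            (len(prefix), prefix),
        ).fetchall()
        changed = [
            (new_prefix + path[len(old_prefix):], file_id)
            for file_id, path in rows
        ]
        conn.executemany("UPDATE files SET path = ? WHERE id = ?", changed)

    # Reached only after the commit succeeded: replicate to the search side.
    for new_path, file_id in changed:
        search_index[file_id] = {"path": new_path}
    return len(changed)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)"
)
conn.executemany(
    "INSERT INTO files (path) VALUES (?)",
    [
        ("/clinton/projects/elasticsearch/README.txt",),
        ("/clinton/notes.txt",),
        ("/other/file.txt",),
    ],
)
conn.commit()

renamed_count = rename_directory(conn, "/clinton", "/renamed")
```

If the transaction fails, nothing is replicated, so the search index never sees a partial rename. A real deployment would also need to handle a replication step that fails after the commit, which this sketch leaves out.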

If you are not using a relational store, these concurrency issues need to be dealt with at the Elasticsearch level. The following are three practical solutions using Elasticsearch, all of which involve some form of locking:

  • Global Locking
  • Document Locking
  • Tree Locking

The same principles apply if you implement these solutions with an external system instead of Elasticsearch.