In this article, let's take a look at some of the most significant changes in Meilisearch's latest update. This release brings you new features such as smart crop and deterministic API keys. With v0.28, we have stabilized our API, taking the first step towards a v1.0 🎉 This stabilization brings multiple changes. You can read the full changelog on GitHub, but we’ll go over the main ones in this article.

New feature: smart crop

Instead of treating the first search term match as the best cropping location, Meilisearch now centers the crop around the largest number of unique matches, giving priority to terms that are closer to each other and follow the original query order. Meilisearch also considers context when cropping and prioritizes keeping the sentence together.

Given the following string:

👉
“A young elephant, whose oversized ears enable him to fly, helps save a struggling circus, but when the circus plans a new venture, Dumbo and his friends discover dark secrets beneath its shiny veneer.”

If the search query is Dumbo and the cropLength is 5, Meilisearch will now return:

“…Dumbo and his friends discover…”

Instead of:

“…new venture, Dumbo and his…”
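To try this yourself, here is a minimal Python sketch that builds such a search request without sending it. The local URL, the movies index, and the overview attribute are assumptions for illustration; attributesToCrop and cropLength are the search parameters that control cropping.

```python
import json
import urllib.request

# Assumed local instance; adjust for your setup.
MEILI_URL = "http://localhost:7700"


def build_crop_search(index_uid, query, attribute, crop_length):
    """Build (but do not send) a search request that crops one attribute."""
    payload = {
        "q": query,
        "attributesToCrop": [attribute],
        "cropLength": crop_length,
    }
    return urllib.request.Request(
        f"{MEILI_URL}/indexes/{index_uid}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "movies" and "overview" are hypothetical names used for this example.
request = build_crop_search("movies", "Dumbo", "overview", 5)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending it with urllib.request.urlopen(request) should return hits whose cropped text sits under the _formatted object of each hit.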

New feature: deterministic API keys

A deterministic algorithm is an algorithm that, given a particular input, always produces the same output, with no randomness involved.

You can create a deterministic key value by specifying a uid field at creation. The uid value must be a valid UUID v4. If you don't specify a uid, Meilisearch automatically generates one for you.

The value of the key field is generated by hashing the master key with the uid. The same combination always results in the same key value.
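The idea can be sketched in a few lines of Python. This article only says the key is a hash of the master key and the uid; the sketch below assumes HMAC-SHA256 with the master key as the secret and the uid as the message, so check the API reference before relying on it. The master key is a made-up placeholder.

```python
import hashlib
import hmac


def derive_key(master_key: str, uid: str) -> str:
    """Assumed derivation: HMAC-SHA256 keyed by the master key, over the uid."""
    return hmac.new(
        master_key.encode("utf-8"),
        uid.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


uid = "6062abda-a5aa-4414-ac91-ecd7944c0f8d"
key_a = derive_key("example-master-key", uid)  # placeholder master key
key_b = derive_key("example-master-key", uid)

# Same master key and same uid give the same key value every time.
print(key_a == key_b)
```

Whatever the exact hash, the property that matters is the one shown here: no randomness is involved, so two instances sharing a master key and a uid end up with the same key.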

This lets you keep the same set of API keys across different Meilisearch instances. From now on, you can upgrade or redeploy your Meilisearch instance without losing your API keys.

⚠️
As a result of these modifications, keys imported from older versions of Meilisearch will have their key and uid fields regenerated. When updating your Meilisearch instance, you will need to update your keys.

We have also added a name field to make API key retrieval more convenient. A key object should now look like this:

{
    "name": null,
    "description": "Manage documents: Products/Reviews API key",
    "key": "d0552b41536279a0ad88bd595327b96f01176a60c2243e906c52ac02375f9bc4",
    "uid": "6062abda-a5aa-4414-ac91-ecd7944c0f8d",
    "actions": [
        "documents.add",
        "documents.delete"
    ],
    "indexes": [
        "products",
        "reviews"
    ],
    "expiresAt": "2021-12-31T23:59:59Z",
    "createdAt": "2021-10-12T00:00:00Z",
    "updatedAt": "2021-10-13T15:00:00Z"
}
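If you want to choose the uid yourself, a small helper can check the UUID v4 requirement before anything reaches the server. This is a sketch, not an official client: build_key_payload is a hypothetical helper, and the field names simply mirror the key object above.

```python
import uuid


def build_key_payload(description, actions, indexes, expires_at, name=None, uid=None):
    """Build the JSON body for key creation, validating an optional uid."""
    payload = {
        "name": name,
        "description": description,
        "actions": actions,
        "indexes": indexes,
        "expiresAt": expires_at,
    }
    if uid is not None:
        # uuid.UUID raises ValueError on malformed input.
        if uuid.UUID(uid).version != 4:
            raise ValueError("uid must be a UUID v4")
        payload["uid"] = uid
    return payload


payload = build_key_payload(
    description="Manage documents: Products/Reviews API key",
    actions=["documents.add", "documents.delete"],
    indexes=["products", "reviews"],
    expires_at="2021-12-31T23:59:59Z",
    uid="6062abda-a5aa-4414-ac91-ecd7944c0f8d",
)
print(payload["uid"])
```

Leaving uid out keeps the default behavior, where Meilisearch generates the uid for you.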

Other changes regarding API key management include:

  • being able to retrieve, update, and delete a key by either the key or uid fields
  • introducing new actions to manage API keys (keys.get, keys.create, keys.update, keys.delete)
  • removing the possibility of updating the actions, indexes, or expiresAt properties of an API key after creation for security reasons

Breaking change: search nomenclature

We’re on the road to v1, which means defining a stable API. To improve clarity, we have made several changes to the naming of some search parameters and response fields in the /indexes/{uid}/search endpoint.

Search parameters formerly known as facetsDistribution and matches are now called facets and showMatchesPosition, respectively.

The response fields returned when using those parameters have changed too: facetsDistribution is now facetDistribution (note the dropped s), and _matchesInfo is now _matchesPosition.

The response field nbHits has been renamed estimatedTotalHits. This value was often used to calculate the number of search result pages, which we strongly advise against. To learn how to paginate with Meilisearch without using nbHits, check out this fresh new guide.

For the following query:

curl \
  -X POST 'http://localhost:7700/indexes/movies/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "Shazam",
    "facets": ["genres"],
    "showMatchesPosition": true
  }'

You’ll get the following response:


{
  "hits": [
    {
      "id": "287947",
      "title": "Shazam!",
      "poster": "https://image.tmdb.org/t/p/w500/xnopI5Xtky18MPhK40cZAGAOVeV.jpg",
      "overview": "A boy is given the ability to become an adult superhero in times of need with a single magic word.",
      "release_date": 1553299200,
      "genres": [
        "Action",
        "Comedy",
        "Fantasy"
      ],
      "_matchesPosition": {
        "title": [
          {
            "start": 0,
            "length": 6
          }
        ]
      }
    },
    ...
  ],
  "estimatedTotalHits": 3,
  "query": "Shazam",
  "limit": 20,
  "offset": 0,
  "processingTimeMs": 4,
  "facetDistribution": {
    "genres": {
      "Action": 3,
      "Animation": 2,
      "Comedy": 1,
      "Fantasy": 1
    }
  }
}
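If you have existing code that builds search requests with the old names, a tiny migration helper can ease the transition. The sketch below covers only the two renamed search parameters described above; migrate_search_params is a hypothetical helper, not part of any Meilisearch SDK.

```python
# Old search parameter name -> new name in v0.28.
PARAM_RENAMES = {
    "facetsDistribution": "facets",
    "matches": "showMatchesPosition",
}


def migrate_search_params(params):
    """Return a copy of params with pre-v0.28 names replaced by the new ones."""
    return {PARAM_RENAMES.get(name, name): value for name, value in params.items()}


old_params = {"q": "Shazam", "facetsDistribution": ["genres"], "matches": True}
print(migrate_search_params(old_params))
```

Code that reads the response needs the matching change: look for facetDistribution, _matchesPosition, and estimatedTotalHits instead of the old field names.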
 

Breaking change: task management

Browsing tasks

We have added a new pagination system to the /tasks endpoint.

With this change, it’s much easier to browse through the tasks, as their number can quickly grow in instances with a large number of asynchronous operations.

For each call to this endpoint, the response will return the following fields:

  • limit: number of tasks returned (defaults to 20)
  • from: the uid of the first task returned
  • next: the uid of the next task

To view the next page of results, you would repeat the same query, replacing the value of from with the value of next. When the value of next is null, there are no more tasks to view.

This type of pagination system is called keyset pagination. Compared with the offset pagination used to browse indexes, documents, and keys, it has two main advantages: it avoids the inconsistencies that can appear when records are added between two page requests and, since there is no need to scan and count records, it’s more efficient, which matters when your task queue grows fast.
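The paging loop described above can be sketched as a Python generator. To keep the example runnable offline, fetch_page is a stand-in for a real GET /tasks call, and the fake pages assume the tasks of each page sit in a results array, which this article does not spell out.

```python
from typing import Callable, Iterator, Optional


def iter_tasks(fetch_page: Callable[[Optional[int]], dict]) -> Iterator[dict]:
    """Yield every task by following `next` until it is null (None)."""
    from_uid = None  # first call: no `from`, start at the beginning
    while True:
        page = fetch_page(from_uid)
        yield from page["results"]
        from_uid = page["next"]
        # Compare with None explicitly: a task uid of 0 is a valid `next`.
        if from_uid is None:
            return


# Fake responses keyed by the `from` value used in the request.
FAKE_PAGES = {
    None: {"results": [{"uid": 4}, {"uid": 3}], "limit": 2, "from": 4, "next": 2},
    2: {"results": [{"uid": 2}, {"uid": 1}], "limit": 2, "from": 2, "next": 0},
    0: {"results": [{"uid": 0}], "limit": 2, "from": 0, "next": None},
}

uids = [task["uid"] for task in iter_tasks(lambda from_uid: FAKE_PAGES[from_uid])]
print(uids)  # [4, 3, 2, 1, 0]
```

In real code, fetch_page would request /tasks with the from query parameter set to from_uid and return the parsed JSON body.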

Filtering tasks

We have also made the task list filterable. You can now get tasks by status, type, or indexUid.

For example, the following command returns all tasks belonging to the index movies that succeeded:

curl -X GET 'http://localhost:7700/tasks?indexUid=movies&status=succeeded'
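If you build these URLs in code, here is a small sketch using only the Python standard library. tasks_url is a hypothetical helper; the filter names are the ones listed above.

```python
from urllib.parse import urlencode


def tasks_url(base_url, **filters):
    """Build a /tasks URL, skipping filters that are not set."""
    query = urlencode({name: value for name, value in filters.items() if value is not None})
    return f"{base_url}/tasks" + (f"?{query}" if query else "")


print(tasks_url("http://localhost:7700", indexUid="movies", status="succeeded"))
# http://localhost:7700/tasks?indexUid=movies&status=succeeded
```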

These modifications have led to the removal of the GET /indexes/:indexUid/tasks and GET /indexes/:indexUid/tasks/:taskUid endpoints.

Breaking change: dumps

Dump creation has always been an asynchronous operation but used a separate queue from the task queue. With v0.28, dumps have become tasks. This has resulted in a new task type called dumpCreation.

Despite being tasks and thus sharing the same queue, dumps are given priority. They’ll be processed as soon as the current task is done running. You can think of dumps as VIPs in a club: even though they arrived last (which is reflected in their taskUid), they get to skip the line.
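Here is a toy Python model of that rule. It is only an illustration of the ordering described above, not Meilisearch's actual scheduler, and the task types other than dumpCreation are there purely as examples.

```python
def pick_next(pending):
    """Toy rule: a waiting dump runs first; otherwise the lowest uid wins."""
    return min(pending, key=lambda task: (task["type"] != "dumpCreation", task["uid"]))


queue = [
    {"uid": 10, "type": "documentAdditionOrUpdate"},
    {"uid": 11, "type": "settingsUpdate"},
    {"uid": 12, "type": "dumpCreation"},  # enqueued last
]

order = []
while queue:
    task = pick_next(queue)
    queue.remove(task)
    order.append(task["uid"])

print(order)  # [12, 10, 11]
```

The dump has the highest uid because it was enqueued last, yet it is picked first.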


Contributors’ experience

We have worked hard on improving the contribution experience of our tokenizer: charabia. The tokenizer’s role is to split a sentence or phrase into smaller units of language, called tokens. It is a critical factor in the quality of search results. Now, it is much easier to add languages to Meilisearch. You just need to follow the instructions in CONTRIBUTING.md.

Meilisearch works perfectly with any space-separated language and has special support for Japanese and Chinese. We now support Hebrew, too, thanks to our awesome community! Other languages will still work, but the quality and relevancy of search results may vary significantly.

We would love to provide global language support. The more feedback we get from native speakers, the easier it is for us to understand how to improve performance for those languages. If you want to help us support your language, we are eager to hear from you and see how we could make progress together!

Other changes

  • We have added pagination to the response of the GET /indexes and the GET /keys endpoints, and we have improved pagination for the GET /indexes/{uid}/documents endpoint.
  • For performance reasons, we have decided to limit the number of facet values returned per faceted attribute. This limit is customizable and defaults to 100.
  • You can customize the maximum number of documents Meilisearch returns for a search. The default limit is 1000 and protects the database from malicious scraping. Beware that increasing this limit can affect performance.
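As a sketch of how the last two limits might be changed, the Python snippet below builds a settings update without sending it. This article does not name the settings, so treat faceting.maxValuesPerFacet, pagination.maxTotalHits, and the PATCH method on the settings route as assumptions to verify against the settings reference. The URL and index name are placeholders.

```python
import json
import urllib.request

# Assumed local instance; adjust for your setup.
MEILI_URL = "http://localhost:7700"


def build_limits_request(index_uid, max_values_per_facet=100, max_total_hits=1000):
    """Build (but do not send) a settings update for the two limits."""
    payload = {
        # Assumed setting names; defaults match the values quoted above.
        "faceting": {"maxValuesPerFacet": max_values_per_facet},
        "pagination": {"maxTotalHits": max_total_hits},
    }
    return urllib.request.Request(
        f"{MEILI_URL}/indexes/{index_uid}/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


limits_request = build_limits_request("movies", max_values_per_facet=50, max_total_hits=2000)
print(limits_request.get_method(), limits_request.full_url)
print(limits_request.data.decode("utf-8"))
```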

We apologize in advance for any inconvenience caused by all these changes. It’s for a good cause: we are making these changes now to move towards v1.0 and avoid breaking changes later. Don’t hesitate to reach out to us if you need support or have any doubts. We are always happy to help!

Contributors

We are really grateful for this amazing community. We want to thank @0x0x1, @choznerol, @pierre-l, @ryanrussell, @Thearas, and @walterbm for their help with Meilisearch, and @matthias-wright for his help with milli. We want to send a special shout-out to @benny-n for adding Hebrew to our tokenizer.

And that’s it for v0.28! Remember to check the changelog for the full release notes, and see you next time!
