Happy New Year, everyone! 2021 brings a new version of Meilisearch. How good is that?

In this release, we have fixed some bugs and raised the default maximum payload size for adding documents to Meilisearch. The star of the show, however, is the new tokenizer. Our main goal is to offer the best possible search experience, and a good tokenizer is critical to returning quality search results. We are very proud of our core team and our contributors: they worked hard, and they delivered! Let’s take a closer look.

New tokenizer

The tokenizer is a bit like the brain of Meilisearch: it understands how languages work and adapts the way documents are stored in Meilisearch accordingly. Without a tokenizer, Meilisearch would not know where a word starts and where it ends, and it would not be able to understand what a user is asking for when they search.

A closer look

The tokenizer’s role is to find and extract all the words in a string according to the particularities of its language. Every language requires its own process. For example, in languages written with the Latin alphabet, words are generally separated by spaces, whereas Chinese is written without spaces between words, which makes splitting it into words much harder.
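To see why Chinese needs its own pipeline, consider a naive tokenizer that only splits on whitespace. The shell sketch below is a hypothetical illustration (naive_tokenize is a name we made up; it is not part of Meilisearch and is not how its tokenizer works). It handles an English sentence well, but it returns the Chinese phrase 新年快乐 ("Happy New Year") as a single token, whereas a word-aware segmentation would be 新年 ("new year") and 快乐 ("happy").

```shell
#!/bin/sh
# Hypothetical, naive tokenizer: split a string on whitespace and print
# one token per line. Illustration only, not Meilisearch's tokenizer.
naive_tokenize() {
  # Turn every run of whitespace into a single newline, then drop empty lines.
  printf '%s\n' "$1" | tr -s '[:space:]' '\n' | sed '/^$/d'
}

naive_tokenize "Happy new year everyone"
# Happy
# new
# year
# everyone

naive_tokenize "新年快乐"
# 新年快乐   (a single token: there are no spaces to split on)
```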

Meilisearch’s new tokenizer works field by field: for each field, it determines the most likely language and then runs the processing pipeline for that language. Because the tokenizer is modular, adding support for new languages is much easier than before.

For languages written with the Latin alphabet, such as English, the new tokenizer may not feel any different, but it should dramatically improve the experience of our Chinese-speaking users.

The evolution of Meilisearch in Chinese

Previously, the tokenizer treated each Hanzi (Chinese character) as a separate word. The new tokenizer identifies Chinese words made of one or more characters. As a result, it measures the distance between matched query terms more accurately (see the proximity ranking rule), which improves the relevancy of search results.

In addition, a single search query now returns results in both Traditional and Simplified Chinese.

Increased max payload size

In previous versions of Meilisearch, the default size limit for a document payload was 10 megabytes. We have increased this limit to 100 megabytes. You can still change it with the --http-payload-size-limit option, which takes a value in bytes.

$ ./meilisearch --http-payload-size-limit=209715200

Changing the limit to 200 MB
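Because the option takes a value in bytes, a size in megabytes has to be converted first. The hypothetical mib_to_bytes helper below is a minimal sketch, assuming that "MB" here means binary megabytes (1 MiB = 1024 × 1024 bytes), which is what the 209715200 in the command above corresponds to.

```shell
#!/bin/sh
# Hypothetical helper (not part of Meilisearch): convert mebibytes
# (1 MiB = 1024 * 1024 bytes) into the byte count expected by
# --http-payload-size-limit.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 100   # prints 104857600 (100 MB)
mib_to_bytes 200   # prints 209715200 (the value used in the command above)
```

The arithmetic can also be written inline, for example ./meilisearch --http-payload-size-limit=$((200 * 1024 * 1024)).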

Bug fixes

Other changes

We’ve made some changes to our test suite, made dump tests run in parallel, and updated most of our dependencies.

For the complete list of changes and commits, please check the release notes.

Contributions

We are fortunate to have such an amazingly supportive community.

We would like to thank @piaoger for beta-testing the new tokenizer, @woshilapin for improving the CI, and @sanders41 for increasing the default payload size.

We would also like to thank all the contributors who help maintain the toolkits that make Meilisearch easier to use.

Your involvement means a lot to us!

We are always eager to hear suggestions from our users and contributors! Come and talk with us through whichever channel you prefer.


Photo by Nick Fewings on Unsplash