Create a lyrics search engine using RediSearch

How many times have you wondered, “How does Google figure out which song I’m looking for from just a small part of the lyrics?” Honestly, I had never asked myself that until yesterday, and I decided to try to replicate something similar.

To do this, I needed:

A large collection of song lyrics
Somewhere to store them
A way to do something like a “full-text search”

I was about to give up on the idea when a friend asked me, “Why don’t you just do it for metal songs?” The metalhead in me couldn’t resist, and I googled “best 1000 metal songs ever”.

First step: Download all the lyrics

The title of this step sounds more complicated than the code. Thanks to Spotify Labs, you can find all the information about a song just by searching for its title and artist name (you can find the code here). Then, using another Spotify endpoint, you can retrieve the lyrics of each song! (So simple, so fast, and a lot of data.)
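Below is a minimal sketch of the first half of this process: finding a song by title and artist. It uses the public Spotify Web API search endpoint (`GET /v1/search`). It assumes Node 18+ for the global `fetch` and an OAuth access token obtained separately. `buildTrackSearchUrl` and `findTrack` are hypothetical helper names. The lyrics endpoint mentioned above is not covered here.

```javascript
// Sketch: look up a track on the public Spotify Web API search endpoint.
// Assumes an OAuth access token obtained separately and Node 18+ (global fetch).
// Helper names are hypothetical; the lyrics lookup is not part of this sketch.
const SPOTIFY_SEARCH_URL = 'https://api.spotify.com/v1/search';

// Build a search URL that filters by track title and artist name.
function buildTrackSearchUrl(title, artist) {
  const url = new URL(SPOTIFY_SEARCH_URL);
  url.searchParams.set('q', `track:${title} artist:${artist}`);
  url.searchParams.set('type', 'track');
  url.searchParams.set('limit', '1'); // only the best match is needed
  return url.toString();
}

// Return the best-matching track object, or null when nothing matches.
async function findTrack(title, artist, accessToken) {
  const response = await fetch(buildTrackSearchUrl(title, artist), {
    headers: { Authorization: `Bearer ${accessToken}` }
  });
  if (!response.ok) {
    throw new Error(`Spotify search failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.tracks.items[0] ?? null;
}
```

The returned track object carries the Spotify track id, which is what a follow-up lyrics request would need.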

Second step: Upload everything to storage

To store all my data, I was looking for a database that:

Is lightweight and very fast
Can run in Docker for local development
Has a managed online version
Can store data as JSON
Supports full-text search

In the end, I chose Redis. Yes, Redis is more than a simple “in-memory data store”: it has a lot of cool features. The first one I used (and loved) was RedisJSON.

RedisJSON is a Redis module that provides JSON support in Redis. RedisJSON lets you store, update, and retrieve JSON values in Redis just as you would with any other Redis data type.

Using RedisJSON with NodeJS is a breeze. Redis provides a well-made library, @node-redis/json, that lets us store a JavaScript object on our Redis server (with the RedisJSON module installed) by doing something like this:
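Here is a minimal sketch of that call. It assumes node-redis v4, where `client.json.set(key, path, value)` maps to the `JSON.SET` command, and an already connected `client`. The helper names `songKey` and `saveSong` and the song fields are hypothetical.

```javascript
// Sketch: store one song as a JSON document with RedisJSON (JSON.SET).
// Assumes `client` is a connected node-redis v4 client, e.g.
//   const client = createClient({ url: process.env.REDIS_URL }); await client.connect();
// Helper names and song fields are hypothetical.
const KEY_PREFIX = 'metalmusic:jsondata:';

// One key per song: prefix + song id (e.g. "metalmusic:jsondata:42").
function songKey(id) {
  return `${KEY_PREFIX}${id}`;
}

// '$' is the JSONPath root, so the whole object becomes the document.
function saveSong(client, song) {
  return client.json.set(songKey(song.id), '$', song);
}

// Usage (inside an async function):
// await saveSong(client, {
//   id: 42,
//   title: 'Example Song',
//   artist: 'Example Band',
//   album: 'Example Album',
//   duration: 300,
//   lyrics: 'example lyrics text'
// });
```

Passing `'$'` writes the whole object as the document. A deeper path would only update part of an existing document.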

We can easily save a real JSON object under a metalmusic:jsondata:{id} key and query its internal structure! (There is a reason for this key format; we will see it later.)

For example, we can get the whole object by its key with the command JSON.GET metalmusic:jsondata:{id}

Or we can get only part of it: JSON.GET metalmusic:jsondata:{id} id returns only the id of the object, while JSON.GET metalmusic:jsondata:{id} ..image uses a recursive path to reach the image keys nested inside the stored object.

Of course, we can do the same with the NodeJS package:

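// Fetch only part of the stored song: the top-level `id` and the `image`
// keys reached by the recursive `..image` path.
// `client` is a connected node-redis client and `id` is the song id.
const results = await client.json.get(`metalmusic:jsondata:${id}`, {
  path: [
    'id',
    '..image'
  ]
});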

Bonus step

One downside of RedisJSON is that most database UI tools have trouble displaying the data. In TablePlus, for example, the value shows up as NULL, and this is not cool! Of course, you can use the terminal and redis-cli, but Redis also gives us a UI tool (thanks!) called RedisInsight.

This tool not only shows the right values (and lets us edit and delete each key of the JSON), but also offers some cool extras, such as grouping keys by prefix (using the : character inside the key).

Third step: Perform the search

One of the most interesting features of RedisJSON is that it is fully compatible and integrated with RediSearch.

RediSearch provides secondary indexing, full-text search, and a query language for Redis. These features enable multi-field queries, aggregation, exact phrase matching, and numeric filtering for text queries.

Here too, Redis gives us a JavaScript and TypeScript library, @node-redis/search, that lets us perform searches and aggregations (and much more) in two really simple steps.

Create an index
The first thing to do is to create an index:
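Here is a minimal sketch of that index definition. It uses node-redis v4's `client.ft.create(index, schema, options)` and assumes a connected `client`, a hypothetical index name `idx:metalmusic`, and lyrics stored under the JSONPath `$.lyrics` of each song document.

```javascript
// Sketch: create a RediSearch index over the JSON song documents.
// Assumes a connected node-redis v4 client. 'TEXT' is the value of
// SchemaFieldTypes.TEXT, written as a string to keep the sketch dependency-free.
// The index name and the '$.lyrics' path are hypothetical.
const INDEX_NAME = 'idx:metalmusic';

function createLyricsIndex(client) {
  return client.ft.create(
    INDEX_NAME,
    {
      // JSONPath of the field to index, exposed to queries as @lyrics
      '$.lyrics': { type: 'TEXT', AS: 'lyrics' }
    },
    {
      ON: 'JSON',                    // index RedisJSON documents, not hashes
      PREFIX: 'metalmusic:jsondata:' // only keys starting with this prefix
    }
  );
}
```

`FT.CREATE` returns an error if an index with the same name already exists. A setup script therefore usually creates it once, or checks the list returned by `FT._LIST` first.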

As you can see, creating an index with the NodeJS module is really simple. We just have to pass:

The index name
The fields we want to search or aggregate on
The ON option, which tells Redis that the indexed documents are JSON
The PREFIX option, which tells Redis to index only the keys that start with a given prefix (in our case metalmusic:jsondata)

Perform the search
After creating the index, performing a search takes just two lines of code!
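Here is a sketch of such a search wrapped in a helper. It assumes node-redis v4's `client.ft.search(index, query, options)`, a connected `client`, and a hypothetical index name `idx:metalmusic`. `searchLyrics` is an illustrative name.

```javascript
// Sketch: full-text search over the lyrics index with node-redis v4.
// Assumes a connected client and an existing index (hypothetical name).
const INDEX_NAME = 'idx:metalmusic';

async function searchLyrics(client, query, limit = 10) {
  // RediSearch returns only the first 10 matches unless LIMIT is given;
  // `total` still reports the full number of matching documents.
  const { total, documents } = await client.ft.search(INDEX_NAME, query, {
    LIMIT: { from: 0, size: limit }
  });
  // Each document has the Redis key as `id` and the parsed JSON as `value`.
  return { total, songs: documents.map((doc) => ({ key: doc.id, ...doc.value })) };
}
```

Without `LIMIT`, only the first 10 matches come back, while `total` still counts all of them. Listing every song for a broad query like blood therefore means passing a larger `limit`, e.g. `searchLyrics(client, 'blood', 200)`.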

And the results are perfect. For example, passing blood as the query (sorry, this is a stereotype), we can see that 191 different songs contain the word blood (and see each of the 191 songs).

That’s really cool! But what happens if we try to search for an artist’s name, like metallica?

No results are returned, because we have not added the artist’s name to the index! As soon as we add this property to the index, the same query starts returning results.

To update our index, we just have to do something like this. Really simple, right?
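One way to do it uses node-redis v4's `client.ft.alter(index, schema)`, which issues `FT.ALTER ... SCHEMA ADD`. In this sketch, the index name `idx:metalmusic`, the `$.artist` path and the helper name `addArtistToIndex` are hypothetical.

```javascript
// Sketch: add the artist name to an existing index with FT.ALTER
// (client.ft.alter in node-redis v4). Index name, '$.artist' path and
// helper name are hypothetical; 'TEXT' equals SchemaFieldTypes.TEXT.
const INDEX_NAME = 'idx:metalmusic';

function addArtistToIndex(client) {
  // Roughly: FT.ALTER idx:metalmusic SCHEMA ADD $.artist AS artist TEXT
  return client.ft.alter(INDEX_NAME, {
    '$.artist': { type: 'TEXT', AS: 'artist' }
  });
}
```

Unqualified query terms are matched against every TEXT attribute. That is why metallica finds results once artist is indexed. The query `@artist:metallica` limits the match to the artist attribute only.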

Bonus step: Stats!

Another cool feature of RediSearch is aggregation. With aggregations, we can run a lot of interesting queries (even combined ones) and extract a lot of useful information from our data. For example, let’s create a simple aggregation that returns:

The number of different artists in our collection
The number of different albums in our collection
The average duration of the songs in our collection

And let’s do it in a single query!
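Here is a minimal sketch of such an aggregation with node-redis v4's `client.ft.aggregate(index, query, options)`. It assumes:

- a connected `client`;
- a hypothetical index `idx:metalmusic` whose schema includes `artist`, `album` and a NUMERIC `duration` attribute;
- hypothetical helper names `collectionStats` and `parseStats`.

```javascript
// Sketch: three statistics in a single FT.AGGREGATE call (node-redis v4).
// Assumes the index (hypothetical name) has `artist`, `album` and a NUMERIC
// `duration` attribute. The string literals are the values of
// AggregateSteps.GROUPBY and AggregateGroupByReducers.* in node-redis.
const INDEX_NAME = 'idx:metalmusic';

function collectionStats(client) {
  return client.ft.aggregate(INDEX_NAME, '*', {
    STEPS: [
      {
        // GROUPBY with no properties: all documents form a single group
        type: 'GROUPBY',
        REDUCE: [
          { type: 'COUNT_DISTINCT', property: 'artist', AS: 'artists' },
          { type: 'COUNT_DISTINCT', property: 'album', AS: 'albums' },
          { type: 'AVG', property: 'duration', AS: 'avgDuration' }
        ]
      }
    ]
  });
}

// Reply values come back as strings; convert the single row to numbers.
function parseStats(reply) {
  const row = reply.results[0] ?? {};
  return {
    artists: Number(row.artists ?? 0),
    albums: Number(row.albums ?? 0),
    avgDuration: Number(row.avgDuration ?? 0)
  };
}
```

The `*` query matches every document. A `GROUPBY` step without properties collapses them into one group, so the reply holds a single row with the three reducer outputs.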

As you can see, the code is a bit more complex but still really clear. The really cool thing is that the STEPS and REDUCE keys are arrays! This means that a single query can do a lot of work: you can combine different steps and different reducers, and use in a later step a property created by an earlier one.

Fourth step: Use a managed database

To let you test this project, I had two options:

Set up a server running Redis and a NodeJS Express app, and serve a simple frontend application from it
Use GitHub Pages to host the frontend, Vercel to host the NodeJS Express app, and Redis Cloud to host the database

For obvious reasons, I decided to use the second option!
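In the second option, the Express app is a thin layer between the static frontend and the Redis Cloud database. Here is a minimal sketch of its search route. It assumes Express-style `req`/`res` objects, a connected node-redis v4 `client`, and a hypothetical index name `idx:metalmusic`. The `/search` path and `makeSearchHandler` are illustrative names.

```javascript
// Sketch: the search route of the Express API.
// Assumes Express-style req/res objects and a connected node-redis v4 client,
// e.g. createClient({ url: process.env.REDIS_URL }) pointing at Redis Cloud.
// Index name, route path and makeSearchHandler are hypothetical.
const INDEX_NAME = 'idx:metalmusic';

function makeSearchHandler(client) {
  return async (req, res) => {
    const query = typeof req.query.q === 'string' ? req.query.q.trim() : '';
    if (query === '') {
      res.status(400).json({ error: 'Missing query parameter q' });
      return;
    }
    try {
      const { total, documents } = await client.ft.search(INDEX_NAME, query, {
        LIMIT: { from: 0, size: 20 }
      });
      res.json({ total, songs: documents.map((doc) => doc.value) });
    } catch (err) {
      // e.g. a query syntax error or a lost connection to Redis
      res.status(500).json({ error: 'Search failed' });
    }
  };
}

// Wiring it up (not executed here):
// const app = express();
// app.get('/search', makeSearchHandler(client));
```

The frontend (GitHub Pages) and the API (Vercel) are served from different origins. The API therefore also has to send CORS headers, for example with the `cors` middleware.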

Redis Enterprise Cloud is a fully managed cloud service by Redis. Built for modern distributed applications, it enables you to run any query, simple or complex, at sub-millisecond performance at virtually infinite scale without worrying about operational complexity or service availability.

And it gives you all the modules you need for your advanced searches!

I’m using the free plan, and if you want, you can try it for free using this link; it gives you a small 30MB database. Yes, it seems very little, but let’s run some tests:

I ran 224 different searches using Postman (requests from Europe to a database in Europe), and these are the results:

224 different requests, and all of them returned their results in less than 500ms! Not bad for a 30MB database on a free plan!
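To run a similar check from Node instead of Postman, here is a minimal sketch with two hypothetical helpers (it assumes Node 16+ for the global `performance` object):

- `timeQueries` runs searches one at a time through any async function you pass in, for example a `fetch` against the deployed API.
- `summarizeLatencies` reports the average, the maximum, and whether every request stayed under a threshold such as 500ms.

```javascript
// Sketch: time a batch of searches from Node instead of Postman.
// `runQuery` is any async function that performs one search; it and the
// helper names are hypothetical. Uses the global `performance` (Node 16+).
async function timeQueries(runQuery, queries) {
  const durations = [];
  for (const query of queries) {
    const start = performance.now();
    await runQuery(query); // one request at a time
    durations.push(performance.now() - start);
  }
  return durations;
}

// Summarize latencies (in milliseconds) against a threshold, e.g. 500.
function summarizeLatencies(durations, thresholdMs) {
  const count = durations.length;
  const total = durations.reduce((sum, d) => sum + d, 0);
  return {
    count,
    averageMs: count === 0 ? 0 : total / count,
    maxMs: count === 0 ? 0 : Math.max(...durations),
    allUnderThreshold: durations.every((d) => d < thresholdMs)
  };
}
```

Running requests sequentially measures per-request latency without the requests competing with each other. A parallel run would instead measure how the free plan behaves under concurrent load.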

Fifth step: Create a simple web interface

This is more of a fancy extra. To let all of you test the whole system, I put everything online using GitHub Pages and Vercel (so, the worst environment possible). This way, you can see how fast the whole mechanism is, running on (again) just a 30MB database!
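On the frontend side, a static page on GitHub Pages only needs `fetch` to talk to the API. Here is a minimal sketch. `API_BASE` (`https://your-api.vercel.app`) and the `/search?q=` route are hypothetical placeholders.

```javascript
// Sketch: how the static frontend could call the API hosted on Vercel.
// API_BASE and the /search route are hypothetical placeholders.
const API_BASE = 'https://your-api.vercel.app';

// Build the request URL, letting URL/URLSearchParams handle encoding.
function buildSearchUrl(base, query) {
  const url = new URL('/search', base);
  url.searchParams.set('q', query);
  return url.toString();
}

async function searchSongs(query) {
  const response = await fetch(buildSearchUrl(API_BASE, query));
  if (!response.ok) {
    throw new Error(`Search failed with status ${response.status}`);
  }
  return response.json();
}
```

Building the URL with `URLSearchParams` takes care of spaces and special characters in whatever the user types into the search box.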