Create a lyrics search engine using RediSearch
source link: https://blog.canellariccardo.it/create-a-lyrics-search-engine-using-redisearch-9261ebe5d76
How many times have you wondered, “How does Google figure out which song I’m looking for from only a small part of the lyrics?” Honestly, I had never asked myself that until yesterday, when I decided to try to replicate something similar.
To do all this I needed:
- A large collection of song lyrics
- Somewhere to store them
- A way to do something like a “full-text search”
I was about to give up on the idea when a friend asked me, “Why don’t you just do it for metal songs?” The metalhead in me couldn’t hold back and googled “best 1000 metal songs ever”.
First step: Download all the lyrics
The title of this step is more complicated than the code. Thanks to Spotify Labs, you can find all the information about a song just by searching for the song name and the artist name (you can find the code here), and then, using another Spotify endpoint, you can retrieve the lyrics for each song. So simple, so fast, and a lot of data!
Second step: upload everything to a storage
To store all my data I was looking for a database that:
- Is light and hyper-fast
- Can be run with Docker for local use
- Has a managed online version
- Lets me store the information as JSON
- Lets me perform a full-text search
In the end, I chose Redis. Yes, Redis is not only a simple “in-memory data store”; it has a lot of cool features. The first one I used (and loved) was RedisJSON.
RedisJSON is a Redis module that provides JSON support in Redis. RedisJSON lets you store, update, and retrieve JSON values in Redis just as you would with any other Redis data type.
Using RedisJSON with NodeJS is a breeze. Redis gives us a well-made library, @node-redis/json, that lets us store a JavaScript object on our Redis server (with the RedisJSON module installed).
With it, we can easily save a real JSON object under a metalmusic:jsondata:{id} key and perform queries using its internal data structure! (There is a reason for the choice of the key’s format, but we will see it later.)
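The save step can be sketched roughly as follows. This is a minimal sketch, not the article's own code: it assumes `client` is an already connected node-redis v4 client (for example one created with `createClient()` from the `redis` package) talking to a server with RedisJSON loaded. The `songKey` and `saveSong` helpers and the shape of the `song` object are hypothetical; `client.json.set(key, path, value)` is the node-redis call behind `JSON.SET`.

```javascript
// Key prefix taken from the article; the helper names are illustrative only.
const SONG_KEY_PREFIX = 'metalmusic:jsondata';

// Build the Redis key for a song id, e.g. "metalmusic:jsondata:42".
function songKey(id) {
  return `${SONG_KEY_PREFIX}:${id}`;
}

// Store a plain JavaScript object as a JSON document at the root path "$".
// `client` is assumed to be a connected node-redis v4 client.
async function saveSong(client, song) {
  const key = songKey(song.id);
  await client.json.set(key, '$', song);
  return key;
}
```

Keeping the key construction in one helper means every document shares the same prefix, which matters later when the index is restricted to that prefix.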
For example, we can get the whole object by its key using the command JSON.GET {ID}, get only the id of the object using JSON.GET {ID} id, or get all the image keys of the stored object using JSON.GET {ID} ..image
Of course, we can also do all of this using the NodeJS package:
// Fetch only the id and every image key of the stored song
const results = await client.json.get(`metalmusic:jsondata:${id}`, {
  path: [
    'id',
    '..image'
  ]
});
Bonus step
One piece of bad news about RedisJSON is that most DB UI tools have problems showing the information. For example, in TablePlus you will see NULL as the value, and this is not cool! Of course you can use the terminal and redis-cli, but Redis also gives us a UI tool (thx!) called RedisInsight.
This tool not only shows the right values (and lets us edit and delete each key of the JSON), but also gives us some cool extras, like the possibility to group the keys by prefix (using the : character inside the key).
Third step: Perform the search
One of the most interesting features of RedisJSON is that it is 100% compatible and integrated with RediSearch.
RediSearch provides secondary indexing, full-text search, and a query language for Redis. These features enable multi-field queries, aggregation, exact phrase matching, and numeric filtering for text queries.
Here too, Redis gives us a JS- and TS-compatible library, @node-redis/search, that lets us perform searches and aggregations (and many other things) in two really simple steps.
Create an index
The first thing to do is to create an index. With the NodeJS module this is really simple. We just have to pass:
- The index name
- The field we want to use to perform the search or the aggregation
- The ON parameter, which tells Redis that the search will be performed on JSON documents
- The PREFIX parameter, which tells Redis to index only the keys that start with a certain prefix (in our case metalmusic:jsondata)
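Putting those four pieces together, index creation might look like the sketch below. The index name `idx:metalmusic` and the `$.lyrics` field are assumptions made for illustration, since the stored document's field names are not shown here. The string `'TEXT'` stands in for `SchemaFieldTypes.TEXT` from the search package, and `client` is assumed to be a connected node-redis v4 client.

```javascript
// Hypothetical index name; adjust it and the JSONPath to your own documents.
const LYRICS_INDEX = 'idx:metalmusic';

// Build the schema and options passed to FT.CREATE.
function buildLyricsIndex() {
  return {
    schema: {
      // JSONPath of the field to index, exposed to queries as "lyrics".
      '$.lyrics': { type: 'TEXT', AS: 'lyrics' }
    },
    options: {
      ON: 'JSON',                    // index JSON documents rather than hashes
      PREFIX: 'metalmusic:jsondata'  // only keys starting with this prefix
    }
  };
}

// `client` is assumed to be a connected node-redis v4 client.
async function createLyricsIndex(client) {
  const { schema, options } = buildLyricsIndex();
  await client.ft.create(LYRICS_INDEX, schema, options);
}
```

Separating the plain schema object from the call itself makes it easy to inspect or reuse the definition without a running server.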
Perform the search
After creating the index, performing a search is really two lines of code!
And the results are perfect. For example, passing blood as the query (sorry, this is a stereotype), we can see that 191 different songs contain the word blood (and see each of the 191 songs).
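A search along these lines can be sketched with `client.ft.search`, which in node-redis v4 resolves to an object holding a `total` count and a `documents` array of `{ id, value }` entries. The `searchLyrics` wrapper and its parameters are hypothetical names, and `client` is again assumed to be a connected node-redis v4 client.

```javascript
// Run a full-text query and return the match count plus the matching keys.
// `indexName` is whatever name the index was created with.
async function searchLyrics(client, indexName, query) {
  const { total, documents } = await client.ft.search(indexName, query);
  return { total, ids: documents.map((doc) => doc.id) };
}
```

Calling `await searchLyrics(client, 'idx:metalmusic', 'blood')` (with the index name being an assumption) would return the number of matching songs and their Redis keys.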
That’s really cool! But what happens if we try to search for an artist’s name, like metallica?
No results are returned, because we have not added the artist’s name to the index! As soon as we add this property to the index, the same search starts returning matches.
Updating the index only takes adding the new field to it, and it is really simple, right?
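Adding the artist name to an existing index could look like this sketch. It uses `client.ft.alter`, the node-redis wrapper around `FT.ALTER ... SCHEMA ADD`. The `$.artist` JSONPath and the helper names are assumptions for illustration, and `'TEXT'` again stands in for `SchemaFieldTypes.TEXT`.

```javascript
// Schema fragment that adds the artist name to an existing index.
// "$.artist" is a hypothetical JSONPath into the stored song document.
function artistField() {
  return { '$.artist': { type: 'TEXT', AS: 'artist' } };
}

// `client` is assumed to be a connected node-redis v4 client.
async function addArtistToIndex(client, indexName) {
  await client.ft.alter(indexName, artistField());
}
```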
Fourth step: Stats!
Another cool feature of RediSearch is aggregation. Thanks to aggregation we can perform a lot of cool queries (also combined) and retrieve a lot of cool information from our data. For example, let’s create a simple aggregation that will return:
- The number of different artists in our collection
- The number of different albums in our collection
- The average duration of the songs in our collection
And let’s do this in a single query!
The code is a little bit more complex but still really clear. The really cool thing is that the STEPS and REDUCE keys are arrays! This means that in a single query you can perform a lot of operations, combining different steps and different reducers, and using in a later step a key created in an earlier one.
Fifth step: Use a managed database
To let you test this project there were two possibilities:
- Set up a server with a Redis server and a NodeJS-express app, and serve a simple frontend application
- Use GitHub Pages to host the frontend, use Vercel to host the NodeJS-express app, and use RedisCloud to host our database
For obvious reasons, I decided to go with the second option!
Redis Enterprise Cloud is a fully managed cloud service by Redis. Built for modern distributed applications, it enables you to run any query, simple or complex, at sub-millisecond performance at virtually infinite scale without worrying about operational complexity or service availability.
And it gives you all the modules you need to perform your advanced searches!
I’m using the free plan, and if you want you can try it for free using this link, which gives you a small database of 30MB. Yes, it’s true, that seems very little, but let’s run some tests:
I performed 224 different searches using Postman (Europe to Europe requests), and these are the results:
All 224 requests returned their results in less than 500ms! Not bad for a 30MB server and a free plan!
Sixth step: Create a simple web interface
This is more of a fancy step. To let you all test the whole system I put everything online using GitHub Pages and Vercel (so the worst environment possible). This way, you can see how fast the mechanism is using (again) just 30MB of server!
The code of this project is open source and you can find it here:
GitHub - thecreazy/metal-song-search: Metal song search using RedisJSON and RedisSearch
If you liked the article please clap and follow :)
Thx and stay tuned 🚀