

What is the difference between the way Essentia and Librosa generate MFCCs?
source link: https://dev.to/enutrof/what-is-the-difference-between-the-way-essentia-and-librosa-generate-mfccs-13n3

Jul 3
・3 min read
I have been working on a music genre classification project for some time now and from the literature, I figured that MFCCs are the best features to start with. Though there are various libraries that implement the feature extraction, my focus has been on librosa and essentia.
Disclaimer: This piece does not aim to answer the question, but merely to shed more light on why it is being asked and to invite responses.
MFCC stands for Mel-Frequency Cepstral Coefficient, a fundamental audio feature. MFCC extraction uses the mel scale to divide the frequency band into sub-bands and then extracts the cepstral coefficients by applying a Discrete Cosine Transform (DCT) to the log energies of those sub-bands. The mel scale is based on the way humans distinguish between frequencies, which makes it very convenient for processing sounds.
It is a scale of pitches judged by listeners to be equal in distance from one another. Because of how humans perceive sound, the mel scale is non-linear: the gap in Hz between pitches that sound equally spaced increases with frequency.
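As a rough illustration of this non-linearity, the sketch below converts frequencies to mels with one common formula, 2595 * log10(1 + f / 700). The choice of this particular formula is an assumption made for illustration (other variants exist), and the function name hz_to_mel is hypothetical. It shows that a fixed 1000 Hz step covers fewer mels at high frequencies than at low ones.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# The same 1000 Hz step, measured in mels, at the low and high end of the range.
low_step = hz_to_mel(1000) - hz_to_mel(0)
high_step = hz_to_mel(8000) - hz_to_mel(7000)
print("0-1000 Hz spans", round(low_step, 1), "mels")
print("7000-8000 Hz spans", round(high_step, 1), "mels")
```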
LIBROSA
librosa is a Python library for audio feature extraction and processing. librosa.feature.mfcc is a function that simplifies the process of obtaining MFCCs by providing arguments to set the frame (FFT window) length, hop length, number of MFCCs and so on. Based on the arguments that are set, a 2D array of shape (number of MFCCs, number of frames) is returned.
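A minimal sketch of such a call is shown below. The parameter values (22050 Hz, 13 coefficients, a 2048-sample window, a 512-sample hop) and the wrapper names are assumptions for illustration, not the settings used in this project. The second helper only computes the shape librosa is expected to return when its default centered framing is used.

```python
def librosa_mfcc(path, sr=22050, n_mfcc=13, n_fft=2048, hop_length=512):
    """Load an audio file and return its MFCCs as a (n_mfcc, n_frames) array."""
    import librosa  # imported lazily so the shape helper works without librosa

    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=hop_length
    )


def expected_shape(n_samples, n_mfcc=13, hop_length=512):
    """Shape of the MFCC array under librosa's default centered framing."""
    return (n_mfcc, 1 + n_samples // hop_length)


# A 30 second clip at 22050 Hz has 661500 samples.
print(expected_shape(30 * 22050))
```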
ESSENTIA
essentia is a full-featured workflow environment for high- and low-level features, facilitating audio input, preprocessing and statistical analysis of output. It is written in C++ with Python bindings and exports data in YAML or JSON format.
The essentia.standard.MFCC algorithm has a parameter to fix the number of coefficients, but it treats its input as a single spectrum, so applying it to a whole file in one go returns just one 1D array of coefficients. The library however also has a FrameGenerator utility that slices the audio into frames according to parameters such as frame size and hop size, which makes it possible to produce output comparable in shape to that of librosa.
Making Essentia's MFCCs like Librosa
I used FrameGenerator to set the frame size and hop length, and set the number of MFCCs, so that they were the same as those used with librosa. The sample rate and window type were also made the same for both libraries.
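The frame-by-frame pipeline described here could look roughly like the sketch below. It is not the exact code used for this comparison: the parameter values, the Hann window and the wrapper names are assumptions, and every other MFCC option is left at Essentia's defaults, which may well differ from librosa's.

```python
def spectrum_bins(frame_size):
    """Number of bins in the magnitude spectrum of a real frame."""
    return frame_size // 2 + 1


def essentia_mfcc(path, sr=22050, n_mfcc=13, frame_size=2048, hop_size=512):
    """Return MFCCs as a (n_mfcc, n_frames) array, computed frame by frame."""
    import numpy as np
    import essentia.standard as es  # imported lazily; needs Essentia installed

    audio = es.MonoLoader(filename=path, sampleRate=sr)()
    window = es.Windowing(type="hann")
    spectrum = es.Spectrum(size=frame_size)
    mfcc = es.MFCC(
        numberCoefficients=n_mfcc,
        sampleRate=sr,
        inputSize=spectrum_bins(frame_size),
    )

    coefficients = []
    for frame in es.FrameGenerator(audio, frameSize=frame_size, hopSize=hop_size):
        # MFCC returns the mel band energies and the coefficients of one frame.
        _bands, frame_coeffs = mfcc(spectrum(window(frame)))
        coefficients.append(frame_coeffs)

    # Stack frames as columns to match librosa's (n_mfcc, n_frames) layout.
    return np.array(coefficients).T
```

Depending on how the audio is padded at its edges, the number of frames may still need trimming before the two outputs can be compared element by element.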
I then used both functions to generate MFCCs of the same shape for 20 tracks. Two of these are visualized below.
My observation was that even with this modification, essentia was still about 2 times faster than librosa (this was the primary metric I wanted to compare). However, I also noticed something else: the MFCCs did not look the same.
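One simple way to measure such a speed difference is to time each extraction function a few times and keep the best run. The helper below is a generic sketch using only the standard library; its name and the choice of three repeats are assumptions, not the benchmark used for the figure above.

```python
import time


def best_time(fn, *args, repeats=3, **kwargs):
    """Call fn(*args, **kwargs) several times and return the fastest run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example with a stand-in workload instead of a real feature extractor.
print(best_time(sorted, list(range(100000, 0, -1))))
```

With two hypothetical callables extract_a and extract_b that each wrap one library, the speed ratio would be best_time(extract_a, path) / best_time(extract_b, path).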
How different are the MFCCs from Librosa and Essentia?
Upon seeing the visual difference between them, I found the cosine similarity between the two MFCCs with the aim of quantifying it. For the two tracks displayed, the similarities were:
- Africa Yako: 0.9019551277160645
- So To Where: 0.9127510786056519
Generally, the similarities ranged between 0.90 and 0.94.
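For reference, one way to compute a single cosine similarity between two MFCC matrices of the same shape is to flatten both into vectors first. Whether the figures above were obtained this way is an assumption; the sketch below only illustrates the measure itself, and the function name is hypothetical.

```python
import math
from itertools import chain


def cosine_similarity(a, b):
    """Cosine similarity of two equally shaped 2D arrays, flattened to vectors."""
    x = [float(v) for v in chain.from_iterable(a)]
    y = [float(v) for v in chain.from_iterable(b)]
    if len(x) != len(y):
        raise ValueError("inputs must contain the same number of values")
    dot = sum(p * q for p, q in zip(x, y))
    norms = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norms


# Toy 2x2 "MFCC" matrices; real inputs would be the arrays from each library.
same = cosine_similarity([[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]])
orthogonal = cosine_similarity([[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]])
print(same, orthogonal)
```

Because the function only iterates over rows and values, it accepts nested lists as well as 2D NumPy arrays.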
If you know the reason for this difference between the MFCCs or perhaps can identify a parameter that I am not considering, please do not hesitate to drop a comment. Thanks.