Comparing Images using Node and JIMP

source link: https://www.codedrome.com/comparing-images-node-jimp/

Posted on 22nd April 2020

Another post using the JIMP npm package, this time experimenting with several methods for comparing images to find duplication or plagiarism.

The full documentation for JIMP can be found at https://www.npmjs.com/package/jimp. I will be using three methods for comparing images:

  • hash: this returns a 64-bit perceptual hash of an image. Unlike the cryptographic hashing you might be familiar with, perceptual hashes vary in a way roughly proportional to the differences in input, so the hashes of similar images will also be similar.
  • distance: the Hamming distance between the hashes of two images, i.e. the number of bits which differ. JIMP reports this as a fraction between 0 and 1 rather than as a bit count.
  • diff: the percentage difference between two images, also reported as a fraction between 0 and 1.
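
To make the distance measure concrete, here is a minimal sketch of a normalised Hamming distance over two binary hash strings. This is not JIMP's own code: hammingFraction is a hypothetical helper written for illustration, and the two sample values are the binary hashes printed in the program output later in this post.

```javascript
// Hypothetical helper, not part of JIMP: the fraction of bit positions
// at which two equal-length binary strings differ.
function hammingFraction(bitsA, bitsB) {
    if (bitsA.length !== bitsB.length) {
        throw new Error("Hashes must be the same length");
    }

    let differing = 0;
    for (let i = 0; i < bitsA.length; i++) {
        if (bitsA[i] !== bitsB[i]) {
            differing++;
        }
    }

    return differing / bitsA.length;
}

// Binary hashes copied from the program output in this post
const edinburghBits = "1101101011000010000000101100000000100101000000000000001010110000";
const londonBits = "1010100000011111010011110010101001001011001010101100011000101000";

console.log(hammingFraction(edinburghBits, edinburghBits)); // 0
console.log(hammingFraction(edinburghBits, londonBits));    // 0.515625
```

The second result corresponds to 33 differing bits out of 64, which agrees with the distance reported for london.jpg.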

The JIMP documentation linked to above recommends using both distance and diff to compare images. If either is less than 0.15 then the images can be considered to be the same. The documentation claims a 99% success rate with 1% false positives.
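
The rule can be written as a one-line check. The sketch below is only an illustration: imagesMatch and THRESHOLD are hypothetical names, and the sample numbers are rounded from the program output later in this post. In real code the two arguments would come from Jimp.distance(a, b) and Jimp.diff(a, b).percent.

```javascript
// Threshold quoted from the JIMP documentation
const THRESHOLD = 0.15;

// Hypothetical helper: both arguments are fractions between 0 and 1.
// The images are treated as the same if EITHER measure is below the threshold.
function imagesMatch(distance, diffPercent, threshold = THRESHOLD) {
    return distance < threshold || diffPercent < threshold;
}

console.log(imagesMatch(0, 0.0805));        // true: both measures are low
console.log(imagesMatch(0, 0.2595));        // true: distance alone is enough
console.log(imagesMatch(0.515625, 0.8484)); // false: neither is below 0.15
```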

However, there were a few unanswered questions in my mind about this process:

  • Does it work if one of the images has been converted to black and white?
  • Does it work if the images are different sizes?
  • Does it work if one of the images has been slightly enhanced, for example sharpened?
  • Does it work with heavy editing, for example if one image is highly pixelated?

So as not to keep you in suspense, I found that it does work well in all these cases. All four edited images had exactly the same hash as the unedited original, and therefore a Hamming distance of 0 from it, although the percentage differences did vary quite a bit. However, as long as at least one measure is less than 0.15 the images are flagged as the same according to the recommended methodology.

In this post I will show the source code used to test these cases, using the images below. There is also a completely different image which I have thrown in just to see what happens.

edinburgh_original.jpg

edinburgh_sharpened.jpg

edinburgh_bw.jpg

edinburgh_pixelized.jpg

edinburgh_small.jpg

london.jpg

The source code can be downloaded as a ZIP, or you can clone the GitHub repository if you prefer.

Source Code Links

ZIP File
GitHub

This is the source code.

comparingimages.js

compare();

async function compare()
{
    const Jimp = require("jimp");

    // open the original image, its four edited versions and an unrelated image
    const edinburgh_original = await Jimp.read("edinburgh_original.jpg");
    const edinburgh_sharpened = await Jimp.read("edinburgh_sharpened.jpg");
    const edinburgh_bw = await Jimp.read("edinburgh_bw.jpg");
    const edinburgh_pixelized = await Jimp.read("edinburgh_pixelized.jpg");
    const edinburgh_small = await Jimp.read("edinburgh_small.jpg");
    const london = await Jimp.read("london.jpg");

    // hash of the original in the default base 64 and in base 2
    console.log("Images compared to edinburgh_original.jpg\n=========================================");
    console.log(`hash (base 64) ${edinburgh_original.hash()}`);
    console.log(`hash (binary)  ${edinburgh_original.hash(2)}\n`);

    // for each other image: its hashes, then distance and diff against the original
    console.log("edinburgh_sharpened.jpg\n=======================");
    console.log(`hash (base 64) ${edinburgh_sharpened.hash()}`);
    console.log(`hash (binary)  ${edinburgh_sharpened.hash(2)}`);
    console.log(`distance       ${Jimp.distance(edinburgh_original, edinburgh_sharpened)}`);
    console.log(`diff.percent   ${Jimp.diff(edinburgh_original, edinburgh_sharpened).percent}\n`);

    console.log("edinburgh_bw.jpg\n================");
    console.log(`hash (base 64) ${edinburgh_bw.hash()}`);
    console.log(`hash (binary)  ${edinburgh_bw.hash(2)}`);
    console.log(`distance       ${Jimp.distance(edinburgh_original, edinburgh_bw)}`);
    console.log(`diff.percent   ${Jimp.diff(edinburgh_original, edinburgh_bw).percent}\n`);

    console.log("edinburgh_pixelized.jpg\n=======================");
    console.log(`hash (base 64) ${edinburgh_pixelized.hash()}`);
    console.log(`hash (binary)  ${edinburgh_pixelized.hash(2)}`);
    console.log(`distance       ${Jimp.distance(edinburgh_original, edinburgh_pixelized)}`);
    console.log(`diff.percent   ${Jimp.diff(edinburgh_original, edinburgh_pixelized).percent}\n`);

    console.log("edinburgh_small.jpg\n===================");
    console.log(`hash (base 64) ${edinburgh_small.hash()}`);
    console.log(`hash (binary)  ${edinburgh_small.hash(2)}`);
    console.log(`distance       ${Jimp.distance(edinburgh_original, edinburgh_small)}`);
    console.log(`diff.percent   ${Jimp.diff(edinburgh_original, edinburgh_small).percent}\n`);

    console.log("london.jpg\n==========");
    console.log(`hash (base 64) ${london.hash()}`);
    console.log(`hash (binary)  ${london.hash(2)}`);
    console.log(`distance       ${Jimp.distance(edinburgh_original, london)}`);
    console.log(`diff.percent   ${Jimp.diff(edinburgh_original, london).percent}\n`);
}

The compare function is async as I have used await to open the images. As this is just an experiment I have omitted error handling although of course any production code interacting with the outside world, for example the file system, should handle errors.
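
As a rough idea of what that error handling might look like, the sketch below reads a list of files and separates the ones that loaded from the ones that failed. It is an illustration rather than part of the program above: readAll is a hypothetical helper, it takes the reader function as a parameter (in practice that would be Jimp.read), and fakeReader is a stand-in used here so the example runs without any image files. Note that, unlike the sequential awaits in compare, this version starts all the reads at once.

```javascript
// Hypothetical helper: read every path with the supplied reader function
// (for example Jimp.read) and separate successes from failures.
async function readAll(paths, reader) {
    // allSettled waits for every read, whether it succeeds or fails
    const results = await Promise.allSettled(paths.map((path) => reader(path)));

    const loaded = {};
    const failed = {};

    results.forEach((result, i) => {
        if (result.status === "fulfilled") {
            loaded[paths[i]] = result.value;
        } else {
            failed[paths[i]] = result.reason;
        }
    });

    return { loaded, failed };
}

// Stand-in reader, used only to demonstrate the helper without image files
async function fakeReader(path) {
    if (path === "missing.jpg") {
        throw new Error(`Cannot read ${path}`);
    }
    return { path };
}

readAll(["edinburgh_original.jpg", "missing.jpg"], fakeReader)
    .then(({ loaded, failed }) => {
        console.log("loaded:", Object.keys(loaded));
        for (const [path, error] of Object.entries(failed)) {
            console.log(`failed: ${path} (${error.message})`);
        }
    });
```

Collecting the failures instead of throwing on the first one means a single unreadable file does not stop the remaining comparisons.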

After the images have been opened the hash of the original image is output. When called with no argument the hash function returns the hash as a base 64 string, but you can also specify a base. Here I have also printed the binary, or base 2, equivalent.
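
To show how the two representations relate, the sketch below converts a base 64 hash into its 64-bit binary form. The helper base64HashToBinary is hypothetical, and the digit alphabet is an assumption rather than something taken from the JIMP documentation: digits, then lower case, then upper case, then "_" and "-". With that ordering the conversion reproduces both hashes shown in the program output in this post.

```javascript
// ASSUMED digit alphabet (values 0 to 63); it reproduces the hashes in this post
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-";

// Hypothetical helper: convert a base 64 hash string to a fixed-width binary string
function base64HashToBinary(hash, bits = 64) {
    let value = 0n; // BigInt, because 64 bits do not fit safely in a Number

    for (const ch of hash) {
        const digit = ALPHABET.indexOf(ch);
        if (digit === -1) {
            throw new Error(`Unexpected character: ${ch}`);
        }
        value = value * 64n + BigInt(digit);
    }

    // Pad on the left so that leading zero bits are not lost
    return value.toString(2).padStart(bits, "0");
}

console.log(base64HashToBinary("dH20I0B00aM"));
// 1101101011000010000000101100000000100101000000000000001010110000
```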

The rest of the code is repetitive, calculating the hashes, distances and percentage differences between the original image and the others.

The functions used here are relatively resource-intensive and running this program with even six small photos takes 2-3 seconds. Bear this in mind if you happen to be writing any code to compare large numbers of images.
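
One way to reduce the work for large collections, sketched below under some assumptions, is to compute each image's hash once, store it, and then group file names by hash instead of comparing every pair of images. groupByHash is a hypothetical helper, and the hash values are copied from the program output in this post; in practice they would come from calling hash() once per image. This only finds images whose hashes are exactly equal, so near matches would still need the distance and diff checks.

```javascript
// Hypothetical helper: group file names by hash so that exact-hash duplicates
// are found in a single pass rather than by comparing every pair.
function groupByHash(hashes) {
    const groups = new Map();

    for (const [name, hash] of Object.entries(hashes)) {
        if (!groups.has(hash)) {
            groups.set(hash, []);
        }
        groups.get(hash).push(name);
    }

    // Keep only the hashes shared by more than one file
    return [...groups.values()].filter((names) => names.length > 1);
}

// Base 64 hashes copied from the program output in this post
const storedHashes = {
    "edinburgh_original.jpg": "dH20I0B00aM",
    "edinburgh_sharpened.jpg": "dH20I0B00aM",
    "edinburgh_bw.jpg": "dH20I0B00aM",
    "edinburgh_pixelized.jpg": "dH20I0B00aM",
    "edinburgh_small.jpg": "dH20I0B00aM",
    "london.jpg": "awvjOFbaIoE"
};

// Prints one group containing the five Edinburgh file names
console.log(groupByHash(storedHashes));
```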

Now let's run the code.

node comparingimages.js

Program output

Images compared to edinburgh_original.jpg
=========================================
hash (base 64) dH20I0B00aM
hash (binary)  1101101011000010000000101100000000100101000000000000001010110000

edinburgh_sharpened.jpg
=======================
hash (base 64) dH20I0B00aM
hash (binary)  1101101011000010000000101100000000100101000000000000001010110000
distance       0
diff.percent   0.08049583333333334

edinburgh_bw.jpg
================
hash (base 64) dH20I0B00aM
hash (binary)  1101101011000010000000101100000000100101000000000000001010110000
distance       0
diff.percent   0.13681666666666667

edinburgh_pixelized.jpg
=======================
hash (base 64) dH20I0B00aM
hash (binary)  1101101011000010000000101100000000100101000000000000001010110000
distance       0
diff.percent   0.25950833333333334

edinburgh_small.jpg
===================
hash (base 64) dH20I0B00aM
hash (binary)  1101101011000010000000101100000000100101000000000000001010110000
distance       0
diff.percent   0.34801666666666664

london.jpg
==========
hash (base 64) awvjOFbaIoE
hash (binary)  1010100000011111010011110010101001001011001010101100011000101000
distance       0.515625
diff.percent   0.8483791666666667

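As I mentioned above, the hashes of the Edinburgh photos are identical and so their Hamming distances are all 0, although the percentage differences vary from about 0.08 for the sharpened image to about 0.35 for the small one. Note that "percent" is misleading; these numbers are actually fractions so, for example, 0.5 = 50%. The same applies to distance: the 0.515625 reported for the London photo means that 33 of the 64 hash bits differ.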

Not surprisingly the London photo is very different by all measures.

This entry was posted in JavaScript by Chris Webb.

My name is Chris Webb and I am a software engineer based in London.

I started programming in Sinclair Basic on a ZX81, and have subsequently used a wide range of languages including C, C++, C#, PHP, JavaScript and Python, used a number of RDBMSs including SQL Server, MySQL and PostgreSQL, and more frameworks and libraries than I can remember.

There are many programming blogs around but they mostly provide “how-to” tutorials with little or no explanation of how the information they give can be put to use.

My aim with this blog is to be a bit different by presenting projects which do something practical, useful and interesting. I hope you like them.

