Using GenAI to Classify an Image as a Photo, Screenshot, or Meme
source link: https://www.raymondcamden.com/2024/01/18/using-genai-to-classify-an-image-as-a-photo-screenshot-or-meme
File this under the "I wasn't sure if it would work and it did" category. Recently, a friend on Facebook wondered if there was some way to take a collection of photos and figure out which were 'real' photos versus memes. I thought it could possibly be a good exercise for GenAI and decided to take a shot at it. As usual, I opened up Google's AI Studio and did a few initial tests:
I then simply removed that image and pasted in more to test. From what I could see, it worked well enough, so I took the source code from AI Studio and began working.
The Code
First, I grabbed some pictures from my collection, eleven of them, and tried to get a few photos, memes, and screenshots. To make it easier for me, after downloading them I renamed them so it would be quicker to see if it worked right. As I mentioned above, AI Studio gave me the code, but I modified it slightly so I could pass a directory of images:
import fs from 'fs/promises';
import 'dotenv/config';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

const MODEL_NAME = "gemini-pro-vision";
// The API key is read from the environment (loaded from .env by dotenv).
const API_KEY = process.env.GOOGLE_AI_KEY;

// Sends one image to the model and returns its one-word answer.
async function detectPhoto(path) {
  const genAI = new GoogleGenerativeAI(API_KEY);
  const model = genAI.getGenerativeModel({ model: MODEL_NAME });

  const generationConfig = {
    temperature: 0.4,
    topK: 32,
    topP: 1,
    maxOutputTokens: 4096,
  };

  const safetySettings = [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ];

  // The prompt text, then the image itself as base64 inline data.
  // Note that every file is sent as image/jpeg, whatever its extension.
  const parts = [
    {text: "Look at the following photo and tell me if it's a photo, a screenshot, or a meme. Answer with just one word.\n"},
    {
      inlineData: {
        mimeType: "image/jpeg",
        data: Buffer.from(await fs.readFile(path)).toString("base64")
      }
    },
    {text: "\n\n"},
  ];

  const result = await model.generateContent({
    contents: [{ role: "user", parts }],
    generationConfig,
    safetySettings,
  });

  const response = result.response;
  return response.text();
}

// Classify every file in the source directory, one at a time.
const root = './source_for_detector/';
let files = await fs.readdir(root);
for(const file of files) {
  console.log(`Check to see if ${file} is a photo, meme, or screenshot...`);
  let result = await detectPhoto(root + file);
  console.log(result);
}
It worked perfectly!
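One detail worth noting in the listing: the mimeType is fixed at "image/jpeg" for every file. If the directory mixes formats, a small lookup by file extension could supply the value instead. The following is only a sketch. The helper name mimeTypeFor and the list of extensions are assumptions, not part of the original script, so check the current Gemini documentation for the formats it accepts.

```javascript
// Hypothetical helper (not in the original script): choose the MIME type
// for inlineData.mimeType from the file extension.
const MIME_TYPES = new Map([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['webp', 'image/webp'],
]);

function mimeTypeFor(filePath) {
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  // Unknown extensions return null so the caller can skip the file.
  return MIME_TYPES.get(ext) ?? null;
}

console.log(mimeTypeFor('./source_for_detector/meme1.PNG')); // image/png
```

Inside detectPhoto, the hardcoded string would then become mimeTypeFor(path), with files that return null skipped before the API call.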
If you want a copy of the source, you can grab it here: https://github.com/cfjedimaster/ai-testingzone/tree/main/detect_meme_ss
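Because the images were renamed to make the results easy to eyeball, the same check could be automated. This is a rough sketch under two assumptions that the article does not confirm: that each file name starts with its expected label (for example meme2.jpg), and that the model's reply may carry stray whitespace, capitals, or punctuation. The function names are hypothetical.

```javascript
const LABELS = ['photo', 'screenshot', 'meme'];

// Reduce a raw reply such as " Meme.\n" to one of the known labels,
// or 'unknown' if none of them appears in the text.
function normalizeLabel(reply) {
  const words = reply.toLowerCase().match(/[a-z]+/g) ?? [];
  return words.find((w) => LABELS.includes(w)) ?? 'unknown';
}

// Assumed naming scheme: the expected label is a word in the file's
// base name, e.g. "screenshot3.png" is expected to be a screenshot.
function expectedLabel(fileName) {
  return normalizeLabel(fileName.split('.')[0]);
}

// True when the model's reply agrees with the label in the file name.
function isCorrect(fileName, reply) {
  const expected = expectedLabel(fileName);
  return expected !== 'unknown' && expected === normalizeLabel(reply);
}

console.log(isCorrect('meme2.jpg', ' Meme.\n')); // true
```

In the loop, isCorrect(file, result) could be tallied to report how many of the images were labeled as expected.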
The Photos
Ok, technically you can just head over to the GitHub repo to see these, but here are the source images. First, the 'regular' photos:
Next, the screenshots:
And finally, the memes. Enjoy.