

Stupid Simple Duplicate Prevention Using Redis
source link: https://fuzzyblog.io/blog/redis/2020/02/24/stupid-simple-duplicate-prevention-using-redis.html

Feb 24, 2020

So I just saw this log message popup on a SystemD service I wrote yesterday:
Feb 24 10:27:11 ip-172-31-24-213 reddit_to_kafka[18391]: Already exists in redis so also in kafka so skipping
Sometimes you need to solve a problem without a lot of effort. Yesterday I needed to populate a Kafka queue with data and I didn't want to worry about duplicates flowing into it. Here's what I knew:
- My source wasn't a stream but a set of social media posts that I was monitoring
- I didn't have a database
- I didn't want the overhead of checking a database before I inserted
- Every post had an id that I knew would be unique (once I prepended it with the name of the source)
Whenever I have a problem like this, I reach for Redis almost instinctively. My stupid, simple solution was as follows:
- create a key i.e. key = "reddit_#{message.id}"
- create a Redis object i.e. redis = Redis.new
- check if the key already exists using redis.exists(key)
- if that check returns true then do nothing
- if that check returns false then add the data and set the key (set the key AFTER adding the data, so a failed add doesn't leave a key marking a message as processed; atomicity, even pseudo-atomicity, is a thing) — see the sketch below
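
Put together, the whole check-then-add flow is only a few lines of Ruby. Here is a minimal sketch, assuming the redis-rb gem; `publish_to_kafka` and `message` are placeholders for whatever Kafka producer call and post object you actually have:

```ruby
require "redis"

# Minimal sketch of the flow above, assuming the redis-rb gem.
# publish_to_kafka and message are placeholders, not real APIs from this post.
redis = Redis.new  # connects to the local install on localhost:6379 by default

def process(redis, message)
  key = "reddit_#{message.id}"

  # redis-rb >= 4.2 uses exists? for a boolean; older versions returned
  # true/false directly from exists
  if redis.exists?(key)
    puts "Already exists in redis so also in kafka so skipping"
    return
  end

  publish_to_kafka(message)  # placeholder: add the data to Kafka first...
  redis.set(key, 1)          # ...then set the key, so a failed add doesn't
                             # leave a post marked as processed
end
```

If key growth ever becomes a concern (see Note 1 below), the same `set` call can take an expiry (`redis.set(key, 1, ex: seconds)`) so old keys age out on their own.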
The beauty of Redis is that it installs using nothing more than:
sudo apt-get install redis
That installs a local copy of Redis and starts it, and any process on the box can connect to it (there seem to be Redis bindings for every language). This ease of use is what makes Redis invaluable for this type of task.
Note 1: Given the size of my input source and its frequency, I'm not even going to worry about the number of keys and the fact that this approach is pretty brain dead. When we get a larger volume data feed, I'll circle back and fix it.
Note 2: It took longer to write this up than it did to actually implement and test this.
Posted In: #redis