
Near Duplicates in Survey Data Series

source link: https://reluctantcriminologists.com/blog-posts/%5B8%5D/dup-index

general
rstats
survey research
duplication
fraud
Do you know how to detect exact or near duplicate rows in your data? Read on to learn more!
Authors

Jake Day

Jon Brauer

Maja Kotlaja

Published

October 10, 2023

shining2.jpg
You thought The Shining was scary? Have you looked under your data for near duplicates?

You have reached the landing page for our “Near Duplicates in Survey Data” series. Click below to read the posts in this series:

Trust Issues

multiplicity.jpg

Post 1: Near duplicates in survey data: Like “Multiplicity” but without the humor. (Image from Multiplicity)

Trust Issues: Examining Near Duplicates in Survey Data

Do you know how to detect exact or near duplicate rows in your data? Read on to learn more!

Stumbling in the Dark

coding-dall-e.jpg

Post 2: For this non-programmer, iterating on a percentmatch function in R was not entirely unlike stumbling in the dark. (Image created using DALL-E)

Stumbling in the Dark: Building/Iterating an R Function to Match Stata’s percentmatch

If you are looking for more information about the modified R function we used to detect near duplicates, then you have come to the right place. (Code shared; detailed write-up forthcoming)
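As a rough illustration of the idea behind a percentmatch-style check, the sketch below computes, for each row, the highest share of columns on which it agrees with any other row. This is not the authors' R function and not Stata's percentmatch itself; the name `percent_match` and the toy data are hypothetical, and the sketch assumes every row is non-empty, has the same number of columns, and contains no missing values.

```python
from typing import Sequence


def percent_match(rows: Sequence[Sequence[object]]) -> list[float]:
    """For each row, return the highest share of columns on which it
    agrees with any other row (1.0 means an exact duplicate exists).

    Hypothetical helper for illustration only; assumes non-empty rows
    of equal length and no special handling of missing values.
    """
    n = len(rows)
    best = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Count the columns where the two rows hold the same value.
            same = sum(a == b for a, b in zip(rows[i], rows[j]))
            share = same / len(rows[i])
            # The comparison is symmetric, so update both rows at once.
            best[i] = max(best[i], share)
            best[j] = max(best[j], share)
    return best


# Hypothetical toy responses: 4 respondents, 5 survey items.
responses = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 1],  # differs from the first row on one item
    [5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5],  # exact copy of the first row
]
scores = percent_match(responses)
print(scores)
```

Rows scoring at or near 1.0 are the ones worth a closer look. The pairwise loop compares every row with every other row, so the cost grows quadratically with the number of respondents, which may matter for large surveys.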

Trust Issues, Part 2

matrix-sequel.jpg

Post 3: Investigating near duplicates in different data: Will the sequel live up to the original? (Image from The Matrix Reloaded)

Trust Issues, Part 2: Investigating Near Duplicates in Different Data

This follow-up to our first entry on near duplication in survey data analyzes near duplicates in three more international survey data sets. (Forthcoming)

Citation

BibTeX citation:
@online{day2023,
  author = {Day, Jake and Brauer, Jon and Kotlaja, Maja},
  title = {Near {Duplicates} in {Survey} {Data} {Series}},
  date = {2023-10-10},
  url = {https://www.reluctantcriminologists.com/blog-posts/[8]/dup-index.html},
  langid = {en}
}
For attribution, please cite this work as:
Day, Jake, Jon Brauer, and Maja Kotlaja. 2023. “Near Duplicates in Survey Data Series.” October 10, 2023. https://www.reluctantcriminologists.com/blog-posts/[8]/dup-index.html.

