PostGIS Data Anonymization
source link: https://www.zimmi.cz/posts/2020/postgis-data-anonymization/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
PostGIS Data Anonymization
Among all the sensitive spatial data being collected through cellphones and credit cards, our address of residency is probably the most delicate one. Can it be anonymized/pseudonymized/obscured before you share it with your business partners?
Imagine given a set of address points for each of your clients and the set of all address points in the country, you should adjust it in the following way:
- find the two nearest address points for each address point of your client
- find the center of these two and the client address point
- measure the distance of the computed center to each of three points and keep the maximum value
- make the biggest distance even bigger by adding 10 % of its value
- ceil the value
- output the new position and the ceiled distance
This shifts each address point by a dynamic distance, giving us at least three points within the given distance (one of them being the original address point).
SELECT tmp.code, ST_X(tmp.new_position) x, ST_Y(tmp.new_position) y, ceil(MAX(biggest_distance) + MAX(biggest_distance) * 0.1) round_distance FROM ( SELECT tmp.code, tmp.geom, ST_Centroid((ST_Union(two_closest_points, tmp.geom))) new_position, -- get distance to two closest points and the client address point ST_Centroid((ST_Union(two_closest_points, tmp.geom))) <-> (ST_DumpPoints(ST_Union(two_closest_points, tmp.geom))).geom biggest_distance FROM ( SELECT r1.code, r1.geom, ST_Union(neighbours.geom) two_closest_points FROM address_points r1, LATERAL ( -- keep two closest points to each client address point SELECT r2.code, r2.geom, r1.geom <-> r2.geom distance FROM address_points r2 WHERE r1.code <> r2.code ORDER BY r1.geom <-> r2.geom ASC LIMIT 2 ) neighbours GROUP BY r1.code, r1.geom ) tmp ) tmp GROUP BY tmp.code, tmp.geom, tmp.new_position;
You might want to use LATERAL
for tasks like this.
If you find this blog useful, you are welcome to make a donation to support it.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK