2

PostGIS Data Anonymization

 3 years ago
source link: https://www.zimmi.cz/posts/2020/postgis-data-anonymization/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

PostGIS Data Anonymization

Among all the sensitive spatial data being collected through cellphones and credit cards, our address of residency is probably the most delicate one. Can it be anonymized/pseudonymized/obscured before you share it with your business partners?

Imagine given a set of address points for each of your clients and the set of all address points in the country, you should adjust it in the following way:

  • find the two nearest address points for each address point of your client
  • find the center of these two and the client address point
  • measure the distance of the computed center to each of three points and keep the maximum value
  • make the biggest distance even bigger by adding 10 % of its value
  • ceil the value
  • output the new position and the ceiled distance
qgis.gif

This shifts each address point by a dynamic distance, giving us at least three points within the given distance (one of them being the original address point).

SELECT
    tmp.code,
    ST_X(tmp.new_position) x,
    ST_Y(tmp.new_position) y,
    ceil(MAX(biggest_distance) + MAX(biggest_distance) * 0.1) round_distance
FROM (
    SELECT
        tmp.code,
        tmp.geom,
        ST_Centroid((ST_Union(two_closest_points, tmp.geom))) new_position,
        -- get distance to two closest points and the client address point
        ST_Centroid((ST_Union(two_closest_points, tmp.geom))) <-> (ST_DumpPoints(ST_Union(two_closest_points, tmp.geom))).geom biggest_distance
    FROM (
        SELECT
            r1.code,
            r1.geom,
            ST_Union(neighbours.geom) two_closest_points
        FROM address_points r1,
        LATERAL (
            -- keep two closest points to each client address point
            SELECT
                r2.code,
                r2.geom,
                r1.geom <-> r2.geom distance
            FROM address_points r2
            WHERE r1.code <> r2.code
            ORDER BY r1.geom <-> r2.geom ASC
            LIMIT 2
        ) neighbours
        GROUP BY
            r1.code,
            r1.geom
    ) tmp
) tmp
GROUP BY
    tmp.code,
    tmp.geom,
    tmp.new_position;

You might want to use LATERAL for tasks like this.

If you find this blog useful, you are welcome to make a donation to support it.

Donate via PayPal


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK