GitHub blocks FLoC across all of GitHub Pages

 2 years ago
source link: https://news.ycombinator.com/item?id=26967903

The internet is split roughly into 3. The top 100 websites get a third of the page views, the remaining top 10k get another third and millions of websites get the last third.

The top 100 have dedicated engineering and policy teams that will disable FLoC because they're either not interested in ads (Wikipedia) or have their own first-party implementation that doesn't need FLoC (Facebook). They'll ditch FLoC.

The next 10k might have engineering teams that can make the change, but might be more interested in finding out about their audience so they can monetize more easily. They'll keep FLoC.

As for the remaining millions, only a tiny minority of them will even know this is a thing, let alone care enough to make the change or contact a developer who can do it. These are the folks who have hosted their WordPress site with GoDaddy because it was cheap and quick when they needed a site. They'll keep FLoC.

So the upshot is that github.com, instagram.com and amazon.com might opt out, but the vast majority of the web will not. My prediction is that at least half of all web pages loaded by users won't have this header.

Cloudflare, Akamai, Fastly and other CDNs should disable FLoC by default for all customers, and provide a toggle to those customers who explicitly wish to enable it.

But until they do[1]:

Apache:

    Header always set Permissions-Policy "interest-cohort=()"
Caddy:
    header Permissions-Policy "interest-cohort=()"
Cloudflare Workers (the free tier has request limits):
    addEventListener('fetch', event => {
        event.respondWith(handleRequest(event.request))
    })
    async function handleRequest(request) {
        // Fetch the origin response, then copy its headers so they can be modified.
        const response = await fetch(request)
        const newHeaders = new Headers(response.headers)
        newHeaders.set("Permissions-Policy", "interest-cohort=()")
        // Re-wrap the original body and status with the amended headers.
        return new Response(response.body, {
            status: response.status,
            statusText: response.statusText,
            headers: newHeaders
        })
    }
Lighttpd:
    server.modules +=("mod_setenv")
    setenv.add-response-header=("Permissions-Policy"=>"interest-cohort=()")
Netlify:
    [[headers]]
      for = "/*"
      [headers.values]
        Permissions-Policy = "interest-cohort=()"
Nginx:
    add_header Permissions-Policy interest-cohort=();
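All of the snippets above emit the same header value, so one way to confirm a deployment worked is to inspect the `Permissions-Policy` value the server actually returns. The following is a minimal sketch, not an official parser: `optsOutOfFloc` is a hypothetical helper name, and it assumes directives are separated by commas and that no quoted string inside the header contains a comma.

```javascript
// Hypothetical helper (not part of any library): returns true when a
// Permissions-Policy header value contains an empty allowlist for
// interest-cohort, i.e. the directive "interest-cohort=()".
// Simplifying assumption: directives are split on commas only.
function optsOutOfFloc(headerValue) {
  if (typeof headerValue !== "string") return false;
  return headerValue
    .split(",")
    .map((directive) => directive.replace(/\s+/g, "").toLowerCase())
    .some((directive) => directive === "interest-cohort=()");
}

console.log(optsOutOfFloc("interest-cohort=()"));
console.log(optsOutOfFloc("geolocation=(self), interest-cohort=()"));
console.log(optsOutOfFloc("geolocation=(self)"));
```

The value to pass in can be read from any HTTP client's response headers; the function itself makes no network calls.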

[1] https://github.com/WICG/floc#opting-out-of-computation

This kind of post, which provides no context, will lead to cargo-culting: people blindly copying and pasting these directives and believing they have increased the privacy of their site...

If your web site does not include ads, FLoC is already disabled. Here, "ads" means ads that EasyList can detect. This HTTP header will just make your config more complex and your responses slightly bigger, with no change of behaviour.

If you include external ads on your pages, then I doubt disabling FLoC will increase your visitors' privacy, but at least this header will have a real effect.

> If your web site does not include ads, FLoC is already disabled

Citation? Here's what the FLoC explainer says:

> All sites with publicly routable IP addresses that the user visits when not in incognito mode will be included in the POC cohort calculation.

https://github.com/WICG/floc#sites-which-interest-cohorts-wi...

This sounds to me like all sites, whether they contain ads or not, are used to cluster users into cohorts.

From https://web.dev/floc/ in section "Do websites have to participate and share information?"

> For pages that haven't been excluded, a page visit will be included in the browser's FLoC calculation if document.interestCohort() is used on the page.

> During the current FLoC origin trial, a page will also be included in the calculation if Chrome detects that the page loads ads or ads-related resources.
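Read together, the two quoted rules amount to a small decision procedure. The sketch below restates them as code; the function name and the four boolean field names are made up for illustration, and it models only what the quoted web.dev text says, not Chrome's actual implementation.

```javascript
// Illustrative model of the quoted web.dev rules. All names are hypothetical.
//   optedOut            - the page sent Permissions-Policy: interest-cohort=()
//   callsInterestCohort - the page called document.interestCohort()
//   loadsAds            - Chrome detected ads or ads-related resources
//   originTrial         - the origin-trial rule about ads is still in force
function isVisitIncluded({ optedOut, callsInterestCohort, loadsAds, originTrial }) {
  if (optedOut) return false;            // excluded pages are never counted
  if (callsInterestCohort) return true;  // using the API opts the page visit in
  return Boolean(originTrial && loadsAds);
}

// A page with no ads and no API call is not counted, with or without the header.
console.log(isVisitIncluded({ optedOut: false, callsInterestCohort: false, loadsAds: false, originTrial: true }));
```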

Excluding some portion of sites from a user’s cohort calculation doesn’t necessarily make a user less unique if a nontrivial number of sites doesn’t opt out.

I wrote more about this on my site: https://seirdy.one/2021/04/16/permissions-policy-floc-misinf...

Thank you, this was informative.

Be sure to add `always` to the nginx header:
    add_header Permissions-Policy interest-cohort=() always;
For those wondering: this causes the header to be set on all responses. By default it will not be set on some error responses.
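To make the difference concrete, here is a small model of when nginx emits an `add_header` field. The status list is the one given in the nginx documentation for `add_header`; the function name `headerIsSent` is hypothetical, and this is a sketch of the rule rather than anything nginx itself runs.

```javascript
// Status codes on which nginx emits add_header fields when `always` is absent
// (per the nginx documentation for the add_header directive).
const DEFAULT_STATUSES = new Set([200, 201, 204, 206, 301, 302, 303, 304, 307, 308]);

// Hypothetical helper: would the header appear on a response with this status?
function headerIsSent(status, always) {
  return always || DEFAULT_STATUSES.has(status);
}

console.log(headerIsSent(200, false)); // true
console.log(headerIsSent(404, false)); // false
console.log(headerIsSent(404, true));  // true
```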
It's unclear to me what these headers do to the browser.

I mean... the docs say that they are a "site" header that you should apply to a "page". Does that mean that you must apply it to all pages to exclude a site? Is absence on one page taken as opting back in to FLoC?

If the scope is site, then it would be better as a DNS entry. I've a feeling the scope is truly page though and I've also a feeling that most people who choose to add this header will add it on all assets now - which is a bit of a waste of bytes (even with header compression in place) but would be the only way to guarantee that all pages have it.
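If the scope really is per page, one way to avoid sending the header on every asset is to attach it only to HTML documents. The sketch below shows that filter as a plain function; `needsFlocHeader` is a hypothetical name, and the premise that only document responses matter is an assumption the thread does not confirm.

```javascript
// Hypothetical filter: decide from a Content-Type value whether the response
// is an HTML document and therefore worth tagging with the opt-out header.
// Assumption (unconfirmed here): the policy only matters on documents.
function needsFlocHeader(contentType) {
  if (typeof contentType !== "string") return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html" || mediaType === "application/xhtml+xml";
}

console.log(needsFlocHeader("text/html; charset=utf-8")); // true
console.log(needsFlocHeader("image/png"));                // false
```

The trade-off is extra logic in the server in exchange for a few bytes saved per asset; setting the header everywhere remains the simpler option.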

> Cloudflare, Akamai, Fastly and other CDNs should disable FLoC by default for all customers

And this is when Google will release their own Cloudflare competitor product.

BTW do they have something as popular as Cloudflare already? I'm very unfamiliar with Google's offerings.

GCP does have a CDN product: https://cloud.google.com/cdn/

But at this point in time I think it'd be unfair to call Cloudflare "just a CDN" so not really equivalent.

From what I've heard through the technical operations jungle, Google has been pushing their CDN product hard for a long time, which isn't a shock since they've been trying to push GCP hard for a long time. But it's a little like AWS's CloudFront CDN: it's very, very rare to see someone using AWS CloudFront or the GCP CDN... who isn't on said cloud platform already.

Thanks, just disabled this on my tiny near zero traffic sites.

Hopefully if enough people disable it, it will become useless.

Yes, it will become useless. The header, that is.

Google already has shown bad faith in opt-out headers like this when they immediately started ignoring Do-Not-Track as soon as non-Chrome browsers made it a default. The fact that the spec for this awful project uses an opt-out instead of an opt-in header seems a pretty clear signal to me that Google may not have any intention of following it in the long run.

If you are using HAProxy you can use the following:
    http-response set-header Permissions-Policy interest-cohort=()

Cloudflare, etc. should do what their customers want and not make these types of decisions for them. They are CDNs, not the owners of their customers' websites.

Cloudflare's mission is to "help build a better internet", and to that end it has made a lot of opinionated decisions to increase security and performance. Where possible, options are given to customers, but the opinionated way wins by default.

Examples: Turned on HTTPS for all customers, gave image compression and optimisation to all customers, moved customers to the latest TLS as soon as possible (help drive adoption), provide tools to obscure email addresses on web pages to minimise harvesting, 1.1.1.1 privacy focused DNS, etc.

FLoC is something that an opinion can easily be formed on, and where Google have said to each site operator "you must opt-out", Cloudflare can hold an opinion that default opt-out is bad for the internet and that opt-in is better... and if they make an option that defaults to adding this header but granting customers a means to toggle it off... then all Cloudflare will have done is what Google should have done... made this opt-in by default.

GP is proposing that they give their customers the option; it's just that the default state should be "off".

Then their customers should have the option to opt in. Is that fine with you?

The opt-in is choosing CF, which is known to make many decisions for you. It's not a new pattern for CF clients.

Node.js + Express:
    app.use((req, res, next) => {
        res.setHeader("Permissions-Policy", "interest-cohort=()");
        return next();
    });

According to their own numbers, WordPress accounts for 41% of the total number of websites, and they're considering switching off FLoC by default (https://core.trac.wordpress.org/ticket/53069 - discussion not settled yet though).

How many of those 41% are actually maintained and updated WordPress installs, though?

In a lot of cases, it automatically updates.

I think this is great news. Are there any paths towards supporting groups aligned with this position? I am aware of the EFF.

It's kinda sad that you can't just run a web server and host your own homepage anymore. You need to mess with your webserver config to make it spam the client with a dozen HTTP headers to disable FLoC, enable HSTS, set this weird same site origin policy thing, disallow iframe embedding... Luckily enough someone had the idea to make it so HTTP headers will be compressed too, so we can add some more before the request header completely fills up the initial RWIN of the server.

Strange, I'm able to run my own webserver without worrying about any of that. Just a default deploy of Nginx.

Disable FLoC if you want, but Google could always change it in future and ignore the header.

I think the parent meant that the default should be the most secure and least privacy-invading, allowing people to explicitly soften the restrictions.

You don't have to do any of that. If the user wants to use a browser that sends all of their private data to Google, it's not my job to stop them.

IMHO it's far more unbalanced than "The top 100 websites get a third of the traffic, the remaining top 10k get another third and millions of websites get the last third." Purely from data traffic, YouTube and Netflix already get a third of the traffic (and that's just 2, not 100); and purely from a pageview perspective, the top social media sites plus major media sites (again, a subset of the top 100) get more than half IIRC.

I wouldn't be surprised if the top 100 websites get 80% of the traffic, the remaining top 10k get 10% and all the millions of other sites get the last 10%.

I meant pageviews, not bandwidth consumed. Streaming websites are always going to dominate the latter.

This 100-10k-millions split statistic was pulled from a talk by Ilya Grigorik, who had worked on Web Performance at Google. I'm guessing they based it on data from Chrome.

Searches and time spent browsing are very different. Google likely doesn’t have visibility over how much time people spend on TikTok. Or perhaps Android collects that data.

Unless you view it in their browser, in which case they are more than likely to have that visibility.

And Chrome is the most popular browser last I checked, so a fairly good indication of overall trends can probably be drawn from such statistics.

The header is pointless anyway for the actual purpose of disabling FLoC as Chrome/Google will simply start ignoring it when enough websites add it.

> will disable FLoC because they're either not interested in ads (Wikipedia)

Wikipedia does not need to take any action to disable FLoC; it's only active if the site opts in, on a per-pageview basis:

* If you call document.interestCohort() to get a FLoC id for a user, that pageview will be included in FLoC calculation.

* For the origin trial, to deal with the chicken-and-egg problem, a pageview is also included if you load ads (determined with EasyList)

See https://web.dev/floc/

(Disclosure: I work on ads at Google, speaking only for myself)
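For reference, the API call mentioned above is promise-based. The sketch below wraps it defensively; `readCohort` is a hypothetical helper, and the `{ id, version }` result shape is taken from the web.dev description of the origin trial, so treat it as an assumption for any other browser version. Note that, per the first bullet above, making this call is itself what opts the pageview into the calculation.

```javascript
// Hypothetical wrapper around document.interestCohort().
// Takes the document as a parameter so it can be exercised outside a browser.
// Returns { id, version } on success, or null when the API is missing,
// blocked by policy, or otherwise rejects.
async function readCohort(doc) {
  if (!doc || typeof doc.interestCohort !== "function") return null;
  try {
    const { id, version } = await doc.interestCohort();
    return { id, version };
  } catch (err) {
    return null;
  }
}

// In a browser this would be called as readCohort(document).
readCohort({}).then((result) => console.log(result)); // no API available
```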

So, every page with ads (as determined by an opaque and ever-changing method in a closed source browser), will be included while computing the cohort. Got it. It seems safest to assume that "it happens on every page" since so much of the internet is monetized with ads.

Please forgive us for not trusting Google's "we pinky swear it will change". We have no real reason to trust that Google will keep their word.

> every page with ads (as determined by an opaque and ever-changing method in a closed source browser)

Chrome's ad detection code is open source; https://chromium.googlesource.com/chromium/src/+/master/docs... is a good place to start.

No, Chromium's ad detection code is open source. Chrome's is closed source. It may very well be the same code, but there is no (practical) way to verify that, other than trusting Google.

But as I already indicated, I have trust issues with Google.

Reverse engineering of binaries is a well-understood field. Ensuring a binary and source code align is not a fully automated task at this time as far as I know, but is well within the capabilities of our industry.

Capability and practicality are distinct concepts. Especially since it's fairly well known that Chrome will not align 100% with Chromium thanks to closed source additions and modifications, so it becomes a question of "what is different" instead of "are they different".

It's certainly not practical for me, when I can just avoid Chrome for personal usage (and I'm thankful for the capability). Of course, I can't avoid it entirely, thanks to my company deciding that Chrome is the only supported browser for our product. So even though FLoC is a non-issue for me personally, it is still something I need to worry about professionally.

EasyList is extremely broad, changes frequently, and e.g. has included communities having banners for community events as ads, and Google appears to give zero promises on the stability of this.

So relying on this requires continuous monitoring that Chrome doesn't randomly decide to tag something on some page as an ad, which is doing even more work just to cater to Google whims. Just blocking it is the sane choice here.

Since you say it is about a "chicken-and-egg problem" during the trial: is there a clear commitment somewhere that Google plans to not include pages that do not use the FLoC API in the future?

> is there a clear commitment somewhere that Google plans to not include pages that do not use the FLoC API in the future?

Not something as clear as I'd like. The closest I see is:

A page visit will be included in the browser's FLoC calculation if document.interestCohort() is used on the page. During the current FLoC origin trial, a page will also be included in the calculation if Chrome detects that the page loads ads or ads-related resources. -- https://web.dev/floc/#do-websites-have-to-participate-and-sh...

> Disclosure: I work on ads at Google

Can I ask why? I honestly can't understand how anyone could.

I think advertising is positive [1] and the role of ads in funding freely-available sites is very important. My current work is primarily on how browsers can allow more private and secure advertising [2][3][4] which I think most people will agree is valuable even if they are less in favor of advertising in general.

At a lower level, I do this job because I'm paid, which allows me to donate. [5] But I wouldn't do this work if I thought it was harmful; there are lots of different kinds of jobs I could take.

[1] https://www.jefftk.com/p/effect-of-advertising

[2] https://github.com/google/fledge-shim

[3] https://github.com/WICG/turtledove/issues/161

[4] https://github.com/WICG/webpackage/issues/624

[5] https://www.jefftk.com/donations

I'm simply blown away by that donations link. Here was me feeling happy about the little I give, but it really puts into perspective how much I keep for myself.

Google, and its method for advertising, basically destroyed the news industry. If you don't think your work is harmful, it simply means you haven't looked into the repercussions enough.

Internet advertising and the internet in general has made newspapers less profitable. But this was happening regardless of what Google did. 92% of the decline came from loss of classified revenue (https://mumbrella.com.au/de-classified-what-really-happened-...). Obviously it makes no sense to vilify Craigslist, because someone else would have provided free, searchable classifieds if Craigslist hadn't. That's the nature of the internet, which has reduced the cost of publishing to nearly nothing.

A parallel to the demise of the newspaper classifieds is the once thriving industry of people who would copy books by hand in the 14th century. Then Gutenberg created a printing press that could make copies of books in a fraction of the time. Life didn't get better for those folks whose skills were no longer needed, but maybe it did for society as a whole. But for sure it didn't and doesn't make sense to vilify people who work at printing presses.

You're looking for a "bad guy" when maybe none exists.

Disagree.

The number one newspaper killer is... Craigslist.

Also the general move away from paper printed and delivered every day to internet news delivery.

> I think advertising is positive [1]

That link only works if we buy into the premise: "One way to think about this is, what would the world would be like if we didn't allow advertising? No internet ads, TV ads, magazine ads, affiliate links, sponsored posts, product placement, everything."

However, no. I don't buy that premise at all. The state of ads as it is now is actively harmful with very little to show for in terms of "new non-stickier products" etc.

Yeah, the all-or-nothing approach is pretty hard to buy into.

What about ads, but static and not-tracking? Is that still equally negative? Is that still equally positive?

> What about ads, but static and not-tracking?

Coincidentally, my current project involves this Chrome proposal for supporting self-contained remarketing ads without individual tracking: https://github.com/WICG/turtledove

> Wikipedia does not need to take any action to disable FLoC [...] If you call document.interestCohort() to get a FLoC id for a user

It is still a problem for Wikipedia, because the global Javascript for each language is editable by a subset of the editors for that language (https://en.wikipedia.org/wiki/MediaWiki:Common.js for instance), unlike the HTTP headers which can only be changed by the Wikimedia sysadmins.

If someone can execute arbitrary JS they can already exfiltrate any information they want with userid attached, in addition to impersonating users etc. Isn't this a far bigger risk than the chance that they might add a call to document.interestCohort() and opt that page into FLoC?

Google Analytics is arbitrary JS, and JS can turn on FLoC... so using GA is a potential way to get added to FLoC.

Obviously it's probably a small overlap between "doesn't trust FLoC" and "still trusts GA", but it's still a Trojan horse.

PS. Wow kudos for being so responsive, especially/despite less than stellar comments regarding your openness about working for G-Ads.

It doesn't look like Google Analytics does this today:
    $ curl -sS https://www.google-analytics.com/analytics.js \
       | grep interestCohort
    [no results]
On the other hand, I could imagine them or other analytics services adding it since sites installing analytics services generally want to be able to slice their traffic by as many dimensions as possible. I would hope that any service considering reading FLoC implements it as opt-in, however, since sites probably didn't consider whether their pages are sensitive in choosing whether to add analytics (though perhaps they should have!)

(Still speaking only for myself)

WordPress may very well block FLoC, which would cut off a good chunk, even when you eliminate those that don't update to the latest version. https://make.wordpress.org/core/2021/04/18/proposal-treat-fl...

> As for the remaining millions, only a tiny minority of them will even know this is a thing, let alone care enough to make the change or contact a developer who can do it. These are the folks who have hosted their wordpress site with GoDaddy because it was cheap and quick when they needed a site.

One company decides to do something stupid and you expect millions of website owners to scurry and add junk to their headers to create a "mitigation"? This is nuts.

This is a browser problem, not a website problem.

Their point was that that won’t happen.

Unfortunately, the other solution is to get billions of people to stop using a trojan horse of a browser.

That's probably accurate if you assume every website is equal, but if you measure by traffic the top 100 websites account for 95% of measurable tracking events. Won't that make FLoC rather ineffective if 95% of the data is missing?

There may also be some uplift from various frameworks, CRMs, CDNs, etc, if any of them decide to make it the default.

Can't remember what article it was, but I remember that Facebook didn't even care about "top companies with top ads". The long tail trumped anything.

I wouldn't be surprised if it's the same in Google's case. A dozen big-name websites drop FLoC? Who cares, there's a billion more.

Alright, 41 comments and everyone seems to know what FLoC is, so I'll be that guy.

What is FLoC and what's the big deal?

I googled https://www.theverge.com/2021/3/30/22358287/privacy-ads-goog... and it seems like an attempt by Google to add proprietary cohort-based tracking to replace third-party cookies. Well intentioned, but it could have flaws. Anything else I should know?

It's an attempt at a replacement for third party cookies. Your browser looks at your history and computes a "cohort"; if other people's browsers do the same thing, people with similar history will end up in the same "cohort".

The upshot of this is advertisers only see the cohort_id, and not the history (which stays local on your browser). I think Google thinks it needs to give the advertisers something if third party cookies are going away, and this is attempting to preserve privacy.

Of course, just not sending third party cookies and not sending FLoC is the "ideal" solution if you don't have advertisers paying you, and some people were excited that third party cookies are going away, and hoped that nothing would replace them.

(Disclaimer, work on Gmail, this is my own opinion, I only really know what I read on HN)

This is not attempting to preserve privacy. This is attempting to give a pretense of preserving privacy, while completely deanonymizing the web.

This is browser fingerprinting on steroids. In addition to things like screen resolution and OS, you get a FLoC ID. Browser fingerprinting already works very well. FLoC supercharges it, and adds profiling information.

FLoC also gathers information from web sites which otherwise Google could not track. Since your browser is tracking you, they don't even need Google Analytics installed.
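The fingerprinting concern can be put in rough numbers. The sketch below treats a cohort ID as extra bits of identifying information on top of an existing fingerprint. Every figure in it is hypothetical, the function names are made up, and it assumes the cohort and the other fingerprint signals are statistically independent, which will not hold exactly in practice.

```javascript
// Bits of information revealed by learning which cohort a user is in,
// assuming cohorts of roughly equal size. Names and numbers are hypothetical.
function cohortBits(population, cohortSize) {
  return Math.log2(population / cohortSize);
}

// Expected number of users sharing a given combination of signals,
// assuming the signals are independent (a simplifying assumption).
function anonymitySetSize(population, totalBits) {
  return population / 2 ** totalBits;
}

const population = 2 ** 30;   // hypothetical number of browsers
const cohortSize = 2 ** 12;   // hypothetical users per cohort
const fingerprintBits = 15;   // hypothetical entropy from other signals

const bits = cohortBits(population, cohortSize);
console.log(bits);
console.log(anonymitySetSize(population, fingerprintBits));
console.log(anonymitySetSize(population, fingerprintBits + bits));
```

Under these made-up numbers the fingerprint alone leaves tens of thousands of candidates, while fingerprint plus cohort leaves fewer than one on average, which is the sense in which a cohort ID can make an already narrow fingerprint effectively unique.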

> This is not attempting to preserve privacy. This is attempting to give a pretense of preserving privacy, while completely deanonymizing the web.

Cookies uniquely identify me without additional data. Cohorts do not uniquely identify me without additional data.

This is not privacy, but it is more privacy than third-party cookies.

Cookies identify you only to the site setting the cookie. When doubleclick.net started scraping my data, I blocked them.

Problem gone.

FLoC gathers data from *all* my web browsing activity.

In addition, I have a nearly unique fingerprint from browser fingerprinting already. This makes it almost certainly unique.

It's not more privacy than cookies. It's a lot less.

Except that now you have control over your fingerprint. You can choose what to send to the website; you are the one who decides which websites get it.

Sure, you'll still get the other fingerprinting, which still allows them to track you. But before FLoC, Google couldn't imagine reducing Chrome's own fingerprint; now that they are moving toward FLoC, they can do that without cannibalizing their revenue stream.

In real life, most people won't deactivate FLoC, and that's where they are still going to make money. Everyone else most probably already uses adblockers or has already refused ad targeting from Google Ads.

A FLoC ID is shared amongst millions of users, and can be reset at any time by the user.

Google owns chrome and always had the ability to track any website whether or not it had google scripts on it. If you signed in to your browser, this was already happening.

> A FLoC ID is shared amongst millions of users, and can be reset at any time by the user.

Sure, but are you sharing your IP with millions of users? That's only one other piece of information about you; your browser gives away a bunch of others.

Hold up, doesn't something central still need to decide what cohort you are and store your individual data? How does it decide what cohort you are locally without pulling data about other users down locally or sending your individual data up? Is it comparing your behavior tensor with the cohorts or something like that?

Yes, although the central service provides data that a browser-side algorithm can use to put the user into a cohort. The browser history itself isn't directly sent to the service.

Each browser developer would have to decide which central service to use, whether their own or somebody else's.
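To illustrate how a cohort can be derived locally without comparing against other users' data, here is a toy locality-sensitive hash in the SimHash style. It is a sketch only: the thread does not describe the algorithm, this is not Chrome's implementation, and the function names, the 8-bit output and the FNV-1a hash are all choices made for the example. The point is that similar sets of domains tend to produce IDs that agree in most bits, so the grouping falls out of the hash rather than from a central comparison.

```javascript
// 32-bit FNV-1a string hash, used here only as a cheap deterministic hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Toy SimHash: every visited domain votes +1 or -1 on each output bit, and
// the sign of the total decides that bit. Hypothetical, for illustration.
function toyCohortId(domains, bits = 8) {
  const votes = new Array(bits).fill(0);
  for (const domain of new Set(domains)) {
    const hash = fnv1a(domain);
    for (let i = 0; i < bits; i++) {
      votes[i] += (hash >>> i) & 1 ? 1 : -1;
    }
  }
  let id = 0;
  for (let i = 0; i < bits; i++) {
    if (votes[i] > 0) id |= 1 << i;
  }
  return id;
}

console.log(toyCohortId(["example.com", "news.example", "shop.example"]));
```

Only the resulting small integer would ever need to leave the browser; the list of domains stays local, which matches the description in the comment above.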

Me too, thanks for the link.

> "FLoC is designed to help advertisers perform behavioral targeting without third-party cookies. A browser with FLoC enabled would collect information about its user’s browsing habits, then use that information to assign its user to a “cohort” or group. Users with similar browsing habits—for some definition of “similar”—would be grouped into the same cohort. Each user’s browser will share a cohort ID, indicating which group they belong to, with websites and advertisers. According to the proposal, at least a few thousand users should belong to each cohort (though that’s not a guarantee).

If that sounds dense, think of it this way: your FLoC ID will be like a succinct summary of your recent activity on the Web."

Great explanation indeed.

I didn't know much about it, and wow, sounds really terrible. I can even see it as an idea that started with good intentions, but the use cases explained (like linking floc ids to user ids in websites you signed in and potentially exposing browsing habits) make this thing really invasive; the whole idea is broken.

If you want to learn what FLoC is for the first time, you probably shouldn't start with an article titled "Google’s FLoC Is a Terrible Idea". I also don't know what it is, but I will just wait for a more neutral source, hopefully.

I don't care if I'm tracked by advertisers, and I'm also not advertising, so I guess I'm pretty neutral?

FLoC is Google's new way to target "cohorts" of users with advertising. The idea is that Google will classify each user into a cohort (and that classification will change over time as the user visits more web pages, and perhaps other data is acquired) and only that cohort is reported to the advertiser, which can then use it to serve appropriate ads.

On the flip side, this is obviously still targeted advertising, and some people have a strong negative outlook towards that idea in general. Also, it's been said that if you can manage to track a person across just a few cohort changes, you can personally identify them, which is contrary to the entire idea about FLoC protecting a person's privacy.

In short, proponents think it's better than tracking cookies, and opponents think it's still privacy invasion, and not much better than tracking cookies.

> opponents think it's still privacy invasion

Or illegal. IANAL, of course, but there are certain cohorts (age, disability, etc.) which are illegal to discriminate against in the US. But since Google doesn't have insight into what a cohort describes, they can't ensure that cohorts are being handled properly according to the law.

I'm sure Google has a "get out of government oversight" card from their lawyers, but like biased AI, this seems like it's on the wrong side of "grey".

Still holding out for an article about the benefits of Covid before you find out what that is?

Without going deeply into the topic (even though I'm familiar with what FLoC is): what do I expect from a browser? Well, to browse internet pages and display them correctly and fast.

Tracking, advertising and user cohorts do not fit into the "browsing" part. That might be enough to get a feel for why "Google's FLoC Is a Terrible Idea".

Unless you pay for the browser and the sites you visit, that's not enough for the browser to provide.

FLoC is Google's loophole after disabling third-party cookies in Chrome and promising they "will not build alternate identifiers to track individuals as they browse across the web"[0]. FLoC is a new tracker to replace third-party cookies, but it works by putting users in groups and tracking those groups, so they technically aren't tracking "individuals" this time. Except you can use FLoC to track individuals by identifying them as a unique intersection of many disparate groups[1].

[0] https://www.theverge.com/2021/3/3/22310332/google-privacy-re...

[1] https://github.com/WICG/floc/issues/100
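The intersection argument can be shown with a tiny simulation. In the sketch below each user has a sequence of cohort IDs, one per recomputation period, and we count how many users still match the target's sequence as more periods are observed. The user names, cohort IDs and the function name are invented for illustration and do not come from the linked issue.

```javascript
// For each period, count how many users share the target's cohort history
// up to and including that period. Hypothetical data and names.
function anonymitySetSizes(histories, target) {
  const observed = histories[target];
  return observed.map((_, period) =>
    Object.values(histories).filter((history) =>
      observed.slice(0, period + 1).every((id, i) => history[i] === id)
    ).length
  );
}

const histories = {
  alice: [7, 3, 9],
  bob:   [7, 3, 2],
  carol: [7, 5, 9],
  dave:  [1, 3, 9],
};

console.log(anonymitySetSizes(histories, "alice")); // [ 3, 2, 1 ]
```

Each individual cohort here contains several users, yet the sequence of three cohorts is unique to one of them, which is the concern the comment above describes.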

The GH issue is interesting. Suppose each user belongs to two flocks, and each website gets randomly shown one of the two. Will this solve the problem? I’d imagine the number of possibilities after a certain while will make it impossible that cohort histories collected by two different websites will match for the same user.

Over time everyone will be in their very own singleton group :)

Yes, but a totally anonymous singleton group... with a unique ID.

Not anonymous if you give your PII to create accounts on the web.

Thanks for being that guy. I thought HN had decided FLoC was good, but I guess not.

FLoC is basically something implemented in a browser. Why should a website owner be bothered by it? If a client decided to use a browser with FLoC, then it's their decision. The only interesting thing might be to inform users that they are using a shitty browser that doesn't respect their privacy, and to make sure that the website works in other browsers.

What if Google decides that they will ignore that header? Is there anything preventing them from doing that? Do we know why they decided to even implement this "workaround" with header?

> If a client decided to use a browser with FLoC, then it's their decision.

For lots of people that’s not the case. Their work mandates which browser they can use, which is (partially) why it took so long for IE to go away.

> If a client decided to use a browser with FLoC, then it's their decision.

Is it an informed decision?

Most normal users use Chrome because "everyone else does", and won't even have a clue what FLoC is.

> Most normal users use Chrome because "everyone else does", and won't even have a clue what FLoC is.

“Because it was installed on my machine bundled with some third-party software” is also a big factor (but obviously, nobody will give you this answer, because they don't even know where Chrome came from)

It's definitely not an informed decision. That's why I mentioned that you can inform your users about this issue. I just think that this decision should not belong to webmasters/site-owners.

I certainly do see your point here. But the reality is that Google doesn't have users' best interests at heart, and is not going to be the one to responsibly inform users so they can make an informed decision on their own.

> Why should a website owner be bothered by it?

at least following cases would cause this:

(1) They are disliking tracking

(2) They are disliking Google

(3) Google is competing with them

(4) They want to be liked by people disliking tracking or Google

s.gif
> Is there anything preventing them from doing that?

Potential privacy laws, competition, bad press,... but technically, nothing. Same as DoNotTrack.

In fact that's the whole idea behind FLoC. It is supposed to be a privacy improving feature! For now, the usual tracking methods based partly on third party cookies work for them, certainly better than FLoC would, and they are definitely more privacy invading.

But with things like GDPR, and with privacy being a bigger and bigger selling point, Google feels like it has to find something else, and FLoC is their answer.

I don't know how the story will end but most likely in the same way as DoNotTrack, which started out badly, and turned into a joke when browsers started enabling it by default, disregarding the recommendation.

s.gif
This new header seems like DoNotTrack 2.0 that Google will be forced to ignore once it gains some adoption to preserve their core business.
s.gif
It's essentially virtue signalling.
FYI, if, like me, you had no idea what FLoC was until now, please see:

https://github.com/WICG/floc

s.gif
>Federated Learning of Cohorts

Is there some competition where people try to come up with names as tricky as possible while being nowhere even close to plain English?

s.gif
Sounds like a machine learning term that escaped from researchers into the wild. Machine learning people like to make up fun names for otherwise complicated and hard-to-summarize methods. See eg "BERT".

The name is actually descriptive. It is an algorithm for constructing semantically interesting cohorts of similar users, locally on each user's machine.

It's actually a really good idea and certainly a lot more "privacy preserving" than anything that relies on sending fine-grained user data back to a central server for processing.

Of course there are problems with it, and I'm mixed as to whether it's something that non-Chrome browsers should even try to support.

The fact that all websites are included by default, and it's up to the individual website to opt out of inclusion, makes me squirm.

But the name makes sense and I think the core idea is a step in the right direction.

s.gif
> The fact that all websites are included by default, and it's up to the individual website to opt out of inclusion, makes me squirm.

That's not the case, contrary to what Hacker News wants you to think with this massive opt-out campaign. FLoC cohort computation is only planned to include websites that themselves request cohort information. Unless your page calls document.interestCohort, it is not included in cohort computation [1]. The opt-out header does nothing unless you use FLoC.

There is an exception to this made for the pilot phase (aka. right now), where in order to bootstrap the system Google is extending cohort computation to include "all websites that show ads" [2]. My guess is that this is necessary so that early testers get useful data. This is not something that seems to be planned past the pilot. The standard also restricts this to only "while 3rd party cookies are still a thing".

Disclaimer: I work for Google, but not on advertising or Chrome. This is all from public information I researched in my own time.

[1] https://wicg.github.io/floc/#compute-eligibility §7.1.1 "By default, a page is eligible for the interest cohort computation if the interestCohort() API is used in the page."

[2] https://wicg.github.io/floc/#adoption-phase §7.1.4 "at the adoption phase, the page can be eligible to be included in the interest cohort computation if there are ads resources in the page, OR if the API is used."
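
The two quoted rules can be written down as a small predicate. This is only a sketch of the wording quoted above, not Chrome's actual code: the `PageVisit` fields and the function name are made up for illustration, and treating the opt-out header as always winning is an assumption based on how the header is described elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class PageVisit:
    """Hypothetical summary of one page load; field names are illustrative."""
    uses_interest_cohort_api: bool  # page called document.interestCohort()
    has_ads_resources: bool         # browser classified some resource as an ad
    sends_opt_out_header: bool      # Permissions-Policy: interest-cohort=()


def eligible_for_cohort_computation(page: PageVisit, adoption_phase: bool) -> bool:
    """Sketch of the eligibility rule quoted from sections 7.1.1 and 7.1.4."""
    # Assumption: a page that sends the opt-out header is never included.
    if page.sends_opt_out_header:
        return False
    # Default rule: pages that use the API are eligible.
    if page.uses_interest_cohort_api:
        return True
    # Adoption-phase exception: pages with ad resources are eligible too.
    return adoption_phase and page.has_ads_resources
```

Under this reading, a page with no ads and no API call is not included in either phase, and a page with ads is included only while the adoption phase lasts.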

s.gif
> There is an exception to this made for the pilot phase

So all websites are, in fact, included by default.

s.gif
> if there are ads resources in the page

Please read the full comment (or preferably, the standard) before spreading FUD.

s.gif
Whether ads are being loaded is being determined by an opaque, ever-changing algorithm implemented in a closed source browser. We have no way to verify that this is how it's actually working, or when it will change. That doesn't even include how a good majority of the internet is monetized by ads, often Google's ads.

It's simply safest to assume that every page will be included.

s.gif
I don't think calling Chrome a closed source browser is accurate unless you have a citation showing that Chromium is missing this code

Microsoft Edge is a closed source browser, for comparison

s.gif
Chrome is a closed-source fork of Chromium that applies numerous proprietary patches to Chromium. There's no way to tell what has been modified in that process (short of decompilation, et al.).

Pretty much the same process that Microsoft takes with Edge, really.

s.gif
> I don't think calling Chrome a closed source browser is accurate unless you have a citation showing that Chromium is missing this code

That's completely backwards. You would need some evidence showing that Chrome does not include proprietary patches, otherwise you pretty much have to conclude that it's closed-source, even if it includes a large % of code from an open-source product.

s.gif
Is it well defined what "ads resources" is?
s.gif
It doesn't seem to be defined in the standard, but it's partially documented in Chromium's docs how they determine what is an ad resource:

https://chromium.googlesource.com/chromium/src/+/master/docs...

https://chromium.googlesource.com/chromium/src.git/+/master/...

https://chromium.googlesource.com/chromium/src.git/+/master/...

This seems to indicate the authoritative source of truth is EasyList. On my current machine, the list seems to be stored in "~/.config/google-chrome/Subresource Filter/Unindexed\ Rules/9.22.0" and should be easily inspectable.

I don't know if I've missed some documentation pointers related to this.

s.gif
So only if the publisher opts in (sort of). And what about the user? If this is all done client side, surely I should be opted out by default too?

I assume I can set my browser to opt out easily?

s.gif
I haven't heard about a global opt-out in the browser, but I haven't really looked for that info either. I think I've heard Chrome allows extensions to "easily" hook document.interestCohort and return any value the user wants (including random values). The standard also mentions "The user agent should offer a dedicated permission setting for the user to disallow sites from being included for interest cohort calculations." but that's only for blocking specific sites from contributing to cohort computations, not for disabling globally.
s.gif
It is called academic publishing, and some people manage to make a career out of it!

But yes, all three parts of that name have a well established technical meaning. It is a very descriptive name, once you know all the parts.

s.gif
This one stems from Google's bird-themed naming convention for initiatives related to working around 3rd party cookies going away.
s.gif
PIGIN, TURTLEDOVE, SPARROW, SWAN, SPURFOWL, PELICAN, PARROT

not kidding

s.gif
"Federated" is the only part of that I can see as not being simple. But even if all 3 words were generally unknown, I don't know if it's really a problem. You need to understand what FLoC is instead of what the individual words mean to know what the issues with it are.
s.gif
So you think you're providing a service to others by copying and pasting a top Google search result link for this topic?
> Pages sites using a custom domain will not be impacted.

Not sure what % of Github Pages use custom domains but this appears to leave no mechanism for custom domains to optionally enable this header either.

I don't really understand the motivation here; if it was for the benefit of GH users, why wouldn't that apply to custom domain users? Is it purely to hamper Google (as Microsoft's competitor)?

s.gif
There could be a technical reason for it not to be available for custom domains (yet?).

The page mentions that the header is set for all pages served by github.io, which leads me to believe they add the header on the reverse-proxy/loadbalancer side for github.io pages.

Custom domains most likely use separate proxy/loadbalancing infrastructure where the same change could take longer to implement, or they might be exploring options to make it configurable.

s.gif
github.com is itself a social network and a tracker. They should know a lot more about the status and activities of the software projects hosted there and the users than the users themselves do. Enabling third parties to track users across their site would be the equivalent of opening the lid of this treasure chest.
s.gif
I've never thought of GitHub like that, but it makes so much sense. With all of the features (like the commit heat-map), GitHub is at this point 2 parts social network, and 2 parts social network for software.
s.gif
Oh no, I do understand that. It's just the excluding custom domains part I don't get. Why not keep Google out of all their data: why give them a portion?
s.gif
Same, I'd like a way to disable this on my GHPages site as well.
s.gif
I agree with the other comments that Github doesn't allow users to control headers on Github Pages sites.

But my first guess was that this was a Msoft vs Google thing and not a privacy thing.

s.gif
Maybe there was more of a privacy concern they wanted to address by removing it from the github.io subdomains.
Remarkably short blog post. I would have appreciated a "why", to help build the voice of opposition.

Also, what about github.com itself?

s.gif
> Also, what about github.com itself?

Don't ask, try it. curl says: permissions-policy: interest-cohort=()

s.gif
It's not really a blog post, even though it's pushed out over their "blog" endpoint. This post is part of https://github.blog/changelog/ which tends to lean closer to the "git commit message" length than blog post length. Just a statement of changes they've made users may notice or be affected by.
s.gif
    $ curl -o /dev/null -v https://github.com/ 2>&1 | grep permissions-policy
    < permissions-policy: interest-cohort=()
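
The same check can be scripted. Here is a minimal Python sketch, with some assumptions: the function names are made up, and the parser just splits the header value on commas, which is a simplification of the real structured-header syntax (an allowlist containing a quoted URL with a comma in it would confuse it). It issues a plain GET, like the curl command above.

```python
import urllib.request


def opts_out_of_floc(permissions_policy: str) -> bool:
    """Return True if the header value has an empty interest-cohort allowlist."""
    for directive in permissions_policy.split(","):
        # Each directive looks like: feature-name=(allowlist)
        name, _, allowlist = directive.strip().partition("=")
        if name.strip().lower() == "interest-cohort" and allowlist.strip() == "()":
            return True
    return False


def fetch_permissions_policy(url: str) -> str:
    """GET the URL and return its Permissions-Policy header ('' if absent)."""
    with urllib.request.urlopen(url) as response:
        return response.headers.get("Permissions-Policy", "")
```

Calling `opts_out_of_floc(fetch_permissions_policy("https://github.com/"))` would then report whether the site sends the opt-out at the time you run it.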
s.gif
Btw you can do this much more briefly with curl's -I option, which lists the response headers.
s.gif
That also causes curl to change the underlying request to a HEAD request. Though according to the spec they should return the same headers, it’s not uncommon for sites to fail to do so (some web frameworks leave this responsibility to the user) or to cache these responses differently.

Personally I reflexively use the verbose version they used for these kinds of investigations of server behavior after being bitten a few times.

s.gif
Same here, I thought it caused cURL to do a GET request and throw away the body, but it doesn't, and I've gotten different results more than once.
s.gif
Also consider changing the user-agent from the default. I set mine to a typical browser string in ~/.curlrc, but you can also use -A/--user-agent on the command line.
s.gif
curl -i (lowercase) prints the headers as well as the response body without the verbosity of -v
s.gif
I believe you can also control what exactly is emitted with -w, which is nicer than trying to parse it later.
s.gif
I use -i instead of -v -o /dev/null; is there any reason to prefer the latter? Is curl smart enough to skip fetching the response body with the latter?
s.gif
FWIW the duckduckgo extension already shows the github.com website as tracker free, so they take a pretty strong stance on privacy. I think the why (ie. why does github take this stance on user pages) is pretty self explanatory in this situation.
s.gif
> Also, what about github.com itself?

Shouldn't matter. FLoC isn't enabled if they don't use the `document.interestCohort()` API and if Chromium doesn't detect ads; at least for now. https://seirdy.one/2021/04/16/permissions-policy-floc-misinf...

s.gif
This is a bit confusing. That post seems to suggest that (1) adding the header is not necessary to prevent one's site from "leveraging" floc, ie, identifying users, unless one already runs ads, and hence (2) that the header isn't necessary in most cases.

But it also says:

> What adding this header does is exclude your website from being used when calculating a user’s cohort. A cohort is an identifier shared with a few thousand other users, calculated locally from browsing history; sites that send this header will be excluded from this calculation. The EFF estimates that a cohort ID can add up to 8 bits of entropy to a user’s fingerprint.
>
> Being excluded from cohort calculation has a chance to place a user in a different cohort, altering a user’s fingerprint. This new fingerprint may or may not have more entropy than the one derived without being excluded.

But is individual fingerprinting really the concern? What if I don't want Google clustering people who visit my page with people who visit similar pages? In that case, the header still helps protect their privacy, right? By making Google's interest-based clustering of website visits less accurate? Or am I misunderstanding how FLoC works?

s.gif
(Am author) Google's FLoC cohorts are determined by browsing history. If your page is excluded, thereby giving other pages a higher weight, it doesn't necessarily reduce the bits of entropy in a user's fingerprint. Cohorts will still have roughly the same number of people and thus make it about as easy to identify users.

If you add the header to your site, do it for the right reason. It could mess with unsophisticated ad targeting, but it won't necessarily make a difference wrt. privacy. Energy is better spent getting users off of any browser that supports FLoC (Chrome, probably Chromium too).

s.gif
I guess the question here is what you mean by "privacy." It seems to me that privacy goes beyond merely avoiding the risk of fingerprinting, or individualized identification. Collective identification is also a privacy problem: if I get advertisements targeted at people with similar political beliefs to mine because I've been labelled as a member of a cohort that has visited a cluster of X-leaning news sites, that seems objectionable independent of whether the owner of some website can also distinguish me as an individual from every other member of the cohort.
s.gif
I'm also interested in understanding this.

My company is a non-profit and doesn't serve ads on our website. Should we ensure this header exists for our site?

s.gif
Yeh, but what happens when Google Analytics adds `document.interestCohort` and ~90% of the web get opted in?
s.gif
If you are already embedding Google Analytics on your page, then surely all bets are off for your users' privacy?
s.gif
Yes, but aren't they different?

If we have GA, we're getting some information and Google is getting some information, but are they sharing this information about users directly with advertisers?

The premise of FLoC is that they are explicitly tagging you in a group specifically for advertisers.

s.gif
It's not just GA though it's any analytics or other 3rd-party that decides it wants to collect the cohort data
s.gif
It would be if an (advertisement) iframe did, no?
In case folks are interested, I wrote an open source Chrome extension that removes the FLoC API on every page load so websites can't get your FLoC cohort ID: https://chrome.google.com/webstore/detail/floc-block/amoljng...
I've read recently about a fair few sites and browsers and whatnot that are not going to play along with FLoC.

Out of curiosity, what would be the kind of figure that would make Google stop using it? I mean, at what point does the data from a smaller pool become useless?

Any ideas?

s.gif
I don't think that any decent figure would make Google stop using it. FLoC is their attempt at locking more and more vendors into their ad ecosystem; it makes Google the superior ad provider because they now have an even bigger (and more unfair) advantage over other providers. My hypothesis: if you are blocking FLoC, you are not really dependent on Google's ad system, neither as a website hosting ads nor as one being found through ads. Unfortunately, Google owns too much of the ad market and too many vendors are already dependent on Google.
s.gif
Just quit using Chrome and advocate for others to do the same. It's toxic to the web at this point.
It's exciting when the megacorps make these kinds of plays against each other. I feel like I'm watching my abusive partner get smacked in the mouth by my abusive ex.
s.gif
Yes, the title should be "Microsoft blocks FLoC..."
I have just switched to Brave (this browser blocks FLoC across all the web). I regret not trying this browser earlier. Also this IPFS stuff seems very interesting (kind of BitTorrent for the web).
Has anyone done the calculation of the amount of energy (and therefore co2) used and extra bandwidth cost for adding the opt-out header to most of the internet's traffic?

As I understand it, every response in a page has to have the header, not just a containing html or an initial options.

s.gif
I don't think it will be that much (spoiler alert: I was wrong), because only a negligible amount of web traffic will be the headers themselves vs. web pages and streamed content. And the FLoC header itself will be a very small part of that header, maybe 40 bytes. Those 40 bytes could fit in a single packet.

So, at most, FLoC will add 1 packet per header. I don't know how many headers are sent total each day, but I remember reading that the average person visits 100 websites per day (including reloads). Out of 4 billion people who use the internet, we're talking about 400 Billion response headers per day.

Assuming that each opts out of FLoC (a portion of this is Google, so that's unlikely), that means that an extra 400 billion * 40 bytes need to be sent. This is about 16 trillion extra bytes (1.6E13). I've just checked, and it seems that the average Google Search is about 125 kB, and I found that each releases approx. 7 g of CO2. So dividing this out, each kilobyte of traffic releases 0.056 grams of CO2. For each byte, that would be 0.000056 (5.6E-5) grams.

Multiplying that out by the 16 trillion extra bytes, you have 896 million (8.96E8) grams of CO2, or an extra 896 metric tons of CO2 per day. So, I was totally wrong. Jeez, that's a lot of CO2.

But my calculations were a badly estimated, worst-case scenario. Also, since fewer websites will have third-party cookies as a result of this, we would have to subtract those now-gone emissions. But this is still a lot more CO2 than I expected, even if it was counteracted.
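
For anyone who wants to rerun the back-of-envelope estimate, here it is as a short Python sketch. Every input is the rough guess from the comment above (4 billion users, 100 page loads each, 40 header bytes, 7 g of CO2 per 125 kB search), not measured data, and the function name is made up. Note that 400 billion * 40 bytes is 1.6E13 bytes.

```python
def extra_co2_tonnes_per_day(
    people: float = 4e9,                # internet users (comment's guess)
    loads_per_person: float = 100,      # page loads per person per day
    header_bytes: float = 40,           # size of the opt-out header
    grams_co2_per_search: float = 7.0,  # claimed CO2 per Google search
    bytes_per_search: float = 125_000,  # 125 kB, taking 1 kB = 1000 bytes
) -> float:
    """Worst-case extra CO2 per day, in metric tons, from the opt-out header."""
    extra_bytes = people * loads_per_person * header_bytes  # 1.6e13 bytes
    grams_per_byte = grams_co2_per_search / bytes_per_search  # 5.6e-5 g
    extra_grams = extra_bytes * grams_per_byte
    return extra_grams / 1e6  # grams to metric tons


print(f"{extra_co2_tonnes_per_day():.0f} tonnes of CO2 per day")  # about 896
```

Changing any one input scales the result linearly, so a smaller per-byte figure shrinks the total by the same factor.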

Is there any browser extension to automatically disable FLoC on every visited site?
s.gif
Use a browser that isn't made by an ad company.
s.gif
Bingo. Wish more people got this rather simple insight
If FLoC is opt out by default as many here claim, is this news saying that Github did nothing?

Also if it is opt-out by default I guess it will be simple to see who has opted-in, a nice list of shame.

Does this circumvent the EU cookie laws?

FLoC can also be disabled at the browser level. Check if your browser has FLoC enabled: https://amifloced.org
I actually have an idea for browser extension:

implement document.interestCohort() and return some useless junk or, better, fake data (e.g. this user likes cat pictures and nothing else). However, I ran into the problem that there is no documentation of how the cohort ID is specified. This leads to another question - how are ad companies supposed to actually target their audience with it if there is no translation between cohorts and target groups? (I assume Google already has some translation)

s.gif
> how are ad companies supposed to actually target their audience with it if there is no translation between cohorts and target groups?

Even as an opaque identifier it's still useful. Imagine you run a store and you log the FLoC cohorts of your customers. You could then target ads at the most common cohorts you've seen as a way to say "show my ads to more people similar to my existing customers".
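
As a rough illustration of that idea (not how any real ad system works), the store only needs to count the opaque IDs it has seen. The function name and the cohort numbers below are made up.

```python
from collections import Counter
from typing import Iterable, List


def top_cohorts(customer_cohorts: Iterable[int], n: int = 3) -> List[int]:
    """Return the n cohort IDs seen most often among logged customers."""
    counts = Counter(customer_cohorts)
    # most_common sorts by count, highest first.
    return [cohort for cohort, _ in counts.most_common(n)]


# Hypothetical log: one opaque cohort ID per purchase.
logged = [1354, 1354, 21, 1354, 21, 777]
print(top_cohorts(logged, n=2))  # [1354, 21]
```

The IDs never need to be decoded into interests; the store just asks for its ads to be shown to the cohorts at the top of the list.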

(Disclosure: I work on ads at Google, speaking only for myself)

FLoC ought to be opt-in, not opt-out.
s.gif
From what I've heard elsewhere on hn, only sites that use `document.interestCohort()` contribute to a FLoC identifier.
s.gif
Which can be done by any of the dozens of obfuscated javascript files people embed on their websites for some reason.
I wonder if floc can be selectively enabled to distort results. Eg only for cat videos
Thanks GitHub. Much appreciated.
But it's not a problem, right? Because we've had countless conversations about google and chrome, discussed them ad nauseam, and we're so sick of this tedious, incessant topic that we've all stopped using chrome, right? Except for, yes, yes, the people who have to use it at work, but who don't use it at home, right? Floc is only a problem if you deserve it.
This EFF article explains FloC pretty well.

https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-...

tl;dr: 3rd party cookies are dying so Google has come up with this way to replace them. EFF says 3rd party cookies suck but the choice shouldn't be between those and FLoC. How about neither, where the user decides what to share, with whom, and when.

Microsoft is pushing for their own version of FLoC AFAIK
How to set this header if you use a custom domain with gh pages?
I have been seeing a lot of FLoC articles recently. Can someone please ELI5 for me what FLoC is and why is it bad?
This isn't meant as a dunk on MSFT but it's worth keeping in mind that MSFT owns GitHub before celebrating this as GitHub taking a stance. MSFT, FB and Google all heavily employ "analytics", although to slightly different degrees and in different forms. Them not cooperating is a good thing, but not surprising enough to warrant celebration.
For those who want to know why FLoC may be a bad idea see https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-...

How much monopolistic behavior does Google have to engage in before antitrust laws have enough teeth? It also seems to me that Google has been more aggressive in its monopolistic behavior in different areas the more talk there is of regulation raining down on it. Maybe they know the end is near and are trying to get away with as much as possible before that happens.

You mock my FLoC, I’ll clean your clock!

With apologies to Bill Watterson

Let's expand the title a bit:

Iteration #1: Microsoft blocks FLoC across all of GitHub Pages.

Iteration #2: Microsoft blocks Google's FLoC across all of GitHub Pages.

Does FLoC even follow EU laws? It's tracking without consent, right? It's basically a cookie that you can never delete?
Just like Microsoft building their own browser and then adopting Chromium, before long they'll be adopting FLoC.
s.gif
>good news post about something MSFT owned on HN

>comment(s) about how MSFT is inherently evil and doesn't deserve credit

The HN cycle continues.

s.gif
Don’t forget meta comments about the grim predictability of it all, also an important part of the ecosystem.
s.gif
And of course we must consider the meta comments complaining about the meta comments. These all contribute to the je ne sais quoi of HN.
s.gif
This just made me realize something: Once a pattern becomes a meme (or close), it becomes possible to notice a higher level meta pattern which itself becomes memetized ad infinitum(?).

I predict the parent observation will itself eventually become a meme to be "complained" about. Maybe this one too.

s.gif
You forgot a third one:

> hn users trying to discredit the commenter and telling us the "new" Microsoft is so cool, like if we owed anything to them

s.gif
Is hn crowd trying to manipulate M$...
s.gif
AMP faced far less of an onslaught and never really caught on the way they hoped it would. Time will tell how this turns out for them.
s.gif
Never caught on? The vast majority of the news links I encounter are for the AMP version.
s.gif
I think that's part of the point; AMP became a necessity for news when Google pulled their monopolistic search ranking levers, but few other types of sites implemented AMP since there was no real motive.
s.gif
AMP was specifically opt-in, and only websites with enough manpower and interest implemented it: if you wanted it, you had to program an entirely different page using Google's JS framework and a restricted subset of HTML. FLoC is opt-out and requires zero intervention from web devs. If your website shows ads it's already part of FLoC, so it can catch on if people do nothing about it.