GitHub blocks FLoC across all of GitHub Pages
source link: https://news.ycombinator.com/item?id=26967903
The top 100 have dedicated engineering and policy teams that will disable FLoC because they're either not interested in ads (Wikipedia) or have their own first-party implementation that doesn't need FLoC (Facebook). They'll ditch FLoC.
The next 10k might have engineering teams that can make the change, but might be more interested in finding out about their audience so they can monetize more easily. They'll keep FLoC.
As for the remaining millions, only a tiny minority of them will even know this is a thing, let alone care enough to make the change or contact a developer who can do it. These are the folks who have hosted their wordpress site with GoDaddy because it was cheap and quick when they needed a site. They'll keep FLoC.
So the upshot is that github.com, instagram.com and amazon.com might opt out, but the vast majority of the web will not. My prediction is that at least half of all web pages loaded by users won't have this header.
But until they do[1]:
Apache:
    Header always set Permissions-Policy: interest-cohort=()

Caddy:
    header Permissions-Policy "interest-cohort=()"

Cloudflare Workers (not free as there are limits):
    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      let response = await fetch(request)
      let newHeaders = new Headers(response.headers)
      newHeaders.set("Permissions-Policy", "interest-cohort=()")
      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers: newHeaders
      })
    }

Lighttpd:
    server.modules += ("mod_setenv")
    setenv.add-response-header = ("Permissions-Policy" => "interest-cohort=()")

Netlify:
    [[headers]]
      for = "/*"
      [headers.values]
        Permissions-Policy = "interest-cohort=()"

Nginx:
    add_header Permissions-Policy interest-cohort=();
[1] https://github.com/WICG/floc#opting-out-of-computation
If your web site does not include ads, FLoC is already disabled. Here, "ads" mean ads that EasyList can detect. This HTTP header will just make your config more complex and your responses slightly bigger, with no change of behaviour.
If you include external ads on your pages, then I doubt disabling FLoC will increase your visitors' privacy, but at least this header will have a real effect.
Citation? Here's what the FLoC explainer says:
> All sites with publicly routable IP addresses that the user visits when not in incognito mode will be included in the POC cohort calculation.
https://github.com/WICG/floc#sites-which-interest-cohorts-wi...
This sounds to me like all sites, whether they contain ads or not, are used to cluster users into cohorts.
> For pages that haven't been excluded, a page visit will be included in the browser's FLoC calculation if document.interestCohort() is used on the page.
> During the current FLoC origin trial, a page will also be included in the calculation if Chrome detects that the page loads ads or ads-related resources.
I wrote more about this on my site: https://seirdy.one/2021/04/16/permissions-policy-floc-misinf...
add_header Permissions-Policy interest-cohort=() always;
I mean... the docs say that they are a "site" header that you should apply to a "page". Does that mean that you must apply it to all pages to exclude a site? Is absence on one page taken as opting back in to FLoC?
If the scope is site, then it would be better as a DNS entry. I've a feeling the scope is truly page though and I've also a feeling that most people who choose to add this header will add it on all assets now - which is a bit of a waste of bytes (even with header compression in place) but would be the only way to guarantee that all pages have it.
And this is when Google will release their own Cloudflare competitor product.
BTW do they have something as popular as Cloudflare already? I'm very unfamiliar with Google's offerings.
But at this point in time I think it'd be unfair to call Cloudflare "just a CDN" so not really equivalent.
From what I've heard through the technical operations jungle, Google has been pushing their CDN product hard for a long time, which isn't a shock since they've been trying to push GCP hard for a long time. But it's a little like AWS's CloudFront CDN: it's very, very rare to see someone using AWS CloudFront or the GCP CDN who isn't on said cloud platform already.
Hopefully if enough people disable it, it will become useless.
http-response set-header Permissions-Policy interest-cohort=()
Examples: Turned on HTTPS for all customers, gave image compression and optimisation to all customers, moved customers to the latest TLS as soon as possible (help drive adoption), provide tools to obscure email addresses on web pages to minimise harvesting, 1.1.1.1 privacy focused DNS, etc.
FLoC is something that an opinion can easily be formed on, and where Google have said to each site operator "you must opt-out", Cloudflare can hold an opinion that default opt-out is bad for the internet and that opt-in is better... and if they make an option that defaults to adding this header but granting customers a means to toggle it off... then all Cloudflare will have done is what Google should have done... made this opt-in by default.
app.use((req, res, next) => {
  res.setHeader("Permissions-Policy", "interest-cohort=()");
  return next();
});
Disable FLoC if you want, but Google could always change it in future and ignore the header.
I wouldn't be surprised if the top 100 websites get 80% of the traffic, the remaining top 10k get 10% and all the millions of other sites get the last 10%.
This 100-10k-millions split statistic was pulled from a talk by Ilya Grigorik, who had worked on Web Performance at Google. I'm guessing they based it on data from Chrome.
And Chrome is the most popular browser last I checked, so probably a fairly good indication of overall trends can be drawn from such statistics
Wikipedia does not need to take any action to disable FLoC; it's only active if the site opts in, on a per-pageview basis:
* If you call document.interestCohort() to get a FLoC id for a user, that pageview will be included in FLoC calculation.
* For the origin trial, to deal with the chicken-and-egg problem, a pageview is included if you load ads (determined with EasyList)
(Disclosure: I work on ads at Google, speaking only for myself)
Please forgive us for not trusting Google's "we pinky swear it will change". We have no real reason to trust that Google will keep their word.
Chrome's ad detection code is open source; https://chromium.googlesource.com/chromium/src/+/master/docs... is a good place to start.
But as I already indicated, I have trust issues with Google.
It's certainly not practical for me, when I can just avoid chrome for personal usage (and I'm thankful for the capability). Of course, I can't avoid it entirely, thanks to my company deciding that Chrome is the only supported browser for our product. So even though FLoC is a non-issue for me personally, it is still something I need to worry about professionally.
So relying on this requires continuous monitoring that Chrome doesn't randomly decide to tag something on some page as an ad, which is doing even more work just to cater to Google whims. Just blocking it is the sane choice here.
since you say it is about "chicken-and-egg problem" during the trial: is there a clear commitment somewhere that Google plans to not include pages that do not use the FLoC API in the future?
Not something as clear as I'd like. The closest I see is:
A page visit will be included in the browser's FLoC calculation if document.interestCohort() is used on the page. During the current FLoC origin trial, a page will also be included in the calculation if Chrome detects that the page loads ads or ads-related resources. -- https://web.dev/floc/#do-websites-have-to-participate-and-sh...
Can I ask why? I honestly can't understand how anyone could.
At a lower level, I do this job because I'm paid, which allows me to donate. [5] But I wouldn't do this work if I thought it was harmful; there are lots of different kinds of jobs I could take.
[1] https://www.jefftk.com/p/effect-of-advertising
[2] https://github.com/google/fledge-shim
[3] https://github.com/WICG/turtledove/issues/161
A parallel to the demise of the newspaper classifieds is the once thriving industry of people who would copy books by hand in the 14th century. Then Gutenberg created a printing press that could make copies of books in a fraction of the time. Life didn't get better for those folks whose skills were no longer needed, but maybe it did for society as a whole. But for sure it didn't and doesn't make sense to vilify people who work at printing presses.
You're looking for a "bad guy" when maybe none exists.
The number one newspaper killer is... Craigslist.
Also the general move away from paper printed and delivered every day to internet news delivery.
That link only works if we buy into the premise: "One way to think about this is, what would the world would be like if we didn't allow advertising? No internet ads, TV ads, magazine ads, affiliate links, sponsored posts, product placement, everything."
However, no. I don't buy that premise at all. The state of ads as it is now is actively harmful with very little to show for in terms of "new non-stickier products" etc.
What about ads, but static and not-tracking? Is that still equally negative? Is that still equally positive?
Coincidentally, my current project involves this Chrome proposal for supporting self-contained remarketing ads without individual tracking: https://github.com/WICG/turtledove
It is still a problem for Wikipedia, because the global Javascript for each language is editable by a subset of the editors for that language (https://en.wikipedia.org/wiki/MediaWiki:Common.js for instance), unlike the HTTP headers which can only be changed by the Wikimedia sysadmins.
Obviously it's probably a small overlap of "doesn't trust FLoC" and "still trusts GA", but it's still a Trojan horse.
PS. Wow kudos for being so responsive, especially/despite less than stellar comments regarding your openness about working for G-Ads.
$ curl -sS https://www.google-analytics.com/analytics.js \
| grep interestCohort
[no results]
On the other hand, I could imagine them or other analytics services adding it, since sites installing analytics services generally want to be able to slice their traffic by as many dimensions as possible. I would hope that any service considering reading FLoC implements it as opt-in, however, since sites probably didn't consider whether their pages are sensitive in choosing whether to add analytics (though perhaps they should have!). (Still speaking only for myself)
One company decides to do something stupid and you expect millions of website owners to scurry and add junk to their headers to create a "mitigation"? This is nuts.
This is a browser problem, not a website problem.
Unfortunately, the other solution is to get billions of people to stop using a trojan horse of a browser.
I wouldn't be surprised if it's the same in Google's case. A dozen big-name websites drop FLOC? Who cares, there's a billion more.
What is FLoC and what's the big deal?
I googled https://www.theverge.com/2021/3/30/22358287/privacy-ads-goog... and it seems like an attempt by Google to add proprietary cohort-based tracking to replace third-party cookies. Well intentioned, but it could have flaws. Anything else I should know?
The upshot of this is advertisers only see the cohort_id, and not the history (which stays local on your browser). I think Google thinks it needs to give the advertisers something if third party cookies are going away, and this is attempting to preserve privacy.
Of course, just not sending third party cookies and not sending FLoC is the "ideal" solution if you don't have advertisers paying you, and some people were excited that third party cookies are going away, and hoped that nothing would replace them.
(Disclaimer, work on Gmail, this is my own opinion, I only really know what I read on HN)
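To make "the history stays local, only the cohort ID leaves the browser" concrete, here is a toy sketch. It assumes a SimHash-style reduction over visited domains, which is the kind of approach Chrome's origin trial was described as using; the FNV-1a hash, the 8-bit width, and the names `fnv1a32` and `toyCohortId` are illustrative choices for this sketch, not Chrome's code.

```javascript
// Toy illustration only: NOT Chrome's implementation.
// 32-bit FNV-1a hash of a string (an arbitrary choice for this sketch).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// SimHash-style reduction: every distinct domain votes +1 or -1 on each bit,
// and the sign of each tally becomes one bit of the cohort id.
// The browsing history itself never leaves this function.
function toyCohortId(domains, bits = 8) {
  const tallies = new Array(bits).fill(0);
  for (const domain of new Set(domains)) {
    const h = fnv1a32(domain);
    for (let b = 0; b < bits; b++) {
      tallies[b] += (h >>> b) & 1 ? 1 : -1;
    }
  }
  let id = 0;
  for (let b = 0; b < bits; b++) {
    if (tallies[b] > 0) id |= 1 << b;
  }
  return id;
}

// Hypothetical history; only the small integer would be exposed to a page.
const cohortId = toyCohortId(["news.example", "cats.example", "shop.example"]);
```

Users with mostly overlapping histories tend to produce the same bits, which is what makes the ID useful for targeting and also why it says something about what you browse.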
This is browser fingerprinting on steroids. In addition to things like screen resolution and OS, you get a FLoC ID. Browser fingerprinting already works very well. FLoC supercharges it, and adds profiling information.
FLoC also gathers information from web sites which otherwise Google could not track. Since your browser is tracking you, they don't even need Google Analytics installed.
Cookies uniquely identify me without additional data. Cohorts do not uniquely identify me without additional data.
This is not privacy, but it is more privacy than third-party cookies.
Problem gone.
FLoC gathers data from *all* my web browsing activity.
In addition, I have a nearly unique fingerprint from browser fingerprinting already. This makes it almost certainly unique.
It's not more privacy than cookies. It's a lot less.
Sure, you'll still get the other fingerprinting, which still allows them to track you. But before FLoC, Google couldn't imagine reducing Chrome's own fingerprint surface; now that they are moving toward FLoC, they can do that without cannibalizing their revenue stream.
In real life, most won't deactivate FLoC, and that's where they are still going to make money. Everyone else most probably already uses an adblocker or has already refused ad targeting from Google Ads.
Google owns chrome and always had the ability to track any website whether or not it had google scripts on it. If you signed in to your browser, this was already happening.
Sure, but are you sharing your IP with millions of users? That's only a single other piece of information about you; there are a bunch of others given by your browser.
Each browser developer would have to decide which central service to use, whether their own or somebody else's.
> "FLoC is designed to help advertisers perform behavioral targeting without third-party cookies. A browser with FLoC enabled would collect information about its user’s browsing habits, then use that information to assign its user to a “cohort” or group. Users with similar browsing habits—for some definition of “similar”—would be grouped into the same cohort. Each user’s browser will share a cohort ID, indicating which group they belong to, with websites and advertisers. According to the proposal, at least a few thousand users should belong to each cohort (though that’s not a guarantee).
If that sounds dense, think of it this way: your FLoC ID will be like a succinct summary of your recent activity on the Web."
I didn't know much about it, and wow, sounds really terrible. I can even see it as an idea that started with good intentions, but the use cases explained (like linking floc ids to user ids in websites you signed in and potentially exposing browsing habits) make this thing really invasive; the whole idea is broken.
FLoC is Google's new way to target "cohorts" of users with advertising. The idea is that Google will classify each user into a cohort (and that classification will change over time as the user visits more web pages, and perhaps other data is acquired) and only that cohort is reported to the advertiser, which can then use it to serve appropriate ads.
On the flip side, this is obviously still targeted advertising, and some people have a strong negative outlook towards that idea in general. Also, it's been said that if you can manage to track a person across just a few cohort changes, you can personally identify them, which is contrary to the entire idea about FLoC protecting a person's privacy.
In short, proponents think it's better than tracking cookies, and opponents think it's still privacy invasion, and not much better than tracking cookies.
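The "track a person across a few cohort changes" concern can be sketched as a shrinking anonymity set. The example below uses made-up user ids and made-up weekly cohort memberships; it only illustrates the set intersection, and assumes an observer who can link the same browser's cohort ID from week to week.

```javascript
// Hypothetical data: the users (made-up ids) who shared the observed
// browser's cohort in each of three consecutive weeks.
const weeklyCohortMembers = [
  new Set(["u1", "u2", "u3", "u4", "u5", "u6"]),
  new Set(["u2", "u4", "u6", "u7", "u8"]),
  new Set(["u4", "u8", "u9"]),
];

// Keep only users who are consistent with every weekly observation.
function intersectCandidates(weeks) {
  let candidates = new Set(weeks[0]);
  for (const members of weeks.slice(1)) {
    candidates = new Set([...candidates].filter((u) => members.has(u)));
  }
  return candidates;
}

const remaining = intersectCandidates(weeklyCohortMembers);
// After three weeks only "u4" fits all observations.
```

Each cohort on its own hides the user among several others; the sequence of cohorts does not.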
Or illegal. IANAL, of course, but there are certain cohorts (age, disability, etc.) which are illegal to discriminate against in the US. But since Google doesn't have insight into what a cohort describes, they can't ensure that cohorts are being handled properly according to the law.
I'm sure Google has a "get out of government oversight" card from their lawyers, but like biased AI, this seems like it's on the wrong side of "grey".
Tracking, advertising, user cohorts does not fit into the "browsing" part. That might be enough to feel why "Google's FLoC is a Terrible Idea".
[0] https://www.theverge.com/2021/3/3/22310332/google-privacy-re...
What if Google decides that they will ignore that header? Is there anything preventing them from doing that? Do we know why they decided to even implement this "workaround" with header?
For lots of people that’s not the case. Their work mandates which browser they can use, which is (partially) why it took so long for IE to go away.
Is it an informed decision?
Most normal users use Chrome because "everyone else does", and won't even have a clue what FLoC is.
“Because it was installed on my machine bundled with some third-party software” is also a big factor (but obviously, nobody will give you this answer, because they don't even know where Chrome came from)
At least the following cases would cause this:
(1) They dislike tracking
(2) They dislike Google
(3) Google is competing with them
(4) They want to be liked by people who dislike tracking or Google
Potential privacy laws, competition, bad press,... but technically, nothing. Same as DoNotTrack.
In fact that's the whole idea behind FLoC. It is supposed to be a privacy improving feature! For now, the usual tracking methods based partly on third party cookies work for them, certainly better than FLoC would, and they are definitely more privacy invading.
But with things like GDPR, and with privacy being a bigger and bigger selling point, Google feels like it had to find something else and FLoC is their answer.
I don't know how the story will end but most likely in the same way as DoNotTrack, which started out badly, and turned into a joke when browsers started enabling it by default, disregarding the recommendation.
is there some competition where people try to come up with names as tricky as possible while being nowhere even close to simple english?
The name is actually descriptive. It is an algorithm for constructing semantically interesting cohorts of similar users, Locally on each user's machine.
It's actually a really good idea and certainly a lot more "privacy preserving" than anything that relies on sending fine-grained user data back to a central server for processing.
Of course there are problems with it, and I'm mixed as to whether it's something that non-Chrome browsers should even try to support.
The fact that all websites are included by default, and it's up to the individual website to opt out of inclusion, makes me squirm.
But the name makes sense and I think the core idea is a step in the right direction.
That's not the case, contrary to what Hacker News wants you to think with this massive opt-out campaign. FLoC cohort computation is only planned to include websites that themselves request cohort information. Unless your page calls document.interestCohort, it is not included in cohort computation [1]. The opt-out header does nothing unless you use FLoC.
There is an exception to this made for the pilot phase (aka. right now), where in order to bootstrap the system Google is extending cohort computation to include "all websites that show ads" [2]. My guess is that this is necessary so that early testers get useful data. This is not something that seems to be planned past the pilot. The standard also restricts this to only "while 3rd party cookies are still a thing".
Disclaimer: I work for Google, but not on advertising or Chrome. This is all from public information I researched in my own time.
[1] https://wicg.github.io/floc/#compute-eligibility §7.1.1 "By default, a page is eligible for the interest cohort computation if the interestCohort() API is used in the page."
[2] https://wicg.github.io/floc/#adoption-phase §7.1.4 "at the adoption phase, the page can be eligible to be included in the interest cohort computation if there are ads resources in the page, OR if the API is used."
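The eligibility rules quoted above can be written down as a small predicate. This is a reading of the quoted text, not code from the spec or from Chrome; the function name and the four flag names are invented for the sketch.

```javascript
// Sketch of the page-eligibility rule as described in the quotes above.
// All parameter names are hypothetical.
function isPageEligible({
  optedOutByHeader = false,    // page sent Permissions-Policy: interest-cohort=()
  callsInterestCohort = false, // page used document.interestCohort()
  adsDetected = false,         // browser classified page resources as ads
  adoptionPhase = false,       // the origin-trial / adoption-phase exception applies
} = {}) {
  if (optedOutByHeader) return false;   // an excluded page is never counted
  if (callsInterestCohort) return true; // default rule: API use makes it eligible
  return Boolean(adoptionPhase && adsDetected); // temporary exception for ad pages
}
```

Under this reading, a page with no ads and no API call is not counted even without the header, while a page with ads is counted during the adoption phase unless it sends the header.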
So all websites are, in fact, included by default.
Please read the full comment (or preferably, the standard) before spreading FUD.
It's simply safest to assume that every page will be included.
Microsoft Edge is a closed source browser, for comparison
Pretty much the same process that Microsoft takes with Edge, really.
That's completely backwards. You would need some evidence showing that Chrome does not include proprietary patches, otherwise you pretty much have to conclude that it's closed-source, even if it includes a large % of code from an open-source product.
https://chromium.googlesource.com/chromium/src/+/master/docs...
https://chromium.googlesource.com/chromium/src.git/+/master/...
https://chromium.googlesource.com/chromium/src.git/+/master/...
This seems to indicate the authoritative source of truth is EasyList. On my current machine, the list seems to be stored in "~/.config/google-chrome/Subresource Filter/Unindexed\ Rules/9.22.0" and should be easily inspectable.
I don't know if I've missed some documentation pointers related to this.
I assume I can set my browser to opt out easily?
But yes, all three parts of that name have a well established technical meaning. It is a very descriptive name, once you know all the parts.
not kidding
Not sure what % of Github Pages use custom domains but this appears to leave no mechanism for custom domains to optionally enable this header either.
I don't really understand the motivation here; if it was for the benefit of GH users, why wouldn't that apply to custom domain users? Is it purely to hamper Google (as Microsoft's competitor)?
The page mentions that the header is set for all pages served by github.io, which leads me to believe they add the header on the reverse-proxy/loadbalancer side for github.io pages.
Custom domains most likely use separate proxy/loadbalancing infrastructure where the same change could take longer to implement, or they might be exploring options to make it configurable.
But my first guess was that this was a Msoft vs Google thing and not a privacy thing.
Also, what about github.com itself?
Don't ask, try it. curl says: permissions-policy: interest-cohort=()
$ curl -o /dev/null -v https://github.com/ 2>&1 | grep permissions-policy
< permissions-policy: interest-cohort=()
Personally I reflexively use the verbose version they used for these kind of investigations of server behavior after being bit a few times.
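Once you have the header value (for example from the curl output above), checking it can be scripted. The helper below is a deliberately simplified sketch: the real header is a structured-field dictionary, and this version just splits on commas, so it would mis-handle allowlists containing quoted strings with commas. The name `optsOutOfFloc` is made up for the example.

```javascript
// Simplified check: does a Permissions-Policy header value contain
// the directive interest-cohort=() ? Not a full structured-field parser.
function optsOutOfFloc(permissionsPolicyValue) {
  if (typeof permissionsPolicyValue !== "string") return false;
  return permissionsPolicyValue
    .split(",")
    .map((directive) => directive.toLowerCase().replace(/\s+/g, ""))
    .some((directive) => directive === "interest-cohort=()");
}

// Value copied from the curl output in the thread.
const githubOptsOut = optsOutOfFloc("interest-cohort=()");
```

A missing header yields `false`, which matches "no opt-out was sent" rather than "FLoC is active"; whether a page is actually counted depends on the eligibility rules discussed elsewhere in the thread.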
Shouldn't matter. FLoC isn't enabled if they don't use the `document.interestCohort()` API and if Chromium doesn't detect ads; at least for now. https://seirdy.one/2021/04/16/permissions-policy-floc-misinf...
But it also says:
What adding this header does is exclude your website from being used when calculating a user’s cohort. A cohort is an identifier shared with a few thousand other users, calculated locally from browsing history; sites that send this header will be excluded from this calculation. The EFF estimates that a cohort ID can add up to 8 bits of entropy to a user’s fingerprint.
Being excluded from cohort calculation has a chance to place a user in a different cohort, altering a user’s fingerprint. This new fingerprint may or may not have more entropy than the one derived without being excluded.
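For a sense of scale, "up to 8 bits of entropy" means a factor of up to 2^8 = 256 in how finely users can be told apart. The sketch below works that out; the 4 billion population and the 18 bits of pre-existing fingerprint entropy are placeholder assumptions, and it treats the cohort bits as independent of the other signals, which is the worst case.

```javascript
// Expected number of users sharing a fingerprint, given its entropy in bits.
function anonymitySetSize(population, bitsOfEntropy) {
  return population / Math.pow(2, bitsOfEntropy);
}

const population = 4e9;          // assumption: rough number of internet users
const existingFingerprintBits = 18; // assumption: hypothetical, for illustration
const cohortBits = 8;            // upper bound quoted above

const withoutCohort = anonymitySetSize(population, existingFingerprintBits);
const withCohort = anonymitySetSize(population, existingFingerprintBits + cohortBits);
const shrinkFactor = withoutCohort / withCohort; // 256 when the bits are independent
```

With these placeholder numbers the crowd you hide in drops from roughly fifteen thousand people to roughly sixty.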
But is individual fingerprinting really the concern? What if I don't want Google clustering people who visit my page with people who visit similar pages? In that case, the header still helps protect their privacy, right? By making Google's interest-based clustering of website visits less accurate? Or am I misunderstanding how FLoC works?
If you add the header to your site, do it for the right reason. It could mess with unsophisticated ad targeting, but it won't necessarily make a difference wrt. privacy. Energy is better spent getting users off of any browser that supports FLoC (Chrome, probably Chromium too).
My company is a non-profit and doesn't serve ads on our website. Should we ensure this header exists for our site?
If we have GA, we're getting some information and Google is getting some information, but are they sharing this information about users directly with advertisers?
The premise of FLoC is that they are explicitly tagging you in a group specifically for advertisers.
Out of curiosity, what would be the kind of figure that would make Google stop using it? I mean, at what point does the data from a smaller pool become useless?
Any ideas?
As I understand it, every response in a page has to have the header, not just a containing html or an initial options.
So, at most, FLoC will add 1 packet per header. I don't know how many headers are sent total each day, but I remember reading that the average person visits 100 websites per day (including reloads). Out of 4 billion people who use the internet, we're talking about 400 Billion response headers per day.
Assuming that each response opts out of FLoC (a portion of this is Google, so that's unlikely), that means that an extra 400 billion * 40 bytes need to be sent. This is about 16 trillion extra bytes (1.6E13). I've just checked, and it seems that the average Google Search is about 125 KB, and I found that each releases approx. 7 g of CO2. So dividing this out, each kilobyte of traffic releases 0.056 grams of CO2. For each byte, that would be 0.000056 (5.6E-5) grams.
Multiplying that out by the 16 trillion extra bytes, you have 896 million (8.96E8) grams of CO2, or an extra 896 tons of CO2 per day. So, I was totally wrong. Jeez, that's a lot of CO2.
But my calculations were a badly estimated, worst-case scenario. Also, since fewer websites will have third-party cookies as a result of this, we would have to subtract those now-gone emissions. But this is still a lot more CO2 than I expected, even if it was counteracted.
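The estimate is easy to re-run. The sketch below uses the comment's own inputs (4 billion users, 100 responses each per day, 40 bytes per header, 7 g of CO2 per 125 KB); all of them are the commenter's rough guesses rather than measured values, and 1 KB is taken as 1000 bytes.

```javascript
// Re-running the back-of-the-envelope estimate with the comment's inputs.
const users = 4e9;               // people online (comment's figure)
const responsesPerUserPerDay = 100;
const headerBytes = 40;          // approximate size of the extra header
const gramsPerKilobyte = 7 / 125; // 7 g CO2 per 125 KB search (comment's figure)

const extraBytesPerDay = users * responsesPerUserPerDay * headerBytes; // 1.6e13
const gramsPerByte = gramsPerKilobyte / 1000;                          // 5.6e-5
const gramsPerDay = extraBytesPerDay * gramsPerByte;                   // about 8.96e8
const tonnesPerDay = gramsPerDay / 1e6;                                // about 896
```

With these inputs the result is on the order of nine hundred tonnes per day; it scales linearly with every input, so a different per-byte emission figure changes it proportionally.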
Also if it is opt-out by default I guess it will be simple to see who has opted-in, a nice list of shame.
Does this circumvent the EU cookie laws?
Implement document.interestCohort() and return some useless junk or, better, fake data (e.g. this user looks at cat pictures and nothing else). However, I ran into the problem that there is no documentation of how a cohort ID is specified. This led to another question: how are ad companies supposed to actually target their audience with it if there is no translation between cohorts and target groups? (I assume Google already has some translation.)
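A minimal sketch of that idea, assuming the API shape described in the public FLoC material (a promise resolving to an object with `id` and `version` strings). The helper name and the fake values are invented here, and since the meaning of real cohort IDs is undocumented, the fake ID cannot be made to "mean" anything in particular.

```javascript
// Sketch: replace interestCohort() on a document-like object with a stub
// that always resolves to the same fake cohort. The default values are
// arbitrary placeholders, not real cohort identifiers.
function installFakeInterestCohort(doc, fakeId = "0", fakeVersion = "fake.0.0") {
  doc.interestCohort = () => Promise.resolve({ id: fakeId, version: fakeVersion });
  return doc;
}

// Demonstrated on a plain object so the sketch runs outside a browser.
const fakeDocument = installFakeInterestCohort({});
const fakeCohortPromise = fakeDocument.interestCohort();
```

In a page this would be called as `installFakeInterestCohort(document)` from an extension or userscript; whether it runs before the page's own scripts depends on how it is injected.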
Even as an opaque identifier it's still useful. Imagine you run a store and you log the FLoC cohorts of your customers. You could then target ads at the most common cohorts you've seen as a way to say "show my ads to more people similar to my existing customers".
(Disclosure: I work on ads at Google, speaking only for myself)
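The store example amounts to counting cohort IDs and picking the most frequent ones. A sketch with a made-up log follows; the cohort IDs are arbitrary strings and `topCohorts` is a name invented for this example.

```javascript
// Count how often each (opaque) cohort id appears among logged customers
// and return the n most common ids, most frequent first.
function topCohorts(loggedCohortIds, n = 3) {
  const counts = new Map();
  for (const id of loggedCohortIds) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || String(a[0]).localeCompare(String(b[0])))
    .slice(0, n)
    .map(([id]) => id);
}

// Hypothetical log of cohort ids seen at checkout.
const customerCohorts = ["1354", "21", "1354", "77", "21", "1354"];
const targets = topCohorts(customerCohorts, 2); // ["1354", "21"]
```

Nothing here requires knowing what a cohort represents, which is the point being made: the ID works as a "more people like these" handle even when it is opaque.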
https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-...
tl;dr: 3rd party cookies are dying, so Google has come up with this way to replace them. The EFF says 3rd party cookies suck, but the choice shouldn't be between those and FLoC. How about neither, where the user decides what to share, with whom, and when?
How much monopolistic behavior does Google have to engage in before antitrust laws have enough teeth? It also seems to me that Google has been more aggressive in its monopolistic behavior in different areas the more there is talk of regulations raining down on it. Maybe they know the end is near and are trying to get away with as much as possible before that happens.
With apologies to Bill Watterson
Iteration #1: Microsoft blocks FLoC across all of GitHub Pages.
Iteration #2: Microsoft blocks Google's FLoC across all of GitHub Pages.
>comment(s) about how MSFT is inherently evil and doesn't deserve credit
The HN cycle continues.
I predict the parent observation will itself eventually become a meme to be "complained" about. Maybe this one too.
> hn users trying to discredit the commenter and telling us the "new" Microsoft is so cool, like if we owed anything to them