
Tell HN: Google search sucks even more during Reddit blackout

 10 months ago
source link: https://news.ycombinator.com/item?id=36345345
108 points by behnamoh 1 hour ago | 60 comments
It's crazy that so much content is now inaccessible due to subs going private. I don't know what to blame more: the centralization of forums into one giant company like Reddit, or Google's algorithm that still shows those private Reddit pages.
Maybe this is a good time to mention the Web Archives browser extension [1] that offers links to various cache / archive providers for any page you visit from a toolbar button. There are many such extensions, this is the one I've been using occasionally. Simple but very useful.

I haven't tried since the beginning of the Reddit strike though, I don't use Google and I only very occasionally run into Reddit pages. I know ArchiveTeam has asked help to archive Reddit [2].

(not affiliated with anything mentioned)

[1] https://addons.mozilla.org/en-US/firefox/addon/view-page-arc...

[2] https://news.ycombinator.com/item?id=36254172

You might also have luck with “unddit” for Reddit specifically.
I've said it before, the web's demise is Google's fault.

SEO has pushed content 'producers' to generate page after page. There are websites that have a page for every Microsoft KB<number> error out there, pointing to their own product.

It's even starting to overtake YouTube.

All just to get higher in Google and push (malicious) ads to visitors.

I think there is a third cause: the low visibility of high quality non-Reddit forums on Google against the tide of low-quality results.

There's some kind of SEO exploit that Google is unable to counter. Apparently, by creating bot accounts on a semi-reputable social media platform like LiveJournal, Baidu or Reddit and having them post endless links to website X and to each other, website X gets inflated on Google even if no humans interact with any of the bot posts/accounts.

I don't work in SEO and have never bothered to investigate this; it's purely a topic that keeps coming up in the Twitter posts of one of the owners of LiveJournal spin-off Dreamwidth (their account is @Rahaeli).
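Google's ranking is not public and the claim above is second-hand, so this is only an illustration. One classic mechanism that would make such link spam pay off is PageRank, from the Brin and Page paper quoted further down the thread: a page's score grows with the scores of the pages linking to it, even when those linking pages only link to each other. The graph below is entirely hypothetical (`forum`, `blog`, `news`, `x` and `bot0` to `bot4` are made-up page names), and the damping factor 0.85 is the conventional textbook value, not anything Google has confirmed for its current system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy power-iteration PageRank; links maps each page to its outbound links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Score held by pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# A tiny hypothetical web where nothing links to "x".
honest_web = {
    "forum": ["blog"],
    "blog": ["forum", "news"],
    "news": ["forum"],
    "x": ["news"],
}

# Five bot accounts that link to each other and to "x"; no human page links to any bot.
bots = [f"bot{i}" for i in range(5)]
spammed_web = dict(honest_web)
for bot in bots:
    spammed_web[bot] = [other for other in bots if other != bot] + ["x"]

before = pagerank(honest_web)
after = pagerank(spammed_web)
print(f"score of x before: {before['x']:.4f}, after: {after['x']:.4f}")
```

In this toy graph the score of `x` rises from 0.0375 to roughly 0.061 purely because of the bot ring, even though no real page links to `x` or to any bot.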

Google loves to group results by domain, too. Say you search for something related to your hobby. Somewhere on the page there will be a hit from the forum for that hobby, with maybe half a dozen sub-hits from the same domain just below it in a smaller font, and that's all you get from that domain. Never mind that all the relevant knowledge online about this topic might be in that forum: it's effectively below the fold until you expose it with the site:forum.com operator, and that requires a priori knowledge that forum.com is a good source for your topic.
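Google doesn't document how it groups results by domain, so the following is only a hypothetical sketch of the effect described: a "host crowding" rule that shows at most `max_per_host` results per host (an assumed parameter), next to a filter that roughly mimics a `site:` restriction. The URLs are made up, with `forum.com` standing in for the hobby forum.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    return urlsplit(url).hostname or ""


def crowd_by_host(ranked_urls, max_per_host=2):
    """Keep at most max_per_host results from any one host, preserving rank order."""
    shown = defaultdict(int)
    kept = []
    for url in ranked_urls:
        host = host_of(url)
        if shown[host] < max_per_host:
            shown[host] += 1
            kept.append(url)
    return kept


def restrict_to_site(ranked_urls, site):
    """Rough stand-in for site:forum.com: keep results on that host or its subdomains."""
    return [u for u in ranked_urls if host_of(u) == site or host_of(u).endswith("." + site)]


ranked = [
    "https://forum.com/t/101",
    "https://forum.com/t/102",
    "https://forum.com/t/103",
    "https://forum.com/t/104",
    "https://spam-one.example/guide",
    "https://spam-two.example/guide",
]
print(crowd_by_host(ranked))                   # only two forum threads survive
print(restrict_to_site(ranked, "forum.com"))  # all four forum threads come back
```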
I don't think I've ever seen a result linking to Baidu or LiveJournal, and Reddit is always legitimate content. Might be a location thing?
That's not what I'm saying.

* Make posts linking to X on platform of choice.

* Google starts linking to X (not the posts about X).

Actually though, is SEO/spam so out of control that allegedly the best software engineers in the world are unable to counter it?

Are bad results actually good for google in some perverse ad-based way?

Do people at google actually use their products at all?

I don't know about ability, but it's obvious they aren't even willing to try. Why? If search result quality were anywhere on their radar, there would be a button next to every search result letting me block that domain from my personal search results. One day they might even figure out how to use the information about blocked domains to help rank results without being gamed.
The best software engineers in the world are generating it.

It pays better than google.

- Google already puts ads on the page, so it is incentivized to show you the best results.

- But this must be balanced with incentives to rank-up links that further display ads benefitting itself. (and to rank-up/down stuff according to its "values", at your expense)

- Firstly, Google is tuned to recruit a certain type of person, which corresponds to a very low % of the best software engineers in the world. https://t.ly/xi_0

It's not that they can't counter it, it's that there's less and less to find...
It’s also hard to find old things that are still there, with queries that used to work just fine.
A bit of A, a bit of B. More and more of the interesting information is never put on a permanent, accessible website.
I wonder if LLMs can serve as an adaptive filter against SEO to keep up with the arms race?
No, LLMs are just being used to generate better blogspam now.
It's really disturbing just how bad Google search has become. In so many cases, Google is actually only useful as a search interface for Reddit, StackOverflow, and other sites that accumulate (and actually garbage collect) knowledge. Without such a qualifier, Google will just give you absolute SEO trash for many types of searches.
I'd just like to note the irony of saying how bad Google is while at the same time saying that Google's general purpose search engine is still better than any of the specific search engines of these individual sites.

Perhaps the issue isn't that Google is bad, and the issue is that search is incredibly hard.

No I disagree, because this simply wasn't the case a few years ago. Back then I could just search something, and Google would give me results, and that was it. It wasn't too long ago when googling was a proper skill to be learned and it felt like that could get you anywhere on the web. Google was exceptional once.

Now Google won't even respect quotation marks ("") anymore, and having to hold its hand and guide it towards a single website, which I already need to know about, is pretty pathetic compared to what Google was once able to do. It also returns so much spam and even puts it at the top of the results.

> Google's general purpose search engine is still better than any of the specific search engines of these individual sites

This is only partially true. Google's search engine is definitely better than Reddit's, but that is really not hard (I need to emphasize this: Reddit's search is really bad, except on old.reddit, where it is at least somewhat OK). For many other sites, the reason to pick Google is just convenience.

> Perhaps the issue isn't that Google is bad, and the issue is that search is incredibly hard.

I think the issue is more Google deliberately allowing and pushing all that spam, because users that find what they seek will spend less time on the site. Otherwise I find it hard to explain this drastic drop in quality. Would also explain why they are taking away all the useful search and query tools.

Or the people responsible for working on it just don't have the skill anymore, who knows.

It's not bad because they can't build a decent search engine. Building a decent search engine is a solved problem, which they solved.

It's bad because their incentives aren't aligned with their users. The shit results they are giving aren't because they can't give good ones, they are because they don't want to.

They are bad now, but they were exceptional a few years ago. Same thing with Gmail: now I get obvious spam in my inbox and real email in the spam folder. Looks like they gave up.
I’ve had zero Gmail misclassifications in the last 2 months.
> It's really disturbing just how bad Google search has become

I just want to be (yet another person) echoing this sentiment. For the first time in my life, I had to resort to Bing (!) instead.

The lower quality of results must make business sense somehow, I suppose...

> The goals of the advertising business model do not always correspond to providing quality search to users.

- Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine

People run more queries, which leads to more ad impressions?..
Before 2020, it was possible to google random phone numbers and figure out who was calling. Nowadays, googling random callers is useless, but Bing somehow still works.
I'd love the following combination:

* A search engine that only crawls and searches a whitelist of sites.

* Community functionality to vote on the contents of the whitelist.
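A minimal sketch of that combination, assuming a site joins the whitelist once its net community votes reach a threshold. The class name, `min_net_votes`, and the example domains are all hypothetical; a crawler and index built on this would only fetch URLs for which `should_crawl` returns `True`.

```python
from urllib.parse import urlsplit


class CommunityWhitelist:
    """Hypothetical crawl whitelist whose membership is decided by net user votes."""

    def __init__(self, min_net_votes=3):
        self.min_net_votes = min_net_votes
        self.votes = {}  # domain -> {user: +1 or -1}

    def vote(self, user, domain, up=True):
        # One vote per user per domain; voting again replaces the earlier vote.
        self.votes.setdefault(domain, {})[user] = 1 if up else -1

    def whitelisted(self):
        return {d for d, ballots in self.votes.items() if sum(ballots.values()) >= self.min_net_votes}

    def should_crawl(self, url):
        host = urlsplit(url).hostname or ""
        # A whitelisted domain also covers its subdomains (e.g. old.reddit.com).
        return any(host == d or host.endswith("." + d) for d in self.whitelisted())


wl = CommunityWhitelist(min_net_votes=2)
for user in ["ana", "bo", "cy"]:
    wl.vote(user, "reddit.com")
wl.vote("ana", "kb-errors.example")
wl.vote("bo", "kb-errors.example", up=False)

print(wl.should_crawl("https://old.reddit.com/r/retrogaming/"))  # True: 3 net votes
print(wl.should_crawl("https://kb-errors.example/some-error"))   # False: 0 net votes
```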

I've wondered, with 200,000 employees, how close you could get to something useful by tasking them all with part-time wiki editing/building: hand-build an internal ranking of web quality, keeping in mind that a few high-quality posts on a domain don't automatically make every post on that domain high quality. The surfacing of information could still be very automated.
Thanks, this seems to be a step in the right direction. I'll use it for a few days and see what I think.
Or have it crawl anything but be able to provide allow lists the way uBlock Origin allows picking block lists.

But I'm not sure I would want an allow list approach. How do you run into new interesting websites? How would people find your new website? A block list would be good, though.

It would certainly be an interesting next turn of events if Google made an offer to buy out Reddit - just for the user-generated content and to stop the site from self-destructing, i.e. the golden goose offing itself.

It's unlikely given current capital interest rates, but at this point this drama has pretty much crossed the borders of plausible fiction anyway.

No major company can buy reddit because it's full of porn and a lot of the user base is there for the porn and will riot if it goes.
Google should probably just buy Reddit and run it at a loss.

This may not be good for us the users, and might end up killing Reddit in the long run, but it's better for both Google and Reddit than the current situation.

I had the same thought, but with OpenAI as the buyer. I think Gruber was right that a lot of the motivation for the recent Reddit API changes is OpenAI (and other AI startups) using Reddit data and becoming ultra-valuable: https://daringfireball.net/linked/2023/06/09/reddit-ipo

That data was helpful, but if Reddit is blacked-out, or people move to more walled-off forums it makes future iterations of ChatGPT and the like difficult to train on recent developments. Why not have OpenAI run Reddit as a way to get free data into its models?

What's the difference for users between Google buying Reddit and Reddit just shutting down now? One promo cycle? Less?

There's no "long run" any more for Google's touch of death, like there was for Google Groups; they've gotten quite good at speed-running value destruction.

It feels like it's more core to Google's search business than the projects normally shut down by Google. Maybe not quite on par with YouTube, but it could be close.
It seems almost exactly as good a match to Google's search business as Google Groups was...
Google Groups was never at the level that Reddit currently is, at least in terms of search results.
Deja News/early Google Groups provided great search results. Google, in usual form, has been working hard to destroy any property outside of Web Search/YouTube for a long, long time, Groups included. The internet is bigger now, so Reddit has an advantage there, but Groups could have been what Reddit became if it received some love instead.
Reddit's valuation is insane right now, they got in before the "correction" so it's at some absurd multiple.

Anyone trying to buy Reddit would need to basically lowball the hell out of them and make the case that an IPO will only be worse (which may or may not be true).

Is this correct?

It doesn't sound right. It last raised money at $10B in 2021, but Fidelity, who led that round have since cut that valuation on their own books back to $6B.

I would not be surprised if that is a conservative number.

The market has moved and reddit hasn't been going in the right direction.

While that's still a lot of money, it is unprofitable, so a trade sale would make some sense. The real issue is that it comes with a lot of reputational baggage that a lot of public companies would not want.

Maybe I am missing something about IPOs here, but if Reddit’s valuation is way off because it was before the correction, wouldn’t it be easy to argue your low ball offer is the real value?

Even if they IPO, wouldn’t the stock immediately crash if it’s overvalued?

They would probably love to buy Reddit but would be blocked by regulators.
DuckDuckGo has somewhat recently added a feature where, for some searches, it will spot that you're trying to find Reddit posts and ask if you want more of them displayed. It's a useful addition given the habit I've picked up of adding "reddit" to searches to cut out the blog spam.

It's a shame it came out now, when reddit is increasingly full of spam and is now blowing itself up.

Any specific examples you can share?

I haven't experienced any decline in quality so maybe I'm not searching for the right things.

It makes sense that when a bunch of content goes dark it would have an impact, so a few specific examples would be appreciated.

I haven't noticed this, but have seen the complaints. I don't tend to see many reddit links in my search results, and haven't noticed a change since that blackout.

Perhaps just the nature of my searches?

I don't get a ton either which is odd because I OFTEN add "reddit" onto my search and get way better results for what I'm looking for.

Kinda odd how Google search hasn't learned yet that Reddit results are higher quality, at least for me? Almost like it's not optimizing for quality of results...

It's almost like Google's optimizing for revenue on results has created the Reddit situation. Can't find better results without filtering for Reddit explicitly, so people end up on Reddit and end up posting on Reddit and around and around it goes.

> I don't get a ton either which is odd because I OFTEN add "reddit" onto my search and get way better results for what I'm looking for.

Could you give some example queries?

Same. It would be nice if someone would share some example queries. I use Google quite a bit but rarely end up on Reddit, nor do I feel Google search quality has been going down over time.
As a concrete example, I run a used retro games business.

Proper technical manuals don't exist for the repair work that I need to do; finding the solution to a problem I can't work out myself is a mix of YouTube, Reddit and a looooot of filtering out bullshit.

I usually don’t see many Reddit results in my searches either. According to previous threads on HN a lot of people append site:Reddit to their Google searches so perhaps that is what they are referring to.
I've never worked in search, so I'm curious: what are the technical difficulties of implementing a YouTube-style "Don't Recommend This Channel"? At first glance, the YouTube recommendation can be an offline process, and much easier to scale.

Google pretty much has "F-you" level developer man-power to throw at this problem, so I'm somewhat surprised they haven't implemented it yet (the "bad result is good for Google" reasoning never quite made sense to me). I'm curious if the functionality is not worth it or if the technical challenge is insurmountable at Google's scale.

I'm using Kagi search that allows you to raise, lower or block websites from your search results.
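Kagi's actual ranking code isn't public; the sketch below only illustrates the general idea of per-domain raise/lower/block preferences applied on top of a search engine's own scores. The multipliers in `WEIGHTS`, the `personalize` name, and the example domains are all assumptions.

```python
from urllib.parse import urlsplit

# Assumed multipliers; a real engine might tune these or use rank offsets instead.
WEIGHTS = {"raise": 2.0, "lower": 0.5}


def personalize(results, prefs):
    """results: (url, score) pairs; prefs: domain -> "raise" | "lower" | "block"."""
    reranked = []
    for url, score in results:
        host = urlsplit(url).hostname or ""
        # Apply the first preference whose domain matches the host or a parent domain.
        pref = next((p for d, p in prefs.items() if host == d or host.endswith("." + d)), None)
        if pref == "block":
            continue  # blocked domains never reach the results page
        reranked.append((url, score * WEIGHTS.get(pref, 1.0)))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


results = [
    ("https://kb-errors.example/fix-error", 0.9),
    ("https://old.reddit.com/r/gamerepair/comments/abc", 0.6),
    ("https://seo-blog.example/top-10", 0.5),
]
prefs = {"kb-errors.example": "block", "reddit.com": "raise", "seo-blog.example": "lower"}
for url, score in personalize(results, prefs):
    print(f"{score:.2f}  {url}")
```

Because the preferences are applied after the engine's own scoring, they could be stored per user without touching the shared index, which is one plausible reason such a feature is cheap to offer.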
The fragility of web pages is a good argument for generating query responses directly (e.g. via LLM) vs. returning links to pages.
Blame the Reddit moderators taking a unilateral decision for sham reasons.
Google's cached versions of reddit pages are coming in handy these days.
It's not Reddit's fault; it was the users' choice.
It was the moderators' choice and it is reddit's fault.
