Official Elasticsearch Python library no longer works with open-source forks

source link: https://news.ycombinator.com/item?id=28110610

I used to be a huge proponent of Elastic.co for search. I paid for their Elastic Cloud Enterprise product to offer an internally hosted Elastic as a Service. We ran a proof of concept with the open source Elastic Stack before requesting the funding to get commercial support and the extra management features.

I'm not sure I can recommend them anymore. The company doesn't seem to understand its customers' journey and is rejecting the methods that brought it success.

It is easy to judge another’s actions from afar when it’s their meal and roof they’re working for. It’s okay for Amazon (a trillion dollar company) to make money off of their work but not themselves?

Imho, the problem isn’t “well it’s not open source anymore then.” The problem is people who give more weight to a religion/philosophy than sustainability. Who cares if it’s open source if the people developing it can’t support themselves?

Fair code > open source.

https://faircode.io/

If they don't like it, they shouldn't have licensed their software under a free software license. This was an entirely foreseeable consequence. They only have themselves to blame. The AGPL generally fixes this perceived problem, but means that several large players (like Google) will never touch your code with a ten foot pole, let alone improve it. So, there's some difficult decision-making at the inception stage of a project - what do you value more?

Sony forked FreeBSD and used it as a substantial component of their operating system for their PlayStation 5, a product which they will make money off (in the long run, at least). No changes are available to anyone. No one at FreeBSD is jumping up and down about it. Maybe because there's more to a software project than maximising profit.

>If they don't like it, they shouldn't have licensed their software under a free software license

They or anyone else are also free to relicense their project, there's nothing wrong with it at all and anything under the old license is still what it was before.

It seems kinda strange to demand that someone operate under some license in perpetuity, there is no such rule. Licensing questions don't stop after the inception of a project.

The lesson that some contributors (myself included) will learn from this is that it’s now a common bait-and-switch to build your product with open source contributions, and then relicense it when it’s time to monetise. I used to talk most of my employers into letting me bug fix open source projects that they were having issues with; now I just talk them into maintaining their own forks. The open source “freemium” business model is very clearly a scam at this point.

I also personally can’t stand Elastic, because they charge enterprise rates for support, and then offer open source level support. You’ll come across a show-stopping issue, and the support you’ll get is most commonly “why are you even trying to do this?”, or “please refer to this issue that’s been stale since 2016”.

There are lots of open source projects that aren't bait and switch, like the Linux kernel, or Python. So I would encourage you to reconsider a blanket fork-everything policy and at least contribute to the projects that aren't backed by VC-funded companies and instead have a diverse community of contributors with multiple copyright holders.

Yeah I should have been more specific. I’m just whining about the open source as a business model type of projects.

My general approach has been, when I’m capable and have the time to, to fix any issues or obvious feature deficiencies that I come across in open source projects that I use. I still do that, but I now only contribute my changes back to projects that seem more trustworthy.

I wonder what criteria you apply to determine the trustworthiness of a project. For me, signing a CLA or otherwise not using inbound=outbound licensing is a major one, as well as any project backed by a company with VC funding. Any project backed by a single organisation instead of a group of people from lots of different organisations is a red flag, with some exceptions for long-term known-trustworthy non-profits.

None of that helps with a situation like Audacity/MuseScore, though. If developers are willing to sell out their project copyrights, that isn't something you can really protect against, except maybe by discussing people's opinions on that openly.

The other issue with withholding changes from upstream is the potentially infinite cost of updating your changes as the project evolves, things like git-imerge, mergify or git-mergify-rebase can reduce that burden by letting you do incremental rebases/merges though. Normally I don't contribute to projects with a CLA assigning extra rights to corporations over the license, but I've been considering signing one just to drop the maintenance burden.

https://github.com/mhagger/git-imerge https://github.com/brooksdavis/mergify https://github.com/CTSRD-CHERI/git-mergify-rebase

That’s pretty much the criteria I use. At my most recent job I left behind a Telegraf fork after running into the Influx CLA.

But the more important criterion is deciding what changes are feasible to support with a fork. The projects that I find to be most suspect typically come with an enterprise support license, and if you’re struggling to get your issues fixed with one of them, then the best long term solution is usually to abandon the product. Forking for a bug fix would typically be a temporary solution, and hopefully the first merge conflict you run into is the vendor actually fixing the issue; otherwise you can take your time to find a product that offers better value for money.

There’s lots of things I don’t especially like about the large enterprise workplace, but I really detest vendors that fleece them with high price, low quality products/services.

I fear you might underestimate the maintenance costs of forking.

A nice anecdote I heard a long time ago was comparing the approach to custom engineering between SuSE and Red Hat.

SuSE was always very happy to do custom engineering for paying customers and developed and shipped these features in their Linux distribution. Specifically I am thinking of some interesting features done in the kernel. At that point, SuSE received money for the engineering work but now had the burden of supporting their fork. The functionality code itself was not a problem, but the interfaces to the rest of the kernel were a source of churn and pain.

Red Hat did the opposite. They told their customers that they could have whatever is in the upstream kernel and that Red Hat would help them upstream the necessary changes. That took markedly longer, but the long-term maintenance was much less effort because the in-tree code would be updated whenever an API/interface changed.

Canonical also had a tendency to happily fork whatever project they needed to ship a fancy thing on time (think Netbook UI, think Unity, etc.) and then get hit by the long-term maintenance burden once upstream diverged.

Both SuSE and Canonical always found themselves in the unenviable position of having to constantly update their code and potentially seeing a competing but conflicting solution being merged upstream.

Carrying a few bugfixes or feature improvements in a local branch of a library is easy at first. But you're essentially creating technical debt which you'll need to pay off on every upstream change, every security fix etc. Most companies see value in developing features for their main application, not in maintaining internal forks of open source libraries.

Good point on contributors, didn't think of it this way.

It is the return of shareware, as developers realised their parents wouldn't pay for their housing forever.

That is not quite fair, is it?

Publicise your work as open source and make hay with the work of open source contributors who thought they were contributing to a truly free and open source project, only to find out a couple of years later that their contributions are now locked into a project that relicensed itself... (I know their contributions stay under the license they were originally released under, but if the overall product changes license, who is going to continue an unmaintained fork?)

IMO, it is fair criticism when someone starts off with a liberal license and then changes it when it is time to monetise, or when someone else figures out how to monetise it better than the creators. Either be okay with it or not; you can't have it both ways.

No one’s demanding they don’t change; they’re just stating that as users they’ll react accordingly.

You write "accordingly" as if it's obvious what's the correct course of action for a company that's in direct competition with a titan, and any other course of action must disappoint their users.

What would you have them do? Would only their demise appease you?

> This was an entirely foreseeable consequence

I disagree. I think it was hard to know that many years ago (10+?) how things would turn out.

Especially not "entirely foreseeable."

There's a cognitive bias called "hindsight bias", namely "the common tendency for people to perceive past events as having been more predictable than they actually were", https://en.m.wikipedia.org/wiki/Hindsight_bias

I agree with you about AGPL.

> I disagree. I think it was hard to know that many years ago (10+?) how things would turn out.

The GPL exists because of the problem of other people taking your code and hiding it in their products, placing you at a disadvantage.

The AGPL exists to extend that protection regarding SAAS.

According to Wikipedia, the AGPL is from 2007, and Elasticsearch started 4 years later.

I disagree that you need hindsight not to be surprised by how Elasticsearch was used by e.g. Amazon.

GPL exists because nobody can (should?) stop users from modifying software they use (at the very least, on the machine level) and such modifying should neither be illegal nor hard (thus source code).

The "sharing" bit is on top, and only comes into play if you distribute your changes too. AGPL fixes the flaw where "users" don't really get a copy of the software they are using distributed to them in full (e.g. only part of it arrives as JS in the browser, but the backend stuff is hidden).

So the focus point of GPL is use of the software (thus users), not writing of it (developers). Developers embrace it because they are simultaneously users of the software written by others, and they are best positioned to make the most out of those liberties ("standing on the shoulders of giants").

You are right: none of it was foreseeable, but rather, an expected outcome.

In case of success, which Elastic has certainly achieved with Elasticsearch, it is fully expected that other companies will jump on the bandwagon and try to profit off of it too! As a company, you hope for success, and for that to play out, but also that you'd be the go-to for earning the most off the product you created. While not foreseeable, it is not unexpected that you might not be the one to profit most from your product, and you should plan to profit enough! Where it gets complicated is that nobody expects to earn orders of magnitude less from a product they created than others relying on it.

What they did not foresee was that one of those companies would be The Cloud Provider, thus minimizing their value proposition since Amazon can throw significant resources at it, possibly even greater than Elastic themselves. One could argue that Amazon abused their monopolistic position in cloud providing to offer a bundled ElasticSearch experience that Elastic could never compete with. Even if they developed an alternative in-house product, it looks exactly the same as Microsoft bundling Internet Explorer with Windows back in the day.

Even today, if you are willing to develop an open source or free software product, if you get successful enough, you are likely to be an exploitation target of a megacorp. Companies always have an option to relicense in-house written code, and any code submitted by signatories of an appropriate contributor license agreement (CLA).

Basically, I agree it wasn't nice of Amazon, and that it wasn't foreseeable, but it ultimately wasn't unexpected either. To me, this is monopolistic behaviour that should be treated as such.

This also raises another interesting question: if AGPL is an appropriate solution, why did Elastic not relicense under it today? (I am sure they answered this very question when they published their original license, so it's mostly rhetorical.)

> Where it gets complicated is that nobody expects to earn orders of magnitude less from a product they created than others relying on it.

And this is where the problem basically is. The core philosophy of open source licensing is that the creators who use it don't particularly care beyond compliance with the terms of the licenses (e.g. source distribution for copyleft licenses).

If it turns out you *do* in fact care, then Open Source licenses are not a good fit for you. If you still go ahead and release under Open Source and then change later, don't be surprised to be called out for bait-and-switch manipulation.

> I think it was hard to know that many years ago (10+?) how things would turn out.

The long-term commercial advantage in Free Software being for the largest established players with revenue mechanisms centered around services rather than selling software licenses was widely recognized, with many observers independently coming to the conclusion, from at least the mid-1990s.

On the plus side, the Python repo is Apache 2.0 licensed, so fork away...

> No one at FreeBSD is jumping up and down about it. Maybe because there's more to a software project than maximising profit.

That is because xBSD have more than enough people to develop on it - hobbyists and companies alike.

For smaller software with less of an established ecosystem, the supply of people willing to work on it for free is... not exactly much.

Heaven forbid a FAANG doesn’t touch your code. Look at all the exposure you’ll miss out on. They might actually have to spend resources to write and maintain the code themselves, instead of open source maintainers begging for scraps [1]. The horror.

The homebrew maintainer couldn’t even get hired at Google [2] if I recall, even though they’re big fans of using the tooling internally!

[1] https://arstechnica.com/information-technology/2014/04/tech-... (OpenSSL Software Foundation President Steve Marquess wrote in a blog post last week that OpenSSL typically receives about $2,000 in donations a year and has just one employee who works full time on the open source code.)

[2] https://twitter.com/mxcl/status/608682016205344768 (Google: 90% of our engineers use the software you wrote (Homebrew), but you can’t invert a binary tree on a whiteboard so fuck off.)

Who said anyone was doing anything for exposure, either? I think open source is doomed if people fail to understand that when you release open source code under open source licenses, your users don’t owe you anything, and in return, you don’t owe your users anything either. As far as I’m concerned, I don’t even really care that much about whether or not people adhere to the licenses I use for the most part.

Nobody is owed a sustainable ecosystem to profit off of open source. When things align to be mutually beneficial, that’s great. But by the nature of it, if you want to make money off of software, you should not release it as open source. It’s probably going to eventually wind up being a conflict of interest, wherein the “open” parts of a project eventually become less and less relevant in favor of closed parts.

Open source doesn’t and shouldn’t guarantee a sustainable business model. The best you can hope for is that parties collaborate because they can benefit mutually from this collaboration, like with the Linux kernel.

An important distinction of Linux kernel development from most open source contributions is that the companies contributing code typically make their money off of selling hardware which must run or work with Linux to be successful. These companies are all financially incentivized to keep the Linux collaboration successful.

Almost every open source package has a model where multiple contributors make money off something else. Usually that something else is software or hardware that is built with the open source package.

That sounds a bit like Thatcher's 'there is no society'. I am not sure open software can survive without permanence, trust and reliability (on the developer and customer side, not the code base).

Sort of like how Elastic's own choice of license is biting them; you can’t really expect much else so long as “it’s a private company, they can hire as they see fit” is the default political posture.

Your problem transcends one whiner on Twitter; private entities can monopolize public agency across contexts and not be held accountable at all.

Here you are in a VC echo chamber; good luck.

Uhh, no. "Trillion dollar company BAD, think of the billion dollar company!!!". Elastic made its business off being open source, period. If they weren't open source to begin with they wouldn't be as successful as they have been. Their whole business model was initially built off the popularity of a useful open source project.

Of course, once you add on that they've basically done to Lucene what AWS did to Elasticsearch, then the whole thing is a joke. I also just checked and Elastic are NOT sponsors of the ASF: https://www.apache.org/foundation/thanks.

Absolute garbage company in terms of open source and community. I will never willingly install their software ever again.

You seem to be confused here.

1) Elastic never forked Lucene.

2) Elasticsearch is not a competitor to Lucene.

3) Elastic developers are PMC members and committers for the Lucene project.

Even Amazon did not fork ES until they were explicitly blocked. If Lucene changed its license to GPL, Elastic would need to fork it.

It's entirely possible that the relicense and now this will have precisely the opposite effect though. I'm sure the relicense was an attempt to increase Elastic NV's revenues from SaaS by locking out competitors from using their same codebase (from public financial filings, Elastic NV is not profitable, so they are probably looking to become so). But open source also attracts new customers and improves the product by encouraging outside contribution. It could very well be that in 10 years we'll look back on this year as the one that led to the downfall of Elastic as a company. Or this move towards a different business model could succeed. But even if it does succeed, it's possible it would also have succeeded if they had kept the code open source.

The person you responded to didn't even assert that open source software is morally superior to alternatives, but instead said that by moving away from open source, Elastic is failing to understand its customers. His argument doesn't require any judgement about the morality of the relicense.

1) Is outside contribution really important going forward? Most of the Elasticsearch contributors are probably already employed by Elastic.

2) Open source doesn't attract new customers. There's nothing special about having software where anyone can contribute freely, that makes it attractive to customers. There is however, something special in making it free to download, setup and use in your own servers that makes it attractive. Which is still the case.

> There's nothing special about having software where anyone can contribute freely, that makes it attractive to customers.

That may be true for some customers, but certainly not all. Being able to fix a bug or add a feature yourself instead of having to wait for a vendor to do it for you can be very attractive. Especially if you need a feature that would never be accepted upstream, and you have the ability to use your own fork/patchset. That is something my company has done many times, and is a factor when choosing software.

>There's nothing special about having software where anyone can contribute freely, that makes it attractive to customers

Community, support, acquisition, extendability and code quality.

Community/Support -- lots of documentation available for free, generally multiple vendors that offer paid support, free access to updates. Don't have any data but I suspect finding people to support/run it is easier (skills are more marketable and easier to learn on your own)

Acquisition -- eg less of a vetting/auditing process for places where that matters (you can handle vetting the code and supporting company separately)

Extendability -- you can hire developers to add features or integrate without being subject to the owner's timeline e.g. you don't have to plead for features

Code quality -- being open, it's more apparent if the software has a high number of bugs or large amount of technical debt that could lead to instability. Usually the issue trackers for OSS are open to everyone

These apply more to large companies with dedicated teams.

3 of those 4 still apply to Elasticsearch. Mongo, another example, has the same license and loads of docs freely available. You can still hire devs to add any missing features to Mongo (albeit you will have to fork it). You can still view their issues on GitHub too.

It’s moves like this that have made me go from largely neutral in this fight to actively hating elastic.co.

It’s their fault for not differentiating their offering enough.

When I looked at logz.io, I spent hours trying to get it to work and ultimately gave up, irritated that my seemingly plain vanilla use case (send ubuntu journald logs to ES) wasn’t as straightforward as I would’ve liked. (I’m aware they aren’t affiliated with elastic, just saying there might have been an opportunity for them here)

If elastic had a way for me to blindly copy paste things into an ubuntu server that allowed me to see all my systemd logs, I would’ve happily paid them. Instead it was an endless maze of having to figure out beats vs logstash, finding a journalbeat whose documentation says it’s beta, etc.

Yet ultimately the only thing that worked was AWS ES service hooked up to vector.dev’s agent. And boy does it work amazingly well. And it’s not like Amazon is doing anything special.

It’s on them for not having differentiated and made things drop dead simple. Now they seem to want to play dirty like Oracle.

People give Amazon flak, but they offer solid support with reasonable escalation when you have an issue. IMO it's a bit sad Elastic Co couldn't outcompete Amazon's ES with their own product.

When AWS released ES, you couldn't even dynamically scale cluster nodes, and you had to use Amazon's special library to sign requests for IAM.

There are other companies that run curated & managed SaaS on providers like AWS, GCP, etc. that offer better services than the native ones. For instance, MariaDB SkySQL includes DBAs with their product and will help tune the DB for the workload, vs AWS, where they offer limited app-level support beyond suggesting things like Performance Insights. When I looked, SkySQL was also a bit cheaper than AWS RDS for comparable hardware.

It's not a philosophical argument at all. Businesses embrace commercially supported open source software because it makes business sense to do so. The door of interoperability swings both directions. Open source vendors frequently acquire customers through users who 1) are moving from another compatible vendor 2) trialed their open source software for an extended period of time, often in production, and 3) from introduction by highly innovative teams in a larger corporation.

When customers see Elastic circling the wagons to remove some of the differentiators of their open source software, they're going to have to reconsider their options.

Indeed, a new equilibrium must be arrived at, as the existing contract has proven suboptimal (except perhaps for Amazon, in this context). This is why you see many projects (Redis, Elastic) with commercial potential (or actively generating revenue) moving away from open source licenses that allow AWS to copy and serve their work wholesale without any compensation or revenue share, even if they lose some business from doing so. Losing some business is preferable to a level of business that is unsustainable for an org as a going concern.

Customers want solutions, not software. Timescale is one vendor that figured this out and is having great success without any other enterprise licenses on the software itself. Others have at least started amplifying their cloud/support/managed offerings.

Elastic didn't, and AWS stepped in to give customers what they wanted. And in case anyone wants to argue that it's impossible to compete with AWS, please go ask Snowflake how they built a better product on top of AWS itself and became one of the best SaaS success stories so far.

The only question is: is it user-hostile? These changes are user-hostile.

I don't care about VCs' money, to be honest. I design IT architectures for big banks, and apparent excessive greed from software companies tends to stress me.

Plenty of major databases are able to operate in a self-sustaining manner while remaining completely open. NoSQL is a highly competitive field, it's not enough to go "well we should accept their behavior because they deserve special treatment".

But you can actually multi-licence code. Put it under GPL and offer other licencing agreements for commercial actors. Copy-left inhibits commercial exploitation while your business licenses get you the money. That is entirely possible.

> Copy-left inhibits commercial exploitation

It doesn't.

Perhaps "exploitation" is too negative, and it is not true in some cases, but most companies do not employ it or use it as a component because of the restrictive licence.

It depends on usage and license. Not all business models suffer from the requirement to provide source code to users. Even the AGPL can be fine for a company (e.g. MongoDB in AWS) if it sells a service. And the number of companies using the GPL-licensed Linux kernel, and even basing their products on it, is countless.

In that case I agree and kudos to those companies that embrace it and give back. I think it is a way many more companies could embrace if they modified their business strategy a bit.

Basically, they make it clear their library doesn’t work with a different data store, thereby avoiding people reporting bugs due to incompatibilities between the two? It seems very reasonable.

Reminds me of the pfSense vs OPNsense saga.

It's hard to blame them when Amazon is eating their lunch.

Well, it's more like they made a sandwich, Amazon stole it, and now Amazon is selling it to other people. Meanwhile Elastic.co is starving.

It's not a fair game anymore.

I hope the DOJ breaks them up into three or more companies.

I don't understand this argument. It's weird enough to me when folks say "steal" in the context of copyright, but in the context of FLOSS software, it seems wholly inappropriate. Third-party hosting of open source is the norm, not the exception, in my experience.

The antitrust argument seems orthogonal: lots of small companies make money hosting FLOSS software (Bitnami[0] comes to mind, but so do other hosting companies like Dreamhost). I just searched "elasticsearch hosting" on DDG[1], and Elastic is the 5th result, behind four companies I've never heard of. In many ways, that's exactly what makes FLOSS so attractive to me: if one provider isn't suitable, I can switch. I was very grateful for this in 2013 when I had a huge issue with a change in how LShift was hosting RabbitMQ, and I was able to move my company's cluster over to CloudAMQP instead. I had a similar issue with Elastic in 2016: they didn't offer compute-heavy nodes, but our application was compute heavy (geohashing-intensive for the majority of queries). I discussed with the Elastic sales team, and they said they had no timeline for this, so we migrated the company to AWS ElasticSearch.

So I guess I'm not sure how things are improved if Amazon is broken up...even if AWS had to stand on its own, this seems like it would remain a proven business strategy that customers value and is used across the industry.

[0]: https://bitnami.com/stack/elasticsearch [1]: https://duckduckgo.com/?q=elasticsearch+hosting

This is a false narrative. Both Elastic and Amazon are making a killing.

> Elastic (NYSE: ESTC) ("Elastic"), the company behind Elasticsearch and the Elastic Stack, announced strong results for its fourth quarter and full fiscal year (ended April 30, 2021). Total revenue was $177.6 million, an increase of 44% year-over-year, or 39% on a constant currency basis.

Is that true? Let's examine this claim a bit more closely. Net profit margin for the quarter -24.38%, net income for the quarter -43.3M, sales growth slowing by double-digit percent every year since 2018.

It's possible to take in a ton of money and still lose money.
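As a quick check of the arithmetic above: net profit margin is net income divided by revenue, so the two quoted figures should reproduce the quoted margin. This sketch assumes both figures are in millions of US dollars and cover the same quarter (the $177.6 million revenue figure is from the Q4 FY2021 results quoted earlier in the thread).

```python
# Figures quoted in this thread, in millions of USD (assumed to cover the same quarter).
revenue = 177.6       # total revenue, Q4 FY2021
net_income = -43.3    # net income for the quarter (negative, i.e. a loss)

# Net profit margin = net income / revenue, expressed as a percentage.
margin_pct = net_income / revenue * 100

print(f"Net profit margin: {margin_pct:.2f}%")
```

Running it prints a margin of about -24.38%, matching the quoted figure: strong revenue growth and a net loss can coexist in the same quarter.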

s.gif
I'm gonna go out on a limb and say if they're losing money with $177 million in revenue it's because they're expanding quickly, made a bunch of acquisitions, etc... No way they're losing money on operations.

Edit - just looked it up, they have free cash flow for the year and a $400 million cash position. Tons of deferred revenue too.

Revenue only tells us how much business they conducted, not how much of a "killing" they made. Profits are what would tell you that, which I guarantee are going to be far less than their $177.6 million in revenue.

Look for Normalized EBITDA. It's -121! A negative number isn't really killing it. Unless your growth number is poppin!

It is interesting, because Amazon used to claim [1] others would be doing to them exactly what they are doing to others today.

[1] My Conversation with Jeff Bezos (2000) - https://web.archive.org/web/20040215160333/http://www.oreill...

They made an open sandwich recipe with the help of common folks. Some were making it at home, but a lot were buying it from them, as it is a complex sandwich to make. Starbucks started selling the same sandwich because it was getting popular. Good for customers: they now have two competing vendors. But ES is unhappy because the multi-billion valuation is justified only if they are a monopoly.

Elastic, like others at the time, has used open source to its advantage, at least to start. The obvious one is that it is built around Apache Lucene, and I have no doubt that this is one of the reasons ES ended up being initially released under the Apache license.

Secondly, being released under a permissive open source license definitely helped with its adoption. I was working as a senior developer in a UK Government department in 2013 and we had need for a full-text search engine for a project, and even before v1, ES was a contender; it was eventually selected once v1 was released. This was largely due to a) ease of set-up/use and b) its release under an open source license. If it had been under AGPL, would we have still used it? Yes, probably: our specific use case wouldn’t have been affected by such a license, and the dept. was relatively open to more complex OSS licenses. But I have worked for several other orgs. where even just the AGPL would have resulted in a hard “no”.

Thirdly, ES has had contributions from a wide range of people. I honestly don’t know how I’d even begin to evaluate how much value ES has got from community contributions, but I feel it’s likely to be greater than the costs of managing those contributions.

But of course eventually Elastic got funding and had shareholders to placate. I don’t really have much sympathy for them about this conflict - their early choices were in part clearly made to maximise their value, and they decided to cash in on that value at a later date - the fact that those decisions had implications for their value should have been somewhat obvious to any investor who did any due diligence. I’m still not entirely convinced that the open source model is antithetical to commercialisation, but I think it does highlight how early decisions around OSS licensing can affect such processes.

I keep hearing the AGPL-makes-corporations-wince argument, and I'm curious: what are the reasons given?

Does the AGPL really place such burdens on the organization such that the benefits of a locked-open, community-guaranteed (albeit popularity not guaranteed) technology aren't worthwhile?

Or is it kind of a cargo-culting and cultural-norms phenomenon where people don't use AGPL projects because they've heard that other people don't use them, thus continuing the cycle?

In the early 2000s to mid 2010s I worked at an architecture firm (buildings, not software), and at the time (and likely still now) all plotters (large printers) depended on software called Ghostscript. It's AGPL, which prevents it from being bundled with any other software unless you negotiate a very expensive contract with their sales people.

If a company risks installing or supporting it company-wide, it risks a lawsuit and having the business shut down. The workaround is that IT must manually install the driver (a Windows DLL file) on every user's computer.

To me if I see AGPL it makes me think that it's likely a predatory business with good lawyers; and for this reason I would stay away from it even on personal projects unless there is no alternative.

Thank you. For me the AGPL is at the complete other end of the spectrum: it's a stake in the ground to say "this software must be allowed to progress", even if the originating entity ceases to exist.

As I'll mention in a sibling thread, I think that lawyers can be extremely risk averse in software licensing (it's their professional incentive structure) and that is my guess where this cultural meme about AGPL comes from.

In one of the organisations I mentioned, they had a strict policy against using any GPL dependencies, let alone the AGPL. I tried discussing this with the legal policy person, but they were quite resolute: they feared its use could “infect” our code, so it had to be avoided.

I frankly doubt there’s any sort of cost-benefit analysis being done here. Certainly in my experience it was much more driven by legal uncertainty and risk-aversion.

Thanks - yep, that matches my rough understanding of the view from the ground too.

Statisticians and scientists sometimes talk about 'type 1' errors and 'type 2' errors - false positives and false negatives. I can rarely remember which is which, but I think that generally, software license/contract legal professionals never want to advise a client about something that later turns out to be a liability.

That's fine, because it protects their firm's reputation, and it maintains the client's trust (which needs to be strong). But I think that in the context of software licenses, this has led to an overly strong aversion (and indeed self-replicating idea) about the AGPL and other copyleft licenses.

(in the context of cost-benefit, it's hard to justify the upside from using and helping contribute towards a software commons, but I think it can be significant, perhaps depending on project context and popularity)

This should be forked to be open source compatible. Open source doesn't mean that you get to control everything. If what you're offering doesn't stand on its own merits, then what's it worth? I get that we wish Amazon couldn't do what they do, but that's what the license permits.

And I hope it was not my comments that triggered this "it's our libraries!" thing. https://news.ycombinator.com/item?id=27980470

Rather than forking, it seems easier to create an additional module, subclass the two Transport classes, and override the _do_verify_elasticsearch method. It looks like they've made that particular dependency easy to modify, which is helpful.

And perhaps it'd be a good idea to override that method anyway, unless you're fine with your code doing an unnecessary handshake the first time you use each client.
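A minimal, self-contained sketch of the subclass-and-override idea, assuming a transport that runs a one-time product check before its first request. `FakeTransport`, `NoCheckTransport`, `_do_verify` and `UnsupportedProductError` are hypothetical stand-ins written for illustration; they are not elasticsearch-py's actual `Transport` internals, which differ between versions. With the real 7.x client, a custom subclass would be handed over through the `transport_class` argument of `Elasticsearch(...)`.

```python
# Hypothetical stand-ins: these classes only mimic the shape of a client
# transport that verifies the server once; they are not elasticsearch-py code.


class UnsupportedProductError(Exception):
    """Raised when the server fails the (hypothetical) product check."""


class FakeTransport:
    """Transport that checks the server's GET / response before first use."""

    def __init__(self, root_response):
        # root_response plays the role of the JSON body returned by GET /.
        self.root_response = root_response
        self._verified = None  # None means "not checked yet"
        self.verify_calls = 0

    def _do_verify(self):
        # The extra round-trip the comment calls an unnecessary handshake.
        self.verify_calls += 1
        self._verified = self.root_response.get("tagline") == "You Know, for Search"

    def perform_request(self, method, path):
        if self._verified is None:
            self._do_verify()
        if not self._verified:
            raise UnsupportedProductError("server failed the product check")
        return {"method": method, "path": path, "ok": True}


class NoCheckTransport(FakeTransport):
    """Skips the handshake by overriding only the verification hook."""

    def _do_verify(self):
        # No GET / request is made; the server is simply accepted.
        self._verified = True


if __name__ == "__main__":
    fork_root = {"tagline": "some other tagline"}  # hypothetical fork response
    try:
        FakeTransport(fork_root).perform_request("GET", "/_search")
    except UnsupportedProductError as exc:
        print("FakeTransport:", exc)
    print("NoCheckTransport:", NoCheckTransport(fork_root).perform_request("GET", "/_search"))
```

Because the subclass replaces a single method, everything else about the transport stays inherited. The trade-off is that an underscore-prefixed hook is private and can change in any release.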

The maintainers of the project chose to implement this change. You may not like it, and you are free to make a fork compatible with the original Apache 2.0 license, but it is well within Elastic's rights to add this. This is the danger with corporate open source in general.

Facebook made a similar unilateral decision in relicensing React from Apache 2 to "BSD + Patents" many years ago. They faced quite a bit of pressure, resulting in another relicensing to MIT.

They have the right to do so, but that doesn't change the fact that it's a bad idea politically. It comes off as Elastic prioritizing their war with Amazon over the community, leaving room for Amazon to continue to position themselves as the community's ally.

At the end of the day, CTOs and decision makers in corporations don't go with the company that position themselves as "allies". They go with the company that is shipping important new features, with better support and resiliency. If Elastic is shipping features faster than AWS is, then they'll go with Elastic. I bet most CTOs aren't even aware of the politics happening - it's only the echo chamber in Hacker News that is making a big deal of this.

Doesn't really matter: at most companies with 150+ employees, CTOs are abstracted from making those decisions. It'll be senior-level individual contributors who make the case across the org to other ICs.

At that level we're very much aware of the current situation. On a personal level the moment Elastic announced their changes I dropped all support for them moving forward.

They got outdone on their own product. Elastic Cloud had little value-add, and once managed ES came along it became pointless to use their cloud.

> They go with the company

...that is the bigger name and perceived as lower risk, better enterprise support, and fits with less internal work in their broader strategy even if that means sacrificing feature speed.

I mean, at least, most of the established market, rather than startups burning through VC dollars. But even the latter is likely to do one stop shopping with a big cloud vendor like Amazon unless particular features of an alternative are key to their business.

Indeed, that’s why Elastic has their own cloud service that you can host in AWS.

No one is disputing it's within their rights. It's within their rights to add a feature to the library that blasts "Caramelldansen" through the nearest Chromecast device when you import it, too.

What they're saying is that this is a breach of community norms.

I’m not going to say that Elastic has no real competition, but as a whole package it stands alone. Its speed and versatility definitely differentiate it from other similar systems. The fact that it can be a log aggregator, search engine, personalization system, SIEM, analytics tool, and forecasting tool makes it a lot more useful than typical full-text search systems, which often can’t process aggregations or run in a single instance. I think they know their position in the market is strong, and that helps them feel empowered to make risky moves with their licensing. I’d rather use anything else, but for interactive analytics (sub-second response times on millions of data points) it’s tough to beat. It’s a fussy app to self-host or run in a non-managed cloud environment. I wish them the best, but worry these kinds of moves will inspire more competition from entirely different projects, let alone its fork(s).

A lot of people like myself just want to view logs and don't care for features like SIEM. This is where the OSS and OpenDistro versions shine.

I have revisited Loki after all this news, but I think it's still missing full-text search.

> but I think its still missing full text search.

That's very much by design: by not indexing the log content, it's much leaner and more efficient. It's aimed at a completely different use case, where you already know the set of log streams you want to monitor.
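To make the design difference concrete, here is a toy model of label-only indexing. It is an assumption-laden analogy, not Loki's implementation or LogQL: streams are found through their labels, and their content is then scanned line by line instead of being looked up in a full-text index. The `streams` layout and the `select_streams`/`query` names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a stream is identified by its label pairs, and only
# these labels are "indexed"; the log lines themselves are never indexed.
Labels = Tuple[Tuple[str, str], ...]


def select_streams(streams: Dict[Labels, List[str]], selector: Dict[str, str]) -> List[Labels]:
    """Return the streams whose labels match every key/value in selector."""
    return [
        labels
        for labels in streams
        if all(dict(labels).get(key) == value for key, value in selector.items())
    ]


def query(streams: Dict[Labels, List[str]], selector: Dict[str, str], needle: str) -> List[str]:
    """Pick streams by label, then brute-force scan their lines for needle."""
    hits: List[str] = []
    for labels in select_streams(streams, selector):
        hits.extend(line for line in streams[labels] if needle in line)
    return hits


if __name__ == "__main__":
    streams = {
        (("app", "api"), ("env", "prod")): ["GET /health 200", "error: upstream timeout"],
        (("app", "web"), ("env", "prod")): ["error: 500 on /login"],
    }
    print(query(streams, {"app": "api"}, "error"))  # ['error: upstream timeout']
```

In this model the cost of a query grows with the number of lines in the selected streams, which is why knowing the streams up front matters.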

As long as they keep innovating to stay number 1, they will do just fine, even against AWS fork or new competitors.

Hacker News had a good discussion about this earlier today.

https://news.ycombinator.com/item?id=28103389

What surprised me was how many commenters seemed unhappy with Amazon's Open Source alternative and the level of resources they appeared to be committing to it.

AWS is pretty well known for rolling out half-baked products with understaffed teams, especially when it's a re-hosting of an open source product. EKS was a mess for a long time and had a tiny team behind it, from what I hear.

Amazon is certainly the big bad wolf here, but I wouldn't be surprised if the move is also designed to inconvenience the small players like logz.io, Graylog, Searchly, etc., who might be interfering with their flow of introductory-level customers. A quick search reveals there are more of these providers than I ever realized.

Can't Amazon just serve these bogus headers and keep OpenSearch compatible? It's not like headers are copyrighted...

Likely! It worked against SEGA, in the Sega v. Accolade case. Unlicensed Accolade games (and later, SEGA Dreamcast homebrew games) displayed a Sega copyright logo because it was (believed to be) needed to run arbitrary executables.

So the idea is that a poem, being a work of art, is actually copyrightable? Does this legal hack work?

As a side note, although Accolade won this case, the injunction that SEGA won earlier prevented Accolade from selling product for a while. That cash flow problem motivated their primary investor to step in, remove the company's founder and all top executives, and bring in their own team, which directly led to the end of Accolade as a going concern (my opinion). Of course, in this case Amazon isn't as vulnerable as Accolade was back then.

Amazon could try that, but it wouldn't stop there. They'd end up in a constant arms race.

Also, that "you know, for search" tagline is featured on t-shirts sold by Elastic [0]. It seems entirely plausible to me that Amazon & Elastic lawyers could get into a pissing contest about potential trademark or copyright protection of that term and whether sending it as an HTTP header constitutes infringement.

0: https://elastic.shop/products/you-know-for-search-t-shirt-st...

I'm sure the lawyers would be happy to spend lots of time hashing out the details, but Sega v. Accolade [1] would seem to be relevant and allow Amazon to use trademarks for interoperability.

[1] https://en.m.wikipedia.org/wiki/Sega_v._Accolade

Apple won its Hackintosh copyright lawsuit against Psystar over its copyrighted haiku, without which the machine refuses to boot.

https://www.zdnet.com/article/apple-warns-off-os-pirates-wit...

The haiku is “ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc”, whereas the linked poem is loaded with DSMOS, I think.

Though I suppose they're now implicitly making it part of the API, which I think was ruled to fall under fair use, though I'm not too sure about the details.

“You Know, for Search” is sent in Elasticsearch 6.8 on AWS. Was it removed at some point?

Amazon is doing the opposite. They're making their OpenSearch client compatible with both OpenSearch and ElasticSearch.

No need for the headers; just return a 401 or 403 on a GET to '/'.

> If call to '/' fails with 401 or 403 pass the check and show a warning (message will be linked later). This happens if the monitor permission missing for user. The subsequent checks must be ignored.
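A minimal sketch of the rule quoted above, for illustration only. Only the 401/403 branch comes from the quoted text. Treating a `200` as Elasticsearch when it carries an `X-Elastic-Product: Elasticsearch` header or the classic `"You Know, for Search"` tagline is an assumption, and `passes_product_check` is a hypothetical name, not the client's actual function.

```python
import warnings
from typing import Mapping, Optional


def passes_product_check(status: int, body: Optional[dict], headers: Mapping[str, str]) -> bool:
    """Hypothetical check run on the response to GET / (sketch, not library code)."""
    # From the quoted rule: 401/403 means the monitor permission is missing,
    # so the check passes with a warning and no further checks are made.
    if status in (401, 403):
        warnings.warn("could not verify the server product (401/403 on GET /)")
        return True
    if status != 200 or body is None:
        return False
    # Assumed criteria: the product header (case-insensitive name) or the tagline.
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("x-elastic-product") == "Elasticsearch":
        return True
    return body.get("tagline") == "You Know, for Search"


if __name__ == "__main__":
    print(passes_product_check(403, None, {}))
    print(passes_product_check(200, {"tagline": "something else"}, {}))
```

Under a rule shaped like this, a server that answers `GET /` with a 403 passes without sending any special header, which is the loophole the comment above points at.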

People who want open source search should take another look at Solr; it's doing fine.

I don't exactly know why you're getting downvoted, but I will point out that while they play in the same sandbox, this is roughly equivalent to suggesting that someone rewrite their app, because Solr and ES have absolutely stellar differences of opinion about the API used to access them, the amount of schema one must provide up-front to the document index, and the amount of operational hoops one must jump through to keep the service alive.

Sure, there are about 5-6 open source search engines one could pick from in a greenfield project, with amazing differences of features and maturity, but choosing between Open Search and ElasticSearch is likely a "political" decision, since they are (ahem, mostly) API compatible and have very similar operational needs.

PostgreSQL should be everyone’s first choice for a data store. It can do so much, including serving as your full text search system.

Elastic's main selling point is not so much the full text search.

The search is what it does, but most of its value is centered in the management/scaling/monitoring of full-text search over many machines.

I love Postgres, but its "clustering" story is definitely not as user friendly.

For logs (where you can shard on a modulo of the timestamp) you might have luck with CitusDB (PostgreSQL sharding).

Does it really scale, though? For example, does adding nodes improve indexing latency or CPU metrics?

Yes. You can scale ES to fairly massive data volumes. Postgres is a very different system with different design constraints.

There are plenty of peta-scale ES clusters in the wild.

Why would I do that when Elasticsearch is a proven search engine?

I don’t know, maybe you're confused yourself. Why else would you say this:

> I love Postgres but it's "clustering" story is definitely not as user friendly.

Then have a look at YugabyteDB.

I love Postgres, and the full-text search feature works great in some use cases, but it is not really comparable to Elasticsearch in many scenarios (huge document stores, complex text processing or querying, etc.).

For sure, but I would posit most startups and smaller-stage companies can get by with it. It really comes down to indexing data properly and designing for your search patterns. If your search patterns are vast or change constantly, ES might be better, but if you just need basic text search over X attributes, Postgres will be sufficient.

Do you know for sure that Postgres doesn't perform as well as Elasticsearch if you don't use the relational capabilities of Postgres?

Instinctively I believe what you're saying, just wondering if you know for sure.

Yes, I know for sure. Postgres search is essentially an easier-to-use regex engine. If you have a recall-only use case and/or a small dataset, then that works great. As soon as you need multiple languages, advanced autocomplete, misspelling detection, large documents, large datasets, custom scoring, etc., you need Solr or ES.

I don't doubt that you know your use case and have weighed/tried the options.

> Postgres search is essentially an easier to use regex engine.

I'm not sure exactly what you meant to convey here, but if you're searching with LIKE or `~`, you're not using Postgres's proper full-text search. You should be dealing with tsvectors[0].
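As a rough illustration of the difference (a toy, not Postgres's parser): `to_tsvector` turns text into normalized lexemes with positions, and a query matches lexemes rather than raw substrings. The sketch below lowercases and drops a tiny, hypothetical stop-word list but does no stemming, which the real `english` configuration does (so `foxes` would become `fox` there). `toy_tsvector` and `toy_match` are illustrative names only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Tiny illustrative stop-word list; Postgres ships much larger dictionaries.
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


def toy_tsvector(text: str) -> Dict[str, List[int]]:
    """Map each non-stop-word token to the 1-based positions where it occurs."""
    positions: Dict[str, List[int]] = defaultdict(list)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for pos, token in enumerate(tokens, start=1):
        if token not in STOP_WORDS:
            positions[token].append(pos)
    return dict(positions)


def toy_match(vector: Dict[str, List[int]], terms: Iterable[str]) -> bool:
    """AND semantics: every query term must be present as a whole token."""
    return all(term.lower() in vector for term in terms)


if __name__ == "__main__":
    doc = toy_tsvector("The quick brown fox")
    print(doc)  # {'quick': [2], 'brown': [3], 'fox': [4]}
    print(toy_match(doc, ["fox", "Quick"]))  # True
    print(toy_match(toy_tsvector("foxglove"), ["fox"]))  # False, unlike LIKE '%fox%'
```

Matching whole tokens is what keeps `fox` from hitting `foxglove`, and the stored positions are what make phrase and proximity queries possible.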

> As soon as you need multiple languages

Postgres FTS supports multiple languages and you can create your own configurations[1]

> advanced autocomplete

I'm not sure what "advanced" autocomplete is, but you can get pretty fast trigram searches going[2] (back to LIKE/ILIKE here, but obviously this is an isolated use case). In the end I'd expect autocomplete results to not actually hit your DB most of the time (maybe I'm naive, but that feels like a caching > cache invalidation > cache pushdown problem to me).
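The trigram idea behind that kind of search fits in a few lines. This follows the definition in the pg_trgm documentation (each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings), but it is a simplified, ASCII-only re-implementation, not the extension's code.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Set of 3-character grams, built per word with '  ' prefix and ' ' suffix."""
    grams: Set[str] = set()
    # Non-alphanumeric characters act as word separators (ASCII-only here).
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams over the size of the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
    print(similarity("cat", "cats"))  # 0.5
```

In Postgres itself, `CREATE EXTENSION pg_trgm` provides this along with GIN/GiST operator classes (`gin_trgm_ops`, `gist_trgm_ops`), so `LIKE`/`ILIKE` and similarity queries can use an index instead of scanning every row.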

> misspelling detection

The pg_similarity extension[3] might be of some help here, but it may require some wrangling.
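Misspelling detection usually comes down to an edit-distance measure such as Levenshtein distance. The sketch below is the standard dynamic-programming version plus a hypothetical `suggest` helper that keeps vocabulary words within a distance threshold. It is not pg_similarity's API; Postgres's bundled fuzzystrmatch module also offers a `levenshtein()` function.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> List[str]:
    """Hypothetical helper: vocabulary words within max_distance, closest first."""
    scored = sorted((levenshtein(word, candidate), candidate) for candidate in vocabulary)
    return [candidate for distance, candidate in scored if distance <= max_distance]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(suggest("elastik", ["elastic", "plastic", "postgres"], max_distance=1))  # ['elastic']
```

Scoring every vocabulary word like this is fine for small lists; a real system would first narrow the candidates (for example with a trigram index) before computing edit distances.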

> large documents, large datasets,

PG has TOAST[4], and it can obviously scale (though not necessarily well); see pg_partman/Timescale/Citus/etc.

> custom scoring

Postgres only has basic ranking features[5], but you can write your own functions and extend it of course.

Solr/ES are definitely the right tools for the job (tm) when the job is search, but you can get surprisingly far with Postgres. I'd argue that many use cases don't actually want or need a perfect full-text search solution; it's often minor features that turn into overkill fests, with ops people learning how to properly manage and scale an ES cluster and falling into pitfalls along the way.

[0]: https://www.postgresql.org/docs/current/textsearch-intro.htm...

[1]: https://www.postgresql.org/docs/current/textsearch-intro.htm...

[2]: https://about.gitlab.com/blog/2016/03/18/fast-search-using-p...

[3]: https://github.com/eulerto/pg_similarity

[4]: https://www.postgresql.org/docs/current/storage-toast.html

[5]: https://www.postgresql.org/docs/9.5/textsearch-controls.html...

Scoring results in Postgres requires scanning all matches, which is slow if you have a lot of results.

Elastic search and other search solutions don’t have this problem.

Searching a structured database just isn't the same as having a full-on indexed search engine. Those are different tools for different usage.

Even though we are currently replacing ES (hosted on elastic.co) with Postgres for a ~100M docs + low QPS use case, it's no real competition to Elasticsearch. There are better™ alternatives for niches (like Algolia), but nothing just works like Elasticsearch at scale, when not everything can fit on a single machine.

a) Elasticsearch should not be used as a primary data store.

b) PostgreSQL does not compare to Elasticsearch when it comes to full text searching capabilities.

c) PostgreSQL has no vendor-supported, built-in solution for horizontal scalability which is a big reason why you would choose Elasticsearch over a more lightweight search system.

It doesn't have a good story for tokenizing Asian languages. And even the way it tokenizes Roman languages is not that great.

However, it does get one up to that 80% mark for text search. But that other 20% is why Elasticsearch, Algolia, etc. exist.

How much of Elastic's usage is ELK logging vs. application search?

Are you aware that Lucene, the technology that powers Elasticsearch, runs on top of SQL?

Oh my. I'm not sure any open source company has ever fallen so low as to break the drivers... I'm also not sure what they are looking to achieve with this step, as developers who want their applications to work with both Elastic and OpenSearch will now need to go right into AWS's hands...

It’s high time to fork the Elasticsearch API.

What differences cause issues?

Good? Amazon deserves no support in branding their version 'open'.

You do realize that Amazon would have been happy selling hosting for the official Elastic code until Elastic forced them into a situation where they had to either pull the rug out from under all of their customers or fork?

Amazon is actually the more open company when it comes to Elastic at this point. Open source does best when it’s not the meal ticket of the company developing it.

It's a bit clunky, but Amazon's fork also comes with security by default. ES being open by default has caused breaches for years. Amazon's is harder to set up, but at least it comes with defaults that make you less likely to leak your data.
