Official Elasticsearch Python library no longer works with open-source forks
source link: https://news.ycombinator.com/item?id=28110610
I'm not sure I can recommend them anymore. The company doesn't seem to understand its customers' journey and is rejecting the methods that brought it success.
Imho, the problem isn’t “well it’s not open source anymore then.” The problem is people who give more weight to a religion/philosophy than sustainability. Who cares if it’s open source if the people developing it can’t support themselves?
Fair code > open source.
Sony forked FreeBSD and used it as a substantial component of their operating system for their PlayStation 5, a product which they will make money off (in the long run, at least). No changes are available to anyone. No one at FreeBSD is jumping up and down about it. Maybe because there's more to a software project than maximising profit.
They or anyone else are also free to relicense their project, there's nothing wrong with it at all and anything under the old license is still what it was before.
It seems kinda strange to demand that someone operate under some license in perpetuity, there is no such rule. Licensing questions don't stop after the inception of a project.
I also personally can’t stand Elastic, because they charge enterprise rates for support, and then offer open source level support. You’ll come across a show-stopping issue, and the support you’ll get is most commonly “why are you even trying to do this?”, or “please refer to this issue that’s been stale since 2016”.
My general approach has been, when I’m capable and have the time to, to fix any issues or obvious feature deficiencies that I come across in open source projects that I use. I still do that, but I now only contribute my changes back to projects that seem more trustworthy.
None of that helps with a situation like Audacity/MuseCore though, if developers are willing to sell out their project copyrights, that isn't something that you can really protect against, except maybe discussing people's opinions on that openly.
The other issue with withholding changes from upstream is the potentially infinite cost of updating your changes as the project evolves, things like git-imerge, mergify or git-mergify-rebase can reduce that burden by letting you do incremental rebases/merges though. Normally I don't contribute to projects with a CLA assigning extra rights to corporations over the license, but I've been considering signing one just to drop the maintenance burden.
https://github.com/mhagger/git-imerge https://github.com/brooksdavis/mergify https://github.com/CTSRD-CHERI/git-mergify-rebase
But the more important criterion is deciding what changes are feasible to support with a fork. The projects that I find to be most suspect typically come with an enterprise support license, and if you’re struggling to get your issues fixed with one of them, then the best long-term solution is usually to abandon the product. Forking for a bug fix would typically be a temporary solution, and hopefully the first merge conflict you run into is the vendor actually fixing the issue; otherwise you can take your time to find a product that offers better value for money.
There’s lots of things I don’t especially like about the large enterprise workplace, but I really detest vendors that fleece them with high price, low quality products/services.
A nice anecdote I heard a long time ago was comparing the approach to custom engineering between SuSE and Red Hat.
SuSE was always very happy to do custom engineering for paying customers and developed and shipped these features in their Linux Distribution. Specifically I am thinking of some interesting features done in the Kernel. At that point, SuSE received money for the engineering work but now had the burden of supporting their fork. The functionality code itself was not a problem, but the interfaces to the remaining kernel were a source of churn and pain.
Red Hat did the opposite. They told their customers that they can have whatever is in the upstream Kernel and they will help them upstream the necessary changes. That took markedly longer but the long term maintenance was much less effort because the in-tree code would be updated whenever an API/interface changed.
Canonical also had a tendency to happily fork whatever project they needed to ship a fancy thing on time (think Netbook UI, think Unity etc.) and then get hit by the long-term maintenance burden once upstream diverged.
Both SuSE and Canonical always found themselves in the unenviable position of having to constantly update their code and potentially seeing a competing but conflicting solution being merged upstream.
Carrying a few bugfixes or feature improvements in a local branch of a library is easy at first. But you're essentially creating technical debt which you'll need to pay off on every upstream change, every security fix etc. Most companies see value in developing features for their main application, not in maintaining internal forks of open source libraries.
Publicise your work as open source and make hay with the work of open source contributors who thought they were contributing to a truly free and open source project only to find out a couple of years later that their contributions are now locked into a project that locked itself up with a changed license... (I know that their contributions may stay with the license it originally was, but if the total product changes license, who is going to continue an unmaintained fork? )
IMO, it is fair criticism when someone starts off with a liberal license and then changes it when it is time to monetise, or someone else figures out how to monetise it better than the creators. Either be okay with it or not, you can't have it both ways.
What would you have them do? Would only their demise appease you?
I disagree. I think it was hard to know that many years ago (10+?) how things would turn out.
Especially not "entirely foreseeable."
There's a cognitive bias called "hindsight bias", namely "the common tendency for people to perceive past events as having been more predictable than they actually were", https://en.m.wikipedia.org/wiki/Hindsight_bias
I agree with you about AGPL.
The GPL exists because of the problem of other people taking your code and hiding it in their products, placing you at a disadvantage.
The AGPL exists to extend that protection regarding SAAS.
According to Wikipedia, the AGPL is from 2007, and Elasticsearch started 4 years later.
I disagree that you need hindsight not to be surprised by how Elasticsearch was used by e.g. Amazon.
The "sharing" bit is on top, and only comes out if you distribute your changes too. AGPL fixes the flaw where "users" don't really get a copy of the software they are using distributed to them in full (eg. only part of it with JS in the browser, but backend stuff is hidden).
So the focus point of GPL is use of the software (thus users), not writing of it (developers). Developers embrace it because they are simultaneously users of the software written by others, and they are best positioned to make the most out of those liberties ("standing on the shoulders of giants").
In case of success, which Elastic has certainly achieved with ElasticSearch, it is fully expected that other companies will jump on the bandwagon and try to profit off of it too! And as a company, you hope for exactly that to play out, just with you as the go-to for earning the most off the product you created. While not foreseeable, it is not unexpected that you might not be the one to profit most from your product, and you should plan to profit enough! Where it gets complicated is that nobody expects to earn orders of magnitude less from a product they created than others relying on it.
What they did not foresee was that one of those companies would be The Cloud Provider, thus minimizing their value proposition since Amazon can throw significant resources at it, possibly even greater than Elastic themselves. One could argue that Amazon abused their monopolistic position in cloud providing to offer a bundled ElasticSearch experience that Elastic could never compete with. Even if they developed an alternative in-house product, it looks exactly the same as Microsoft bundling Internet Explorer with Windows back in the day.
Even today, if you are willing to develop an open source or free software product, if you get successful enough, you are likely to be an exploitation target of a megacorp. Companies always have an option to relicense in-house written code, and any code submitted by signatories of an appropriate contributor license agreement (CLA).
Basically, I agree it wasn't nice of Amazon, and that it wasn't foreseeable, but it ultimately wasn't unexpected either. To me, this is monopolistic behaviour that should be treated as such.
This also raises another interesting question: if AGPL is an appropriate solution, why did Elastic not relicense under it today? (I am sure they answered this very question when they published their original license, so it's mostly rhetorical.)
And this is where the problem basically is. The core philosophy of open source licensing is that the creators who use it don't particularly care beyond compliance with the terms of the licenses (eg. Source distribution for copy left licenses)
If it turns out you *do* in fact care, then Open Source licenses are not a good fit for you. If you still go ahead and release under Open Source and then change later, don't be surprised to be called out for bait-and-switch manipulation.
The long-term commercial advantage in Free Software being for the largest established players with revenue mechanisms centered around services rather than selling software licenses was widely recognized, with many observers independently coming to the conclusion, from at least the mid-1990s.
That is because xBSD have more than enough people to develop on it - hobbyists and companies alike.
For smaller software with less of an established ecosystem, the supply of people willing to work on it for free is... not exactly much.
The homebrew maintainer couldn’t even get hired at Google [2] if I recall, even though they’re big fans of using the tooling internally!
[1] https://arstechnica.com/information-technology/2014/04/tech-... (OpenSSL Software Foundation President Steve Marquess wrote in a blog post last week that OpenSSL typically receives about $2,000 in donations a year and has just one employee who works full time on the open source code.)
[2] https://twitter.com/mxcl/status/608682016205344768 (Google: 90% of our engineers use the software you wrote (Homebrew), but you can’t invert a binary tree on a whiteboard so fuck off.)
Nobody is owed a sustainable ecosystem to profit off of open source. When things align to be mutually beneficial, that’s great. But by the nature of it, if you want to make money off of software, you should not release it as open source. It’s probably going to eventually wind up being a conflict of interest, wherein the “open” parts of a project eventually become less and less relevant in favor of closed parts.
Open source doesn’t and shouldn’t guarantee a sustainable business model. The best you can hope for is that parties collaborate because they can benefit mutually from this collaboration, like with the Linux kernel.
Your problem transcends one whiner on Twitter; private entities can monopolize public agency across contexts and not be held accountable at all.
Here you are in a VC echo chamber; good luck
Of course, once you add on that they've basically done to Lucene what AWS did to Elasticsearch, then the whole thing is a joke. I also just checked and Elastic are NOT sponsors of the ASF: https://www.apache.org/foundation/thanks.
Absolute garbage company in terms of open source and community. I will never willingly install their software ever again.
1) Elastic never forked Lucene.
2) Elasticsearch is not a competitor to Lucene.
3) Elastic developers are PMC members and committers for the Lucene project.
The person you responded to didn't even assert that open source software is morally superior to alternatives, but instead said that by moving away from open source, Elastic is failing to understand its customers. His argument doesn't require any judgement about the morality of the relicense.
2) Open source doesn't attract new customers. There's nothing special about having software where anyone can contribute freely, that makes it attractive to customers. There is however, something special in making it free to download, setup and use in your own servers that makes it attractive. Which is still the case.
That may be true for some customers, but certainly not all. Being able to fix a bug or add a feature yourself instead of having to wait for a vendor to do it for you can be very attractive. Especially if you need a feature that would never be accepted upstream, and you have the ability to use your own fork/patchset. That is something my company has done many times, and is a factor when choosing software.
Community, support, acquisition, extendability and code quality.
Community/Support -- lots of documentation available for free, generally multiple vendors that offer paid support, free access to updates. Don't have any data but I suspect finding people to support/run it is easier (skills are more marketable and easier to learn on your own)
Acquisition -- eg less of a vetting/auditing process for places where that matters (you can handle vetting the code and supporting company separately)
Extendability -- you can hire developers to add features or integrate without being subject to the owner's timeline e.g. you don't have to plead for features
Code quality -- being open, it's more apparent if the software has a high number of bugs or large amount of technical debt that could lead to instability. Usually the issue trackers for OSS are open to everyone
These apply more to large companies with dedicated teams
It’s their fault for not differentiating their offering enough.
When I looked at logz.io, I spent hours trying to get it to work and ultimately gave up, irritated that my seemingly plain vanilla use case (send ubuntu journald logs to ES) wasn’t as straightforward as I would’ve liked. (I’m aware they aren’t affiliated with elastic, just saying there might have been an opportunity for them here)
If elastic had a way for me to blindly copy paste things into an ubuntu server that allowed me to see all my systemd logs, I would’ve happily paid them. Instead it was an endless maze of having to figure out beats vs logstash, finding a journalbeat whose documentation says it’s beta, etc.
Yet ultimately the only thing that worked was AWS ES service hooked up to vector.dev’s agent. And boy does it work amazingly well. And it’s not like Amazon is doing anything special.
It’s on them for not having differentiated and made things drop dead simple. Now they seem to want to play dirty like Oracle.
When AWS released ES you couldn't even dynamically scale cluster nodes and you had to use Amazon's special library to sign requests for IAM
There are other companies that run curated & managed SaaS on providers like AWS, GCP, etc that offer better services than the native ones. For instance, MariaDB SkySQL offers DBAs with their product and will help tune the DB for the workload vs AWS where they offer limited app-level support outside suggesting things like Performance Insights. When I looked, SkySQL was also a bit cheaper than AWS RDS for comparable hardware
When customers see Elastic circling the wagons to remove some of the differentiators of their open source software, they're going to have to reconsider their options.
Elastic didn't, and AWS stepped in to give customers what they wanted. And in case anyone wants to argue that it's impossible to compete with AWS, please go ask Snowflake how they built a better product on top of AWS itself and became one of the best SaaS success stories so far.
I don't care about VCs' money, to be honest. I design IT architectures for big banks, and apparent excessive greed from software companies tends to stress me.
It doesn't.
Well, it's more like they made a sandwich, Amazon stole it, and now Amazon is selling it to other people. Meanwhile Elastic.co is starving.
It's not a fair game anymore.
I hope the DOJ breaks them up into three or more companies.
The antitrust argument seems orthogonal: lots of small companies make money hosting FLOSS software (Bitnami[0] comes to mind, but so do other hosting companies like Dreamhost). I just searched "elasticsearch hosting" on DDG[1], and Elastic is the 5th result, behind four companies I've never heard of. In many ways, that's exactly what makes FLOSS so attractive to me: if one provider isn't suitable, I can switch. I was very grateful for this in 2013 when I had a huge issue with a change in how LShift was hosting RabbitMQ, and I was able to move my company's cluster over to CloudAMQP instead. I had a similar issue with Elastic in 2016: they didn't offer compute-heavy nodes, but our application was compute heavy (geohashing-intensive for the majority of queries). I discussed with the Elastic sales team, and they said they had no timeline for this, so we migrated the company to AWS ElasticSearch.
So I guess I'm not sure how things are improved if Amazon is broken up...even if AWS had to stand on its own, this seems like it would remain a proven business strategy that customers value and is used across the industry.
[0]: https://bitnami.com/stack/elasticsearch [1]: https://duckduckgo.com/?q=elasticsearch+hosting
> Elastic (NYSE: ESTC) ("Elastic"), the company behind Elasticsearch and the Elastic Stack, announced strong results for its fourth quarter and full fiscal year (ended April 30, 2021). Total revenue was $177.6 million, an increase of 44% year-over-year, or 39% on a constant currency basis.
It's possible to take in a ton of money and still lose money.
Edit - just looked it up, they have free cash flow for the year and a $400 million cash position. Tons of deferred revenue too.
[1] My Conversation with Jeff Bezos (2000) - https://web.archive.org/web/20040215160333/http://www.oreill...
Secondly, being released under a permissive open source license definitely helped with its adoption. I was working as a senior developer in a UK Government department in 2013 and we had need for a full-text search engine for a project - and even before v1, ES was a contender, and it was eventually selected once v1 was released. This was largely due to a) ease of set-up/use and b) its release under an open source license. If it had been under AGPL, would we have still used it? Yes, probably - our specific use case wouldn’t have been affected by such a license, and the dept. was relatively open to more complex OSS licenses - but I have worked for several other orgs. where even just the AGPL would have resulted in a hard “no”.
Thirdly, ES has had contributions from a wide range of people. I honestly don’t know how I’d even begin to evaluate how much value ES has got from community contributions, but I feel it’s likely to be greater than the costs of managing those contributions.
But of course eventually Elastic got funding and had shareholders to placate. I don’t really have much sympathy for them about this conflict - their early choices were in part clearly made to maximise their value, and they decided to cash in on that value at a later date - the fact that those decisions had implications for their value should have been somewhat obvious to any investor who did any due diligence. I’m still not entirely convinced that the open source model is antithetical to commercialisation, but I think it does highlight how early decisions around OSS licensing can affect such processes.
Does the AGPL really place such burdens on the organization such that the benefits of a locked-open, community-guaranteed (albeit popularity not guaranteed) technology aren't worthwhile?
Or is it kind of a cargo-culting and cultural-norms phenomenon where people don't use AGPL projects because they've heard that other people don't use them, thus continuing the cycle?
If a company risks installing it or supporting it company-wide, then you risk a lawsuit and having your business shut down. The workaround is that IT must manually install the driver (a Windows DLL file) on every user's computer.
To me if I see AGPL it makes me think that it's likely a predatory business with good lawyers; and for this reason I would stay away from it even on personal projects unless there is no alternative.
As I'll mention in a sibling thread, I think that lawyers can be extremely risk averse in software licensing (it's their professional incentive structure) and that is my guess where this cultural meme about AGPL comes from.
I frankly doubt there’s any sort of cost-benefit analysis being done here. Certainly in my experience it was much more driven by legal uncertainty and risk-aversion.
Statisticians and scientists sometimes talk about 'type 1' errors and 'type 2' errors - false positives and false negatives. I can rarely remember which is which, but I think that generally, software license/contract legal professionals never want to advise a client about something that later turns out to be a liability.
That's fine, because it protects their firm's reputation, and it maintains the client's trust (which needs to be strong). But I think that in the context of software licenses, this has led to an overly strong aversion (and indeed self-replicating idea) about the AGPL and other copyleft licenses.
(in the context of cost-benefit, it's hard to justify the upside from using and helping contribute towards a software commons, but I think it can be significant, perhaps depending on project context and popularity)
And perhaps it'd be a good idea to override that method anyway, unless you're fine with your code doing some unnecessary handshake every first time you use a client.
Facebook made a similar unilateral decision in relicensing React from Apache 2 to "BSD + Patents" many years ago. They faced quite a bit of pressure, resulting in another relicensing to MIT.
At that level we're very much aware of the current situation. On a personal level the moment Elastic announced their changes I dropped all support for them moving forward.
They got outdone on their own product. Elastic Cloud had little value-add, and once managed ES came along it became pointless to use their cloud.
...that is the bigger name and perceived as lower risk, better enterprise support, and fits with less internal work in their broader strategy even if that means sacrificing feature speed.
I mean, at least, most of the established market, rather than startups burning through VC dollars. But even the latter is likely to do one stop shopping with a big cloud vendor like Amazon unless particular features of an alternative are key to their business.
What they're saying is that this is a breach of community norms.
I have revisited Loki after all this news, though, but I think it's still missing full text search.
That's very much by design: by not indexing the log content, it's much leaner and more efficient. It's aimed at a completely different use case where you already know the set of log streams you want to monitor.
https://news.ycombinator.com/item?id=28103389
What surprised me was how many commenters seemed unhappy with Amazon's Open Source alternative and the level of resources they appeared to be committing to it.
also, that "you know, for search" tagline is featured on t-shirts sold by Elastic [0]. seems entirely plausible to me that Amazon & Elastic lawyers could get into a pissing contest about potential trademark or copyright protection of that term and whether sending it as an HTTP header constitutes infringement.
0: https://elastic.shop/products/you-know-for-search-t-shirt-st...
https://www.zdnet.com/article/apple-warns-off-os-pirates-wit...
> If call to '/' fails with 401 or 403 pass the check and show a warning (message will be linked later). This happens if the monitor permission missing for user. The subsequent checks must be ignored.
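For anyone trying to picture what a check like that amounts to, here is a minimal sketch of the rule quoted above. It is not the client library's actual code: the function name `product_check`, its parameters, and the idea of comparing the `tagline` field of the root response are assumptions made for illustration, based only on the quoted rule and the tagline mentioned elsewhere in this thread.

```python
import warnings

# Assumed value of the "tagline" field in the root ("/") response body.
EXPECTED_TAGLINE = "You Know, for Search"


def product_check(status_code, body):
    """Hypothetical sketch of a client-side product check.

    status_code: HTTP status of a GET to "/".
    body: parsed JSON body as a dict, or None if there was none.
    Returns True if the client should proceed, False otherwise.
    """
    # Per the quoted rule: a 401 or 403 passes the check with a warning,
    # since the user may simply lack the permission needed to call "/".
    if status_code in (401, 403):
        warnings.warn(
            "Could not verify the server product: not authorized to call '/'."
        )
        return True

    # Any other non-success status cannot be verified in this sketch.
    if status_code != 200 or not isinstance(body, dict):
        return False

    # Assumption: the check compares the tagline in the root response.
    return body.get("tagline") == EXPECTED_TAGLINE
```

A real client would run something like this once per connection and cache the result, which is the "unnecessary handshake" cost raised earlier in the thread.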
Sure, there are about 5-6 open source search engines one could pick from in a greenfield project, with amazing differences of features and maturity, but choosing between Open Search and ElasticSearch is likely a "political" decision, since they are (ahem, mostly) API compatible and have very similar operational needs.
The search is what it does but most of its value is centered in the management/scaling/monitoring of full text search over many machines.
I love Postgres but its "clustering" story is definitely not as user friendly.
There are plenty of peta-scale ES clusters in the wild
> I love Postgres but its "clustering" story is definitely not as user friendly.
Then have a look at YugabyteDB.
Instinctively I believe what you're saying, just wondering if you know for sure.
> Postgres search is essentially an easier to use regex engine.
I'm not sure exactly what you meant to convey here, but if you're searching with LIKE or `~` you're not doing Postgres's proper Full Text Search. You should be dealing with tsvectors[0]
> As soon as you need multiple languages
Postgres FTS supports multiple languages and you can create your own configurations[1]
> advanced autocomplete
I'm not sure what "advanced" autocomplete is but you can get pretty fast trigram searches going[2] (back to LIKE/ILIKE here but obviously this is an isolated usecase). In the end I'd expect auto complete results to actually not hit your DB most of the time (maybe I'm naive but that feels like a caching > cache invalidation > cache pushdown problem to me)
> misspelling detection
pg_similarity_extension[3] might be of some help here, but it may require some wrangling.
> large documents, large datasets,
PG has TOAST[4], and obviously can scale (maybe not necessarily great at it) -- see pg_partman/Timescale/Citus/etc.
> custom scoring
Postgres only has basic ranking features[5], but you can write your own functions and extend it of course.
Solr/ES are definitely the right tools for the job (tm) when the job is search, but you can get surprisingly far with Postgres. I'd argue that many usecases actually don't want/need a perfect full text search solution -- it's often minor features that turn into overkill fests and ops people learning/figuring out how to properly manage and scale an ES cluster and falling into pitfalls along the way.
[0]: https://www.postgresql.org/docs/current/textsearch-intro.htm...
[1]: https://www.postgresql.org/docs/current/textsearch-intro.htm...
[2]: https://about.gitlab.com/blog/2016/03/18/fast-search-using-p...
[3]: https://github.com/eulerto/pg_similarity
[4]: https://www.postgresql.org/docs/current/storage-toast.html
[5]: https://www.postgresql.org/docs/9.5/textsearch-controls.html...
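To make the trigram point above a bit more concrete, here is a rough pure-Python sketch of the idea behind trigram similarity. It is an approximation for illustration, not Postgres code: the function names are made up, and the padding and word-splitting rules are a simplified take on how the pg_trgm documentation describes its trigram extraction.

```python
import re


def trigrams(text):
    """Return the set of 3-character grams for each word in text.

    Assumption: each lowercased alphanumeric word is padded with two
    spaces in front and one behind before being sliced into trigrams.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("elasticsearch", "elasticsaerch"))
```

A transposed pair of letters only disturbs a few trigrams, so a misspelled query still scores fairly close to the intended term, which is why this approach gets used for fuzzy matching and autocomplete.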
Elastic search and other search solutions don’t have this problem.
b) PostgreSQL does not compare to Elasticsearch when it comes to full text searching capabilities.
c) PostgreSQL has no vendor-supported, built-in solution for horizontal scalability which is a big reason why you would choose Elasticsearch over a more lightweight search system.
However, it does get one up to that 80% mark for text search. But that other 20% is why Elasticsearch, Algolia, etc. exist.
Amazon is actually the more open company when it comes to Elastic at this point. Open source does best when it’s not the meal ticket of the company developing it.