Ask HN: Have you used SQLite as a primary database?
386 points by barryhennessy 9 hours ago | 264 comments

I periodically hear about projects that use/have used SQLite as their sole datastore. The theory seems to be that you can test out an idea with fewer dependencies (and cost) and that it scales surprisingly far.
There are even distributed versions being built for reliability in the cloud: dqlite by canonical (of Ubuntu fame) and rqlite
Given the complexity it seems like there are use cases or needs here that I'm not seeing and I'd be very interested to know more from those who've tried.
Have you tried this? Did it go well? Or blow up? Were there big surprises along the way?
- https://sqlite.org - https://dqlite.io - https://github.com/rqlite/rqlite
I use SQLite in production for my SaaS[1]. It's really great — saves me money, required basically no setup/configuration/management, and has had no scaling issues whatsoever with a few million hits a month. SQLite is really blazing fast for typical SaaS workloads. And will be easy to scale by vertically scaling the vm it's hosted on.
Litestream was the final piece of the missing puzzle that helped me use it in production — continuous backups for SQLite like other database servers have: https://litestream.io/ With Litestream, I pay literally $0 to back up customer data and have confidence nothing will be lost. And it took like 5 minutes to set up.
I'm so on-board the SQLite train you guys.
[1] https://extensionpay.com — Lets developers take payments in their browser extensions.
I thought people complained that MySQL sucks and PostgreSQL rocks because it gets correctness right, and that SQLite was nowhere near as correct or performant. (Things seem to be getting better with strict column types these days.)
I've recently migrated a smallish service from MySQL to PostgreSQL and found it's quite a lot of work if you weren't careful to write to the SQL standard. Which means that if the service had gotten bigger, your chance of moving away from SQLite would have slipped away.
So, why not use a safer choice to begin with? Nothing is complicated about running MySQL/PostgreSQL, unless you've sold yourself to AWS and care about the cost, and don't know how to run a DB instance yourself.
SQLite is fine when all your load can be served by a single backend process on a single machine. The moment you need multiple backends to handle more load, or the moment you need high availability, you can't do it with SQLite. SQLite has very limited DDL operations, so you also can't evolve your schema over time without downtime. Now, for streaming backups - how do you come back from node failure? You're going to incur downtime downloading your DB.
I run many SQL backed production services, and my sweet spot has become a big honking Postgres instance in RDS in AWS, with WAL streaming based replicas for instant failover in case of outage, and read replicas, also via WAL. This system has been up for four years at this point, with no downtime.
I love SQLite, and use it regularly, just not for production web services.
I know several people who build projects like that. It takes them months to get a working product, only to discover it doesn't interest people or doesn't work like they expected. If for every piece of tooling you go for the "safe" and most performant option, you gain bloat and complexity real quick.
People underestimate the performance of "simple" tech. In 99% of projects, by the time your bottleneck is your DB system, I can assure you it'll be the least of your concerns.
- Setting up an out-of-repo config file on the server with your MySQL credentials
- Setting up a backup script for your server data
It's only about an hour of work total, but it's an hour of work that I hate doing.
- You make sure the production version is greater than or equal to the development one, or you make sure not to use new features before they reach production, which is quite easy. There is no problem with different OSes (except for Windows itself not being very reliable, but I imagine you are not using Windows in production, as it's another one of those labor-generating techs).
- Trusting a local user is the same level of security you get with SQLite, no credentials required.
- And setting a backup script... Wait, you don't do that for SQLite? There's something missing here.
Yes, there are a lot of small tasks that add up when setting some new software. It's a pain. But it's a pain you suffer once, and it's over. It's worth optimizing, but not at any ongoing cost.
I'm not a professional DB engineer, but one point is that there doesn't seem to be a way to create stored functions in SQLite, which means creating triggers on various tables can cause an excessive amount of duplicate code.
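Worth noting: while SQLite has no SQL-level stored functions saved in the database, most drivers can register application-defined functions on a connection, and triggers can call them. A minimal sketch in Python (the `slugify` function and `posts` table are hypothetical examples) — the catch being the function must be re-registered on every connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Register a per-connection function; SQLite has no stored SQL
# functions, so this registration must be repeated on every connection.
def slugify(s):
    return s.strip().lower().replace(" ", "-")

conn.create_function("slugify", 1, slugify)

conn.execute("CREATE TABLE posts (title TEXT, slug TEXT)")

# A trigger can call the registered function, avoiding duplicated
# logic across INSERT statements -- but only while writes go through
# a connection that registered it.
conn.execute("""
    CREATE TRIGGER posts_slug AFTER INSERT ON posts
    BEGIN
        UPDATE posts SET slug = slugify(NEW.title) WHERE rowid = NEW.rowid;
    END
""")
conn.execute("INSERT INTO posts (title) VALUES ('Hello World')")
row = conn.execute("SELECT slug FROM posts").fetchone()
print(row[0])  # hello-world
```

So the duplicate-code complaint really applies to logic that must live entirely inside the database file, shared across arbitrary clients.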
If I rely on PostgreSQL, I feel covered for my web-app use cases, but once you hit some little gotchas in SQLite, you may regret saving 10 minutes (installing a DB and setting up a password) for nothing.
When we're talking about risks, think security exploits: how is sqlite3 more likely to get your data leaked, or flat-out copied in its entirety, compared to using mysql/postgres/etc.?
With a separate DB you may have a hope of detecting when someone hacked your app. But without that firewall, the question becomes: how much of the data in my SQLite can now be trusted? If you don't know what backup is safe to restore, then you can't trust any of it.
Again, this is about layers. Not saying MySQL/Postgres will save you. But they can increase the odds.
1. You're fucked. The end. It doesn't matter whether you were using mysql, postgres, or sqlite3, or S3, or Redis, or any other server your app was connecting to: they can just look at your environment vars.
That's not going to happen "because you're using Sqlite3", that's going to happen because you used some obscure server software, or worse, rolled your own.
People really do seem to put too much faith into "it has a username and password, it's more secure". It's not: if someone has access to your actual server, they have access to everything your server has access to. Sqlite3 is no more or less secure than a dbms daemon (or remote) in that sense.
- needs to be provisioned and configured
- needs additional tooling and operational overhead
- comes with a _large_ performance overhead that is only won back if you have quite a significant load - especially writes, which means the vast majority of web projects are slower and require more resources than they should.
- it makes the whole system more complex by definition
It is a cost-benefit thing that tilts towards RDBMS as soon as you need to sustain very high transactional loads and want a managed, individually accessible server that you can query and interact with while it's running in production.
But if it is just "a website that needs durability" then you haven't yet shown how that tradeoff is worth it.
I’ve used SQLite in production once and it worked great. But that was a very simple app. For more complex (but not always higher traffic) I’m leaning more and more on postgresql and less on my middleware, like moving business logic to the database when it makes sense.
My understanding is that for read performance SQLite is pretty damn good, outperforming MySQL and Postgres in both single and concurrent tests. The key performance issue is the single global write lock. If your data access pattern is massively read biased then SQLite is a good choice performance wise, if you see a lot of write activity then it really isn't.
With regard to correctness, it offers proper ACID transactions and so on. Typing is a big issue for some, but far from all. It is significantly more correct than MySQL used to be back before InnoDB became the default table type in ~2010, and at least as correct as MySQL is now (aside from the data types debate, which matters depending on which side of it you sit on).
Take a look at the Consider SQLite post I linked. They address your performance questions too.
For me, SQLite was a nice way to simplify launching and running my SaaS business and has had no downsides.
As for scaling if I need it I can increase disk space for the app server or scale out horizontally/vertically. Don't need that yet so I'm waiting for more details in the future to decide how to handle that.
That’s what I haven’t seen mentioned anywhere: if the database is part of the application, how do you switch from one version to another without downtime?
So no matter how optimized MySQL and PostgreSQL are, SQLite will run rings around them for basic SELECT queries.
While "people" complaining is basically meaningless, I don't know why they'd be doing that about SQLite. It's used in most phones, all copies of Windows 10+, and countless other places.
This is not a guarantee Litestream makes (nor can it, since replication is async).
You'll lose things to catastrophic failures, but chances are you'd be able to restore to a last known good checkpoint.
I say "inspiring" because using SQLite reminds me of the simplicity and productivity of coding for the "early web" that we lost 10-15 years ago. The days when you could spin up a website without worrying about a bunch of ancillary services and focus on the app itself.
For me, SQLite's lack of online schema changes seems like perhaps the biggest blocker to actual production. I've never had a production project where the schema didn't change a lot.
> SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
If your application needs to support hundreds of concurrent writes a second you shouldn't use SQLite. But that's a pretty high bar!
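In practice, "writers queue up" means configuring SQLite so queued writers wait instead of erroring out immediately. A common setup (sketched below; the database path and `hits` table are hypothetical) is WAL mode plus a busy timeout:

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "app.db")  # hypothetical path

def connect():
    conn = sqlite3.connect(path, timeout=30)  # wait up to 30s for the write lock
    conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL") # common pairing with WAL
    return conn

conn = connect()
conn.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT)")
conn.commit()

# Concurrent writers simply queue on the single write lock.
def record(p):
    c = connect()
    with c:  # commits on success
        c.execute("INSERT INTO hits (path) VALUES (?)", (p,))
    c.close()

threads = [threading.Thread(target=record, args=(f"/page/{i}",)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(conn.execute("SELECT count(*) FROM hits").fetchone()[0])  # 10
```

Without the timeout, a second writer hitting the lock gets an immediate "database is locked" error, which is where a lot of the bad reputation for concurrent writes comes from.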
"A few million hits a month" is not a very useful performance metric. What is your peak hits per second?
There are single node/CPU solutions that can process 10-100 million business events per second. I am almost certain that no one logged into HN right now has a realistic business case on their plate that would ever come close to exceeding that capacity.
E.g.: https://lmax-exchange.github.io/disruptor/disruptor.html
This stuff isn't that hard to do either. It's just different.
"This effort is incomparable with litestream: Verneuil is meant for asynchronous read replication, with streaming backups as a nice side effect. The replication approach is thus completely different. In particular, while litestream only works with SQLite databases in WAL mode, Verneuil only supports rollback journaling"
The only thing I have against using SQLite in production (for my needs) is the lack of at rest encryption and row level permissions by user.
You don't do row level permissions on your database. You keep it all on the application layer.
When I first came and saw it, it...did not sound right. But I didn't want to be the guy who comes in and says "you are doing it wrong" month 1. So I went along with it.
Of course, eventually problems started to pop up. I distinctly remember that the ingestion (happening via a lot of Kafka consumers) throughput was high enough that SQLite started to crumble and even saw WAL overruns, data loss etc. Fortunately, it wasn't "real" production yet.
I suggested we move to Postgres and was eventually able to convince everyone from engineers to leadership. We moved to a custom sharded Postgres (9.6 at the time). This was in 2016. I spoke to people at the place last month, and it's still humming along nicely.
This isn't to illustrate anything bad about SQLite, to be clear! I like it for what it does. Just to show at least 1 use case where it was a bad fit.
SQLite was a tempting first answer, but what solved it was Postgres, and we eventually offloaded a lot of aggregation tables to Clickhouse and turned the whole thing into a warehouse where the events got logged.
I guess the takeaway here is that SQLite isn't for the "large number of writers" scenario.
p.s. > I didn't want to be the guy who comes in and says "you are doing it wrong" month 1 Very wise
Do you have any more information about the situation?
Could it have been in the hand-spun partitioning logic instead of SQLite?
What was the ingestion throughput, roughly?
It's been more than 5 years but from what I remember, it definitely was _not_ the partitioning logic (the sharding just meant we had a huge amount of files that were hard to organise). But a single consumer doing heavy writes on a single SQLite file would see enough traffic that pretty soon you would start to see errors and your writes would start to break.
Sadly, we had to move to Postgres and eat all that scaling complexity. :(
I believe the high level approach he's taking is essentially: 1. Concurrently execute the multiple write transactions in parallel. 2. Sequentially write the changed pages to the WAL.[3] 3. If a previous transaction causes the next to compute differently (a conflict), rerun that next transaction and then write.
The way to detect conflicts is essentially:
1. Keep track of all the b-tree pages accessed before running the transaction. 2. Check the WAL for whether any previous transaction modified one of those b-tree pages. If so, we have to rerun our transaction.
I've seen it done in software transactional memory (STM) systems as well. It's really beautifully simple, but I think there are a lot of devils in the details.
[1] https://github.com/sqlite/sqlite/blob/9077e4652fd0691f45463e...
[2] https://github.com/sqlite/sqlite/compare/begin-concurrent
[3] * Write to the WAL, so that parallel transactions see a static snapshot of the world.
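The scheme described above can be sketched abstractly (this is a toy illustration of optimistic concurrency, not SQLite's actual implementation): track the read set, compare it against what committed after our snapshot, and rerun on conflict.

```python
# Toy sketch of the optimistic scheme described above: "pages" are
# plain dict keys, and the "WAL" is a list of key-sets, one per
# committed transaction.
store = {"a": 1, "b": 2}
wal = []  # each entry: the set of keys written by one committed txn

def run_transaction(fn):
    while True:
        snapshot_len = len(wal)          # where our snapshot was taken
        read_set, writes = set(), {}

        def read(k):
            read_set.add(k)
            return store[k]

        fn(read, writes)                 # txn computes its writes from reads

        # Conflict check: did any transaction that committed after our
        # snapshot write a key we read? If so, rerun from scratch.
        conflicting = any(read_set & keys for keys in wal[snapshot_len:])
        if not conflicting:
            store.update(writes)         # "append the changed pages to the WAL"
            wal.append(set(writes))
            return

def increment_a(read, writes):
    writes["a"] = read("a") + 1

run_transaction(increment_a)
run_transaction(increment_a)
print(store["a"])  # 3
```

The example runs sequentially so the conflict branch never fires here; with truly parallel transactions, the rerun loop is what preserves serializability.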
When not to use sqlite:
- Is the data separated from the application by a network?
- Many concurrent writers?
- Data size > 280 TB
For device-local storage with low writer concurrency and less than a terabyte of content, SQLite is almost always better.
[1]: https://pouchdb.com/
Isn't MySQL's MyISAM faster, and thus a better choice for a scientific number-crunching application? I mean a ~4GB DB, very simple schema, heavy read load, little/no inserts, and no updates.
Since this is a no-update and no-live-insert scenario we're talking about, it's fairly easy to produce code that is an order of magnitude faster than a DBMS: not only are DBMSes primarily optimized for efficiently reading off disk (an in-memory hash table beats a B-tree every day of the week), they've also got really unfortunate CPU cache characteristics, and they additionally need to acquire read locks.
If you're just using the database for object persistence, which is common, it doesn't matter all too much. But that's not really the scenario we're discussing here, since the data is by the original problem statement, immutable.
Two features I enjoy in ActiveRecord and other ORMs, and why I would consider them a good standard practice for most things that aren't "toy" projects.
1. Easy chainability. In ActiveRecord you can have scopes like `User.older_than_65` and `User.lives_in_utah` and easily chain them: `User.older_than_65.lives_in_utah`, which is occasionally very useful and way more sane than dynamically building up SQL queries "by hand."
2. Standardization. Maintenance and ongoing development (usually the biggest part of the software lifecycle) tend to get absolutely insane when you have N different coders doing things N different ways. I don't love everything ActiveRecord does, but it's generally quite sane and you can drop a new coder into a standard Rails project and they can understand it quickly. On a large team/codebase that can equate to many thousands or even millions of dollars worth of productivity.
I far prefer to write a concise, clear query instead of trying to remember some ORM syntax
100% agree. ActiveRecord strikes a good balance here IMO. An explicit goal of ActiveRecord is to make it painless to use "raw" SQL when desired.
On non-toy projects, I think a policy of "use the ORM by default, and use raw SQL for the other N% of the time when it makes sense" is very very sane.
Faster, but at what price?
I think the performance MySQL has over sqlite comes from its multithreading more than the storage engine.
In my experience sqlite is just as fast as MyISAM for single threaded work.
That said, sqlite used 'badly' can be quite frustrating. Home Assistant, for example, is usually set up on an sd card in a raspi and then runs an sqlite database on it that it dumps massive amounts of very redundant data into as json blobs. Pretty common to have it just randomly lock up because the sd card has trouble with that frequency of writes.
The workload was simple (single-node work tracking) and I didn't expect it to become a bottleneck. Unfortunately, there were some default settings in the storage backend (tiny page size or WAL or something) that caused severe thrashing, and a dearth of tooling to track down the issue. After making a custom build with custom instrumentation and figuring out the problem, I found an email thread where the sqlite community was arguing about this exact issue and the default settings in question. A couple of people had foreseen the exact problem I had run into and suggested a fix. Their concerns were dismissed on the grounds that the problem could be configured away, and their concerns about discoverability of configuration were ignored completely. I wasn't thrilled with the crummy defaults, but seeing that the consequences had been foreseen, considered, and dismissed despite what seemed like widespread consensus on the fix being simple... it really damaged my trust. How many more landmines did SQLite have?
Lack of perf tooling + bad defaults = recipe for pain.
If you have that thread would be great to see it as well.
The problem itself didn't concern me nearly as much as "no perf tools + no perf foolproofing." That's a rough combo. If a problem this simple required this much debugging, extrapolations to problems of any complexity are terrifying. I knew that simplicity implied limitations, but this lesson taught me that simplicity could also imply danger.
I've commented elsewhere here for docs referencing this problem. It's FAQ#19 on the SQLite website.
Were your inserts based on HTTP requests, or was it more of a batch process? Were they grouped in txns? Obviously user HTTP requests would be harder to group up, but kudos if your service is handling 100 QPS of writes, as that's a pretty rare level of scale, approaching the real "need a beefy concurrent db server" type of problem statement.
SQLite may shine in edge cases where you know you can outperform a regular database server and you know why, and you could build everything either way. SQLite could be a way to e.g. decentralize state, using local instances to do local storage and compute before shipping off or coordinating elsewhere.
Otherwise, SQLite can simply be a recipe for lots of lock errors on concurrent operations. I've also never been very impressed with its performance as a general purpose replacement for postgres or MySQL.
Yes, I learned this the hard way. I understood that simplicity meant limitations, but I did not understand that simplicity meant danger until SQLite burned me.
If your perf tanks, you don't want to have to spend days putting timers all around someone else's codebase. Caveat: SQLite may be better these days -- my incident happened in 2013 -- but I spent more time tracking that one SQLite issue than I have spent spinning up postgres instances since then.
As he notes https://www.mozilla.org/ uses this pattern:
> They started using SQLite back in 2018 in a system they call Bedrock ... Their site content lives in a ~22MB SQLite database file, which is built and uploaded to S3 and then downloaded on a regular basis to each of their application servers.
I'm particularly interested in the "Sessions" extension (https://www.sqlite.org/sessionintro.html) and would love to hear if anyone has successfully used it for an eventually consistent architecture built on top of SQLite?
Unfortunately it was far too advanced for the org and no one else understood it so it was canned in favour of a connected solution under the guise of ubiquitous internet access being available. This is proving to be a poor technical decision so my solution may have some legs yet.
It works great - there are ergonomic APIs in most languages, it’s fast and reliable, and great to be able to drop into an SQL shell occasionally to work out what’s going on. A custom binary format might be slightly more optimal in some ways but using sqlite saves so much work and means a solid base we can trust.
1. PHP web development for the client of a client. They needed persistent data and MySQL was not available. Moving to a different webhost was straight up rejected. Used sqlite with Idiorm and it worked just fine.
2. As the local datastore for a cross platform mobile application. The sqlite DB was unique on each device. Libraries were available and worked well.
3. This is a large one. Several 10's of thousands of installs that query the filesystem, but filesystem access is throttled by the vendor. We're using sqlite to store the state of the filesystem as it doesn't really change that much. If the db is damaged or whatever, it can be wiped as it isn't the final source of truth.
I switched from Postgres to SQLite for a couple of versions, but mainly because Postgres wasn't "supported"; I called SQLite an "internal database thing".
Worked flawlessly for about 7-8 years before both services were gobbled up into micro API services.
At the last count, we have about 14,000 services checked by uptime (about 1,000 every 5 minutes, 2,000 every 10 minutes, the rest every 15). Probably had about 60,000 tinyurls in MyTurl. We also ran the MyTurl urls through uptime every night to look for bad links. The system got hammered, often.
It took minor tweaking to get the best performance out of the database, and AOLserver has some nice caching features, which helped to take the load off the database a bit. But overall, it worked as well as the Postgres counterpart.
And now, I have to figure out why I never released the SQLite version of both.
How do you handle this? Do you store the SQLite file somewhere like s3 or just in memory?
How does this work for such high traffic sites?
When embedding natively, like in a Rust app, the performance is better than any other RDBMS because there's no network/serialization overhead, and you can use pointers in-process if needed.
The DevOps story also is a dream: typically it is just a single file (optionally + some more for journaling) and setup is automated away (most language libs bundle it already), plus it is widely known since smartphone SDKs and all webbrowsers include/expose it.
A subtle advantage: the supported SQL subset is so small that "if it works in sqlite, it will also work with $RDBMS" holds in most cases, but not the other way around. I always use it when getting started when in need of relational data, and have only had to swap it out for postgres once, and not due to technical/scaling reasons (IT policy change & stuff).
Having said that, it is mind-boggling what kind of load you can handle with a small VPS running a Rust microservice that embeds its own SQLite natively... matching that would take an expensive cluster of your typical rails/django servers, and it would still have worse performance.
We've been using this stuff in production for over half a decade now. Multi-user, heavily-concurrent systems too. The biggest cost savings so far has been the lack of having to screw with a separate database server per customer install (we do B2B software).
Yeah, you probably can do everything with the "simpler" stack. It might even be nominally faster in many cases. But if there's any chance you're going end up rolling your own type-validation or ORM or admin interface or GIS... Just use the battle-tested kitchen sink from the get go.
Sqlite is one of the greatest open source projects in history, with awesome docs, and really is a tribute to the art of programming. I'm happy and honored to use it for the appropriate use cases (which are a lot more than one would think).
1) Sqlite
2) Self-hosted Postgres
3) Big Boy Database, with an $$$ cost. (AWS Aurora, Oracle, etc).
Most projects never leave the Sqlite level. Only one has left the Postgres level so far.
This is where the "MongoDB is webscale" meme came from.
The truth is SQLite and a single webserver or Docker container will be fine for 95% of web applications.
People really underestimate the advantage of simplicity vs perceived power.
Use SQLite.
In my opinion the biggest thing separating Sqlite from a "full blown" database is actually Sqlite's lack of stored procedures. At all of the places where I worked with traditional databases, we used stored procedures to create an ersatz data access abstraction so that the database design could vary independently of the API presented to the application. With Sqlite I find myself (ab)using views as a poor man's stored procedure, but of course that only covers the read-only or "functional" (in the functional programming sense) portion of stored procedure code.
Everything other commenters have said about data size or centralization also applies, but for me (again, just personal opinion) I'd actually draw the line at the point where you can or cannot get by without stored procedures. From an operational standpoint that would be: at what point is it imperative to be able to vary the details of the database design while maintaining an abstraction layer (stored procedures) that allows application code to be blissfully unaware anything changed underneath it?
Examples of when that would be needed would be if new users + applications start having competing needs, or if you need to revamp your table structure to improve performance or get around a limitation. If you're in a startup or small company, it would be the point at when you find yourselves hiring a real Database Administrator (DBA) rather than giving DBA duties to developers. Prior to that organizational scale you may be better off with the simplicity of Sqlite; after reaching that level of organizational complexity you might need a "real" (server-based) database.
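The views-as-abstraction-layer idea described above can be sketched concretely (the table and view names here are hypothetical): the application queries a stable view, while the physical tables underneath stay free to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical schema, free to change over time.
    CREATE TABLE people   (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE accounts (person_id INTEGER, balance_cents INTEGER);

    -- The "API" the application sees: a view that hides the join and
    -- the cents representation. Reorganizing the tables only requires
    -- redefining this view, not touching application code.
    CREATE VIEW customer_balances AS
        SELECT p.full_name AS name, a.balance_cents / 100.0 AS balance
        FROM people p JOIN accounts a ON a.person_id = p.id;

    INSERT INTO people   VALUES (1, 'Ada');
    INSERT INTO accounts VALUES (1, 2500);
""")
name, balance = conn.execute(
    "SELECT name, balance FROM customer_balances"
).fetchone()
print(name, balance)  # Ada 25.0
```

As the comment notes, this only covers the read-only half of what stored procedures provide; writes still go straight at the tables (or through triggers).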
I also do backups periodically with ActiveJob using `.backup` on the sqlite3 client. It's simple and nice because I just have to worry about running the app, and nothing else.
We replaced SSMS + SQL Server with Python + SQLite run in AWS Lambda. The jobs fetch the database from S3, update with the latest deltas and write out the database and some CSV files to S3. The CSV files drive some Tableau dashboards through Athena.
The SQL usually needs a bit of a rework to make this work, but for the volumes of data we were looking at (we're talking less than a million rows, jobs run once per day) we've seen good performance at low cost. We used DuckDB for a couple of workloads which needed more complicated queries, it's stupid quick.
https://pve.proxmox.com/pve-docs/chapter-pmxcfs.html
https://git.proxmox.com/?p=pve-cluster.git;a=tree;f=data/src...
When it was written by our CTO over 10 years ago he tried every DB solution available, that is those that somewhat fit the picture, only sqlite survived any test thrown at them, if setup as documented it handles a pulling the power plug in any situation, at least in our experience.
It may need to be noted that the DBs are only used locally, we synchronize commits ourselves via a distributed FSM, that's mostly transforming the Extended Virtual Synchrony corosync provides to simple Virtual Synchrony.
There is a hacky solution for redundancy; at certain events, a copy of the .db file is made and rsynced to a secondary node. This will probably fall apart if the file ever goes above a few MB in size.
Pros / reasons to use it: Self-contained, just a single file to transfer, no drivers needed, no servers running other than my own application.
Cons: No good support for ALTER TABLE queries, so things like changing the name, datatype, or default value of a column isn't happening. The workaround is to create a new table and transfer rows over, then drop the old table and rename the new table. Also the aforementioned issue if you want redundancy.
So basically, if redundancy isn't a requirement for you, sqlite is fine. It's probably ideal for single user applications, like your browser or apps (iirc sqlite is used a lot for those purposes).
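The create-copy-drop-rename workaround described above looks roughly like this in practice (a sketch; the `users` table and the rename from `fullname` to `name` are hypothetical). Note that SQLite 3.25+ does support `ALTER TABLE ... RENAME COLUMN`, but changing a column's type, default, or constraints still needs the rebuild:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada')")
conn.commit()

# Rebuild-style migration: rename a column and add NOT NULL by
# creating a new table and copying rows over, inside one transaction.
conn.executescript("""
    BEGIN;
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO users_new (id, name) SELECT id, fullname FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
""")
print(conn.execute("SELECT name FROM users").fetchone()[0])  # Ada
```

Indexes, triggers, and views on the old table are dropped with it and must be recreated, which is the fiddly part of doing this by hand.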
I can't really count how many times I've been pleasantly surprised by how extensive the feature set of SQLite is. I mean, it even has window functions (https://www.sqlite.org/windowfunctions.html). And being able to quickly open up the app's SQLite file in a database browser is also quite helpful during development.
This is a very respectable goal. I wish you great success!
Good full text search without pulling in another dependency would be quite a win. I'll add fts5 to my reading list. :)
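For reference, FTS5 usage is pleasantly small. A sketch (assuming your SQLite build includes FTS5, which most bundled builds do; the `docs` table and its contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 is compiled into most SQLite builds, including the one bundled
# with Python on common platforms -- no extra dependency to pull in.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("SQLite in production", "Litestream handles continuous backups."),
        ("Postgres tuning", "Adjust shared_buffers and work_mem."),
    ],
)
# MATCH runs the full-text query; ORDER BY rank sorts by relevance.
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", ("backups",)
).fetchall()
print(rows)  # [('SQLite in production',)]
```

Prefix queries, boolean operators, and snippet/highlight helper functions are all built in as well.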
Out of interest, what kind of compute and storage resources do you have underneath that?
It’s a linode. Shared cpu Plan.
1 CPU Core 50 GB Storage 2 GB RAM
It’s just $10 per month.
With linode even shared cpus are powerful and I’m yet to hit any overload. I’m sure it will at some point, I might upgrade then.
About 2 years ago I wrote my own blog engine so I could get up-to-speed with NodeJS: https://andrewrondeau.herokuapp.com/
I would have loved to do SQLite with some kind of "magic" backup, as SQLite is more than enough to handle a personal blog. (It certainly would make development easier!) However, at the time Heroku only offered Postgres.
This means it won't cost any money if it's not receiving any traffic, and it can scale easily by launching additional instances.
I wrote about my pattern for doing this, which I call Baked Data, here: https://simonwillison.net/2021/Jul/28/baked-data/
A few examples are listed here: https://datasette.io/examples
Our product is a self-hosted IoT & hub unit solution, so we have no requirements to work with thousands of users doing who knows what. For our use case, sqlite is perfect. We don’t need to work with millions of rows, don’t need to stress the relatively low-power server units with another long lived network process, have no requirements of authentication since the user owns all the data, and can easily get insights into the database both during development and during troubleshooting at customer locations.
I’d sooner leave the project than move to anything else.
Production uses: 0 (1 if my Ph.D. thesis code is included, which had some C++ code that linked against version 2 of the SQLite library).
Glad to hear its type enforcement situation is improving.
- https://www.sqlite.org/stricttables.html
- https://news.ycombinator.com/item?id=28259104
- https://news.ycombinator.com/item?id=29363054
With a lot of help from https://www.sqlite.org/appfileformat.html
It has had some pains, but it has been great compared to flat-file storage.
But you probably won't see it since at the time of writing my response there are already 172 comments.
But I've run into on prod that didn't exist in dev on my MacBook M1, and I'm curious if anyone has any suggestions:
My app is basically quiet and serves requests in the dozens (super easy to run on a tiny instance), but for a few hours a day it needs to run several million database transactions, N+1 queries, etc. Because of the high number of IOPS needed, a small instance falls down and runs suuuuuper sluggishly, so I've found myself needing to shut everything down, resize the instance to something with more CPUs, memory, and IOPS ($$$), doing the big batch, then scaling down again. That whole dance is a pain.
Were I using a more traditional postgres setup, I'd probably architect this differently -- the day-to-day stuff I'd run on Cloud Run and I'd spin up a separate beefy instance just for the daily batch job rather than resizing one instance up and down over and over again. The constraint here is that I have a 50GB+ sqlite db file that basically lives on local SSD.
Any thoughts?
Edit: did the math on several hundred million txns over a few hours. How many per second? ~500
According to here (question 19), on old HDDs you could expect a 3-orders-of-magnitude improvement by using bigger txns. Not sure about SSDs, but worth a shot.
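The batching advice is easy to sketch with Python's built-in sqlite3 module (an illustrative example, not the poster's setup; the table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB just for illustration
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"event-{i}",) for i in range(10_000)]

# Slow pattern: each INSERT in autocommit mode is its own transaction,
# forcing a sync per row on a real file-backed database.
# Fast pattern: wrap the whole batch in one explicit transaction so
# SQLite syncs once at COMMIT instead of once per row.
with conn:  # opens a transaction, commits on successful exit
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

On a file-backed database the difference between the two patterns can be dramatic, since the per-transaction fsync is usually the bottleneck.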
Even though SQLite can handle 99% of people's use cases, WAL2 + BEGIN TRANSACTION will greatly close that last 1% gap.
b. Expensify has created a client/server database based on SQLite called https://bedrockdb.com and years ago it was scaling to 4M+ qps https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...
I usually drop it because I need something that Postgres has or does better, or it's a write heavy site.
Although SQLite is not designed for this type of scenario, this discussion highlights that there's strong demand for a concurrent client/server RDBMS that is simple, performant, and easy to deploy. PostgreSQL is powerful and feature-rich, but not simple or easy to deploy. Hence the appeal of SQLite.
For example, could SQLite power a discussion forum of moderate (or more) activity i.e. users posting comments? The Nim language forum is powered by SQLite, but activity in the forum is fairly low. [1]
There is a wide gap between the simplicity of SQLite and the complex heavyweight that is PostgreSQL. It's a shame there is no concurrent RDBMS to fill that gap.
(Note: Another poster mentions the concurrent Firebird RDBMS as a possible alternative, but I haven't used it. [2])
Does your forum accept more than 1,000 writes per second? If not, SQLite should be fine.
It's pretty well-known that concurrent writes are SQLite's weak point, and that if your application requires a high volume of writes, it's not the proper solution.
The SQLite devs even acknowledge this:
> SQLite will normally work fine as the database backend to a website. But if the website is write-intensive or is so busy that it requires multiple servers, then consider using an enterprise-class client/server database engine instead of SQLite.
> [...] client/server database systems, because they have a long-running server process at hand to coordinate access, can usually handle far more write concurrency than SQLite ever will.
The use case is storing trace/profiling data, where we use one SQLite file per customer per day. This way it's easy to implement retention-based cleanup, and there is little contention on write locks. We store about 1 terabyte of data over the course of 2 weeks this way.
Metadata is stored in Elasticsearch for querying the search results, and displaying a trace then hits the SQLite database. As looking at traces is a somewhat rare occurrence, we iterate over all fileservers and query them for trace data given an ID until we find the result.
Reference https://www.sqlite.org/fasterthanfs.html
Then, with a little traffic, things continued to go well in production. But as traffic scaled up (to 1-5 QPS, roughly 25% writes), they fell apart. Hard. Because my production environment was spinning rust, IO contention was a real issue and totally absent from development. This manifested as frequent database timeouts, both from reads and writes.
Echoing another commenter's sentiment: things would have gone much more smoothly from the beginning had I started with PostgreSQL, but after having written many thousands of lines of direct SQL taking intimate advantage of SQLite's surprisingly rich featureset, migrating was less than totally appealing.
The mitigation strategy, which ultimately worked out, was to implement backpressure for writes to SQLite: queuing and serializing all writes to each database in the application, failing loudly and conspicuously in the case of errors (thus forcing the client to retry), and gracefully handling the rare deadlock by crashing the process completely with a watchdog timer.
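A minimal sketch of that write-serialization pattern in Python (not the commenter's actual code; names, the bounded-queue size, and the schema are all invented): one dedicated writer thread per database file, with a bounded queue providing backpressure because producers block when the writer falls behind.

```python
import os
import queue
import sqlite3
import tempfile
import threading

db_path = os.path.join(tempfile.mkdtemp(), "app.db")
write_q: queue.Queue = queue.Queue(maxsize=100)  # bounded queue = backpressure

def writer() -> None:
    # The only connection that ever writes; all writes are serialized here.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS logs (msg TEXT)")
    while True:
        item = write_q.get()
        if item is None:          # sentinel: shut down cleanly
            write_q.task_done()
            break
        with conn:                # one short transaction per queued write
            conn.execute("INSERT INTO logs (msg) VALUES (?)", (item,))
        write_q.task_done()
    conn.close()

t = threading.Thread(target=writer)
t.start()
for i in range(50):
    write_q.put(f"message {i}")  # blocks if the queue is full
write_q.put(None)
t.join()

n = sqlite3.connect(db_path).execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

The "fail loudly" part would live in the writer's error handling: let exceptions propagate rather than silently dropping queued writes, so callers know to retry.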
Would an SSD in production have solved the timeouts by increasing your write throughput?
This is the project: https://github.com/a-chris/faenz
I use SQLAlchemy and write applications where by just swapping out the database URI I can use either SQLite or Postgres. SQLite is nice for local development and easy testing, (you can even run tests using :memory: to accelerate CI/CD) and then I use hosted Postgres in prod. That said, based on what I have seen I would not be at all afraid to use SQLite in prod for internal tools etc.
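A minimal sketch of that URI swap with SQLAlchemy (the `DATABASE_URL` env var name and the `users` table are made up; in prod the URL would point at Postgres instead):

```python
import os
from sqlalchemy import create_engine, text

# The database URL is the only thing that changes between environments,
# e.g. "postgresql://user:pass@host/db" in prod.
url = os.environ.get("DATABASE_URL", "sqlite:///:memory:")
engine = create_engine(url)

with engine.begin() as conn:  # transaction, committed on exit
    conn.execute(text("CREATE TABLE IF NOT EXISTS users (name TEXT)"))
    conn.execute(text("INSERT INTO users (name) VALUES (:n)"), {"n": "alice"})

with engine.connect() as conn:
    names = [row[0] for row in conn.execute(text("SELECT name FROM users"))]
```

The caveat with this approach is that raw SQL strings can leak dialect-specific features; sticking to SQLAlchemy Core or ORM constructs keeps the two backends genuinely interchangeable.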
HTTPS: https://vatcomply.com/ Github: https://github.com/madisvain/vatcomply
I have of course pickle -> open()
But eventually my projects grow large enough that sqlite3 becomes the database. I have never needed to go beyond sqlite3 in my projects. It does everything I ever need it to do.
For sake of argument, let's say I have a fixed schema/format that will never change and I never need to aggregate queries across multiple customer accounts. Also, let's say writes to a single database are never going to be more than a hundred concurrent users. Why shouldn't I store each tenant's data in its own SQLite database? It makes it very easy for local client apps to download their data all at once. Backups are incredibly easy. Peer-to-peer synchronization behaves like a git repository merge. Why shouldn't I do this?
When would this happen for any non-trivial multi-tenant service? The difficulty of performing migrations sounds like it would pretty quickly negate any simplicity gained.
Even if there are migrations, it's treated similar to a file conversion (like upgrading an older excel file format to a new format on demand when the file is accessed).
[1] Maybe something similar to https://www.notion.so/blog/data-model-behind-notion or an EAV model embedded in a JSON1 column.
However, if you want interaction across users (messaging, user roles, etc.) then you might want to have them in one database.
Client (incident response dept at megacorp) had a problem: their quarterly exercises of switching network storage devices from live servers to disaster recovery (DR) servers was a manual operation of reconciling about 8 Excel spreadsheets and setting up ACLs before (luckily) an automated process would switch the storage mounts from live to DR.
We modeled and matched up all the hosts, servers, and ACLs and did a daily write to a single SQLite database. (We redundantly sent all the data to Splunk.) Now the DR employees are automating a daily diff of servers, hosts, ACLs etc to further automate the switch.
To echo a bunch of comments here, we decided on SQLite for a few reasons:
- only one user would write to the DB
- only a few users need to access the data
- besides standard retention policies, the data could be considered ephemeral and easily recompiled
- the script we wrote to compile the data runs in 5 minutes, so if we lose the db, we can easily recompile it
SQLite (and SQLalchemy) is useful for inexpensive data.
These sound like the kind of hidden costs that could turn sqlite's simplicity quite complicated if you don't see them coming.
Aside from some surprises regarding packaging it together with the rust crate and inability to rename columns, I'm really happy with it. Easier than deploying postgresql, more useful than documents.
It is so convenient to have it as a file, especially when you are just learning to do software development.
And the performance has not been an issue once.
One thing to note is that my site is not Facebook size. It only gets ~40 page views a day. And most of them are just for viewing, so no database operations.
So, I'm not going to be the most credible voice here. FWIW, I know that Pieter Levels, who runs multiple projects like nomadlist, remoteok, rebase, uses both SQLite and plain JSON files for storage.
However, I would consider how important the RDBMS features that are not available in SQLite are to you:
- A less sophisticated type and constraint system.
- A severely limited ALTER TABLE.
- No stored procedures.
- A limited selection of math and statistical functions.
- No permission and user model, not to mention row-level security.
To be clear, I don't think it's bad that SQLite doesn't try to be a full RDBMS, but I would weigh this perspective when making a decision, rather than performance, which is great and difficult to max out.
It's also really easy to add new custom SQL functions to a SQLite connection, which means the missing math functions shouldn't be a limitation. Here's an extension for example: https://github.com/nalgeon/sqlite-stats
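In Python, registering a custom SQL function is a one-liner via `sqlite3.Connection.create_function`. A small sketch (the `py_sqrt` name is made up to avoid colliding with the built-in `sqrt` that newer SQLite builds ship):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register a Python callable as a scalar SQL function: name, arity, callable.
conn.create_function("py_sqrt", 1, math.sqrt)

result = conn.execute("SELECT py_sqrt(144)").fetchone()[0]
```

Any Python function works, including things SQLite has no built-in for, e.g. regex matching or domain-specific scoring.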
Under the hood, SQLite treats JSON as strings; you have to do some strange stuff with json_extract and computed-field indexing to index into it, which can be a bit fragile.
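The "strange stuff" being described is (I believe) an expression index on `json_extract`: the index only helps if queries use the exact same expression, which is the fragile part. A sketch (table and JSON shape invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")

# Index on a json_extract() expression; SQLite can only use it when a
# query's WHERE clause contains the identical expression.
conn.execute(
    "CREATE INDEX idx_docs_email ON docs (json_extract(data, '$.email'))"
)
conn.execute(
    "INSERT INTO docs (data) VALUES (?)",
    ('{"email": "a@example.com", "plan": "pro"}',),
)

row = conn.execute(
    "SELECT id FROM docs WHERE json_extract(data, '$.email') = ?",
    ("a@example.com",),
).fetchone()
```

Newer SQLite versions also support generated columns, which make this a bit less fragile by giving the extracted value a real column name to index and query.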
My use case these days is hybrid rdbms / nosql, where most of my tables have defined columns for frequently queried data, and a jsonb "data" field for everything else.
Postgres has impressive jsonb capabilities, and with 14 the index operators making querying it a dream.
I love SQLite, but Postgres' jsonb handling makes the additional operational overhead worth it to me.
Works perfectly well. Mind you, I would use Postgresql if the site were important, just to be on the safe side.
[1] https://www.gaia-gis.it/fossil/libspatialite/index
[2] http://web.archive.org/web/20190224100905/https://core.ac.uk...
An important realization is that not everything needs to scale, and that it depends on how you access the DB and what your product looks like. For a load with many concurrent writes I'd be careful with sqlite, or when I know that I'll want my DB to mostly live in memory (e.g. operations will often process the whole, huge dataset and no index can help with that). But even if I thought "Uh, I'll probably need a full DB", I'd still benchmark my application with both sqlite and e.g. postgres. And if the API to access the DB uses some nice abstractions, swapping the flavor of SQL isn't a huge issue anyway.
//edit: Plus, I've done stupid stuff like "my SPA hammers the PHP API with 20 to 40 requests, each resulting in a simple SQLite query, just to render a checklist" and got away with it: a) because we had at most 20 concurrent users [realistically: 1 to 5], b) doing the checklist took half a workday (ticking off an item was done via a JS callback in the background, so the actual rendering happened only once), and c) SQLite performs great for read-heavy loads. The site performed so well (page loads felt about as fast as HN, even when connected via VPN) that I even scrapped the plan to locally cache the checklist in HTML5 localStorage (bonus: no cache = no cache incoherence to care about).
This [0] is a good article with some benchmarks, misconceptions about speed, and limitations.
That seems like a pretty big flaw as your data grows. Zero downtime migrations are really nice. Anyone here got a war story / experience with this one?
I've migrated database versions 35 times over the last 13 years, ie, every individual HB database has been migrated 35 times. You don't always need to make a new table, do a copy, and switch over. In the latest migration, I added new columns, initialized them, dropped columns, etc. without doing a db copy.
For this migration I wanted to switch to strict tables, where typing is strict. I could have done this by just altering the schema (it's just a bunch of text in the SQLite db) and then using SQL to make sure existing columns had the right data (using CAST). But instead, I created a general framework to allow me to migrate data from one schema to another, mainly so I could reorder columns if I wanted. That can't be done with ALTER statements, so I did end up doing a complete copy, but I've done many migrations without a copy.
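A hedged sketch of that copy-style migration (table and columns are invented, and the commenter's real framework is surely more involved): create a new table with the desired column order, copy rows across with CAST, drop the old table, and rename.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old schema with an awkward column order.
    CREATE TABLE jobs (title TEXT, id INTEGER);
    INSERT INTO jobs VALUES ('encode', 1), ('mux', 2);

    -- Copy-style migration, all inside one transaction.
    BEGIN;
    CREATE TABLE jobs_new (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO jobs_new (id, title)
        SELECT CAST(id AS INTEGER), title FROM jobs;
    DROP TABLE jobs;
    ALTER TABLE jobs_new RENAME TO jobs;
    COMMIT;
""")

rows = conn.execute("SELECT id, title FROM jobs ORDER BY id").fetchall()
```

For a switch to strict typing, the new table would be declared with the STRICT keyword (SQLite 3.37+), with CAST cleaning up any loosely typed existing values during the copy.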
I found this paper interesting on "zero downtime migrations".
https://postgres.ai/blog/20210923-zero-downtime-postgres-sch...
After reading it, the bottom line is that changes happen within transactions (true for SQLite too), and the key to zero downtime migrations is to use short transactions, use timeouts, and use retries on all database operations, including the migration commands. You can do all these with SQLite.
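Those three ingredients (short transactions, timeouts, retries) translate to SQLite roughly like this sketch (the helper name and backoff numbers are made up):

```python
import sqlite3
import time

def execute_with_retry(db_path: str, sql: str, params=(), retries: int = 3):
    """Run one short transaction, retrying on 'database is locked'."""
    for attempt in range(retries + 1):
        try:
            # timeout=5 makes SQLite wait up to 5s on a locked database
            # (equivalent to PRAGMA busy_timeout = 5000).
            conn = sqlite3.connect(db_path, timeout=5)
            try:
                with conn:  # keep the transaction short: commit immediately
                    return conn.execute(sql, params).fetchall()
            finally:
                conn.close()
        except sqlite3.OperationalError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            time.sleep(0.1 * (attempt + 1))  # simple linear backoff

rows = execute_with_retry(":memory:", "SELECT 1 + 1")
```

The same wrapper would apply to migration DDL, which is what makes the zero-downtime recipe from the article portable to SQLite.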
You should understand whichever RDBMS you use, and how to get the best performance out of it. Previously I used Postgres extensively, and it worked fine, and before that I managed MySQL servers. They are all fine, but SQLite is as simple as it gets, and more than adequate for most workloads.
Pros:
- A single API server, no separate database to worry about, configure, and update.
- Backups are as simple as backing up one file every so often. SQLite even has an API to do this from a live connection.
- Handles way more concurrent users than we’ve ever needed.
- Dev and test environments are trivial and fast.
- Plenty of tools for inspecting and analysing the data.
Cons:
- There are certainly use cases it won’t scale to, or at least not without a bunch of work, but in my experience those are less than 1% of projects. YMMV.
- The type system (even with the newish stricter option) has nothing on Postgres. I realise this is basically a non-goal but I’d seriously love to somehow combine the two and get PG’s typing in a fast, single file embedded DB library.
- Postgres JSON support is also better/nicer IMO.
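The live-backup API mentioned in the pros above is exposed in Python as `sqlite3.Connection.backup()`. A minimal sketch (paths and schema invented):

```python
import os
import sqlite3
import tempfile

# A "live" source database with some data in it.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

# Copy the entire database to a file while the source stays usable;
# pages are copied incrementally under the hood.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
dst = sqlite3.connect(backup_path)
src.backup(dst)
dst.close()

restored = sqlite3.connect(backup_path).execute("SELECT x FROM t").fetchone()[0]
```

This avoids the classic failure mode of copying the raw .db file mid-write and ending up with a corrupt backup.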
It's also powering another one and I really like the fact that I can just commit the whole DB to the GIT repo.
I've never (deliberately) considered committing a DB to git. Although there was that one time when I was straight out of college...
Pro tip: surprising your colleagues in the morning with a 40 minute wait to pull master (because you committed a ???GB db) is a good way to feel like a right eegit.
I also built a Go library wrapping the SQLite amalgamation file and then cross-compiled it for Android and iOS, with some additional SQLite extensions (GIS) that the stock Android/iOS SQLite did not have. This was some time in 2017, I guess.
I am a big fan of SQLite. You can integrate it in all kinds of stuff and adapt it to your needs. Compiling it is also straightforward.
Except for some rare exceptions, it's been doing pretty great. I don't have any plans to migrate from SQLite any time soon.
Also they're where the real insights are.
By default, the connection pool in Rails contains 5 connections, and they time out within 5 seconds.
..but like all things, it depends on your needs. Some have already pointed out the pages on SQLite's on site regarding # of writers (the main issue), etc.
I love it - very robust, lots of documentation, StackOverflow answers, example queries, etc.
On a typical day, I get less than 50 users per day globally, so I don't really have to worry much about concurrency or other issues that SQLite struggles with. I'd wager that many web applications are perfectly well served by it.
My manager currently runs a number of personal sites with a SQLite backend and they all seem very performant so I have been honestly considering giving it a second look.
Everything is good so far, though most of my traffic is bots probing for wordpress flaws.
Recently I stumbled over a potential fix[1], which I will try in my next project.
[1] https://ja.nsommer.dk/articles/thread-safe-async-sqlite3-ent...
One to look out/test for early if I go in this direction though. Thanks for the heads up!
I think so. SQLite is pretty reliable in my opinion, and I never had these issues anywhere else, but this bug made me switch the DBMS for a small side project from SQLite to Postgres.
> One to look out/test for early if I go in this direction though.
You're welcome!
It's probably elsewhere but I don't realize it.
I’ve been using (locally) a Redis container for a very early prototype because it seems to be simple enough to use.
I know you can query JSON strings in SQLite, but that's not quite the same thing. For one, Redis offers some geo features.
No issues at all.
The longer you can scale the product without having to scale the application, the better!
When I looked around, Dropbox used it too; and so did Bittorrent Sync (Now Resilio)
> Given the complexity
Which complexity? It is the simplest possible widespread, reliable and effective solution. Which makes it a primary choice.
> it seems like there are use cases or needs here that I'm not seeing
On the contrary, the use cases for the traditional Relational DB engines are defined: when you need a concurrency manager better than filesystem access. (Or maybe some unimplemented SQL function; or special features.) Otherwise, SQLite would be the natural primary candidate, given the above.
Edit:
I concur about https://blog.wesleyac.com/posts/consider-sqlite being a close to essential read if one has the poster's doubt.
To its "So, what's the catch?" section, I would add: SQLite does not implement the whole of SQL (things that come to mind on the spot are variables; the possibility of recursion was implemented only recently, etc).
Noting that "recently" was August 2014 (version 3.8.6), according to https://sqlite.org/oldnews.html.
The "missing" right/full join types just hit trunk this past week and are still being debugged: https://sqlite.org/src/info/f766dff012af0ea3
Right. This issue has little to do with new deployments the way the submitter intended; it concerns software that was implemented against specific static versions of SQLite, which some still use because no better alternative has emerged since. The "delay" is more evident (more "daily present") to those users, and has little to do with new products.
Nonetheless, given that, I would first check which subset of SQL you may need; I am not sure how much overlap it has with other mainstream products.
The only issue is that you'll need to take special care when backing up the DB file (but this is probably the same for most DBs even today.)
https://docs.google.com/presentation/d/1Q8lQgCaODlecHa2hS-Oe...
I.e. it's leaning more toward the moderate, but reliable writes, and heavy read use cases?
Please let me know ~if I'm missing anything~ what use cases I'm missing.
https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#what...
SQLite proved to be phenomenal. We spec'ed hardware with enough RAM to hold the FR DB in memory, and damn, SQLite is fast enough to keep up with the optimized FR system performing 24M face compares per second. With a 700M-face training set, SQLite also proved instrumental in reducing the training time significantly. These days, if given the opportunity to choose a DB, I always choose SQLite. I use SQLite for my personal projects, and I go out of my way to not use MySQL because SQLite is so much faster.
Out of interest, were you running on bare metal/cloud? And what kind of CPU was behind those 24M face compares per second?