Ask HN: Have you used SQLite as a primary database?

source link: https://news.ycombinator.com/item?id=31152490

386 points by barryhennessy | 264 comments

I periodically hear about projects that use, or have used, SQLite as their sole datastore. The theory seems to be that you can test out an idea with fewer dependencies (and less cost) and that it scales surprisingly far.

There are even distributed versions being built for reliability in the cloud: dqlite by Canonical (of Ubuntu fame) and rqlite.

Given the complexity it seems like there are use cases or needs here that I'm not seeing and I'd be very interested to know more from those who've tried.

Have you tried this? Did it go well? Or blow up? Were there big surprises along the way?

- https://sqlite.org
- https://dqlite.io
- https://github.com/rqlite/rqlite

Here's an all-time great post about why you might consider SQLite in production with data about performance: https://blog.wesleyac.com/posts/consider-sqlite

I use SQLite in production for my SaaS[1]. It's really great — it saves me money, required basically no setup/configuration/management, and has had no scaling issues whatsoever with a few million hits a month. SQLite is really blazing fast for typical SaaS workloads, and it will be easy to scale by vertically scaling the VM it's hosted on.

Litestream was the final missing piece of the puzzle that helped me use it in production — continuous backups for SQLite like other database servers have: https://litestream.io/ With Litestream, I pay literally $0 to back up customer data and have confidence nothing will be lost. And it took like 5 minutes to set up.

I'm so on-board the SQLite train you guys.

[1] https://extensionpay.com — Lets developers take payments in their browser extensions.

What is the point of using SQLite under a web service?

I thought people complained about how MySQL sucks and PostgreSQL rocks for correctness, and that SQLite was nowhere near being correct or performant. (Things seem to be getting better with strict column types these days.)

I've recently migrated a smallish service from MySQL to PostgreSQL and found it's quite a lot of work if you weren't careful to write to the SQL standard, which means that if the service had gotten bigger, your chance of moving away from SQLite would have slipped away.

So, why not use a safer choice to begin with? Nothing about running MySQL/PostgreSQL is complicated, unless you've sold yourself to AWS, care about the cost, and don't know how to run a DB instance yourself.

SQLite is far faster than Postgres or MySQL; however, the price you pay for this is a single writer at a time, and the fact that it's a library incorporated into your process, not a shared DB. It's faster because those other features of a server have a cost as well, particularly the cost of write arbitration.

SQLite is fine when all your load can be served by a single backend process on a single machine. The moment you need multiple backends to handle more load, or the moment you need high availability, you can't do it with SQLite. SQLite has very limited DDL operations, so you also can't evolve your schema over time without downtime. Now, for streaming backups - how do you come back from node failure? You're going to incur downtime downloading your DB.

I run many SQL backed production services, and my sweet spot has become a big honking Postgres instance in RDS in AWS, with WAL streaming based replicas for instant failover in case of outage, and read replicas, also via WAL. This system has been up for four years at this point, with no downtime.

I love SQLite, and use it regularly, just not for production web services.

> So, why not use a safer choice to begin with?

I know several people who build projects like that. It takes them months to get a working product, just to discover it doesn't interest people or doesn't work like they expected. If for every piece of tooling you go for the "safe" and most performant option, you gain bloat and complexity real quick.

People underestimate the performance of "simple" tech. In 99% of projects, by the time your bottleneck is your DB system, I can assure you it'll be the least of your concerns.

Which part of running MySQL instead of SQLite is over-engineering?
- Setting up a MySQL server on both your dev machine and server, and making sure they're the same version (extra fun if they're on different OS versions)

- Setting up an out-of-repo config file on the server with your MySQL credentials

- Setting up a backup script for your server data

It's only about an hour of work total, but it's an hour of work that I hate doing.

Hum... MySQL is one of those labor-generating technologies that a Luddite would fall in love with. Those are best avoided. But for Postgres...

- You make sure the production version is the same as or newer than the development one, or you make sure not to use new features before they reach production, which is quite easy. There is no problem with different OSes (except for Windows itself not being very reliable, but I imagine you are not using Windows in production, as it's another one of those labor-generating techs).

- Trusting a local user is the same level of security you get with SQLite, no credentials required.

- And setting a backup script... Wait, you don't do that for SQLite? There's something missing here.

Yes, there are a lot of small tasks that add up when setting some new software. It's a pain. But it's a pain you suffer once, and it's over. It's worth optimizing, but not at any ongoing cost.

docker-compose is the way to go for keeping the dev versions synced with the production version. And for the backup script scheduled mysqldump and copy to storage should see you through quite far, so not really any more effort than copying an SQLite database.
This person is talking about reducing complexity, so I don't think adding more moving pieces to the machine is the way to go.

And now you need to learn Docker and 10 other things.

Please do explain what risk you're thinking of; anyone smart enough to write their own SaaS would not put resources in the web server's file system tree. You stick your db file in a normal, secure location outside the server's root, chmodded appropriately so that it suffers the exact same risks as any other file on the OS. It's no more or less risky than /etc/shadow, while being considerably easier to work with (and less failure-prone for light db work) than an independently running database service.
The risk is, as I wrote previously, that it takes some effort to move from one db to another when the need arises, and I see no benefit in choosing SQLite in the beginning.

I'm not a professional db engineer, but one point is that there doesn't seem to be a way to create functions in SQLite, which means creating triggers on various tables can cause an excessive amount of duplicate code.

If I rely on PostgreSQL, I feel covered for my use case of web apps, but once you hit some little gotchas in SQLite, you may regret saving 10 minutes (installing the db and setting up a password) for nothing.

That's not a risk, that's just an inefficiency further down the line (migrating data from sqlite to a "real" database can indeed be quite a chore, but far less so if you formalized your schema and constraints beforehand, so that a migration mostly involves exporting to SQL, rewriting where needed, and then importing into whatever dbms you end up using later on in the lifetime of your project).

When we're talking about risks, think security exploits: how is sqlite3 more likely to get your data leaked, or flat out copied in its entirety, compared to using a mysql/postgres/etc.

Your app has full access to the SQLite file. MySQL/PostgreSQL have users and permissions. Security is about layers, and SQLite is removing one layer of security. You can, for example, put DELETEs or access to certain tables on a separate user that your web app has no access to. With SQLite, if your app gets hacked then the attacker can do anything they want with the whole DB. In addition, with a separate DB process you get audit logs. If someone hacks your SQLite app they may have access for months before you realize it, if you ever do. Especially if they are doing something subtle like UPDATEs on specific tables/fields that may go unnoticed but provide the hacker some benefit. This is why you can't simply rely on the idea of using a backup. That's only going to help if the hacker totally trashes your DB.

With a separate DB you may have a hope of detecting when someone hacked your app. But without that firewall, the question becomes: how much of the data in my SQLite can now be trusted? If you don't know what backup is safe to restore, then you can't trust any of it.

Again, this is about layers. Not saying MySQL/Postgres will save you. But they can increase the odds.
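For what it's worth, the "separate restricted user" idea above looks roughly like this in Postgres. This is just a hedged sketch: the role, password, and table names are all invented.

    import psycopg2

    # Hypothetical sketch of layered permissions: the web app connects as a role
    # that can read and write rows but can never DELETE, and has no access at all
    # to the audit table. Names and credentials are made up for illustration.
    admin = psycopg2.connect("dbname=app user=postgres")
    admin.autocommit = True
    with admin.cursor() as cur:
        cur.execute("CREATE ROLE webapp LOGIN PASSWORD 'change-me'")
        cur.execute("GRANT SELECT, INSERT, UPDATE ON orders, customers TO webapp")
        cur.execute("REVOKE ALL ON audit_log FROM webapp")  # app can't touch audit logs
    admin.close()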

If your server or API can be exploited, it doesn't matter whether there's an auth layer in between. Your SQL server runs as a service to connect to, your sqlite3 file is a file that you need access to. They're the same kind of layer: you need to break through the server's security to ever get to them directly, and if your app gets hacked such that the hackers gain file system access, then:

1. You're fucked. The end. It doesn't matter whether you were using mysql, postgres, or sqlite3, or S3, or Redis, or any other server your app was connecting to: they can just look at your environment vars.

That's not going to happen "because you're using Sqlite3", that's going to happen because you used some obscure server software, or worse, rolled your own.

People really do seem to put too much faith into "it has a username and password, it's more secure". It's not: if someone has access to your actual server, they have access to everything your server has access to. Sqlite3 is no more or less secure than a dbms daemon (or remote) in that sense.

Pieter Levels has been using SQLite for nomadlist and I think it's been going well for him.
Running MySQL/Postgres over SQLite:

- needs to be provisioned and configured

- needs additional tooling and operational overhead

- comes with a _large_ performance overhead that is only won back if you have quite a significant load - especially writes, which means the vast majority of web projects are slower and require more resources than they should.

- it makes the whole system more complex by definition

It is a cost-benefit thing that tilts towards RDBMS as soon as you need to sustain very high transactional loads and want a managed, individually accessible server that you can query and interact with while it's running in production.

But if it is just "a website that needs durability" then you haven't yet shown how that tradeoff is worth it.

Just wanted to add another scenario where PostgreSQL has been useful to me: functions. There are cases where you have expensive operations that reference a lot of persistent data. Even without massive traffic these operations can be prohibitively expensive in the middleware. Leveraging database functions can be a massive performance improvement (100x+ for me), especially if your middleware is slow (e.g. Rails).

I’ve used SQLite in production once and it worked great. But that was a very simple app. For more complex (but not always higher traffic) I’m leaning more and more on postgresql and less on my middleware, like moving business logic to the database when it makes sense.

If you need the expressiveness and power of Postgres then sure; it also has way better JSON support, for example, there is generally better tooling for it, and so on. But in this case, your database becomes its own _thing_ and has much more value beyond being just durability for an application. Like what Supabase is doing, for example. That's a very fundamental design decision IMO. I explored this and it is very attractive and robust, but serves different use-cases.
> What is the point of using SQLite under a web service? … people complained how MySQL sucks and PostgreSQL rocks for being right and SQLite was nowhere near being right or performant.

My understanding is that for read performance SQLite is pretty damn good, outperforming MySQL and Postgres in both single and concurrent tests. The key performance issue is the single global write lock. If your data access pattern is massively read biased then SQLite is a good choice performance wise, if you see a lot of write activity then it really isn't.

With regard to being correct, it offers proper ACID transactions and so on. Typing is a big issue for some but far from all. It is significantly more correct than MySQL used to be back before InnoDB became the default table type in ~2010, and at least as correct as it is now (aside from the data-typing matter, depending on which side of that you sit on).

> What is the point of using SQLite under a web service?

Take a look at the Consider SQLite post I linked. They address your performance questions too.

For me, SQLite was a nice way to simplify launching and running my SaaS business and has had no downsides.

Any tips you can share on how to deploy your saas app without downtime when using sqlite?
For my personal project I'm planning to use a VM with an SSD. I'll manually use Caddy to switch over to a new running backend service with a readiness check.

As for scaling if I need it I can increase disk space for the app server or scale out horizontally/vertically. Don't need that yet so I'm waiting for more details in the future to decide how to handle that.

Which backend “owns” the sqlite database?

That’s what I haven’t seen mentioned anywhere, if the database is part of the application, how do you switch from one version to another without downtime.

I would expect SQLite queries against an index to outperform MySQL and PostgreSQL in all cases, because SQLite eliminates the network overhead by essentially executing those queries directly as a C function call.

So no matter how optimized MySQL and PostgreSQL are, SQLite will run rings around them for basic SELECT queries.

Running Postgres/MySQL against a Unix socket doesn't seem to leave that as much of a selling point for SQLite.
> I thought people complained how MySQL sucks and PostgreSQL rocks for being right and SQLite was nowhere near being right or performant

While "people" complaining is basically meaningless, I don't know why they'd be doing that about SQLite. It's used in most phones, all copies of Windows 10+, and countless other places.

Chrome browser has loads of them in your profile directory.
What about zero downtime deploy of new version of your application? You have to take it down to restart, right?
> With Litestream, I pay literally $0 to back up customer data and have confidence nothing will be lost.

This is not a guarantee Litestream makes (nor can it, since replication is async).

You'll lose things to catastrophic failures, but chances are you'd be able to restore to a last known good checkpoint.

This was so inspiring to read. It's a very balanced take about the pros and cons of using SQLite vs. Postgres at scale.

I say "inspiring" because using SQLite reminds me of the simplicity and productivity from coding for the "early web" that lost 10-15 years ago. The days when you could spin up a website without worrying about a bunch of ancillary services and focus on the app itself.

For me, SQLite's lack of online schema changes seems like perhaps the biggest blocker to actual production. I've never had a production project where the schema didn't change a lot.

I have this beef too. Tooling for dumping and restoring into a new schema is easy/simple/fast, so these schema migrations can happen without issue. There are also some tricks with PRAGMA directives in SQLite so you can roll out changes (e.g. code supports the old and new schema while migrating).
The Consider SQLite post mentions one of SQLite's big improvements in the past decade as "WAL mode (enabling concurrent reads and writes)". Does this mean that the official advice to avoid SQLite for concurrent writes [1] is no longer a big concern?

[1]: https://www.sqlite.org/whentouse.html

Take this with a huge grain of salt because I am by no means an expert, but I am currently working on some scripts that import a few million rows into SQLite. I am using bash and the sqlite command line. I was getting a lot of concurrent write errors from sqlite (bear in mind I am only doing inserts of separate rows, so in theory there is never an actual conflict), so I tried using WAL mode. It actually resulted in more contention. I ended up just going back to non-WAL mode and implementing an exponential backoff in bash to retry writes.
Are you ingesting the data under one transaction? This is a common SQLite issue, as writes aren't slow but transactions are. By default 1 write = 1 txn, but you can put millions of writes into 1 txn and get many orders of magnitude speedup.
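To make that concrete, a minimal sketch with Python's stdlib sqlite3 module (file and table names are made up):

    import sqlite3

    # One BEGIN...COMMIT around the whole batch: you pay for one fsync instead of
    # one per row, which is where the orders-of-magnitude speedup comes from.
    con = sqlite3.connect("import.db")
    con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")

    rows = ((None, f"event-{i}") for i in range(1_000_000))
    with con:  # the connection context manager commits once at the end
        con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    con.close()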
Thanks for this comment! I had tried some bulk imports into SQLite and mostly gave up after hitting similar limits, thinking I was doing something wrong with the configuration.
I think the way it's worded in that SQLite documentation page is still accurate:

> SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.

If your application needs to support hundreds of concurrent writes a second you shouldn't use SQLite. But that's a pretty high bar!
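For reference, the typical single-writer web-service setup in WAL mode looks something like this; a sketch with example values, so tune the durability tradeoffs to your own needs:

    import sqlite3

    con = sqlite3.connect("app.db", timeout=5.0)
    con.execute("PRAGMA journal_mode=WAL")    # readers and the single writer stop blocking each other
    con.execute("PRAGMA synchronous=NORMAL")  # common durability/throughput tradeoff in WAL mode
    con.execute("PRAGMA busy_timeout=5000")   # queue up to 5s for the write lock instead of erroring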

> hits a month

Is not a very useful performance metric. What is your peak hits per second?

I love watching people use "x per month" as some sort of architecture selection argument, especially when these arguments conclude in a proud justification of cloud sprawl.

There are single node/CPU solutions that can process 10-100 million business events per second. I am almost certain that no one logged into HN right now has a realistic business case on their plate that would ever come close to exceeding that capacity.

E.g.: https://lmax-exchange.github.io/disruptor/disruptor.html

This stuff isn't that hard to do either. It's just different.

Litestream is indeed a missing piece of the puzzle. But it also defeats some of the purpose of using an embedded database library in the first place. Now you're back to juggling separate processes once again.
Good point! Although practically speaking I don't mind at all. "Juggling" is too strong a word — it's literally just starting the Litestream process and never thinking about it again. It's nice that it just slides into my existing app without any code changes.
I had never heard of verneuil. Thanks for sharing. For anyone curious about the differences between the two:

"This effort is incomparable with litestream: Verneuil is meant for asynchronous read replication, with streaming backups as a nice side effect. The replication approach is thus completely different. In particular, while litestream only works with SQLite databases in WAL mode, Verneuil only supports rollback journaling"

If the application is in Go, you can likely embed Litestream.
how do you handle things like encryption and access permissions?

The only thing I have against using SQLite in production (for my needs) is the lack of at rest encryption and row level permissions by user.

At rest encryption is a complicated subject (in general, I bet most people get negative net security from it). For SQLite, you can either encrypt your disk or get one of the versions with added encryption (I only know of proprietary ones).

You don't do row level permissions on your database. You keep it all on the application layer.

One of my previous employers was using SQLite as a large distributed database - they had their own custom sharding strategy, but essentially the idea was to shard A accounts * B tables * C num_of_days with a .db file for every shard.

When I first came and saw it, it...did not sound right. But I didn't want to be the guy who comes in and says "you are doing it wrong" month 1. So I went along with it.

Of course, eventually problems started to pop up. I distinctly remember that the ingestion (happening via a lot of Kafka consumers) throughput was high enough that SQLite started to crumble and even saw WAL overruns, data loss etc. Fortunately, it wasn't "real" production yet.

I suggested we move to Postgres and was eventually able to convince everyone from engineers to leadership. We moved to a custom sharded Postgres (9.6 at the time). This was in 2016. I spoke to people at the place last month, and it's still humming along nicely.

This isn't to illustrate anything bad about SQLite, to be clear! I like it for what it does. Just to show at least 1 use case where it was a bad fit.

SQLite was a tempting first answer, but what solved it was Postgres, and we eventually offloaded a lot of aggregation tables to Clickhouse and turned the whole thing into a warehouse where the events got logged.

That's a good counter-case to keep in mind, thank you.

I guess the takeaway here is that SQLite isn't for the 'large number of writers' scenario.

p.s. > I didn't want to be the guy who comes in and says "you are doing it wrong" month 1

Very wise

That's for sure a good point. SQLite has really well defined limitations, and most if not all of those are by design. You should consider them BEFORE starting a new project with it, but if it is a fit, it's a great experience.
Data loss is a pretty serious problem.

Do you have any more information about the situation?

Could it have been in the hand spun partitioning logic instead of SQLite?

What was the ingestion throughput, roughly?

Unfortunately I don't have numbers on hand. We approximated our Postgres would ingest around 1 TB over the course of a year, later I think? I could be wildly wrong.

It's been more than 5 years but from what I remember, it definitely was _not_ the partitioning logic (the sharding just meant we had a huge amount of files that were hard to organise). But a single consumer doing heavy writes on a single SQLite file would see enough traffic that pretty soon you would start to see errors and your writes would start to break.

I also experienced that one bad use case: heavy writes. I've always used SQLite first for any project, and when I did an in-house analytics tool for tracking user-initiated events (visits, button clicks, hovers, etc.) I thought SQLite could handle it well. Unfortunately, even with heavy tuning, we saw WAL overruns and missing data.

Sadly, we had to move to Postgres and eat all that scaling complexity. :(

One thing that really excites me is concurrent writes -- I was poking around the project, and I've seen drh has been working on this for a bit now. [1] [2]

I believe the high-level approach he's taking is essentially:

1. Concurrently execute the multiple write transactions in parallel.

2. Sequentially write the changed pages to the WAL.*[3] If a previous transaction causes the next to compute differently (a conflict), then rerun that next transaction and then write.

The way conflicts are detected is essentially:

1. Keep track of all the b-tree pages accessed while running the transaction.

2. Check the WAL to see whether any previous transaction modified one of those b-tree pages. If so, we have to rerun our transaction.

I've seen it done in software transactional memory (STM) systems as well. It's really beautifully simple, but I think there are a lot of devils in the details.

[1] https://github.com/sqlite/sqlite/blob/9077e4652fd0691f45463e...

[2] https://github.com/sqlite/sqlite/compare/begin-concurrent

[3] * Write to the WAL, so that parallel transactions see a static snapshot of the world.
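If it helps to see the shape of that, here is a toy, purely illustrative model of page-level optimistic concurrency. This is not SQLite's code; every name here is invented, and the real implementation has many more moving parts.

    class Wal:
        """Toy write-ahead log: a list of (commit_seq, pages_written) entries."""
        def __init__(self):
            self.entries = []
            self.seq = 0

        def pages_written_since(self, snapshot_seq):
            changed = set()
            for seq, pages in self.entries:
                if seq > snapshot_seq:
                    changed |= pages
            return changed

        def commit(self, pages_written):
            self.seq += 1
            self.entries.append((self.seq, set(pages_written)))
            return self.seq

    def run_concurrent(wal, txn):
        """txn() runs against a snapshot and returns (pages_read, pages_written)."""
        while True:
            snapshot = wal.seq
            pages_read, pages_written = txn()
            # Conflict check: did a later commit touch any page we read?
            if wal.pages_written_since(snapshot) & set(pages_read):
                continue  # conflict -> rerun the transaction
            return wal.commit(pages_written)  # serialize our pages into the WAL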

The sqlite docs page has a nice article [1] on when to use an embedded database such as sqlite and when to go with a client/server model (postgres, mysql or others)

When not to use sqlite:

- Is the data separated from the application by a network?

- Many concurrent writers?

- Data size > 280 TB

For device-local storage with low writer concurrency and less than a terabyte of content, SQLite is almost always better.

[1] https://www.sqlite.org/whentouse.html

What is the recommendation for offline capability with sync?
We considered PouchDB[1] (client) and CouchDB[2] (server) for a PWA back then (2017). Nowadays I would probably favor WatermelonDB[3].

[1]: https://pouchdb.com/

[2]: https://couchdb.apache.org/

[3]: https://nozbe.github.io/WatermelonDB/

> For device-local storage with low writer concurrency and less than a terabyte of content, SQLite is almost always better.

Isn't MySQL MyISAM faster, and wouldn't it therefore be a better choice for a scientific number-crunching application? I mean a roughly 4GB DB, very simple schema, heavy read load, little/no inserts and no updates.

With 4 GB you might as well just load the data into RAM.
But then you have to implement all the SELECT and DML logic yourself. SQL makes this a breeze with JOIN, ON UPDATE CASCADE, etc. And being SQL, it is very easy to maintain, even by the PFY that replaces you.
SQLite (also H2 and some other embedded SQL databases) can be used entirely in-memory, one can also drop an SQLite file on a RAM-hosted filesystem (tmpfs/ramdrive). You really can put everything into RAM (and still enjoy SQL) if you have enough, don't mind long cold-load and potential data loss.
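For what it's worth, loading an existing file database fully into RAM is a few lines with the stdlib backup API (Python 3.7+); file and table names here are examples:

    import sqlite3

    disk = sqlite3.connect("numbers.db")
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)   # copy every page into the in-memory database
    disk.close()
    print(mem.execute("SELECT count(*) FROM samples").fetchone())  # 'samples' is a made-up table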
It looked to me that the GP was suggesting to keep the data in the application instead of in a DB. But yeah, I suppose he might have meant a HEAP table instead of MyISAM.
I'd argue adding SQL into the mix makes it difficult to maintain, mixed-language codebases are almost by definition complex, and you get significant chafing when mixing a declarative language like SQL and OOP.

Since this is a no-update and no live-insert scenario we're talking about, it's fairly easy to produce code that is an order of magnitude faster than a DBMS, since they're not only primarily optimized for efficiently reading off disk (an in-memory hash table beats a B-tree every day of the week), they've got really unfortunate CPU cache characteristics, and additionally need to acquire read locks.

Maybe this is a failure of imagination on my part, but won't most people be using ORMs? Again, talking about the use case of the average application that's light enough to get away with SQLite, it doesn't seem like you would need to be hand writing queries.
ORMs integrate poorly in many languages, and perform strictly worse than hand-written SQL.

If you're just using the database for object persistence, which is common, it doesn't matter all too much. But that's not really the scenario we're discussing here, since the data is by the original problem statement, immutable.

In my experience ORMs add a layer of complexity, instead of removing one. It's nice to e.g. have a "Pythonic" interface in Python, but when working close to the data I far prefer to write a concise, clear query instead of trying to remember some ORM syntax or what they're calling VARCHARS in this particular ORM, or how they're representing JOINS, or if the condition will be on the ON clause or the WHERE clause, or how they're representing GROUP BY, etc etc.
Wrote code for many years sans ORMs.

Two features I enjoy in ActiveRecord and other ORMs, and why I would consider them a good standard practice for most things that aren't "toy" projects.

1. Easy chainability. In ActiveRecord you can have scopes like `User#older_than_65` and `User.lives_in_utah` and easily chain them: `User.older_than_65.lives_in_utah` which is occasionally very useful and way more sane than dynamically building up SQL queries "by hand."

2. Standardization. Maintenance and ongoing development (usually the biggest part of the software lifecycle) tend to get absolutely insane when you have N different coders doing things N different ways. I don't love everything ActiveRecord does, but it's generally quite sane and you can drop a new coder into a standard Rails project and they can understand it quickly. On a large team/codebase that can equate to many thousands or even millions of dollars worth of productivity.

> I far prefer to write a concise, clear query instead of trying to remember some ORM syntax

100% agree.

ActiveRecord strikes a good balance here IMO. An explicit goal of ActiveRecord is to make it painless to use "raw" SQL when desired.

On non-toy projects, I think a policy of "use the ORM by default, and use raw SQL for the other N% of the time when it makes sense" is very very sane.

DuckDB is the OLAP equivalent of SQLite, as far as I know.
MyISAM is not crash-safe. Anytime your server crashes, the MyISAM database may get corrupted.

Faster, but at what price?

> Isn't MySQL MyISAM faster

I think the performance MySQL has over sqlite comes from its multithreading more than the storage engine.

In my experience sqlite is just as fast as MyISAM for single threaded work.

I dunno, just felt like conventional (since long ago) knowledge that MyISAM is the fastest of all SQL DBs in simplistic non-RAM scenarios. I'm not sure this is true so I ask.
The engine might be faster (I'm not sure) but SQLite has the advantage that it doesn't have to connect over a socket. Instead you load the SQLite library into your code and your application directly manipulates the database files. That's potentially a lot faster.
I wish more "self-hosted" open source projects would support sqlite out of the box. It's honestly a little ridiculous to, for example, stand up postgres for a 1 person blog on a personal domain. Or even a 10 person mastadon or pleroma instance or whatever.

That said, sqlite used 'badly' can be quite frustrating. Home Assistant, for example, is usually set up on an sd card in a raspi and then runs an sqlite database on it that it dumps massive amounts of very redundant data into as json blobs. Pretty common to have it just randomly lock up because the sd card has trouble with that frequency of writes.

I'm using SQLite for my blogging engine (https://goblog.app). Easy backups, easy testing, light resource usage and probably faster than a separate database.
It blew up big time. I would have saved myself lots of trouble if I had just gone with postgres from the get-go.

The workload was simple (single node work tracking) and I didn't expect it to become a bottleneck. Unfortunately, there were some default settings in the storage backend (tiny page size or WAL or something) that caused severe thrashing, and a dearth of tooling to track down the issue. After making a custom build with custom instrumentation and figuring out the problem, I found an email thread where the sqlite community was arguing about this exact issue and the default settings in question. A couple of people had foreseen the exact problem I had run into and suggested a fix. Their concerns were dismissed on the grounds that the problem could be configured away, and their concerns about discoverability of configuration were ignored completely. I wasn't thrilled with the crummy defaults, but seeing that the consequences had been foreseen, considered, and dismissed despite what seemed like widespread consensus on the fix being simple... it really damaged my trust. How many more landmines did SQLite have?

Lack of perf tooling + bad defaults = recipe for pain.

More details please if you have them. What was the throughput? number of transactions? Data size?

If you have that thread, it would be great to see it as well.

It was chugging at ~100 inserts per second on about ~300k rows of <100 bytes each. Just inserts, no index (it was the first thing to go), no foreign keys, one table only. SQLite was spending all the disk bandwidth copying data in and out of a tiny designated working area and flushing. This was in 2013 -- I'm afraid I've forgotten the details. It's an old problem, long fixed by now.

The problem itself didn't concern me nearly as much as "no perf tools + no perf foolproofing." That's a rough combo. If a problem this simple required this much debugging, extrapolations to problems of any complexity are terrifying. I knew that simplicity implied limitations, but this lesson taught me that simplicity could also imply danger.

This is a common problem. Bulk inserts should be done within a single txn if possible. The limit isn't insert throughput, but transaction throughput.

I've commented elsewhere here for docs referencing this problem. It's FAQ#19 on the SQLite website.

Were your inserts based on HTTP requests, or was it more of a batch process? Were they grouped in txns? Obviously user HTTP requests would be harder to group up, but kudos if your service is handling 100 QPS of writes, as that's a pretty rare level of scale, approaching the real "need a beefy concurrent db server" type of problem statement.

Could you tell us more about that configure option?
This happened in 2013. A few years later the fix went through. The bug itself is water under the bridge, what you're hearing is that I'm still leery of "no perf tools + no perf foolproofing." These judgements may be obsolete -- once burned, twice shy.
Don't be afraid of a database process. They are not scary, and are certainly less scary to scale up than whatever you might need to do with SQLite. There's more help available and better tooling.

SQLite may shine in edge cases where you know you can outperform a regular database server and you know why, and you could build everything either way. SQLite could be a way to e.g. decentralize state, using local instances to do local storage and compute before shipping off or coordinating elsewhere.

Otherwise, SQLite can simply be a recipe for lots of lock errors on concurrent operations. I've also never been very impressed with its performance as a general purpose replacement for postgres or MySQL.

> tooling

Yes, I learned this the hard way. I understood that simplicity meant limitations, but I did not understand that simplicity meant danger until SQLite burned me.

If your perf tanks, you don't want to have to spend days putting timers all around someone else's codebase. Caveat: SQLite may be better these days -- my incident happened in 2013 -- but I spent more time tracking that one SQLite issue than I have spent spinning up postgres instances since then.

do you have any details about the situation you were facing?
Simon Willison has written about using SQLite for a "Baked in data" architecture which is a super interesting method for some situations: https://simonwillison.net/2021/Jul/28/baked-data/

As he notes https://www.mozilla.org/ uses this pattern:

> They started using SQLite back in 2018 in a system they call Bedrock ... Their site content lives in a ~22MB SQLite database file, which is built and uploaded to S3 and then downloaded on a regular basis to each of their application servers.

I'm particularly interested in the "Sessions" extension (https://www.sqlite.org/sessionintro.html) and would love to hear if anyone has successfully used it for an eventually consistent architecture built on top of SQLite?
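A rough sketch of that Bedrock-style fetch-on-deploy step, assuming the baked database sits in S3; bucket, key, and table names here are invented:

    import boto3
    import sqlite3

    s3 = boto3.client("s3")
    s3.download_file("my-content-bucket", "site-content.db", "/tmp/site-content.db")

    # Open read-only so a bug can't modify the baked copy.
    con = sqlite3.connect("file:/tmp/site-content.db?mode=ro", uri=True)
    for slug, title in con.execute("SELECT slug, title FROM pages LIMIT 5"):
        print(slug, title)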

I designed something that used a local SQLite database on the client and a remote postgresql instance as the master. It used read and write queues at each end for sync and was eventually consistent.

Unfortunately it was far too advanced for the org and no one else understood it so it was canned in favour of a connected solution under the guise of ubiquitous internet access being available. This is proving to be a poor technical decision so my solution may have some legs yet.

Hey, I actually do the same in a mobile app. I dump everything into a local database; when they are connected to the internet it syncs, which lets it work offline.
If your database is just 22MB, probably even MS Access 2000 will perform adequately.
A slightly unusual use-case but for my work we have our own file format which is a thinly-veiled sqlite database. Originally we used a json file but we moved to sqlite for performance reasons once the files started getting to multi-gigabyte sizes.

It works great - there are ergonomic APIs in most languages, it’s fast and reliable, and great to be able to drop into an SQL shell occasionally to work out what’s going on. A custom binary format might be slightly more optimal in some ways but using sqlite saves so much work and means a solid base we can trust.

Considering Adobe Photoshop use(d) SQLite for its application file format, this could be very far from unusual.
I've used in "production" and as the "primary datastore", but not in the ways those terms are normally used.

1. PHP web development for the client of a client. They needed persistent data and MySQL was not available. Moving to a different webhost was straight up rejected. Used sqlite with Idiorm and it worked just fine.

2. As the local datastore for a cross platform mobile application. The sqlite DB was unique on each device. Libraries were available and worked well.

3. This is a large one. Several 10's of thousands of installs that query the filesystem, but filesystem access is throttled by the vendor. We're using sqlite to store the state of the filesystem as it doesn't really change that much. If the db is damaged or whatever, it can be wiped as it isn't the final source of truth.

When I maintained uptime.openacs.org (https://gitlab.com/siddfinch/uptime) and MyTurl (both running AOLserver) I wrote internal versions for a place I was working at.

I switched from Postgres to SQLite for a couple of versions, but mainly because Postgres wasn't "supported"; I called SQLite an "internal database thing".

Worked flawlessly for about 7-8 years before both services were gobbled up into micro API services.

At the last count, we have about 14,000 services checked by uptime (about 1,000 every 5 minutes, 2,000 every 10 minutes, the rest every 15). Probably had about 60,000 tinyurls in MyTurl. We also ran the MyTurl urls through uptime every night to look for bad links. The system got hammered, often.

It took minor tweaking to get the best performance out of the database, and AOLserver has some nice caching features, which helped to take the load off the database a bit. But overall, it worked as well as the Postgres counterpart.

And now, I have to figure out why I never released the SQLite version of both.

Yes for all my sites: Nomad List, Remote OK, Hoodmaps, Rebase etc. No real issues at all.
I read in one of your Tweets that you use one database file per (unrelated) table to avoid corruption. Why did you move to this model? Are multiple tables per file really more easy to corrupt?
I'd be interested to know what kind of corruption you were facing.
Awesome to hear!

How do you handle this? Do you store the SQLite file somewhere like s3 or just in memory?

How does this work for such high traffic sites?

glad to see pieter here, I am keeping an eye on rebase ;)
I’ve worked on several projects with sqlite, both read and write heavy, all with high concurrency, with databases in the few hundred MB with 400k server clients, and 100 bare-metal servers running at capacity. The sqlite part of our system is never the problem. In our case sqlite has been an alternative to custom files on disk or replacing a spaghetti of hashmaps in memory. we also replaced a single postgresql instance with all customers into many sqlites per customer. Performance and reliability is why I always reach for it first. At this point I’m a zealot and would argue your first ‘data structure’ of choice should be an sqlite database :)
What pain points have you experienced with "many sqlites per customer". I'm considering transitioning to something similar but would love to know what pitfalls I might not be considering.
None really, but we have a fairly simple design where all the shared databases we attach are read-only (think big static lookup tables that a separate process takes care of updating.) I would probably avoid having databases attach contextually -- seems complicated and error-prone.
I'm also in the SQLite is a data structure camp. You get fast access and persistence for free. You can also fix a lot of issues just exploring the database offline.
It is exceptionally great if you don't need parallel writes or have many terabytes of data - ie: for most services out there.

When embedding natively, like in a Rust app, the performance is better than any other RDBMS because there is no network/serialization overhead, and you can use pointers in-process if needed.

The DevOps story also is a dream: typically it is just a single file (optionally + some more for journaling) and setup is automated away (most language libs bundle it already), plus it is widely known since smartphone SDKs and all webbrowsers include/expose it.

A subtle advantage: the supported SQL subset is so small that "if it works in sqlite, it will also work with $RDBMS" in most cases, but not the other way around. I always use it when getting started when in need of relational data, and only had to swap it out for postgres once, but not due to technical/scaling reasons (IT policy change & stuff).

Having said that, it is mind-boggling what kind of load you can handle with a small VPS that runs a Rust microservice that embeds its own SQLite natively... that would be an expensive cluster of your typical rails/django servers and still have worse performance.

One other reason sqlite is great is the reduced latency. When you can satisfy queries in microseconds vs milliseconds, there is a fundamental shift in certain things you might try or not try.

We've been using this stuff in production for over half a decade now. Multi-user, heavily-concurrent systems too. The biggest cost savings so far has been the lack of having to screw with a separate database server per customer install (we do B2B software).

I think SQLite vs. PostgreSQL is similar to Flask+SQLAlchemy vs. Django, or similar debates.

Yeah, you probably can do everything with the "simpler" stack. It might even be nominally faster in many cases. But if there's any chance you're going end up rolling your own type-validation or ORM or admin interface or GIS... Just use the battle-tested kitchen sink from the get go.

We adopted SQLite for a commercial Windows application suite. The architecture allowed users to export, back up and share databases (useful in the context). I've also used it for a large number of developer desktop utility apps. I'd consider it for web apps which have limited data throughput; I'm not convinced it's a good candidate for high data volumes (but open to changing my mind).
I'm using it for a suite of commercial desktop products and it's working out really well. You'll need to figure out your multiple-reader/single-writer connection pools, and graceful shutdowns in your custom server to avoid data file corruption. This is stuff you wouldn't normally do with a db server, but the discovery has made for some great learning and provided food for thought on all kinds of load balancing and distributed db designs. Also started using Fossil, which I really love for my small team.

Sqlite is one of the greatest open source projects in history, with awesome docs, and really is a tribute to the art of programming. I'm happy and honored to use it for the appropriate use cases (which are a lot more than one would think).

BTW, the only corruption I've run into with sqlite is plug-pullers with transaction frames still open, killing a debug session, the usual shenanigans that are improper with basic NTFS.
My solution path for databases has been like this for a good decade:
  1) Sqlite
  2) Self-hosted Postgres
  3) Big Boy Database, with an $$$ cost. (AWS Aurora, Oracle, etc).
Most projects never leave the Sqlite level. Only one has left the Postgres level so far.
I'm using SQLite for several personal projects as well. If you were to migrate to Postgres, how would you go about it? Any tools/services you recommend?
I think a lot of us fall into the trap of expecting that our apps will grow to a huge size, and that we need to be ready to scale just in case.

This is where the "MongoDB is webscale" meme came from.

The truth is SQLite and a single webserver or Docker container will be fine for 95% of web applications.

People really underestimate the advantage of simplicity vs perceived power.

Use SQLite.

I use Sqlite for all my projects, but most of my projects are Windows applications. At prior employers I did work on web applications that used traditional server-based databases.

In my opinion the biggest thing separating Sqlite from a "full blown" database is actually Sqlite's lack of stored procedures. At all of the places where I worked with traditional databases, we used stored procedures to create an ersatz data access abstraction so that the database design could vary independently of the API presented to the application. With Sqlite I find myself (ab)using views as a poor man's stored procedure, but of course that only covers the read-only or "functional" (in the functional programming sense) portion of stored procedure code.
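As a small illustration of that view-as-abstraction idea (table and column names are made up; the point is that application code only ever queries the view, so the underlying tables can change):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("""
        CREATE VIEW IF NOT EXISTS active_customers AS
        SELECT c.id, c.name, COUNT(o.id) AS order_count
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        WHERE c.deleted_at IS NULL
        GROUP BY c.id, c.name
    """)
    rows = con.execute("SELECT name, order_count FROM active_customers").fetchall()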

Everything other commenters have said about data size or centralization also applies, but for me (again, just personal opinion) I'd actually draw the line at the point where you can or cannot get by without stored procedures. From an operational standpoint that would be: at what point is it imperative to be able to vary the details of the database design while maintaining an abstraction layer (stored procedures) that allows application code to be blissfully unaware anything changed underneath it?

Examples of when that would be needed would be if new users + applications start having competing needs, or if you need to revamp your table structure to improve performance or get around a limitation. If you're in a startup or small company, it would be the point at when you find yourselves hiring a real Database Administrator (DBA) rather than giving DBA duties to developers. Prior to that organizational scale you may be better off with the simplicity of Sqlite; after reaching that level of organizational complexity you might need a "real" (server-based) database.

I use it together with Rails and the horizontal sharding feature. Each customer has its own sqlite database running in WAL mode. Since the app is internally used, traffic/writes are pretty predictable.

I also do backups periodically with ActiveJob using `.backup` on the sqlite3 client. It's simple and nice because I just have to worry about running the app, and nothing else.
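For anyone doing the same outside Rails, the equivalent with Python's stdlib looks roughly like this (paths are examples); it uses the same online backup mechanism as the CLI's .backup command, so the source database stays live while pages are copied:

    import sqlite3

    def backup(src_path="tenant-42.db", dst_path="backups/tenant-42.db"):
        src = sqlite3.connect(src_path)
        dst = sqlite3.connect(dst_path)
        with dst:
            src.backup(dst)   # pages are copied while the source stays live
        dst.close()
        src.close()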

Are you doing multi-tenancy in your application by creating a different SQLite database for each customer? I'm curious to know more about your approach.
Would love to know more about how you set this up.
An interesting answer to your question from Tailscale a few weeks ago [1]: sqlite as the main database, many readers read a copy, the copies synced using litestream. [1] https://news.ycombinator.com/item?id=30883015&p=2
We've used it in cloud migrations of light SQL Server workflows which were previously run on shared servers.

We replaced SSMS + SQL Server with Python + SQLite run in AWS Lambda. The jobs fetch the database from S3, update with the latest deltas and write out the database and some CSV files to S3. The CSV files drive some Tableau dashboards through Athena.

The SQL usually needs a bit of a rework to make this work, but for the volumes of data we were looking at (we're talking less than a million rows, jobs run once per day) we've seen good performance at low cost. We used DuckDB for a couple of workloads which needed more complicated queries, it's stupid quick.

Yes, our clustered, real-time configuration file system uses sqlite as sole backing store.

https://pve.proxmox.com/pve-docs/chapter-pmxcfs.html

https://git.proxmox.com/?p=pve-cluster.git;a=tree;f=data/src...

When it was written by our CTO over 10 years ago, he tried every DB solution available (that is, those that somewhat fit the picture); only sqlite survived every test thrown at it. If set up as documented, it handles pulling the power plug in any situation, at least in our experience.

It may need to be noted that the DBs are only used locally, we synchronize commits ourselves via a distributed FSM, that's mostly transforming the Extended Virtual Synchrony corosync provides to simple Virtual Synchrony.

I use it for https://chaitinschool.org/ but it's a fairly small web app with minimal traffic. It's nice to move the data around easily (which is mostly workshop/event data), but if we have more people sign up I might switch to Postgres.
I maintain an 'older' codebase (2012) and am rebuilding it to a new version, but both use SQLite. It's a configuration management web application installed onto either bare metal or virtual machines. Generally only a handful of simultaneous users; I want to say performance isn't much of an issue or concern, but I've had to fix a bug that was caused by too many writes to the database where the system ran into IOPS limits (traditional hard drives or constrained VMs at 100 IOPS).

There is a hacky solution for redundancy; at certain events, a copy of the .db file is made and rsynced to a secondary node. This will probably fall apart if the file ever goes above a few MB in size.

Pros / reasons to use it: Self-contained, just a single file to transfer, no drivers needed, no servers running other than my own application.

Cons: No good support for ALTER TABLE queries, so things like changing the name, datatype, or default value of a column aren't happening. The workaround is to create a new table and transfer rows over, then drop the old table and rename the new table. Also the aforementioned issue if you want redundancy.
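That workaround, spelled out as a sketch (table and column names invented; here an old column key becomes name):

    import sqlite3

    con = sqlite3.connect("config.db", isolation_level=None)  # autocommit; manage the txn by hand
    con.execute("PRAGMA foreign_keys=OFF")  # can't be toggled inside a transaction
    con.execute("BEGIN")
    con.execute("""CREATE TABLE settings_new (
                       id    INTEGER PRIMARY KEY,
                       name  TEXT NOT NULL,
                       value TEXT NOT NULL DEFAULT '')""")
    con.execute("INSERT INTO settings_new (id, name, value) SELECT id, key, value FROM settings")
    con.execute("DROP TABLE settings")
    con.execute("ALTER TABLE settings_new RENAME TO settings")
    con.execute("COMMIT")
    con.execute("PRAGMA foreign_keys=ON")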

So basically, if redundancy isn't a requirement for you, sqlite is fine. It's probably ideal for single user applications, like your browser or apps (iirc sqlite is used a lot for those purposes).

I wrote a command line tool that automates that ALTER table flow (create new table, copy data across, rename and drop old table) - you might find it useful: https://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-...
Slightly off-topic, because we use PostgreSQL on the backend, but because StoryArk is offline-first, we heavily rely on SQLite in our mobile app. The backend database is mostly just there to backup your local data and to sync it across all of your devices. So there aren't that many queries being run on the backend database.

I can't really count how many times I've been pleasantly surprised by how extensive the feature set of SQLite is. I mean, it even has window functions (https://www.sqlite.org/windowfunctions.html). And being able to quickly open up the app's SQLite file in a database browser is also quite helpful during development.

Yes. For http://ht3.org which is a search engine I wrote for tech-related articles. It works really well. It uses the fts5 extension, which allows full-text searching. There are over a million indexed pages and it's no trouble.
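For anyone who hasn't tried it, a minimal fts5 setup looks something like this (assumes an SQLite build with FTS5 compiled in, which most are; the data is made up):

    import sqlite3

    con = sqlite3.connect("search.db")
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)")
    with con:
        con.execute("INSERT INTO pages VALUES (?, ?, ?)",
                    ("https://example.com/wal", "WAL mode", "write-ahead logging in SQLite ..."))
    # MATCH supports phrases, prefix queries like wal*, AND/OR/NOT, etc.
    for url, title in con.execute(
            "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank", ("wal*",)):
        print(url, title)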
I'm also using fts5 for some small projects but i haven't looked too deeply into it so i'm wondering if you have any interesting insights. Like what kind of index/options do you use? Maybe the trigram index? And your "across boundaries" mode is just word* in fts syntax?
> The irritant-free web

This is a very respectable goal. I wish you great success!

Good full text search without pulling in another dependency would be quite a win. I'll add fts5 to my reading list. :)

Out of interest, what kind of compute and storage resources do you have underneath that?

Thank you for the appreciation :)

It’s a linode. Shared cpu Plan.

1 CPU core, 50 GB storage, 2 GB RAM

It’s just $10 per month.

With linode even shared cpus are powerful and I’m yet to hit any overload. I’m sure it will at some point, I might upgrade then.

Question for people using SQLite in prod: how do you cope if your app is running on a platform like Heroku or Cloud Run, rather than a proper server or VM? Have you found a solution for the fact that those environments, and disk, is ephemeral?
Yeah this is an annoying problem. I just run an actual server rather than an "app" platform. I haven't found it that complicated and I pay less monthly which is nice.
Postgres.

About 2 years ago I wrote my own blog engine so I could get up-to-speed with NodeJS: https://andrewrondeau.herokuapp.com/

I would have loved to do SQLite with some kind of "magic" backup, as SQLite is more than enough to handle a personal blog. (It certainly would make development easier!) However, at the time Heroku only offered Postgres.

You could use Render or a similar service that offers persistent storage. I imagine Heroku has a similar feature.

[0] https://render-web.onrender.com/docs/disks

I'd always opt to use their hosted database in those constraints. IMO SQLite only works well if you own the storage, so bare metal (or VMs) only.
I'm running a bunch of different read-only sites and APIs on top of SQLite using Cloud Run and Vercel - abusing the fact that if the database is read-only you can package up a binary DB file as part of a Docker container or application bundle and run it on serverless hosting.

This means it won't cost any money if it's not receiving any traffic, and it can scale easily by launching additional instances.

I wrote about my pattern for doing this, which I call Baked Data, here: https://simonwillison.net/2021/Jul/28/baked-data/

A few examples are listed here: https://datasette.io/examples

At my current company we deploy sqlite as the primary and only database for our server. Our use case is a little less impressive than your usual webscale startups though.

Our product is a self-hosted IoT & hub unit solution, so we have no requirements to work with thousands of users doing who knows what. For our use case, sqlite is perfect. We don’t need to work with millions of rows, don’t need to stress the relatively low-power server units with another long lived network process, have no requirements of authentication since the user owns all the data, and can easily get insights into the database both during development and during troubleshooting at customer locations.

I’d sooner leave the project than move to anything else.

Not exactly, but I did do an iOS app that used Core Data, which has SQLite as its primary datastore. Tricky, because running ad-hoc SQL queries on a SQL database in an emulator or device was not straightforward...
My preferred production DB is PostgreSQL. However, for small experiments, SQLite is more versatile due to fewer dependencies, single binary, zero install overhead etc., so I use it often, in particular for research experiments and systems prototyping. The only thing that ever bothered me was the lack of type enforcement, which has since been improved.

Production uses: 0 (1 if my Ph.D. thesis code is included, which had some C++ code that linked against version 2 of the SQLite library).

I'm coming from a similar direction. Postgres is my go-to, and I love its reliability when you get your schema right.

Glad to hear its type enforcement situation is improving.

For those who've missed the announcement, here are some past links on the topic:
    - https://www.sqlite.org/stricttables.html
    - https://news.ycombinator.com/item?id=28259104
    - https://news.ycombinator.com/item?id=29363054
I am using it as the primary application file format for my desktop application.

With a lot of help from https://www.sqlite.org/appfileformat.html

It has had some pains, but it has been much better than flat-file storage.

This article could help a little: https://www.unixsheikh.com/articles/sqlite-the-only-database...

But you probably won't see it since at the time of writing my response there are already 172 comments.

I'm using SQLite for a small personal project that's live in production and so far I love it (both for its simplicity in development and for its performance).

But I've run into an issue on prod that didn't exist in dev on my MacBook M1, and I'm curious if anyone has any suggestions:

My app is basically quiet and serves requests in the dozens (super easy to run on a tiny instance), but for a few hours a day it needs to run several million database transactions, N+1 queries, etc. Because of the high number of IOPS needed, a small instance falls down and runs suuuuuper sluggishly, so I've found myself needing to shut everything down, resize the instance to something with more CPUs, memory, and IOPS ($$$), doing the big batch, then scaling down again. That whole dance is a pain.

Were I using a more traditional postgres setup, I'd probably architect this differently -- the day-to-day stuff I'd run on Cloud Run and I'd spin up a separate beefy instance just for the daily batch job rather than resizing one instance up and down over and over again. The constraint here is that I have a 50GB+ sqlite db file that basically lives on local SSD.

Any thoughts?

s.gif
Interesting problem. Did you try grouping writes into larger transactions? E.g., instead of a few million individual txns, do a few thousand chunks of ~1,000 operations each. SQLite is much, much faster inside a transaction.

Edit: doing the math, a few million txns over a few hours works out to roughly 500 per second.

According to the FAQ below (question 19), on old HDDs you could expect about three orders of magnitude improvement from using bigger transactions. Not sure how that translates to SSDs, but it's worth a shot; a rough sketch of chunked transactions follows the link.

https://www.sqlite.org/faq.html
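To make the chunking idea concrete, here is a minimal sketch using Python's built-in sqlite3 module; the file name, table, and chunk size are invented for illustration, not taken from the poster's setup:

    import sqlite3

    def insert_in_chunks(rows, chunk_size=1000):
        # One transaction per chunk instead of one per row: SQLite only has
        # to sync to disk once per chunk, which is where the speedup comes from.
        conn = sqlite3.connect("batch.db")
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, payload TEXT)")
        for start in range(0, len(rows), chunk_size):
            with conn:  # opens a transaction, commits on success
                conn.executemany("INSERT INTO events VALUES (?, ?)",
                                 rows[start:start + chunk_size])
        conn.close()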

s.gif
I have not tried that yet (I was being lazy since it all ran so fast on my M1), but that's a good idea for something to investigate. That way I suppose I can run the whole thing on a less-beefy instance and avoid the scale-up/scale-down cycle.
a. I'm surprised no one has mentioned WAL2 + BEGIN TRANSACTION, both of which are in separate branches with the plan to be merged into main.

Even though SQLite can handle 99% of peoples use cases, WAL2 + BEGIN TRANSACTION will greatly close that last 1% gap.

b. Expensify has created a client/server database based on SQLite called https://bedrockdb.com and years ago it was scaling to 4M+ qps https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...

s.gif
I wasn't aware of a plan to merge wal2 and BEGIN CONCURRENT into main. That would be awesome. What is your source for this?
s.gif
Whoops, yes - meant BEGIN CONCURRENT (I can't update my original post).
Used it in several Django projects, up to around 2M pageviews a day but most of that cached read.

I usually drop it because I need something that Postgres has or does better, or it's a write heavy site.

I am using it in production but my product has (almost) no users. Hopefully it will grow. I plan to replace it with Postgres as needed, but it offers some interesting new ways to approach certain problems, and creates interesting new problems like how to scale across multiple app servers? It's so far just a file. I'm sure there is some networking solution. Developing locally is interesting since I can just copy the production database file right to my machine.
Many of the replies here attest to the simplicity and fast performance of SQLite particularly for serving pages or data. But how well does SQLite fare in concurrent write/insert situations?

Although SQLite is not designed for this type of scenario, this discussion highlights that there's a strong demand for a concurrent client/server RDBMS that is simple, performant, and easy to deploy. PostgreSQL is powerful and feature-rich, but not simple or easy to deploy. Hence the appeal of SQLite.

For example, could SQLite power a discussion forum of moderate (or more) activity i.e. users posting comments? The Nim language forum is powered by SQLite, but activity in the forum is fairly low. [1]

Between the simplicity of SQLite and the complex heavyweight that is PostgreSQL, there is a wide gap. It's a shame there is no concurrent RDBMS to fill it.

(Note: Another poster mentions the concurrent Firebird RDBMS as a possible alternative, but I haven't used it. [2])

[1] https://forum.nim-lang.org/

[2] https://firebirdsql.org/en/features/

s.gif
Most SQLite writes take less than a ms, so the concurrent writes scenario isn't actually a big problem - writes can be applied via a queue.

Does your forum accept more than 1,000 writes per second? If not, SQLite should be fine.
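If it helps to picture the queue approach, here is a rough sketch of a single-writer thread in Python; the schema and names are hypothetical and error handling is omitted:

    import queue, sqlite3, threading

    write_queue = queue.Queue()

    def writer_loop():
        # The only thread that ever writes, so SQLite never sees write contention.
        conn = sqlite3.connect("forum.db")
        conn.execute("CREATE TABLE IF NOT EXISTS comments (author TEXT, body TEXT)")
        while True:
            sql, params = write_queue.get()
            with conn:  # each queued write is its own short transaction
                conn.execute(sql, params)
            write_queue.task_done()

    threading.Thread(target=writer_loop, daemon=True).start()

    # Request handlers just enqueue; they never touch the write connection.
    write_queue.put(("INSERT INTO comments (author, body) VALUES (?, ?)",
                     ("alice", "hello")))
    write_queue.join()  # for this demo, wait until the write has been applied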

s.gif
> But how well does SQLite fare in concurrent write/insert situations?

It's pretty well-known that concurrent writes are SQLite's weak point, and that if your application requires a high volume of concurrent writes, it's not the right tool.

The SQLite devs even acknowledge this:

> SQLite will normally work fine as the database backend to a website. But if the website is write-intensive or is so busy that it requires multiple servers, then consider using an enterprise-class client/server database engine instead of SQLite.

> [...] client/server database systems, because they have a long-running server process at hand to coordinate access, can usually handle far more write concurrency than SQLite ever will.

(https://www.sqlite.org/whentouse.html)

We are using SQLite as the primary data store for "file blobs" and implement software-based replication via our queue to multiple servers for redundancy.

The use case is storing trace/profiling data, where we use one SQLite file for each customer per day. This way it's easy to implement retention-based cleanup, and there is also little contention in write locking. We store about 1 terabyte of data over the course of 2 weeks this way.

Metadata is stored in Elasticsearch for querying the search results, and then displaying a trace hits the SQLite database. As looking at traces is a somewhat rare occurrence, we iterate over all fileservers and query them for trace data given an ID until we find the result.

Reference https://www.sqlite.org/fasterthanfs.html

I used SQLite and some custom Python for a client solution: a web application for teaching and thought processes. It was a large application (many steps and user screens, some admin functions), probably a good fit for Django, but I don't like Django. I used Python's SQLite interfaces, running on a single Debian server. The server ran like a tank with no problems whatsoever, but the application had slow performance at times. The client later did a complete rewrite with different consultants, and they started over without SQLite. In summary, the client had no clue what SQLite is or why you would want it, and my efforts to explain the benefits in detail did not sink in, while the slow performance mattered a lot to them and caused them to dislike the entire product. All things considered, I would rewrite it the same way, since I personally enjoyed the tech stack, but debugging those rough spots instead of adding a dozen more GUI features would have been better for the project.
Things worked well at the outset, especially in local development against my NVMe drive for my small CRUD application.

Then, with a little traffic, things continued to go well in production. But as traffic scaled up (to 1-5 QPS, roughly 25% writes), they fell apart. Hard. Because my production environment was spinning rust, IO contention was a real issue and totally absent from development. This manifested as frequent database timeouts, both from reads and writes.

Echoing another commenter's sentiment: things would have gone much more smoothly from the beginning had I started with PostgreSQL, but after having written many thousands of lines of direct SQL taking intimate advantage of SQLite's surprisingly rich featureset, migrating was less than totally appealing.

The mitigation strategy, which ultimately worked out, was to implement backpressure for writes to SQLite: queuing and serializing all writes to each database in the application, failing loudly and conspicuously in the case of errors (thus forcing the client to retry), and gracefully handling the rare deadlock by crashing the process completely with a watchdog timer.

s.gif
So you were using an HDD, not an SSD?

Would an SSD in production have solved the timeouts by increasing your write throughput?

s.gif
According to my testing, SSDs would greatly reduce the contention but were not a complete fix.
I'm working on a self-hosted analytics tool, like Plausible, but one that uses SQLite as the primary database. Of course it's meant for small websites and side projects, but it will have all the Plausible features. Am I crazy?

This is the project: https://github.com/a-chris/faenz

I know a lot of people hate ORMs but I tend to find them useful, and if you do go down that route, it is pretty easy to write an application that will work with either a Postgres or SQlite backend.

I use SQLAlchemy and write applications where, by just swapping out the database URI, I can use either SQLite or Postgres. SQLite is nice for local development and easy testing (you can even run tests using :memory: to accelerate CI/CD), and then I use hosted Postgres in prod. That said, based on what I have seen, I would not be at all afraid to use SQLite in prod for internal tools etc.
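For what it's worth, the URI swap looks roughly like this with SQLAlchemy 1.4+; the environment variable and table are hypothetical:

    import os
    from sqlalchemy import create_engine, text

    # SQLite locally (or "sqlite:///:memory:" in CI), Postgres in production
    # via e.g. DATABASE_URL=postgresql://user:pass@host/dbname
    db_uri = os.environ.get("DATABASE_URL", "sqlite:///dev.db")
    engine = create_engine(db_uri)

    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"))
        conn.execute(text("INSERT INTO notes (id, body) VALUES (:id, :body)"),
                     {"id": 1, "body": "hello"})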

VATcomply.com has been using SQLite as a primary database for 2 years and is serving ~70 requests per second without an issue.

Site: https://vatcomply.com/ GitHub: https://github.com/madisvain/vatcomply

I love sqlite3. In python: https://docs.python.org/3/library/persistence.html

Of course I start with pickle and plain open().

But eventually my projects grow large enough that sqlite3 becomes the database. I have never needed to go beyond sqlite3 in my projects. It does everything I ever need it to do.

Has anyone ever used multiple SQLite databases per tenant/account?

For sake of argument, let's say I have a fixed schema/format that will never change and I never need to aggregate queries across multiple customer accounts. Also, let's say writes to a single database are never going to be more than a hundred concurrent users. Why shouldn't I store each tenant's data in its own SQLite database? It makes it very easy for local client apps to download their data all at once. Backups are incredibly easy. Peer-to-peer synchronization behaves like a git repository merge. Why shouldn't I do this?

s.gif
> a fixed schema/format that will never change

When would this happen for any non-trivial multi-tenant service? The difficulty of performing migrations sounds like it would pretty quickly negate any simplicity gained.

s.gif
Definitely use case specific, as it wouldn't work generally, especially for domain driven data models. The schema I'm thinking of specifically represents a data model that is flexible enough where migrations are extremely rare[1].

Even if there are migrations, it's treated similar to a file conversion (like upgrading an older excel file format to a new format on demand when the file is accessed).

[1] Maybe something similar to https://www.notion.so/blog/data-model-behind-notion or an EAV model embedded in a JSON1 column.

s.gif
If the users are never aware of other users and they only care about their own data, sure. Maybe you have a service where a user can sign up to do a specific task that never requires interaction with another user.

However, if you want interaction across users (messaging, user roles, etc.), then you might want to have them in one database.

I had a great use case for SQLite last year.

The client (an incident response dept at a megacorp) had a problem: their quarterly exercises of switching network storage devices from live servers to disaster recovery (DR) servers were a manual operation of reconciling about 8 Excel spreadsheets and setting up ACLs before (luckily) an automated process would switch the storage mounts from live to DR.

We modeled and matched up all the hosts, servers, and ACLs and did a daily write to a single SQLite database. (We redundantly sent all the data to Splunk.) Now the DR employees are automating a daily diff of servers, hosts, ACLs etc to further automate the switch.

To echo a bunch of comments here, we decided on SQLite for a few reasons:

- only one user would write to the DB
- only a few users need to access the data
- besides standard retention policies, the data could be considered ephemeral and easily recompiled
- the script we wrote to compile the data runs in 5 minutes, so if we lose the db, we can easily recompile it.

SQLite (and SQLAlchemy) is useful for inexpensive data.

https://internetdb.shodan.io is powered by a SQLite database and gets millions of requests a month. It does require a different workflow and custom synchronization scripts but otherwise it's performed well.
s.gif
What kind of workflow and synchronization scripts?

These sound like the kind of hidden costs that could turn SQLite's simplicity into complexity if you don't see them coming.

I used it for a Magic: the Gathering Commander tool I made recently[0]. It was pretty useful, since the point of the tool was on-device card searching and deck organization. I was even able to sync the data between multiple computers just by putting it in a NextCloud folder.

Aside from some surprises regarding packaging it together with the rust crate and inability to rename columns, I'm really happy with it. Easier than deploying postgresql, more useful than documents.

[0] https://github.com/Endominus/Lieutenant

Yes, I use it for builtwithdjango.com. I have not a single bad thing to say about it, not one!

It is so convenient to have it as a file, especially when you are just learning to do software development.

And the performance has not been an issue once.

One thing to note is that my site is not Facebook size. It only gets ~40 page views a day. And most of them are just for viewing, so no database operations.

So, I'm not going to be the most credible voice here. FWIW, I know that Pieter Levels, who runs multiple projects like nomadlist, remoteok, rebase, uses both SQLite and plain JSON files for storage.

I have used SQLite for Django applications with a few thousand users. It has had no problems. However, I just use the ORM and never configure the SQL directly. The vast majority of LAMP stack style web applications would be an ideal use case.

However, I would consider how important the RDBMS features that are not available in SQLite are to you:

- less sophisticated type and constraint system.

- a severely limited ALTER TABLE.

- No stored procedures.

- limited selection of math and statistical functions.

- no permission and user model, not to mention row-level security.

To be clear, I don't think it's bad that SQLite doesn't try to be a full RDBMS, but I would weigh this perspective when making a decision rather than performance, which is great and difficult to max out.

s.gif
SQLite has a pretty powerful permission model via the set_authorizer hook - you can register a callback function which will be called each time SQLite attempts to read data. It's not widely used though from what I've seen.

It's also really easy to add new custom SQL functions to a SQLite connection, which means the missing math functions shouldn't be a limitation. Here's an extension for example: https://github.com/nalgeon/sqlite-stats
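A rough illustration of both features through Python's sqlite3 bindings (set_authorizer, the authorizer constants, and create_function are part of the standard library; the table and column names here are made up):

    import math, sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password_hash TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

    def authorizer(action, arg1, arg2, db_name, trigger):
        # Hide one column: any read of it returns NULL instead of the value.
        if action == sqlite3.SQLITE_READ and arg2 == "password_hash":
            return sqlite3.SQLITE_IGNORE
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)

    # Register a Python function as a custom SQL function.
    conn.create_function("log2", 1, math.log2)

    print(conn.execute("SELECT name, password_hash, log2(8) FROM users").fetchall())
    # -> [('alice', None, 3.0)]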

s.gif
I would add "limited json support" to this list.

Under the hood, SQLite treats JSON as strings; you have to do some strange stuff with json_extract and computed-field indexing to index into it, which can be a bit fragile.

My use case these days is hybrid rdbms / nosql, where most of my tables have defined columns for frequently queried data, and a jsonb "data" field for everything else.

Postgres has impressive jsonb capabilities, and with Postgres 14 the index operators make querying it a dream.

I love SQLite, but Postgres' jsonb handling makes the additional operational overhead worth it to me.
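For reference, the SQLite side of that pattern (a generated column over json_extract plus an index on it) looks something like this; it assumes a SQLite build with the JSON1 functions and generated-column support (3.31+), and the schema is hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (
            id    INTEGER PRIMARY KEY,
            data  TEXT,   -- JSON stored as text
            email TEXT GENERATED ALWAYS AS (json_extract(data, '$.email'))
        );
        CREATE INDEX docs_email ON docs(email);
    """)
    conn.execute("INSERT INTO docs (data) VALUES (?)", ('{"email": "a@example.com"}',))
    print(conn.execute(
        "SELECT id FROM docs WHERE email = ?", ("a@example.com",)).fetchone())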

I use it for a small site (high reads/writes ratio) where I keep different things in different databases (one for users, one for posts, etc).

Works perfectly well. Mind you, I would use Postgresql if the site were important, just to be on the safe side.

I can't comment on my own use of SQLite as a primary database for anything (although the existence of SpatiaLite [1] may lead to me trying this out), but whoever needs an embedded database system should probably consider evaluating Firebird for that role as well -- it has an embedded mode with basically no feature compromises relative to the server mode. (They even put Interbase -- Firebird's ancestor -- in (not only) M1 Abrams' tactical data system apparently [2], for reasons of reliability.)

[1] https://www.gaia-gis.it/fossil/libspatialite/index

[2] http://web.archive.org/web/20190224100905/https://core.ac.uk...

For one project we used sqlite in a GUI application to store some tabular data (usually way below 1GB), and to do some light processing over it. Mind that the performance requirements were minimal, as this was a just an auxiliary function; the program did do a lot of heavy lifting in other parts, but the DB wasn't involved in these (it got fed some messages/stats at best, and most of the data came from other aux functions). The sqlite lib was easy to integrate both on the technical as well as on the legal level. We could have done all that with native code, too (I think we even removed some code?), but it would have consumed much more time (dev work, unit tests, maintenance) without any benefits. And it worked like a charm, except for one issue: The GUI person did create a connection for every operation but thought they were reusing it, which then caused some weirdness. And this was easily fixed.

An important realization is that not everything needs to scale, and that it depends on how you access the DB and what your product looks like. For a load with many concurrent writes I'd be careful with sqlite, or when I know that I'll want my DB to mostly live in memory (e.g. operations will often process the whole, huge dataset and no index can help with that). But even if I thought "Uh, I'll probably need a full DB", I'd still benchmark my application with both sqlite and e.g. postgres. And if the API to access the DB uses some nice abstractions, swapping the flavor of SQL isn't a huge issue anyway.

//edit: Plus, I've done stupid stuff like "my SPA hammers the PHP API with 20 to 40 requests, each resulting in a simple SQLite query, just to render a checklist" and got away with it: a) because we had at most 20 concurrent users [realistically: 1 to 5], b) doing the checklist took half a workday (ticking off an item was done via a JS callback in the background, so the actual rendering happened only once), and c) SQLite performs great for read-heavy loads. The site performed so well (page loads felt about as fast as HN, even when connected via VPN) that I even scrapped the plan to locally cache the checklist in HTML5 localStorage (bonus: no cache = no cache incoherence to care about).

The old(ish) Cyanide implementation of Blood Bowl used SQLite as the data store for all of the game data. It worked, but the game startup time was absolutely dominated by loading data from the database and took too long even on a fast SSD.
Yes, works great for my sites (they are mostly read-heavy). I used to default to Postgresql, now I default to sqlite.

This [0] is a good article with some benchmarks, misconceptions about speed, and limitations.

[0]: https://blog.wesleyac.com/posts/consider-sqlite

s.gif
> SQLite has essentially no support for live migrations, so you need to instead make a new table, copy the data from the old table into the new one, and switch over.

That seems like a pretty big flaw as your data grows. Zero downtime migrations are really nice. Anyone here got a war story / experience with this one?

s.gif
I've used SQLite with HashBackup for 13 years and it's been awesome. A lot of the stories I read about "data corruption" with SQLite are, IMO, really about faulty applications that do database commits with the data in an inconsistent state from the application point of view. I've done it myself - it's an easy sin.

I've migrated database versions 35 times over the last 13 years, ie, every individual HB database has been migrated 35 times. You don't always need to make a new table, do a copy, and switch over. In the latest migration, I added new columns, initialized them, dropped columns, etc. without doing a db copy.

For this migration I wanted to switch to strict tables, where typing is strict. I could have done this by just altering the schema (it's just a bunch of text in the SQLite db) and then using SQL to make sure existing columns had the right data (using CAST). But instead, I created a general framework to allow me to migrate data from one schema to another, mainly so I could reorder columns if I wanted. That can't be done with ALTER statements, so I did end up doing a complete copy, but I've done many migrations without a copy.

I found this paper interesting on "zero downtime migrations".

https://postgres.ai/blog/20210923-zero-downtime-postgres-sch...

After reading it, the bottom line is that changes happen within transactions (true for SQLite too), and the key to zero downtime migrations is to use short transactions, use timeouts, and use retries on all database operations, including the migration commands. You can do all these with SQLite.
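As a rough sketch of that recipe against SQLite (a busy timeout, an additive column change in a short transaction, chunked backfill); the table and column names are hypothetical:

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5)   # wait up to 5s if the DB is busy
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    # Additive schema change in one short transaction.
    with conn:
        cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
        if "created_at" not in cols:
            conn.execute("ALTER TABLE users ADD COLUMN created_at TEXT")

    # Backfill in small chunks so concurrent writers are never blocked for long.
    while True:
        with conn:
            cur = conn.execute(
                "UPDATE users SET created_at = datetime('now') "
                "WHERE rowid IN (SELECT rowid FROM users "
                "WHERE created_at IS NULL LIMIT 1000)")
        if cur.rowcount == 0:
            break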

I use SQLite in production and it works great.

You should understand whichever RDBMS you use, and how to get the best performance out of it. Previously I used Postgres extensively, and it worked fine, and before that I managed MySQL servers. They are all fine, but SQLite is as simple as it gets, and more than adequate for most workloads.

Yes, multiple times. It went/is going great!

Pros:

- A single API server, no separate database to worry about, configure, and update.

- Backups are as simple as backing up one file every so often. SQLite even has an API to do this from a live connection (see the sketch after this list).

- Handles way more concurrent users than we’ve ever needed.

- Dev and test environments are trivial and fast.

- Plenty of tools for inspecting and analysing the data.

Cons:

- There are certainly use cases it won’t scale to, or at least not without a bunch of work, but in my experience those are less than 1% of projects. YMMV.

- The type system (even with the newish stricter option) has nothing on Postgres. I realise this is basically a non-goal but I’d seriously love to somehow combine the two and get PG’s typing in a fast, single file embedded DB library.

- Postgres JSON support is also better/nicer IMO.
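Regarding the backup point above: a minimal sketch of the online backup API through Python's sqlite3 bindings (Connection.backup has existed since Python 3.7; the file names are invented):

    import sqlite3

    src = sqlite3.connect("live.db")
    dst = sqlite3.connect("backup.db")
    with dst:
        src.backup(dst)   # copies the live database page by page from an open connection
    dst.close()
    src.close()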

This isn't a typical use case. FWIW, a decade ago, we used sqlite as the persistence mechanism for an in-memory KV store called membase. (See https://github.com/membase/ep-engine). This was powering 50M+ DAU traffic in production for very intense write-heavy traffic. It did its job well. Around that time we also considered leveldb (and rocksdb a bit later) as alternative to sqlite.
I use it as the data store for my company's Grafana instances. I back up the SQLite store to an S3 bucket with litestream.io, allowing me to treat the servers as 'cattle'. It has worked perfectly without any issues, and saved the cost of a full RDS instance on AWS.
Yes, I've used it for a side project of mine. It processed like 5 financial transactions in total, so I'm glad I never invested the time to build anything more robust :)

It's also powering another one, and I really like the fact that I can just commit the whole DB to the Git repo.

s.gif
Very interesting, what's your workflow? And how big is the DB?

I've never (deliberately) considered committing a DB to git. Although there was that one time when I was straight out of college...

Pro tip: surprising your colleagues in the morning with a 40-minute wait to pull master (because you committed a ???GB db) is a good way to feel like a right eejit.

Yes. For an interesting Desktop project in Java which needed to be fast. Using SQLite with the FTS extension, custom Java extension functions and also some BLOB tables with protobuf worked wonders.

I also built a Go library wrapping the SQLite amalgamation file and then cross-compiled it for Android and iOS, but with some additional SQLite extensions (GIS) which the stock Android/iOS SQLite did not have. This was some time in 2017, I guess.

I am a big fan of SQLite. You can integrate it in all kinds of stuff and adapt it to your needs. Compiling it is also straightforward.

I'm using SQLite in Notion Backups (most of the workload happens in background jobs; the web app itself doesn't get that many visits)

Except for some rare exceptions, it's been doing pretty great. I don't have any plans to migrate from SQLite any time soon.

s.gif
Any chance I could convince you to share some of those 'rare exceptions'? I love a good exception ;)

Also they're where the real insights are.

s.gif
Lately, Rails has been complaining about connection timeouts, stating that all connections in the pool were in use (this usually happens when Sidekiq, a background jobs framework, processes multiple long-running jobs).

By default, the connection pool in Rails contains 5 connections, and they time out within 5 seconds.

I think that iOS CoreData is just SQLite under the hood so I imagine a lot of iOS apps use it as the main data store
I use sqlite and litestream for all of my projects, deployment very simple, performance is great, litestream backups to backblaze b2 cost pennies per month
It's something I've been meaning to try for a long time. I had https://litestream.io/ in mind as a means to achieve a "Fully-replicated database".
FWIW, I've never once started with SQLite and then later "upgraded" to another DB. I once actually used SQLite to implement locking for a MySQL DB, because MySQL's locking requires (or required, back then) that the calling code specify, in advance, every table which would need locking, and that wasn't possible in that code base. So an SQLite DB connection was opened just to act as a mutex for the MySQL DB in some code paths.
SQLite is a horrible database in comparison to Postgres, MSSQL, MySQL and so on. There is no reason to use it as a primary DB. There are many other areas where SQLite really shines, though.
I used to work for a mobile app company that made educational apps. We used SQLite as our primary data store on the client and server side. Our data was very small, think the question-and-answer content for flash cards. It worked great.
Yes, absolutely, and for many projects.

..but like all things, it depends on your needs. Some have already pointed out the pages on SQLite's on site regarding # of writers (the main issue), etc.

Currently using SQLite3 as the backend db of www.rtljobs.com.

I love it - very robust, lots of documentation, StackOverflow answers, example queries, etc.

I get fewer than 50 users per day globally, so I don't really have to worry much about concurrency or other issues that SQLite struggles with. I'd wager that many web applications are perfectly well served by it.

When I first learned of it, roughly 15 years ago, I thought it was an amazing idea and converted my personal website to use it. The performance at the time was lackluster, and I shortly rolled back to MySQL.

My manager currently runs a number of personal sites with a SQLite backend and they all seem very performant so I have been honestly considering giving it a second look.

I use sqlite for a flashcard app to help kids with multiplication. It's running on an ec2 micro instance, because I'm trying to see how long I can stay in the AWS free tier.

Everything is good so far, though most of my traffic is bots probing for wordpress flaws.

I love SQLite and use it for a lot of my harebrained ideas to see if they'll take off. They never do, so I never out grow SQLite. :P
I wrote a travel blog for a trip I'm currently on using Laravel and an SQLite database. The only people writing to it are my partner and I so that's not an issue. The CPU on my $5 VM would probably bottleneck trying to serve traffic before the database would.
With C# (Entity Framework) I've tried and failed using sqlite in a production scenario because of concurrency issues...

Recently I stumbled over a potential fix[1], which I will try in my next project.

[1] https://ja.nsommer.dk/articles/thread-safe-async-sqlite3-ent...

s.gif
Ouch, that sounds like a nasty bug. But if I understand correctly it's more the driver/ORM's problem than sqlite's?

One to look out/test for early if I go in this direction though. Thanks for the heads up!

s.gif
> But if I understand correctly it's more the driver/ORM's problem than sqlite's?

I think so. SQLite is pretty reliable in my opinion and I never had these issues anywhere else, but this bug made me switch my DBMS for a small side project from SQLite to Postgres.

> One to look out/test for early if I go in this direction though.

You're welcome!

I use it for production backups. There's a sqlite db in each compressed archive to store metadata about the backup.

It's probably elsewhere but I don't realize it.

Side question: what’s something as simple as SQLite, but more of an unstructured key-value store?

I’ve been using (locally) a Redis container for a very early prototype because it seems to be simple enough to use.

I know you can query JSON strings in SQLite, but that's not quite the same thing. For one, Redis offers some geo features.

s.gif
I just make key-value store tables and write a small interface to simplify access in H2 or SQLite.
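Something like this, presumably; a rough Python sketch of a key-value table with a small wrapper (the interface and file name are invented):

    import json, sqlite3

    class KV:
        def __init__(self, path="kv.db"):
            self.conn = sqlite3.connect(path)
            self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

        def set(self, key, value):
            with self.conn:
                self.conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                                  (key, json.dumps(value)))

        def get(self, key, default=None):
            row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
            return json.loads(row[0]) if row else default

    store = KV()
    store.set("user:1", {"name": "alice"})
    print(store.get("user:1"))   # {'name': 'alice'}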
s.gif
But why not just use Redis (which is almost surely faster even vs. SQLite in-memory) instead of creating key-value tables?
Yes, for sniprss.com. The backend is written in Go and uses Litestream for replication.

No issues at all.

s.gif
How do you deploy your application? Is it possible to do without downtime?
I'm using SQLite as the primary database for PlotPanel [1]! It's going great with no unpleasant surprises along the way.

[1] https://plotpanel.com

s.gif
I’d be curious to hear how you deploy new versions of your application with SQLite. Can it be done without downtime?
Yes! I use it for https://webtoapp.design. But it's not really impressive, as my DB is just 3 megabytes, haha.
s.gif
Don't knock it - if it works and only costs 3MB that's a win.

The longer you can scale the product without having to scale the application, the better!

Expensify are doing this to a certain extent and have written about it before: https://news.ycombinator.com/item?id=23291779 There have been a lot of discussions about this on HN over the years too.
When I was Syncplicity's desktop client lead, we used SQLite. (Syncplicity is a desktop file synchronization application.)

When I looked around, Dropbox used it too; and so did Bittorrent Sync (Now Resilio)

I do not quite understand the premises:

> Given the complexity

Which complexity? It is the simplest possible widespread, reliable and effective solution. Which makes it a primary choice.

> it seems like there are use cases or needs here that I'm not seeing

On the contrary, the use cases for the traditional Relational DB engines are defined: when you need a concurrency manager better than filesystem access. (Or maybe some unimplemented SQL function; or special features.) Otherwise, SQLite would be the natural primary candidate, given the above.

Edit:

I concur about https://blog.wesleyac.com/posts/consider-sqlite being a close to essential read if one has the poster's doubt.

To its "So, what's the catch?" section, I would add: SQLite does not implement the whole of SQL (things that come to mind on the spot are variables; the possibility of recursion was implemented only recently, etc).

s.gif
> ... the possibility of recursion was implemented only recently, etc).

Noting that "recently" was August 2014 (version 3.8.6), according to https://sqlite.org/oldnews.html.

The "missing" right/full join types just hit trunk this past week and are still being debugged: https://sqlite.org/src/info/f766dff012af0ea3

s.gif
> Noting that "recently" was August 2014

Right. As an issue, this has little to do with deployment the way the submitter intended; it has to do with software that was built against specific static versions of SQLite and that some people still use because no better alternative has emerged since. The "delay" is more evident (more "daily present") to those users, and has little to do with new products.

Nonetheless, given that, I would first check which subset of SQL you may need; I am not sure how much overlap there is with other mainstream products.

My workplace does. It works surprisingly well and has completely changed what I think about SQLite and databases in general. SQLite is more than enough for almost all production needs.
So long as you are not trying to access the DB with more than one process, SQLite scales as far as most DBs on a single instance.

The only issue is that you'll need to take special care when backing up the DB file (but this is probably the same for most DBs even today.)

s.gif
Note that this is specifically for write/modify operations. You can perform multiple SELECTs in parallel, so depending on the workload this may be acceptable. As the devs say: "Experience suggests that most applications need much less concurrency than their designers imagine."

https://sqlite.org/faq.html#q5

s.gif
It's my understanding that, especially with WAL mode, it's fine to read and write from multiple processes (I hope so, because I do this); it's just that any write locks the entire database.
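For anyone setting this up, the usual knobs look roughly like this from Python; the file name is hypothetical:

    import sqlite3

    conn = sqlite3.connect("shared.db", timeout=5)   # block up to 5s instead of failing immediately
    conn.execute("PRAGMA journal_mode=WAL")          # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=5000")         # same wait, enforced inside SQLite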
rqlite author here, happy to answer any questions. One thing I've noticed is a trend towards folks doing bitcoin mining (and related applications) wanting to use rqlite. I think they like that it is very easy to run, and gives them complete control over their data.

https://docs.google.com/presentation/d/1Q8lQgCaODlecHa2hS-Oe...

s.gif
I only very briefly looked into rqlite. It's very interesting, but if I understand it correctly, it's also not geared toward a write-heavy workload (all writes are routed to the same node).

I.e., it leans more toward moderate-but-reliable writes and heavy-read use cases?

Please let me know ~if I'm missing anything~ what use cases I'm missing.

s.gif
That's correct. rqlite replicates SQLite for fault-tolerance and high-availability, not for performance. In fact, performance takes a hit, for the reasons you state. But it's no worse (nor better) than something like, say, etcd or Consul.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#what...

Are there any blog posts that details, soup-to-nuts, how to deploy a webapp with sqlite on something like AWS or GCP or Render?
s.gif
You could provision an EC2 instance and mount some persistent storage. I'm not sure if it would make a lot of sense to take advantage of EKS or GKE.
Yes; I used SQLite and failed miserably at multi-process access to the same DB file. There was no DBMS on top of it, so that was to be expected. I moved to Postgres.
It depends on the application you are developing. If you are doing microservices, separating the database from the application is a must.
s.gif
Some microservices will need persistent local data that is only used by the microservice, and SQLite can be a good fit there.
I wrote a stock trading strategy backtester for a client in Go and SQLite and it read 100k rows/sec on a cheap consumer grade SATA SSD.
Yes, for a project used internally in my company, 300k rows per week. So far it’s going great. If things get too slow I’ll migrate to Postgres.
Yes, via Bedrock (bedrockdb.com) for Streamie (streamieapp.com). No complaints.
Yes, it's great when you want to get started quickly on a single machine.
Grafana uses SQLite as its default database
I no longer work there, but an enterprise facial recognition system used by NGOs and 3-lettered government agencies has SQLite as the sole datastore. I wrote a portion of the SQLite runtime logic, a simple key/value store used all over the software.

SQLite proved to be phenomenal. We spec'ed hardware with enough RAM to hold the FR DB in memory, and damn, SQLite is fast enough to keep up with the optimized FR system performing 24M face compares per second. With a 700M face training set, SQLite also proved instrumental in reducing the training time significantly. These days, if given the opportunity to choose a DB, I always choose SQLite. I use SQLite for my personal projects, and I go out of my way to not use MySQL because SQLite is so much faster.

s.gif
Impressive numbers, thanks for sharing.

Out of interest, were you running on bare metal/cloud? And what kind of CPU was behind those 24M face compares per second?

s.gif
Running on bare metal, and those numbers come from a 3.4 GHz i9. The system is a fully integrated single executable, with embedded SQLite. Since I left the firm a year ago, new optimizations have the facial compares down to 40 nanoseconds per face.