4

SQL is (almost) always superior to NoSQL

 1 year ago
source link: https://dev.to/polterguy/sql-is-almost-always-superior-to-nosql-361g
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Myth busting (10 Part Series)

Saying the above out loud is almost like publicly admitting you've got lepra today, because all the hype seems to be geared towards NoSQL and document based database systems, while RDBMS types of systems doesn't seem to get much love. However, somebody needs to say it out loud, and it might as well be me. To illustrate my point let's examine what you can do with SQL that's simply not possible with most NoSQL systems.

  • You can create join statements
  • You can extract meaningful statistics
  • Partial record updates
  • You've got referential integrity
  • Etc ...

Let's go through the above features one at the time to see the power of such features. To make this as much "hands on" as possible, please do the following first please.

  1. Register a Hyperlambda cloudlet here
  2. Install the SQLite Chinook DB plugin from the "Management/Plugins" section - Screenshot below

Installing Chinook database

Then go to "Tools/SQL Studio" and click the load button, for then to choose the SQL script called "chinook_aggregate_profitable_countries". Make sure you choose the "chinook" database and execute the script. The result should resemble the following.

Profitable countries

Doing something such as the above with MongoDB, Couchbase, CosmosDB or DynamoDB could easily turn into a 6 months long software development project by itself. We did everything in less than 5 minutes, assuming you followed the hands on parts. And the result was that we now know which countries are the most profitable countries in regards to music buyers according to the Chinook database. Below is the SQL for reference purposes.

select BillingCountry, round(sum(Total),2) as Revenue
  from Invoice
  group by BillingCountry
  order by Revenue Desc;

Statistics such as the above often depends upon combining joins with group by constructs, and aggregates. Now try to perform a join statement in your CosmosDB installation to extract similar statistics. I'll just eat my popcorn in the meantime ...

Partial record updates

OK, this is only a "partially true argument" since (some) NoSQL systems actually supports partial record updates - However, the last time I checked, CosmosDB did not support it, and unless you've got it, you'll need to lock database records somehow as you're updating them. There are 3 different techniques for locking database records, and they're all disastrous in regards to implementation details, and/or the resulting software.

  1. Optimistic locking
  2. Pessimistic locking
  3. Ignore locking

To start out with the worst, ignoring locking implies you are 100% destined to experience race conditions in the future. Having race conditions on your database, is a guarantee of making sure your data becomes garbage over time.

Going for optimistic locking at best explodes the complexity of your codebase. Going for pessimistic locking ensures that you'll go insane if one of your employees goes on holiday with a web form open on his computer, in addition to that it explodes the complexity of your code of course. I wrote about database locks a year ago at DZone in case you're interested.

With partial record updates, the problem simply vanishes more or less. To understand why, read the above article for details. Some NoSQL database systems do support partial record updates, such as CouchBase if I remember correctly, but most struggles with these kind of constructs.

Referential integrity

This one is the big one (pun!). Most NoSQL database systems poses restrictions upon you in regards to the size of your documents. I think the maximum size in Cosmos is 2MB. The mantra for NoSQL believers of course is as follows ...

You don't need referential integrity because you can just store everything in one document

"Good luck with that when the maximum document size is 2MB" - Is my answer to these people :D

Maybe this is why Twitter only allows you to use 140 characters ...? ;)

Besides, storing everything in one document implies zero normalisation, making it almost impossible to update stuff, where multiple documents are having the same value, and that value needs to be changed over time.

Things you can't do with NoSQL

Below is a list of SQL statements that are more or less impossible to execute in NoSQL database systems.

Counting records from artists

select ar.Name, count(*) as count
  from Album al, Artist ar where al.ArtistId = ar.ArtistId
  group by al.ArtistId
  order by count desc
  limit 25

The above counts how many records all artists have released.

Calculate audio format profitability

select mt.Name, count(mt.MediaTypeId) as Amount 
  from MediaType mt
    inner join Track t on mt.MediaTypeId = t.MediaTypeId
  group by mt.MediaTypeId
  order by Amount desc;

The above calculates how much profit was made from each audio format.

Get info about genre consumers

select distinct c.Email, c.FirstName, c.LastName, g.name
  from Customer c
    inner join Invoice i on c.CustomerId = i.CustomerId
    inner join InvoiceLine ii on i.InvoiceId = ii.InvoiceId
    inner join Track t ON ii.TrackId = t.TrackId
    inner join Genre g ON t.GenreId = g.GenreId
  where g.Name = "Rock"
  order by c.Email

The above finds the names and emails of all customers that are listening to rock music.

Count songs by artists

select ar.Name AS ArtistName, count(t.TrackId) as TrackCount
  from Track t
    inner join Album al on t.AlbumId = al.AlbumId
    inner join Artist ar on al.ArtistId = ar.ArtistId
    inner join Genre g on t.GenreId = g.GenreId
  where g.GenreId = 1 
  group by al.ArtistId 
  order by TrackCount desc
  limit 10

The above counts how many songs each band has created, and orders such that the most productive bands are listed first.

Conclusion

Every now and then you really need a NoSQL database system. There are problem domains where document based databases are better. However, 99.99999999% of the time a relational SQL based database is simply a bajillion times better. My rule of thumb is as follows ...

Am I to create Twitter, Google or Facebook? If the answer is no, I'm using SQL ... ;)


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK