35

Complex Queries in SQL

 4 years ago
source link: https://www.tuicool.com/articles/Anue2m7
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Texts on SQL are good at providing the basic templates of SQL syntax, but sometimes the queries in those books are a little idealised compared to real life, with no more than two or three tables to join and a small number of fields to deal with. As a result, the queries themselves are usually no more than a dozen lines long.

It often happens in the wild that you end up joining across many tables with long winded ‘WHERE’ clauses that are easy to lose track of. These are hard enough to use the first time you employ them, but can be impossible to debug and understand if you need to re-use them. What can be done to make long and complicated queries easier to write and easier to read?

Firstly, though, ask yourself whether you need a long, complex query at all. Is it vital that your single query does all the work? If you still think it is, ask yourself at least two more times. In the excellent SQL Antipatterns , using a single query to solve a complex task or multiple tasks is identified as a key anti-pattern of queries, underlining its status as something to be wary of. One particular danger that this book identifies is that as the query becomes more difficult to follow, the risk of accidentally making a Cartesian join, taking a protracted period of time to return many unwanted rows grows enormously.

Hence the crucial piece of advice is to check whether or not the large query could be performed with more than one smaller easier to understand queries.

If you think that the complex query you are writing is completely unavoidable, there are a few ways to make it a less painful process.

  1. Specify the column names within the query; don’t fall into the ‘SELECT *’ trap. A complex query that doesn’t specify which columns to return is likely to return columns that aren’t needed, making it harder to find the information you are looking for. Additionally, there is a decent chance of returning redundant columns.
  2. Format your query carefully to make it as readable as possible for another human being. Put each field to be selected on its own line, and put each new table joined onto on its own line, and each element of a ‘WHERE’ clause on its own line.
  3. Use meaningful aliases for tables to further aid readability. You are likely to need to refer to your tables multiple times within your query, and if you are working within someone else’s database with their naming convention, the table names can contain redundant information about they refer e.g. ‘MyDBTable1’ — so you call a table like that ‘T1’ as you already know you’re working within ‘MyDB’
  4. Create a redundant ‘Where’ clause e.g. ‘WHERE 0=0’ to enable easy testing of different parts of the WHERE clause (h/t former colleague Mark Watson). That is, because the ‘WHERE 0=0’ is automatically sitting at the beginning of the ‘WHERE’ clause, you can turn the rest of it on and off (e.g. comment it out) without a syntax error to check the effect.
  5. Plan your query, potentially making a sketch on paper so that you know what you are trying to achieve in terms of where your data is located and how the tables in your query relate to each other before you start converting it to code.

The end result of taking these steps is that your code will look a bit like this:

SELECT
        Var1
      , Var2
      ...
      , VarNFROM  
          Table1  T1
     JOIN Table2  T2  on  T1.Var1=T2.Var1
     JOIN Table3  T3  on  T1.Var1=T3.Var1
     WHERE  0=0
     
AND    T1.Var1 > 0

Overall, it is preferable to have more small queries rather than fewer large queries. Some database strutctures make this difficult however, for example if the you need to join on to get a secondary key to find what you are really need from table B. In these cases, planning and careful formatting can assist avoiding your code turning to spaghetti.

Robert de Graaf has written a number of pieces on SQL for Data Scientists, including most recently, SQL For Missing Values .

Robert de Graaf’s book, Managing Your Data Science Projects , is out now through Apress.

Follow Robert on Twitter


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK