source link: https://github.com/pingcap/awesome-database-learning
Awesome Database Learning
A list of learning materials for understanding database internals, including but not limited to:
- papers
- blogs
- courses
- talks
Please submit a pull request if there is any material that you think should be included in this collection.
Table of Contents
- Recommended Courses and Books
- SQL & Relational Algebra
- Query Optimizer
- Query Execution
- DDL
- Transaction
- Network
- Storage
- Serializing & RPC
- Data Partitioning
- Replication & Consistency
- Consensus
- Scale & Balance
- Benchmark & Testing
Recommended Courses and Books
Courses
- CMU Database Systems (15-445/645), thanks to Andy Pavlo
- CMU Advanced Database Systems (15-721), thanks to Andy Pavlo
- UC Berkeley Introduction to Database Systems
Books
- Stanford Database Systems: The Complete Book
- Designing Data-Intensive Applications, Chinese translation
- Database Internals
SQL & Relational Algebra
Courses:
- CMU Database Systems (15-445/645), thanks to Andy Pavlo
- UC Berkeley Introduction to Database Systems
  - Introduction + SQL I
  - SQL II
  - Relational Algebra
Query Optimizer
Blogs:
- 数据库内核杂谈 (Database Kernel Miscellany), thanks to 顾仲贤
- SQL优化器原理 - 查询优化器综述 (SQL Optimizer Principles - A Survey of Query Optimizers), thanks to 勿烦
Planner Models
Blogs:
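- 数据库内核杂谈 (Database Kernel Miscellany), thanks to 顾仲贤
- SQL 查询优化原理与 Volcano Optimizer 介绍 (SQL Query Optimization Principles and an Introduction to the Volcano Optimizer), thanks to 张茄子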
- Cascades Optimizer, thanks to hellocode
Papers:
- 1979, Access Path Selection in a Relational Database Management System, SIGMOD
- 1986, Query Processing in Main Memory Database Management Systems, SIGMOD
- 1987, Query Optimization by Simulated Annealing, SIGMOD
- 1988, Grammar-like Functional Rules for Representing Query Optimization Alternatives, SIGMOD
- 1993, The Volcano Optimizer Generator: Extensibility and Efficient Search, ICDE
- 1995, The Cascades Framework for Query Optimization, IEEE Data Engineering Bulletin
- 1998, An Overview of Query Optimization in Relational Systems, PODS
- 2001, LEO – DB2’s LEarning Optimizer, VLDB
- 2004, Robust Query Processing through Progressive Optimization, SIGMOD
- 2014, Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD
- 2016, Parallelizing Query Optimization on Shared-Nothing Architectures, VLDB
- 2016, The MemSQL Query Optimizer: A modern optimizer for real-time analytics in a distributed database, VLDB
Subquery Optimization
Blogs:
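- SQL 子查询的优化 (Optimizing SQL Subqueries), thanks to Eric Fu
- Calcite 子查询处理 - I (RemoveSubQuery) (Calcite Subquery Processing, Part I), thanks to 一只无情的小猫咪
- Calcite 子查询处理 - II (Decorrelate) (Calcite Subquery Processing, Part II), thanks to 一只无情的小猫咪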
Papers:
- 2001, Orthogonal Optimization of Subqueries and Aggregation, SIGMOD
- 2009, Enhanced subquery optimizations in Oracle, VLDB
- 2015, Unnesting Arbitrary Queries, BTW
Join Order Optimization
Papers:
- 2006, Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products, VLDB
- 2015, How Good Are Query Optimizers, Really?, VLDB
- 2018, Adaptive Optimization of Very Large Join Queries, SIGMOD
Functional Dependency & Physical Properties
Thesis:
Papers:
- 1996, Fundamental Techniques for Order Optimization, SIGMOD
- 2004, An Efficient Framework for Order Optimization, ICDE
- 2010, Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer, ICDE
Cost Model
Papers:
- 1996, Modelling Costs for a MM-DBMS, in Real-Time Databases
- 2014, Approximation Schemes for Many-Objective Query Optimization, SIGMOD
- 2015, Multi-Objective Parametric Query Optimization, VLDB
Statistics
Papers:
- 2003, The History of Histograms, VLDB
- 2005, An Improved Data Stream Summary: The Count-Min Sketch and its Applications, Journal of Algorithms
- 2007, New Estimation Algorithms for Streaming Data: Count-min Can Do More
- 2017, Adaptive Statistics in Oracle 12c, VLDB
Books:
Query Execution
Execution Framework
Papers:
- 1994, Volcano - An Extensible and Parallel Query Evaluation System, IEEE Transactions on Knowledge and Data Engineering
- 2014, Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age, SIGMOD
Vectorization vs Compilation
Blogs:
- Overhead of a Generalized Query Execution Engine, from The Pivotal Engineering Journal, thanks to the Pivotal Engineering team
Papers:
- 2005, MonetDB/X100: Hyper-Pipelining Query Execution, CIDR
- 2011, Efficiently Compiling Efficient Query Plans for Modern Hardware, VLDB
- 2017, Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last, VLDB
- 2018, Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask, VLDB
- 2018, Adaptive Execution of Compiled Queries, ICDE
Join
Papers:
- 2013, Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited, VLDB
- 2017, Looking Ahead Makes Query Plans Robust, VLDB
Hash Table
Blogs:
- Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo), thanks to Malte Skarupke
- All hash table sizes you will ever need, thanks to Database Architects - Thomas Neumann
DDL
- 2013, Online, Asynchronous Schema Change in F1, VLDB
Transaction
Isolation Levels
Blogs:
- 一致性模型 (Consistency Models), thanks to siddontang
Papers:
- 1995, A Critique of ANSI SQL Isolation Levels, SIGMOD
Concurrency Control
Courses:
- CMU Database Systems (15-445/645), thanks to Andy Pavlo
- CMU Advanced Database Systems (15-721), thanks to Andy Pavlo
Papers:
- 2012, Serializable Snapshot Isolation in PostgreSQL, VLDB
- 2015, Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems, SIGMOD
- 2017, An Empirical Evaluation of In-Memory Multi-Version Concurrency Control, VLDB
- 2019, Scalable Garbage Collection for In-Memory MVCC Systems, VLDB
Network
Courses:
- CMU Advanced Database Systems (15-721), thanks to Andy Pavlo
Papers:
- 2016, The End of Slow Networks: It's Time for a Redesign, VLDB
- 2016, Accelerating Relational Databases by Leveraging Remote Memory and RDMA, SIGMOD
- 2017, Don't Hold My Data Hostage: A Case for Client Protocol Redesign, VLDB
Storage
Buffer Management
Courses:
- CMU Database Systems (15-445/645), thanks to Andy Pavlo
Papers:
- 1987, The 5 Minute Rule for Trading Memory for Disc Accesses and the 10 Byte Rule for Trading Memory for CPU Time, SIGMOD
- 2008, The Five Minute Rule 20 Years Later and How Flash Memory Changes the Rules, ACM Queue
- 2018, Managing Non-Volatile Memory in Database Systems, SIGMOD
- 2018, LeanStore: In-Memory Data Management Beyond Main Memory, ICDE
- 2020, Umbra: A Disk-Based System with In-Memory Performance, CIDR
Disk IO
Blogs:
- On Disk IO, Part 1: Flavors of IO, thanks to Alex
- On Disk IO, Part 2: More Flavours of IO, thanks to Alex
- Read, write & space amplification - pick 2, thanks to Mark Callaghan
Papers:
- 2016, Design Tradeoffs of Data Access Methods, SIGMOD
- 2016, Designing Access Methods: The RUM Conjecture, EDBT
B-Tree
Courses:
LSM-Tree
Serializing & RPC
Data Partitioning
Replication & Consistency
Consensus
- University of Cambridge, Distributed consensus revised, a great paper about consensus, especially Paxos and Paxos-related algorithms, by Heidi Howard
Scale & Balance
Blogs:
Benchmark & Testing