

Databricks Unity Catalog primary key and foreign key constraints are not enforce...
source link: https://datasavvy.me/2023/11/21/databricks-unity-catalog-primary-key-and-foreign-key-constraints-are-not-enforced/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Azure, Databricks, Microsoft Technologies, Unity Catalog
Databricks Unity Catalog primary key and foreign key constraints are not enforced
I’ve been building lakehouses using Databricks Unity catalog for a couple of clients. Overall, I like the technology, but there are a few things to get used to. This includes the fact that primary key and foreign key constraints are informational only and not enforced.
If you come from a relational database background, this unenforced constraint may bother you a bit as you may be used to enforcing it to help with referential integrity. It is still otherwise useful when paired with data exploration or analytics tools that can detect the constraints and use them to help a user write a query or build a semantic model.
Let’s look at an example of the consequences of primary key constraints not being enforced.
First, let’s create a new catalog and schema in a Unity Catalog metastore.
Create Catalog devcatalog1;
Use catalog devcatalog1;
Create schema myschema;
Next, we create a table with a primary key constraint and insert two rows.
CREATE TABLE IF NOT EXISTS devcatalog1.myschema.table1
(id Int NOT NULL, columnB String, CONSTRAINT table1_pk Primary Key(id));
INSERT INTO TABLE devcatalog1.myschema.table1
VALUES
(1, "one"),
(2, "two");
Then we can create a second table that has a foreign key constraint that goes back to the first table.
CREATE TABLE IF NOT EXISTS devcatalog1.myschema.table2
(table2id INT NOT NULL Primary Key, table1id INT NOT NULL, EventDate date NOT NULL,
CONSTRAINT table2_table1_fk FOREIGN KEY(table1id) REFERENCES devcatalog1.myschema.table1);
INSERT INTO TABLE devcatalog1.myschema.table2
VALUES
(1, 1, '2023-06-15'),
(2, 1, '2023-06-15'),
(3, 2, '2023-06-15');
If we then insert a new row into table1 that has an id value of two, we are violating our primary key constraint. But nothing happens (no error is thrown) because it is unenforced.
INSERT INTO TABLE devcatalog1.myschema.table1
values (2, 'three');
Querying table1 shows our duplicate values in the id column.
Now we don’t know which row the foreign key constraint in table2 was supposed to reference.
What do we do now?
Unenforced constraints are not the end of the world. Many people argue that referential integrity should be maintained in the business logic layer code that populates the tables rather than in the database engine. Enforced constraints in the database are a failsafe when the code that updates the tables fails to maintain that referential integrity. For now, we just have to make sure that the way we populate the tables does not allow us to violate these constraints, even when Unity Catalog doesn’t stop us. You may have to check whether rows exist before inserting new ones. And you may need data quality checks after batch loads to make sure something wasn’t missed. But you likely needed those things anyway. This is definitely something that can be worked around. And it’s likely an issue in many MPP systems, so it is something to think through and have a design pattern ready to handle.
Note that as of November 2023, primary key and foreign key constraints are in preview in Databricks, so there may be more enhancements to come.
Get the notebook
If you need to explain this concept to someone else, feel free to grab my example notebook from Github.
Tagged DCAC
Recommend
-
23
In this post, we introduce a configuration option that controls whether replication channels allow the creation of tables without primary keys. This continues our recent work on replication security, where we allowed user...
-
8
Copy link Contributor benedikt c...
-
6
Code is available on GitHub What are foreign keys A foreign key is a property in a table that references items in another table. That’s why they ar...
-
8
<?xml encoding="utf-8" ??>Introduction In MySQL, referential integrity is a concept that maintains the relationship of data across multiple tables for consistency purposes. For instance, in an e...
-
6
Rails 7 adds support for deferrable foreign key constraints in PostgreSQL Mar 30, 2022 , by Murtaza Bagwala 2 minute read...
-
12
Do you wonder if MySQL tells you the truth about writes to...
-
11
Unity Catalog in Databricks provides a single place to create and manage data access policies that apply across all workspaces and user...
-
10
Azure, Databricks, Unity Cata...
-
6
@rozhnevSlava RozhnevFull stack dev and DBA Maintainer of https://sqlize.onlin...Receive Stories from @...
-
5
Performance Benefits of NOT NULL Constraints on Foreign Key Reference Columns
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK