9

Storing JSON in Your Databases: Tips and Tricks For MySQL

 3 years ago
source link: https://www.percona.com/blog/storing-json-in-your-databases-tips-mysql/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
Storing JSON in Your Databases: Tips and Tricks For MySQL

MySQL JSON DatabasesDatabase architecture and design are becoming an increasingly lost art. With new technologies and the push towards faster development cycles, people continue to take shortcuts, often to the detriment of long-term performance, scalability, and security. Designing how your application stores, accesses, and processes data is so fundamentally important, it can not be overlooked. I want people to understand that early design choices can have a profound impact on their applications. To that end, I will be exploring database design principles and practices over the next several months. I am starting with every developer’s favorite data format: JSON!

It seems that almost every database over the last few years has introduced various degrees of support for storing and interacting with JSON objects directly. While these features are designed to make it easier for application developers to write code faster, the implementations of each implementation tend to vary wildly and can cause some, well, weirdness. Over the next few weeks/months, I will show you some methods, mistakes, and common ways developers store JSON. Just because you can use a database’s native JSON support does not always mean you should! I hope to show you which ones work best for which use cases.

For part one of this series, I am going to focus on MySQL. MySQL’s implementation of the JSON data type was introduced back in 5.7 (Late 2015/Early 2016 timeframe). Since then, a few minor enhancements have made the implementation a bit more liveable. The current iteration MySQL 8 offers a fully functional implementation of JSON functions and features. Let me show you some examples of how to store and interact with your JSON documents within MySQL.

Setup

For all my tests, I wanted a reasonable amount of data to test the performance implications of certain functions.  I opted to use the metadata JSON from http://movienet.site/, about 2.3GB of individual JSON files (one per movie).

I wrote a small python script to load and iterate through the JSON files and load them into MySQL.

metadata JSON

I will walk through the examples and show you how I have seen many developers use MySQL to interact with JSON and point out why some of them may be incorrect or cause you issues you may not be aware of.  I will also show you a few other features you may want to look into and explore and offer some design advice.  Let us start with the following simple table definition:

Shell
create table movies_json (
   ai_myid int AUTO_INCREMENT primary key,
   imdb_id varchar(255),
   json_column json
) engine = innodb;
create unique index imdb_idx on movies_json(imdb_id);
Shell
  "imdb_id": "tt8408760",
  "tmdb_id": null,
  "douban_id": null,
  "title": "Rubes (2019)",
  "genres": [
    "Short",
    "Comedy",
    "Horror"
  "country": "USA",
  "version": [
      "runtime": "7 min",
      "description": ""
  "imdb_rating": null,
  "director": [
      "id": "nm3216042",
      "name": "Nathan Alan Bunker"
  "writer": null,
  "cast": [
      "id": "nm1899908",
      "name": "Brendan Jennings",
      "character": "Milton"
      "id": "nm2384265",
      "name": "Ben Begley",
      "character": "Paul"
      "id": "nm2287013",
      "name": "Jerry Marr",
      "character": "Professor Henson"
      "id": "nm7529700",
      "name": "Allene Prince",
      "character": "Margaret"
  "overview": null,
  "storyline": "Two disgruntled teachers use a Rube Goldberg machine to exact revenge on the people who have wronged them.",
  "plot": null,
  "synopsis": null

You can see an example of the JSON format

101: Simple JSON Interactions in MySQL

Yes, a single column in a table with a key or two.  Each row would store one of the movies in the downloaded JSON files.  There is an auto_increment key and the IMDB ID that I extracted from the JSON during the load. This structure and setup is a straightforward design with minimal effort.  However, this design also means you generally rely on MySQL as merely the storage for your data.  Provided you are accessing everything by the imdb_id key, you can get and update your JSON to your application easily with a:

Shell
select json_column from movies_json where imdb_id = ‘tt4154796’;
update movies_json set json_column = ‘<new JSON>’ where imdb_id = ‘tt4154796’;

Eventually, however, you will want to search within your JSON or just return a portion of the JSON Document. For example, let’s say you only want to find the title and IMDB rating for a specified movie. You can do this with functionality is built-in:

Shell
mysql> select json_column->>'$.title', json_column->>'$.imdb_rating' from movies_json where json_column->>'$.imdb_id'='tt2395427';
+--------------------------------+-------------------------------+
| json_column->>'$.title'        | json_column->>'$.imdb_rating' |
+--------------------------------+-------------------------------+
| Avengers: Age of Ultron (2015) | 7.5                           |
+--------------------------------+-------------------------------+
1 row in set (0.77 sec)
mysql> select json_column->>'$.title', json_column->>'$.imdb_rating' from movies_json where json_column->>'$.imdb_id'='tt4154796';
+--------------------------+-------------------------------+
| json_column->>'$.title'  | json_column->>'$.imdb_rating' |
+--------------------------+-------------------------------+
| Avengers: Endgame (2019) | null                          |
+--------------------------+-------------------------------+
1 row in set (0.75 sec)

Here you can see we can interact inside the JSON column just like we would with standard data via SQL by using the special syntax “->>’$.key’”.   You can see Avengers: Endgame has a rating of null!  That is no good, and it was a much better movie than that.  Instead of updating and storing the entire JSON document again, MySQL provides a JSON_SET function to set an element within a document.

Shell
mysql>  update movies_json
       set json_column = JSON_SET(json_column, "$.imdb_rating", 9)
       where json_column->>'$.imdb_id'='tt4154796';
Query OK, 1 row affected (0.93 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> select json_column->>'$.title', json_column->>'$.imdb_rating' from movies_json where json_column->>'$.imdb_id'='tt4154796';
+--------------------------+-------------------------------+
| json_column->>'$.title'  | json_column->>'$.imdb_rating' |
+--------------------------+-------------------------------+
| Avengers: Endgame (2019) | 9                             |
+--------------------------+-------------------------------+
1 row in set (0.80 sec)

We now have fixed the missing rating for Endgame!  But we may not know the IMDB ID when we are searching. Just like working with standard data types, you can use data from within your document in a where clause.  In this case, we will look for all movies that start with “Avengers”.

Shell
mysql> select json_column->>'$.title', json_column->>'$.imdb_rating', json_column->>'$.imdb_id' from movies_json
where json_column->>'$.title' like 'Avengers%';
+------------------------------------------------------------+-------------------------------+---------------------------+
| json_column->>'$.title'                                    | json_column->>'$.imdb_rating' | json_column->>'$.imdb_id' |
+------------------------------------------------------------+-------------------------------+---------------------------+
| Avengers: Endgame (2019)                                   | 9.0                           | tt4154796                 |
| Avengers Confidential: Black Widow & Punisher (Video 2014) | 5.8                           | tt3482378                 |
| Avengers of Justice: Farce Wars (2018)                     | null                          | tt6172666                 |
| Avengers: Age of Ultron (2015)                             | 7.5                           | tt2395427                 |
| Avengers: Infinity War (2018)                              | null                          | tt4154756                 |
| Avengers Grimm: Time Wars (Video 2018)                     | null                          | tt8159584                 |
| Avengers Assemble! (TV Series 2010– )                      | null                          | tt1779952                 |
| Avengers Grimm (Video 2015)                                | 2.8                           | tt4296026                 |
+------------------------------------------------------------+-------------------------------+---------------------------+
8 rows in set (0.74 sec)

Using the “json_column->’$.title’ in the where clause got us a nice list of Avengers titled movies and TV shows.  But, you can see from this query, we got more than just the blockbuster Avengers movies.  Let’s say you want to refine this a bit more and find just Avengers movies with Robert Downey Jr. in the cast.  This is a bit more difficult, honestly, because the format of our JSON documents uses an array for cast members.

Here is what the JSON looks like:

Shell
  "imdb_id": "tt2395427",
  "tmdb_id": "99861",
  "douban_id": "10741834",
  "title": "Avengers: Age of Ultron (2015)",
  "genres": [
    "Action",
    "Adventure",
    "Sci-Fi"
  "country": "USA",
  "version": [
      "runtime": "141 min",
      "description": ""
  "imdb_rating": 7.5,
  "director": [
      "id": "nm0923736",
      "name": "Joss Whedon"
  "writer": [
      "id": "nm0923736",
      "name": "Joss Whedon",
      "description": "written by"
      "id": "nm0498278",
      "name": "Stan Lee",
      "description": "based on the Marvel comics by and"
      "id": "nm0456158",
      "name": "Jack Kirby",
      "description": "based on the Marvel comics by"
  "cast": [
      "id": "nm0000375",
      "name": "Robert Downey Jr.",
      "character": "Tony Stark"
      "id": "nm1165110",
      "name": "Chris Hemsworth",
      "character": "Thor"
      "id": "nm0749263",
      "name": "Mark Ruffalo",
      "character": "Bruce Banner"
      "id": "nm0262635",
      "name": "Chris Evans",
      "character": "Steve Rogers"
      "id": "nm0424060",
      "name": "Scarlett Johansson",
      "character": "Natasha Romanoff"
      "id": "nm0719637",
      "name": "Jeremy Renner",
      "character": "Clint Barton"

You can access arrays in a JSON document by referencing the specific index for the element you want ( i.e. [0].name ), however, if you don’t know which one contains the data you are looking for you need to search for it.  MySQL has the function json_search to help with this (there are other functions such as json_contains as well).  json_search searches a provided value and will return the location if found and null if not found:

Shell
mysql> select json_column->>'$.title',
json_column->>'$.imdb_rating',
json_column->>'$.imdb_id' from movies_json  
where json_column->>'$.title' like 'Avengers%' and json_search(json_column->>'$.cast', 'one','Robert Downey Jr.', NULL,'$[*].name' ) is not null;
+--------------------------------+-------------------------------+---------------------------+
| json_column->>'$.title'        | json_column->>'$.imdb_rating' | json_column->>'$.imdb_id' |
+--------------------------------+-------------------------------+---------------------------+
| Avengers: Endgame (2019)       | 9                             | tt4154796                 |
| Avengers: Age of Ultron (2015) | 7.5                           | tt2395427                 |
| Avengers: Infinity War (2018)  | null                          | tt4154756                 |
+--------------------------------+-------------------------------+---------------------------+
3 rows in set (0.79 sec)

You will notice that I used the parameter ‘one’, this finds the first value.  You can also use ‘all’ to return every value matched.  In case you are curious as to what the json_search actually returns here is the output:

Shell
mysql>  select json_column->>'$.title' as title,
        json_search(json_column->>'$.cast', 'one','Robert Downey Jr.') as search_output
    from movies_json  where json_column->>'$.title' like 'Avengers%'
and json_search(json_column->>'$.cast', 'one','Robert Downey Jr.', NULL,'$[*].name' ) is not null;
+--------------------------------+---------------+
| title                          | search_output |
+--------------------------------+---------------+
| Avengers: Endgame (2019)       | "$[0].name"   |
| Avengers: Age of Ultron (2015) | "$[0].name"   |
| Avengers: Infinity War (2018)  | "$[0].name"   |
+--------------------------------+---------------+
3 rows in set (0.72 sec)

You can see it returns the position and the property that contains the value.  This output is useful for a variety of reasons.  One is if you need to find which index value contains that particular text.  In the example of searching for Robert Downey JR movies, we can use this index information to return the character he played in each movie.  The first way I have seen this done requires a bit of unholy wrangling but:

Shell
mysql> select json_column->>'$.title' as title,
    json_column->>'$.imdb_rating' as Rating,
    json_column->>'$.imdb_id' as IMDB_ID,
json_extract(json_column->>'$.cast',concat(substr(json_unquote(json_search(json_column->>'$.cast', 'one','Robert Downey Jr.')),1,
    -> locate('.',json_unquote(json_search(json_column->>'$.cast', 'one','Robert Downey Jr.')))),'character')) as Char_played
    from movies_json  where json_column->>'$.title' like 'Avengers%' and json_search(json_column->>'$.cast', 'one','Robert Downey Jr.') is not null;
+--------------------------------+--------+-----------+--------------------------------------+
| title                          | Rating | IMDB_ID   | Char_played                          |
+--------------------------------+--------+-----------+--------------------------------------+
| Avengers: Endgame (2019)       | 9      | tt4154796 | "Tony Stark /              Iron Man" |
| Avengers: Age of Ultron (2015) | 7.5    | tt2395427 | "Tony Stark"                         |
| Avengers: Infinity War (2018)  | null   | tt4154756 | "Tony Stark /              Iron Man" |
+--------------------------------+--------+-----------+--------------------------------------+
3 rows in set (0.68 sec)

Here I am finding where in the document Robert Downey Jr is listed, then extracting the index and using that with the JSON Extract function to pull out the value of the “$[0].character” instead of “$[0].name”.  While this works, it is ugly.  MySQL provides an alternative to doing this by using json_table.

Shell
mysql> select json_column->>'$.title',  json_column->>'$.imdb_rating', t.* from movies_json, json_table(json_column, '$.cast[*]' columns(
       V_name varchar(200) path '$.name',
       V_character varchar(200) path '$.character')
       ) t where t.V_name like 'Robert Downey Jr.%' and json_column->>'$.title' like 'Avengers%';
+--------------------------------+-------------------------------+-------------------+------------------------------------+
| json_column->>'$.title'        | json_column->>'$.imdb_rating' | V_name            | V_character                        |
+--------------------------------+-------------------------------+-------------------+------------------------------------+
| Avengers: Endgame (2019)       | 9                             | Robert Downey Jr. | Tony Stark /              Iron Man |
| Avengers: Age of Ultron (2015) | 7.5                           | Robert Downey Jr. | Tony Stark                         |
| Avengers: Infinity War (2018)  | null                          | Robert Downey Jr. | Tony Stark /              Iron Man |
+--------------------------------+-------------------------------+-------------------+------------------------------------+
3 rows in set (0.74 sec)

Basically, json_table takes an array and turns it into a table object, allowing you to join and query it.  You can also use this to list all the characters an actor played in any film in his career. Stay tuned for part two of this series, when we will show you some easier and faster ways to use JSON from inside MySQL.


Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK