125

sql: support JSON/Protobuf · Issue #2969 · cockroachdb/cockroach · GitHub

 6 years ago
source link: https://github.com/cockroachdb/cockroach/issues/2969#issuecomment-333959020
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Collaborator

petermattis commented on Oct 30, 2015

Support manipulation of JSON/Protobuf data. This might mean adding JSON/Protobuf column types. Or it might mean adding builtin functions to manipulate TEXT columns that are JSON/Protobuf encoded. Need to do a little exploration and design work first to figure out which is the right path to take.

nickjackson, hanleym, thatguystone, kulshekhar, d4l3k, manigandham, kwando, sudhakar, mikestoltz, josephjacks, and 77 more reacted with thumbs up emojisourcec0de, johandorland, tomer-ben-david, and benesch reacted with laugh emojisourcec0de, johandorland, tomer-ben-david, and benesch reacted with hooray emojisourcec0de, tomer-ben-david, and benesch reacted with heart emoji

this'is a very usefull feature for semi-structured data

problem is how we are going to select/mutate this columns. if they are treated as text, then what is the value? couldn't clients just unmarshal that text into whatever structure they have?

how we filter based on that JSON document content (select * from foo where JSONCOLUMN.Person.Id = 1 ?) what about updating part of document? how we run aggregate queries on values?
what if that attribute is an array itself? how we append/prepend/delete items? and how we query on that-arrays-items attributes?
these and many more are questions that to my experience, only Mongodb answered properly (but still some area like aggregation is still crappy with alien syntax), others like Couchbase did it, and its simply incomplete.
if you are going to add understanding of JSON/PROTOBUF to parser/ execution-engine, then it vastly increase it's scope (code, tests, maintenance etc)
feels a misdirection...
my 2 cent

jhartman86, hyfrey, hvaara, novabyte, jozn, and manuel-huez reacted with thumbs down emoji

Member

tbg commented on Jun 3, 2016

edited

See also #3895 (which had a couple of thumbs up, much discussion and in particular linked to https://github.com/mgutz/dat, the Go Postgres Data Access Toolkit).

Contributor

Linicks commented on Jul 13, 2016

All,
I have a use case where I would like to mix structured and unstructured data. In some instances, I would need one or more columns to store random data from various sensor groups, etc. Once implemented, will there be a limit on the number of these types of columns?

Thanks!
--Nick

Collaborator

Author

petermattis commented on Jul 14, 2016

@Linicks We haven't even sketched out how a JSON/Protobuf column type would work, so it is hard to say with any certainty what the limits would be. If I were to guess, though, there wouldn't be any specific limit on the number of JSON/Protobuf columns within a table, but overall row size limits (for which we don't have any hard setting at the moment) would still apply.

It's great to see that support for JSON datatype is on the 1.0 roadmap. Would we be able to index columns of this datatype in a way that'll allow efficient querying of object paths? In Postgres, for example, columns of type jsonb can be queried on just as efficiently as other traditional types if they are indexed.

Collaborator

Author

petermattis commented on Aug 3, 2016

@kulshekhar We haven't sketched out the design yet so it's difficult to answer your question with any certainty. I would guess the answer is yes and we'll definitely look at the Postgres jsonb column type and implementation for inspiration.

@petermattis we've got need for jsonb column types and functions in project I'm working on. We would love to contribute code given spec is agreed on.

lwansbrough commented on Mar 30, 2017

edited

+1 for Protocol Buffers support. JSON is of course very valid too, however we do have a specific need for ProtoBuf. The challenge for JSON is choosing the correct data type for a given field without any prior knowledge of the field in order to create an index on that field. (I believe Postgres' jsonb type uses the first record as a schema, IIRC.) ProtoBuf provides this information, and so theoretically it should be easier to allow for the creation of indices on specific fields of the ProtoBuf data.

Of course, in order to create a Protocol Buffer field in Cockroach that is both indexable and queryable, Coachroach needs to be aware of the schema. You'd need to be able to assign a ProtoBuf schema to a field, and also be able to update that schema (ideally at runtime, so that no downtime is required to upgrade a Protocol Buffer field's schema.)

To my knowledge, CockroachDB would be the only database offering such a feature, but it would be hugely desirable to anyone looking to use ProtoBuf in a database.

It might be helpful to take a look at this, which I just found: https://github.com/google/mysql-protobuf/wiki Mind you I think it's rather strange that a demonstration isn't shown where bytes are inserted into the protobuf field.

sean-, mikelnrd, donpdonp, merqurio, robertjpayne, and jozn reacted with thumbs up emojimikelnrd and glerchundi reacted with heart emoji

sergionalin commented on Jun 29, 2017

edited

On JSON support: how about borrowing some ideas from MySQL?
I'm talking about saving the document in an indexed format (see links below), which would slow down writes a bit but would allow for faster selects, faster indexing and, what's even better, smart and fast partial updates (a feature that very few databases, even NoSQL ones, offer).

Overview:
http://mysqlserverteam.com/json-labs-release-native-json-data-type-and-binary-format/
Storage format specs:
https://dev.mysql.com/worklog/task/?id=8132

sky87 reacted with thumbs up emoji

I would really like to see functionality more akin to Postgres's JSON(B) support. For example:
https://www.postgresql.org/docs/9.6/static/functions-json.html

The ability to treat JSON first class with queries, indexing, etc. is extremely valuable.

tdbgamer and rbygrave reacted with thumbs up emoji

Contributor

dianasaur323 commented on Aug 15, 2017

Thanks for your feedback. We are actively working on scoping out this feature, so please provide as much detail as you would like in terms of use case / requirements. cc @nstewart

hvaara, NuSkooler, Istar-Eldritch, and Rambatino reacted with hooray emoji

xarg commented on Aug 16, 2017

We're avid users of JSONB on Postgres and using SQLAlchemy as the lib to connect to it. We're exploring CockroachDB. It's great, but JSONB support is the one big missing thing that does not allow us to migrate.

The good things in PG:

  • The ability to perform complex JSONB queries via operators ->>, @> and more.
  • The ability to index (GIN/btree) on an expression (super useful on huge JSONB docs).

The not so good:

  • The language selectors/operators are rather weird and unnatural. Having something closer to the actual JS way of selecting an element in JSON would be so much better without having to think twice about it.

For example if you have an object such as:

{
  "animals": {
    "cats": [
      {
        "id": 1,
        "name": "tom"
      },
      {
        "id": 2,
        "name": "mot"
      }
    ],
    "dogs": [
      {
        "id": 2,
        "name": "dog"
      }
    ]
  }
}

In PG if you want to select the animals key you do something like: json_field->>'animals'. However if you want to select a path then you have a totally different syntax: json_field#>'{animals,cats}'. I'm sure there are important reasons as to why it's like this (I don't know why). But as an end user I would prefer something more natural like:

json_field.animals - the db would figure out that json_field is a JSONB and adapt the selection accordingly - is this even possible?
json_field.animals.cats - same idea.
json_field.animals.cats[0]."id" - again, like JS - super easy to remember.

For operators such as contains, or exists in PG they are @>, ?, etc.. I can never remember them and even if I see them after I've written the query a few days ago I still can't remember what they do so I always have to look them up.
A better approach IMHO is to use functions like: contains(collection, item), exists(path), 'concat(field1, field2)'. It would be immediately clear what they are doing without having to look them up. I don't really mind if they are prefixed with something like 'json_' if it's a concern.

Hope my input helps.

heiskr, nstewart, 0x5487, andrewlague, Istar-Eldritch, manuel-huez, tdbgamer, and jdgriffith reacted with thumbs up emoji

JSONPath is becoming the standard for navigating JSON documents (similar to XPath for XML). Postgres has commits toward this which should be released in v11. This is the best option for CockroachDB to implement too so there's a unified access language.

http://gmile.me/jsonpath/

takama, Istar-Eldritch, and rbygrave reacted with thumbs up emoji

xarg commented on Aug 16, 2017

edited

+1 for JSONPath! For simpler queries it would still be useful to have the usual JSON syntax.

I would love to have something like: json_path(json_field, '$.animals.cats.[*].name') - to select all cats names! It would be a dream come true.

lwansbrough commented on Aug 16, 2017

edited

It may be a good idea to separate these features, as I would hope the implementations for JSON and Protobuf would look (perhaps dramatically) different. From a Protobuf user's perspective, I think we'd like to see a way to essentially create and update tables by passing a Protobuf schema definition to a create/alter table query. Whether or not data is stored on the backend as a protobuf is probably irrelevant, so long as its as compact as possible (we wouldn't be using protobuf if we weren't totally paranoid about bandwidth/storage.) I wouldn't want to see overhead introduced because of a choice to use a format designed to reduce overhead :)

I say "dramatically different", because I would suspect the way CDB would approach creating the table would be. Since JSON is all about being dynamic, you'd want to allow for flexible schemas in a JSON table/field. Whereas with protocol buffers, you may want the entire table to be definable as a protocol buffer, not just one field. Although the difference between a protocol buffer field vs. a protocol buffer table probably is irrelevant, as long as you can provide a schema update.

As for querying, I agree with the comments above that querying should be easy (I see no reason why JSONPath couldn't be used for PB) and fields need to be indexable, especially at arbitrary depths.

@lwansbrough

What is the need for Protobuf in a relational database that already has data types and table schemas? This seems more like wanting a translation between Protobuf to SQL which can be done with an ORM, especially if you don't care about the backend storage itself.

@manigandham I mean you can make the exact same argument about JSON. The idea is that given a schema defined externally to the DB (say, inside an application), it becomes very easy to delegate management of the database without having to provide SQL access or build an interfacing layer.

If the data can be stored as a protocol buffer while we're at it while maintain the same feature set then I say by all means, but I suspect the storage impact of using PB in that case vs. a custom solution is probably very minimal.

@manigandham I can also tell you first hand that trying to build a conversion layer between a flat schema (SQL) and schemas with depth is tedious to say the least.

nstewart

added this to the 1.2 milestone

on Sep 7, 2017

nstewart

removed this from the Later milestone

on Sep 7, 2017

Postgres's JSONB support together with GIN indexing is wonderful for a lot of use cases involving mix of structured and unstructured data. However, like others mentioned, the JSON query semantics are slightly complicated and not trivial to reason about. There is a PG extension called JSQuery that tries to address many of the query limitations. I would love if CockroachDB could support the query/index capabilities of JSQuery, may be using a JSONPath interface.

Contributor

awoods187 commented on Oct 3, 2017

Team, I'd like to introduce myself. My name is Andy and I am a PM here at CRDB. I'm happy to share with you that we have heard you loud and clear and are working to support JSONB in 1.2. To that end we would love to solicit your comments on our current RFC scope proposal found here. In particular, we would love to hear more about how you plan to use JSONB and if we are adequately thinking about your use case. How do you want to use JSONB? Please feel free to share in the comments below or directly at the RFC.

manigandham, NuSkooler, soncodi, yookoala, xarg, 0x5487, glaslos, lming, christianpv, esgio, and 17 more reacted with thumbs up emojinstewart, kulshekhar, soncodi, tlvenn, ainar-g, grahamsnz, ktham, jjacobson93, igavva, kwando, and 4 more reacted with hooray emojiNuSkooler, glerchundi, ktham, igavva, kwando, d4l3k, jhayotte, tomer-ben-david, and brettmc reacted with heart emoji

xarg commented on Oct 4, 2017

@awoods187 great news! What's the schedule for 1.2?

Contributor

awoods187 commented on Oct 4, 2017

@xarg 1.2 is planned for release in April 2018

NuSkooler, xarg, esgio, ktham, alaister, drdaeman, theonewolf, soncodi, yookoala, derkan, and 8 more reacted with thumbs up emojiigavva, d4l3k, jhayotte, and sourcec0de reacted with hooray emojiigavva, d4l3k, jhayotte, and sourcec0de reacted with heart emoji

I look forward to this feature!

Contributor

justinj commented on Jan 24, 2018

JSONB support will be in the 2.0 release in April, so I'm going to close this issue!

NuSkooler, senorsmile, sakti, yookoala, Linicks, iagomelanias, akubera, scottrobertson, manigandham, takama, and 9 more reacted with thumbs up emojixarg, sakti, nictuku, hvaara, Linicks, iagomelanias, scottrobertson, bsharpe, manigandham, takama, and 5 more reacted with hooray emojiLinicks, iagomelanias, scottrobertson, manigandham, igavva, mofirouz, awwong1, and castjosem reacted with heart emoji

Contributor

mofirouz commented on Feb 21, 2018

@justinj Do you have any links to documentation on JSONB support that you can share publicly here?

Contributor

justinj commented on Feb 21, 2018

Hi @mofirouz, we don't have any documentation for JSONB yet, it will be ready by the time 2.0 releases in April. That said, it's very similar to Postgres' JSONB feature, and most of their docs should be applicable to its use in CockroachDB. You'll need to use a recent build of CockroachDB to have access to all the features in JSONB, though.

Hi. It's work) Cockroach

root@:26257/> SELECT * FROM bank.json
-> ;
+----+-----------------------------------+
| id | data |
+----+-----------------------------------+
| 0 | {"bar":2,"foo":1} |
| 1 | {"bar":"x","baz":"hello","foo":2} |
| 2 | {"foo":-1,"goo":"hi"} |
| 3 | {"bup":[1,2,3],"foo":-2} |
+----+-----------------------------------+

root@:26257/> SELECT * FROM bank.json WHERE data ->> 'foo' > '0' ORDER BY data->>'foo' ASC LIMIT 1;
+----+-------------------+
| id | data |
+----+-------------------+
| 0 | {"bar":2,"foo":1} |
+----+-------------------+

root@:26257/> SELECT id,data->>'foo' AS count FROM bank.json WHERE data ->> 'foo' > '0' ORDER BY data->>'foo' ASC LIMIT 100;
+----+-------+
| id | count |
+----+-------+
| 0 | 1 |
| 1 | 2 |
+----+-------+

yookoala, nstewart, nictuku, troyk, bchoor, and rbastic reacted with thumbs up emojiawwong1 reacted with hooray emoji

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK