40

System.Text.Json : Consider supporting F# discriminated unions · Issue #55744 ·...

 2 years ago
source link: https://github.com/dotnet/runtime/issues/55744
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Comments

Concerning support for discriminated unions, it might be worth pointing out that there is no one canonical way to encode DUs in JSON. DUs are used in many different ways in F# code, including the following:

  • Single case DUs, e.g. type Email = Email of string, used to provide a type-safe wrapper for common values. A user might expect that Email("email") should serialize as its payload, "email".
  • Type-safe enums, e.g. type Suit = Heart | Spade | Diamond | Club. Users might reasonably expect that Spade should serialize as the union case identifier, "Spade".
  • Unions with arbitrary combinations of union cases and arities, e.g. type Shape = Point | Circle of radius:float | Rectangle of width:float * length:float. A value like Circle(42.) would require an encoding similar to { "shape" : "circle", "radius" : 42 }. Users should be able to specify the case discriminator property name ("shape") as well as the identifier for the union case (Circle mapping to "circle").

In light of the above, I'm increasingly starting to think that System.Text.Json should not be providing a default mechanism for serializing DUs. Users would still be able to use available custom converters that provide the union encoding that suits their use case or perhaps use unions in their data contracts in a way that bypasses the serialization layer altogether.

Originally posted by @eiriktsarpalis in #29812 (comment)

If we do decide to support F# DUs in the future, it would likely be in the form of a publicly available converter factory à la JsonStringEnumConverter that configures one or more of the above alternative serialization formats.

Copy link

Contributor

msftbot bot commented on Jul 15, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

eiriktsarpalis

changed the title Consider supporting F# discriminated unions

System.Text.Json : Consider supporting F# discriminated unions

on Jul 15, 2021

Copy link

Member

Author

eiriktsarpalis commented on Jul 15, 2021

Tagging @bartelink @Zaid-Ajaj @NinoFloris who might have opinions on the matter.

it might be worth pointing out that there is no one canonical way to encode DUs in JSON.

IMO just because there are multiple ways to represent DUs in JSON doesn't imply that a library shouldn't pick a default. I think many F# devs would appreciate a choice that works out of the box with the possibility of customization where required, not having to customize or having to understand library internals to get started with it.

This is the approach I implemented in Fable.Remoting.Json which is a Newtonsoft.Json converter for F# types without loss of information (possible to round trip). Given a Shape type like

type Shape = 
  | Point 
  | Circle of radius:float 
  | Rectangle of width:float * length:float

The following JSON is generated

Point => { "Point": [] } | "Point"
Circle(20.0) => { "Circle": [20.0] }
Rectangle(12.0, 10.0) => { "Rectangle": [12.0, 10.0] }  

of course there could be a setting to choose to write out the property names when provided

Circle(20.0) => { "Circle": { "radius": 20.0 } }
Rectangle(12.0, 10.0) => { "Rectangle": { "width": 12.0, "height": 10.0 } }

Copy link

bartelink commented on Jul 15, 2021

edited

I agree [with the OP] that there cannot and should not be a default canonical implementation; As FSharp.SystemTextJson and the default encoding provided by Newtonsoft.Json >=6 illustrate, there are many choices for how to represent things, and there is no reasonable default IMO.

I tend to take view that individual converters that do easy to describe things are the way to go. Thus I would ideally like to see the following in the box:

  1. TypeSafeEnumConverter: example unoptimized impl
  2. UnionConverter: IME the format you proposed in the OP ({ "discriminator": "value", <named fields from case as for a class/record, same for cases in tupled form> }) is common in the wild, and IME there are established implementations on the JVM, Swift and JS which work with that rendering format). See incomplete impl. (There's also an OSS equivalent of this converter for Newtonsoft.Json that implements this scheme)
  3. I don't believe special case handling of single-case unions is something that should be supported

Rectangle(12.0, 10.0) => { "Rectangle": [12.0, 10.0]}

While Newtonsoft.Json does this out of the box, I'd be very much against having this representation be a default. Furthermore, I think its a harmful thing to even put in the box in the first place; let me try to justify this:

  • It's not just a private roundtrippable rendering format - its best assumed to be going into a database that will a) outlive my code b) will need to be parsed by other systems in other languages
  • because JSON does not define a tuple concept, it's less likely to be covered in any given library on the other side.
  • Arrays with mixed types is not something that should be introduced lightly into a rendering format
  • In general, if I'm looking to render or parse as JSON, there needs to be a good story wrt versioning of type being serialized/deserialized
  • versioning concerns in my code:
    • reordering fields in a tuple opens up a troubleshooting nightmare.
    • if I want to move from a 2-tuple to a record with three fields, there is no smooth transition possible
    • if I want to introduce a new optional, field, it has to go at the end
    • etc. There's likely plenty best practice in this area; this ebook covers many real world cases in depth

In order to guard against things that IME happen in the wild, I'd propose the following rules:

  • unnamed tuple case body fields should be rejected (i.e. no Item fields)
  • by default, unions should not convert without opting in by either registering a converter or tagging the type with a converter configuration

Point => { "Point": [] } | "Point"

Again, IME, maintaining symmetry is valuable for nullary cases too. The main reason for this being that any consumer will be able to adapt to me adding a field to the case payload if it's rendered as { "shape": "point"}; one is simply adding a field to an object, versus transitioning from a string to an object.

I'd also mention another reason a canonical default handing is likely to run aground even if it was to be defined:

  • Unmappable Cases: I've encountered at least the following needs when case names don't map to a case in the union
    1. Fail fast - throw an exception
    2. Have a catch-all case that hands you the body as a JObject or equivalent (see UnionConverter Unknown case support)
    3. Silently ignore

Copy link

jhf commented on Jul 23, 2021

I agree with @bartelink that the compatibility of json encoded data can be a minefield, that requires attention when writing migrations, and when making API changes.

I still think that having a default canonical implementation provides a good user experience. I think it is too early to force the programmer understand and choose between serialisation strategies just because they wish to turn information into JSON.

Rather, when somebody has durable data, or needs to concern themselves with wire compatibility, then the way things are serialised may get important. In my case, when I upgrade durable data migrations, I transform the data, using the default canonical implementation, but I map from one type to another type, and write again using the default canonical implementation.
For exposed API REST/json endpoints I either make sure it is a wire compatible change, or I make a new endpoint (probably with a version number).

If there is a large number of types involved, then the burden of having to specify a per type serialisation strategy, as a lot of extra work. Surely, yes, it will be more optimised, and possible upgrade proof, but since I'm writing migrations and making new endpoints, that point is moot, for me. Thus I'm forced to do more work, without any apparent benefit.

By having a default canonical implementation, and allowing overrides, it is possible to use json with a minimum amount of effort, and allow freedom when it has benefit.

By the way, what would be the developer experience when using Type Providers, such as https://fsprojects.github.io/SQLProvider/ when there is no default canonical implementation?

From my perspective, having a default canonical implementation, allows usage without con
@bartelink

@jhf @Zaid-Ajaj While I protested a lot, being able to have Unions convert without having to litter the code with Attributes is definitely something that's hard to give up once you've experienced it ;)

I implemented opt-in selection of such a policy jet/FsCodec#69 - would appreciate any thoughts you might have and/or whether the fact that one can define an explicit encoding policy in ~200 LOC without the risk of forcing a necessarily opinionated encoding without people being aware


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK