
Using an F# DSL to Generate C# code

 3 years ago
source link: https://www.codeproject.com/Articles/5301612/Using-an-Fsharp-DSL-to-generate-Csharp-code

In the eFact file, there are over 10 different record formats, all with a fixed length of 370 bytes. Fields in a record are positional. To describe this, I created a more readable Domain Specific Language in F#. In this article, I will demonstrate how this works. The code for this article can be found on GitHub: GVerelst/CodeGen: F# DSL for C# code generation (github.com).

Prerequisites

  • You’ll need a little bit of C# knowledge to follow this post. In particular, I will show how to create a couple of C# classes, with some (custom) attributes. But in the end, the F# program will just generate some text that happens to be C# code.
  • F# knowledge will help, but I will explain most of what I’m doing in this post.

The Problem

The code that is needed for FileHelpers to work with fixed records looks like this:

// //////////////////////////////////////
// FileInfoBase
// //////////////////////////////////////

[FixedLengthRecord()]
public partial class FileInfoBase
{
    // Segment segment200
    [EFactMetadata("200", "6N", "1-6", "Naam van het bericht", "Nom du message"),
     FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')]
     public int MessageName { get; set; } = 920000; // 920000|920900|...
    [EFactMetadata("2001", "2N", "7-8", "Code fout", "Code érreur"),
     FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')]
     public byte Error2001 { get; set; } = 0;
    [EFactMetadata("201", "2N", "9-10", "Versienummer formaat van het bericht",
     "N° version du format du message"), FieldFixedLength(2),
     FieldAlign(AlignMode.Right, '0')] public byte MessageVersionNumber { get; set; } // 2
    // ...
    [EFactMetadata("204", "14N", "21-34", "Referentie bericht ziekenhuis",
     "Reference du message"), FieldFixedLength(14), FieldTrim(TrimMode.Both)]
     public string InputReference { get; set; } = new string('0', 14);
    // ...
    [EFactMetadata("3091", "2N", "206-207", "Code fout", "Code érreur"),
     FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')]
     public byte Error3091 { get; set; } = 0;
}

We describe the record fields mainly using attributes:

  • The FileHelpers attributes that describe the format of the field
  • The EFactMetadata custom attribute that gives some additional information about the field. It contains the name of the zone (example: “200”; remember, this is an archaic record format), the type of the zone (example: “6N” or “45A”), the position in the record (calculated), and then the descriptions in Dutch and French.
  • In addition, we give the field a data type, a name, an optional default value and an optional comment.

The documentation describes the file format using these terms. I also created a file viewer to show the contents in a user-friendly way, hence the translated fields. This will not be on GitHub. The attributes allow the use of reflection in the user interface to show the file format. The documentation also uses a notion of segments to describe a block in the record formats that can be reused in similar record formats. We want to mimic this behavior as well.

Clearly, this is a lot of error-prone code to type, and it is not very readable because of all the clutter. If only we could represent this in a more concise and readable way, and generate the necessary code from it…

Defining the Internal Domain Specific Language in F#

Looking at this, we can see some needed entities. The first thing we need to describe is a zone (which will translate into a property in the generated class). For the first zone (“200”), this can look like:

Z "200" (N 6) "Dutch name" "French name" Int "PropertyName" "920000" "920000|920900|..."

This contains all the data we need to describe a field with all its attributes. Let’s create the Zone type:

type Zone = { zone: string; length: Length; nl: string; fr: string; 
              datatype: Datatypes; name: string; defaultvalue: string; comments: string}

and a constructor for this type:

let Z zone length nl fr dt name dft comments =
    { zone=zone; length = length; nl= nl; fr = fr; datatype=dt; name=name; 
      defaultvalue=dft; comments=comments }

The parameters are:

  • zone: the zone name (being “200”, “2001”, …)
  • length: definition of the length (e.g. N 6 indicates 6 digits, A 5 indicates 5 characters)
  • nl: description in Dutch
  • fr: description in French
  • The rest of the fields are clear.

This is valid F# code, embedded in our project. The nice thing is that the compiler will prevent a lot of errors for us. This is what is called an “internal DSL”. An external DSL describes a separate language, with its own syntax rules. This means that the interpretation of the external DSL needs to be written as well.

There are some unknown parts in there:

Z "200" (N 6) "Naam van het bericht" "Nom du message" 
               Int "PropertyName" "920000" "920000|920900|..."

The datatype is Int, so we must describe this as well. This could have been just a string, but that doesn’t allow for validation. Ideally, we want the F# compiler to catch as many errors as possible before we start to generate the code. So here is the Datatypes type, a discriminated union used as an enumeration:

type Datatypes = Bool | CRC | Byte | Short | Int | DateTime | 
                 Time | String | Money | AmbHos | Gender | Error

Now the F# compiler will only allow these datatypes. Depending on the datatype, we can generate slightly different C# code. Example:

Int generates this:

[EFactMetadata("200", "6N", "1-6", "Naam van het bericht", "Nom du message"), 
FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')] 
public int MessageName { get; set; } = 920000; // 920000|920900|...

And String will generate this:

[EFactMetadata("204", "14N", "21-34", "Referentie bericht ziekenhuis", 
"Reference du message"), FieldFixedLength(14), FieldTrim(TrimMode.Both)] 
public string InputReference { get; set; } = new string('0', 14);

Of course, the other datatypes generate their own versions.

Having this in place already reduces the number of hard-to-find errors in the C# code.

Z "200" (N 6) "Naam van het bericht" "Nom du message" 
Int "PropertyName" "920000" "920000|920900|..."

We also see a Length. The constructor (N 6) is actually composed of the length type (“A” is alphabetic, “N” is numeric, “S” is numeric, but prefixed with ‘+’ or ‘-’) and the length itself. This will later be used in the code generation. Let’s describe this:

type LengthType = A | N | S
type Length = { ltype: LengthType; length: int }

and we create three constructor functions:

let N x = { ltype= N; length= x }
let A x = { ltype= A; length= x }
let S x = { ltype= S; length= x }

N 6 will now return a new Length record with ltype = N, length = 6. Having these three little functions again allows the F# compiler to validate the code at compile time.
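
To make the compile-time validation concrete, here is a minimal, self-contained sketch. It is not the code from the repository: in this sketch the union is marked with RequireQualifiedAccess (an assumption of mine) so that the function names N, A and S cannot be confused with the union cases.

```fsharp
// Sketch only. RequireQualifiedAccess is an assumption of this example,
// it is not part of the article's code.
[<RequireQualifiedAccess>]
type LengthType = A | N | S

type Length = { ltype: LengthType; length: int }

// The three constructor functions of the DSL
let N x = { ltype = LengthType.N; length = x }
let A x = { ltype = LengthType.A; length = x }
let S x = { ltype = LengthType.S; length = x }

let sixDigits = N 6      // numeric, 6 positions
let shortText = A 45     // alphabetic, 45 positions

printfn "%d positions of type %A" sixDigits.length sixDigits.ltype

// Lines like these are rejected by the compiler, long before any C# is generated:
// let wrong1 = N "6"    // a string where an int is expected
// let wrong2 = X 6      // X is not defined
```

A typo in a length type or a length that is not a number never reaches the generator; it simply does not compile.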

Recap of the Definition of the Zone So Far

type Datatypes = Bool | CRC | Byte | Short | Int | DateTime | Time | 
                 String | Money | AmbHos | Gender | Error 
type LengthType = A | N | S
type Length = { ltype: LengthType; length: int }
type Zone = { zone: string; length: Length; nl: string; fr: string; 
              datatype: Datatypes; name: string; defaultvalue: string; comments: string}

let N x = { ltype= N; length= x }
let A x = { ltype= A; length= x }
let S x = { ltype= S; length= x }

let Z zone length nl fr dt name dft comments =
    { zone=zone; length = length; nl= nl; fr = fr; datatype=dt; name=name; 
      defaultvalue=dft; comments=comments }

These 9 lines of code allow us to create zones in a concise and clear way. Let’s add semantics to this. Some zones are of the same type, and have a specific meaning. For example, I defined the Recordtype function as:

let Recordtype (rectype: int) zone = 
    let rt = string rectype
    Z zone (N 2) ("recordtype " + rt) ("enregistrement de type " + rt) 
                   Byte "Recordtype" rt ("Always " + rt)

Every eFact record has a specific record type, stored in the first 2 bytes of the record. These zones always have the same NL and FR description, so I made a new function for this. The function itself is not meant to save typing, but to give semantics to this field.

Recordtype 95 "400"

indicates a zone that holds the record type. We can also write this out in the code as:

Z "400" (N 2) "recordtype 95" "enregistrement de type 95" Byte "Recordtype" "95" "Always 95"

The written-out version is not much longer (copy / paste is your friend here), but the Recordtype version is a lot clearer about what it means. So in the same style, I defined Mutuality:

let Mutuality zone =
    Z zone (N 3) "Nummer mutualiteit" "Numéro de Mutualité" Int "MutualityNumber" "" ""

And again:

Mutuality "401"

indicates very clearly what we mean here. I made some more:

let Errorcode zone =
    let nzone = normalizeName zone
    Z zone (N 2) "Code fout" "Code érreur" Error ("Error" + nzone) "0" ""

let Reserved l zone =
    let nzone = normalizeName zone
    let dft = match l.ltype with
                | A -> sprintf "new string(' ', %d)" l.length 
                | N -> sprintf "new string('0', %d)" l.length 
                | S -> sprintf "'+' + new string('0', %d)" (l.length - 1)
    Z zone l "Reserve" "Reserve" String ("Reserved" + nzone) dft ""

As you can see, a reserved zone can only be of length type A | N | S. For each of the cases, I defined the outcome. There is no more need to think about what kind of attributes need to be generated, and it is clear that this zone is there as a filler, in case more zones are needed in the future (remember, this is an archaic format).
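
The default values of a reserved zone are just pieces of C# source text. The sketch below isolates that part so the three outcomes can be inspected on their own. The function name reservedDefault is hypothetical, and the union is qualified in this sketch to keep it self-contained.

```fsharp
// Sketch only: reservedDefault is a hypothetical name, not taken from the repository.
[<RequireQualifiedAccess>]
type LengthType = A | N | S

type Length = { ltype: LengthType; length: int }

// Returns the C# expression (as text) that initialises a filler of the given length
let reservedDefault (l: Length) =
    match l.ltype with
    | LengthType.A -> sprintf "new string(' ', %d)" l.length
    | LengthType.N -> sprintf "new string('0', %d)" l.length
    // A signed filler is a '+' followed by zeros, so one position goes to the sign
    | LengthType.S -> sprintf "'+' + new string('0', %d)" (l.length - 1)

printfn "%s" (reservedDefault { ltype = LengthType.N; length = 15 })
// new string('0', 15)
printfn "%s" (reservedDefault { ltype = LengthType.S; length = 5 })
// '+' + new string('0', 4)
```

Because the match covers all three cases of the union, the compiler will warn if a fourth length type is ever added without a matching default.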

This now gives us a (domain specific) language to describe the records, for example:

Recordtype 95 "400"
Errorcode "4001"
Mutuality "401"
Errorcode "4011"
Z "402" (N 12) "Nummer van verzamelfactuur"
               "Numéro de facture récapitulative" String "RecapInvoiceNumber" "" ""
Errorcode "4021"
// ...
Reserved (N 257) "413"

Now we have a way to describe the zones in the flat file that will be converted into properties in a C# class. Let’s extend the DSL to include classes. In the eFact documentation, there are some predefined structures called segments. A segment has a name and is composed of 1 or more zones. These segments will be put together in a class. So a class is a named collection of segments, and a segment is a named collection of zones. A class can also inherit from another class, which saves some more typing. A namespace is a named collection of classes, and finally a program (I didn’t find a better name for this) is composed of namespaces, and has a filename.


Here are the definitions:

type Segment = { name: string; zones: Zone list }
type Interface = { name: string; lines: string list }
type Record = { name: string; inherits: Record option; 
                implements: Interface list; segments: Segment list }
type Namespace = { name: string; records: Record list }
type Program = { filename: string; baseNamespace: string; namespaces: Namespace list }
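
Since every record format has a fixed length of 370 bytes, a description built from these types can be checked before any code is generated. The following is a minimal sketch, not code from the repository: the types are simplified stand-ins (a zone's length is a plain int here), and recordLength and checkLength are hypothetical helpers. It also assumes that the zones of an inherited record count towards the total.

```fsharp
// Simplified stand-ins for the article's types: only the fields needed here
type Zone = { zone: string; length: int }
type Segment = { name: string; zones: Zone list }
type Record = { name: string; inherits: Record option; segments: Segment list }

let segmentLength (seg: Segment) : int =
    seg.zones |> List.sumBy (fun z -> z.length)

// Total length of a record, including the zones of the record it inherits from
let rec recordLength (r: Record) : int =
    let inherited =
        match r.inherits with
        | Some parent -> recordLength parent
        | None -> 0
    inherited + List.sumBy segmentLength r.segments

// Hypothetical check: does the described record have the expected size?
let checkLength expected (r: Record) =
    let actual = recordLength r
    if actual = expected then Ok actual
    else Error (sprintf "%s: expected %d bytes, got %d" r.name expected actual)

// Toy data, not the real eFact layout
let header : Segment =
    { name = "header"; zones = [ { zone = "1"; length = 2 }; { zone = "2"; length = 6 } ] }
let body : Segment =
    { name = "body"; zones = [ { zone = "3"; length = 362 } ] }
let baseRec : Record = { name = "Base"; inherits = None; segments = [ header ] }
let fullRec : Record = { name = "Full"; inherits = Some baseRec; segments = [ body ] }

printfn "%A" (checkLength 370 fullRec)   // Ok 370
printfn "%A" (checkLength 370 baseRec)   // an Error: the base record alone is only 8 bytes
```

A check like this could run over all records of a namespace just before the generator writes the file.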

Let’s define a small program:

let segment200 = 
    { 
        name= "segment200"; 
        zones= 
        [
            Z "200" (N 6) "Naam van het bericht" "Nom du message" Int 
                           "MessageName" "920000" "920000|920900|..."
            Errorcode "2001" 
            Z "201" (N 2) "Versienummer formaat van het bericht" 
                    "N° version du format du message" Byte "MessageVersionNumber" "" "2"
            Errorcode "2011" 
            // ...
            Z "205" (N 14) "Referentie bericht VI" 
                           "Reference du message OA" String "ReferenceOA" "" ""
            Errorcode "2051" 
            Reserved (N 15) "206"
        ] 
    }

let segment300 = 
    { 
        name= "segment300"; 
        zones= 
        [
            Z "300a" (N 4) "Factureringsjaar"  "Année de facturation"  Int "YearBilled" "" ""
            Z "300b" (N 2) "Factureringsmaand" "Mois de facturation"  Byte "MonthBilled" "" ""
            Errorcode "3001" 
            Z "301" (N 3) "Nummer van de verzendingen" "Numero d'envoi" Int "RequestNr" "" ""
            Errorcode "3011" 
            Z "302" (N 8) "Datum opmaak factuur" 
              "Date de création de facture" DateTime "Creationdate" "" ""
            Errorcode "3021" 
            // ...
            Z "309" (N 2) "Type facturering" "Type facturation" Byte "Invoicingtype" "" ""
            Errorcode "3091" 
        ] 
    }

let fileInfoBase =
   {
        name= "FileInfoBase"; 
        inherits = None;
        implements = [];
        segments= 
        [
           segment200 
           segment300
        ]
   }

let fileInfo =
    {
        name= "FileInfo"; 
        inherits = Some fileInfoBase;
        implements = [];
        segments= 
        [
            segment300a
        ]
    }

// ...
let namespaceRequests =
    {
        name="Requests";
        records=
        [
            fileInfoBase
            fileInfo
            // ...
        ]
    }

let namespaceSettlement =
    {
        name="Settlement";
        records=
        [
            // ...
        ]
    }

let prog = 
    { 
        filename="eFact.cs";
        baseNamespace="HdmpCloud.eHealth.eFact.Serializer.Recordformats.";
        namespaces = 
        [
            namespaceRequests
            namespaceSettlement
        ]
    }

As you can see, the definition of all the needed datatypes is about 10 lines, and very readable:

type Datatypes = Bool | CRC | Byte | Short | Int | DateTime | Time | 
                 String | Money | AmbHos | Gender | Error 
type LengthType = A | N | S
type Length = { ltype: LengthType; length: int }
type Zone = { zone: string; length: Length; nl: string; fr: string; 
              datatype: Datatypes; name: string; defaultvalue: string; comments: string}
type Segment = { name: string; zones: Zone list }
type Interface = { name: string; lines: string list }
type Record = { name: string; inherits: Record option; implements: Interface list; 
                segments: Segment list }
type Namespace = { name: string; records: Record list }
type Program = { filename: string; baseNamespace: string; namespaces: Namespace list }

Then we defined some helper functions to make the definition of the zones a bit easier, and to give it semantic meaning. And now we have described the zones, segments, records, namespaces and the program. This is done in about 1000 lines of code.

Let’s Generate Some C#

Nice. We have described our language (DSL), and we have described what our C# classes should look like. We can compile this program, and if it succeeds, we know that the program in our DSL is syntactically correct. Time to generate the code, so this becomes useful.

To start, let’s output a Zone. This will be output as a property in a C# class. Don’t mind the pos parameter yet.

let outputZone pos zone  =
    let (declaration, att3) = outputDeclaration zone
    let att1 = outputEFactMetadata zone pos
    let att2 = sprintf "FieldFixedLength(%d)" zone.length.length

    let attslist = [ att1; att2; att3 ]

    let atts = attslist |> List.reduce (fun a b -> a + ", " + b)
    let comment = if zone.comments.Length = 0 then "" else (C2 zone.comments)

    "[" + atts + "] " + declaration + (outputDefaultValue zone) + " " + comment

As you can see, there are some helper functions here. I’ll discuss them below.

The outputZone function takes 2 parameters: pos and zone. It returns a string describing a C# property with the necessary attributes. This is the central function in the code generation. In the end, the generated program will just be a list of strings to be written into a file.
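
The attribute list in outputZone is joined with List.reduce. As a small aside, the sketch below shows that String.concat from the F# core library produces the same text; unlike List.reduce, it also accepts an empty list. The bracket helper is hypothetical and only there for illustration.

```fsharp
// Two of the attribute strings that appear in the generated code above
let atts =
    [ "FieldFixedLength(6)"
      "FieldAlign(AlignMode.Right, '0')" ]

// The approach used in outputZone
let viaReduce = atts |> List.reduce (fun a b -> a + ", " + b)

// Alternative: String.concat also works for an empty list (it returns ""),
// whereas List.reduce throws on an empty list
let viaConcat = String.concat ", " atts

// Hypothetical helper that wraps the joined attributes in C# attribute brackets
let bracket (attributes: string list) =
    "[" + String.concat ", " attributes + "]"

printfn "%s" (bracket atts)
// [FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')]
```

In outputZone the list always holds three attributes, so the List.reduce version is safe there.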

In F#, a function can only be used if it was defined before the calling function. At first this is a pain, but it forces you to have a correct dependency structure. Typically, this results in a list of small functions that are composed into more useful functions. Let’s look at some of the functions in “generator.fs”, which contains the code to generate the C# classes.

Very simple function to generate the string “5N” from the type (N 5):

let outputLength (l: Length) =
    sprintf "%d%A" l.length l.ltype

Make the first character of a string uppercase:

let captitalize (s:string) =
    if s.Length = 0 then ""
    else s.Substring(0,1).ToUpper() + s.Substring(1)

Create the EFactMetadata attribute:

// EFactMetadata("312", "449", "352-800", "Reserve", "Reserve") 
let outputEFactMetadata zone pos =
    let nl = captitalize zone.nl
    let fr = captitalize zone.fr
    let rng = sprintf "%d-%d" pos (pos + zone.length.length - 1)
    sprintf "EFactMetadata(\"%s\", \"%s\", \"%s\", \"%s\", \"%s\")" 
             zone.zone (outputLength zone.length) rng nl fr

The function is straightforward, thanks to the use of the small helpers.
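
The two computed parts of the attribute, the zone type and the position range, can be tried out in isolation. This is a minimal sketch with simplified types: outputLength follows the article, while outputRange is a hypothetical name for the range calculation that outputEFactMetadata does inline.

```fsharp
// Sketch only: the union is qualified here to keep the example self-contained
[<RequireQualifiedAccess>]
type LengthType = A | N | S

type Length = { ltype: LengthType; length: int }

// "%A" prints the name of the union case, so (N 2) becomes "2N"
let outputLength (l: Length) =
    sprintf "%d%A" l.length l.ltype

// Positions are 1-based and inclusive: a 2-position zone starting at 7 covers 7-8
let outputRange pos (l: Length) =
    sprintf "%d-%d" pos (pos + l.length - 1)

let n2 = { ltype = LengthType.N; length = 2 }

printfn "%s %s" (outputLength n2) (outputRange 7 n2)
// 2N 7-8
```

The subtraction of 1 is the detail that is easy to get wrong by hand, which is one reason to let the generator calculate the positions.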

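The outputDeclaration function returns the property declaration together with the FileHelpers attribute that fits the datatype; outputZone then combines everything:
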
// public string Reserve9 { get; set; } = new string(' ', 449);
let outputDeclaration zone =
    let (dt, att) = match zone.datatype with
                     | CRC -> ("byte", "FieldTrim(TrimMode.Both)")
                     | Int -> ("int",
                               if zone.length.ltype = LengthType.S then
                                   sprintf "FieldConverter(typeof(SignedIntConverter), %d)"
                                           zone.length.length
                               else
                                   "FieldAlign(AlignMode.Right, '0')")
                    // ...
                     | Gender -> ("Gender", "FieldConverter(typeof(EnumIntConverter),1)")

    (sprintf "public %s %s { get; set; }" dt zone.name, att)

// [EFactMetadata("312", "449", "352-800", "Reserve", "Reserve"), 
//  FieldFixedLength(450), FieldValueDiscarded] 
//  public string Reserve9 { get; set; } = new string(' ', 449);
let outputZone pos zone  =
    let (declaration, att3) = outputDeclaration zone
    let att1 = outputEFactMetadata zone pos
    let att2 = sprintf "FieldFixedLength(%d)" zone.length.length

    let attslist = [ att1; att2; att3 ]

    let atts = attslist |> List.reduce (fun a b -> a + ", " + b)
    let comment = if zone.comments.Length = 0 then "" else (C2 zone.comments)

    "[" + atts + "] " + declaration + (outputDefaultValue zone) + " " + comment

The First Function With Some Logic in it: outputSegment

We want to output a segment, which is a number of zones. There will be a loop to cover all the zones, but in functional programming, we avoid loops as much as possible. F# provides us with a lot of functions to handle collections.

The output we want is not just a line for each zone; given that eFact files are records with fixed-length fields, we also want to indicate the position of each field in the record. We saw before that each zone has a length, which allows us to calculate the positions. Here is some partial output of a segment:

// Segment segment200
[EFactMetadata("200", "6N", "1-6", "Naam van het bericht", "Nom du message"), 
FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')] 
public int MessageName { get; set; } = 920000; // 920000|920900|...
[EFactMetadata("2001", "2N", "7-8", "Code fout", "Code érreur"), 
FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')] 
public byte Error2001 { get; set; } = 0;
[EFactMetadata("201", "2N", "9-10", "Versienummer formaat van het bericht", 
"N° version du format du message"), FieldFixedLength(2), 
FieldAlign(AlignMode.Right, '0')] public byte MessageVersionNumber { get; set; } // 2
[EFactMetadata("2011", "2N", "11-12", "Code fout", "Code érreur"), 
FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')] 
public byte Error2011 { get; set; } = 0;

Notice the 3rd parameter of the EFactMetadata attribute (“1-6”, “7-8”, “9-10”, “11-12”, …). This is a running total that is calculated using a start position and the lengths of the zones. This explains why the outputZone function takes a pos parameter. Here is the function:

let outputSegment start (seg: Segment)  =
    let (endpos, lines) = 
        seg.zones |> List.fold (fun (pos, lines) z -> 
            let z2 = outputZone pos z
            (pos + z.length.length, z2::lines)
                                    ) (start, [])

    let zs2 = (C2 ("Segment " + seg.name)) :: (lines |> List.rev)

    (endpos, zs2)

A record is composed of one or multiple segments, so we need a start position, and we return the end position for this segment. Later, the outputRecord function will use the same trick as we use here for the position in the EFactMetadata attribute.

The main loop is implemented in the List.fold function:

seg.zones |> List.fold (fun (pos, lines) z -> 
            let z2 = outputZone pos z
            (pos + z.length.length, z2::lines)
                       ) (start, [])

Taking the collection of zones as its input, List.fold will iterate over each zone and apply an accumulator function to it. The accumulator is the tuple (pos, lines), which indicates that we are accumulating 2 things at the same time: the position and the generated lines.

let z2 = outputZone pos z            // generates the line for the current position
(pos + z.length.length, z2::lines)   // returns pos plus the length of the zone
                                     // and the generated line in front of all the lines that were already generated

The result is that we now have our lines with the position correctly filled, but in reverse order. This explains the following line:

let zs2 = (C2 ("Segment " + seg.name)) :: (lines |> List.rev)
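
To see the running position at work without the rest of the generator, here is a self-contained sketch under simplifying assumptions: the types are reduced to what the fold needs, renderZone is a stand-in for outputZone that only prints the zone name and its range, and renderSegments is my guess at how a record-level function could thread the end position of one segment into the next. The zones are toy data, not the complete eFact segments.

```fsharp
// Simplified stand-ins for the article's types
type Zone = { zone: string; length: int }
type Segment = { name: string; zones: Zone list }

// Stand-in for outputZone: zone name plus its 1-based, inclusive range
let renderZone pos (z: Zone) =
    sprintf "%s: %d-%d" z.zone pos (pos + z.length - 1)

// Same shape as outputSegment: fold over the zones, carrying (position, lines)
let renderSegment start (seg: Segment) =
    let step (pos, acc) (z: Zone) =
        (pos + z.length, renderZone pos z :: acc)    // prepend, so acc is in reverse order
    let (endpos, lines) = List.fold step (start, []) seg.zones
    (endpos, ("// Segment " + seg.name) :: List.rev lines)

// Hypothetical record-level version: the end of one segment is the start of the next
let renderSegments start (segs: Segment list) =
    let step (pos, acc) (seg: Segment) =
        let (next, lines) = renderSegment pos seg
        (next, lines :: acc)
    let (endpos, blocks) = List.fold step (start, []) segs
    (endpos, blocks |> List.rev |> List.concat)

let seg200 : Segment =
    { name = "segment200"
      zones = [ { zone = "200"; length = 6 }; { zone = "2001"; length = 2 }
                { zone = "201"; length = 2 }; { zone = "2011"; length = 2 } ] }
let seg300 : Segment =
    { name = "segment300"
      zones = [ { zone = "300a"; length = 4 }; { zone = "300b"; length = 2 } ] }

let (endpos, lines) = renderSegments 1 [ seg200; seg300 ]
lines |> List.iter (printfn "%s")
printfn "next free position: %d" endpos   // 19
```

With this toy data the first segment ends at position 12, so zone 300a is reported as 13-16 and 300b as 17-18.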

If you like, you can read the rest of the code on GitHub. Most of the code is straightforward from this point on.

More Enhancements

One simple enhancement is this:

let C2 s = "// " + s

Now we can generate comments like C2 "Segment 200".

Errorcodes

In the eFact format, there are many Errorcode fields. They always look the same:

Z "2001" (N 2) "Code fout" "Code érreur" Error "Error2001" "0" ""

This is always a 2-digit field (N 2), so we can define a new function for this:

let Errorcode zone =
    let nzone = normalizeName zone
    Z zone (N 2) "Code fout" "Code érreur" Error ("Error" + nzone) "0" ""

Errorcode “2001” will now create a 2-digit zone in a descriptive way.

Reserved Zones

There are also reserved zones, which are numeric or alphabetic (the code below also covers the signed numeric type S). Depending on their type, they will be filled up with different values. They are the FILLERs in good old COBOL (and yes, this says something about my age).

To describe them, we make another function:

let Reserved l zone =
    let nzone = normalizeName zone
    let dft = match l.ltype with
                | A -> sprintf "new string(' ', %d)" l.length 
                | N -> sprintf "new string('0', %d)" l.length 
                | S -> sprintf "'+' + new string('0', %d)" (l.length - 1)
    Z zone l "Reserve" "Reserve" String ("Reserved" + nzone) dft ""

Conclusion

Describing the data model for the classes to be generated takes about 15 lines of code. Then we defined a couple of small helper functions and some bigger functions to generate the code. The generator.fs file contains 163 lines of code. With this, we can describe our program in a readable way. We also added some semantics to the code with constructor functions to describe fillers, errors, a mutuality, … I think this is a nice demonstration of F# as a functional language.
