
Why We Did Not Name the cdata Transforms wide/tall/long/short


We recently saw this UX (user experience) question from the tidyr author as he adapts tidyr to cdata techniques.

[Image: screenshots of the question]

The cdata terminology he is not adopting is the pair of names “unpivot_to_blocks()” and “pivot_to_rowrecs()”. One of the research ideas behind the cdata package is that the important thing to call out is record structure.

The key question is whether the data are in a very denormalized form, where all facts about an instance are in a single row (which we call “row records”), or in a record-oriented form, where all the facts about an instance are spread over several rows (which we call “block records”). Row records do not necessarily have more columns than block records. This makes shape-based naming of the transforms problematic, no matter what names you pick for the shapes. Naming the transforms by intent or semantics avoids this problem.
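To make the distinction concrete, here is a minimal Python sketch. cdata itself is an R package, and this is not its API: the data set, the column names `id`, `measure`, and `value`, and the helpers `n_columns` and `n_rows_per_instance` are hypothetical illustrations. The sketch stores the same facts about two instances in both layouts. With two measures, both layouts have three columns; what actually differs is how many rows each instance occupies.

```python
# Hypothetical data: the same facts about two instances (id 1 and id 2),
# stored in two record layouts. Not cdata's API; an illustration only.

# Block records: each instance's facts are spread over several rows,
# one row per (id, measure) pair.
block_records = [
    {"id": 1, "measure": "AUC", "value": 0.7},
    {"id": 1, "measure": "R2", "value": 0.4},
    {"id": 2, "measure": "AUC", "value": 0.8},
    {"id": 2, "measure": "R2", "value": 0.5},
]

# Row records: all facts about an instance sit in a single row.
row_records = [
    {"id": 1, "AUC": 0.7, "R2": 0.4},
    {"id": 2, "AUC": 0.8, "R2": 0.5},
]


def n_columns(rows):
    """Number of distinct column names used across a list of dict rows."""
    return len({key for row in rows for key in row})


def n_rows_per_instance(rows, key="id"):
    """How many rows each instance (identified by `key`) occupies."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


print(n_columns(block_records), n_columns(row_records))  # 3 3
print(n_rows_per_instance(block_records))  # {1: 2, 2: 2}
print(n_rows_per_instance(row_records))    # {1: 1, 2: 1}
```

Column count alone cannot tell the two layouts apart here; the rows-per-instance count can, which is the record-structure view the cdata names are built on.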

Below is a simple example.

[Image: the example, applying the same transform to different inputs]

Notice that the width of the result relative to the input width varies as a function of the input data, even though we always called the same transform. So it is incorrect to characterize these transforms as merely widening or narrowing.
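The effect can be reproduced with a small hypothetical Python stand-in for a block-to-row pivot. Again, this is not the cdata API: the function `to_row_records`, the helpers, and the generated data are assumptions for illustration, and the sketch assumes each (id, measure) pair occurs at most once. Applying the same function to inputs with one, two, and three distinct measures gives a result that is narrower than, as wide as, or wider than the three-column input.

```python
def to_row_records(block_rows, key_col, name_col, value_col):
    """Hypothetical block-to-row pivot (not cdata's pivot_to_rowrecs()).

    Groups rows by key_col and turns each distinct name_col entry into its
    own column holding the matching value_col entry. Assumes each
    (key, name) pair appears at most once; a repeat would overwrite.
    """
    out = {}
    for row in block_rows:
        rec = out.setdefault(row[key_col], {key_col: row[key_col]})
        rec[row[name_col]] = row[value_col]
    return list(out.values())


def width(rows):
    """Number of distinct column names across a list of dict rows."""
    return len({k for r in rows for k in r})


def make_input(measures):
    """Hypothetical block-record input: two instances, one row per measure."""
    return [{"id": i, "measure": m, "value": float(i)}
            for i in (1, 2) for m in measures]


for measures in (["AUC"], ["AUC", "R2"], ["AUC", "R2", "F1"]):
    d = make_input(measures)
    r = to_row_records(d, "id", "measure", "value")
    print(len(measures), width(d), width(r))
# 1 3 2   (result narrower than input)
# 2 3 3   (same width)
# 3 3 4   (result wider than input)
```

In this sketch the output width is one key column plus one column per distinct measure, while the input width stays at three, so whether the call "widens" or "narrows" is a property of the data rather than of the transform.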

There are still some subtle points (for instance, row records are in fact instances of block records, namely blocks consisting of a single row), but overall the scheme we (Nina Zumel and I, John Mount) worked out, tested, and promoted is pretty good. A lot of our work researching this topic can be found here.

