Binary formats and protocols: LTV is better than TLV

One of the most ubiquitous patterns found in the design of binary file formats and network protocols is the Type-Length-Value (TLV) construction:

     t bytes    Type
     l bytes    Length
Length bytes    Value

This is the most common order of these fields as it is directly implied by the TLV acronym. However, I don't think it is the optimal construction.

In my view, LTV (Length-Type-Value) is the better construction. One reason is that with the fields in this order, LTV can be defined as a composition of two simpler constructions:

LV(data) (length-value), in which a sequence of bytes has its length indicated, but without any indication of its format or meaning, defined as:

LV(data) = len(data) || data

TV(type, data) (type-value), in which a sequence of bytes has its type indicated but it is assumed its length is understood from the context, defined as:

TV(type, data) = type || data

Both of these are useful in different circumstances. If the type of some piece of data is contextually understood, LV suffices. If the length of some piece of data is contextually understood, TV suffices.
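As a rough illustration, the two constructions might look like this in Python. The field widths are my assumptions for the sketch (a 4-byte big-endian length and a 4-byte big-endian type); nothing above fixes them, and the function names are only illustrative.

```python
import struct

# Assumptions for this sketch: 4-byte big-endian unsigned length and type fields.
LEN_FMT = ">I"
TYPE_FMT = ">I"


def lv(data: bytes) -> bytes:
    """LV(data) = len(data) || data"""
    return struct.pack(LEN_FMT, len(data)) + data


def tv(type_id: int, data: bytes) -> bytes:
    """TV(type, data) = type || data (the length is known from context)."""
    return struct.pack(TYPE_FMT, type_id) + data
```

For example, `lv(b"abc")` yields the four length bytes `00 00 00 03` followed by `abc`, with no hint of what the bytes mean, while `tv(1, b"abc")` yields the four type bytes `00 00 00 01` followed by `abc`, with no hint of where the data ends.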

By using these constructions, LTV is simply defined as the composition of the two:

LTV(type, data) = LV(TV(type, data))
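A minimal sketch of that composition, again assuming 4-byte big-endian length and type fields (my choice for illustration, not part of the definition). A `tlv` encoder is included only for contrast; all function names are hypothetical.

```python
import struct


def lv(data: bytes) -> bytes:
    # LV(data) = len(data) || data, with an assumed 4-byte big-endian length.
    return struct.pack(">I", len(data)) + data


def tv(type_id: int, data: bytes) -> bytes:
    # TV(type, data) = type || data, with an assumed 4-byte big-endian type.
    return struct.pack(">I", type_id) + data


def ltv(type_id: int, data: bytes) -> bytes:
    # LTV is literally the composition: the length covers the type and the value.
    return lv(tv(type_id, data))


def tlv(type_id: int, data: bytes) -> bytes:
    # TLV for comparison: the type sits outside the length-delimited area,
    # so it cannot be written as lv(...) of anything.
    return struct.pack(">I", type_id) + lv(data)
```

With these definitions, `ltv(7, b"hi")` encodes a length of 6 (four type bytes plus two value bytes), whereas `tlv(7, b"hi")` encodes a length of 2 and puts the type in front of it.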

On this basis I believe that LTV is a more “natural” construction than TLV. TLV is a strange construction because it has the type field outside of the length-demarcated area, but everything else inside of it. The type field is given special treatment but it is unclear to me that this is justified. Even with the LTV construction, it's no harder to read the first few bytes to skip over LTVs of a type which is not currently sought.
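To make the skipping point concrete, here is a sketch of scanning a buffer of concatenated LTVs for one type. It assumes 4-byte big-endian length and type fields, and `find_ltv` is a hypothetical name of mine. The scan reads the first eight bytes of each record and jumps ahead by the length, much as a TLV scanner would.

```python
import struct
from typing import Optional

# Assumed record header: 4-byte big-endian length, then 4-byte big-endian type.
HEADER = struct.Struct(">II")
LEN_SIZE = 4
TYPE_SIZE = 4


def find_ltv(buf: bytes, wanted: int) -> Optional[bytes]:
    """Return the value of the first LTV of type `wanted`, or None if absent."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated LTV header")
        length, type_id = HEADER.unpack_from(buf, offset)
        end = offset + LEN_SIZE + length
        # The length covers the type field, so it must be at least TYPE_SIZE,
        # and the record must fit inside the buffer.
        if length < TYPE_SIZE or end > len(buf):
            raise ValueError("malformed LTV length")
        if type_id == wanted:
            return buf[offset + HEADER.size:end]
        offset = end  # skip over a record of a type we are not looking for
    return None
```

Given two records, type 1 with value `aa` and type 2 with value `bbb`, searching for type 2 skips the first record using only its header and returns `b"bbb"`.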

Disadvantages of LTVs?

Are there any disadvantages of the LTV construction? With LTVs, the length field naturally includes the length of the type field. This is fine, but it does create the issue that a malicious or buggy peer could send an LV which is not large enough to include a TV type field. So you would have to check that the length is adequate before extracting a type field from an LTV. However, if you are skipping over TLVs/LTVs to search for a specific type, you need to process the length field anyway to get to the next TLV/LTV. Moreover, bounds checks against the buffer length are essential when processing input anyway, so this is nothing new. Plus, this enforcement can be done in wire protocol deserializer routines rather than in higher-level libraries.
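A sketch of how a deserializer layer might enforce this, assuming 4-byte big-endian length and type fields (my assumption) and hypothetical function names. The extra LTV-specific check is a single comparison that sits next to the buffer bounds check any LV parser needs anyway.

```python
import struct
from typing import Tuple

LEN_SIZE = 4   # assumed width of the length field
TYPE_SIZE = 4  # assumed width of the type field


def parse_lv(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (body of the first LV, remaining bytes)."""
    if len(buf) < LEN_SIZE:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">I", buf)
    # Bounds check against the buffer: needed for TLV and LTV alike.
    if length > len(buf) - LEN_SIZE:
        raise ValueError("length exceeds buffer")
    end = LEN_SIZE + length
    return buf[LEN_SIZE:end], buf[end:]


def parse_ltv(buf: bytes) -> Tuple[int, bytes, bytes]:
    """Return (type, value, remaining bytes) for the first LTV in buf."""
    body, rest = parse_lv(buf)
    # The LTV-specific check: the LV body must be able to hold a type field.
    if len(body) < TYPE_SIZE:
        raise ValueError("LV too short to contain a type field")
    (type_id,) = struct.unpack_from(">I", body)
    return type_id, body[TYPE_SIZE:], rest
```

Code above this layer only ever sees a `(type, value)` pair that has already passed both checks.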

There is an argument to be made that it is better to have invalid values “unrepresentable” in a language (see also LANGSEC). With LTV, supposing for example a four-byte type field, the valid length values are [4, ∞), whereas with TLV, the valid length values are [0, ∞). So TLV seems to have an advantage here: every length value it can encode is valid, whereas LTV can encode the invalid lengths 0 to 3. But the gain is limited, as you still need to validate the lengths of untrusted input against the buffer length (which seems a much bigger hazard than any robustness LANGSEC might offer).

Anyway, I'm not making any big dogmatic point here. It's just long seemed to me that LTV is a more “natural” arrangement than TLV and I'm mildly surprised it's not more common as a construction. So I just thought I'd put that thought here.
