
Negative feature response: Automatic attachment compression in RavenDB

 2 years ago
source link: https://ayende.com/blog/195073-C/negative-feature-response-automatic-attachment-compression-in-ravendb?Key=041af353-afed-487a-8cb4-8ed49be2f6e8&utm_campaign=Feed%3A+AyendeRahien+%28Ayende+%40+Rahien%29

Comments

Milosz
17 Oct 2021
22:26 PM

These are fair points. I guess that if somebody insisted, it could be handled by introducing an EligibleForCompression property to the attachments infrastructure, but I totally understand that attachments are a general-purpose mechanism, that special-casing is problematic, and that in this case it may not be worth it.

If someone really insisted on collection-wide compression of texts, I guess they could emulate attachments to some degree and store the texts in normal documents with their own entity type, e.g. TextAttachment or, to be more domain-specific, EbookContent.


Regarding user-based compression: I was wondering how one would know whether the text in an attachment is compressed and, if so, how it is compressed. Two ideas came to mind that make use of the attachment's ContentType property:

  1. Use the media type's structured syntax name suffixes. There are 3 compression-related suffixes registered at the moment: +zip, +gzip, +zstd
    (https://www.iana.org/assignments/media-type-structured-suffix)
    Example: text/plain+gzip;charset=utf-8
    Using unregistered suffixes is not recommended "given the possibility of conflicts with future suffix definitions"
    (https://www.rfc-editor.org/rfc/rfc6838.html#section-4.2.8)

  2. Use your own media type (https://en.wikipedia.org/wiki/Media_type#Registration_trees, https://www.rfc-editor.org/rfc/rfc6838.html#section-3.1)
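
As an illustration of the first idea, here is a minimal sketch of building and reading such a ContentType string. The class and method names (CompressedContentType, Build, GetSuffix) are hypothetical and not part of RavenDB or .NET; the sketch uses plain string handling and assumes a well-formed value such as the example above.

```csharp
using System;

public static class CompressedContentType
{
    // Builds e.g. "text/plain+gzip;charset=utf-8" from "text/plain", "gzip", "utf-8".
    public static string Build(string mediaType, string suffix, string charset) =>
        $"{mediaType}+{suffix};charset={charset}";

    // Returns the structured syntax suffix ("gzip", "zip", "zstd", ...),
    // or an empty string when the media type has no suffix.
    public static string GetSuffix(string contentType)
    {
        // Drop parameters such as ";charset=utf-8" before looking for the suffix.
        int semicolon = contentType.IndexOf(';');
        string mediaType = (semicolon < 0 ? contentType : contentType.Substring(0, semicolon)).Trim();

        int plus = mediaType.LastIndexOf('+');
        return plus < 0 ? string.Empty : mediaType.Substring(plus + 1).ToLowerInvariant();
    }
}
```

A reader of the attachment could then branch on GetSuffix(contentType) == "gzip" before deciding whether to decompress.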

I could even imagine creating IAttachmentsSessionOperations extension methods called StoreCompressedText and GetCompressedText that would wrap a stream in a compressing or decompressing stream and construct or parse the ContentType string.
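
The stream-wrapping part of that idea might look roughly like the sketch below. It deliberately stops short of the RavenDB session: CompressedText, Compress and Decompress are hypothetical names, gzip and UTF-8 are assumptions, and the actual extension methods would pass the resulting stream (plus a suitable ContentType) to the attachment store call and read it back from the fetched attachment.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressedText
{
    // Gzips UTF-8 text into a rewound MemoryStream, ready to be handed to a store call.
    public static MemoryStream Compress(string text)
    {
        var output = new MemoryStream();
        // leaveOpen: true keeps 'output' usable after the GZipStream is disposed;
        // disposing the GZipStream is what flushes the final gzip bytes.
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        }
        output.Position = 0;
        return output;
    }

    // Reads gzipped UTF-8 text back from a stream, e.g. the stream of a fetched attachment.
    public static string Decompress(Stream compressed)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```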

Milosz
17 Oct 2021
22:50 PM

Sorry, I forgot to clarify that the hypothetical EligibleForCompression property would be set by the user when storing an attachment.

Oren Eini
19 Oct 2021
09:26 AM

Milosz,

Yes, technically speaking you can reuse document compression in RavenDB to do cross-text compression. Not something that I had actually thought of, but it would work. In general, EligibleForCompression is the same as just sending gzip (or zstd, etc.) values; there is no need to get anything inside RavenDB involved.

Milosz
19 Oct 2021
09:55 AM

Sure, it wouldn't be of much help if it just compressed the attachment locally. What I meant is that with EligibleForCompression it would behave like the smart compression of values in documents that you presented in the very first approach in the previous post (also described here: https://ravendb.net/articles/ravendb-5-0-features-smart-document-compression).
But then again, I totally understand that the gains here are probably negligible compared to the cost of implementing and owning the feature.

Steve
20 Oct 2021
14:29 PM

We just added two extension methods to IAttachmentsSessionOperations, StoreGzipped and TryGetGzipped, for the (very) few cases where we do store some larger text files. StoreGzipped tries to gzip the content and keeps the original when the gzipped result is larger; TryGetGzipped first checks for the binary marker to determine whether the content was gzipped:

bool IsGzipCompressed(byte[] data) => data.Length > 1 && data[0] == 0x1F && data[1] == 0x8B;

This also meant that we didn't have to gzip all the existing documents and could just keep the same logic. When you really need the attachment directly from the database, you can just download it, add .gz to the filename, and use WinRAR or any other tool to decompress it.
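
The approach described in this comment can be sketched on plain byte arrays as follows. This is not Steve's actual code: GzipFallback, GzipIfSmaller and GunzipIfNeeded are hypothetical names, and the RavenDB session calls that StoreGzipped and TryGetGzipped would wrap are left out so the example stays self-contained.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipFallback
{
    // Same check as the one-liner above: gzip data starts with the bytes 0x1F 0x8B.
    public static bool IsGzipCompressed(byte[] data) =>
        data.Length > 1 && data[0] == 0x1F && data[1] == 0x8B;

    // Gzips the data, but returns the original bytes when gzip does not make it smaller.
    public static byte[] GzipIfSmaller(byte[] original)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
            {
                gzip.Write(original, 0, original.Length);
            }
            byte[] compressed = buffer.ToArray();
            return compressed.Length < original.Length ? compressed : original;
        }
    }

    // Returns the stored bytes unchanged unless they start with the gzip marker.
    public static byte[] GunzipIfNeeded(byte[] stored)
    {
        if (!IsGzipCompressed(stored)) return stored;

        using (var input = new MemoryStream(stored))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Because the marker check runs on every read, attachments stored before the change come back untouched. The trade-off is that uncompressed content that happens to begin with 0x1F 0x8B would be mistaken for gzip, which is unlikely for text files.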

Oren Eini
20 Oct 2021
15:25 PM

Steve,

Awesome that this is that easy to integrate.
