

Compress Strings With .NET and C#
source link: https://khalidabuhakmeh.com/compress-strings-with-dotnet-and-csharp

Modern development has an abundance of processing power, network bandwidth, and disk space. Given our current fortunes, we should spend these resources like there’s no tomorrow, right? Well, no! We should be mindful of our resource utilization and how it affects our overall application’s run profile.
This post will show how we can use compression algorithms in the System.IO.Compression namespace to compress and decompress a string value. Compressing values should result in significant byte reduction.
What Is Compression?
Compression in physics is a size reduction due to forces pushing in on a mass. In terms of data compression, it is transforming data into a smaller format without any perceivable information loss. Data compression uses algorithms to encode existing information into the fewest bits possible. Different algorithms have different levels of effectiveness but typically have trade-offs in terms of time to compress or CPU processing required to achieve a desirable result. In computer science, this is the space-time complexity trade-off.
Developers should evaluate the following factors when choosing a data compression algorithm:
- Time: How long does it take to compress my particular data?
- Space: How much space do I save when compressing data?
- Lossy: Does compression cause a loss of data? Some information loss is normally acceptable for audio and video, but not for text.
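The three factors above can be checked with a few lines of code. The following is a minimal sketch, not a benchmark: the sample text and the helper names Gzip and Gunzip are assumptions made for illustration, and timings from a single run are only a rough indication.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical sample input: repetitive text, which compresses well.
var text = string.Concat(Enumerable.Repeat("the quick brown fox jumps over the lazy dog. ", 1_000));
var original = Encoding.UTF8.GetBytes(text);

foreach (var level in new[] { CompressionLevel.Fastest, CompressionLevel.Optimal })
{
    var timer = Stopwatch.StartNew();
    var compressed = Gzip(original, level);
    timer.Stop();

    // Space: bytes before and after. Time: elapsed wall-clock time.
    Console.WriteLine($"{level}: {original.Length:N0} -> {compressed.Length:N0} bytes in {timer.Elapsed.TotalMilliseconds:F2} ms");
}

// Lossy? Gzip is lossless, so decompressing must return the exact input.
var roundTrip = Gunzip(Gzip(original, CompressionLevel.Fastest));
Console.WriteLine($"lossless: {roundTrip.SequenceEqual(original)}");

// Hypothetical helper: compresses a byte array with GZipStream.
static byte[] Gzip(byte[] data, CompressionLevel level)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, level))
    {
        gzip.Write(data, 0, data.Length);
    } // disposing the GZipStream flushes the final compressed bytes
    return output.ToArray(); // ToArray still works after the stream is closed
}

// Hypothetical helper: reverses Gzip.
static byte[] Gunzip(byte[] data)
{
    using var input = new MemoryStream(data);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using var output = new MemoryStream();
    gzip.CopyTo(output);
    return output.ToArray();
}
```

For a real decision, it is worth running the measurement several times on data that resembles your production payloads.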
Which data compression algorithms are available to .NET developers?
.NET Data Compression Algorithms
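When using .NET 5, developers have access to the System.IO.Compression namespace, which includes the two compression algorithms this post covers: GZip and Brotli. (The namespace also offers DeflateStream, which exposes the Deflate algorithm that Gzip builds on.)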
Gzip is a lossless algorithm for data compression. The format includes redundancy checks for detecting data corruption. Linux users are likely familiar with the .gz extension, as it’s commonly used in the Unix space. Creators optimized Gzip for uncompressed data. Compressing already compressed data with Gzip may increase the size from the initially compressed size.
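The point about already compressed data is easy to try out. The sketch below compresses some text once and then compresses the result again. The sample text, the seed, and the Gzip helper are assumptions for illustration; exact sizes depend on the input, so none are quoted here, but the second pass generally has little redundancy left to remove and still has to add its own header and trailer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical sample: pseudo-random words, so the text is compressible
// without being trivially repetitive. The seed only makes runs repeatable.
var words = new[] { "stream", "buffer", "encode", "payload", "header", "window", "symbol", "block" };
var random = new Random(42);
var text = string.Join(" ", Enumerable.Range(0, 5_000).Select(_ => words[random.Next(words.Length)]));

var raw = Encoding.UTF8.GetBytes(text);
var once = Gzip(raw);   // compress uncompressed data
var twice = Gzip(once); // compress the already compressed bytes again

Console.WriteLine($"raw:   {raw.Length:N0} bytes");
Console.WriteLine($"once:  {once.Length:N0} bytes");
Console.WriteLine($"twice: {twice.Length:N0} bytes");

// Hypothetical helper: compresses a byte array with GZipStream.
static byte[] Gzip(byte[] data)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
    {
        gzip.Write(data, 0, data.Length);
    } // disposing the GZipStream flushes the final compressed bytes
    return output.ToArray();
}
```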
Brotli is another lossless data compression algorithm, developed at Google, and is best suited for text compression. As you may have guessed, Brotli is ideal for web and content delivery, which primarily operates on HTML, JavaScript, and CSS. Brotli is often considered the successor to gzip for web content, and most major web browsers support it. It typically achieves better compression ratios than gzip, especially at higher quality settings, although results depend on the input and the chosen compression level.
Using Compression in C#
Luckily, .NET developers have access to both the data compression algorithms mentioned above in the form of GZipStream and BrotliStream. Both classes have identical APIs and inputs.
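using System;
using System.IO;
using System.IO.Compression;
using System.Text;

var value = "hello world";
var level = CompressionLevel.Fastest;
var bytes = Encoding.Unicode.GetBytes(value);
await using var input = new MemoryStream(bytes);
await using var output = new MemoryStream();
// Swap GZipStream for BrotliStream to use Brotli instead
await using (var stream = new GZipStream(output, level, leaveOpen: true))
{
    await input.CopyToAsync(stream);
} // dispose the compression stream first so the final bytes are flushed to output
var result = output.ToArray();
var resultString = Convert.ToBase64String(result);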
We can also create extension methods to make these compression algorithms easier to use in our codebase.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

public static class Compression
{
    public static async Task<CompressionResult> ToGzipAsync(this string value, CompressionLevel level = CompressionLevel.Fastest)
    {
        var bytes = Encoding.Unicode.GetBytes(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        // Dispose the compression stream before reading output so the final bytes are flushed
        await using (var stream = new GZipStream(output, level, leaveOpen: true))
        {
            await input.CopyToAsync(stream);
        }
        var result = output.ToArray();
        return new CompressionResult(
            new CompressionValue(value, bytes.Length),
            new CompressionValue(Convert.ToBase64String(result), result.Length),
            level,
            "Gzip");
    }

    public static async Task<CompressionResult> ToBrotliAsync(this string value, CompressionLevel level = CompressionLevel.Fastest)
    {
        var bytes = Encoding.Unicode.GetBytes(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        // Dispose the compression stream before reading output so the final bytes are flushed
        await using (var stream = new BrotliStream(output, level, leaveOpen: true))
        {
            await input.CopyToAsync(stream);
        }
        var result = output.ToArray();
        return new CompressionResult(
            new CompressionValue(value, bytes.Length),
            new CompressionValue(Convert.ToBase64String(result), result.Length),
            level,
            "Brotli");
    }

    public static async Task<string> FromGzipAsync(this string value)
    {
        // The input is the Base64 text produced by ToGzipAsync
        var bytes = Convert.FromBase64String(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        await using var stream = new GZipStream(input, CompressionMode.Decompress);
        await stream.CopyToAsync(output);
        return Encoding.Unicode.GetString(output.ToArray());
    }

    public static async Task<string> FromBrotliAsync(this string value)
    {
        // The input is the Base64 text produced by ToBrotliAsync
        var bytes = Convert.FromBase64String(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        await using var stream = new BrotliStream(input, CompressionMode.Decompress);
        await stream.CopyToAsync(output);
        return Encoding.Unicode.GetString(output.ToArray());
    }
}

public record CompressionResult(
    CompressionValue Original,
    CompressionValue Result,
    CompressionLevel Level,
    string Kind
)
{
    // Bytes saved by compression
    public int Difference =>
        Original.Size - Result.Size;

    // Fraction of the original size that was saved
    public decimal Percent =>
        Math.Abs(Difference / (decimal) Original.Size);
}

public record CompressionValue(
    string Value,
    int Size
);
We can now use them to compress any string.
// Table and AnsiConsole come from the Spectre.Console package
using System.IO;
using Spectre.Console;

var comedyOfErrors = await File.ReadAllTextAsync("the-comedy-of-errors.txt");
var compressions = new[]
{
    await comedyOfErrors.ToGzipAsync(),
    await comedyOfErrors.ToBrotliAsync()
};
var table = new Table()
    .MarkdownBorder()
    .Title("compression in bytes")
    .ShowHeaders()
    .AddColumns("kind", "level", "before", "after", "difference", "% reduction");
foreach (var result in compressions)
{
    table
        .AddRow(
            result.Kind,
            result.Level.ToString(),
            result.Original.Size.ToString("N0"),
            result.Result.Size.ToString("N0"),
            result.Difference.ToString("N0"),
            result.Percent.ToString("P")
        );
}
AnsiConsole.Render(table);
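| kind   | level   | before  | after  | difference | % reduction |
| ------ | ------- | ------- | ------ | ---------- | ----------- |
| Gzip   | Fastest | 186,500 | 30,310 | 156,190    | 83.75 %     |
| Brotli | Fastest | 186,500 | 49,424 | 137,076    | 73.50 %     |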
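In this example, I load Shakespeare’s play The Comedy of Errors from a text file and compress it. What’s interesting is that the Gzip output is smaller than the Brotli output in this run. Results depend on the input and on the compression level, so it is worth measuring with your own data.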
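The run above only used CompressionLevel.Fastest, so it may be worth repeating the comparison at another level. The sketch below does that with a single helper, relying on the fact that GZipStream and BrotliStream both derive from Stream. The helper name Compress and the stand-in text are assumptions for illustration; no output is quoted because the sizes depend on the input and the runtime version.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Stand-in text; swap in File.ReadAllText("the-comedy-of-errors.txt") to use the play.
var text = string.Concat(Enumerable.Repeat("I to the world am like a drop of water. ", 2_000));
var bytes = Encoding.Unicode.GetBytes(text);

foreach (var level in new[] { CompressionLevel.Fastest, CompressionLevel.Optimal })
{
    var gzip = Compress(bytes, s => new GZipStream(s, level, leaveOpen: true));
    var brotli = Compress(bytes, s => new BrotliStream(s, level, leaveOpen: true));
    Console.WriteLine($"{level}: gzip {gzip.Length:N0} bytes, brotli {brotli.Length:N0} bytes");
}

// Hypothetical helper: the caller supplies a factory that wraps the output
// stream in whichever compression stream it wants to measure.
static byte[] Compress(byte[] data, Func<Stream, Stream> wrap)
{
    using var output = new MemoryStream();
    using (var compressor = wrap(output))
    {
        compressor.Write(data, 0, data.Length);
    } // dispose first so the trailing compressed bytes reach output
    return output.ToArray();
}
```

Passing a factory keeps the measurement code identical for both algorithms, so any difference in size comes from the algorithm and level rather than from how the streams are driven.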
BrotliEncoder Instead
The System.IO.Compression namespace also has a BrotliEncoder class that we can use to compress strings. Its methods work with Span<T> types, and existing arrays convert to spans either explicitly or implicitly. On .NET 5, Span<T> is part of the framework; older target frameworks may need a reference to the System.Memory NuGet package.
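// compression
var source = Encoding.Unicode.GetBytes(comedyOfErrors);
// size the buffer for the worst case so TryCompress cannot run out of room
var memory = new byte[BrotliEncoder.GetMaxCompressedLength(source.Length)];
var encoded = BrotliEncoder.TryCompress(
    source,
    memory,
    out var encodedBytes
);
Console.WriteLine($"compress bytes: {encodedBytes} (success: {encoded})");
// decompression: pass only the bytes the encoder actually wrote
var target = new byte[source.Length];
var decoded = BrotliDecoder.TryDecompress(memory.AsSpan(0, encodedBytes), target, out var decodedBytes);
Console.WriteLine($"decompress bytes: {decodedBytes} (success: {decoded})");
// decode only the bytes the decoder actually produced
var value = Encoding.Unicode.GetString(target, 0, decodedBytes);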
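Interestingly enough, when using the BrotliEncoder, we get a smaller artifact of 33,090 bytes than when using the BrotliStream directly, which results in 49,424 bytes. A likely explanation is that the comparison is not like for like: the stream ran with CompressionLevel.Fastest, while this TryCompress overload uses the encoder’s default quality setting.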
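To see how much the encoder’s quality setting matters, the sketch below calls the TryCompress overload that takes explicit quality and window arguments. The stand-in text, the chosen quality values, and the window value of 22 are assumptions for illustration; sizes are not quoted because they depend on the input.

```csharp
using System;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Stand-in text; swap in the contents of the-comedy-of-errors.txt to use the play.
var text = string.Concat(Enumerable.Repeat("I to the world am like a drop of water. ", 2_000));
var source = Encoding.Unicode.GetBytes(text);

// Worst-case buffer size, so a failed call means a real error rather than a short buffer.
var buffer = new byte[BrotliEncoder.GetMaxCompressedLength(source.Length)];

// Quality ranges from 0 (fastest) to 11 (smallest output). The window argument is the
// base-2 logarithm of the sliding window size and ranges from 10 to 24.
foreach (var quality in new[] { 1, 5, 11 })
{
    if (BrotliEncoder.TryCompress(source, buffer, out var written, quality, 22))
    {
        Console.WriteLine($"quality {quality}: {written:N0} bytes");
    }
    else
    {
        Console.WriteLine($"quality {quality}: compression failed");
    }
}
```

Higher quality values generally trade more CPU time for smaller output, which is the same time and space trade-off discussed earlier.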
To try out this code, you can clone my GitHub repository.
Conclusion
Data compression is an integral part of modern software development; in most cases, compression is a low-level feature of a web server or framework. We only need to enable it to get the benefits of smaller payloads and reduced bandwidth usage. That said, it is nice to know we can take advantage of the System.IO.Compression namespace to manually compress any data we choose.
I hope you found this post helpful, and thank you for reading.