View Storage Accelerator (CBRC) Hashing Function

 2 years ago
source link: https://myvirtualcloud.net/view-storage-accelerator-cbrc-hashing-function/
VMware View Storage Accelerator uses a vSphere platform feature called CBRC (Content Based Read Cache). CBRC is an in-host RAM cache for VMDK blocks frequently accessed by virtual desktops.

When VMware View Storage Accelerator is enabled (OS disk, or OS and Persistent Disks), a per-VMDK digest file is created to store hash information about the logical blocks in the VMDK. The hashes are computed using the SHA-1 cryptographic hash function. SHA-1 is the most widely used of the existing SHA hash functions, and is employed in several widely used applications and protocols.

For more information on CBRC please refer to:

View Storage Accelerator Performance Benchmark
Sizing for VMware View Storage Accelerator (CBRC)
Understanding CBRC (Content Based Read Cache)
Understanding CBRC – RecomputeDigest Method

This time around I’ll focus on the CBRC hashing function and its capabilities. The core CBRC infrastructure is implemented as a vSCSI Filter in the VMKernel and is transparently attached to a CBRC-enabled VMDK.

This CBRC vSCSI Filter maintains a global RAM cache per host to serve IO requests from virtual desktops. Data is stored in the cache based on dedup-hash signatures and is not tied to any particular VM. This implementation allows CBRC to detect duplicate blocks across multiple virtual desktops and then serve IOs from cache.

Unlike most traditional disk read caches, CBRC makes efficient use of the cache by indexing it on a hash of the content of the logical block being accessed rather than on the logical block number itself. For virtual desktops derived from common templates, such as replica disks used by Linked Clones, this drastically reduces the amount of cache needed, since identical blocks from different virtual desktops are cached only once, allowing the cache to fit within the host RAM.
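To make the idea of a content-indexed cache concrete, here is a minimal Python sketch. It is not the CBRC implementation: the class name, the LRU eviction policy and the capacity are assumptions made purely for illustration. The only point it shows is that two blocks with identical content, read by different desktops, land on the same cache key.

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size for this illustration


class ContentCache:
    """Toy read cache keyed by content hash rather than by block number."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # SHA-1 digest -> block data, in LRU order

    def get(self, digest):
        data = self.blocks.get(digest)
        if data is not None:
            self.blocks.move_to_end(digest)  # mark as most recently used
        return data

    def put(self, data):
        digest = hashlib.sha1(data).digest()
        self.blocks[digest] = data
        self.blocks.move_to_end(digest)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return digest


cache = ContentCache(capacity_blocks=1024)
block_from_desktop_a = b"\x00" * BLOCK_SIZE
block_from_desktop_b = b"\x00" * BLOCK_SIZE  # identical content, different VM

key_a = cache.put(block_from_desktop_a)
key_b = cache.put(block_from_desktop_b)
```

Both calls return the same key and the cache holds a single entry, which is the effect that lets many desktops cloned from one replica share a small cache.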

CBRC uses an offline process controlled via the VIM-API to create a digest file per VMDK containing the hashes for each 4K block of the VMDK. For linked clones this process happens after a replica disk is created and after a clone is created, but before powering them on. The digest file contains metadata and the sequence of hash data, one hash for every VMDK block.
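The per-block hashing step can be sketched in a few lines of Python. This is only an illustration of hashing a disk image in 4K blocks with SHA-1. The function name is hypothetical, the real digest file format and its metadata are not reproduced, and zero-padding a short final block is an assumption of this sketch.

```python
import hashlib
import io

BLOCK_SIZE = 4096  # 4K blocks, as described above


def compute_digest(stream, block_size=BLOCK_SIZE):
    """Return a list with one SHA-1 digest per block read from stream."""
    hashes = []
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Assumption: pad a short final block with zeros so every hash
        # covers a full block.
        block = block.ljust(block_size, b"\x00")
        hashes.append(hashlib.sha1(block).digest())
    return hashes


# An in-memory stand-in for a disk image: three blocks, two of them identical.
disk = io.BytesIO(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE)
digest = compute_digest(disk)
```

Here the first and third entries of `digest` are equal because the underlying blocks have the same content, and each entry is 20 bytes long.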

When virtual desktops are powered on, the vSCSI Filter gets attached to the VMDKs. The vSCSI Filter reads the digest file and updates a global hash table. This global hash table maintains the mapping from logical block numbers to hashes for every VMDK and helps identify duplicated blocks across VMDKs.
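As a rough mental model of that global table, the following Python sketch merges per-VMDK digests into one structure and reports hashes that occur at more than one location. The class, method and file names are hypothetical, and the short byte strings stand in for real 4K blocks.

```python
import hashlib
from collections import defaultdict


def block_hash(data):
    return hashlib.sha1(data).digest()


class GlobalHashTable:
    """Toy host-wide table built from per-VMDK digests (names are hypothetical)."""

    def __init__(self):
        self.hash_of = {}                  # (vmdk, block number) -> hash
        self.locations = defaultdict(set)  # hash -> {(vmdk, block number), ...}

    def load_digest(self, vmdk, hashes):
        """Merge one VMDK's digest into the table, e.g. at power-on."""
        for lbn, digest in enumerate(hashes):
            self.hash_of[(vmdk, lbn)] = digest
            self.locations[digest].add((vmdk, lbn))

    def shared_blocks(self):
        """Hashes that appear at more than one location, i.e. duplicated blocks."""
        return {h: locs for h, locs in self.locations.items() if len(locs) > 1}


table = GlobalHashTable()
os_block, app_block, user_block = b"os", b"app", b"user"  # stand-ins for 4K blocks
table.load_digest("desktop-a.vmdk", [block_hash(os_block), block_hash(app_block)])
table.load_digest("desktop-b.vmdk", [block_hash(os_block), block_hash(user_block)])

shared = table.shared_blocks()
```

In this example only the block standing in for the operating system is shared, so `shared` contains a single hash that points at block 0 of both desktops.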

IOs from different virtual desktops are served from cache based on the content hash of the block for which the IO was issued. When a virtual desktop is powered on with CBRC enabled, the vSCSI Filter is attached. The vSphere vmkernel module gets the location of the digest file and updates the global de-duplication map based on the hash contents. When the VM is powered off, the vSCSI Filter closes the digest file.

Hash Collision

Hash collision detection (strict compare) is turned off by default. This means that CBRC relies on the statistical strength of the SHA-1 cryptographic hash function. SHA-1 is a strong algorithm and the probability of two different blocks producing the same hash is very small.

When hash collision detection is turned on, a SHA-256 hash is also stored alongside the default SHA-1 hash for each VMDK block. This increases the digest disk footprint and the vmkernel memory footprint, and could lead to a slight increase in digest load times as part of the virtual desktop boot process. There is also a CPU overhead associated with this feature.
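A back-of-the-envelope calculation gives a feel for the extra footprint. The sketch below counts only the raw hash bytes (20 bytes for SHA-1, 32 bytes for SHA-256) per 4K block. It ignores digest metadata and any on-disk layout details, so real digest files will differ, and the 20 GiB disk size is a hypothetical example.

```python
BLOCK_SIZE = 4096    # bytes per hashed block (4K, per the article)
SHA1_BYTES = 20      # SHA-1 produces a 160-bit digest
SHA256_BYTES = 32    # SHA-256 produces a 256-bit digest
MIB = 1024 * 1024
GIB = 1024 * MIB


def hash_payload_bytes(vmdk_bytes, collision_detection=False):
    """Raw hash bytes needed for a VMDK of the given size (metadata ignored)."""
    blocks = -(-vmdk_bytes // BLOCK_SIZE)  # ceiling division
    per_block = SHA1_BYTES + (SHA256_BYTES if collision_detection else 0)
    return blocks * per_block


vmdk_size = 20 * GIB  # hypothetical disk size
sha1_only = hash_payload_bytes(vmdk_size)
with_sha256 = hash_payload_bytes(vmdk_size, collision_detection=True)
print(sha1_only // MIB, with_sha256 // MIB)  # 100 260
```

Under these assumptions, storing both hashes multiplies the hash payload by 2.6 (52 bytes instead of 20 bytes per block).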

When a digest file with SHA-256 is in use, hash collisions are detected by comparing both hash values. Blocks for which hash collisions are detected are not cached, increasing the amount of memory available for other caching operations.
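The two-hash comparison can be illustrated with a small Python sketch. Since real SHA-1 collisions are impractical to produce in a demo, the primary hash here is SHA-1 truncated to a single byte, which is purely an assumption of the example. The class name and the choice to keep the first block and reject the colliding one are also illustrative guesses, not a description of CBRC internals.

```python
import hashlib


def weak_hash(data):
    # Stand-in for SHA-1, truncated to one byte ONLY so that collisions
    # are easy to produce in a demo. A real digest keeps the full hash.
    return hashlib.sha1(data).digest()[:1]


def strong_hash(data):
    return hashlib.sha256(data).digest()


class CollisionAwareCache:
    def __init__(self):
        self.entries = {}    # weak hash -> (strong hash, block data)
        self.collisions = 0

    def put(self, data):
        """Cache data; return False if it collides with a different block."""
        weak, strong = weak_hash(data), strong_hash(data)
        existing = self.entries.get(weak)
        if existing is None:
            self.entries[weak] = (strong, data)
            return True
        if existing[0] == strong:
            return True          # same content, already cached
        self.collisions += 1     # same weak hash, different content
        return False             # the colliding block is not cached


cache = CollisionAwareCache()
first = b"block-0"
cache.put(first)

# Search for a different block whose truncated hash matches the first one.
collider = next(
    candidate
    for candidate in (b"block-%d" % i for i in range(1, 100000))
    if weak_hash(candidate) == weak_hash(first)
)
cached = cache.put(collider)
```

The second `put` finds a matching primary hash but a different secondary hash, so it counts a collision and refuses to cache the block.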

In my article Sizing for VMware View Storage Accelerator (CBRC) I explain how to size CBRC storage and RAM with and without the hash-collision feature.

It’s possible to turn on hash collision detection through advanced configuration options, but please note that this will have some CPU performance impact.

The configuration option is exposed through VMware vSphere Client Advanced Configuration (/config/Digest/intOpts/CollisionEnabled).

To activate hash collision detection use the following methods:

vsish -e set /config/Digest/intOpts/CollisionEnabled 1

esxcli system settings advanced set -o "/config/Digest/intOpts/CollisionEnabled" -s "1"

This enables hash collision detection by hashing each VMDK block with two cryptographic hash functions (SHA-1 and SHA-256).

Once you have collision detection enabled, you need to re-create the digest file for existing VMDKs so that the digest is re-created with both the SHA-1 and SHA-256 cryptographic hash functions. If the digest file is not re-created, the VMDK will not be CBRC enabled. In VMware View, an easy way to re-create the digest file is to recompose a desktop pool using a new Parent VM snapshot. In another article I will explain how to use the API to re-hash without having to recompose your desktops.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.