

source link: https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads

Core Workloads

Sean Busbey edited this page on Nov 30, 2020 · 6 revisions

YCSB includes a set of core workloads that define a basic benchmark for cloud systems. You can also define your own workloads, as described in Implementing New Workloads. The core workloads are a useful first step, however: running them against a variety of systems lets you compare the performance tradeoffs between those systems.

The core workloads consist of six different workloads:

Workload A: Update heavy workload

This workload has a 50/50 mix of reads and writes. Application example: a session store recording recent actions. Updates in this workload do not presume that you read the original record first. The assumption is that every update writes fields for a record that already exists, often only a subset of that record's fields. Some data stores need to read the underlying record in order to reconcile what the final record should look like, but not all do.

Workload B: Read mostly workload

This workload has a 95/5 read/write mix. Application example: photo tagging, where adding a tag is an update but most operations read tags. As with Workload A, these writes do not presume that you read the original record before writing to it.

Workload C: Read only

This workload is 100% read. Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop).
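The read/update mixes of workloads A, B, and C can be illustrated with a small sampler. This is only a sketch of the proportions stated above, not YCSB's own operation chooser; the names `WORKLOAD_MIX`, `next_operation`, and `sample_counts` are hypothetical.

```python
import random

# Read/update proportions taken from the workload descriptions above.
WORKLOAD_MIX = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
}


def next_operation(workload, rng):
    """Pick the next operation type according to the workload's mix."""
    mix = WORKLOAD_MIX[workload]
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


def sample_counts(workload, n, seed=0):
    """Draw n operations and count how often each type was chosen."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        op = next_operation(workload, rng)
        counts[op] = counts.get(op, 0) + 1
    return counts
```

Calling `sample_counts("B", 10000)` should yield roughly 95% reads and 5% updates, with the exact counts depending on the seed.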

Workload D: Read latest workload

In this workload, new records are inserted, and the most recently inserted records are the most popular. Application example: user status updates; people want to read the latest.
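The "most recent is most popular" access pattern can be sketched as follows. The geometric-style skew is a stand-in chosen for simplicity and is not YCSB's own generator. The 5% insert fraction and the `skew` value are assumptions made for illustration, and the function names are hypothetical.

```python
import random


def pick_age(newest_key, rng, skew=0.2):
    """Return how far behind the newest record a read should land.

    Age 0 (the newest record) is the most likely outcome; each older
    record is progressively less likely. `skew` is an assumed parameter.
    """
    age = 0
    while age < newest_key and rng.random() > skew:
        age += 1
    return age


def simulate_read_latest(operations, initial_records=1000,
                         insert_fraction=0.05, seed=0):
    """Mix inserts with reads that favor recently inserted keys."""
    rng = random.Random(seed)
    newest_key = initial_records - 1
    ages = []
    for _ in range(operations):
        if rng.random() < insert_fraction:
            newest_key += 1  # insert: a new record becomes the newest
        else:
            # read: the key touched would be newest_key - age
            ages.append(pick_age(newest_key, rng))
    return newest_key, ages
```

Because every insert moves `newest_key` forward, the set of popular keys shifts over the course of the run, and the data set grows.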

Workload E: Short ranges

In this workload, short ranges of records are queried, instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id).
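A short range scan over keys clustered by thread id might look like the sketch below. It uses an in-memory sorted key list as a stand-in for a real data store; the `SortedStore` class and the `(thread_id, post_id)` key layout are hypothetical.

```python
import bisect


class SortedStore:
    """Toy ordered key-value store that supports short range scans."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> record

    def insert(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, record_count):
        """Return up to record_count records starting at start_key."""
        start = bisect.bisect_left(self._keys, start_key)
        selected = self._keys[start:start + record_count]
        return [(key, self._values[key]) for key in selected]


store = SortedStore()
for thread_id in (1, 2, 3):
    for post_id in range(4):
        store.insert((thread_id, post_id), f"post {post_id} in thread {thread_id}")

# Keys are (thread_id, post_id), so posts of one thread sit next to each other.
first_posts = store.scan((2, 0), 3)
```

Note that a scan is bounded by record count, not by thread: a scan that asks for more records than remain in a thread continues into the next thread's posts.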

Workload F: Read-modify-write

In this workload, the client reads a record, modifies it, and writes back the changes. Application example: a user database, where user records are read and modified by the user or to record user activity. This workload reads the record from the underlying datastore before writing an updated set of fields for it, which effectively forces every datastore to read the underlying record before accepting a write. At the moment we use a random delta for the write rather than a value derived from the current record (say, incrementing a counter). That can make the workload a bit harder to follow, since the initial read seems unnecessary.
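The difference between a blind update (workloads A and B) and a read-modify-write can be sketched with a dictionary standing in for the datastore. All function names here are hypothetical. `increment_counter` shows the derived-value variant that the text mentions as an alternative, not what the workload currently does.

```python
import random


def blind_update(store, key, fields):
    """Workload A/B style: write a subset of fields without reading first."""
    store[key].update(fields)


def read_modify_write(store, key, rng):
    """Workload F style: read the record, then write a random new value."""
    before = dict(store[key])          # the forced read
    field = rng.choice(sorted(before))
    new_value = rng.randrange(1_000_000)  # random, not derived from `before`
    store[key][field] = new_value
    return before, {field: new_value}


def increment_counter(store, key, field):
    """Alternative where the write genuinely depends on the value read."""
    current = store[key][field]        # read
    store[key][field] = current + 1    # write derived from the read
    return store[key][field]
```

In `read_modify_write` the value written does not depend on `before`, which is why the initial read can look redundant; in `increment_counter` it is required.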

Running the workloads

All six workloads use a similar data set. Workloads D and E insert records during the test run. Thus, to keep the database size consistent, we recommend the following sequence:

  1. Load the database, using workload A’s parameter file (workloads/workloada) and the “-load” switch to the client.
  2. Run workload A (using workloads/workloada and “-t”) for a variety of throughputs.
  3. Run workload B (using workloads/workloadb and “-t”) for a variety of throughputs.
  4. Run workload C (using workloads/workloadc and “-t”) for a variety of throughputs.
  5. Run workload F (using workloads/workloadf and “-t”) for a variety of throughputs.
  6. Run workload D (using workloads/workloadd and “-t”) for a variety of throughputs. This workload inserts records, increasing the size of the database.
  7. Delete the data in the database.
  8. Reload the database, using workload E’s parameter file (workloads/workloade) and the “-load” switch to the client.
  9. Run workload E (using workloads/workloade and “-t”) for a variety of throughputs. This workload inserts records, increasing the size of the database.
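The sequence above can be written down as data, which makes the ordering easy to check or to feed into a driver script. This sketch only records the parameter files and switches named in the steps; it does not build full client command lines, and the `Step` type and `recommended_sequence` function are hypothetical. Deleting the data (step 7) is store-specific and is left as a manual action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    action: str                          # "load", "run", or "delete"
    workload_file: Optional[str] = None  # parameter file, if any
    switch: Optional[str] = None         # client switch named in the steps


def recommended_sequence():
    """Return the nine recommended steps in order."""
    steps = [Step("load", "workloads/workloada", "-load")]
    # A, B, C, and F do not insert records; D does, so it runs last.
    for name in ("a", "b", "c", "f", "d"):
        steps.append(Step("run", f"workloads/workload{name}", "-t"))
    steps.append(Step("delete"))  # store-specific, done outside the client
    steps.append(Step("load", "workloads/workloade", "-load"))
    steps.append(Step("run", "workloads/workloade", "-t"))
    return steps
```

Each "run" step would be repeated for the range of throughputs under test.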
