

GitHub - pingcap/chaos-mesh: A Chaos Engineering Platform for Kubernetes
source link: https://github.com/pingcap/chaos-mesh

README.md
Note:
This readme and related documentation are Work in Progress.
Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments. At the current stage, it has the following components:
- Chaos Operator: the core component for chaos orchestration. Fully open sourced.
- Chaos Dashboard: a visual panel that shows the impact of chaos experiments on the system's online services. It is under development and currently supports only chaos experiments on TiDB.
See the following demo video for a quick view of Chaos Mesh:
Chaos Operator
Chaos Operator injects chaos into the applications and Kubernetes infrastructure in a manageable way, which provides easy, custom definitions for chaos experiments and automatic orchestration. There are three components at play:
- Controller-manager: schedules and manages the lifecycle of CRD objects
- Chaos-daemon: runs as a DaemonSet with privileged system permissions over the network, cgroups, etc. on a specific node
- Sidecar: a special type of container that the webhook-server dynamically injects into the target Pod; it can be used to hijack the I/O of the application container
Chaos Operator uses Custom Resource Definition (CRD) to define chaos objects. The current implementation supports three types of CRD objects for fault injection, namely PodChaos, NetworkChaos, and IOChaos, which correspond to the following major actions (experiments):
- pod-kill: The selected pod is killed (ReplicaSet or something similar may be needed to ensure the pod will be restarted)
- pod-failure: The selected pod will be unavailable in a specified period of time
- netem chaos: Network chaos such as delay, duplication, etc.
- network-partition: Simulate network partition
- IO chaos: Simulate file system faults such as I/O delay, read/write errors, etc.
Prerequisites
Before deploying Chaos Mesh, make sure the following items have been installed. If you would like to try it on your own machine, refer to the Get started on your local machine section.
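A quick pre-flight check can catch a missing tool before any deployment step fails halfway. The following is only a sketch: check_tools is a hypothetical helper name, and the tool list in the usage note is an assumption based on the commands that appear later in this document, not an official prerequisite list.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_tools git kubectl helm covers the tools used in the deployment steps of this document.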
Deploy Chaos Mesh
Get the Helm files
git clone https://github.com/pingcap/chaos-mesh.git
cd chaos-mesh/
Create custom resource type
To use Chaos Mesh, you must first create the related custom resource type.
kubectl apply -f manifests/
kubectl get crd podchaos.pingcap.com
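A script that creates chaos objects immediately after applying the manifests can run before the API server has finished registering the new types. The following is a minimal sketch of a wait step, assuming your kubectl version provides the wait subcommand. wait_for_crds is a hypothetical helper name, and the 60-second timeout is an arbitrary choice; only podchaos.pingcap.com is taken from the command above.

```shell
# Sketch: block until each named CRD reports the Established condition.
# With no arguments, it checks the CRD name used in this document.
wait_for_crds() {
  if [ "$#" -eq 0 ]; then
    set -- podchaos.pingcap.com
  fi
  for crd in "$@"; do
    # Stop at the first CRD that does not become established in time.
    kubectl wait --for=condition=established --timeout=60s "crd/$crd" || return 1
  done
}
```

Call wait_for_crds with no arguments to wait for the PodChaos type, or pass the names of any other CRDs reported by kubectl get crd.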
Install Chaos Mesh
- Install Chaos Mesh with Chaos Operator only
helm install helm/chaos-mesh --name=chaos-mesh --namespace=chaos-testing
kubectl get pods --namespace chaos-testing -l app.kubernetes.io/instance=chaos-mesh
- Install Chaos Mesh with Chaos Operator and Chaos Dashboard
helm install helm/chaos-mesh --name=chaos-mesh --namespace=chaos-testing --set dashboard.create=true
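Instead of polling kubectl get pods by hand, a script can block until the release is ready. This is a sketch under stated assumptions: wait_for_chaos_mesh is a hypothetical helper name and the 180-second default timeout is arbitrary. The namespace and label selector are the ones used in the install commands above.

```shell
# Sketch: wait until every pod of the chaos-mesh release reports Ready.
# Arguments (both optional): namespace, timeout.
wait_for_chaos_mesh() {
  namespace="${1:-chaos-testing}"
  wait_timeout="${2:-180s}"
  kubectl wait pods \
    --namespace "$namespace" \
    --selector app.kubernetes.io/instance=chaos-mesh \
    --for=condition=Ready \
    --timeout="$wait_timeout"
}
```

The function returns a non-zero status if the pods are not ready within the timeout, so it can gate later steps in a setup script.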
Get started on your local machine
Warning:
This deployment is for testing only. DO NOT USE in production!
You can try Chaos Mesh on your local K8s environment deployed using kind or minikube.
Deploy your local K8s environment
Deploy with kind
- Clone the code
git clone --depth=1 https://github.com/pingcap/chaos-mesh && cd chaos-mesh
- Run the script and create a local Kubernetes cluster. Make sure you have installed kind.
hack/kind-cluster-build.sh
- To connect to the local Kubernetes cluster, set the default configuration file path of kubectl to kube-config.
export KUBECONFIG="$(kind get kubeconfig-path)"
- Verify that the Kubernetes cluster is up and running
kubectl cluster-info
- Install chaos-mesh on the kind Kubernetes cluster as suggested in Install Chaos Mesh.
Deploy with minikube
- Start a minikube Kubernetes cluster. Make sure you have installed minikube.
minikube start --kubernetes-version v1.15.0 --cpus 4 --memory "8192mb" # we recommend that you allocate enough RAM (more than 8192 MiB) to the VM
- Install helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get | bash
helm init
- Check whether the helm tiller pod is running
kubectl -n kube-system get pods -l app=helm
- Install chaos-mesh on the minikube Kubernetes cluster as suggested in Install Chaos Mesh.
Note:
There are some known restrictions for Chaos Operator deployed on kind and minikube clusters:
- Network-related chaos is not supported on kind clusters. Chaos Operator uses the docker pkg to translate between container ID and PID, which is necessary to find the network namespace of a pod. Kind uses containerd as its Container Runtime Interface (CRI) runtime, which is not supported in our implementation yet.
- netem chaos is not supported on minikube clusters. In minikube, the default virtual machine driver's image doesn't contain the sch_netem kernel module. You can use the none driver (if your host is Linux with the sch_netem kernel module loaded) to try these chaos actions on minikube, or build an image with sch_netem yourself.
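To find out whether a node can run netem chaos, you can check for the sch_netem module on that node. The following is a sketch, not part of Chaos Mesh: module_loaded is a hypothetical helper that looks for a module name in a /proc/modules-style listing. A module compiled directly into the kernel does not appear in that file, so a negative result is a hint rather than proof.

```shell
# Sketch: report whether a kernel module appears in a /proc/modules-style listing.
# The optional second argument points the check at another file.
module_loaded() {
  module="$1"
  modules_file="${2:-/proc/modules}"
  # Each line of /proc/modules starts with the module name followed by a space.
  grep -q "^${module} " "$modules_file"
}
```

On a Linux host or inside the minikube VM, module_loaded sch_netem returns success when the module is currently loaded.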
Deploy target cluster
After Chaos Mesh is deployed, we can deploy the target cluster to be tested, or where we want to inject faults. For illustration purposes, we use TiDB as our sample cluster.
You can follow the instructions in the following two documents to deploy a TiDB cluster:
Define chaos experiment config file
The chaos experiment configuration is defined in a .yaml file. The following sample file (pod-failure-example.yaml) defines a chaos experiment that makes one randomly selected tikv pod unavailable for 60 seconds, repeated every 5 minutes:
apiVersion: pingcap.com/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example
  namespace: chaos-testing
spec:
  action: pod-failure   # the specific chaos action to inject; supported actions: pod-kill/pod-failure
  mode: one             # the mode to run chaos action; supported modes are one/all/fixed/fixed-percent/random-max-percent
  duration: "60s"       # duration for the injected chaos experiment
  selector:             # pods where to inject chaos actions
    namespaces:
      - tidb-cluster-demo   # the namespace of the system under test (SUT) you've deployed
    labelSelectors:
      "app.kubernetes.io/component": "tikv"   # the label of the pod for chaos injection
  scheduler:            # scheduler rules for the running time of the chaos experiments about pods.
    cron: "@every 5m"
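The comments in the sample list pod-kill as the other supported PodChaos action. As an illustration, the following sketch prints a pod-kill variant that reuses only fields shown in the sample. render_pod_kill, the experiment name pod-kill-example, and the one-minute schedule are assumptions made for this example. Leaving out duration is also an assumption, on the reasoning that a kill is a one-off event; check the manifests in the repository before relying on it.

```shell
# Sketch: print a PodChaos manifest that uses the pod-kill action.
# Arguments (both optional): target namespace, component label value.
render_pod_kill() {
  target_namespace="${1:-tidb-cluster-demo}"
  component="${2:-tikv}"
  cat <<EOF
apiVersion: pingcap.com/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example        # assumed name
  namespace: chaos-testing
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - ${target_namespace}
    labelSelectors:
      "app.kubernetes.io/component": "${component}"
  scheduler:
    cron: "@every 1m"           # assumed schedule
EOF
}
```

The output can be saved to a file for review, or piped straight to the cluster with render_pod_kill | kubectl apply -f -.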
Create a chaos experiment
kubectl apply -f pod-failure-example.yaml
kubectl get podchaos --namespace=chaos-testing
From the TiDB Grafana dashboard, you can see how the chaos experiment affects QPS performance (measured by running a benchmark against the cluster).
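If Grafana is not available, the effect is also visible from the command line. The following sketch streams status changes of the targeted pods while the experiment runs. watch_target_pods is a hypothetical helper name; its defaults are the namespace and label from the sample experiment.

```shell
# Sketch: stream status changes of the pods selected by the experiment.
# Arguments (both optional): target namespace, component label value.
watch_target_pods() {
  target_namespace="${1:-tidb-cluster-demo}"
  component="${2:-tikv}"
  kubectl get pods \
    --namespace "$target_namespace" \
    --selector "app.kubernetes.io/component=$component" \
    --watch
}
```

Run it in a second terminal and stop it with Ctrl-C once you have seen the pods change state.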
Update a chaos experiment
vim pod-failure-example.yaml # modify pod-failure-example.yaml to what you want
kubectl apply -f pod-failure-example.yaml
Delete a chaos experiment
kubectl delete -f pod-failure-example.yaml
Watch your chaos experiments in Dashboard
Chaos Dashboard is currently only available for TiDB clusters. Stay tuned for broader support, or join us in making it happen.
Note:
If Chaos Dashboard was not installed in your earlier deployment, you need to install it by upgrading Chaos Mesh:
helm upgrade chaos-mesh helm/chaos-mesh --namespace=chaos-testing --set dashboard.create=true
A typical way to access it is to use kubectl port-forward
kubectl port-forward -n chaos-testing svc/chaos-dashboard 8080:80
Then you can access http://localhost:8080 in your browser.
Community
Please reach out for bugs, feature requests, and other issues via:
- The #chaos-mesh channel in the TiDB Community Slack workspace.
- Filing an issue or opening a PR against this repo.
Roadmap
- chaos-operator
- chaos-dashboard
- chaos-verify
- chaos-engine
- chaos-admin
- chaos-cloud
License
Chaos Mesh is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.