
prodsys - production systems

source link: https://wilsonmar.github.io/prod-baselines/

Here we maintain assets that SREs (Site Reliability Engineers) use to face production fearlessly and adroitly.

“The best and fastest way to learn a language is to live like a native”.

PROTIP: To really learn a system, troubleshoot a running production system, as an SRE (Site Reliability Engineer) does.

The job title “SRE” has existed for only a few years. It evolved from “System Administrator”.

Most who hold the SRE title today got the job by accident, by being in the right place at the right time. Few 10-year-olds have ever said they want to grow up to be an SRE the way they say they want to be an astronaut.

We think that’s a reckless and stupid way to equip people for one of the most important jobs in a company. SREs keep websites and other systems critical to the operation of nearly every organization today safe and alive.

We think that great SREs are built, not born.

We think what’s needed is people collaborating to advance a baseline with variations.

BTW, the content here is my personal opinion and is not intended to represent any employer (past or present).

SRE Job Descriptions (CV)

Typical job descriptions request people with a number of years of experience.

But we think that is a flawed metric for someone going from newbie to Junior to Senior to Master.

Your Journey to SRE Maturity

The vitality and security of your production system at work reflect your diligence in working through these stages:

  1. Master basic skills: self-control (to make time), Touch Typing/vim, VSCode, macOS/Linux commands (sed, awk, jq, jsonnet, etc.), Shell & Python scripting, Git and GitHub, GitHub Markdown, CI/CD (GitHub Actions?), Docker, Terraform, Ansible, Helm, Kubernetes. Then there’s effective collaboration, applying etiquette and tricks to using email, Slack, SMS, Zoom/Teams, etc.

  2. Customize the adoption plan templates here about how to introduce and sustain the entire implementation lifecycle.

  3. Study the baseline configuration assets (Terraform, Policies, GitHub Actions scripts, etc.) by reading and viewing videos.

  4. Use automation to stand up baseline production instances, using the assets and steps described here. (A production system includes observability, dashboards, alerts.)

  5. Trace events during baseline functional and security tests to ensure that systems continuously adhere to all policies.

  6. Analyze results from scalability commands and runs simulating traffic (starting with a small rig), comparing them with the baseline, for Observability history and Chaos Engineering.

  7. Ensure compatibility when making modifications, as new releases of the various components come out all the time.

  8. Conduct experiments adding components, variations and Chaos Engineering. Break something and see how quickly you can fix it (as measured by MTTR/RTO/RPO, etc.). We have contests.

  9. Create tutorials for others. Mentor others.
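
Stage 8 measures how quickly you can fix what you break. As a minimal sketch of one such measure, MTTR (Mean Time To Recovery) can be computed as the average of the time between breaking something and restoring it. The function name, the record layout of (broken, restored) timestamp pairs, and the drill timings below are all hypothetical, made up for illustration; they are not part of the assets described here.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average the (restored - broken) duration over a list of incidents.

    `incidents` is a list of (broken_at, restored_at) datetime pairs.
    This record layout is an assumption made for this sketch.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    for broken_at, restored_at in incidents:
        if restored_at < broken_at:
            raise ValueError("restored_at must not be before broken_at")
    total = sum(
        (restored_at - broken_at for broken_at, restored_at in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)


# Hypothetical timings from three break-and-fix drills.
drills = [
    (datetime(2022, 6, 1, 10, 0), datetime(2022, 6, 1, 10, 30)),  # 30 minutes
    (datetime(2022, 6, 8, 14, 0), datetime(2022, 6, 8, 14, 10)),  # 10 minutes
    (datetime(2022, 6, 15, 9, 0), datetime(2022, 6, 15, 9, 20)),  # 20 minutes
]

mttr = mean_time_to_recovery(drills)
print(mttr)  # 0:20:00
```

Tracking this number across repeated drills gives a simple way to see whether practice is shortening recovery. RTO and RPO are targets set in advance rather than averages, so they would be compared against each drill individually.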

HashiCorp HashiCups demo rig

https://github.com/hashicorp/consul-k8s-prometheus-grafana-hashicups-demoapp from Sep 2020 (by Derek Strickland) contains application and dashboard definitions for the “Consul Layer 7 observability with Kubernetes” guide located at learn.hashicorp.com.

It uses microservices, with Consul Service Mesh connecting them all together.

It uses HashiCups, one of the standard HashiCorp demo apps.

Code to create the HashiCups app is from https://github.com/hashicorp-demoapp :

  • https://github.com/hashicorp-demoapp/frontend
  • https://github.com/hashicorp-demoapp/payments
  • https://github.com/hashicorp-demoapp/postgres

Also the infrastructure:

  • https://hub.docker.com/repository/docker/hashicorpdemoapp/traffic-simulation
  • https://github.com/hashicorp-demoapp/traffic-simulation by Nicholas Jackson

https://github.com/hashicorp/learn-consul-k8s-hashicups

https://github.com/hashicorp/field-demo-hashicups-sample

https://learn.hashicorp.com/tutorials/terraform/provider-setup

https://learn.hashicorp.com/tutorials/consul/kubernetes-deployment-guide

https://learn.hashicorp.com/collections/consul/kubernetes-production

https://github.com/hashicorp/terraform-provider-hashicups

https://github.com/hashicorp/learn-terraform-hashicups-provider



prodsys - production systems was published on June 27, 2022.

