
SRE: Service Reliability

 2 years ago
source link: https://blog.knoldus.com/sre-service-reliability/


Reading Time: 3 minutes

Hi guys, in this blog we will look at what a reliable service is and how we can bring reliability to our service.

Reliability is one of the hardest qualities to build into a service. It is important that everyone on the team shares the same understanding of what reliability really means, because that shared definition is what lets them build it in properly. Otherwise, when the first unexpected problem pops up, each person will start using the word “reliability” in their own sense.

Before we introduce reliability into our service, we have to define what it is. It means different things to different people. For instance, people who work on different parts of a system perceive it in different ways.

  • A database administrator sees reliable as accurate data. They make a data store more reliable by normalizing its data to remove redundant copies.
  • A network engineer sees reliable as guaranteed message delivery. They work with reliable protocols (TCP) and unreliable protocols (UDP).
  • A researcher defines reliable as accurate website content. The more available a service is, with low latency and high throughput, the more reliable it is.
Reliable Service

3 Principles of Customer Happiness

  • The most important feature of a service is that it is trustworthy.
  • Our users decide whether the service is reliable; it doesn’t matter what the monitoring system says.
  • The pursuit of ever-increasing reliability has a cost. (Increasing availability from 99.99% to 99.999% is very costly. Is your customer ready to pay for it?)
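
To make the 99.99% versus 99.999% comparison concrete, here is a small sketch that converts an availability target into the downtime it allows per year. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration; they do not come from the original post.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, 99.99% leaves roughly 52.6 minutes of downtime per year, while 99.999% leaves only about 5.3 minutes. That tenfold reduction is what the customer would be paying for.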

Availability vs Reliability

Most of us confuse availability and reliability.
Are they different?
Yes, they are different, and both are values that hold up our service.

Availability is the amount of time that a service or resource is fully available to its consumers.
OK, but what if your service is available and the user still cannot use it properly?
Where does that count?
It counts against reliability: if customers are unable to use the services you’re providing, they can’t take advantage of any of the features you’ve so painstakingly built.
Perfect reliability is a costly and often unreachable target to set. A more realistic goal is that the system should meet the expectations of its users and strive to maintain their trust.


Latency is the time interval between a stimulus and the response to it or, more generally, the delay between a consumer asking a service for information and receiving the response.

Throughput (performance) is the actual amount of data that is successfully sent or received over the communication link.

Consider a service, for example www.knoldus.com, on which we want to search for blogs. First of all, www.knoldus.com must be available (Availability) so that it can listen for the search action and return a response. Say the time taken to return the response is 0.5 seconds (Latency), and the responses arrive correctly and without failure (Throughput), for example 9,900 correct responses out of 10,000 requests.
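
The three measurements in this example can be sketched from a list of request records. This is only an illustration: the `Request` record, its fields, and the `summarize` helper are hypothetical names, and the code reports the article's "9,900 correct out of 10,000" figure as a success ratio, which is how the example uses the word throughput.

```python
from dataclasses import dataclass


@dataclass
class Request:
    reached_service: bool   # did the service accept the request at all?
    latency_seconds: float  # time between sending the request and the response
    correct: bool           # was the response correct?


def summarize(requests: list[Request]) -> dict[str, float]:
    """Compute availability, average latency, and success ratio."""
    if not requests:
        raise ValueError("need at least one request")
    served = [r for r in requests if r.reached_service]
    if not served:
        # Nothing was served, so latency and success are not measurable.
        return {"availability": 0.0, "avg_latency_seconds": float("nan"),
                "success_ratio": 0.0}
    return {
        "availability": len(served) / len(requests),
        "avg_latency_seconds": sum(r.latency_seconds for r in served) / len(served),
        "success_ratio": sum(r.correct for r in served) / len(served),
    }


if __name__ == "__main__":
    # Synthetic data mirroring the example: 10,000 searches, 9,900 correct.
    sample = [Request(True, 0.5, i < 9900) for i in range(10000)]
    print(summarize(sample))
```

With the synthetic sample, every request reaches the service, the average latency is 0.5 seconds, and the success ratio is 0.99.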

Reliability holds Service

Maintain Reliability in service

Reliability is probably the easiest way to convince anyone to use our service. To maintain it, we need to ensure that all three components of reliability perform well: availability, latency, and throughput. Collectively, these make a service reliable.
Reliability (high) = Availability (high) + Latency (low) + Throughput (high)
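
The "+" in this formula reads best as "and" instead of arithmetic addition: the service counts as reliable only when all three components are healthy at the same time. A minimal sketch of that reading follows. The `Targets` class and its threshold values are made up for illustration; real targets would come from what your users expect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    # Hypothetical thresholds, chosen only for this example.
    min_availability: float = 0.999
    max_latency_seconds: float = 1.0
    min_throughput_rps: float = 100.0


def is_reliable(availability: float, latency_seconds: float,
                throughput_rps: float, targets: Targets = Targets()) -> bool:
    """Return True only if all three components meet their targets."""
    return (availability >= targets.min_availability
            and latency_seconds <= targets.max_latency_seconds
            and throughput_rps >= targets.min_throughput_rps)


if __name__ == "__main__":
    print(is_reliable(0.9995, 0.5, 250.0))  # all three components are healthy
    print(is_reliable(0.9995, 3.0, 250.0))  # available, but too slow
```

The second call shows why availability alone is not enough: the service is up, but slow responses still make it unreliable from the user's point of view.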

While building your service, you should keep these things in mind.

  1. Build with failure in mind.
  2. Always think about scaling.
  3. Mitigate risk.
  4. Monitor availability.
  5. Respond to availability issues in a predictable and defined way.
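
Points 4 and 5 can be sketched together: track the outcome of recent health checks and map the measured availability to a response that was agreed in advance. Everything here is an assumption made for illustration, including the `AvailabilityMonitor` class, the window size, the thresholds, and the action names.

```python
from collections import deque


class AvailabilityMonitor:
    """Track recent health-check results and pick a predefined response."""

    def __init__(self, window_size: int = 100,
                 warn_below: float = 0.99, page_below: float = 0.95):
        # Only the most recent `window_size` results are kept.
        self.results: deque[bool] = deque(maxlen=window_size)
        self.warn_below = warn_below
        self.page_below = page_below

    def record(self, success: bool) -> None:
        self.results.append(success)

    def availability(self) -> float:
        # With no data yet, assume healthy to avoid raising a false alarm.
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def action(self) -> str:
        """Map the current availability to a defined response."""
        current = self.availability()
        if current < self.page_below:
            return "page-on-call"
        if current < self.warn_below:
            return "open-ticket"
        return "ok"


if __name__ == "__main__":
    monitor = AvailabilityMonitor(window_size=10)
    for outcome in [True] * 9 + [False]:
        monitor.record(outcome)
    print(monitor.availability(), monitor.action())
```

Because the thresholds and actions are fixed ahead of time, the response to an availability drop does not depend on who happens to be watching the dashboard.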

Conclusion

The primary role of a Site Reliability Engineer is to identify and manage asset risks that could adversely affect plans or business operations. SREs also build or implement software to improve the reliability of their systems. Follow our next blog for more on the roles and responsibilities of a Site Reliability Engineer.

Reference

You can refer to https://landing.google.com/sre/ for more information.

