
Serverless plays Rock-n-Roll at Fender


Welcome to “Serverless Superheroes”!
In this space, I chat with the toolmakers, innovators, and developers who are navigating the brave new world of “serverless” cloud applications.

In this edition, I chatted with Michael Garski, Director of Software Engineering at Fender Musical Instruments Corporation. The following interview has been edited and condensed for clarity.

Forrest Brazeal : Michael, I’ve been a musician for many years, but I never thought of combining lessons with Lambda. Why did you decide to use serverless technologies to teach people how to play the guitar?

Michael Garski: I’ve been working in technology in Los Angeles for about 20 years, at a variety of places such as Fandango and MySpace, and two years ago I came to Fender on their Digital team just as we started building our digital suite of products, which includes Fender Tune, Fender Tone, and Fender Play. Fender Play is our flagship subscription product for online guitar instruction.

We initially started looking at serverless for some simple use cases. We needed to ingest product catalog data from our corporate IT systems, and it was easy for them to just drop off a file into an S3 bucket and trigger a Lambda function to process the data and put it into Elasticsearch on our side.
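
As a rough illustration of that ingest pattern, here is a minimal Go sketch: an S3 object-created event triggers a function that fetches the dropped file and posts it to an Elasticsearch bulk endpoint. The `ES_ENDPOINT` variable and the assumption that the file is already in bulk format are my own inventions, not Fender’s actual code.

```go
// Hypothetical sketch: S3 "ObjectCreated" event -> Lambda -> Elasticsearch.
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func handler(ctx context.Context, evt events.S3Event) error {
	svc := s3.New(session.Must(session.NewSession()))

	for _, rec := range evt.Records {
		// Fetch the catalog file that was dropped into the bucket.
		obj, err := svc.GetObjectWithContext(ctx, &s3.GetObjectInput{
			Bucket: aws.String(rec.S3.Bucket.Name),
			Key:    aws.String(rec.S3.Object.Key),
		})
		if err != nil {
			return fmt.Errorf("get %s: %w", rec.S3.Object.Key, err)
		}
		body, err := io.ReadAll(obj.Body)
		obj.Body.Close()
		if err != nil {
			return err
		}

		// Assumes the file is already in Elasticsearch bulk (NDJSON) format.
		resp, err := http.Post(os.Getenv("ES_ENDPOINT")+"/_bulk",
			"application/x-ndjson", bytes.NewReader(body))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}

func main() { lambda.Start(handler) }
```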

Then when we first created and launched the Fender Tune app, we used Lambda and API Gateway for basic CRUD operations so users could save custom tunings.
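
The “single-purpose function behind API Gateway” style he describes might look something like this minimal Go sketch; the `Tuning` shape and the omitted persistence call are assumptions for illustration.

```go
// Hypothetical sketch: one Lambda function handling one CRUD route.
package main

import (
	"context"
	"encoding/json"
	"net/http"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// Tuning is an assumed shape for a user's custom tuning.
type Tuning struct {
	UserID  string   `json:"user_id"`
	Name    string   `json:"name"`
	Strings []string `json:"strings"` // e.g. ["D","A","D","G","A","D"]
}

func saveTuning(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	var t Tuning
	if err := json.Unmarshal([]byte(req.Body), &t); err != nil {
		return events.APIGatewayProxyResponse{StatusCode: http.StatusBadRequest}, nil
	}
	// ... persist t (DynamoDB, etc.) ...
	return events.APIGatewayProxyResponse{StatusCode: http.StatusCreated}, nil
}

func main() { lambda.Start(saveTuning) }
```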

Then we were seeing very low server utilization on some of our standard EC2-based microservices, 2% at most. We figured it would be a lot smarter, and give us a lot more flexibility, to use Lambda and move toward an event-driven architecture where we can easily respond to a variety of inputs such as SNS messages or Kinesis stream records.

So are you a fully serverless shop now, or are any of those pesky EC2 instances still hanging around?

We still have a couple of servers, although we have plans to sunset those hopefully by the end of this summer.

In other words, there’s no longer any part of this architecture where you say, “I still need something running all the time”?

Correct. We’ve structured the internals of our application so that if for some reason serverless got far too much load and it was more cost effective to use a container or an EC2 instance, we could actually make that change without having to jump through a whole lot of hoops. It’d be more of just dealing with a different input source and then a different type of response.

What do you love most about this switch to serverless?

I’m most excited that we have the ability to set up our functions to do something very specific. When a request comes in for a user to create a subscription, instead of having to make several remote calls from that, we can just insert data into a DynamoDB table. Then we can have a stream off that table trigger additional actions.
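
A hedged sketch of that pattern in Go: the API handler writes only to DynamoDB, and a second function consumes the table’s stream to fan out follow-up work. The attribute names and the follow-up actions are invented for the example.

```go
// Hypothetical sketch: consume a DynamoDB stream to trigger follow-up work.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

func handleStream(ctx context.Context, evt events.DynamoDBEvent) error {
	for _, rec := range evt.Records {
		if rec.EventName != "INSERT" {
			continue // only react to new subscription records
		}
		// Assumes these string attributes exist on every new item.
		userID := rec.Change.NewImage["user_id"].String()
		plan := rec.Change.NewImage["plan"].String()
		log.Printf("new subscription: user=%s plan=%s", userID, plan)
		// ... trigger billing, welcome email, analytics, etc. ...
	}
	return nil
}

func main() { lambda.Start(handleStream) }
```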

The synchronous client request stays simple, and for the other things that happen asynchronously in the background, we’ve got retries and dead-letter queues, so if there’s a problem, at least we know users aren’t being directly impacted.

If you had to go back and start this serverless migration all over again, is there anything that you would do differently? What have you learned by trial and error through this process?

If I could start over from the very beginning, we would go with Cognito for our authentication system. We initially developed our own, because when we started we had all these EC2-based services and we didn’t think we could integrate with a third-party login.

How has your application performed as users have scaled up?

Our active user count for Fender Play, I believe, is just over 50,000 people at this time. We have no issues with scaling or anything like that; everything happens automatically under the covers, and our request rate isn’t that high anyway. But we do need reliability. Those are paying users, and we need to make sure we handle their requests properly and in a timely manner.

How do you ensure that uptime? Do you have a monitoring solution that has worked well for you?

Currently we’re using a combination of two tools. For investigation, we use Datadog; we just connect that into CloudWatch and have alerts based on dead-letter queues and various other metrics. The other tool I’ve really, really enjoyed using is honeycomb.io.

Since all Lambda logs go into individual CloudWatch streams, it’s kind of difficult to see a single request going across several functions, especially on an asynchronous event. So we feed all of our logs into a specific Lambda function that can send things off to Honeycomb.
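
A sketch of what such a forwarding function could look like in Go, using the aws-lambda-go helper to unpack the gzipped CloudWatch Logs subscription payload and libhoney-go to emit events. The dataset name and field choices are assumptions, not Fender’s actual pipeline.

```go
// Hypothetical sketch: forward CloudWatch Logs events to Honeycomb.
package main

import (
	"context"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	libhoney "github.com/honeycombio/libhoney-go"
)

func forward(ctx context.Context, evt events.CloudwatchLogsEvent) error {
	data, err := evt.AWSLogs.Parse() // decodes base64 and un-gzips the payload
	if err != nil {
		return err
	}
	for _, le := range data.LogEvents {
		ev := libhoney.NewEvent()
		ev.AddField("log_group", data.LogGroup)
		ev.AddField("log_stream", data.LogStream)
		ev.AddField("message", le.Message)
		ev.Send() // queued and sent asynchronously
	}
	libhoney.Flush() // don't let events linger past the invocation
	return nil
}

func main() {
	libhoney.Init(libhoney.Config{
		WriteKey: os.Getenv("HONEYCOMB_WRITE_KEY"),
		Dataset:  "lambda-logs", // illustrative dataset name
	})
	lambda.Start(forward)
}
```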

Do you have any interesting war stories of problems you’ve tracked down using these tools?

Honeycomb actually put a blog post out about one of our stories. We pushed out a deployment to our subscription service, and there was a configuration difference between our QA environment and the production environment. So all the tests passed in QA, but as soon as we released out into production users were unable to create subscriptions.

[Image: Fender’s customer focus makes observability using Honeycomb a key requirement]

With Honeycomb we were very rapidly able to see what the problem was, push a fix out, and then find all of the users that were impacted. So we were able to reach out to them in a timely manner and let them know that we’d recovered, so they could continue enjoying Fender Play.

How did you initially get alerted to that issue?

We had an automated alert that pinged us on that. And whenever we do an initial deployment, we always kind of keep an eye on things for a few minutes just to make sure everything is good.

One process improvement we are implementing is using the canary deployment features within Lambda, so we can put a small portion of the traffic onto the new versions of the function and then decide whether to roll forward with the release or pull it back.
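
For reference, Lambda’s traffic-shifting works by putting version weights on an alias. A hedged sketch with aws-sdk-go follows; the function, alias, and version names are made up for the example.

```go
// Hypothetical sketch: shift 10% of traffic to a canary Lambda version.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	awslambda "github.com/aws/aws-sdk-go/service/lambda"
)

func main() {
	svc := awslambda.New(session.Must(session.NewSession()))

	// Route 90% of traffic to the version the "live" alias points at,
	// and 10% to candidate version "42".
	out, err := svc.UpdateAlias(&awslambda.UpdateAliasInput{
		FunctionName: aws.String("subscriptions-create"),
		Name:         aws.String("live"),
		RoutingConfig: &awslambda.AliasRoutingConfiguration{
			AdditionalVersionWeights: map[string]*float64{
				"42": aws.Float64(0.10),
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("alias updated:", aws.StringValue(out.AliasArn))
	// Roll forward: point the alias at "42" and clear the weights.
	// Roll back: remove the entry from AdditionalVersionWeights.
}
```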

Do you have any regrets at all about choosing the serverless path?

No, none at all. As with any startup or launch, we had a few growing pains early on with some architectural decisions. We’re documenting those so we can go back and clean up some of that technical debt, but overall everyone here, the engineers and even the operations team, is very happy using serverless.

What does that serverless team look like? Is it made up primarily of developers?

Currently we have four “DevOps” team members. They set up the deployment pipelines for all of our Lambda functions and web apps, and they also handle a lot of legacy stuff. We have three other engineers working on building the services (and we’re hiring!). Then we have one QA engineer who works on tests for the services, as well as a data engineer who is in charge of our data warehouse.

And these folks all feel like they have enough to do? Serverless hasn’t made anybody redundant?

No, not at all. In fact, the ops team actually has started taking on even some small development tasks that don’t have tight deadlines. They can get some feedback from the engineers on their use of Go, and now all of a sudden a couple of our DevOps guys are even creating serverless applications to help their own workflows.

What are your pain points as your serverless workflows mature?

We are working on the best way to model our applications. There’s the microservice model where you have a different function for each route, so the POST to a “users” resource would be a different function than the GET on “users”. Or you can have one function per service, with a single function covering everything for users. And then there’s the good old monolith, which would be just one function for everything.

We kind of went the microservice route, which works out really well from a security perspective: each function does a very specific thing, and you can apply very granular permissions to that function so that it can only read from DynamoDB or whatever. However, our functions are written in Go, a compiled language. So as the number of functions increases, build times increase as well.

That’s because every Lambda function in the project has all the code in its deployment package, even if it’s only running one particular Go function. Now, we have a CI/CD process with CircleCI that builds functions in parallel and we’ve gotten that build time down quite a bit, but managing deployment packages is still a challenge.

So, if all your Lambda functions are managed in one project, that basically means you have to update all your deployed functions even if only one function changes, right? Is that a problem for you?

We’ve looked at possibly writing something to evaluate individual changes and their dependencies to see which functions actually need to be rebuilt. That got sort of complex, and with products launching we just didn’t have the time to continue on it.

We also looked at consolidating, using a router inside of a function, and we’re using that in a couple services, but we’re also exploring the build path route and seeing which works out the best.
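
A minimal Go sketch of that “router inside a function” approach: one function serves every route of a users service by switching on the API Gateway method and resource. The routes and handler bodies are stubs for illustration.

```go
// Hypothetical sketch: one Lambda function routing multiple API Gateway paths.
package main

import (
	"context"
	"net/http"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

func route(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	switch req.HTTPMethod + " " + req.Resource {
	case "POST /users":
		return createUser(ctx, req)
	case "GET /users/{id}":
		return getUser(ctx, req)
	default:
		return events.APIGatewayProxyResponse{StatusCode: http.StatusNotFound}, nil
	}
}

func createUser(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// ... validate req.Body and persist the user ...
	return events.APIGatewayProxyResponse{StatusCode: http.StatusCreated}, nil
}

func getUser(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	id := req.PathParameters["id"]
	return events.APIGatewayProxyResponse{StatusCode: http.StatusOK, Body: `{"id":"` + id + `"}`}, nil
}

func main() { lambda.Start(route) }
```

The tradeoff against the one-function-per-route model is exactly the one described above: a single deployment package and simpler builds, in exchange for broader IAM permissions on that one function.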

Finally, what is it that keeps you excited about serverless as you work with it day in and day out?

It’s definitely exciting to work in this event-driven space. I mean, it’s very efficient from a cost perspective, which is huge. Our whole serverless footprint costs less than the two EC2 instances we have for our authentication service. Plus we love being able to handle all sorts of inputs with the same programming model, whether it’s SNS or S3 or what have you.

And then a big part of it is the products that we’re working on. I really think people learning to play musical instruments makes the world a better place, so that’s a good feeling.

