
Monitoring EC2 RI Utilization on AWS with CFN + Lambda + CloudWatch Events + SNS + Role

source link: https://www.ishells.cn/archives/aws-cfn-lambda-cloudwatchevents-sns-role-getreservationutilization


2020-11-08

0. Introduction

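In production we often need to check the utilization, expiration dates, instance types, and similar data for EC2, RDS, and ElastiCache RIs. This time the requirement was to monitor RI utilization for EC2, RDS, and ElastiCache, so I wrote this test based on the AWS SDK and an existing company script that checked RI expiration dates.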

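0.1 v1 workflow
① First, create an SNS topic with an email subscription.
② Then write a Lambda function that gets the current time and returns the EC2 RI utilization for the week before it; when utilization is below 97%, it sends an email alert through SNS.
③ After the Lambda function is tested, write a CFN template that automatically creates the Lambda function, its role, and a CloudWatch Events rule that invokes the Lambda function on a schedule.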
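Step ② needs a date window for the Cost Explorer query. A minimal sketch, assuming the window should end today and that dates are passed as YYYY-MM-DD strings (the helper name `utilization_window` is hypothetical). Cost Explorer treats `TimePeriod` Start as inclusive and End as exclusive, so ending on today covers the `before_days` full days before it:

```python
import datetime


def utilization_window(before_days, today=None):
    """Return (start, end) as YYYY-MM-DD strings for Cost Explorer's TimePeriod.

    Start is inclusive and End is exclusive, so the window covers the
    `before_days` full days before `today`.
    """
    today = today or datetime.date.today()
    start = today - datetime.timedelta(days=before_days)
    return start.isoformat(), today.isoformat()


print(utilization_window(7, datetime.date(2020, 11, 8)))  # ('2020-11-01', '2020-11-08')
```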
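0.2 v2 updates
① The v1 CloudFormation template left out the SNS part. v2 adds the SNS Topic and SNS Subscription resources, and because the Lambda function needs the topic ARN, I moved the Lambda function after the SNS topic and subscription in the file. Note that CloudFormation does not actually create resources in the order they appear in the file; it infers the order from dependencies such as !Ref, !GetAtt, and DependsOn, so it is the !Ref to the topic that guarantees the topic exists before the function is created.
② The v2 template adds environment variables to the Lambda function: topic_arn (taken from the SNS Topic resource with !Ref), before_days (how many days before now the reporting window starts), and appenv (the project name). The function reads them with os.environ['<key>'].
③ Compared with v1, the v2 Lambda function also covers RDS and ElastiCache RIs, reporting details such as instance type, platform, number of RIs, and region, and sends an SNS notification when utilization is below 97%.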
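Environment variables always reach the function as strings, so `before_days` has to be converted before it is used in date arithmetic. A hedged sketch of reading the three v2 variables in one place; `load_config` and its fallback values are illustrative assumptions, not part of the original template:

```python
import os


def load_config(environ=None):
    """Read topic_arn, before_days and appenv from the Lambda environment.

    `environ` defaults to os.environ; passing a plain dict makes local testing easy.
    """
    env = os.environ if environ is None else environ
    # Required: the template fills this in with !Ref SNSTopic (the topic ARN)
    topic_arn = env["topic_arn"]
    # Environment values are strings; the default of 7 days is an assumption
    before_days = int(env.get("before_days", "7"))
    if before_days < 1:
        raise ValueError("before_days must be a positive integer")
    # Fallback project name is an assumption
    appenv = env.get("appenv", "unknown")
    return {"topic_arn": topic_arn, "before_days": before_days, "appenv": appenv}
```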
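0.3 v3 updates
① v3 adds a Redshift utilization report. ② v3 adds each RI's expiration date to the report (EC2, RDS, ElastiCache, and Redshift).
0.4 Architecture diagram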
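With endDateTime in the report, a natural follow-up is to flag RIs that expire soon. The sketch below assumes the endDateTime format the Lambda code splits on (an ISO timestamp such as `2020-12-08T00:00:00.000Z`) and a hypothetical 30-day warning window; the function names are illustrative:

```python
import datetime


def days_until_expiry(end_date_time, today):
    """Days from `today` until an RI's endDateTime, e.g. '2020-12-08T00:00:00.000Z'.

    Only the date part (before 'T') is used, as in the Lambda code.
    """
    end = datetime.date.fromisoformat(end_date_time.split("T")[0])
    return (end - today).days


def expiring_soon(end_date_times, today, within_days=30):
    """Return the endDateTime values that fall within `within_days` from today."""
    return [e for e in end_date_times
            if 0 <= days_until_expiry(e, today) <= within_days]
```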
CFN_Lambda_SNS_RI.jpg

The automated flow is:

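Upload the Lambda code to an S3 bucket, then deploy the CFN template, which automatically creates the Role, Lambda Function, Lambda Permission, and CloudWatch Events resources. The CloudWatch Events rule invokes the Lambda function on a schedule (the Lambda Permission is what allows it to), and the function runs under the Role. If an RI's utilization is below the threshold, the function outputs that RI's details and notifies the user through SNS.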
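The flow above notifies only when something is below the threshold, but the Lambda code in section 2.1 publishes on every run. A minimal sketch of gating the publish call, assuming the report is split into a title line and a list of findings; `publish_if_needed` is a hypothetical helper, and `sns_client` is anything with boto3's `publish(TopicArn=..., Subject=..., Message=...)` method:

```python
def publish_if_needed(sns_client, topic_arn, title, findings,
                      subject="RI Utilization Monitor"):
    """Publish title + findings through SNS only when findings is non-empty.

    sns_client: e.g. boto3.client("sns", region_name="cn-north-1").
    Returns True if a message was published.
    """
    if not findings:
        return False  # nothing below the threshold: send no email
    sns_client.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message="\n".join([title] + list(findings)),
    )
    return True
```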

1. Sample output

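1.1 (For demonstration, the output from this account is unfiltered. This account has many RIs; only the Lambda function was tested here, without the CFN automation.)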
RIlog.jpg
1.2 Another account, deployed with CFN, showing the SNS output

2. How to use this solution

2.1 Upload the Lambda code to an S3 bucket
import datetime
import os

import boto3


def get_date():
    # Returns (start, end) as YYYY-MM-DD strings.
    # before_days (Lambda environment variable) sets how many days before today
    # the window starts; Cost Explorer treats End as exclusive.
    before_days = int(os.environ['before_days'])
    today = datetime.datetime.today()
    start_time = (today - datetime.timedelta(days=before_days)).strftime("%Y-%m-%d")
    stop_time = today.strftime("%Y-%m-%d")
    return start_time, stop_time


# Values accepted for Filter -> SERVICE in get_reservation_utilization():
# Amazon Elastic Compute Cloud - Compute, Amazon Relational Database Service,
# Amazon ElastiCache, Amazon Redshift, Amazon Elasticsearch Service


# EC2 RI utilization
def get_ec2_reservation_utilization():
    # Project name, from the Lambda environment variable
    appenv = os.environ['appenv']
    # First line (header) of the SNS message
    message_list = ["  " + appenv + " RI Utilization Monitor: "]

    ec2_client = boto3.client('ce')
    # Grouped by SUBSCRIPTION_ID, the response (a dict) holds one Group per RI
    # subscription, each with its own Utilization, plus a 'Total' for the period.
    start_time, stop_time = get_date()
    response = ec2_client.get_reservation_utilization(
        TimePeriod={'Start': start_time, 'End': stop_time},
        Filter={'Dimensions': {'Key': 'SERVICE',
                               'Values': ['Amazon Elastic Compute Cloud - Compute']}},
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'SUBSCRIPTION_ID'}]
    )

    ec2_ri_groups_list = response['UtilizationsByTime'][0]['Groups']
    if ec2_ri_groups_list:
        message_list.append("  -------EC2 RI--------")
        for ec2_ri_group in ec2_ri_groups_list:
            ec2_ri_region = ec2_ri_group['Attributes']['region']
            ec2_ri_numberOfInstances = ec2_ri_group['Attributes']['numberOfInstances']
            ec2_ri_instanceType = ec2_ri_group['Attributes']['instanceType']
            ec2_ri_platform = ec2_ri_group['Attributes']['platform']
            ec2_ri_endDateTime = ec2_ri_group['Attributes']['endDateTime'].split('T')[0]

            # Utilization formatted with two decimals, e.g. "85.71%"
            float_ec2_ri_UtilizationPercentage = float(ec2_ri_group['Utilization']['UtilizationPercentage'])
            ec2_ri_UtilizationPercentage = "{:.2f}%".format(float_ec2_ri_UtilizationPercentage)

            if float_ec2_ri_UtilizationPercentage < 97:
                message = ("            In region " + ec2_ri_region + ", " + ec2_ri_numberOfInstances
                           + " EC2 " + ec2_ri_instanceType + " RI utilization is " + ec2_ri_UtilizationPercentage
                           + ", its platform is " + ec2_ri_platform + ", its expiration date is " + ec2_ri_endDateTime)
                message_list.append(message)
    return message_list


# RDS RI utilization (appends to the EC2 messages)
def get_rds_reservation_utilization():
    message_list = get_ec2_reservation_utilization()
    rds_client = boto3.client('ce')

    start_time, stop_time = get_date()
    rds_response = rds_client.get_reservation_utilization(
        TimePeriod={'Start': start_time, 'End': stop_time},
        Filter={'Dimensions': {'Key': 'SERVICE',
                               'Values': ['Amazon Relational Database Service']}},
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'SUBSCRIPTION_ID'}]
    )

    rds_ri_groups_list = rds_response['UtilizationsByTime'][0]['Groups']
    if rds_ri_groups_list:
        message_list.append("  -------RDS RI--------")
        for rds_ri_group in rds_ri_groups_list:
            rds_ri_region = rds_ri_group['Attributes']['region']
            rds_ri_numberOfInstance = rds_ri_group['Attributes']['numberOfInstances']
            rds_ri_instanceType = rds_ri_group['Attributes']['instanceType']
            rds_ri_platform = rds_ri_group['Attributes']['platform']
            rds_ri_endDateTime = rds_ri_group['Attributes']['endDateTime'].split('T')[0]

            float_rds_ri_UtilizationPercentage = float(rds_ri_group['Utilization']['UtilizationPercentage'])
            rds_ri_UtilizationPercentage = "{:.2f}%".format(float_rds_ri_UtilizationPercentage)
            if float_rds_ri_UtilizationPercentage < 97:
                message = ("            In region " + rds_ri_region + ", " + rds_ri_numberOfInstance
                           + " RDS " + rds_ri_instanceType + " RI utilization is " + rds_ri_UtilizationPercentage
                           + ", its platform is " + rds_ri_platform + ", its expiration date is " + rds_ri_endDateTime)
                message_list.append(message)
    return message_list


# Redshift RI utilization (appends to the EC2 + RDS messages)
def get_redshift_utilization():
    message_list = get_rds_reservation_utilization()
    redshift_client = boto3.client('ce')

    start_time, stop_time = get_date()
    redshift_response = redshift_client.get_reservation_utilization(
        TimePeriod={'Start': start_time, 'End': stop_time},
        Filter={'Dimensions': {'Key': 'SERVICE', 'Values': ['Amazon Redshift']}},
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'SUBSCRIPTION_ID'}]
    )

    redshift_ri_groups_list = redshift_response['UtilizationsByTime'][0]['Groups']
    if redshift_ri_groups_list:
        message_list.append("  -------Redshift RI--------")
        for redshift_ri_group in redshift_ri_groups_list:
            redshift_ri_region = redshift_ri_group['Attributes']['region']
            redshift_ri_numberOfInstance = redshift_ri_group['Attributes']['numberOfInstances']
            redshift_ri_instanceType = redshift_ri_group['Attributes']['instanceType']
            redshift_ri_platform = redshift_ri_group['Attributes']['platform']
            # Fixed: this previously read rds_ri_group, which is undefined here
            redshift_ri_endDateTime = redshift_ri_group['Attributes']['endDateTime'].split('T')[0]

            float_redshift_ri_UtilizationPercentage = float(redshift_ri_group['Utilization']['UtilizationPercentage'])
            redshift_ri_UtilizationPercentage = "{:.2f}%".format(float_redshift_ri_UtilizationPercentage)
            if float_redshift_ri_UtilizationPercentage < 97:
                message = ("            In region " + redshift_ri_region + ", " + redshift_ri_numberOfInstance
                           + " Redshift " + redshift_ri_instanceType + " RI utilization is " + redshift_ri_UtilizationPercentage
                           + ", its platform is " + redshift_ri_platform + ", its expiration date is " + redshift_ri_endDateTime)
                message_list.append(message)
    return message_list


# ElastiCache RI utilization (appends to all previous messages, returns one string)
def get_elasticache_utilization():
    message_list = get_redshift_utilization()
    elasticache_client = boto3.client('ce')

    start_time, stop_time = get_date()
    elasticache_response = elasticache_client.get_reservation_utilization(
        TimePeriod={'Start': start_time, 'End': stop_time},
        Filter={'Dimensions': {'Key': 'SERVICE', 'Values': ['Amazon ElastiCache']}},
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'SUBSCRIPTION_ID'}]
    )

    elasticache_ri_groups_list = elasticache_response['UtilizationsByTime'][0]['Groups']
    if elasticache_ri_groups_list:
        message_list.append("  -------ElastiCache RI--------")
        for elasticache_ri_group in elasticache_ri_groups_list:
            elasticache_ri_region = elasticache_ri_group['Attributes']['region']
            elasticache_ri_numberOfInstance = elasticache_ri_group['Attributes']['numberOfInstances']
            elasticache_ri_instanceType = elasticache_ri_group['Attributes']['instanceType']
            elasticache_ri_platform = elasticache_ri_group['Attributes']['platform']
            elasticache_ri_endDateTime = elasticache_ri_group['Attributes']['endDateTime'].split('T')[0]

            float_elasticache_ri_UtilizationPercentage = float(
                elasticache_ri_group['Utilization']['UtilizationPercentage'])
            elasticache_ri_UtilizationPercentage = "{:.2f}%".format(float_elasticache_ri_UtilizationPercentage)
            if float_elasticache_ri_UtilizationPercentage < 97:
                message = ("            In region " + elasticache_ri_region + ", " + elasticache_ri_numberOfInstance
                           + " ElastiCache " + elasticache_ri_instanceType + " RI utilization is " + elasticache_ri_UtilizationPercentage
                           + ", its platform is " + elasticache_ri_platform + ", its expiration date is " + elasticache_ri_endDateTime)
                message_list.append(message)
    # Join once at the end so every appended line is included
    return "\n".join(message_list)


def sns_publish():
    # topic_arn is set by CloudFormation (!Ref SNSTopic) in the Lambda environment
    topic_arn = os.environ['topic_arn']
    topic_region = 'cn-north-1'
    # Full report: EC2 -> RDS -> Redshift -> ElastiCache
    message_str = get_elasticache_utilization()
    sns = boto3.client('sns', region_name=topic_region)
    sns.publish(
        TopicArn=topic_arn,
        Subject='RI Utilization Monitor',
        Message=message_str
    )


def lambda_handler(event, context):
    # Invoked by the CloudWatch Events schedule; always publishes the report
    sns_publish()
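The four query functions above differ only in the SERVICE filter value and the label in the message, and each one calls the previous one to build up the list. A hedged sketch of the same logic as a single loop: `low_utilization_report` and `SERVICES` are hypothetical names, the client is passed in (for example `boto3.client('ce')`) so it can be exercised with a stub, and a service's section header is added only when that service has an RI below the threshold. Like the original, it reads only the first page of results; `GetReservationUtilization` can also return a `NextPageToken` when there are more groups.

```python
SERVICES = {
    "EC2": "Amazon Elastic Compute Cloud - Compute",
    "RDS": "Amazon Relational Database Service",
    "Redshift": "Amazon Redshift",
    "ElastiCache": "Amazon ElastiCache",
}


def low_utilization_report(ce_client, start, end, threshold=97.0, services=SERVICES):
    """Return report lines for RIs whose utilization is below `threshold`.

    ce_client: any object with a boto3-style get_reservation_utilization(**kwargs).
    Only the first UtilizationsByTime entry of the first page is read.
    """
    lines = []
    for label, service in services.items():
        response = ce_client.get_reservation_utilization(
            TimePeriod={"Start": start, "End": end},
            Filter={"Dimensions": {"Key": "SERVICE", "Values": [service]}},
            GroupBy=[{"Type": "DIMENSION", "Key": "SUBSCRIPTION_ID"}],
        )
        periods = response.get("UtilizationsByTime") or [{}]
        low = []
        for group in periods[0].get("Groups", []):
            pct = float(group["Utilization"]["UtilizationPercentage"])
            if pct >= threshold:
                continue
            a = group["Attributes"]
            low.append(
                f"    In region {a['region']}, {a['numberOfInstances']} {label} "
                f"{a['instanceType']} RI utilization is {pct:.2f}%, "
                f"platform {a['platform']}, expires {a['endDateTime'].split('T')[0]}"
            )
        if low:  # section header only when this service has findings
            lines.append(f"  ------- {label} RI -------")
            lines.extend(low)
    return lines
```

Adding a service then means adding one entry to `SERVICES` instead of another chained function.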
2.2 Modify the variable fields in the CloudFormation template
AWSTemplateFormatVersion: 2010-09-09
Description: RI-Utilization-Monitor

Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: RI_Utilization_Monitor
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - ec2:DescribeReservedInstances
                  - ec2:DescribeReservedInstancesModifications
                  - ec2:DescribeReservedInstancesOfferings
                  - ec2:DescribeReservedInstancesListings
                  - ce:GetReservationUtilization
                  - sns:Publish
                  - s3:Get*
                  - s3:List*
                Resource: "*"

  CloudwacthEventsScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Name: RI_Utilization_Monitor
      Description: AWS Cloudwatch Events Schedule Rule
      ScheduleExpression: "cron(00 08 ? * FRI *)"   # change the invocation schedule here (UTC)
      State: "ENABLED"
      Targets:
        - Arn:
            Fn::GetAtt:
              - LambdaFunctionCreator
              - Arn
          Id: GetReservationUtilization

  PermissionForEventsToInvokeLambda:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt
        - LambdaFunctionCreator
        - Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt
        - CloudwacthEventsScheduledRule
        - Arn

  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: RI_Utilization_Monitor

  SNSSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: '[email protected]'   # change this to the subscriber's email address
      Protocol: email
      TopicArn: !Ref SNSTopic

  LambdaFunctionCreator:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: GetReservationUtilization
      Description: Lambda For RI Utilization Monitor
      Environment:
        Variables:
          topic_arn: !Ref SNSTopic
          before_days: "7"   # change this: how many days before now the utilization window starts
          appenv: Friso      # change this: project name
      Runtime: python3.7     # Lambda has since deprecated python3.7; use a currently supported Python runtime
      Handler: GetReservationUtilization.lambda_handler
      MemorySize: 128
      Role: !GetAtt LambdaExecutionRole.Arn
      Timeout: 60
      Code:
        S3Bucket: XXXXXXXXXXXXXXXX               # S3 bucket name
        S3Key: RI/GetReservationUtilization.zip  # S3 path/file name
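The template loads the code from `S3Bucket`/`S3Key` and sets `Handler: GetReservationUtilization.lambda_handler`, so the zip uploaded in step 2.1 must contain `GetReservationUtilization.py` at its root. A small sketch of building and uploading that package; the helper names are hypothetical and `s3_client` is assumed to be a boto3 S3 client (`put_object` is a standard S3 call):

```python
import io
import zipfile


def build_lambda_zip(source_code, module_name="GetReservationUtilization"):
    """Return zip bytes containing <module_name>.py at the archive root.

    Handler "GetReservationUtilization.lambda_handler" means Lambda imports the
    module GetReservationUtilization and calls its lambda_handler function.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{module_name}.py", source_code)
    return buffer.getvalue()


def upload_lambda_zip(s3_client, bucket, key, zip_bytes):
    """Upload the package, e.g. with s3_client = boto3.client("s3")."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=zip_bytes)
```

The key passed to `upload_lambda_zip` has to match `S3Key` in the template (`RI/GetReservationUtilization.zip`).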
2.3 Create the stack from an existing template by uploading the YAML file
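2.4 CloudFormation needs an execution role. The policy below allows every action on every resource, which is convenient for a test but far broader than this stack needs.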
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}
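Steps 2.3 to 2.5 are done in the console. The same deployment can be scripted with the CloudFormation API; a hedged sketch, assuming a boto3 CloudFormation client is passed in. `CAPABILITY_IAM` is required because the template creates an IAM role, and `RoleARN` is where the execution role from step 2.4 would go (`deploy_stack` itself is a hypothetical helper):

```python
def deploy_stack(cfn_client, stack_name, template_body, role_arn=None):
    """Create the stack and block until creation finishes.

    cfn_client: e.g. boto3.client("cloudformation").
    role_arn: optional CloudFormation service role (the role from step 2.4).
    """
    params = {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # The template creates an IAM role, so this capability must be acknowledged
        "Capabilities": ["CAPABILITY_IAM"],
    }
    if role_arn:
        params["RoleARN"] = role_arn
    response = cfn_client.create_stack(**params)
    # With a real boto3 client the waiter raises WaiterError if creation fails
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

For example, `deploy_stack(boto3.client("cloudformation"), "RI-Utilization-Monitor", template_text, role_arn=...)`, where `template_text` is the YAML from section 2.2 read from disk.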
2.5 Run CloudFormation to automatically create the Lambda function, Lambda role, SNS topic, SNS subscription, and other resources

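When testing, set the schedule to fire shortly after the current time; once everything works, change it to the period you need for day-to-day use.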
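For the short test schedule, a one-off cron expression can be generated for a few minutes from now. A minimal sketch, assuming the rule is evaluated in UTC and that a single run (with a fixed year) is acceptable for testing; the helper names are hypothetical:

```python
import datetime


def one_off_cron(run_at):
    """Cron expression that fires once, at run_at's minute (run_at must be in UTC).

    Field order: minutes hours day-of-month month day-of-week year. Because
    day-of-month is given, day-of-week must be '?'.
    """
    return f"cron({run_at.minute} {run_at.hour} {run_at.day} {run_at.month} ? {run_at.year})"


def cron_in(minutes_from_now=10, now=None):
    """Cron for a single run `minutes_from_now` minutes after now (UTC)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return one_off_cron(now + datetime.timedelta(minutes=minutes_from_now))
```

Once the test run succeeds, the expression goes back to a recurring one such as `cron(00 08 ? * FRI *)`.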

3. Test code
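The sample response in this section follows the AWS documentation example and uses capitalized attribute keys (Region, InstanceType, ...), while the Lambda function in section 2.1 reads lowercase keys (region, instanceType, ...). A minimal sketch, assuming only the key casing differs between the two, of a lookup that accepts either form so the same formatting code can run against both; `attr` and `format_group` are hypothetical helpers:

```python
def attr(attributes, name):
    """Look up an RI attribute regardless of key case ('region' or 'Region')."""
    for key, value in attributes.items():
        if key.lower() == name.lower():
            return value
    raise KeyError(name)


def format_group(group, label="EC2"):
    """Format one Cost Explorer group as a report line."""
    a = group["Attributes"]
    pct = float(group["Utilization"]["UtilizationPercentage"])
    return (f"    In region {attr(a, 'region')}, {attr(a, 'numberOfInstances')} {label} "
            f"{attr(a, 'instanceType')} RI utilization is {pct:.2f}%, "
            f"platform {attr(a, 'platform')}")
```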
# Sample response (based on the AWS documentation example; attribute keys are capitalized here)
response = {
    "UtilizationsByTime": [{
        "Groups": [
            {
                "Attributes": {
                    "AccountId": "0123456789",
                    "AccountName": "0123456789",
                    "AvailabilityZone": "",
                    "CancellationDateTime": "2019-09-28T15:22:31.000Z",
                    "EndDateTime": "2019-09-28T15:22:31.000Z",
                    "InstanceType": "t2.nano",
                    "LeaseId": "0123456789",
                    "NumberOfInstances": "1",
                    "OfferingType": "convertible",
                    "Platform": "Linux/UNIX",
                    "Region": "us-east-1",
                    "Scope": "Region",
                    "StartDateTime": "2016-09-28T15:22:32.000Z",
                    "SubscriptionId": "359809062",
                    "SubscriptionStatus": "Active",
                    "SubscriptionType": "All Upfront",
                    "Tenancy": "Shared"
                },
                "Key": "SUBSCRIPTION_ID",
                "Utilization": {"PurchasedHours": 2208, "TotalActualHours": 2208,
                                "UnusedHours": 0, "UtilizationPercentage": 100},
                "Value": "359809062"
            },
            {
                "Attributes": {
                    "AccountId": "0123456789",
                    "AccountName": "asdasdad",
                    "AvailabilityZone": "us-east-1d",
                    "CancellationDateTime": "2017-09-28T15:22:31.000Z",
                    "EndDateTime": "2017-09-28T15:22:31.000Z",
                    "InstanceType": "t2.nano",
                    "LeaseId": "asdasda",
                    "NumberOfInstances": "1",
                    "OfferingType": "Standard",
                    "Platform": "Linux/UNIX",
                    "Region": "us-east-1",
                    "Scope": "Availability Zone",
                    "StartDateTime": "2016-09-28T15:22:32.000Z",
                    "SubscriptionId": "359809070",
                    "SubscriptionStatus": "Active",
                    "SubscriptionType": "All Upfront",
                    "Tenancy": "Shared"
                },
                "Key": "SUBSCRIPTION_ID",
                "Utilization": {"PurchasedHours": 2151, "TotalActualHours": 2151,
                                "UnusedHours": 0, "UtilizationPercentage": 100},
                "Value": "359809070"
            },
            {
                "Attributes": {
                    "AccountId": "0123456789",
                    "AccountName": "sdasad",
                    "AvailabilityZone": "us-west-2a",
                    "CancellationDateTime": "2017-09-20T04:06:02.000Z",
                    "EndDateTime": "2017-09-20T04:06:02.000Z",
                    "InstanceType": "t2.nano",
                    "LeaseId": "asdasda",
                    "NumberOfInstances": "1",
                    "OfferingType": "Standard",
                    "Platform": "Linux/UNIX",
                    "Region": "us-west-2",
                    "Scope": "Availability Zone",
                    "StartDateTime": "2016-09-20T04:06:03.000Z",
                    "SubscriptionId": "353571154",
                    "SubscriptionStatus": "Active",
                    "SubscriptionType": "Partial Upfront"
                },
                "Key": "SUBSCRIPTION_ID",
                "Utilization": {"PurchasedHours": 1948, "TotalActualHours": 0,
                                "UnusedHours": 1948, "UtilizationPercentage": 0},
                "Value": "353571154"
            }
        ],
        "TimePeriod": {"End": "2017-10-01", "Start": "2017-07-01"},
        "Total": {"PurchasedHours": 6307, "TotalActualHours": 4359, "UnusedHours": 1948,
                  "UtilizationPercentage": 69.11368320913270968764864436340574}
    }]
}

ec2_ri_groups_list = response['UtilizationsByTime'][0]['Groups']
message_list = [" RI Utilization Monitor: "]
for ec2_ri_group in ec2_ri_groups_list:
    ec2_ri_region = ec2_ri_group['Attributes']['Region']
    ec2_ri_numberOfInstances = ec2_ri_group['Attributes']['NumberOfInstances']
    ec2_ri_instanceType = ec2_ri_group['Attributes']['InstanceType']
    ec2_ri_platform = ec2_ri_group['Attributes']['Platform']
    # Format the utilization as a percentage with two decimals
    ec2_ri_UtilizationPercentage = "{:.2f}%".format(float(ec2_ri_group['Utilization']['UtilizationPercentage']))
    message = ("            In region " + ec2_ri_region + ", " + ec2_ri_numberOfInstances
               + " EC2 " + ec2_ri_instanceType + " RI utilization is " + ec2_ri_UtilizationPercentage
               + ", its platform is " + ec2_ri_platform)
    message_list.append(message)

message_str = "\n".join(message_list)
print(message_str)
N. Troubleshooting CFN errors

1. Incorrect cron expression (resolved)

For a rule that runs every Sunday, the day-of-month field cannot be *; it must be ?.
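A quick local check for this mistake before deploying: a minimal sketch covering only the six-field layout and the day-of-month/day-of-week rule, not full CloudWatch Events cron validation (`check_cron` is a hypothetical helper):

```python
def check_cron(expr):
    """Return a list of problems found in a CloudWatch Events cron(...) expression.

    Only checks the field count and the day-of-month/day-of-week rule.
    """
    if not (expr.startswith("cron(") and expr.endswith(")")):
        return ["expression must have the form cron(...)"]
    fields = expr[len("cron("):-1].split()
    if len(fields) != 6:
        return [f"expected 6 fields, got {len(fields)}"]
    # Fields: minutes hours day-of-month month day-of-week year
    day_of_month, day_of_week = fields[2], fields[4]
    if day_of_month != "?" and day_of_week != "?":
        return ["day-of-month and day-of-week cannot both be set; use '?' in one of them"]
    return []
```

With this check, `cron(00 08 * * SUN *)` is rejected while `cron(00 08 ? * SUN *)` passes.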

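2. The role created by CFN lacked permissions

Troubleshooting steps:

① First check whether the CFN stack was created successfully.

② Check that every resource defined in the template was created: Role, CloudWatch Events rule, Lambda.

③ Check the Lambda execution results.

This turned up the cause of the Lambda error: a permissions problem.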

"errorMessage":          "An error occurred (AccessDeniedException) when calling the GetReservationUtilization operation: User: arn:aws-cn:sts::936669166135:assumed-role/RI-Utilization-Monitor-LambdaExecutionRole-KD1YDAO01XWR/GetReservationUtilization is not authorized to perform: ce:GetReservationUtilization on resource: arn:aws:ce:cn-northwest-1:936669166135:/GetReservationUtilization",

Change the permissions of the role created by CFN to:

      Policies:
        - PolicyName: RI_Utilization_Monitor
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - ec2:DescribeReservedInstances
                  - ec2:DescribeReservedInstancesModifications
                  - ec2:DescribeReservedInstancesOfferings
                  - ec2:DescribeReservedInstancesListings
                  - ce:GetReservationUtilization
                  - sns:Publish
                  - s3:Get*
                  - s3:List*
                Resource: "*"

3. Mixing up !GetAtt and !Ref in CFN, so the topic ARN could not be obtained

① Check the error message

RICFNTopicERROR.png

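② Check the official documentation on AWS::SNS::Topic return values: !Ref returns the topic ARN, while !GetAtt TopicName returns the topic name.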

TopicCFNDoc.png

③ Locate the error

RICFN.png

④ Check the official !Ref examples

!RefDoc.png

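⑤ Fix the CFN code (the screenshot shows only part of it; the full v2 template is in section 2.2, where topic_arn is set with !Ref SNSTopic)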

TopicCFNRight!RefARN.png

References:

1. CloudWatch Events cron expressions

2. boto3 SNS client

3. GetReservationUtilization syntax

4. AWS::Events::Rule

5. AWS::SNS::Topic return values (official documentation)

6. Official !Ref examples

