Notes on the pitfalls of upgrading APISIX 1.5 to 2.2

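Our production gateways ran APISIX 1.5, while the community had already released APISIX 2.2. It was time to upgrade and pick up the many bug fixes, performance improvements, and new features of the newer release.

The upgrade from APISIX 1.5 to APISIX 2.2 was not all smooth sailing, and we hit quite a few pitfalls along the way. Since past mistakes are a guide for the future, this post walks through the pitfalls our team ran into in our particular business environment, along with some points to watch out for.

Below, "the old version" always means APISIX 1.5 and "the new version" means APISIX 2.2.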

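1. The existing service discovery mechanism no longer works

Routes whose upstream does not use service discovery came through the upgrade without any problems.

Most of our internal production services register and discover each other through Consul KV, so we had written our own consul_kv.lua module to implement service discovery.

Under APISIX 1.5 this all worked fine.

Under APISIX 2.2 it no longer works out of the box, for two reasons:

- The service discovery configuration directive has changed
- An upstream object that uses service discovery needs an extra discovery_type field to select the mechanism

1.1 The service discovery configuration directive has changed

The old version supported only one service discovery mechanism at runtime, configured under the apisix level: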

apisix:
    ......
    discovery: consul_kv
    ......

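In the new version it is configured at the top level of the config*.yaml file, and several different service discovery mechanisms can be enabled at once, for example: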

discovery:                      # service discovery center
  eureka:
    host:                       # it's possible to define multiple eureka hosts addresses of the same eureka cluster.
      - "http://127.0.0.1:8761"
    prefix: "/eureka/"
    fetch_interval: 30          # default 30s
    weight: 100                 # default weight for node
    timeout:
      connect: 2000             # default 2000ms
      send: 2000                # default 2000ms
      read: 5000

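We took a slightly different approach and put the parameters for our consul_kv clusters directly at the top level of the configuration file, to keep the discovery block from nesting too deeply.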

discovery:
  consul_kv: 1
consul_kv:
  servers:
    -
      host: "172.19.5.30"
      port: 8500
    -
      host: "172.19.5.31"
      port: 8500
  prefix: "upstreams"
  timeout:
    connect: 6000
    read: 6000
    wait: 60
  weight: 1
  delay: 5
  connect_type: "long" # long connect
  ......

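Of course, this only ensures that the service discovery module is loaded correctly at startup.

1.2 New discovery_type field on the upstream object

APISIX now supports several service discovery mechanisms at the same time, which is great. The price is an extra discovery_type field that selects among the mechanisms that may be enabled side by side.

Taking Consul KV service discovery as an example, the following field has to be added to each existing upstream object: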

"discovery_type" : "consul_kv"

In the old version, an upstream object only needed the service_name field to point at the service discovery address:

{
    "id": "d6c1d325-9003-4217-808d-249aaf52168e",
    "name": "grpc_upstream_hello",
    ......
    "service_name": "http://172.19.5.30:8500/v1/kv/upstreams/grpc/grpc_hello",
    "create_time": 1610437522,
    "desc": "demo grpc service",
    "type": "roundrobin"
}

In the new version the discovery_type field must be added to name the module that handles the service_name field, like this:

{
    "id": "d6c1d325-9003-4217-808d-249aaf52168e",
    "name": "grpc_upstream_hello",
    ......
    "service_name": "http://172.19.5.30:8500/v1/kv/upstreams/grpc/grpc_hello",
    "create_time": 1610437522,
    "desc": "demo grpc service",
    "type": "roundrobin",
    "discovery_type":"consul_kv"
}
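
For existing upstream objects, adding the field can be scripted. The following is a minimal Python sketch, not our actual tooling: the helper name add_discovery_type is hypothetical, and it assumes that any upstream carrying a service_name but no discovery_type was registered through the Consul KV module.

```python
import json


def add_discovery_type(upstream, discovery_type="consul_kv"):
    """Return a copy of an upstream object with discovery_type set.

    Hypothetical helper: only upstreams that have a service_name and no
    discovery_type yet are changed; anything else is returned as a plain copy.
    """
    patched = dict(upstream)
    if "service_name" in patched and "discovery_type" not in patched:
        patched["discovery_type"] = discovery_type
    return patched


# Trimmed-down version of the upstream object shown above.
old = {
    "name": "grpc_upstream_hello",
    "service_name": "http://172.19.5.30:8500/v1/kv/upstreams/grpc/grpc_hello",
    "type": "roundrobin",
}
print(json.dumps(add_discovery_type(old), indent=4))
```

Returning a copy instead of modifying the input keeps the original object available for comparison before anything is written back.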

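If we later add service discovery based on Consul Service or ETCD KV, this scheme will stay flexible and clear.

With the configuration directive adjusted and the field added, backend service discovery was working again.

But gRPC proxy routes still did not take effect...

2. gRPC does not currently support upstream_id

In our system, upstreams and routes are managed separately, so the HTTP and gRPC routes we create need to reference an upstream through upstream_id.

This worked for gRPC routes in 1.5. In APISIX 2.2 the maintainer @spacewander has not added support for it yet, because the plan is to bring the gRPC and dubbo route handling logic closer together and make it more compact. From a maintenance point of view I agree with that, but as a user it feels somewhat unreasonable, since it simply drops support for existing data.

As a stopgap hack, the lowest-cost change (elegance being inversely proportional to cost) is in apisix/init.lua. Find the following code: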

    -- todo: support upstream id
    api_ctx.matched_upstream = (route.dns_value and
                                route.dns_value.upstream)
                               or route.value.upstream

Replacing it with the code below is enough to solve the immediate problem:

    local up_id = route.value.upstream_id
    if up_id then
        local upstreams = core.config.fetch_created_obj("/upstreams")
        if upstreams then
            local upstream = upstreams:get(tostring(up_id))
            if not upstream then
                core.log.error("failed to find upstream by id: " .. up_id)
                return core.response.exit(502)
            end
            if upstream.has_domain then
                local err
                upstream, err = lru_resolved_domain(upstream,
                                                    upstream.modifiedIndex,
                                                    parse_domain_in_up,
                                                    upstream)
                if err then
                    core.log.error("failed to get resolved upstream: ", err)
                    return core.response.exit(500)
                end
            end
            if upstream.value.pass_host then
                api_ctx.pass_host = upstream.value.pass_host
                api_ctx.upstream_host = upstream.value.upstream_host
            end
            core.log.info("parsed upstream: ", core.json.delay_encode(upstream))
            api_ctx.matched_upstream = upstream.dns_value or upstream.value
        end
    else
        api_ctx.matched_upstream = (route.dns_value and
                                    route.dns_value.upstream)
                                   or route.value.upstream
    end

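3. Custom auth plugins need minor adjustments

The auth plugins in the new version of APISIX can be chained so that several run one after another. That is also a great feature, but it broke the auth plugin we had previously customized for our business, which needed a small adjustment.

The original call: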

    local consumers = core.lrucache.plugin(plugin_name, "consumers_key",
            consumer_conf.conf_version,
            create_consume_cache, consumer_conf)

Because lrucache in the new version no longer provides the plugin function, the call needs to be changed slightly:

local lrucache = core.lrucache.new({
  type = "plugin",
})
......
    local consumers = lrucache("consumers_key", consumer_conf.conf_version,
        create_consume_cache, consumer_conf)

The other spot is where, after a successful authorization, the consumer information is assigned:

    ctx.consumer = consumer
    ctx.consumer_id = consumer.consumer_id

This now has to be replaced with the following, so that any subsequent auth plugins can still take effect.

consumer_mod.attach_consumer(ctx, consumer, consumer_conf)

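For more details, see the source of apisix/plugins/key-auth.lua.

4. Migrating ETCD V2 data to V3

The migration takes three steps:

1. Upgrade the existing production ETCD from 3.3.* to 3.4.* to meet the requirements of the new APISIX. At this point the ETCD instance supports both V2 and V3 data.
2. Migrate the V2 data to V3
- Because there was not much data, I took a very simple and primitive approach
- Export the V2 data with etcdctl
- Use a text editor such as vim to rewrite the data into a script of etcdctl V3 import commands
- Run the V3 import script to load the V2 data into V3
3. In the V3 /apisix/upstreams data, add the "discovery_type" : "consul_kv" attribute to every entry that uses service registration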

With these steps done, the ETCD V2 to V3 data migration was complete.
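
The manual rewrite in step 2 could also be scripted. The Python sketch below is only an illustration under stated assumptions: it assumes the V2 data has already been exported into (key, JSON text) pairs by some means not shown here, and the function name to_v3_put_commands and the sample data are hypothetical. It only generates etcdctl put command lines for the V3 API; it does not run them.

```python
import json
import shlex


def to_v3_put_commands(pairs):
    """Build `etcdctl put` command lines for the etcd V3 API.

    pairs: iterable of (key, json_text) tuples exported from the V2 store
    (how they are exported is outside this sketch).
    """
    commands = []
    for key, text in pairs:
        # Parse and re-encode: this rejects invalid JSON early and produces
        # a compact single-line value.
        payload = json.dumps(json.loads(text), separators=(",", ":"))
        commands.append("ETCDCTL_API=3 etcdctl put %s %s"
                        % (shlex.quote(key), shlex.quote(payload)))
    return commands


# Hypothetical sample data standing in for the exported V2 entries.
sample = [
    ("/apisix/upstreams/1", '{"service_name": "grpc_hello", "type": "roundrobin"}'),
    ("/apisix/routes/1", '{"uri": "/hello", "upstream_id": "1"}'),
]
for line in to_v3_put_commands(sample):
    print(line)
```

Quoting with shlex.quote keeps the JSON value intact when the generated lines are later pasted into a shell script.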

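5. Existing ETCD V3 data fails to load after APISIX starts

On the operations side, we run the gateway with /usr/local/openresty/bin/openresty -p /usr/local/apisix -g 'daemon off;'.

As a result, we skip the officially recommended apisix start command, which initializes a set of key-value pairs in ETCD V3 ahead of time.

So the following key-value pairs need to be created in ETCD V3 in advance: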

Key                         Value
/apisix/routes          :   init_dir
/apisix/upstreams       :   init_dir
/apisix/services        :   init_dir
/apisix/plugins         :   init_dir
/apisix/consumers       :   init_dir
/apisix/node_status     :   init_dir
/apisix/ssl             :   init_dir
/apisix/global_rules    :   init_dir
/apisix/stream_routes   :   init_dir
/apisix/proto           :   init_dir
/apisix/plugin_metadata :   init_dir

If they are not created beforehand, APISIX cannot load the existing data from ETCD after a restart.
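
Creating these keys by hand is error-prone, so the command lines can be generated instead. This is a minimal Python sketch rather than anything shipped with APISIX: the function name init_dir_commands is hypothetical, and the /apisix prefix is taken from the table above and may differ in other deployments.

```python
import shlex

# Directory keys listed in the table above; each one gets the value "init_dir".
INIT_DIRS = [
    "routes", "upstreams", "services", "plugins", "consumers",
    "node_status", "ssl", "global_rules", "stream_routes",
    "proto", "plugin_metadata",
]


def init_dir_commands(prefix="/apisix"):
    """Return one `etcdctl put` command line per directory key."""
    return ["ETCDCTL_API=3 etcdctl put %s init_dir"
            % shlex.quote("%s/%s" % (prefix, name))
            for name in INIT_DIRS]


print("\n".join(init_dir_commands()))
```

The printed lines can be reviewed and then run against the ETCD cluster before the gateway is started.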

There is actually a workaround that requires modifying apisix/init.lua. Find the following code:

            if not dir_res.nodes then
                dir_res.nodes = {}
            end

As a rather geeky fix, replacing it with the code below provides compatibility:

                if dir_res.key then
                    dir_res.nodes = { clone_tab(dir_res) }
                else
                    dir_res.nodes = {}
                end

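6. apisix-dashboard support

We have built a large number of enterprise features on top of apisix-dashboard that are very useful for our actual business, but this also means we cannot upgrade directly to the latest apisix-dashboard.

Since the very basic upstream and route objects have not changed much, the need to upgrade this part can be ignored.

In practice, the only change needed is that when the upstream form is submitted, the JSON string containing the service registration information includes the discovery_type field and its value.

It took some time to complete the upgrade from APISIX 1.5 to APISIX 2.2. There were some pitfalls, but overall it went fairly smoothly. The new version is now live, fully deployed, and running well.

For users still on APISIX 1.5, the new version adds features such as the Control API and multiple service discovery mechanisms, so the upgrade is well worth it.

Before upgrading, read the changelog of every release carefully (https://github.com/apache/apisix/blob/2.2/CHANGELOG.md), then prepare compatibility tests and upgrade steps that fit your own business. All of this is well worth the effort.

For our team, moving to the latest version eases the pressure of future version upgrades and also helps us take part in the open source community, which is nice.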
