30

在Kubernetes 1.10.3上以Hard模式搭建EFK日志分析平台

 5 years ago
source link: https://tonybai.com/2018/06/13/setup-efk-on-kubernetes-1-10-3-in-the-hard-way/?amp%3Butm_medium=referral
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

在一年多之前,我曾写过一篇文章 《使用Fluentd和ElasticSearch Stack实现Kubernetes的集群Logging》 ,文中讲解了如何在Kubernetes上利用EFK( elastic , fluentd , kibana )搭建一套可用的集中日志分析平台。当时的k8s使用的是1.3.7版本,创建EFK使用的是 kubernetes项目cluster/addons/fluentd-elasticsearch 下面的全套yaml文件,yaml中Elastic Search的volume用的还是emptyDir,并未真正持久化。

经过一年多的发展,Kubernetes发生了 “翻天覆地”的变化 ,EFK技术栈也有了很大的进展。虽然那篇文章中的方案、步骤以及问题的解决思路仍有参考价值,但毕竟“年代”不同了,有些东西需要“与时俱进”。恰好近期在协助同事搭建一个 移动互联网医院 的演示环境时,我又一次搭建了一套“较新”版本的EFK,这里记录一下搭建过程、遇到的坑以及问题的解决过程,算是对之前“陈旧知识”的一个更新吧。

一. 环境和部署方案

这次部署我使用了较新的Kubernetes stable版本: 1.10.3 ,这是一个单master node和三个worker node组成的演示环境,集群由kubeadm创建并引导启动。经过这些年的发展和演进,kubeadm引导启动的集群已经十分稳定了,并且搭建过程也是十分顺利(集群使用的是 weave network 插件)。

在EFK部署方案上,我没有再选择直接使用 kubernetes项目cluster/addons/fluentd-elasticsearch 下面的全套yaml文件,而是打算逐个组件单独安装的hard模式。

下面是一个部署示意图:

B3uqIna.png!web

虽然Kubernetes在持久化存储方面有诸多机制和插件可用,但总体来说,目前的k8s在storage这块依旧是短板,用起来体验较差,希望 Container Storage Interface, CSI 的引入和未来发展能降低开发人员的心智负担。因此,这次我将Elastic Search放在了k8s集群外单独单点部署,并直接使用local file system进行数据存取;fluentd没有变化,依旧是以DaemonSet控制的Pod的形式运行在每个k8s node上; kibana部署在集群内部,并通过ingress将服务暴露到集群外面。

二. 部署Elastic Search

按照部署方案,我们将Elastic Search部署在k8s集群外面,但我们依旧使用容器化部署方式。Elastic Search的官方镜像仓库已经由docker hub迁移到 elasticsearch自己维护的仓库 了。

我们下载当前ElasticSearch的最新版6.2.4:

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.2.4

# docker images
REPOSITORY                                      TAG                 IMAGE ID            CREATED             SIZE
docker.elastic.co/elasticsearch/elasticsearch   6.2.4               7cb69da7148d        8 weeks ago         515 MB

在本地创建elasticsearch的数据存储目录:~/es_data,修改该目录的owner和group均为1000:

# mkdir ~/es_data
# chmod g+rwx es_data
# chgrp 1000 es_data
# chown 1000 -R es_data

# ls -l /root/es_data/
total 8
drwxrwxr-x 2 1000 1000 4096 Jun  8 09:50 ./
drwx------ 8 root root 4096 Jun  8 09:50 ../

注意:务必对es_data按上述命令执行修改,否则在启动elasticsearch容器可能会出现如下错误:

[WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
_*org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to create node environment*_
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-6.2.4.jar:6.2.4]
... ...
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
... ...

启动elasticsearch容器:

# docker run -d --restart=unless-stopped -p 9200:9200 -p 9300:9300 -v /root/es_data:/usr/share/elasticsearch/data --ulimit nofile=65536:65536 -e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.2.4

如果看到下面日志,说明elasticsearch容器启动成功了!

[INFO ][o.e.c.m.MetaDataCreateIndexService] [sGZc7Wa] [.monitoring-es-6-2018.06.08] creating index, cause [auto(bulk api)], templates [.monitoring-es], shards [1]/[0], mappings [doc]
[INFO ][o.e.c.r.a.AllocationService] [sGZc7Wa] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.monitoring-es-6-2018.06.08][0]] ...]).

检查es健康状态:

# curl http://127.0.0.1:9200/_cat/health
1528424599 02:23:19 docker-cluster green 1 1 1 1 0 0 0 0 - 100.0%

es工作一切健康!

三. 部署Fluentd

相比较而言,fluentd的部署相对简单,因为 fluentd官网文档 有明确的安装说明。由于k8s默认授权机制采用了RBAC,因此我们使用 fluentd-daemonset-elasticsearch.yaml 来创建fluentd daemonset。

不过在创建前,我们需要打开fluentd-daemonset-elasticsearch.yaml修改一下它连接的elasticsearch的地址信息:

containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        env:
          - name:  FLUENT_ELASTICSEARCH_HOST
            value: "172.16.66.104" // 172.16.66.104就是我们的elasticsearch运行的节点的ip

接下来创建fluentd:

# kubectl apply -f fluentd-daemonset-elasticsearch-rbac.yaml
serviceaccount "fluentd" created
clusterrole.rbac.authorization.k8s.io "fluentd" created
clusterrolebinding.rbac.authorization.k8s.io "fluentd" created
daemonset.extensions "fluentd" created

查看某一个fluentd pod的启动日志如下:

# kubectl logs -f pods/fluentd-4rptt -n kube-system
[info]: reading config file path="/fluentd/etc/fluent.conf"
[info]: starting fluentd-0.12.33
[info]: gem 'fluent-plugin-elasticsearch' version '1.16.0'
[info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '1.0.2'
[info]: gem 'fluent-plugin-record-reformer' version '0.9.1'
[info]: gem 'fluent-plugin-secure-forward' version '0.4.5'
[info]: gem 'fluentd' version '0.12.33'
[info]: adding match pattern="fluent.**" type="null"
[info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
[info]: adding match pattern="**" type="elasticsearch"
[info]: adding source type="tail"
... ...
[info]: following tail of /var/log/containers/weave-net-9kds5_kube-system_weave-13ef6f321b2bc64dc920878c7d361440c0157b91f6025f23c631edb5feb3473a.log
[info]: following tail of /var/log/containers/fluentd-4rptt_kube-system_fluentd-bdc80586d5cafc10729fb277ce01cf28d595059eabf96b66324f32b3b6873e28.log
[info]: Connection opened to Elasticsearch cluster => {:host=>"172.16.66.104", :port=>9200, :scheme=>"http", :user=>"elastic", :password=>"obfuscated"}
... ...

没有报错!似乎fluentd启动ok了。

再来通过elasticsearch日志验证一下:

[INFO ][o.e.c.m.MetaDataCreateIndexService] [sGZc7Wa] [logstash-2018.06.07] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[INFO ][o.e.c.m.MetaDataCreateIndexService] [sGZc7Wa] [logstash-2018.06.08] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.07/XetLly2ZQFKKd0JVvxl5fA] create_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.07/XetLly2ZQFKKd0JVvxl5fA] update_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.07/XetLly2ZQFKKd0JVvxl5fA] update_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.08/j5soBzyVSNOvBQg-E3NkCA] create_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.08/j5soBzyVSNOvBQg-E3NkCA] update_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.08/j5soBzyVSNOvBQg-E3NkCA] update_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.07/XetLly2ZQFKKd0JVvxl5fA] update_mapping [fluentd]
[INFO ][o.e.c.m.MetaDataMappingService] [sGZc7Wa] [logstash-2018.06.08/j5soBzyVSNOvBQg-E3NkCA] update_mapping [fluentd]

fluentd已经成功连接上es了!

四. 部署Kibana

我们将kibana部署到Kubernetes集群内,我们使用kubernetes项目中的cluster/addons/fluentd-elasticsearch下的kibana yaml文件来创建kibana部署和服务:

https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/kibana-deployment.yaml

https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/kibana-service.yaml

创建前,我们需要修改一下kibana-deployment.yaml:

... ...
        image: docker.elastic.co/kibana/kibana:6.2.4  // 这里,我们使用最新的版本:6.2.4

          - name: ELASTICSEARCH_URL
            value: http://172.16.66.104:9200  //这里,我们用上面的elasticsearch的服务地址填入到value的值中
.... ...

创建kibana:

# kubectl apply -f kibana-service.yaml
service "kibana-logging" created
# kubectl apply -f kibana-deployment.yaml
deployment.apps "kibana-logging" created

查看启动的kibana pod,看到如下错误日志:

{"type":"log","@timestamp":"2018-06-08T07:09:08Z","tags":["fatal"],"pid":1,"message":"\"xpack.monitoring.ui.container.elasticsearch.enabled\" setting was not applied. Check for spelling errors and ensure that expected plugins are installed and enabled."}
FATAL "xpack.monitoring.ui.container.elasticsearch.enabled" setting was not applied. Check for spelling errors and ensure that expected plugins are installed and enabled.

似乎与xpack有关。我们删除kibana-deployment.yaml中的两个环境变量:XPACK_MONITORING_ENABLED和XPACK_SECURITY_ENABLED,再重新apply。查看kibana pod日志:

# kubectl logs -f kibana-logging-648dbdf986-bc24x -n kube-system
{"type":"log","@timestamp":"2018-06-08T07:16:27Z","tags":["status","plugin:[email protected]","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2018-06-08T07:16:27Z","tags":["status","plugin:[email protected]","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
... ...
{"type":"log","@timestamp":"2018-06-08T07:16:30Z","tags":["info","monitoring-ui","kibana-monitoring"],"pid":1,"message":"Starting all Kibana monitoring collectors"}
{"type":"log","@timestamp":"2018-06-08T07:16:30Z","tags":["license","info","xpack"],"pid":1,"message":"Imported license information from Elasticsearch for the [monitoring] cluster: mode: basic | status: active | expiry date: 2018-07-08T02:06:08+00:00"}

可以看到kibana启动成功!

使用kubectl proxy启动代理,在浏览器中建立sock5 proxy,然后在浏览器访问:http://localhost:8001/api/v1/namespaces/kube-system/services/kibana-logging/proxy, 你应该可以看到下面的kibana首页:

jIrYV3b.png!web

创建index pattern后,等待一会,查看边栏中的”Discover”,如果你看到类似下面截图中的日志内容输出,说明kibana可以正常从elasticsearch获取数据了:

2uI3Q3I.png!web

五. 为kibana添加ingress

使用kubectl proxy查看kibana虽然简单,但略显麻烦,将kibana服务暴露到集群外更为方便。下面我们就给kibana添加带basic auth的ingress。

1. 部署ingress controller及默认后端(如果cluster已经部署过,则忽略此步骤)

我们选择k8s官方的 ingress-nginx 作为ingress controller,并部署默认后端default-backend,我们把ingress-nginx controller和default-backend统统部署在 kube-system 命令空间下。

下载https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml

mandatory.yaml中的namespace的值都改为kube-system

docker pull anjia0532/defaultbackend:1.4
docker tag anjia0532/defaultbackend:1.4 gcr.io/google_containers/defaultbackend:1.4
docker pull quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.15.0

# kubectl apply -f mandatory.yaml
deployment.extensions "default-http-backend" created
service "default-http-backend" created
configmap "nginx-configuration" created
configmap "tcp-services" created
configmap "udp-services" created
serviceaccount "nginx-ingress-serviceaccount" created
clusterrole.rbac.authorization.k8s.io "nginx-ingress-clusterrole" created
role.rbac.authorization.k8s.io "nginx-ingress-role" created
rolebinding.rbac.authorization.k8s.io "nginx-ingress-role-nisa-binding" created
clusterrolebinding.rbac.authorization.k8s.io "nginx-ingress-clusterrole-nisa-binding" created
deployment.extensions "nginx-ingress-controller" created

此时nginx-ingress controller已经安装完毕,nginx-ingress controller本质上就是一个nginx,目前它还没有暴露服务端口,我们通过nodeport方式暴露nginx-ingress service到集群外面:

下载https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/baremetal/service-nodeport.yaml

修改service-nodeport.yaml:

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: 80
    nodePort: 30080
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    nodePort: 30443
    protocol: TCP
  selector:
    app: ingress-nginx

# kubectl apply -f service-nodeport.yaml
service "ingress-nginx" created
# lsof -i tcp:30080
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-prox 24565 root    9u  IPv6 10447591      0t0  TCP *:30080 (LISTEN)

我们验证一下nginx-ingress controller工作是否正常:

在任意一个集群node上:

# curl localhost:30080
default backend - 404

2. 为kibana添加ingress

ingress 是一种抽象。对于nginx ingress controller来说,创建一个ingress相当于在nginx.conf中添加一个server入口,并nginx -s reload生效。

我们创建kibana的ingress yaml:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
  name: kibana-logging-ingress
  namespace: kube-system
spec:
  rules:
  - host: kibana.tonybai.com
    http:
      paths:
      - backend:
          serviceName: kibana-logging
          servicePort: 5601

由于ingress中的host只能是域名,这里用 kibana.tonybai.com,然后在/etc/hosts中增加该域名的ip地址映射。

创建kibana-logging-ingress:

# kubectl apply -f kibana-logging-ingress.yaml
ingress.extensions "kibana-logging-ingress" created

此时,我们打开浏览器,访问http://kibana.tonybai.com:30080,我们得到了如下结果:

{"statusCode":404,"error":"Not Found","message":"Not Found"}

我们再次用curl试一下:

# curl -L kibana.tonybai.com:30080
<script>var hashRoute = '/api/v1/namespaces/kube-system/services/kibana-logging/proxy/appl;
var defaultRoute = '/api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana';

var hash = window.location.hash;
if (hash.length) {
  window.location = hashRoute + hash;
} else {
  window.location = defaultRoute;

这显然不是我们预想的结果。我们查看一下kibana pod对应的日志,并对比了一下使用kubectl proxy访问kibana的日志:

通过ingress访问的错误日志:

{"type":"response","@timestamp":"2018-06-11T10:20:55Z","tags":[],"pid":1,"method":"get","statusCode":404,"req":{"url":"/api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana","method":"get","headers":{"host":"kibana.tonybai.com:30080","connection":"close","x-request-id":"b066d69c31ce3c9e89efa6264966561c","x-real-ip":"192.168.16.1","x-forwarded-for":"192.168.16.1","x-forwarded-host":"kibana.tonybai.com:30080","x-forwarded-port":"80","x-forwarded-proto":"http","x-original-uri":"/api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana","x-scheme":"http","cache-control":"max-age=0","upgrade-insecure-requests":"1","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36","accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8","accept-language":"zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7"},"remoteAddress":"192.168.20.5","userAgent":"192.168.20.5"},"res":{"statusCode":404,"responseTime":4,"contentLength":9},"message":"GET /api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana 404 4ms - 9.0B"}

通过kubectl proxy访问的正确日志:

{"type":"response","@timestamp":"2018-06-11T10:20:43Z","tags":[],"pid":1,"method":"get","statusCode":304,"req":{"url":"/ui/fonts/open_sans/open_sans_v13_latin_regular.woff2","method":"get","headers":{"host":"localhost:8001","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36","accept":"*/*","accept-encoding":"gzip, deflate, br","accept-language":"zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7","if-modified-since":"Thu, 12 Apr 2018 20:57:06 GMT","if-none-match":"\"afc44700053c9a28f9ab26f6aec4862ac1d0795d\"","origin":"http://localhost:8001","referer":"http://localhost:8001/api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana","x-forwarded-for":"127.0.0.1, 172.16.66.101","x-forwarded-uri":"/api/v1/namespaces/kube-system/services/kibana-logging/proxy/ui/fonts/open_sans/open_sans_v13_latin_regular.woff2"},"remoteAddress":"192.168.16.1","userAgent":"192.168.16.1","referer":"http://localhost:8001/api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana"},"res":{"statusCode":304,"responseTime":3,"contentLength":9},"message":"GET /ui/fonts/open_sans/open_sans_v13_latin_regular.woff2 304 3ms - 9.0B"}

我们看到通过ingress访问,似乎将/api/v1/namespaces/kube-system/services/kibana-logging/proxy/app/kibana这个url path也传递给后面的kibana了,而kibana却无法处理。

我们回头看一下kibana-deployment.yaml,那里面有一个env var:

- name: SERVER_BASEPATH
            value: /api/v1/namespaces/kube-system/services/kibana-logging/proxy

问题似乎就出在这里。我们去掉这个env var,并重新apply kibana-deployment.yaml。然后再用浏览器访问:http://kibana.tonybai.com:30080/app/kibana,kibana的页面就会出现在眼前了。

但是这样更新后, 通过kubectl proxy方式似乎就无法正常访问kibana了 ,这里也只能二选一了,我们选择ingress访问。

3. 添加basic auth for kibana-logging ingress

虽然kibana ingress生效了,但目前kibana ingress目前在“裸奔”,我们还是要适当加上一些auth的,我们选择basic auth,从原理上讲这是加到nginx上的basic auth,kibana自身并没有做basic auth:

我们借助htpasswd工具生成用户名和密码,并基于此创建secret对象:

# htpasswd -c auth tonybai
New password:
Re-type new password:
Adding password for user tonybai

# cat auth
tonybai:$apr1$pQuJZfll$KPfa1rXJUTBBKktxtbVsI0

#kubectl create secret generic basic-auth --from-file=auth -n kube-system
secret "basic-auth" created

# kubectl get secret basic-auth -o yaml -n kube-system
apiVersion: v1
data:
  auth: dG9ueWJhaTokYXByMSRwUXVKWmZsbCRLUGZhMXJYSlVUQkJLa3R4dGJWc0kwCg==
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"auth":"dG9ueWJhaTokYXByMSRwUXVKWmZsbCRLUGZhMXJYSlVUQkJLa3R4dGJWc0kwCg=="},"kind":"Secret","metadata":{"annotations":{},"name":"basic-auth","namespace":"kube-system"},"type":"Opaque"}
  creationTimestamp: 2018-06-11T23:05:42Z
  name: basic-auth
  namespace: kube-system
  resourceVersion: "579134"
  selfLink: /api/v1/namespaces/kube-system/secrets/basic-auth
  uid: f6ec373e-6dcb-11e8-a0e8-00163e0cd764
type: Opaque

在kibana-logging-ingress.yaml中增加有关auth的annotations:

// kibana-logging-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - tonybai"
  name: kibana-logging-ingress
  namespace: kube-system
spec:
  rules:
  - host: kibana.tonybai.com
    http:
      paths:
      - backend:
          serviceName: kibana-logging
          servicePort: 5601

apply kibana-logging-ingress.yaml后,我们再次访问:kibana.tonybai.com:30080

2UFv6n3.png!web

至此,一个演示环境下的EFK日志平台就搭建完毕了。相信有了这种hard way的安装搭建经验,我们可以灵活应对针对其中某个组件的变种部署了(比如 将elasticsearch放到k8s中部署 )。

51短信平台 :企业级短信平台定制开发专家 https://51smspush.com/

smspush : 可部署在企业内部的定制化短信平台,三网覆盖,不惧大并发接入,可定制扩展; 短信内容你来定,不再受约束, 接口丰富,支持长短信,签名可选。

著名云主机服务厂商DigitalOcean发布最新的主机计划,入门级Droplet配置升级为:1 core CPU、1G内存、25G高速SSD,价格5$/月。有使用DigitalOcean需求的朋友,可以打开这个 链接地址 :https://m.do.co/c/bff6eed92687 开启你的DO主机之路。

我的联系方式:

微博:https://weibo.com/bigwhite20xx

微信公众号:iamtonybai

博客:tonybai.com

github: https://github.com/bigwhite

微信赞赏:

7jmyQfJ.jpg!web

商务合作方式:撰稿、出书、培训、在线课程、合伙创业、咨询、广告合作。

© 2018,bigwhite. 版权所有.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK