
ClickHouse 20.x Distributed Table Testing and chproxy Deployment (Part 2)

source link: https://blog.51cto.com/flyfish225/5475839

Tags (space-separated): clickhouse series


1: Distributed table testing in ClickHouse 20.x

1.1: Creating ClickHouse distributed tables

Prepare the test data file. See the official documentation:
  https://clickhouse.com/docs/en/getting-started/example-datasets/metrica
  
Download the file:
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
# Validate the checksum
md5sum hits_v1.tsv
# Checksum should be equal to: f3631b6295bf06989c1437491f7592cb

1.2: Creating the database and the distributed table

clickhouse-client -h node01 -u default --password tzck123.com --query "CREATE DATABASE IF NOT EXISTS datasets on cluster tzcluster3s2r02" 
clickhouse-client -h node01 -u default --password tzck123.com --query "show databases"
Create the local table:

clickhouse-client -h node03 -u default --password tzck123.com

CREATE TABLE datasets.hits_v1 on cluster tzcluster3s2r02 ( WatchID UInt64,  JavaEnable UInt8,  Title String,  GoodEvent Int16,  EventTime DateTime,  EventDate Date,  CounterID UInt32,  ClientIP UInt32,  ClientIP6 FixedString(16),  RegionID UInt32,  UserID UInt64,  CounterClass Int8,  OS UInt8,  UserAgent UInt8,  URL String,  Referer String,  URLDomain String,  RefererDomain String,  Refresh UInt8,  IsRobot UInt8,  RefererCategories Array(UInt16),  URLCategories Array(UInt16), URLRegions Array(UInt32),  RefererRegions Array(UInt32),  ResolutionWidth UInt16,  ResolutionHeight UInt16,  ResolutionDepth UInt8,  FlashMajor UInt8, FlashMinor UInt8,  FlashMinor2 String,  NetMajor UInt8,  NetMinor UInt8, UserAgentMajor UInt16,  UserAgentMinor FixedString(2),  CookieEnable UInt8, JavascriptEnable UInt8,  IsMobile UInt8,  MobilePhone UInt8,  MobilePhoneModel String,  Params String,  IPNetworkID UInt32,  TraficSourceID Int8, SearchEngineID UInt16,  SearchPhrase String,  AdvEngineID UInt8,  IsArtifical UInt8,  WindowClientWidth UInt16,  WindowClientHeight UInt16,  ClientTimeZone Int16,  ClientEventTime DateTime,  SilverlightVersion1 UInt8, SilverlightVersion2 UInt8,  SilverlightVersion3 UInt32,  SilverlightVersion4 UInt16,  PageCharset String,  CodeVersion UInt32,  IsLink UInt8,  IsDownload UInt8,  IsNotBounce UInt8,  FUniqID UInt64,  HID UInt32,  IsOldCounter UInt8, IsEvent UInt8,  IsParameter UInt8,  DontCountHits UInt8,  WithHash UInt8, HitColor FixedString(1),  UTCEventTime DateTime,  Age UInt8,  Sex UInt8,  Income UInt8,  Interests UInt16,  Robotness UInt8,  GeneralInterests Array(UInt16), RemoteIP UInt32,  RemoteIP6 FixedString(16),  WindowName Int32,  OpenerName Int32,  HistoryLength Int16,  BrowserLanguage FixedString(2),  BrowserCountry FixedString(2),  SocialNetwork String,  SocialAction String,  HTTPError UInt16, SendTiming Int32,  DNSTiming Int32,  ConnectTiming Int32,  ResponseStartTiming Int32,  ResponseEndTiming Int32,  FetchTiming Int32,  RedirectTiming Int32, DOMInteractiveTiming Int32,  DOMContentLoadedTiming Int32,  DOMCompleteTiming Int32,  LoadEventStartTiming Int32,  LoadEventEndTiming Int32, NSToDOMContentLoadedTiming Int32,  FirstPaintTiming Int32,  RedirectCount Int8, SocialSourceNetworkID UInt8,  SocialSourcePage String,  ParamPrice Int64, ParamOrderID String,  ParamCurrency FixedString(3),  ParamCurrencyID UInt16, GoalsReached Array(UInt32),  OpenstatServiceName String,  OpenstatCampaignID String,  OpenstatAdID String,  OpenstatSourceID String,  UTMSource String, UTMMedium String,  UTMCampaign String,  UTMContent String,  UTMTerm String, FromTag String,  HasGCLID UInt8,  RefererHash UInt64,  URLHash UInt64,  CLID UInt32,  YCLID UInt64,  ShareService String,  ShareURL String,  ShareTitle String,  ParsedParams Nested(Key1 String,  Key2 String, Key3 String, Key4 String, Key5 String,  ValueDouble Float64),  IslandID FixedString(16),  RequestNum UInt32,  RequestTry UInt8) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{layer}-{shard}/datasets/hits_v1','{replica}') PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192
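The {layer}, {shard} and {replica} placeholders in the ReplicatedReplacingMergeTree ZooKeeper path are substituted from the macros configured on each ClickHouse node; as a quick sanity check they can be inspected on any node, for example:

clickhouse-client -h node01 -u default --password tzck123.com --query "SELECT * FROM system.macros"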
Build the distributed table on top of the local table:
   create TABLE datasets.hits_v1_all on cluster tzcluster3s2r02 as datasets.hits_v1
ENGINE = Distributed("tzcluster3s2r02", "datasets", "hits_v1", rand()); 
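Before loading data, it is worth confirming the shard/replica layout that the arithmetic further below relies on; a minimal check against system.clusters, using the same credentials as above:

clickhouse-client -h node01 -u default --password tzck123.com --query "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'tzcluster3s2r02'"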
Decompress the data (if hits_v1.tsv.xz was downloaded rather than piped through unxz above):
xz -v -d hits_v1.tsv.xz

Insert the data:
clickhouse-client -h node03 -u default -d datasets --password tzck123.com --query "INSERT INTO datasets.hits_v1 FORMAT TSV" < hits_v1.tsv

select count(1) from datasets.hits_v1

Query the distributed table from any of the nodes:
 clickhouse-client -h node01 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node02 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node03 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node04 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node05 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node06 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 

After this round of testing, the distributed table query returns 1,674,680 rows on every node.
Next, copy the data file to node01 and node05 and insert it there as well.
At this point the hits_v1 tables on node01 and node05 contain no data.
clickhouse-client -h node01 -u default -d datasets --password tzck123.com --query "INSERT INTO datasets.hits_v1 FORMAT TSV" < hits_v1.tsv

clickhouse-client -h node05 -u default -d datasets --password tzck123.com --query "INSERT INTO datasets.hits_v1 FORMAT TSV" < hits_v1.tsv
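The local row counts on node01 and node05 can then be verified directly, for example:

 clickhouse-client -h node01 -u default --password tzck123.com --query "select count(1) from datasets.hits_v1" 
 clickhouse-client -h node05 -u default --password tzck123.com --query "select count(1) from datasets.hits_v1" 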
After the inserts, the hits_v1 tables on node01 and node05 each contain 1,674,680 rows.

Now check how many rows the distributed table hits_v1_all reports.

Query the distributed table from any of the nodes:
 clickhouse-client -h node01 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node02 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node03 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node04 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node05 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 
 clickhouse-client -h node06 -u default -d datasets --password tzck123.com --query "select count(1) from datasets.hits_v1_all" 

The distributed table now reports 5,024,040 rows, which is 3 × 1,674,680: the cluster has 3 shards with 2 replicas each, so each of the three shards contributes one copy of the 1,674,680-row dataset to the distributed count.
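As a cross-check of that arithmetic, the distributed count can be broken down by the host each shard's data was read from; a minimal sketch (one replica per shard is queried, so three rows of 1,674,680 each are expected):

clickhouse-client -h node01 -u default --password tzck123.com --query "SELECT hostName() AS host, count(1) AS rows FROM datasets.hits_v1_all GROUP BY host"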

Given this test and the way the cluster behaves, when running ClickHouse in production we can write to the local tables and read from the distributed table.

For the load-balancing layer on top, OpenResty can act as a TCP proxy in front of port 8123 to distribute client connections.

2: Load balancing ClickHouse with an OpenResty proxy

Installing OpenResty is not covered here; see flyfish's article: https://blog.51cto.com/flyfish225/3108573
OpenResty needs to be built with the --with-stream option so that the stream module can proxy TCP connections.
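A minimal sketch of such a build, assuming the OpenResty source tree and the default install prefix (adjust paths and other configure options to your environment):

# build OpenResty with the stream (TCP proxy) module enabled
./configure --prefix=/usr/local/openresty --with-stream
make && make install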

The OpenResty configuration file is listed here for reference:

cd /usr/local/openresty/nginx/conf

vim nginx.conf

-----
#user  nobody;
worker_processes  8;
 
error_log  /usr/local/openresty/nginx/logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;
 
pid        logs/nginx.pid;
 
 
events {
    worker_connections  1024;
}
 
 
stream {
 
 
    log_format proxy '$remote_addr [$time_local] '
                 '$protocol $status $bytes_sent $bytes_received '
                 '$session_time "$upstream_addr" '
                 '"$upstream_bytes_sent" "$upstream_bytes_received" "$upstream_connect_time"';
 
    access_log /usr/local/openresty/nginx/logs/tcp-access.log proxy ;
    open_log_file_cache off;
    include /usr/local/openresty/nginx/conf/conf.d/*.stream;
}
 
http {
    include       mime.types;
    default_type  application/octet-stream;
 
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
 
    access_log  /usr/local/openresty/nginx/logs/access.log  main;
 
    sendfile        on;
    #tcp_nopush     on;
 
    #keepalive_timeout  0;
    keepalive_timeout  60;
 
    gzip  on;
 
    server {
        listen       18080;
        server_name  localhost;
 
        #charset koi8-r;
 
        #access_log  logs/host.access.log  main;
 
        location / {
            root   html;
            index  index.html index.htm;
        }
 
        #error_page  404              /404.html;
 
        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}
-----
The key part is enabling OpenResty's TCP (stream) proxy configuration here:

cd /usr/local/openresty/nginx/conf/conf.d
vim ck_prod.stream

----
upstream ck {
    server  192.168.100.142:8123 weight=25 max_fails=3 fail_timeout=60s;
    server  192.168.100.143:8123 weight=25 max_fails=3 fail_timeout=60s;
    server  192.168.100.144:8123 weight=25 max_fails=3 fail_timeout=60s;
    server  192.168.100.145:8123 weight=25 max_fails=3 fail_timeout=60s;
    server  192.168.100.146:8123 weight=25 max_fails=3 fail_timeout=60s;
}

server {
    listen 18123;
    proxy_pass ck;
}
----
After saving, test the configuration and restart OpenResty:
cd /usr/local/openresty/nginx
sbin/nginx -t 
sbin/nginx 
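Before moving to a GUI client, the stream proxy can be smoke-tested over HTTP: port 18123 is a raw TCP proxy in front of ClickHouse's HTTP port 8123, so a plain curl works (the proxy host IP below is an assumption, matching the chproxy examples later):

echo 'select 1' | curl 'http://192.168.100.120:18123/?user=default&password=tzck123.com' --data-binary @-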
Open the dbvs client tool and connect through the proxy:

3: The chproxy proxy

3.1 Introduction to chproxy:

chproxy is a powerful HTTP proxy and load-balancing middleware for ClickHouse.
It is written in Go, sits in front of the ClickHouse HTTP interface, and is fairly feature-rich.
It is configured with YAML and is a good tool for handling traffic across multiple clusters.

GitHub:
     https://github.com/Vertamedia/chproxy

Official site:
    https://www.chproxy.org/cn
chproxy can be used to implement the write-to-local-tables, read-from-distributed-table pattern:

3.2 Deploying chproxy:

Download chproxy from GitHub:

chproxy-linux-amd64-v1.14.0.tar.gz

tar -xzvf chproxy-linux-amd64-v1.14.0.tar.gz
mv chproxy /usr/bin/

chproxy --version

Configure the chproxy proxy:
mkdir /etc/chproxy/
cd /etc/chproxy/
vim chproxy.yml
-----------------
server:
  http:
      listen_addr: ":19000"
      allowed_networks: ["192.168.100.0/24","192.168.120.0/24" ]
 
users:
  - name: "distributed-write"
    to_cluster: "distributed-write"
    to_user: "default"
 
  - name: "replica-write"
    to_cluster: "replica-write"
    to_user: "default"
 
  - name: "distributed-read"
    to_cluster: "distributed-read"
    to_user: "default"
    max_concurrent_queries: 6
    max_execution_time: 1m
 
clusters:
  - name: "replica-write"
    replicas:
      - name: "replica"
        nodes: ["node01:8123", "node02:8123", "node03:8123", "node04:8123","node05:8123","node06:8123"]
    users:
      - name: "default"
        password: "tzck123.com"
 
  - name: "distributed-write"
    nodes: [
      "node01:8123",
      "node02:8123",
      "node03:8123",
      "node04:8123",
      "node05:8123",
      "node06:8123"
    ]
    users:
      - name: "default"
        password: "tzck123.com"
 
  - name: "distributed-read"
    nodes: [
      "node01:8123",
      "node02:8123",
      "node03:8123",
      "node04:8123",
      "node05:8123",
      "node06:8123"
    ]
    users:
    - name: "default"
      password: "tzck123.com"
 
caches:
  - name: "shortterm"
    dir: "/etc/chproxy/cache/shortterm"
    max_size: 150Mb
    expire: 130s

-----------------
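Before wrapping this in a startup script, the configuration can be checked by running chproxy in the foreground with the same -config flag the script below uses; it should fail fast with an error if the YAML is invalid:

/usr/bin/chproxy -config=/etc/chproxy/chproxy.yml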

Startup script:

vim chproxy.sh
-------

#!/bin/bash
cd /etc/chproxy
# stop any chproxy instance that is already running
pkill -9 -f '/usr/bin/chproxy'
# start chproxy with the configuration created above
nohup /usr/bin/chproxy -config=/etc/chproxy/chproxy.yml >> ./chproxy.out 2>&1 &

-----

chmod +x chproxy.sh 

./chproxy.sh 
Test chproxy.
Query directly on one of the ClickHouse nodes:
echo 'select * from system.clusters' | curl 'http://localhost:8123/?user=default&password=tzck123.com' --data-binary @-

Query through the chproxy proxy:

echo 'select * from system.clusters' | curl 'http://192.168.100.120:19000/?user=distributed-read&password=' --data-binary @-
echo 'select * from system.clusters' | curl 'http://192.168.100.120:19000/?user=distributed-write&password=' --data-binary @-

echo 'select count(1) from datasets.hits_v1' | curl 'http://192.168.100.120:19000/?user=distributed-read&password=' --data-binary @-

echo 'select count(1) from datasets.hits_v1_all' | curl 'http://192.168.100.120:19000/?user=distributed-read&password=' --data-binary @-
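To exercise the write-local / read-distributed pattern end to end through chproxy, data can also be inserted via the distributed-write user over the HTTP interface; a minimal sketch assuming the same hits_v1.tsv file is on hand:

curl 'http://192.168.100.120:19000/?user=distributed-write&password=&query=INSERT%20INTO%20datasets.hits_v1%20FORMAT%20TSV' --data-binary @hits_v1.tsv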
