22

磨人的小妖精 Apache Superset 之绝对不改版

 4 years ago
source link: https://mp.weixin.qq.com/s/dSpItu0a51K_icAb5p5l7A
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

nmiuUjj.gif

Apache Superset

7riAnuf.jpg!web

2018年夏天,经过调研之后最终选择将BI平台定为使用Apache Superset实现。Superset在GitHub上有20K+Star,其综合能力可见一斑,但是经过深入了解和使用之后,只觉得并不能达到该有的预期,倒是很符合 (incubating) 的附加标签,同时也从某种程度上说明好用又免费的BI产品少之又少。毕竟Superset是免费开源产品,虽然对它不是很满意,但是本着不白嫖的原则也给Superset贡献了Star,同时将Superset这一阶段的二次开发告一段落,整理并分享本篇文章。

二次开发概览

版本 内容 初始版 Apache Superset 0.25.6源码 修改版 [改进]定制SQLAlchemy 1.2.2,支持对PostgreSQL 8.3的兼容 确定版 [改进]增加权限控制,用户只能查看和编辑自己创建的对象 最终版 [修复]修复Pivot类型图表自定义配置不生效的BUG 最终版2 [新增]SQL Editor左侧表结构增加表注释、字段注释的展示 最终版3 [新增]封装创建用户接口,封装创建数据源接口 完成版 [修复]修复SQL中包含百分号时查询报错问题 完成版2 [改进]修改SQL Editor中表数据预览策略,支持手动预览 完成版3 [新增]增加图表:普通折线图(XY-Line) 最终完成版 [改进]SQL Editor左侧Database、Schema、Table增加懒加载机制 最终完成版2 [新增]图表展示支持自定义排序 最终完成版3 [修复]修复数据库密码中包含特殊字符时无法连接数据库的问题 绝对不改版 [改进]修复日志记录时间差8小时的问题 绝对不改版2 [改进]改进每次查询新开数据库连接的问题 绝对不改版3 [修复]修复查询结果导出乱码问题 绝对不改版4 [新增]查询结果导出支持自定义文件名 绝对不改版5 [新增]查询结果展示数据条数 绝对不改版6 [改进]解决图表边界遮挡坐标轴刻度值问题 绝对不改版7 [改进]解决SQL Lab中字段包含中文报错问题 绝对不改版8 [新增]查询结果集支持复制 绝对不改版9 [改进]查询强制LIMIT,防止大数据量结果集造成内存飙升

二次开发实现

[改进]定制SQLAlchemy 1.2.2,支持对PostgreSQL 8.3的兼容

背景

使用基于PostgreSQL 8.3devel版本的Greenplum数据源时,在SQL Editor页面左侧选择一张表,会出现表结构无法正常加载的情况。

展开表结构但是展示为空:

QRbAN3f.jpg!web

且Superset会出现 ERROR OCCURRED WHILE FETCHING TABLE METADATA 的提示:

qUN3eqv.jpg!web

打开浏览器控制台发现请求异常信息:

{"error": "(psycopg2.ProgrammingError) relation "pg_catalog.pg_enum" does not exist
LINE 10: LEFT JOIN pg_catalog.pg_enum e ON t.oid = e..."}

解决方案

Superset使用SQLAlchemy查询Greenplum数据库,获取表的字段、索引、注释等信息,SQLAlchemy中对于PostgreSQL不同的版本使用了不同的查询分支,测试结果表明,将SQLAlchemy-1.2.2/lib/sqlalchemy/dialects/postgresql/base.py文件做如下修改,可以修复上述问题。

fmuQNzY.jpg!web

修改SQLAlchemy-1.2.2源码后重新打包安装即可。

[改进]增加权限控制,用户只能查看和编辑自己创建的对象

背景

Superset中的权限过于开放,虽然有用户、角色等概念,但是对于数据源、SQL查询记录、可视化图表等敏感模块,每个人都可以看到其他人创建的对象,无权限控制可言。

解决方案

实现 每个用户只能查看和编辑自己创建的对象。

具体实现

如:用户只能处理自己的数据源

from flask import g
from flask_appbuilder.models.sqla.filters import FilterEqualFunction

def get_user():
    return g.user
    
class DatabaseView(SupersetModelView, DeleteMixin, YamlExportMixin):
    ······
    # 过滤规则
    base_filters = [['created_by', FilterEqualFunction, get_user]]
    ······

常用模块改法概览:

模块 过滤 DatabaseView base_filters = [['created_by', FilterEqualFunction, get_user]] SliceModelView base_filters = [['id', SliceFilter, lambda: []], ['created_by', FilterEqualFunction, get_user]] DashboardModelView base_filters = [['slice', DashboardFilter, lambda: []], ['created_by', FilterEqualFunction, get_user]] search_queries search_user_id = g.user.id SavedQueryView base_filters = [['created_by', FilterEqualFunction, get_user]]

[修复]修复Pivot类型图表自定义配置不生效的BUG

背景

在使用Pivot Table时,如果自定义Metrics设置,则图表无法展示。报错信息提示 unhashable type: 'dict'

y6vMzye.jpg!web

Traceback (most recent call last):
  File "/data/Superset/superset-0.25.6/superset/views/core.py", line 1107, in generate_json
    payload = viz_obj.get_payload()
  File "/data/Superset/superset-0.25.6/superset/viz.py", line 329, in get_payload
    payload['data'] = self.get_data(df)
  File "/data/Superset/superset-0.25.6/superset/viz.py", line 628, in get_data
    margins=self.form_data.get('pivot_margins'),
  File "/home/.local/share/virtualenvs/Superset-VKnIdCi7/lib/python2.7/site-packages/pandas/core/frame.py", line 4468, in pivot_table
    margins_name=margins_name)
  File "/home/.local/share/virtualenvs/Superset-VKnIdCi7/lib/python2.7/site-packages/pandas/core/reshape/pivot.py", line 57, in pivot_table
    if i not in data:
  File "/home/.local/share/virtualenvs/Superset-VKnIdCi7/lib/python2.7/site-packages/pandas/core/generic.py", line 1075, in __contains__
    return key in self._info_axis
  File "/home/.local/share/virtualenvs/Superset-VKnIdCi7/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 1694, in __contains__
    hash(key)
TypeError: unhashable type: 'dict'

解决方案

在PivotTableViz中对于metrics参数改为区分处理即可。

QzeyaaM.jpg!web

具体实现

class PivotTableViz(BaseViz):
    ······
    def get_data(self, df):
        if (
                self.form_data.get('granularity') == 'all' and
                DTTM_ALIAS in df):
            del df[DTTM_ALIAS]
        metrics = self.form_data.get('metrics')
        values = []
        for metric in metrics:
            if isinstance(metric, dict):
                values.append(metric['label'])
            else:
                values.append(metric)
        df = df.pivot_table(
            index=self.form_data.get('groupby'),
            columns=self.form_data.get('columns'),
            values=values,
            aggfunc=self.form_data.get('pandas_aggfunc'),
            margins=self.form_data.get('pivot_margins'),
        )
    ······

[新增]SQL Editor左侧表结构增加表注释、字段注释的展示

背景

SQL Editor左侧表结构默认只加载字段名称、字段类型、主键、外键,而无法得知某个表的注释和字段注释,如果需要查看此类信息则无法满足,给日常使用带来一定不便。

解决方案

效果展示

maEVJnE.png!web

具体实现

superset/views/core.py
修改 /table/<database_id>/<table_name>/<schema>/ 接口,增加获取表注释操作,并将注释信息拼到接口结果集。

eyqUnyf.jpg!web

superset/static/assets/src/SqlLab/components/ColumnElement.jsx
修改字段名称,增加字段注释展示。

aaEnyiz.jpg!web

superset/static/assets/src/SqlLab/components/TableElement.jsx
修改表名,增加表注释展示。

AbyyQjF.jpg!web

[新增]封装创建用户接口,封装创建数据源接口

背景

Superset管理员可以在后台添加用户、给用户设置权限、配置数据源等,但是如果需要与其他平台打通,比如有新员工入职时,自动开通Superset账户,或者用户在数据库权限管理平台申请一个数据库的权限之后,自动设置好Superset数据源,则无法直接实现。

解决方案

封装创建用户接口和创建数据源接口,用于与其他平台做集成。

具体实现

superset/views/core.py

@app.route('/add_superset_user', methods=['POST'])
@csrf.exempt
def add_superset_user():
    """
    封装Superset创建用户接口
    Args:
        {
            'username':, # 账户用户名
            'first_name':,
            'last_name':,
            'email':, # 邮箱
            'role_name':, # 角色名称,若有多个可按英文逗号分隔
            'password': # 账户密码
        }
    :return:
        {
            'status' : '', # True/False
            'message' : '' # 描述
        }
    """
    result = {
        'status' : '',
        'message' : '',
    }
    try:
        username = request.form.get('username')
        first_name = request.form.get('first_name')
        last_name = request.form.get('last_name')
        email = request.form.get('email')
        role_name = request.form.get('role_name')
        role = appbuilder.sm.find_role('Public') # default
        password = request.form.get('password')
        user = appbuilder.sm.add_user(username, first_name, last_name, email, role, password)
        if user:
            if role_name:
                role_names = role_name.split(',')
                for rn in role_names:
                    role = appbuilder.sm.find_role(rn)
                    user.roles.append(role)
                    appbuilder.get_session().commit()
            result['status'] = True
            result['message'] = 'SUCCESS'
        else:
            result['status'] = False
            result['message'] = '用户创建失败'
    except Exception as e:
        logging.exception(e)
        result['status'] = False
        result['message'] = str(e)
    return json.dumps(str(result))


@app.route('/add_superset_dbs', methods=['POST'])
@csrf.exempt
def add_superset_dbs():
    """
    封装Superset添加数据库接口
    一次可添加一个或多个数据库
    Args:
        {
            'superset_user_id':, # 给superset端的哪个用户创建数据库,superset_user_id、superset_user_name至少提供一个
            'superset_user_name':, # 给superset端的哪个用户创建数据库,superset_user_id、superset_user_name至少提供一个
            'db_configs': [ # 数据库配置
                {
                    "db_type" : ,
                    "db_name" : ,
                    "db_host" : ,
                    "db_port" : ,
                    "db_username" : ,
                    "db_password" :
                },
                {
                    ...
                },
                ...
            ]
        }
    :return:
        {
            'status' : '', # True/False
            'message' : '', # 描述
        }
    """
    result = {
        'status' : '',
        'message' : '',
    }
    try:
        superset_user_name = request.form.get('superset_user_name')
        superset_user_id = request.form.get('superset_user_id')
        if superset_user_name and not superset_user_id: # 如果传入用户ID为空,则使用传入用户名称查找对应用户
            create_by_user = appbuilder.sm.find_user(username=superset_user_name)
            superset_user_id = create_by_user.id
        if not superset_user_id:
            raise Exception('没法关联到创建用户')
        if not superset_user_name:
            create_by_user = appbuilder.sm.get_user_by_id(superset_user_id)
            superset_user_name = create_by_user.username
        db_configs = request.form.get('db_configs')
        db_configs = eval(db_configs)
        time_now = datetime.now()
        dbs = []
        for db_config in db_configs:
            db_type = db_config['db_type']
            db_name = db_config['db_name']
            db_host = db_config['db_host']
            db_port = db_config['db_port']
            db_username = db_config['db_username']
            db_password = db_config['db_password']
            # populate_db
            sqlalchemy_uri_template = 'postgresql://%s:%s@%s:%s/%s'
            sqlalchemy_uri = ''
            if db_type == 2: # postgresql
                sqlalchemy_uri = sqlalchemy_uri_template % (db_username, db_password, db_host, db_port, db_name)
            else:
                raise Exception('暂未支持的数据库类型')
            db = models.Database(
                created_on = time_now,
                changed_on = time_now,
                database_name = superset_user_name + '_' + db_name,
                sqlalchemy_uri = sqlalchemy_uri,
                created_by_fk = superset_user_id,
                changed_by_fk = superset_user_id,
                password = db_password,
                cache_timeout = None,
                extra = """{
                                "metadata_params": {},
                                "engine_params": {}
                            }""",
                select_as_create_table_as = 0,
                allow_ctas = 0,
                expose_in_sqllab = 1,
                force_ctas_schema = None,
                allow_run_async = 0,
                allow_run_sync = 1,
                allow_dml = 0,
                verbose_name = None,
                impersonate_user = 0,
                allow_multi_schema_metadata_fetch = 1
            )
            # pre_add
            db.set_sqlalchemy_uri(db.sqlalchemy_uri)
            security_manager.merge_perm('database_access', db.perm)
            for schema in db.all_schema_names():
                security_manager.merge_perm(
                    'schema_access', security_manager.get_schema_perm(db, schema))
            # add
            appbuilder.get_session().add(db)
            appbuilder.get_session().commit()
            dbs.append(db)
        result['status'] = True
        result['message'] = 'SUCCESS'
    except Exception as e:
        logging.exception(e)
        result['status'] = False
        result['message'] = str(e)
    return json.dumps(str(result))

[修复]修复SQL中包含百分号时查询报错问题

背景

受MySQLdb影响,Superset在查询MySQL数据库时,如果SQL中含有百分号 % ,则查询出错。

uQryYjj.jpg!web

报错日志

unsupported format character ''' (0x27) at index 49
Traceback (most recent call last):
  File "/data/Superset/superset-0.25.6/superset/sql_lab.py", line 220, in execute_sql
    **db_engine_spec.cursor_execute_kwargs)
  File "/home/.local/share/virtualenvs/Superset-VKnIdCi7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 238, in execute
    query = query % args
ValueError: unsupported format character ''' (0x27) at index 49

解决方案

将百分号 % 进行替换。

具体实现

superset/connectors/sqla/models.py

z2uaYnA.jpg!web

[改进]修改SQL Editor中表数据预览策略,支持手动预览

背景

在SQL Editor左侧选中一张表,默认会进行表数据预览,而表数据预览并非总是需要的,而且假如这张表很大,则表数据预览会比较耗时。

nYbAvuA.jpg!web

解决方案

改为默认不进行表数据预览,同时支持手动执行预览。

效果展示

nI3UvaN.png!web

具体实现

superset/static/assets/src/SqlLab/components/ResultSet.jsx

nyimm2b.jpg!web

superset/static/assets/src/SqlLab/actions.js

ya2yEjv.jpg!web

[新增]增加图表:普通折线图(XY-Line)

背景

Superset现有图表中只有两种折线图:时序折线图与双轴折线图,如果需要一种普通的(X轴&Y轴)折线图,其中X轴同时支持数值、时间、日期、文字标签等类型,则无法实现。 (注:Bar Chart可以实现类似需求,但其非折线图)

解决方案

新增图表 XY - Line Chart 。数据复用Bar Chart的实现,展现形式使用Line Chart即可。

效果展示

ERRz2ef.jpg!web

BBRz6vI.jpg!web

具体实现

superset/static/assets/src/explore/visTypes.js
xy_line: {
    label: t('XY - Line Chart'),
    showOnExplore: true,
    controlPanelSections: [
      {
        label: t('Query'),
        expanded: true,
        controlSetRows: [
          ['metrics'],
          ['groupby'],
          ['row_limit'],
      ['contribution'],
        ],
      },
      {
        label: t('Chart Options'),
        expanded: true,
        controlSetRows: [
          ['color_scheme'],
          ['show_legend'],
          ['y_axis_format', 'y_axis_label'],
        ],
      },
      {
        label: t('X Axis'),
        expanded: true,
        controlSetRows: [
          ['x_axis_label', 'bottom_margin'],
          ['x_ticks_layout', 'reduce_x_ticks'],
        ],
      },
    ],
    controlOverrides: {
      groupby: {
        label: t('Series'),
      },
    },
  },

superset/static/assets/src/visualizations/index.js

jARZfyY.jpg!web

superset/static/assets/src/visualizations/nvd3_vis.js

2qiUviV.jpg!web

superset/viz.py

from superset.utils import string_to_num
class XYLineViz(DistributionBarViz):

    """A good old xy-line chart"""

    viz_type = 'xy_line'
    verbose_name = _('XY - Line Chart')
    is_timeseries = False

    def query_obj(self):
        return super(XYLineViz, self).query_obj()

    def get_data(self, df):
        fd = self.form_data

        pt = self.sort_df_values(df)
        if fd.get('contribution'):
            pt = pt.fillna(0)
            pt = pt.T
            pt = (pt / pt.sum()).T

        chart_data = []
        for name, ys in pt.items():
            if pt[name].dtype.kind not in 'biufc' or name in self.groupby:
                continue
            if isinstance(name, string_types):
                series_title = name
            elif len(self.metrics) > 1:
                series_title = ', '.join(name)
            else:
                l = [str(s) for s in name[1:]]  # noqa: E741
                series_title = ', '.join(l)
            values = []
            x_i = 0
            import datetime
            for i, v in ys.items():
                x = i
                if isinstance(x, (tuple, list)):
                    x = ', '.join([text_type(s) for s in x])
                if isinstance(x, datetime.date):
                    x = str(x)
                else:
                    xn = string_to_num(x)
                    x = xn if xn else text_type(x)
                values.append({
                    'x': x_i,
                    'y': v,
                    'label': x
                })
                x_i = x_i + 1
            d = {
                'key': series_title,
                'values': values,
            }
            chart_data.append(d)
        return chart_data

superset/static/assets/images/viz_thumbnails/xy_line.png

vqQZvaR.jpg!web

[改进]SQL Editor左侧Database、Schema、Table增加懒加载机制

背景

SQL Editor中每次打开一个Tab时,Superset都会重新加载数据源中的Database、Schema、Table三个列表,使用时存在等待时间,如果列表项过多或者加载过慢,会比较影响体验。

JJ3mqaB.png!web

解决方案载机制

列表增加缓存机制,如果缓存中没有列表项则查库获取最新列表项,否则直接载入缓存中的列表项。同时支持手动获取最新列表项。

效果展示

Ab2q6rB.png!web

具体实现

superset/static/assets/src/SqlLab/components/SqlEditorLeftBar.jsx
······
class SqlEditorLeftBar extends React.PureComponent {
  ······
  componentWillMount() {
      const editorId = this.props.queryEditor.id;
      const storage = window.sessionStorage;
          const schemaOptionsCached = storage.getItem('schemaOptionsCached' + editorId);
      const tableOptionsCached = storage.getItem('tableOptionsCached' + editorId);
          if(schemaOptionsCached) {
        const schemaOptions = JSON.parse(schemaOptionsCached);
        this.setState({ schemaLoading: false, schemaOptions: schemaOptions });
      } else {
        this.fetchSchemas(this.props.queryEditor.dbId);
      }
      if(tableOptionsCached) {
          const tableOptions = JSON.parse(tableOptionsCached);
          const tableLength = tableOptions.length;
          const filterOptions = createFilterOptions({ options: tableOptions });
          this.setState({
            filterOptions,
            tableLoading: false,
            tableOptions: tableOptions,
            tableLength: tableLength,
          });
      } else {
          this.fetchTables(this.props.queryEditor.dbId, this.props.queryEditor.schema);
      }
  }
  refreshDatasources() {
      this.fetchSchemas(this.props.queryEditor.dbId);
      this.fetchTables(this.props.queryEditor.dbId, this.props.queryEditor.schema);
  }
  ······
  fetchTables(dbId, schema, substr) {
    // This can be large so it shouldn't be put in the Redux store
    if (dbId && schema) {
      this.setState({ tableLoading: true, tableOptions: [] });
      const url = `/superset/tables/${dbId}/${schema}/${substr}/`;
      $.get(url).done((data) => {
        const filterOptions = createFilterOptions({ options: data.options });
        this.setState({
          filterOptions,
          tableLoading: false,
          tableOptions: data.options,
          tableLength: data.tableLength,
        });

    // 缓存
    const editorId = this.props.queryEditor.id;
        const storage = window.sessionStorage;
        storage.setItem('tableOptionsCached' + editorId, JSON.stringify(data.options));
      })
      .fail(() => {
        this.setState({ tableLoading: false, tableOptions: [], tableLength: 0 });
        notify.error(t('Error while fetching table list'));
      });
    } else {
      this.setState({ tableLoading: false, tableOptions: [], filterOptions: null });
    }
  }
  ······
  fetchSchemas(dbId) {
    const actualDbId = dbId || this.props.queryEditor.dbId;
    if (actualDbId) {
      this.setState({ schemaLoading: true });
      const url = `/superset/schemas/${actualDbId}/`;
      $.get(url).done((data) => {
        const schemaOptions = data.schemas.map(s => ({ value: s, label: s }));
        this.setState({ schemaOptions, schemaLoading: false });

        // 缓存
    const editorId = this.props.queryEditor.id;
    const storage = window.sessionStorage;
    storage.setItem('schemaOptionsCached' + editorId, JSON.stringify(schemaOptions));
      })
      .fail(() => {
        this.setState({ schemaLoading: false, schemaOptions: [] });
        notify.error(t('Error while fetching schema list'));
      });
    }
  }

  render() {
    ······
    <div className="m-t-5">
        <Button
              bsSize="sm"
              bsStyle="primary"
              onClick={this.refreshDatasources.bind(this)}
            >
              {t('刷新数据源')}
            </Button>
    </div>
    ······

[新增]图表展示支持自定义排序

背景

Superset中的图表默认以第一个指标进行排序,而日常使用中更多的是想根据维度进行排序,但是Superset无法实现此需求。

uAvMv23.jpg!web

解决方案

新增自定义排序机制:图表编辑页面中的 SQL 模块增加 ORDER BY 配置,后台接收此配置并整合到最终执行的SQL中。

效果展示

A7NnuqA.jpg!web

具体实现

superset/static/assets/src/explore/controls.jsx
where: {
    ······
  },

  order_by: {
    type: 'TextAreaControl',
    label: t('Custom ORDER BY clause'),
    default: '',
    language: 'sql',
    minLines: 2,
    maxLines: 10,
    offerEditInModal: false,
    description: t('The text in this box gets included in your query\'s ORDER BY ' +
    'clause. You can include ' +
    'complex expression, parenthesis and anything else ' +
    'supported by the backend it is directed towards.' +
    'Example:{"column1":"desc","column2":"asc"}'
    ),
  },

  having: {
    ······
  },

superset/static/assets/src/explore/visTypes.js

muiU3ii.jpg!web

7B3Avuy.jpg!web

superset/connectors/sqla/models.py
def get_sqla_query(  # sqla
        ······
        template_processor = self.get_template_processor(**template_kwargs)
        db_engine_spec = self.database.db_engine_spec
   
        if not orderby:
            orderby = extras.get('orderby', [])

        orderby = orderby or []
        ······

superset/viz.py

from collections import OrderedDict
······
class BaseViz(object):
    ······
    def query_obj(self):
        ······
        extras = {
            ······
            'orderby': self.parse_order_by()
        }
        ······

    def parse_order_by(self):
        order_by_options = self.form_data.get('order_by', '')
        orderby = []
        if order_by_options:
            try:
                order_dict = {
                    'asc':True,
                    'desc':False
                }
                orderbys = json.loads(order_by_options, object_pairs_hook=OrderedDict)
                for col, odr in orderbys.items():
                    orderby.append((col, order_dict.get(odr.lower(), False)))
            except Exception as e:
                logging.exception(e)
        return orderby

    def sort_df_values(self, df):
        orderby = self.parse_order_by()
        pt = df
        if orderby:
            orderby = dict(orderby)
            by = orderby.keys()
            ascending = orderby.values()
            pt = pt.sort_values(by, ascending=ascending)
            pt = pt.set_index(self.groupby)
        return pt
        
class DistributionBarViz(DistributionPieViz):
    ······
    def get_data(self, df):
        fd = self.form_data

        pt = self.sort_df_values(df)
        if fd.get('contribution'):
            pt = pt.fillna(0)
            pt = pt.T
            pt = (pt / pt.sum()).T

        chart_data = []
        ······

class XYLineViz(DistributionBarViz):
    ······
    def get_data(self, df):
        fd = self.form_data

        pt = self.sort_df_values(df)
        if fd.get('contribution'):
            pt = pt.fillna(0)
            pt = pt.T
            pt = (pt / pt.sum()).T

        chart_data = []
        ······

[修复]修复数据库密码中包含特殊字符时无法连接数据库的问题

背景

配置的数据源如果密码中有特殊字符会报错无法使用。

解决方案

使用urlquote将密码转换即可。

具体实现

superset/models/core.py

from urllib import quote_plus as urlquote
······
    def set_sqlalchemy_uri(self, uri):
        conn = sqla.engine.url.make_url(uri.strip())
        if conn.password and conn.password != PASSWORD_MASK and not custom_password_store:
            # do not over-write the password with the password mask
            self.password = urlquote(conn.password)
        conn.password = PASSWORD_MASK if conn.password else None
        self.sqlalchemy_uri = str(conn)  # hides the password

[改进]修复日志记录时间差8小时的问题

背景

Superset使用MySQL管理元数据时,logs表中dttm字段时间与当前时间差8个小时。

解决方案

具体实现

superset/models/core.py

class Log(Model):
    ······
    dttm = Column(DateTime, default=datetime.utcnow)

[改进]改进每次查询新开数据库连接的问题

背景

SQL Editor中每次查询都会新开一条数据库连接,长时间使用会导致出现大量数据库连接。

解决方案

具体实现

superset/sql_lab.py

zY3QFbi.jpg!web

[修复]修复查询结果导出乱码问题

背景

Superset查询结果导出CSV出现乱码。

解决方案

改为使用 utf-8-sig 编码。

具体实现

superset/config.py

# CSV Options: key/value pairs that will be passed as argument to DataFrame.to_csv method
# note: index option should not be overridden
CSV_EXPORT = {
    'encoding': 'utf-8-sig',
}

[新增]查询结果导出支持自定义文件名

背景

Superset查询结果导出时不支持设置导出文件名称,默认下载的文件名称是无意义字符串,需要先下载到本地磁盘然后更改文件名称。

解决方案

导出时增加设置文件名称的步骤,默认名称是无意义字符串(与原生相同),同时支持输入自定义名称。

效果展示

JfuyyeY.jpg!web

具体实现

superset/static/assets/src/SqlLab/components/ExportResults.jsx

import React from 'react';
import PropTypes from 'prop-types';
import { FormControl, FormGroup, Row, Col } from 'react-bootstrap';

import Button from '../../components/Button';
import ModalTrigger from '../../components/ModalTrigger';
import { t } from '../../locales';

const propTypes = {
  defaultFileName: PropTypes.string,
  queryId: PropTypes.number,
  animation: PropTypes.bool,
  onSave: PropTypes.func,
};
const defaultProps = {
  defaultFileName: t('Undefined'),
  animation: true,
  onSave: () => {},
};

class ExportResults extends React.PureComponent {
  constructor(props) {
    super(props);
    this.state = {
      fileName: props.defaultFileName,
      showSave: false,
    };
    this.toggleSave = this.toggleSave.bind(this);
    this.onSave = this.onSave.bind(this);
    this.onCancel = this.onCancel.bind(this);
    this.onFileNameChange = this.onFileNameChange.bind(this);
  }
  onSave() {
    window.location.href = '/superset/csv/' + this.props.queryId + '/' + this.state.fileName;
    this.saveModal.close();
  }
  onCancel() {
    this.saveModal.close();
  }
  onFileNameChange(e) {
    this.setState({ fileName: e.target.value });
  }
  toggleSave(e) {
    this.setState({ target: e.target, showSave: !this.state.showSave });
  }
  renderModalBody() {
    return (
      <FormGroup bsSize="small">
        <Row>
          <Col md={12}>
            <small>
              <label className="control-label" htmlFor="embed-height">
                {t('File name')}
              </label>
            </small>
            <FormControl
              type="text"
              placeholder={t('File name')}
              value={this.state.fileName}
          onChange={this.onFileNameChange}
            />
          </Col>
        </Row>
        <br />
        <Row>
          <Col md={12}>
            <Button
              bsStyle="primary"
              onClick={this.onSave}
              className="m-r-3"
            >
              {t('Export')}
            </Button>
            <Button onClick={this.onCancel} className="cancelExport">
              {t('Cancel')}
            </Button>
          </Col>
        </Row>
      </FormGroup>
    );
  }
  render() {
    return (
      <span className="ExportResults">
        <ModalTrigger
          ref={(ref) => { this.saveModal = ref; }}
          modalTitle={t('Export Results')}
          modalBody={this.renderModalBody()}
          triggerNode={
            <Button bsSize="small" className="toggleSave" onClick={this.toggleSave}>
              <i className="fa fa-file-text-o" /> {t('Export Results')}
            </Button>
          }
          bsSize="small"
        />
      </span>
    );
  }
}
ExportResults.propTypes = propTypes;
ExportResults.defaultProps = defaultProps;

export default ExportResults;

superset/static/assets/src/SqlLab/components/ResultSet.jsx

import ExportResults from './ExportResults';
······
    if (this.props.search || this.props.visualize || this.props.csv) {
      let csvButton;
      if (this.props.csv) {
        csvButton = (
        <ExportResults
                defaultFileName={this.props.tab + '_' + this.props.query.id}
    queryId={this.props.query.id}
                className="m-r-5"
                onSave={this.props.actions.saveQuery}
              />
        );
      }
      let visualizeButton;
      if (this.props.visualize) {
······

superset/views/core.py

@has_access
    @expose('/csv/<client_id>/<file_name>')
    @log_this
    def csv(self, client_id, file_name):
        """Download the query results as csv."""
        logging.info('Exporting CSV file [{}]'.format(client_id))
        query = (
            db.session.query(Query)
        ······
        response = Response(csv, mimetype='text/csv')
        response.headers['Content-Disposition'] = (
            'attachment; filename={}.csv'.format(parse.quote(file_name or query.name)))
        logging.info('Ready to return response')
        return response

[新增]查询结果展示数据条数

背景

Superset查询结果展示中没有数据条数信息。

解决方案

效果展示

nABrAzn.jpg!web

具体实现

superset/static/assets/src/SqlLab/components/ResultSet.jsx
······
      let resultCountSpan;
      let resultCount = 0;
      if(this.props.query.state === 'success') {
          const results = this.props.query.results;
          let data;
          if (this.props.cache && this.props.query.cached) {
            data = this.state.data;
          } else if (results && results.data) {
            data = results.data;
          }
          if (data && data.length > 0) {
            resultCount = data.length;
          }
      }
      resultCountSpan = (
          <span className="btn-sm">
        {resultCount} {t('records')}
          </span>
      );
      let searchBox;
      if (this.props.search) {
        searchBox = (
          <input
      ······
            <div className="pull-left">
              <ButtonGroup>
                {visualizeButton}
                {csvButton}
                {resultCountSpan}
              </ButtonGroup>
            </div>
            <div className="pull-right">
              {searchBox}
      ······

[改进]解决图表边界遮挡坐标轴刻度值问题

背景

Superset图表中如果刻度值较大,可能存在被边界遮挡的情况。

BnQFjq3.jpg!web

解决方案

调整图表边界控制。

效果展示

v2AfY3y.jpg!web

具体实现

superset/static/assets/src/visualizations/nvd3_vis.js

······
const maxMarginPad = 50;
const animationTime = 1000;
const minHeightForBrush = 480;

const BREAKPOINTS = {
······
    if (chart.yAxis !== undefined || chart.yAxis2 !== undefined) {
      // Hack to adjust y axis left margin to accommodate long numbers
      const containerWidth = slice.container.width();
      const marginPad = Math.ceil(
        Math.min(isExplore ? containerWidth * 0.1 : containerWidth * 0.1, maxMarginPad),
      );
      const maxYAxisLabelWidth = chart.yAxis2 ? getMaxLabelSize(slice.container, 'nv-y1')
                                              : getMaxLabelSize(slice.container, 'nv-y');
      const maxXAxisLabelHeight = getMaxLabelSize(slice.container, 'nv-x');
······

[修复]解决SQL Lab中字段包含中文报错问题

背景

SQL Editor中查询如果字段名称包含中文,则查询报错。

FJvMBve.jpg!web

解决方案

具体实现

superset/dataframe.py

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

[新增]查询结果集支持复制

背景

一种常见的使用场景是将查询结果复制出来,用来查看或作他用。

解决方案

支持将查询结果集复制到剪贴板。

效果展示

BF3EZjM.jpg!web

具体实现

引入依赖 react-copy-to-clipboard

superset/static/assets/src/SqlLab/components/ResultSet.jsx

······
import {CopyToClipboard} from 'react-copy-to-clipboard';
······
      let resultCountSpan;
      let resultCount = 0;
      let copyButton;
      let copyData = "";
      if(this.props.query.state === 'success') {
          const results = this.props.query.results;
          let data;
          if (this.props.cache && this.props.query.cached) {
            data = this.state.data;
          } else if (results && results.data) {
            data = results.data;
          }
          if (data && data.length > 0) {
            resultCount = data.length;

            const columns = results.columns;
            for(let ci = 0;ci < columns.length;ci++) {
                let col = columns[ci];
                copyData += col.name + ' \t ';
            }
            copyData += ' \n ';
            for(let di = 0;di < data.length;di++) {
                let d = data[di];
                for(let ci = 0;ci < columns.length;ci++) {
                    let col = columns[ci];
                    copyData += d[col.name] + ' \t ';
                }
                copyData += ' \n ';
            }
          }
          console.log(results);
          console.log(copyData);
      }
      resultCountSpan = (
          <span className="btn-sm">
        {resultCount} {t('records')}
          </span>
      );
      copyButton = (
        <CopyToClipboard text={copyData}>
          <Button
            bsSize="small"
          >
            <i className="fa fa-copy m-l-1" /> {t('Copy to clipboard')}
          </Button>
        </CopyToClipboard>
      );
      let searchBox;
      if (this.props.search) {
        searchBox = (
          <input
            type="text"
            onChange={this.changeSearch.bind(this)}
            className="form-control input-sm"
            placeholder={t('Search Results')}
          />
        );
      }
      return (
        <div className="ResultSetControls">
          <div className="clearfix">
            <div className="pull-left">
              <ButtonGroup>
                {visualizeButton}
                {csvButton}
                {copyButton}
              {resultCountSpan}
              </ButtonGroup>
            </div>
            <div className="pull-right">
              {searchBox}
            </div>
          </div>
        </div>
      );
    }
    return <div className="noControls" />;
  }
······

[改进]查询强制LIMIT,防止大数据量结果集造成内存飙升

背景

Superset在日常使用过程中,偶尔会出现因查询SQL返回大量结果集,造成服务器内存飙升甚至耗尽的情况。

解决方案

拦截将要执行的SQL使用limit对其进行包裹,即可将SQL查询的最大数据集控制在指定范围内。

具体实现

superset/sql_lab.py

······
    elif (query.limit and superset_query.is_select() and
            db_engine_spec.limit_method == LimitMethod.WRAP_SQL):
        executed_sql = database.wrap_sql_limit(executed_sql, query.limit)
        query.limit_used = True
    elif query.limit and superset_query.is_select(): # 强制添加LIMIT,防止大数据量结果集造成内存飙升
        executed_sql = database.wrap_sql_limit(executed_sql, query.limit)
        query.limit_used = True

    # Hook to allow environment-specific mutation (usually comments) to the SQL
    SQL_QUERY_MUTATOR = config.get('SQL_QUERY_MUTATOR')
    if SQL_QUERY_MUTATOR:
······

q2uyuuQ.gif

QRJbMrF.gif

J3eQzaR.gif


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK