Enterprise Security: Auditing Downloads and Permissions on GitLab, the Internal Code Management Platform

 4 years ago
source link: https://www.tuicool.com/articles/IRVjEra

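*Original author: 胡说. This article is part of the FreeBuf original-content reward program; reproduction without permission is prohibited.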

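Enterprises commonly use GitLab as their internal code management platform: private repositories are more secure, and GitLab is feature-complete. Even so, nothing guarantees that code in a private repository will not leak outside the company, so auditing GitLab permissions and downloads becomes especially important. Based on gitlab-ee 11.10, this article describes in detail how to audit GitLab permissions and code downloads.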

0x00 Quick GitLab deployment

I deployed GitLab quickly with Docker:

docker pull gitlab/gitlab-ee
docker run --detach --hostname gitlab.example.com --publish 443:443 --publish 80:80 --publish 22:22 --name gitlab --restart always --volume /srv/gitlab/config:/etc/gitlab --volume /srv/gitlab/logs:/var/log/gitlab --volume /srv/gitlab/data:/var/opt/gitlab gitlab/gitlab-ee:latest

GitLab can be bound to LDAP so that users log in with their AD domain accounts. To modify the gitlab.rb file, log in to the container and edit it there:

docker exec -it CONTAINER_ID /bin/bash

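Once configuration is complete, open http://ip and sign in as root.

Official documentation: https://docs.gitlab.com/omnibus/docker/

0x01 Understanding Git's transfer protocols

Git transfers data between two repositories in two main ways.

1. Dumb protocol

Git transport over plain HTTP is usually called the dumb protocol, because it requires no Git-specific code on the server side. The fetch process is just a series of GET requests, in which the client can assume the layout of the Git repository on the server. Here is a brief reading of the example given in the official documentation, a single git clone: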

git clone http://github.com/schacon/simplegit-progit.git  // clone simplegit-progit
Initialized empty Git repository in /private/tmp/simplegit-progit/.git/ // initialize an empty Git repository in /private/tmp/simplegit-progit/.git/
got ca82a6dff817ec66f44342007202690a93763949 // the client has read info/refs and fetched the commit it points to; info/refs is generated on the server by update-server-info and gives dumb servers (servers reached over HTTP that do not generate packs dynamically) an auxiliary file that helps clients discover which refs and packs the server has
walk ca82a6dff817ec66f44342007202690a93763949 // walk the commit object
got 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 // fetch the parent commit referenced by that commit
Getting alternates list for http://github.com/schacon/simplegit-progit.git // fetch the list of alternate repositories
Getting pack list for http://github.com/schacon/simplegit-progit.git // fetch the list of pack files
Getting index for pack 816a9b2334da9953e530f27bcac22082a9f5b835 // fetch the index of this pack file
Getting pack 816a9b2334da9953e530f27bcac22082a9f5b835 which contains cfda3bf379e4f8dba8717dee55aab78aef7f4daf // the index lists the object being looked for, so the pack is downloaded
walk 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 // walk the next commit
walk a11bef06a3f659402fe7563abf99ad00de2209e6 // walk the next object
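
The request sequence above can be sketched as the list of URLs a dumb-HTTP client asks for. This is only an illustration: the helper names loose_object_url and dumb_fetch_plan are made up here, and the paths assume the standard layout of a bare Git repository served as static files.

```python
def loose_object_url(base, sha1):
    """URL of a loose object: objects/<first 2 hex chars>/<remaining 38>."""
    return f"{base}/objects/{sha1[:2]}/{sha1[2:]}"


def dumb_fetch_plan(base, commit_sha, pack_sha):
    """Ordered GET targets for a clone that ends up fetching a pack file."""
    return [
        f"{base}/info/refs",                          # refs written by update-server-info
        f"{base}/HEAD",                               # which branch to check out
        loose_object_url(base, commit_sha),           # the commit the ref points to
        f"{base}/objects/info/http-alternates",       # alternates list
        f"{base}/objects/info/packs",                 # pack list
        f"{base}/objects/pack/pack-{pack_sha}.idx",   # index of the pack
        f"{base}/objects/pack/pack-{pack_sha}.pack",  # the pack itself
    ]


plan = dumb_fetch_plan(
    "http://github.com/schacon/simplegit-progit.git",
    "ca82a6dff817ec66f44342007202690a93763949",
    "816a9b2334da9953e530f27bcac22082a9f5b835",
)
for url in plan:
    print("GET", url)
```

A real client only falls back to the alternates and pack requests after a loose-object request fails, so the exact order depends on how the repository is stored.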

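2. Smart protocol

The HTTP method is simple but not very efficient. Smart protocols are the more common way to transfer data. With them, a Git-aware process runs on the remote end; it can read the local data, work out what the client needs, and generate exactly that data for it. There are two sets of processes for transferring data: a pair for uploading and a pair for downloading. Only downloading is covered here:

When you download data, the fetch-pack and upload-pack processes are involved. The client starts a fetch-pack process that connects to an upload-pack process on the remote side to negotiate what data will be transferred. The upload-pack process can be started on the remote repository in different ways. You can run it over SSH, in the same manner as receive-pack (the process started to receive content pushed into the repository), or you can start it through the Git daemon, which listens on port 9418 by default. After connecting, fetch-pack sends the daemon data that looks like this: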

003fgit-upload-pack schacon/simplegit-progit.git\0host=myserver.com\0

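It also starts with 4 bytes that specify the length of the data that follows, then the command to run followed by a null byte, then the server's hostname followed by a final null byte. The Git daemon checks whether the command can be run, whether the repository exists, and whether it has public permissions. If all checks pass, it starts the upload-pack process and hands the client's request over to it.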
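
As a rough sketch of how such a request is framed, the snippet below builds the initial line in Git's pkt-line format. The function names pkt_line and upload_pack_request are illustrative, not part of Git. Note that in the pkt-line format the four hexadecimal digits count the prefix itself, so the value computed here can differ from the prefix printed in older documentation examples.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its pkt-line length: 4 hex digits that include the prefix."""
    length = len(payload) + 4
    if length > 65520:  # largest pkt-line the format allows
        raise ValueError("payload too long for a single pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def upload_pack_request(repo_path: str, host: str) -> bytes:
    """Initial request to the Git daemon: command, path, NUL, host=<name>, NUL."""
    payload = f"git-upload-pack {repo_path}\0host={host}\0".encode("ascii")
    return pkt_line(payload)


request = upload_pack_request("schacon/simplegit-progit.git", "myserver.com")
print(request)
```

Sending these bytes to port 9418 of a host running the Git daemon would start the ref advertisement; the sketch stops at building the request.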

If you fetch over SSH, fetch-pack instead runs something like this:

ssh -x [email protected] "git-upload-pack 'schacon/simplegit-progit.git'"

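0x02 Understanding the GitLab database structure

The Docker image uses a PostgreSQL database with 236 tables in total. GitLab uses a role-based user permission model, so to work out the relationships among users, projects, groups and keys, these are the tables to look at first: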

identities stores LDAP information; its extern_uid column holds LDAP details such as the department:

 Column               | Type
----------------------+-----------------------------
 id                   | integer
 extern_uid           | character varying
 provider             | character varying
 user_id              | integer
 created_at           | timestamp without time zone
 updated_at           | timestamp without time zone
 secondary_extern_uid | character varying
 saml_provider_id     | integer

keys stores the SSH keys and their mapping to user_id:

 Column       | Type
--------------+-----------------------------
 id           | integer
 user_id      | integer
 created_at   | timestamp without time zone
 updated_at   | timestamp without time zone
 key          | text
 title        | character varying
 type         | character varying
 fingerprint  | character varying
 public       | boolean
 last_used_at | timestamp without time zone

namespaces stores the paths of users and groups; it is mainly used to obtain information (description) about groups (type='Group'):

 Column                              | Type
-------------------------------------+-----------------------------
 id                                  | integer
 name                                | character varying
 path                                | character varying
 owner_id                            | integer
 created_at                          | timestamp without time zone
 updated_at                          | timestamp without time zone
 type                                | character varying
 description                         | character varying
 avatar                              | character varying
 membership_lock                     | boolean
 share_with_group_lock               | boolean
 visibility_level                    | integer
 request_access_enabled              | boolean
 ldap_sync_status                    | character varying
 ldap_sync_error                     | character varying
 ldap_sync_last_update_at            | timestamp without time zone
 ldap_sync_last_successful_update_at | timestamp without time zone
 ldap_sync_last_sync_at              | timestamp without time zone
 description_html                    | text
 lfs_enabled                         | boolean
 parent_id                           | integer
 shared_runners_minutes_limit        | integer
 repository_size_limit               | bigint
 require_two_factor_authentication   | boolean
 two_factor_grace_period             | integer
 cached_markdown_version             | integer
 plan_id                             | integer
 project_creation_level              | integer
 runners_token                       | character varying
 trial_ends_on                       | timestamp with time zone
 file_template_project_id            | integer
 saml_discovery_token                | character varying
 runners_token_encrypted             | character varying
 custom_project_templates_group_id   | integer
 auto_devops_enabled                 | boolean
 extra_shared_runners_minutes_limit  | integer

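project_authorizations stores the relationship among users, projects and access levels:

 Column       | Type
--------------+---------
 user_id      | integer
 project_id   | integer
 access_level | integer

Here access_level has the following meanings: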

10 => Guest access    
20 => Reporter access    
30 => Developer access    
40 => Maintainer access    
50 => Owner access # Only valid for groups
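
For reporting, the numeric levels can be mapped back to role names. The sketch below does this with an IntEnum; the names AccessLevel and access_label are hypothetical helpers for illustration and are not taken from GitLab's code.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Values listed above for project_authorizations.access_level."""
    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40
    OWNER = 50  # only valid for groups


def access_label(level):
    """Return a readable role name, or a marker for values not in the list."""
    try:
        return AccessLevel(level).name.capitalize()
    except ValueError:
        return f"Unknown ({level})"


print(access_label(40))
```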

projects stores project information:

 Column                                           | Type
--------------------------------------------------+-----------------------------
 id                                               | integer
 name                                             | character varying(510)
 path                                             | character varying(510)
 description                                      | text
 created_at                                       | timestamp with time zone
 updated_at                                       | timestamp with time zone
 creator_id                                       | integer
 namespace_id                                     | integer
 last_activity_at                                 | timestamp with time zone
 import_url                                       | character varying(510)
 visibility_level                                 | integer
 archived                                         | boolean
 avatar                                           | character varying(510)
 import_status                                    | character varying(510)
 star_count                                       | integer
 import_type                                      | character varying(510)
 import_source                                    | character varying(510)
 import_error                                     | text
 ci_id                                            | integer
 shared_runners_enabled                           | boolean
 runners_token                                    | character varying
 build_coverage_regex                             | character varying
 build_allow_git_fetch                            | boolean
 build_timeout                                    | integer
 pending_delete                                   | boolean
 public_builds                                    | boolean
 last_repository_check_failed                     | boolean
 last_repository_check_at                         | timestamp without time zone
 container_registry_enabled                       | boolean
 only_allow_merge_if_build_succeeds               | boolean
 has_external_issue_tracker                       | boolean
 repository_storage                               | character varying
 request_access_enabled                           | boolean
 has_external_wiki                                | boolean
 lfs_enabled                                      | boolean
 description_html                                 | text
 only_allow_merge_if_all_discussions_are_resolved | boolean

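0x03 Understanding GitLab logs

With its initial configuration, GitLab keeps roughly one month of logs under /var/log/gitlab. At around 1 a.m. each day, the *.log files in that directory are compressed to gz format; for example, gitlab-shell.log becomes gitlab-shell.log.1.gz, and the number runs from 1 to 30 as the files rotate.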
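
A minimal sketch of reading the rotated archives together with the live log, assuming (as is usual for log rotation, but not stated above) that .1.gz is the most recent archive and higher numbers are older. The function name iter_shell_log_lines is made up for this example.

```python
import gzip
import re
from pathlib import Path

ROTATED = re.compile(r"gitlab-shell\.log\.(\d+)\.gz$")


def iter_shell_log_lines(log_dir):
    """Yield log lines oldest first: highest-numbered archive down to .1.gz, then the live file."""
    log_dir = Path(log_dir)
    rotated = []
    for path in log_dir.iterdir():
        match = ROTATED.search(path.name)
        if match:
            rotated.append((int(match.group(1)), path))
    for _, path in sorted(rotated, reverse=True):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield from handle
    current = log_dir / "gitlab-shell.log"
    if current.exists():
        with open(current, encoding="utf-8") as handle:
            yield from handle
```

Calling iter_shell_log_lines("/var/log/gitlab/gitlab-shell") would then stream about a month of entries without unpacking the archives on disk.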

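From the transfer protocols we know that when Git performs a download operation such as git clone, git pull or git fetch, the upload-pack process is started on the server. GitLab itself provides no straightforward download log, so we audit GitLab downloads by watching for upload-pack being started.

gitlab-shell.log: this log file is located in /var/log/gitlab/gitlab-shell. It records the Git commands executed through gitlab-shell as well as the addition of SSH access to projects: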

time="2019-05-06T08:27:37+00:00" level=info msg="executing git command" command="gitaly-upload-pack unix:/var/opt/gitlab/gitaly/gitaly.socket {\"repository\":{\"storage_name\":\"default\",\"relative_path\":\"root/mytest.git\",\"git_object_directory\":\"\",\"git_alternate_object_directories\":[],\"gl_repository\":\"project-1\",\"gl_project_path\":\"root/mytest\"},\"gl_repository\":\"project-1\",\"gl_project_path\":\"root/mytest\",\"gl_id\":\"key-2\",\"gl_username\":\"root\",\"git_config_options\":[],\"git_protocol\":null}" pid=29421 user="user with id key-2" 

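For an introduction to the other logs, see: https://docs.gitlab.com/ee/administration/logs.html

0x04 Auditing GitLab code downloads and permissions

With the GitLab basics above in place, we can start auditing GitLab. First, how to audit code downloads.

The gitlab-shell log records upload-pack operations, but the log is not easy to read, so it needs processing. The goal is a JSON log entry like this: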

{
"time": "2019-05-06T08:27:37+00:00",
"gitcommand": "git-upload-pack",
"username": "test",
"name": "Test account",
"user_department": "Operations",
"project_description": "Used for GitLab testing",
"gitpath": "/data/gitlab/git-data/repositories/root/mytest.git",
"key_id": "233"
}

To obtain the log entry above, we first extract time, the git command, gl_project_path and gl_id from gitlab-shell.log:

time="2019-05-06T08:27:37+00:00"
command="gitaly-upload-pack"
gl_project_path="root/mytest"
gl_id="key-233"
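
A minimal sketch of pulling these fields out of a key=value log line like the gitlab-shell sample shown earlier. The names parse_logfmt and extract_download are hypothetical, SAMPLE is a shortened copy of that sample line, and the sketch assumes quoted values use JSON-compatible escaping, which holds for the sample but is not guaranteed for every line.

```python
import json
import re

# key=value pairs; a value is either a double-quoted string with \" escapes or a bare token
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

SAMPLE = r'time="2019-05-06T08:27:37+00:00" level=info msg="executing git command" command="gitaly-upload-pack unix:/var/opt/gitlab/gitaly/gitaly.socket {\"repository\":{\"relative_path\":\"root/mytest.git\"},\"gl_repository\":\"project-1\",\"gl_project_path\":\"root/mytest\",\"gl_id\":\"key-2\",\"gl_username\":\"root\"}" pid=29421 user="user with id key-2"'


def parse_logfmt(line):
    """Split a line into a dict, unquoting quoted values."""
    fields = {}
    for key, raw in PAIR.findall(line):
        fields[key] = json.loads(raw) if raw.startswith('"') else raw
    return fields


def extract_download(line):
    """Return the audit fields for an upload-pack entry, or None for other lines."""
    fields = parse_logfmt(line)
    command = fields.get("command", "")
    program = command.split(" ", 1)[0]
    if "upload-pack" not in program or "{" not in command:
        return None
    payload = json.loads(command[command.index("{"):])  # JSON argument passed to the command
    return {
        "time": fields.get("time"),
        "gitcommand": program,
        "gl_project_path": payload.get("gl_project_path"),
        "gl_id": payload.get("gl_id"),
    }


print(extract_download(SAMPLE))
```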

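Because the log does not show user information directly, it has to be queried from the users, keys, identities and projects tables of the GitLab database:

Get user_id from the key id:

SELECT user_id FROM keys WHERE id = 233;  -- 233 is the numeric part of gl_id "key-233"

Get name, username and the department information user_department from user_id:

-- :user_id stands for the value returned by the previous query
SELECT name, username FROM users WHERE id = :user_id;
SELECT extern_uid FROM identities WHERE user_id = :user_id;

Then look up path='mytest' in projects to obtain the project's description, project_description, which is compared against user_department.

SELECT description FROM projects WHERE path = 'mytest';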
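
The chain of lookups can be tried end to end with the sketch below. It uses an in-memory SQLite database purely as a stand-in for the GitLab tables, with only the columns the queries touch; all rows, the extern_uid value and the function names build_demo_db and enrich are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for the GitLab tables; every row here is made up."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE keys (id INTEGER, user_id INTEGER);
        CREATE TABLE users (id INTEGER, name TEXT, username TEXT);
        CREATE TABLE identities (user_id INTEGER, extern_uid TEXT);
        CREATE TABLE projects (id INTEGER, path TEXT, description TEXT);
        INSERT INTO keys VALUES (233, 7);
        INSERT INTO users VALUES (7, 'Test account', 'test');
        INSERT INTO identities VALUES (7, 'cn=test,ou=ops,dc=example,dc=com');
        INSERT INTO projects VALUES (1, 'mytest', 'Used for GitLab testing');
    """)
    return db


def enrich(db, gl_id, gl_project_path):
    """Resolve a gl_id such as 'key-233' and a project path into readable audit fields."""
    if not gl_id.startswith("key-"):
        return None
    key_id = int(gl_id.split("-", 1)[1])
    project_path = gl_project_path.rsplit("/", 1)[-1]
    user = db.execute(
        "SELECT u.name, u.username, i.extern_uid "
        "FROM keys k "
        "JOIN users u ON u.id = k.user_id "
        "LEFT JOIN identities i ON i.user_id = u.id "
        "WHERE k.id = ?",
        (key_id,),
    ).fetchone()
    if user is None:
        return None
    project = db.execute(
        "SELECT description FROM projects WHERE path = ?", (project_path,)
    ).fetchone()
    name, username, extern_uid = user
    return {
        "key_id": str(key_id),
        "name": name,
        "username": username,
        "extern_uid": extern_uid,
        "project_description": project[0] if project else None,
    }


print(enrich(build_demo_db(), "key-233", "root/mytest"))
```

Matching on path alone can hit several projects when different namespaces contain a project with the same path, so a production query would probably also filter on namespace_id.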

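At this point we have everything needed for one log entry. Using this approach, gitlab-shell.log is converted into readable JSON logs and then fed into ELK for auditing.

Alerting rules can also be written in Python, for example sending an email alert when the user's department does not match the project description. Other information from the database can be added to enrich the log as well.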
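
One way such a rule might look is sketched below. It assumes that "matching" means the department name appears somewhere in the project description, which is an interpretation rather than something the article specifies; the function names and the example.com addresses are placeholders. The sketch only builds the message and leaves sending (for example through smtplib) to the caller.

```python
from email.message import EmailMessage


def department_mismatch(record):
    """True when the department is unknown or does not occur in the project description."""
    department = (record.get("user_department") or "").strip().lower()
    description = (record.get("project_description") or "").lower()
    if not department or department == "null":
        return True  # unknown department: flag for manual review
    return department not in description


def build_alert(record, sender="audit@example.com", recipient="security@example.com"):
    """Build (but do not send) an alert email for one download record."""
    message = EmailMessage()
    message["Subject"] = f"GitLab download alert: {record.get('username')}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"User {record.get('username')} ({record.get('user_department')}) "
        f"downloaded {record.get('gitpath')} at {record.get('time')}."
    )
    return message


record = {
    "time": "2019-05-06T08:27:37+00:00",
    "username": "test",
    "user_department": "Finance",
    "project_description": "Operations tooling",
    "gitpath": "root/mytest.git",
}
if department_mismatch(record):
    print(build_alert(record))
```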

In the GitLab database, the project_authorizations table records the relationship among users, projects and project access levels:

user_id | project_id | access_level     
---------+------------+--------------    
       1 |          1 |           40

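Combined with the users and projects tables, this yields a readable user-to-project access table that can be used to audit access to GitLab projects. These access levels can also be added to the GitLab code download log.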
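
The join described here can be sketched as follows, again with an in-memory SQLite database standing in for the real tables. The sample rows mirror the single authorization row shown above; the user and project names, the ACCESS_NAMES mapping and the function permission_report are assumptions made for the example.

```python
import sqlite3

ACCESS_NAMES = {10: "Guest", 20: "Reporter", 30: "Developer", 40: "Maintainer", 50: "Owner"}


def permission_report(db):
    """Return (username, project path, role name) rows, one per authorization."""
    rows = db.execute(
        "SELECT u.username, p.path, a.access_level "
        "FROM project_authorizations a "
        "JOIN users u ON u.id = a.user_id "
        "JOIN projects p ON p.id = a.project_id "
        "ORDER BY u.username, p.path"
    ).fetchall()
    return [(user, path, ACCESS_NAMES.get(level, str(level))) for user, path, level in rows]


# Stand-in data: only the columns the report needs.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER, username TEXT);
    CREATE TABLE projects (id INTEGER, path TEXT);
    CREATE TABLE project_authorizations (user_id INTEGER, project_id INTEGER, access_level INTEGER);
    INSERT INTO users VALUES (1, 'root');
    INSERT INTO projects VALUES (1, 'mytest');
    INSERT INTO project_authorizations VALUES (1, 1, 40);
""")
for row in permission_report(db):
    print(row)
```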

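Because the GitLab database contains highly sensitive information such as keys, it is advisable to bind it to the local IP only. My approach is to push just the data that is needed to a MySQL server every day and then read the corresponding information from that MySQL server.

Pushing the users and keys tables: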

#!/bin/bash
# Export only the columns the audit needs, then copy the CSV files to the remote host.
cd /var/opt/gitlab/postgresql || exit 1
psql -h /var/opt/gitlab/postgresql -d gitlabhq_production <<EOF
        COPY (SELECT id,user_id FROM keys) TO '/var/opt/gitlab/postgresql/key.csv' with csv header;
        COPY (SELECT id,name,username FROM users) TO '/var/opt/gitlab/postgresql/user.csv' with csv header;
        COPY (SELECT user_id,extern_uid FROM identities) TO '/var/opt/gitlab/postgresql/department.csv' with csv header;
EOF
# REMOTE_HOST is a placeholder for the remote host's IP address.
scp user.csv root@REMOTE_HOST:/root
scp key.csv root@REMOTE_HOST:/root
scp department.csv root@REMOTE_HOST:/root

Pushing the most recently rotated gitlab-shell.log.x.gz log:

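#!/bin/bash
cd /var/log/gitlab/gitlab-shell || exit 1
# Copy the archive rotated within the last day to a fixed file name.
find /var/log/gitlab/gitlab-shell/ -mtime -1 -name 'gitlab-shell.log.*.gz' | xargs -I {} cp -f {} /var/log/gitlab/gitlab-shell/gitlab-shell.log.gz
# REMOTE_HOST and REMOTE_DIR are placeholders for the remote host's IP address and target directory.
scp -i /var/log/gitlab/gitlab-shell/.ssh/id_rsa gitlab-shell.log.gz root@REMOTE_HOST:REMOTE_DIR
sleep 10
rm -f gitlab-shell.log.gz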

Python script that processes the log format (partial code):

import json
from datetime import datetime

import pymysql


def get_info(key_id):
    # Look up ((name, username), department) for an SSH key id in the MySQL copy of the GitLab tables.
    conn = pymysql.connect(
        host='x.x.x.x',
        port=3306,
        user='gitlab',
        password='password',
        database='gitlab'
    )
    # Defaults returned when a lookup finds nothing.
    name_username = ('null', 'null')
    user_department = 'null'
    try:
        with conn.cursor() as cursor:
            # get user_id; `keys` is a reserved word in MySQL, so it is quoted with backticks
            cursor.execute("SELECT user_id FROM `keys` WHERE id=%s", (key_id,))
            row_1 = cursor.fetchone()
            if row_1 is not None:
                user_id = row_1[0]
                # get name and username
                cursor.execute("SELECT name,username FROM users WHERE id=%s", (user_id,))
                row_2 = cursor.fetchone()
                if row_2 is not None:
                    name_username = row_2
                # get user_department: the second comma-separated component of extern_uid
                cursor.execute("SELECT extern_uid FROM identities WHERE user_id=%s", (user_id,))
                row_3 = cursor.fetchone()
                if row_3 is not None and "," in row_3[0]:
                    user_department = row_3[0].split(",")[1]
    finally:
        conn.close()
    return (name_username, user_department)


def logtojson():
    with open(r'gitlab-shell.log') as myfile:
        logs = myfile.readlines()
    logdict = []
    for line in logs:
        fields = line.split()
        # The field positions below assume the plain-text gitlab-shell layout
        # "I, [<time> #<pid>] INFO -- : gitlab-shell: executing git command <cmd path> for user with key key-N."
        # Lines in another layout are skipped.
        if len(fields) > 16 and fields[0] == "I," and fields[6] == "gitlab-shell:":
            time = fields[1].split("[")[1].split(".")[0]
            gitcommand = fields[10].split("<")[1]
            gitpath = fields[11].split(">")[0]
            key_id = fields[16].split("-")[1].split(".")[0]
            (name, username), user_department = get_info(key_id)
            newlog = {
                "logDate": time,
                "gitcommand": gitcommand,
                "gitpath": gitpath,
                "name": name,
                "username": username,
                "user_department": user_department,
                "key_id": key_id
            }
            logdict.append(newlog)
    # Write one JSON object per line to a file named after today's date.
    with open(datetime.now().date().isoformat() + '.log', "w") as f:
        for entry in logdict:
            json.dump(entry, f, ensure_ascii=False)
            f.write('\n')


if __name__ == '__main__':
    logtojson()

That covers auditing GitLab downloads and permissions. Comments and corrections are welcome.
