Go Gin源码学习(四)

基数树

这次学习的是Gin中的路由，在学习源码一种我们看到了Gin的路由是它的特色。然而基础数据使用了基数树也提供了性能的保障。因为路由这部分比较独立而且逻辑相对复杂，所以需要单独学习。首先我们需要了解的是基数树，百度百科中的解释其中有一个图可以让我们更加直观的看到数据是如何存储的。 RnayA3n.png!web 基数树，相当于是一种前缀树。对于基数树的每个节点，如果该节点是确定的子树的话，就和父节点合并。基数树可用来构建关联数组。在上面的图里也可以看到，数据结构会把所有相同前缀都提取剩余的都作为子节点。

基数树在Gin中的应用

从上面可以看到基数树是一个前缀树，图中也可以看到数据结构。那基数树在Gin中是如何应用的呢？举一个例子其实就能看得出来 router.GET("/support", handler1) router.GET("/search", handler2) router.GET("/contact", handler3) router.GET("/group/user/", handler4) router.GET("/group/user/test", handler5) 最终结果为:

/ (handler = nil, indices = "scg")
    s (handler = nil, indices = "ue")
        upport (handler = handler1, indices = "")
        earch (handler = handler2, indices = "")
    contact (handler = handler3, indices = "")
    group/user/ (handler = handler4, indices = "u")
        uest (handler = handler5, indices = "")

可以看到 router使用get方法添加了5个路由，实际存储结果就是上面显示的。我特地在后面加上了每个节点中的handler和indices。 indices是有序保存所有子节点的第一个字符形成的字符串。为什么要特意突出这个字段，因为在查找子节点下面是否包含path的时候不需要循环子节点，只需要循环这个字段就可以知道是否包含。这样的操作也可以提升一些效率。

源码查看

先看一下节点的对象的定义和如何调用的，需要注意的是indices这个字段上面已经提到了它的作用

type node struct {
	// 保存这个节点上的URL路径
	// 例如上图中的search和support, 共同的parent节点的path="s"
	// 后面两个节点的path分别是"earch"和"upport"
	path string
	// 判断当前节点路径是不是参数节点, 例如上图的:post部分就是wildChild节点
	wildChild bool
	// 节点类型包括static, root, param, catchAll
	// static: 静态节点, 例如上面分裂出来作为parent的s
	// root: 如果插入的节点是第一个, 那么是root节点
	// catchAll: 有*匹配的节点
	// param: 除上面外的节点
	nType nodeType
	// 记录路径上最大参数个数
	maxParams uint8
	// 和children[]对应, 保存的是分裂的分支的第一个字符
	// 例如search和support, 那么s节点的indices对应的"eu"
	// 代表有两个分支, 分支的首字母分别是e和u
	indices string
	// 保存孩子节点
	children []*node
	// 当前节点的处理函数
	handle Handle
	// 优先级
	priority uint32
}

//RouterGrou实现的GET方法调用了handler
func (group *RouterGroup) GET(relativePath string, handlers ...HandlerFunc) IRoutes {
	return group.handle("GET", relativePath, handlers)
}

func (group *RouterGroup) handle(httpMethod, relativePath string, handlers HandlersChain) IRoutes {
	//方法计算出路径，把group中的basepath和relativepath 合并在一起
	absolutePath := group.calculateAbsolutePath(relativePath)
	//合并handler 把group中添加的中间件和传入的handlers合并起来
	handlers = group.combineHandlers(handlers)
	//调用addRoute 添加router
	group.engine.addRoute(httpMethod, absolutePath, handlers)
	return group.returnObj()
}

接下来我们需要看的是addRoute这个方法了，方法体比较长。其实大多的逻辑都在处理带参数的节点，真正核心的逻辑其实并不多。我把主要的逻辑都写上了注释应该还是比较容易理解的。如果看不懂其实一步步debug几次也能帮助理解。

func (engine *Engine) addRoute(method, path string, handlers HandlersChain) {
	assert1(path[0] == '/', "path must begin with '/'")
	assert1(method != "", "HTTP method can not be empty")
	assert1(len(handlers) > 0, "there must be at least one handler")

	debugPrintRoute(method, path, handlers)
	//获取method的树的根节点，每个method都有一个根节点，比如GET，POST 都会维护一个根节点
	root := engine.trees.get(method)
	//如果没有则创建一个节点
	if root == nil {
		root = new(node)
		engine.trees = append(engine.trees, methodTree{method: method, root: root})
	}
	//正式添加路由
	root.addRoute(path, handlers)
}

func (n *node) addRoute(path string, handlers HandlersChain) {
	//记录原始path
	fullPath := path
	n.priority++
	//统计path中包含多少参数 就是判断`：`，`*`的数量 最多255个
	numParams := countParams(path)

	//判断节点是否为空
	if len(n.path) > 0 || len(n.children) > 0 {
	walk:
		for {
			// 更新最大参数数量
			if numParams > n.maxParams {
				n.maxParams = numParams
			}

			// 找到相同前缀 循环次数 是取 path 和 n.path 长度的小那个长度
			i := 0
			max := min(len(path), len(n.path))
			//循环判断是否字符相同，相同则i++ 直到最后
			for i < max && path[i] == n.path[i] {
				i++
			}

			//判断是否有前缀相同，如果有相同的则把目前这个节点提取出来作为子节点
			//再把相同前缀的path部分作为 父节点
			//比如n的path = romaned 现在新增路由的path = romanus 相同前缀为 roman
			//步骤为：
			//1. 提取ed 新建一个child节点 把原来n的属性都复制过去
			//2. 把原来的n的path改为相同前缀：roman 为indices添加 子节点的第一个字符:e
			if i < len(n.path) {
				child := node{
					path:      n.path[i:],
					wildChild: n.wildChild,
					indices:   n.indices,
					children:  n.children,
					handlers:  n.handlers,
					priority:  n.priority - 1,
				}

				// Update maxParams (max of all children)
				for i := range child.children {
					if child.children[i].maxParams > child.maxParams {
						child.maxParams = child.children[i].maxParams
					}
				}

				n.children = []*node{&child}
				// []byte for proper unicode char conversion, see #65
				n.indices = string([]byte{n.path[i]})
				n.path = path[:i]
				n.handlers = nil
				n.wildChild = false
			}

			//原先的节点n现在已经分成2个节点了 结构为：
			//roman 父节点
			//	ed	子节点[0]
			//那么现在需要把传入的路由添加到这个父节点中
			//最终结构为
			//roman 父节点
			//	ed 子节点[0]
			//	us 子节点[1]
			// 其中还有一些情况需要自调用 相当于递归 举例说明：
			//roman
			//	ed
			//	uie
			//当判断父节点n 本来就有一个uie子节点 这时候uie和us 又有相同前缀u 这个时候需要把这个u再次提取出来作为父节点 所以需要递归调用walk
			//最终结果为 三层结构
			//roman
			//	ed
			//	u
			//	    ie
			//	    s
			//还有一种情况是如果是带有参数的路由 则也会再次调用walk
			if i < len(path) {
				path = path[i:]

				if n.wildChild {
					n = n.children[0]
					n.priority++

					// Update maxParams of the child node
					if numParams > n.maxParams {
						n.maxParams = numParams
					}
					numParams--

					// Check if the wildcard matches
					if len(path) >= len(n.path) && n.path == path[:len(n.path)] {
						// check for longer wildcard, e.g. :name and :names
						if len(n.path) >= len(path) || path[len(n.path)] == '/' {
							continue walk
						}
					}

					panic("path segment '" + path +
						"' conflicts with existing wildcard '" + n.path +
						"' in path '" + fullPath + "'")
				}

				c := path[0]

				// slash after param
				if n.nType == param && c == '/' && len(n.children) == 1 {
					n = n.children[0]
					n.priority++
					continue walk
				}

				// Check if a child with the next path byte exists
				for i := 0; i < len(n.indices); i++ {
					if c == n.indices[i] {
						i = n.incrementChildPrio(i)
						n = n.children[i]
						continue walk
					}
				}

				// Otherwise insert it
				if c != ':' && c != '*' {
					// []byte for proper unicode char conversion, see #65
					n.indices += string([]byte{c})
					child := &node{
						maxParams: numParams,
					}
					n.children = append(n.children, child)
					n.incrementChildPrio(len(n.indices) - 1)
					n = child
				}
				n.insertChild(numParams, path, fullPath, handlers)
				return

			} else if i == len(path) {
				if n.handlers != nil {
					panic("handlers are already registered for path '" + fullPath + "'")
				}
				n.handlers = handlers
			}
			return
		}
	} else { // 节点为空，直接添加直接添加路由
		n.insertChild(numParams, path, fullPath, handlers)
		n.nType = root
	}
}

//添加节点函数 主要处理包含参数节点
func (n *node) insertChild(numParams uint8, path string, fullPath string, handlers HandlersChain) {
	var offset int // already handled bytes of the path

	// 循环查找前缀为'：' 或者 '*'
	for i, max := 0, len(path); numParams > 0; i++ {
		c := path[i]
		if c != ':' && c != '*' {
			continue
		}

		// 判断在*参数之后不能再有*或者: 否则则报错 除非到了下一个/
		end := i + 1
		for end < max && path[end] != '/' {
			switch path[end] {
			// the wildcard name must not contain ':' and '*'
			case ':', '*':
				panic("only one wildcard per path segment is allowed, has: '" +
					path[i:] + "' in path '" + fullPath + "'")
			default:
				end++
			}
		}

		//检查这个节点是否存在子节点，如果我们在这里插入通配符，子节点将是不可访问的
		if len(n.children) > 0 {
			panic("wildcard route '" + path[i:end] +
				"' conflicts with existing children in path '" + fullPath + "'")
		}

		// check if the wildcard has a name
		if end-i < 2 {
			panic("wildcards must be named with a non-empty name in path '" + fullPath + "'")
		}

		// 参数类型 相当于注册路由时候带有:
		if c == ':' {
			// split path at the beginning of the wildcard
			if i > 0 {
				n.path = path[offset:i]
				offset = i
			}

			child := &node{
				nType:     param,
				maxParams: numParams,
			}
			n.children = []*node{child}
			n.wildChild = true
			n = child
			n.priority++
			numParams--

			if end < max {
				n.path = path[offset:end]
				offset = end

				child := &node{
					maxParams: numParams,
					priority:  1,
				}
				n.children = []*node{child}
				n = child
			}

		} else {
			//如果是通配符*
			if end != max || numParams > 1 {
				panic("catch-all routes are only allowed at the end of the path in path '" + fullPath + "'")
			}

			if len(n.path) > 0 && n.path[len(n.path)-1] == '/' {
				panic("catch-all conflicts with existing handle for the path segment root in path '" + fullPath + "'")
			}

			// currently fixed width 1 for '/'
			i--
			if path[i] != '/' {
				panic("no / before catch-all in path '" + fullPath + "'")
			}

			n.path = path[offset:i]

			// first node: catchAll node with empty path
			child := &node{
				wildChild: true,
				nType:     catchAll,
				maxParams: 1,
			}
			n.children = []*node{child}
			n.indices = string(path[i])
			n = child
			n.priority++

			// second node: node holding the variable
			child = &node{
				path:      path[i:],
				nType:     catchAll,
				maxParams: 1,
				handlers:  handlers,
				priority:  1,
			}
			n.children = []*node{child}

			return
		}
	}

	// 插入路由 如果不包含参数节点 offset为0
	n.path = path[offset:]
	n.handlers = handlers
}

最后我们要看下根据path获取router的方法getRouter。这个方法还是比较简单的，注释基本也能明白。

//根据path查找路由的方法
func (n *node) getValue(path string, po Params, unescape bool) (handlers HandlersChain, p Params, tsr bool) {
	p = po
walk:
	for {
		if len(path) > len(n.path) {
			if path[:len(n.path)] == n.path {
				path = path[len(n.path):]
				// 判断如果不是参数节点
				// 那path的第一个字符 循环对比indices中的每个字符查找到子节点
				if !n.wildChild {
					c := path[0]
					for i := 0; i < len(n.indices); i++ {
						if c == n.indices[i] {
							n = n.children[i]
							continue walk
						}
					}

					tsr = path == "/" && n.handlers != nil
					return
				}

				// handle wildcard child
				n = n.children[0]
				switch n.nType {
				case param:
					// 如果是普通':'节点, 那么找到/或者path end, 获得参数
					end := 0
					for end < len(path) && path[end] != '/' {
						end++
					}

					// save param value
					if cap(p) < int(n.maxParams) {
						p = make(Params, 0, n.maxParams)
					}
					i := len(p)
					p = p[:i+1] // expand slice within preallocated capacity
					p[i].Key = n.path[1:]
					val := path[:end]
					if unescape {
						var err error
						if p[i].Value, err = url.QueryUnescape(val); err != nil {
							p[i].Value = val // fallback, in case of error
						}
					} else {
						p[i].Value = val
					}

					// 如果参数还没处理完, 继续walk
					if end < len(path) {
						if len(n.children) > 0 {
							path = path[end:]
							n = n.children[0]
							continue walk
						}

						// ... but we can't
						tsr = len(path) == end+1
						return
					}
					// 否则获得handle返回就OK
					if handlers = n.handlers; handlers != nil {
						return
					}
					if len(n.children) == 1 {
						// No handle found. Check if a handle for this path + a
						// trailing slash exists for TSR recommendation
						n = n.children[0]
						tsr = n.path == "/" && n.handlers != nil
					}

					return

				case catchAll:
					// *匹配所有参数
					if cap(p) < int(n.maxParams) {
						p = make(Params, 0, n.maxParams)
					}
					i := len(p)
					p = p[:i+1] // expand slice within preallocated capacity
					p[i].Key = n.path[2:]
					if unescape {
						var err error
						if p[i].Value, err = url.QueryUnescape(path); err != nil {
							p[i].Value = path // fallback, in case of error
						}
					} else {
						p[i].Value = path
					}

					handlers = n.handlers
					return

				default:
					panic("invalid node type")
				}
			}
		} else if path == n.path {
			// We should have reached the node containing the handle.
			// Check if this node has a handle registered.
			if handlers = n.handlers; handlers != nil {
				return
			}

			if path == "/" && n.wildChild && n.nType != root {
				tsr = true
				return
			}

			// No handle found. Check if a handle for this path + a
			// trailing slash exists for trailing slash recommendation
			for i := 0; i < len(n.indices); i++ {
				if n.indices[i] == '/' {
					n = n.children[i]
					tsr = (len(n.path) == 1 && n.handlers != nil) ||
						(n.nType == catchAll && n.children[0].handlers != nil)
					return
				}
			}

			return
		}

		// Nothing found. We can recommend to redirect to the same URL with an
		// extra trailing slash if a leaf exists for that path
		tsr = (path == "/") ||
			(len(n.path) == len(path)+1 && n.path[len(path)] == '/' &&
				path == n.path[:len(n.path)-1] && n.handlers != nil)
		return
	}
}

总结

Gin的路由是它的特色，其实就是因为他的存储结构。基数树的存储结构可以很快的查询到对应路由并且执行到handler。避免了每次请求循环所有路由的逻辑，提升了Gin整体的性能。试想如果一个大型项目中GET路由有100个，如果每次请求都去循环100次查找性能会很差，如果使用基数树的存储方式可能只需要经过几次的查询。

Gin路由代码很长，其中大部分是处理带有参数的节点的逻辑。下一次的学习中，还是老规矩，自己模仿着写一个基数树存储结构的路由查找逻辑。去除掉那些参数逻辑只留下主要核心逻辑。

基数树

基数树在Gin中的应用

源码查看

总结

Recommend

OCaPIC: PIC microcontrollers programmed in OCaml

使用gvm管理多golang版本，但我想使用统一的GOTPAH

Thoughts on why CRDT didn't work out as well for collaborative editing x...

TypeScript - 一种思维方式

GitHub - swiftwasm/swift: WebAssembly backend for the Swift Programming Language

请问大家这个等式怎么证明? - 知乎

美国空军先进到什么程度？ - 知乎

为什么圭贤不回归RS了？ - 知乎

中国住房总量估算：我们的房子过剩了吗？

一场接力

About Joyk