
An analysis of a major performance drop in a Go HTTP framework

Source: https://studygolang.com/articles/15602

I have recently been building a web framework. While adopting it, one of the business teams told us that their load-test QPS would not go up. That puzzled me: with httprouter plus net/http's http.Server, performance cannot possibly be that bad; benchmarks published online all show 100k+ QPS. Could a few middlewares really drag performance down that much? After some digging, I found something interesting.

First, I load-tested a plain hello world, without writing a single log line per request, and turned on pprof. The profile (shown as a screenshot in the original post) looked like this:

Why is futex so high here? Seeing addtimer and deltimer among the operations above it reminded me of a timer I had implemented myself; this was most likely caused by timeouts. I checked the Go version: go1.9. The framework sets four timeouts on every connection by default, ReadTimeout, WriteTimeout, IdleTimeout and ReadHeaderTimeout, which puts heavy lock pressure on the timer code whenever those callbacks are added and removed.
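For reference, here is a minimal sketch of how such a profile can be collected; the original post only shows the resulting profile, not how it was gathered, so the side port 6060 and the pprof command below are my own placeholders, not taken from the article.

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // In the real service this goroutine would sit next to the framework's own
    // listener; here it is the whole program for brevity.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    select {} // block forever
}

// While the load test runs against the service, collect a 30-second CPU profile:
//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// Samples attributed to futex, addtimer and deltimer are what the rest of this
// article is about.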

Let's walk through what actually happens when a timeout is set. Below is the simplest possible example: for safety, every connection gets timeouts.

package main

import (
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/julienschmidt/httprouter"
)

func Index(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {
    fmt.Fprint(w, "Welcome!\n")
}

func Hello(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
    fmt.Fprintf(w, "hello, %s!\n", ps.ByName("name"))
}

func main() {
    router := httprouter.New()
    router.GET("/", Index)
    router.GET("/hello/:name", Hello)

    srv := &http.Server{
        ReadTimeout:       5 * time.Second,
        WriteTimeout:      10 * time.Second,
        ReadHeaderTimeout: 10 * time.Second,
        IdleTimeout:       10 * time.Second,
        Addr:              "0.0.0.0:8998",
        Handler:           router,
    }

    log.Fatal(srv.ListenAndServe())
}

Here, after ListenAndServe() accepts each connection it runs (*conn).serve() for it, which, depending on whether timeouts are configured, calls conn.SetReadDeadline and related functions. The corresponding code is in net/http/server.go:

// Serve a new connection.
func (c *conn) serve(ctx context.Context) {
    ...
    if tlsConn, ok := c.rwc.(*tls.Conn); ok {
        if d := c.server.ReadTimeout; d != 0 {
            c.rwc.SetReadDeadline(time.Now().Add(d)) // set the read deadline
        }
        if d := c.server.WriteTimeout; d != 0 {
            c.rwc.SetWriteDeadline(time.Now().Add(d)) // set the write deadline
        }
        if err := tlsConn.Handshake(); err != nil {
            c.server.logf("http: TLS handshake error from %s: %v", c.rwc.RemoteAddr(), err)
            return
        }
        c.tlsState = new(tls.ConnectionState)
        *c.tlsState = tlsConn.ConnectionState()
        if proto := c.tlsState.NegotiatedProtocol; validNPN(proto) {
            if fn := c.server.TLSNextProto[proto]; fn != nil {
                h := initNPNRequest{tlsConn, serverHandler{c.server}}
                fn(c.server, tlsConn, h)
            }
            return
        }
    }
    ...

After that, conn.SetReadDeadline ends up calling fd.setReadDeadline in internal/poll/fd_poll_runtime.go, which finally calls poll_runtime_pollSetDeadline in runtime/netpoll.go; via go:linkname, that function is exposed as internal/poll.runtime_pollSetDeadline. This function is the key piece:

//go:linkname poll_runtime_pollSetDeadline internal/poll.runtime_pollSetDeadline
func poll_runtime_pollSetDeadline(pd *pollDesc, d int64, mode int) {
    lock(&pd.lock)
    if pd.closing {
        unlock(&pd.lock)
        return
    }
    pd.seq++ // invalidate current timers
    // Reset current timers.
    if pd.rt.f != nil {
        deltimer(&pd.rt)
        pd.rt.f = nil
    }
    if pd.wt.f != nil {
        deltimer(&pd.wt)
        pd.wt.f = nil
    }
    // Setup new timers.
    if d != 0 && d <= nanotime() {
        d = -1
    }
    if mode == 'r' || mode == 'r'+'w' {
        pd.rd = d
    }
    if mode == 'w' || mode == 'r'+'w' {
        pd.wd = d
    }
    if pd.rd > 0 && pd.rd == pd.wd {
        pd.rt.f = netpollDeadline
        pd.rt.when = pd.rd
        // Copy current seq into the timer arg.
        // Timer func will check the seq against current descriptor seq,
        // if they differ the descriptor was reused or timers were reset.
        pd.rt.arg = pd
        pd.rt.seq = pd.seq
        addtimer(&pd.rt)
    } else {
        if pd.rd > 0 {
            pd.rt.f = netpollReadDeadline // set the read deadline callback
            pd.rt.when = pd.rd
            pd.rt.arg = pd
            pd.rt.seq = pd.seq
            addtimer(&pd.rt) // add to the runtime's timers
        }
        if pd.wd > 0 {
            pd.wt.f = netpollWriteDeadline // set the write deadline callback
            pd.wt.when = pd.wd
            pd.wt.arg = pd
            pd.wt.seq = pd.seq
            addtimer(&pd.wt) // add to the runtime's timers
        }
    }
    // If we set the new deadline in the past, unblock currently pending IO if any.
    var rg, wg *g
    atomicstorep(unsafe.Pointer(&wg), nil) // full memory barrier between stores to rd/wd and load of rg/wg in netpollunblock
    if pd.rd < 0 {
        rg = netpollunblock(pd, 'r', false)
    }
    if pd.wd < 0 {
        wg = netpollunblock(pd, 'w', false)
    }
    unlock(&pd.lock)
    if rg != nil {
        netpollgoready(rg, 3)
    }
    if wg != nil {
        netpollgoready(wg, 3)
    }
}

The main work here is to clear the previous timers and then install new ones, with the callback set to netpollReadDeadline or netpollWriteDeadline. As you can see, the add and remove operations are addtimer(&pd.rt) and deltimer(&pd.rt).
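To connect this back to user code, here is an illustrative sketch (the helper name resetDeadlines is made up, not from the original): every Set*Deadline call on a net.Conn funnels into poll_runtime_pollSetDeadline above, i.e. a deltimer plus an addtimer. With ReadTimeout, WriteTimeout, IdleTimeout and ReadHeaderTimeout all configured, net/http issues several such calls per request.

package conndemo

import (
    "net"
    "time"
)

// resetDeadlines is a hypothetical helper: each Set*Deadline call below ends up
// in poll_runtime_pollSetDeadline, doing a deltimer plus an addtimer, which on
// go1.9 both take the single global timers lock.
func resetDeadlines(c net.Conn, read, write time.Duration) error {
    if err := c.SetReadDeadline(time.Now().Add(read)); err != nil {
        return err
    }
    return c.SetWriteDeadline(time.Now().Add(write))
}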

Now for the core of the problem: why does everything slow down once timeouts are added? Look at the implementation of addtimer. The timers live in a four-ary min-heap, and every insertion has to lock a single global timers structure. At high QPS, with several of these lock acquisitions per request, how fast can it possibly be?

type timer struct {
    i int // heap index

    // Timer wakes up at when, and then at when+period, ... (period > 0 only)
    // each time calling f(arg, now) in the timer goroutine, so f must be
    // a well-behaved function and not block.
    when   int64
    period int64
    f      func(interface{}, uintptr)
    arg    interface{}
    seq    uintptr
}

var timers struct {
    lock         mutex
    gp           *g
    created      bool
    sleeping     bool
    rescheduling bool
    sleepUntil   int64
    waitnote     note
    t            []*timer
}

// add a timer
func addtimer(t *timer) {
    lock(&timers.lock)
    addtimerLocked(t)
    unlock(&timers.lock)
}
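A rough way to see this contention from user space (this benchmark is my own illustration, not from the original post): creating and stopping timers from many goroutines exercises the same addtimer/deltimer path that a per-connection deadline does, and on go1.9 every iteration serializes on timers.lock.

package timerdemo

import (
    "testing"
    "time"
)

// BenchmarkTimerChurn creates and immediately stops a timer per iteration,
// which is roughly what setting and resetting a connection deadline does
// (addtimer on set, deltimer on reset). Run with -cpu to vary parallelism.
func BenchmarkTimerChurn(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            t := time.NewTimer(time.Hour) // addtimer under the hood
            t.Stop()                      // deltimer under the hood
        }
    })
}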

How should the lock contention be solved? Sharding the lock is a very common approach. Starting with go1.10, the single timers structure became 64 of them, so timers are spread over 64 locks and contention naturally drops. Looking at go1.10's runtime/time.go, the definitions are as follows: each P works with its own timer bucket, and one bucket may be shared by several Ps:

// Package time knows the layout of this structure.
// If this struct changes, adjust ../time/sleep.go:/runtimeTimer.
// For GOOS=nacl, package syscall knows the layout of this structure.
// If this struct changes, adjust ../syscall/net_nacl.go:/runtimeTimer.
type timer struct {
    tb *timersBucket // the bucket the timer lives in
    i  int           // heap index

    // Timer wakes up at when, and then at when+period, ... (period > 0 only)
    // each time calling f(arg, now) in the timer goroutine, so f must be
    // a well-behaved function and not block.
    when   int64
    period int64
    f      func(interface{}, uintptr)
    arg    interface{}
    seq    uintptr
}

// timersLen is the length of timers array.
//
// Ideally, this would be set to GOMAXPROCS, but that would require
// dynamic reallocation
//
// The current value is a compromise between memory usage and performance
// that should cover the majority of GOMAXPROCS values used in the wild.
const timersLen = 64 // 64 buckets

// timers contains "per-P" timer heaps.
//
// Timers are queued into timersBucket associated with the current P,
// so each P may work with its own timers independently of other P instances.
//
// Each timersBucket may be associated with multiple P
// if GOMAXPROCS > timersLen.
var timers [timersLen]struct {
    timersBucket

    // The padding should eliminate false sharing
    // between timersBucket values.
    pad [sys.CacheLineSize - unsafe.Sizeof(timersBucket{})%sys.CacheLineSize]byte
}
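The part that actually removes the contention is how a timer picks its bucket. Paraphrasing go1.10's runtime/time.go (slightly simplified, so treat this as a sketch rather than the exact source), addtimer now locks only the bucket belonging to the current P instead of one global structure:

// Paraphrased from go1.10 runtime/time.go (simplified).
func (t *timer) assignBucket() *timersBucket {
    id := uint8(getg().m.p.ptr().id) % timersLen // bucket of the current P
    t.tb = &timers[id].timersBucket
    return t.tb
}

func addtimer(t *timer) {
    tb := t.assignBucket()
    lock(&tb.lock) // only this bucket's lock, not a global timers lock
    tb.addtimerLocked(t)
    unlock(&tb.lock)
}

Since different Ps usually land on different buckets, concurrent addtimer/deltimer calls rarely fight over the same lock.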

The original post also includes a diagram of the go1.10 timer data structure (image taken from the web, not reproduced here).

To sum up: many httpserver framework benchmarks online report very high QPS, but their demos do not set any timeouts, so real-world numbers can differ a lot. If you need timeouts in production, pay attention to the Go version; at high QPS it is best to use 1.10 or later. In our case, without changing anything else, simply upgrading Go to 1.10 nearly doubled the QPS.
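For comparison, the kind of demo most published benchmarks use looks roughly like the sketch below: no Read/Write/Idle/ReadHeader timeouts at all, so no deadline timers are ever created. This is a minimal illustration, not taken from any specific benchmark.

package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, "Welcome!\n")
    })
    // No timeouts configured on the server, so SetReadDeadline/SetWriteDeadline
    // are never called and addtimer/deltimer never run on the hot path.
    log.Fatal(http.ListenAndServe("0.0.0.0:8998", nil))
}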
