Go execution tracer. (import "runtime/trace") The tracer captures a wide range of execution events like goroutine creation/blocking/unblocking, syscall enter/exit/block, GC-related events, changes of heap size, processor start/stop, etc and writes them to an io.Writer in a compact form. A precise nanosecond-precision timestamp and a stack trace is captured for most events. A trace can be analyzed later with 'go tool trace' command.
GODEBUG (一个跨平台的Go程序调试工具)、 gctrace 和 schedtrace
BPF 追踪以做很多事,但都有自己的优点和缺点,接下来将详细说明。首先我从一个简单的 Go 程序开始( hello.go)
$ file hellohello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=4dc45f1eb023f44ddb32c15bbe0fb4f933e61815, not stripped
$ ls -lh hello-rwxr-xr-x 1 bgregg root 29K Jan 1221:18 hello$ ldd hello linux-vdso.so.1 => (0x00007ffc4cb1a000) libgo.so.9 =>/usr/lib/x86_64-linux-gnu/libgo.so.9 (0x00007f25f2407000) libgcc_s.so.1 =>/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f25f21f1000) libc.so.6 =>/lib/x86_64-linux-gnu/libc.so.6 (0x00007f25f1e27000)/lib64/ld-linux-x86-64.so.2 (0x0000560b44960000) libpthread.so.0 =>/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f25f1c0a000) libm.so.6 =>/lib/x86_64-linux-gnu/libm.so.6 (0x00007f25f1901000)$ objdump -tT /usr/lib/x86_64-linux-gnu/libgo.so.9| grep fmt.Println0000000001221070 g O .data.rel.ro 0000000000000008 fmt.Println$descriptor0000000000978090 g F .text 0000000000000075 fmt.Println0000000001221070 g DO .data.rel.ro 0000000000000008 Base fmt.Println$descriptor0000000000978090 g DF .text 0000000000000075 Base fmt.Println
这些内容看起来非常像一个编译过的 C 语言二进制程序,因此可以使用包括 bcc/BPF在内的许多现有的调试工具和追踪器观测。相对于测量即时编译的运行时要简单得多(例如 Java 和 Node.js)。到目前为止,这个例子唯一的麻烦事函数名称中可能包含非标准的字符,例如“.”。
funccount also has options like -p to match a PID, and -i to emit output every interval. It currently can only handle up to 1000 probes at a time, so "fmt.*" was ok, but matching everything in libgo:
\#trace'/home/bgregg/functions:main.add'PIDTIDCOMMFUNC1442414424functionsmain.add#... and with both its arguments:\#trace'/home/bgregg/functions:main.add "%d %d" arg1, arg2'PIDTIDCOMMFUNC-1439014390functionsmain.add4213
3.2 Go gc Function Tracing
同样的程序,如果使用 go build 就没有 main.add() ? 禁用代码嵌入( Disabling inlining)即可。
$ go build functions.go# trace '/home/bgregg/functions:main.add "%d %d" arg1, arg2'could not determine address of symbol main.add$ objdump -t functions | grep main.add$
$ go build -gcflags '-l' functions.go$ objdump -t functions | grep main.add0000000000401000 g F .text 0000000000000020 main.add# trace '/home/bgregg/functions:main.add "%d %d" arg1, arg2'PID TID COMM FUNC -1606116061 functions main.add 53691250416
启动信息中包含一个关于 runtime-gdb.py 的警告,它非常有用:如果需要进一步深入挖掘 Go 上下文,我希望能够修复并找出告警原因。即使没有该信息,gdb 依然可以输出参数变量的值是 "x=42, y=13"。我也将它们从寄存器导出与 x86_64 ABI(Application Binary Interface,应用程序二进制接口)对比,which is how bcc's trace reads them. From the syscall(2) man page:
The reason is that Go's gc compiler is not following the standard AMD64 ABI function calling convention, which causes problems with this and other debuggers.
42 和 13 没有出现在 rdi , rsi 或者其它任何一个寄存器。原因是 Go 的 gc 编译器不遵循标准的 AMD64 ABI 函数调用约定,其它调试器也会存在这个问题。这很烦人。我猜 Go 语言的返回值使用的是另外一种 ABI,因为它可以返回多个值,所以即使入口参数是标准的,我们仍然会遇到差异。我浏览了指南(Quick Guide to Go's Assembler and the Plan 9 assembly manual),看起来函数在堆栈上传递。这些是我们的 42 和 13:
$ gdb ./hello[...](gdb) b fmt.PrintlnBreakpoint 1 at 0x401c50(gdb) rStarting program: /home/bgregg/hello[Thread debugging using libthread_db enabled]Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".[New Thread 0x7ffff449c700 (LWP 16836)][New Thread 0x7ffff3098700 (LWP 16837)][Switching to Thread 0x7ffff3098700 (LWP 16837)]Thread 3"hello" hit Breakpoint 1, fmt.Println (a=...) at ../../../src/libgo/go/fmt/print.go:263263 ../../../src/libgo/go/fmt/print.go: No such file or directory.(gdb) p a$1= {__values =0xc208000240, __count =1, __capacity =1}(gdb) p a.__values$18= (struct {...} *) 0xc208000240(gdb) p a.__values[0]$20= {__type_descriptor =0x4037c0<__go_tdn_string>, __object =0xc208000210}(gdb) x/s *0xc2080002100x403483: "Hello, BPF!"
五、Function Latency
示例:循环调用 fmt.Println() 函数的时延直方图(纳秒)
WARNING: Go 函数调用过程中,如果从一个进程(goroutine)切换到另外一个系统进程,funclatency 无法匹配入口-返回。这种场景需要一个新的工具 —— gofunclatency ,它基于 Go 内建的 GOID 替代系统的 TID 追踪时延,在某些情况下, uretprobes 修改 Go 程序可能出现崩溃的问题,因此在调试之前需要准备周全的计划。
I took a quick look at Golang with dynamic tracing and Linux enhanced BPF, via bcc's funccount and trace tools, with some successes and some challenges. Counting function calls works already. Tracing function arguments when compiled with gccgo also works, whereas Go's gc compiler doesn't follow the standard ABI calling convention, so the tools need to be updated to support this. As a proof of concept I modified the bcc trace tool to show it could be done, but that feature needs to be coded properly and integrated. Processing interface objects will also be a challenge, and multi-return values, again, areas where we can improve the tools to make this easier, as well as add macros to C for writing other custom Go observability tools as well.
七、Tips
6.1 安装和编译 BCC
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake -DCMAKE_INSTALL_PREFIX=/usr \
-DLUAJIT_INCLUDE_DIR=`pkg-config --variable=includedir luajit` \ # for lua support
..
make
sudo make install
cmake -DPYTHON_CMD=python3 .. # build python3 binding
pushd src/python/
make
sudo make install
popd
LLVM (Low Level Virtual Machine)是一种编译器基础设施,以C++写成。起源于2000年伊利诺伊大学厄巴纳-香槟分校维克拉姆·艾夫(Vikram Adve)与克里斯·拉特纳(Chris Lattner)的研究,他们想要为所有静态及动态语言创造出动态的编译技术。2005年,Apple直接雇用了克里斯·拉特纳及他的团队,为了苹果电脑开发应用程序,期间克里斯·拉特纳设计发明了 Swift 语言,LLVM 成为 Mac OS X 及 iOS 开发工具的一部分。LLVM 的范围早已经不局限于创建一个虚拟机,成为了众多编译工具及低级工具技术的统称,适用于LLVM下的所有项目,包含LLVM中介码(LLVM IR)、LLVM除错工具、LLVM C++标准库等。