3

Some notes on internal working of profiler

 1 year ago
source link: https://inspektor.cloud/blog/how-profiler-works/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
handwritten banner for blog

Many of us use profiler to measure the CPU or memory consumed by the piece of code. This led me to figure out how profilers work.

To learn about profiling, I groked a popular profiling crate pprof-rs. This library is used to measure the CPU usage of a rust program.

If you are also interested in contributing to open source code or want to learn how to read complex project source code. I would highly recommend Contributing to Complex Projects by Mitchell Hashimoto.

Basics of profiling

let's just profile a sample rust program and see how pprof used.

Here is the modifled example program that I've taken from pprof-rs. You find the full source here.

The sample program calculates the number of prime numbers from 1 to 50000.

fn main() {
let prime_numbers = prepare_prime_numbers();
// start profiling
let guard = pprof::ProfilerGuardBuilder::default()
.frequency(100)
.build()
.unwrap();
let mut v = 0;
for i in 1..50000 {
// use `is_prime_number1` function only if the incoming value
// i is divisable by 3.
if i % 3 == 0 {
if is_prime_number1(i, &prime_numbers) {
v += 1;
}
}
else {
if is_prime_number2(i, &prime_numbers) {
v += 1;
}
}
}
println!("Prime numbers: {}", v);
// stop profiling and generate the profiled report.
if let Ok(report) = guard.report().build() {
let mut file = File::create("profile.pb").unwrap();
let profile = report.pprof().unwrap();

let mut content = Vec::new();
profile.write_to_vec(&mut content).unwrap();
file.write_all(&content).unwrap();
};
}

In the above example, We started profiling at the beginning of the program using ProfilerGuardBuilder

    let guard = pprof::ProfilerGuardBuilder::default()
.frequency(100)
.build()
.unwrap();

At the end of the program, we generated and wrote the report to profile.pb file.

if let Ok(report) = guard.report().build() {
let mut file = File::create("profile.pb").unwrap();
let profile = report.pprof().unwrap();
let mut content = Vec::new();
profile.write_to_vec(&mut content).unwrap();
file.write_all(&content).unwrap();
};

The report is generated by running the program and it's visualized using google's pprof

 ~/go/bin/pprof --http=localhost:8080  profile.pb

After executing the above command, pprof will let you to visualize the profile at http://localhost:8080

cpu profile of rust program

From the visualized profile, you can clearly see that is_prime_number2 have consumed more cpu than is_prime_number1. That's because is_prime_number1 is used only the given number is divisible by 3.

Now, that we learned how to profile rust program using pprof-rs. Let's learn how pprof-rs works internally.

Please don't get too worn out yet! So far, we've learned the basics of profiler and how to use pprof-rs. Before we begin internal working of profiler, let's take a sip of water to rehydrate ourselves.

drink_water_to_stay_hyderated.webp

image source

Gist of cpu profilers

Before we get into pprof-rs code, let's learn cpu profiling in theory.

Profiler pause the program in certain interval of time and resumes after sampling the current stack trace. While sampling, it takes each stack frame and increments its count. Then the sampled data is then used to create a flamegraph or something similar.

stack traces: stack traces are the list of call stack of function calls. for eg: is_prime_number_1 -> main

Gist of profiler notes

pprof-rs implementation and its syscalls

start profiling

When you start the profiling with ProfilerGuardBuilder, pprof-rs will register signal handler and timer to specify how often programs is supposed to pause.

 let guard = pprof::ProfilerGuardBuilder::default()
.frequency(100)
.build()
.unwrap();

registering signal handler

perf_signal_handler callback function is registered for SIGPROF signal. whenever SIGPROF signal emitted, perf_signal_handler is invoked.

SIGPROF: This signal typically indicates expiration of a timer that measures both CPU time used by the current process, and CPU time expended on behalf of the process by the system. Such a timer is used to implement code profiling facilities, hence the name of this signal.

Link to the source

let handler = signal::SigHandler::SigAction(perf_signal_handler);
let sigaction = signal::SigAction::new(
handler,
signal::SaFlags::SA_SIGINFO,
signal::SigSet::empty(),
);
unsafe { signal::sigaction(signal::SIGPROF, &sigaction) }?;

specifying interval

Interval time for the every SIGPROF signal is configured using setitimer syscall. This is useful to determine how often the sample needs to be taken.

Link to the source

unsafe {
setitimer(
ITIMER_PROF,
&mut Itimerval {
it_interval,
it_value,
},
null_mut(),
)
};

handling SIGPROF signal

Since we registered perf_signal_handler function to handle SIGPROF signals, it is invoked whenever a SIGPROF signal is emitted. perf_signal_handler takes ucontext as one of the arguments. ucontext contains the current instruction pointer of machine code that is being executed.

Using that instruction pointer, current call stack trace is retrivied. That is done using backtrace crate. The collected backtrace and thread name is passed to the profiler.sample for sampling.

Link to the source

extern "C" fn perf_signal_handler(
_signal: c_int,
_siginfo: *mut libc::siginfo_t,
ucontext: *mut libc::c_void,
) {
if let Some(mut guard) = PROFILER.try_write() {
if let Ok(profiler) = guard.as_mut() {
let mut bt: SmallVec<[<TraceImpl as Trace>::Frame; MAX_DEPTH]> =
SmallVec::with_capacity(MAX_DEPTH);
// ucontext is passed to trace method to retrive
// stack frame of current instruction pointer.
TraceImpl::trace(ucontext, |frame| {
bt.push(frame.clone());
});
let current_thread = unsafe { libc::pthread_self() };
let mut name = [0; MAX_THREAD_NAME];
let name_ptr = &mut name as *mut [libc::c_char] as *mut libc::c_char;
write_thread_name(current_thread, &mut name);
let name = unsafe { std::ffi::CStr::from_ptr(name_ptr) };
profiler.sample(bt, name.to_bytes(), current_thread as u64);
}
}
}

sampling

profiler.sample interally calls a hashmap to insert stack frame and its count. As a side note, this is a custom implementation of the hashmap, rather than the rust's built-in hashmap. That's because heap allocation is forbidden inside signal handler so hashmap can't grow dynamically.

Link to the source

pub fn add(&mut self, key: T, count: isize) -> Option<Entry<T>> {
let mut done = false;
self.entries[0..self.length].iter_mut().for_each(|ele| {
if ele.item == key {
ele.count += count;
done = true;
}
});
...
}

A stack frame with least count is evicted from the hashmap if the incoming stack frame can't find a place in it, and a temporary file is created to store the evicted stack frame.

Link to the source

let mut min_index = 0;
let mut min_count = self.entries[0].count;
for index in 0..self.length {
let count = self.entries[index].count;
if count < min_count {
min_index = index;
min_count = count;
}

let mut new_entry = Entry { item: key, count };
std::mem::swap(&mut self.entries[min_index], &mut new_entry);
Some(new_entry)

ploting

The collected stack frame and its count is passed to the flamegraph crate to create a flamegraph.

Link to the source

pub fn flamegraph_with_options<W>(
&self,
writer: W,
options: &mut flamegraph::Options,
) -> Result<()>
where
W: Write,
{
let lines: Vec<String> = self
.data
.iter()
.map(|(key, value)| {
let mut line = key.thread_name_or_id();
line.push(';')
for frame in key.frames.iter().rev() {
for symbol in frame.iter().rev() {
line.push_str(&format!("{}", symbol));
line.push(';');
}

line.pop().unwrap_or_default();
line.push_str(&format!(" {}", value))
line
})
.collect();
if !lines.is_empty() {
flamegraph::from_lines(options, lines.iter().map(|s| &**s), writer).unwrap();

Ok(())
}

flamegraph crate will generate differential flamegraph from folded stack line. Example

main;prime_number1 3
main;prime_number2 5
demo flamegraph

pprof-rs also encodes the sampled data in google's pprof format which let you plot interactive graphs.

In recent days I've been excited about continuous profiling and it has been a hot thing in the observability space. Follow these amazing open-source projects parca and pyroscope to know more about continuous profiling.

Reference

  • pprof-rs
  • You can learn more about diffrential flamegraph from Brendan Gregg's Blog

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK