Cookbook¶
When in doubt, use Hunter.
Walkthrough¶
Sometimes you just want to get an overview of an unfamiliar application code, eg: only see calls/returns/exceptions.
In this situation, you could use something like
~Q(kind="line"),~Q(module_in=["six","pkg_resources"]),~Q(filename=""),stdlib=False
. Lets break that down:
~Q(kind="line")
means skip line events (~
is a negation of the filter).stdlib=False
means we don’t want to see anything from stdlib.~Q(module_in=["six","pkg_resources")]
means we’re tired of seeing stuff from those modules in site-packages.~Q(filename="")
is necessary for filtering out events that come from code without a source (like the interpreter bootstrap stuff).
You would run the application (in Bash) like:
PYTHONHUNTER='~Q(kind="line"),~Q(module_in=["six","pkg_resources"]),~Q(filename=""),stdlib=False' myapp (or python myapp.py)
Additionally you can also add a depth filter (eg: depth_lt=10
) to avoid too deep output.
Packaging¶
I frequently use Hunter to figure out how distutils/setuptools work. It’s very hard to figure out what’s going on by just looking at the code - lots of stuff happens at runtime. If you ever tried to write a custom command you know what I mean.
To show everything that is being run:
PYTHONHUNTER='module_startswith=["setuptools", "distutils", "wheel"]' python setup.py bdist_wheel
If you want too see some interesting variables:
PYTHONHUNTER='module_startswith=["setuptools", "distutils", "wheel"], actions=[CodePrinter, VarsPrinter("self.bdist_dir")]' python setup.py bdist_wheel
Typical¶
Normally you’d only want to look at your code. For that purpose, there’s the stdlib
option. Set it to False
.
Building a bit on the previous example, if I have a build
Distutils command and I only want to see my code then I’d run
this:
PYTHONHUNTER='stdlib=False' python setup.py build
But this also means I’d be seeing anything from site-packages
. I could filter on only the events from the current
directory (assuming the filename is going to be a relative path):
PYTHONHUNTER='~Q(filename_startswith="/")' python setup.py build
Needle in the haystack¶
If the needle might be though the stdlib then you got not choice. But some of the hay is very verbose and useless, like
stuff from the re
module.
Note that there are few “hidden” modules like sre
, sre_parse
, sre_compile
etc. You can filter that out with:
~Q(module_regex="(re|sre.*)$")
Although filtering out that regex stuff can cut down lots of useless output you usually still get lots of output.
Another way, if you got at least some vague idea of what might be going on is to “grep” for sourcecode. Example, to show all
the code that does something with a build_dir
property:
source_contains=".build_dir"
You could even extend that a bit to dump some variables:
source_contains=".build_dir", actions=[CodePrinter, VarsPrinter("self.build_dir")]
Stop after N calls¶
Say you want to stop tracing after 1000 events, you’d do this:
~Q(calls_gt=1000, action=Stop)
Explanation:
Q(calls_gt=1000, action=Stop)
will translate toWhen(Query(calls_gt=1000), Stop)
Q(calls_gt=1000)
will returnTrue
when 1000 call count is hit.
When(something, Stop)
will callStop
whensomething
returnsTrue
. However it will also return the result ofsomething
- the net effect being nothing being shown up to 1000 calls. Clearly not what we want …So then we invert the result,
~When(...)
is the same asNot(When)
.This may not seem intuitive but for now it makes internals simpler. If
When
would always returnTrue
thenOr(When, When)
would never run the secondWhen
and we’d need to have all sorts of checks for this. This may change in the future however.
“Probe” - lightweight tracing¶
Based on Robert Brewer’s FunctionProbe example.
The use-case is that you’d like to trace a huge application and running a tracer (even a cython one) would have a too great impact. To solve this you’d start the tracer only in placer where it’s actually needed.
To make this work you’d monkeypatch the function that needs the tracing. This example uses aspectlib:
def probe(qualname, *actions, **filters):
def tracing_decorator(func):
@functools.wraps(func)
def tracing_wrapper(*args, **kwargs):
# create the Tracer manually to avoid spending time in likely useless things like:
# - loading PYTHONHUNTERCONFIG
# - setting up the clear_env_var or thread_support options
# - atexit cleanup registration
with hunter.Tracer().trace(hunter.When(hunter.Query(**filters), *actions)):
return func(*args, **kwargs)
return tracing_wrapper
aspectlib.weave(qualname, tracing_decorator) # this does the monkeypatch
Suggested use:
to get the regular tracing for that function:
probe('module.func', hunter.VarsPrinter('var1', 'var2'))
to log some variables at the end of the target function, and nothing deeper:
probe('module.func', hunter.VarsPrinter('var1', 'var2'), kind="return", depth=0)
Another interesting thing is that you may note that you can reduce the implementation of the probe
function down to
just:
def probe(qualname, *actions, **kwargs):
aspectlib.weave(qualname, functools.partial(hunter.wrap, actions=actions, **kwargs))
It will work the same, hunter.wrap
being a decorator. However, while hunter.wrap
offers the convenience
of tracing just inside the target function (eg: probe('module.func', local=True)
) it will also add a lot of extra
filtering to trim irrelevant events from around the function (like return from tracer setup, and the internals of the
decorator), in addition to what hunter.trace()
does. Not exactly lightweight…
Silenced exception runtime analysis¶
Finding code that discards exceptions is sometimes really hard.
Note
This was made available in hunter.actions.ErrorSnooper
for convenience. This cookbook entry will remain for educational
purposes.
While this is easy to find with a grep "except:" -R .
:
def silenced_easy():
try:
error()
except:
pass
Variants of this ain’t easy to grep:
def silenced_easy():
try:
error()
except Exception:
pass
If you can’t simply review all the sourcecode then runtime analysis is one way to tackle this:
class DumpExceptions(hunter.CodePrinter):
events = ()
depth = 0
count = 0
exc = None
def __init__(self, max_count=10, **kwargs):
self.max_count = max_count
self.backlog = collections.deque(maxlen=5)
super(DumpExceptions, self).__init__(**kwargs)
def __call__(self, event):
self.count += 1
if event.kind == 'exception': # something interesting happened ;)
self.events = list(self.backlog)
self.events.append(event.detach(self.try_repr))
self.exc = self.try_repr(event.arg[1])
self.depth = event.depth
self.count = 0
elif self.events:
if event.depth > self.depth: # too many details
return
elif event.depth < self.depth and event.kind == 'return': # stop if function returned
op = event.instruction
op = op if isinstance(op, int) else ord(op)
if op == RETURN_VALUE:
self.output("{BRIGHT}{fore(BLUE)}{} tracing {} on {}{RESET}\n",
">" * 46, event.function, self.exc)
for event in self.events:
super(DumpExceptions, self).__call__(event)
if self.count > 10:
self.output("{BRIGHT}{fore(BLACK)}{} too many lines{RESET}\n",
"-" * 46)
else:
self.output("{BRIGHT}{fore(BLACK)}{} function exit{RESET}\n",
"-" * 46)
self.events = []
self.exc = None
elif self.count < self.max_count:
self.events.append(event.detach(self.try_repr))
else:
self.backlog.append(event.detach(self.try_repr))
Profiling¶
Hunter can be used to implement profiling (measure function timings).
The most basic implementation that only measures timings looks like this:
from hunter.actions import Action
from hunter.actions import RETURN_VALUE
class ProfileAction(Action):
def __init__(self):
self.timings = {}
def __call__(self, event):
if event.kind == 'call':
self.timings[id(event.frame)] = time()
elif event.kind == 'return':
start_time = self.timings.pop(id(event.frame), None)
if start_time is None:
return
delta = time() - start_time
print(f'{event.function} returned: {event.arg}. Duration: {delta:.4f}s\n')
If you don’t care about exceptions at all this will be fine, but then you might just as well use a real profiler.
When exceptions occur Python send this to the tracer:
event.kind="exception", event.arg=(exc_value, exc_type, tb)
event.kind="return", event.arg=None
Unfortunately Python emits the return event even if the exception wasn’t discarded so we need to do some extra checks around the last bytecode instruction that run at the return event.
This means that we have to store the exception for a little while, and do the check at return:
from hunter.actions import Action
from hunter.actions import RETURN_VALUE
class ProfileAction(Action):
def __init__(self):
self.timings = {}
def __call__(self, event):
current_time = time()
frame_id = id(event.frame)
if event.kind == 'call':
self.timings[frame_id] = current_time, None
elif frame_id in self.timings:
start_time, depth, exception = self.timings.pop(frame_id)
if event.kind == 'exception':
# store the exception
# (there will be a followup 'return' event in which we deal with it)
self.timings[frame_id] = start_time, event.arg
elif event.kind == 'return':
delta = current_time - start_time
if event.instruction == RETURN_VALUE:
# exception was discarded
print(f'{event.function} returned: {event.arg}. Duration: {delta:.4f}s\n')
else:
print(f'{event.function} raised exception: {exception}. Duration: {delta:.4f}s\n')
If you try that example you may notice that it’s not completely equivalent to any of the profilers available out there: data for builtin functions is missing.
Python does in fact have a profiling mode (eg: hunter.trace(profile=True
) and that will make hunter use sys.setprofile
instead
of sys.setrace
. However there are some downsides with that API:
- exception data will be missing (most likely because profiling is designed for speed and tracebacks are costly to build)
- trace events for builtin functions do not have their own frame objects (so we need to cater for that)
Behold, a ProfileAction that works in any mode:
from hunter.actions import ColorStreamAction
from hunter.actions import RETURN_VALUE
class ProfileAction(ColorStreamAction):
# using ColorStreamAction brings this more in line with the other actions
# (stream option, coloring and such, see the other examples for colors)
def __init__(self, **kwargs):
self.timings = {}
super(ProfileAction, self).__init__(**kwargs)
def __call__(self, event):
current_time = time()
# include event.builtin in the id so we don't have problems
# with Python reusing frame objects from the previous call for builtin calls
frame_id = id(event.frame), str(event.builtin)
if event.kind == 'call':
self.timings[frame_id] = current_time, None
elif frame_id in self.timings:
start_time, exception = self.timings.pop(frame_id)
# try to find a complete function name for display
function_object = event.function_object
if event.builtin:
function = '<builtin>.{}'.format(event.builtin.__name__)
elif function_object:
if hasattr(function_object, '__qualname__'):
function = '{}.{}'.format(
function_object.__module__, function_object.__qualname__
)
else:
function = '{}.{}'.format(
function_object.__module__,
function_object.__name__
)
else:
function = event.function
if event.kind == 'exception':
# store the exception
# (there will be a followup 'return' event in which we deal with it)
self.timings[frame_id] = start_time, event.arg
elif event.kind == 'return':
delta = current_time - start_time
if event.instruction == RETURN_VALUE:
# exception was discarded
self.output(
'{fore(BLUE)}{} returned: {}. Duration: {:.4f}s{RESET}\n',
function, event.arg, delta
)
else:
self.output(
'{fore(RED)}{} raised exception: {}. Duration: {:.4f}s{RESET}\n',
function, exception, delta
)