Introducing PyStack: Your Ultimate Python Debugger

Debugging can be a formidable challenge, especially when dealing with stubborn issues like deadlocks, segmentation faults, crashing applications, or hanging processes. But there's a new player in town: PyStack, a powerful debugger that helps you navigate these complex problems. In this blog post, we're going to explore how PyStack can be your troubleshooting sidekick for these perplexing scenarios.

Why Do We Need PyStack?

You might be wondering: with a multitude of debugging tools at your disposal, including interactive IDE debuggers, why do you need PyStack? The answer lies in the nature of certain elusive bugs that are incredibly challenging to resolve. Here's where PyStack comes to the rescue:
1. Deadlocks and Hanging Processes: When you encounter a hanging process, it's often hard to discern whether it's actively working or stuck in a deadlock. PyStack can provide insights into the state of these processes.
2. Hybrid Applications: Applications that blend Python with C/C++ components, like Python extension modules, or popular libraries such as NumPy or TensorFlow, can be tricky to debug. PyStack can help you tackle issues like NumPy crashes with segfaults.
3. Unique Circumstances: Some issues are peculiar, occurring under specific conditions like heavy load or after an application has been running for a certain duration. PyStack can help you investigate these niche problems.
Other tools such as GDB exist, but PyStack offers several advantages: it doesn't modify your code, it can inspect core dump files, and it automatically fetches debugging information for your specific distribution. That makes it a valuable addition to your debugging toolkit.
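To make the first scenario concrete, here is a minimal sketch of a lock-ordering deadlock, the kind of hang where two threads each sit in an `acquire` call waiting for the other. The function and variable names are illustrative and not part of PyStack, and the second acquisition uses a timeout only so that the demo terminates instead of hanging forever.

```python
import threading


def demo_lock_order_deadlock(timeout=0.2):
    """Run two threads that take the same two locks in opposite order.

    Returns the sorted names of the threads that could not get their
    second lock within `timeout` seconds.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)    # both threads own their first lock
    both_tried_second = threading.Barrier(2)  # both threads finished their attempt
    stuck = []

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # In a real deadlock this would be a plain second.acquire().
            if second.acquire(timeout=timeout):
                second.release()
            else:
                stuck.append(name)
            # Keep holding `first` until the other thread has also tried.
            both_tried_second.wait()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(stuck)


if __name__ == "__main__":
    print("Threads that could not make progress:", demo_lock_order_deadlock())
```

Without the timeout, both threads would block indefinitely, and from the outside the process would simply look idle.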

Setting Up PyStack

Before diving into debugging, you'll need to prepare your environment:


sudo apt update
sudo apt install systemd-coredump python3-pip python3.10-venv
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
python3 -m venv ./venv
source ./venv/bin/activate
pip install pystack pytest-pystack

Notably, installing `systemd-coredump` enables core dump generation, and the `echo` command temporarily relaxes the kernel's `ptrace` restrictions so that PyStack can attach to a running process (the setting reverts on reboot). Lastly, you install PyStack itself, along with its Pytest plugin, inside a virtual environment.
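If you are unsure what the current `ptrace` setting is, a small helper like the sketch below can read it back. The function names are made up for this post; the descriptions of the four values follow the Linux kernel's Yama documentation, and the file may simply be absent on systems without Yama.

```python
from pathlib import Path

PTRACE_SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Summaries of the values described in the kernel's Yama documentation.
SCOPE_MEANINGS = {
    0: "classic: a process may attach to any other process running under the same uid",
    1: "restricted: a process may only attach to its own descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled entirely",
}


def read_ptrace_scope(path=PTRACE_SCOPE_FILE):
    """Return the current ptrace_scope as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_ptrace_scope(value):
    """Translate a ptrace_scope value into a short human-readable description."""
    if value is None:
        return "ptrace_scope is not available on this system"
    return SCOPE_MEANINGS.get(value, f"unknown value: {value}")


if __name__ == "__main__":
    print(describe_ptrace_scope(read_ptrace_scope()))
```

After running the `echo 0` command from the setup above, this should report the "classic" mode.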

Debugging with PyStack

PyStack provides two ways to debug a program: attaching to a running process or analyzing a core dump of a crashed process. Let's start with the former. Consider this code snippet:


# wait.py
from time import sleep


def wait():
    some_var = "data"
    other_var = [1, 2, 3]
    # Loop forever so the process stays alive long enough to inspect
    while True:
        sleep(5)
        print("Sleeping...")


wait()

You can run this code in the background and use PyStack to inspect it:


nohup python wait.py &
# [1] 44000
pystack remote 44000 --locals --no-block
# Traceback for thread 44000 (python) [] (most recent call last):
#     (Python) File "/home/.../wait.py", line 18, in <module>
#         wait()
#     (Python) File "/home/.../wait.py", line 15, in wait
#         sleep(5)
#       Locals:
#         other_var: [1, 2, 3]
#         some_var: "data"

PyStack will provide a traceback, highlighting where your program hangs and displaying local variables, offering crucial context.
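If you want to trigger the same inspection from a script, for example from a watchdog that notices a stuck worker, one option is to shell out to the CLI. The sketch below is an assumption about how you might wrap it: the helper names are hypothetical, and it only uses the `remote`, `--locals` and `--no-block` arguments shown above.

```python
import shutil
import subprocess


def build_pystack_remote_cmd(pid, show_locals=True, no_block=True):
    """Build the argument list for `pystack remote`, mirroring the command above."""
    cmd = ["pystack", "remote", str(pid)]
    if show_locals:
        cmd.append("--locals")
    if no_block:
        cmd.append("--no-block")
    return cmd


def dump_stack(pid):
    """Return PyStack's report for `pid`, or None if pystack is not installed."""
    if shutil.which("pystack") is None:
        return None
    result = subprocess.run(
        build_pystack_remote_cmd(pid),
        capture_output=True,
        text=True,
        check=False,  # let the caller decide how to treat a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # 44000 is the example PID used earlier in this post.
    print(" ".join(build_pystack_remote_cmd(44000)))
```

Keeping the command construction separate from the `subprocess.run` call makes the wrapper easy to test without a live process to attach to.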
Unfortunately, PyStack is limited to Linux, but it works seamlessly in Docker. For instance, to debug a deadlock in a Docker container:


# Build the Docker image
docker build -t python-pystack -f Dockerfile .

# Run the container with necessary privileges
docker run --cap-add=SYS_PTRACE --name python-pystack --rm python-pystack

# Enter the container (from a second terminal, because the run command above stays in the foreground)
docker exec -it python-pystack /bin/bash


Analyzing Core Files

Sometimes, you won't be able to debug a live process, and that's where analyzing core dump files becomes vital. Core dumps are snapshots of a process when it crashes, often indicated by messages like "Segmentation fault (core dumped)." You can inspect core dumps with PyStack:


# Force a crash in your code
# ...
pystack core ./core --locals

This allows you to investigate the state of the program when it crashed, displaying local variables crucial for understanding the issue.
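If you need a crash to experiment with, one common trick is to read from a null pointer via `ctypes`. The sketch below does this in a child process so that your own interpreter survives; the helper name is hypothetical, and whether a core file is actually written depends on your system's core dump configuration.

```python
import signal
import subprocess
import sys

# Reading a C string from address 0 dereferences a null pointer.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"


def crash_child():
    """Run a Python child process that segfaults and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", CRASH_SNIPPET],
        stderr=subprocess.DEVNULL,
        check=False,  # a crash is the expected outcome here
    )
    return result.returncode


if __name__ == "__main__":
    code = crash_child()
    if code == -signal.SIGSEGV:
        print("Child was killed by SIGSEGV; look for a core dump to analyze.")
    else:
        print(f"Child exited with code {code}")
```

On POSIX systems, `subprocess` reports a child killed by a signal as the negative signal number, which is why the check compares against `-signal.SIGSEGV`.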

Dealing with Segmentation Faults in Libraries like NumPy

PyStack proves invaluable when debugging issues in libraries like NumPy or PyTorch that have C/C++ components. Consider this NumPy example that triggers a segmentation fault:


# pip install numpy
from multiprocessing import shared_memory
import numpy as np
#

Running this code will lead to a segfault, but analyzing the core dump with PyStack provides extensive information, making it easier to pinpoint the problem.

PyStack and Pytest

If you use Pytest for your test suite, PyStack can be integrated as a plugin to automatically run when a test exceeds a specified timeout:


pytest -s --pystack-threshold=2 \
  --pystack-args='--locals' \
  --pystack-output-file='./pystack.log'

This allows you to inspect the process running your tests if they run for too long, providing valuable insights into test failures.
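To try the plugin out, you need a test that runs longer than the threshold. A minimal sketch, assuming a hypothetical file named `test_slow.py` and a made-up helper standing in for a slow or stuck call:

```python
# test_slow.py (hypothetical file name)
import time


def wait_for_resource(seconds):
    """Stand-in for a slow or stuck call, such as waiting on a lock or socket."""
    time.sleep(seconds)
    return seconds


def test_slow_operation():
    # Sleeps for 3 seconds, longer than the 2-second threshold used above,
    # so the test is still running when the threshold is reached.
    assert wait_for_resource(3) == 3
```

Running the `pytest` command above against this file should exceed the threshold while the test is still sleeping.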

PyStack has a fairly specific set of uses and isn't necessary for debugging typical problems, but it can be incredibly helpful when deadlocks or segfaults do occur. Few other tools do this job as well, and being able to see what a program is doing while it runs, or what it was doing when it crashed, is tremendously useful. If you already use a profiler (such as py-spy or Austin) or other tools that inspect stack information, PyStack is worth trying alongside them because they work well together.

