Developer Guide
This guide is for developers who want to contribute to pydftracer or understand its internals.
Development Setup
Prerequisites
Python 3.9 or higher
Git
C++ compiler (for building the core DFTracer library)
pip and virtualenv
Clone the Repository
pip install git+https://github.com/LLNL/dftracer.git --no-deps
pip install ".[dev]"
Create Development Environment
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode with all dependencies
pip install ".[dev,docs,dynamo]"
This installs:
dev: Testing and development tools (pytest, ruff, mypy)
docs: Documentation building tools (Sphinx, themes)
dynamo: PyTorch integration for Dynamo tracing
Project Structure
Repository Layout
pydftracer/
├── python/
│   └── dftracer/
│       └── python/              # Main Python package
│           ├── __init__.py
│           ├── logger.py        # Core logger implementation
│           ├── common.py        # Common utilities
│           ├── env.py           # Environment configuration
│           ├── ai.py            # AI/ML tracing API
│           ├── ai_common.py     # AI common utilities
│           ├── ai_init.py       # AI initialization
│           ├── dynamo.py        # PyTorch Dynamo integration
│           └── dbg/             # Debug utilities
│               ├── __init__.py
│               ├── logger.py
│               └── ai.py
├── tests/                       # Test suite
│   ├── test_dftracer.py
│   ├── test_ai_logging.py
│   ├── test_dynamo.py
│   └── utils.py
├── docs/                        # Documentation
│   ├── source/
│   └── Makefile
├── pyproject.toml               # Project configuration
└── README.md
Key Modules
- dftracer.python.logger
Core logging functionality, the dftracer class, and the dft_fn decorator
- dftracer.python.common
Common utilities, type definitions, and the profiler protocol
- dftracer.python.env
Environment variable handling and logger setup
- dftracer.python.ai
AI/ML specific tracing decorators and utilities
- dftracer.python.dynamo
PyTorch Dynamo integration for model tracing
Running Tests
Run All Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=python/dftracer --cov-report=html
# Run in parallel
pytest -n auto
Run Specific Tests
# Run specific test file
pytest tests/test_dftracer.py
# Run specific test function
pytest tests/test_dftracer.py::TestDFTracerLogger::test_dftracer_singleton
# Run tests matching a pattern
pytest -k "test_ai"
Test with Environment Variables
# Enable DFTracer for tests
DFTRACER_ENABLE=1 pytest tests/test_dftracer.py
# Set log level for debugging
DFTRACER_LOG_LEVEL=DEBUG pytest tests/test_ai_logging.py
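The same variables can also be set per test instead of on the command line. The sketch below uses only the standard library. `tracing_enabled` is a hypothetical helper written for this illustration and is not part of pydftracer; treating `DFTRACER_ENABLE=1` as "on" is an assumption taken from the command above.

```python
import os
from unittest import mock


def tracing_enabled() -> bool:
    """Hypothetical helper: report whether DFTRACER_ENABLE is set to "1"."""
    return os.environ.get("DFTRACER_ENABLE", "0") == "1"


def test_tracing_enabled_flag() -> None:
    # patch.dict restores the original environment when each block exits,
    # so the setting does not leak into other tests.
    with mock.patch.dict(os.environ, {"DFTRACER_ENABLE": "1"}):
        assert tracing_enabled()
    with mock.patch.dict(os.environ, {"DFTRACER_ENABLE": "0"}):
        assert not tracing_enabled()
```

If a module reads a variable once at import time, as the env.py pattern later in this guide does, the variable has to be set before that module is imported for the change to take effect.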
Code Quality
Linting
The project uses ruff for linting:
# Run ruff linter
ruff check python/dftracer
# Auto-fix issues
ruff check --fix python/dftracer
Type Checking
The project uses mypy for type checking:
# Run mypy
mypy python/dftracer/python/
# Check specific file
mypy python/dftracer/python/logger.py
Formatting
Follow the project’s coding style:
Line length: 88 characters (Black default)
Use type hints where possible
Follow Google/NumPy docstring conventions
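As an illustration of these three points together, here is a small function that stays within 88 characters per line, is fully typed, and carries a Google-style docstring. The function `event_name` is hypothetical and exists only for this example. `Optional` is used instead of `str | None` because the project targets Python 3.9.

```python
from typing import Optional


def event_name(category: str, name: str, suffix: Optional[str] = None) -> str:
    """Build a dotted event name.

    Args:
        category: Tracer category, for example "ai".
        name: Name of the traced function or block.
        suffix: Optional trailing component.

    Returns:
        The components joined with dots, skipping a missing suffix.
    """
    parts = [category, name]
    if suffix is not None:
        parts.append(suffix)
    return ".".join(parts)
```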
Configuration
Linting and type checking rules are defined in pyproject.toml:
[tool.ruff]
line-length = 88
target-version = "py39"
[tool.ruff.lint]
select = ["E", "F", "W", "B", "I", "UP"]
ignore = ["E501", "B006", "B008", ...]
[tool.mypy]
python_version = "3.9"
warn_return_any = true
disallow_untyped_defs = true
Building Documentation
Build HTML Docs
cd docs
make html
# View the docs
open build/html/index.html # macOS
# or
xdg-open build/html/index.html # Linux
Clean Build
make clean
make html
Check for Broken Links
make linkcheck
Build Other Formats
make latexpdf # PDF (requires LaTeX)
make epub # EPUB
make man # Man pages
Contributing
Development Workflow
Fork and Clone
git clone https://github.com/YOUR_USERNAME/dftracer.git
cd dftracer/pydftracer
Create a Branch
git checkout -b feature/my-feature
# or
git checkout -b fix/issue-123
Make Changes
Write code following the style guide
Add tests for new functionality
Update documentation
Run Tests
pytest
ruff check python/dftracer
mypy python/dftracer/python/
Commit Changes
git add .
git commit -m "feat: description"
Push and Create PR
git push origin feature/my-feature
Then create a Pull Request on GitHub.
Commit Message Guidelines
Follow conventional commit format:
<type>: <description>
[optional body]
[optional footer]
Types:
feat: New feature
fix: Bug fix
docs: Documentation changes
test: Adding or updating tests
refactor: Code refactoring
perf: Performance improvements
chore: Maintenance tasks
Example:
feat: add support for custom trace categories

- Implement custom category registration
- Add tests for category validation
- Update documentation

Closes #123
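The subject-line format can be checked mechanically, for example in a local hook. The following is a minimal sketch, not project tooling: `is_valid_subject` is a hypothetical name, and the check covers only the `<type>: <description>` subject line with the types listed in this guide.

```python
import re

# Commit types listed in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore")

# "<type>: <description>", with a non-empty description after the space.
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+): (?P<description>\S.*)$")


def is_valid_subject(subject: str) -> bool:
    """Return True if the subject line matches "<type>: <description>"."""
    match = SUBJECT_RE.match(subject)
    return match is not None and match.group("type") in COMMIT_TYPES
```

With this check, `feat: add support for custom trace categories` passes, while a subject with an unknown type or a missing description is rejected.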
Adding New Features
Adding a New Tracer Category
To add a new AI/ML tracer category:
Define the category in ai_common.py
class MyCategory(DFTracerAI):
    def __init__(self, ...):
        super().__init__(cat="my_category", ...)
Add to AI class hierarchy
class AI(DFTracerAI):
    def __init__(self):
        super().__init__(cat="ai", ...)
        self.my_category = MyCategory()
Export in __init__.py
from dftracer.python.ai_common import MyCategory
__all__ = [..., "MyCategory"]
Add tests
def test_my_category():
    @ai.my_category
    def my_function():
        pass
Update documentation
Add examples to AI/ML Tracing Guide
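To show how these steps fit together without the real library, the sketch below replaces `DFTracerAI` with a small stand-in that only records which functions were called. The stand-in's constructor and its `events` list are assumptions made for this illustration. The real base class takes further arguments (elided as `...` above) and writes trace events rather than appending to a list.

```python
import functools
from typing import Any, Callable, List


class DFTracerAI:
    """Stand-in for the real base class; it only records call names."""

    def __init__(self, cat: str) -> None:
        self.cat = cat
        self.events: List[str] = []

    def __call__(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Using the instance as a decorator wraps the function.
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            self.events.append(f"{self.cat}:{func.__name__}")
            return func(*args, **kwargs)

        return wrapper


class MyCategory(DFTracerAI):
    def __init__(self) -> None:
        super().__init__(cat="my_category")


class AI(DFTracerAI):
    def __init__(self) -> None:
        super().__init__(cat="ai")
        # Expose the new category as an attribute of the parent.
        self.my_category = MyCategory()


ai = AI()


@ai.my_category
def my_function() -> int:
    return 42
```

Calling `my_function()` returns its normal result and, in this stand-in, appends `"my_category:my_function"` to `ai.my_category.events`. A real test would assert on the emitted trace output instead.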
Adding New Environment Variables
Define in env.py
MY_NEW_VAR_ENV = "DFTRACER_MY_VAR"
MY_NEW_VAR = os.getenv(MY_NEW_VAR_ENV, "default_value")
Export in __init__.py
from dftracer.python.env import MY_NEW_VAR
__all__ = [..., "MY_NEW_VAR"]
Document in env.rst
Add to API reference
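`os.getenv` always returns a string, so a numeric variable needs explicit conversion. The following is a minimal sketch of one way to do that, assuming an unparsable value should fall back to the default. `read_int_env` is a hypothetical helper and not part of env.py.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to `default`."""
    raw = os.getenv(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Unparsable values fall back to the default instead of raising.
        return default
```

Whether to fall back silently or raise on a malformed value is a design choice. Failing loudly may be preferable for settings where a typo should not go unnoticed.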
Debugging
Enable Debug Logging
export DFTRACER_LOG_LEVEL=DEBUG
python your_script.py
Use Debug Logger
from dftracer.python.dbg import logger as dbg_logger
# This provides more verbose output
log = dbg_logger()
Common Issues
Issue: Tests fail with “DFTracer not available”
Solution: Ensure the C++ DFTracer library is installed:
pip install dftracer
pip install . # overwrite the released pydftracer with your local changes
# OR
pip install dftracer --no-deps # since dftracer depends on this package
Issue: Import errors in tests
Solution: Install in development mode:
pip install .
Issue: Type checking fails
Solution: Update type stubs or add to mypy overrides in pyproject.toml
Performance Considerations
Profiling Overhead
DFTracer is designed for minimal overhead, but consider:
Decorator overhead: ~1-5% for most functions
I/O tracing: Depends on I/O frequency
Event logging: Buffered writes, minimal impact
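The figures above depend on the workload, so it is worth measuring decorator overhead on your own functions. The sketch below times a function with and without a decorator. `noop_trace` is a stand-in that does no tracing and `work` is a hypothetical workload. Substitute the real tracing decorator and a representative function to get meaningful numbers.

```python
import functools
import time
from typing import Any, Callable


def noop_trace(func: Callable[..., Any]) -> Callable[..., Any]:
    """Stand-in decorator; replace with the real tracing decorator."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    return wrapper


def work(n: int = 1000) -> int:
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def time_calls(func: Callable[[], Any], repeats: int) -> float:
    """Return total wall-clock seconds for `repeats` calls of `func`."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return time.perf_counter() - start


def relative_overhead(repeats: int = 2000) -> float:
    """Return (traced - plain) / plain for the workload above."""
    plain = time_calls(work, repeats)
    traced = time_calls(noop_trace(work), repeats)
    return (traced - plain) / plain
```

Single measurements are noisy and can even come out negative for cheap wrappers, so run `relative_overhead()` several times and compare the spread rather than relying on one value.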
Reducing Overhead
Selective tracing: Only trace critical paths
Disable categories: Turn off unused categories
Batch logging: Use streaming mode for high-frequency events
# Disable unused categories
ai.comm.disable()
ai.checkpoint.disable()
# Use metadata mode for high-frequency events
for epoch in range(num_epochs):
    ai.pipeline.epoch.start(metadata=True)
    # Training code
    ai.pipeline.epoch.stop(metadata=True)
Release Process
Versioning
pydftracer follows Semantic Versioning:
MAJOR: Incompatible API changes
MINOR: New features, backward compatible
PATCH: Bug fixes, backward compatible
The version is managed by setuptools-scm from git tags.
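Because the version comes from the git tag, choosing the next tag is the versioning decision. The sketch below shows which component changes for each kind of release. `parse_version` and `bump` are hypothetical helpers for illustration only. They are not part of setuptools-scm or the project's release tooling, and they handle only plain `MAJOR.MINOR.PATCH` tags with an optional leading `v`.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse "v0.2.0" or "0.2.0" into (major, minor, patch)."""
    if tag.startswith("v"):
        tag = tag[1:]
    major, minor, patch = tag.split(".")
    return int(major), int(minor), int(patch)


def bump(tag: str, part: str) -> str:
    """Return the next tag after bumping "major", "minor", or "patch"."""
    major, minor, patch = parse_version(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, a backward-compatible feature release after `v0.2.0` would be tagged `bump("v0.2.0", "minor")`, which gives `v0.3.0`.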
Creating a Release
Update CHANGELOG
Document all changes since last release
Create Git Tag
git tag -a v0.2.0 -m "Release version 0.2.0"
git push origin v0.2.0
Build
python -m build
Update Documentation
Documentation is auto-deployed from tags
Resources
Main Repository: https://github.com/LLNL/dftracer
DFAnalyzer: https://github.com/LLNL/dfanalyzer
Getting Help
If you need help:
Check the Quick Start and API Reference
Search existing GitHub Issues
Contact the maintainers
License
pydftracer is released under the MIT License. See the LICENSE file in the repository for details.
Contributing to Documentation
The documentation is built with Sphinx. See docs/README.md for details.
Key files:
docs/source/conf.py - Sphinx configuration
docs/source/*.rst - reStructuredText source files
docs/source/api/ - API reference
To contribute:
Edit the appropriate .rst files
Build locally to preview: make html
Check for warnings and broken links
Submit PR with documentation changes