Reader

Namespace: dftracer::utils::utilities::reader

For a usage guide and examples, see Reader Components.

struct JsonLine

Public Members

std::string_view content
std::size_t line_number
JsonParser *parser

struct ReadConfig

Per-read configuration for range, buffering, and query filtering.

Public Functions

inline bool has_line_range() const
inline bool has_byte_range() const

Public Members

std::size_t start_line = 0

First line (1-indexed, 0 = beginning).

std::size_t end_line = 0

Last line (0 = end of file).

std::size_t start_byte = 0

First byte offset (0 = beginning).

std::size_t end_byte = 0

Last byte offset (0 = end of file).
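The bodies of has_line_range() and has_byte_range() are not shown in this reference. The following is a minimal sketch, assuming each predicate reports whether either bound differs from the 0 sentinel. RangeSketch and lines_between are hypothetical local names, not part of the library.

```cpp
#include <cstddef>

// Hypothetical local mirror of the four range fields of ReadConfig.
// The predicate bodies are an assumption: a range counts as "set" when
// either bound differs from the 0 sentinel documented above.
struct RangeSketch {
    std::size_t start_line = 0;  // 1-indexed, 0 = beginning
    std::size_t end_line = 0;    // 0 = end of file
    std::size_t start_byte = 0;  // 0 = beginning
    std::size_t end_byte = 0;    // 0 = end of file

    bool has_line_range() const { return start_line != 0 || end_line != 0; }
    bool has_byte_range() const { return start_byte != 0 || end_byte != 0; }
};

// Builds a config that selects lines first through last and leaves the
// byte range unset.
inline RangeSketch lines_between(std::size_t first, std::size_t last) {
    RangeSketch cfg;
    cfg.start_line = first;
    cfg.end_line = last;
    return cfg;
}
```

Under this assumption a default-constructed config has no range at all, which matches the documented meaning of 0 as "beginning" and "end of file".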

bool line_aligned = true

Align raw chunks to line boundaries.

bool multi_line = true

Allow multiple lines per raw chunk.

std::size_t buffer_size = 4 * 1024 * 1024

Internal read buffer size in bytes (default 4 MiB).

std::string query

Query DSL string for event filtering (empty = no filter). When set and an index exists, chunk pruning skips non-matching chunks. Per-event filtering always applies unless chunk_prune_only is set.

bool chunk_prune_only = false

When true, the query is used only for chunk-level pruning via the index. Per-line filtering is skipped (caller handles it).

bool skip_pruning = false

When true, the reader skips its own chunk pruner pass entirely and trusts the caller’s start_line/end_line window. Intended for the checkpoint-level work-item dispatcher, which already pruned once per file at enumeration time. Without this flag, the pruner would re-run for every work item, costing hundreds of thousands of RocksDB opens.
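The interaction between query, chunk_prune_only, and skip_pruning can be summarized as a small decision function. This is a sketch of the documented behavior, not the reader's actual code. FilterFlags, FilterPlan, and plan_filtering are hypothetical names, and it assumes skip_pruning affects only the chunk pruner, since the reference does not say whether it also changes per-event filtering.

```cpp
#include <string>

// Hypothetical mirror of the three ReadConfig fields that control filtering.
struct FilterFlags {
    std::string query;              // empty = no filter
    bool chunk_prune_only = false;  // caller does per-event filtering itself
    bool skip_pruning = false;      // caller already pruned chunks
};

// Which filtering stages would run for a given read.
struct FilterPlan {
    bool prune_chunks;   // use the index to skip non-matching chunks
    bool filter_events;  // evaluate the query against each event
};

// Assumption: skip_pruning disables only the chunk pruner and leaves
// per-event filtering untouched.
inline FilterPlan plan_filtering(const FilterFlags& cfg, bool has_index) {
    const bool has_query = !cfg.query.empty();
    return FilterPlan{
        has_query && has_index && !cfg.skip_pruning,
        has_query && !cfg.chunk_prune_only,
    };
}
```

With a query set and no index, the plan still filters each event but cannot prune chunks, which matches the "when set and an index exists" wording above.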

bool start_at_checkpoint = false
bool end_at_checkpoint = false
bool flatten_objects = false

When true, top-level object values (e.g. args) are expanded one level into parent.child columns with native Arrow types instead of being serialized as a JSON string column. One-level only; deeper nesting still round-trips as JSON text under the flattened key.
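To make the one-level rule concrete, here is a minimal sketch of the flattening step on a toy in-memory model. Value, Object, to_json, and flatten_one_level are hypothetical illustrations; the real reader works on simdjson documents and emits native Arrow types, neither of which is modeled here. Scalars are stored as already-rendered JSON text, and keys are assumed to need no escaping.

```cpp
#include <map>
#include <memory>
#include <string>

struct Object;

// A value is either scalar text already rendered as JSON (e.g. 4096 or
// "read") or a nested object.
struct Value {
    std::string scalar;
    std::shared_ptr<Object> child;  // non-null for nested objects
};

struct Object {
    std::map<std::string, Value> fields;
};

// Serializes an object back to compact JSON text (keys are not escaped).
inline std::string to_json(const Object& obj) {
    std::string out = "{";
    bool first = true;
    for (const auto& [key, value] : obj.fields) {
        if (!first) out += ",";
        first = false;
        out += "\"" + key + "\":";
        out += value.child ? to_json(*value.child) : value.scalar;
    }
    return out + "}";
}

// Expands top-level object values one level into parent.child columns.
// Anything nested deeper is kept as JSON text under the flattened key.
inline std::map<std::string, std::string> flatten_one_level(const Object& row) {
    std::map<std::string, std::string> columns;
    for (const auto& [key, value] : row.fields) {
        if (!value.child) {
            columns[key] = value.scalar;
            continue;
        }
        for (const auto& [child_key, child_value] : value.child->fields) {
            columns[key + "." + child_key] =
                child_value.child ? to_json(*child_value.child) : child_value.scalar;
        }
    }
    return columns;
}
```

For a row with name, args.size, and a nested args.meta object, this sketch produces the columns name, args.size, and args.meta, where args.meta holds the nested object as JSON text.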

class TraceReader

Smart trace file reader with auto-detection of sequential vs indexed reading, optional query filtering, and chunk pruning.

Public Functions

explicit TraceReader(TraceReaderConfig config)
coro::AsyncGenerator<Line> read_lines(ReadConfig config = {})

Read lines with optional query filtering and chunk pruning.

coro::AsyncGenerator<JsonLine> read_json(ReadConfig config = {})

Read parsed JSON lines. Parses each line once with simdjson ondemand, applies query filtering, and yields the parsed document. The yielded JsonParser stays valid only until the generator's next() is called again.
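Because a yielded item is only valid until the generator advances, callers that need data beyond one iteration should copy it out first. The sketch below illustrates that pattern with a hypothetical stand-in, ReusingSource, that reuses a single internal buffer. It does not use the real coroutine generator or JsonParser, and it assumes the content view of JsonLine follows the same lifetime rule, which the reference states only for the parser.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generator that reuses one internal buffer,
// so each yielded view is invalidated by the following next() call.
class ReusingSource {
public:
    explicit ReusingSource(std::vector<std::string> lines)
        : lines_(std::move(lines)) {}

    std::optional<std::string_view> next() {
        if (index_ >= lines_.size()) return std::nullopt;
        buffer_ = lines_[index_++];  // overwrites the previous line
        return std::string_view(buffer_);
    }

private:
    std::vector<std::string> lines_;
    std::string buffer_;
    std::size_t index_ = 0;
};

// Copies each view into an owned std::string before advancing, so the
// results stay valid after the source has moved on.
inline std::vector<std::string> collect_owned(ReusingSource& source) {
    std::vector<std::string> owned;
    while (auto view = source.next()) {
        owned.emplace_back(*view);
    }
    return owned;
}
```

Storing the std::string_view values themselves instead of copies would leave every stored view pointing at the most recent buffer contents, or at freed memory.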

coro::AsyncGenerator<std::span<const char>> read_raw(ReadConfig config = {})

Read raw byte chunks.

coro::AsyncGenerator<common::arrow::ArrowExportResult> read_arrow(ReadConfig config = {}, std::size_t batch_size = 10000)

Direct Arrow batch pipeline: chunk pruning, line-level prefilter, simdjson iterate_many, then inline row building. Yields complete Arrow record batches of batch_size rows each and emits the final partial batch when the generator closes. Supports the non-normalized schema only (dynamic columns follow the first row seen).
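The batching rule (full batches of batch_size rows, then one trailing partial batch) can be checked with simple arithmetic. The helper below is a hypothetical illustration, not a library function: it computes the row counts a caller could expect for a given number of rows that survive filtering. How the reader treats a batch_size of 0 is not documented, so the sketch simply returns no batches in that case.

```cpp
#include <cstddef>
#include <vector>

// Returns the row count of each batch: full batches of batch_size rows,
// followed by one partial batch if rows remain.
inline std::vector<std::size_t> batch_row_counts(std::size_t total_rows,
                                                 std::size_t batch_size) {
    std::vector<std::size_t> counts;
    if (batch_size == 0) return counts;  // assumption: undefined in the docs
    for (std::size_t remaining = total_rows; remaining > 0;) {
        const std::size_t rows = remaining < batch_size ? remaining : batch_size;
        counts.push_back(rows);
        remaining -= rows;
    }
    return counts;
}
```

For example, 25000 surviving rows with the default batch_size of 10000 give batches of 10000, 10000, and 5000 rows.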

bool has_index() const

True if a .dftindex database was found at construction time.

std::size_t get_max_bytes()

Decompressed size in bytes; returns 0 for a compressed file that has no index.

std::size_t get_num_lines()

Total line count (0 if no index).

struct TraceReaderConfig

File-level configuration for TraceReader.

Public Members

std::string file_path

Path to trace file (.pfw.gz or plain).

std::string index_dir

Directory containing .dftindex roots.

std::size_t checkpoint_size = 32 * 1024 * 1024

Checkpoint interval in bytes (default 32 MiB).

bool auto_build_index = false

Auto-build index if missing.