Reader¶
Namespace: dftracer::utils::utilities::reader
For a usage guide and examples, see Reader Components.
-
struct JsonLine¶
-
struct ReadConfig¶
Per-read configuration for range, buffering, and query filtering.
Public Members
-
std::size_t start_line = 0¶
First line (1-indexed, 0 = beginning).
-
std::size_t end_line = 0¶
Last line (0 = end of file).
-
std::size_t start_byte = 0¶
First byte offset (0 = beginning).
-
std::size_t end_byte = 0¶
Last byte offset (0 = end of file).
-
bool line_aligned = true¶
Align raw chunks to line boundaries.
-
bool multi_line = true¶
Allow multiple lines per raw chunk.
-
std::size_t buffer_size = 4 * 1024 * 1024¶
Internal read buffer size in bytes (default 4 MiB).
-
std::string query¶
Query DSL string for event filtering (empty = no filter). When set and an index exists, chunk pruning skips non-matching chunks. Per-event filtering always applies unless chunk_prune_only is set.
-
bool chunk_prune_only = false¶
When true, the query is used only for chunk-level pruning via the index. Per-line filtering is skipped (caller handles it).
-
bool skip_pruning = false¶
When true, the reader skips its own chunk pruner pass entirely and trusts the caller’s start_line/end_line window. Intended for the checkpoint-level work-item dispatcher, which has already pruned once per file at enumeration time. Without this flag, the pruner would re-run for every work item, which can mean hundreds of thousands of RocksDB opens.
-
bool start_at_checkpoint = false¶
-
bool end_at_checkpoint = false¶
-
bool flatten_objects = false¶
When true, top-level object values (e.g. args) are expanded one level into parent.child columns with native Arrow types instead of being serialized as a JSON string column. One level only; deeper nesting still round-trips as JSON text under the flattened key.
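The one-level flattening rule for flatten_objects can be pictured with the sketch below. It is a minimal illustration over plain standard-library types, not the reader's Arrow pipeline: JsonText, Cell, Object, TopLevel, Row, flatten_one_level, and example_row are all hypothetical names introduced here, and the sketch assumes that values nested deeper than one level arrive already serialized as JSON text.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in types; none of these names come from the library.
struct JsonText { std::string text; };  // nested value kept as serialized JSON
using Cell = std::variant<std::int64_t, std::string, JsonText>;
using Object = std::map<std::string, Cell>;
using TopLevel = std::variant<std::int64_t, std::string, Object>;
using Row = std::map<std::string, TopLevel>;

// Expand top-level objects one level into "parent.child" keys.
// Scalars keep their key; children keep their native type; anything
// nested deeper stays as JSON text under the flattened key.
std::map<std::string, Cell> flatten_one_level(const Row& row) {
    std::map<std::string, Cell> columns;
    for (const auto& [key, value] : row) {
        if (const auto* object = std::get_if<Object>(&value)) {
            for (const auto& [child_key, child] : *object) {
                columns[key + "." + child_key] = child;
            }
        } else if (const auto* number = std::get_if<std::int64_t>(&value)) {
            columns[key] = *number;
        } else {
            columns[key] = std::get<std::string>(value);
        }
    }
    return columns;
}

// Illustrative event: {"name":"read","args":{"size":4096,"fname":"a.dat","meta":{"rank":3}}}
Row example_row() {
    Object args;
    args["size"] = std::int64_t{4096};
    args["fname"] = std::string("a.dat");
    args["meta"] = JsonText{R"({"rank":3})"};  // second-level object stays as text
    Row row;
    row["name"] = std::string("read");
    row["args"] = args;
    return row;
}
```

Under these assumptions, the example row produces the keys name, args.size, args.fname, and args.meta, with args.meta still holding JSON text rather than being expanded further.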
-
class TraceReader¶
Smart trace file reader with auto-detection of sequential vs indexed reading, optional query filtering, and chunk pruning.
Public Functions
-
explicit TraceReader(TraceReaderConfig config)¶
-
coro::AsyncGenerator<Line> read_lines(ReadConfig config = {})¶
Read lines with optional query filtering and chunk pruning.
-
coro::AsyncGenerator<JsonLine> read_json(ReadConfig config = {})¶
Read parsed JSON lines. Parses each line once with simdjson ondemand, applies query filtering, and yields the parsed document. The yielded JsonParser remains valid only until the next call to next().
-
coro::AsyncGenerator<std::span<const char>> read_raw(ReadConfig config = {})¶
Read raw byte chunks.
-
coro::AsyncGenerator<common::arrow::ArrowExportResult> read_arrow(ReadConfig config = {}, std::size_t batch_size = 10000)¶
Direct Arrow batch pipeline: chunk-prune + line-level prefilter + simdjson iterate_many + inline row build. Yields complete Arrow record batches sized at batch_size rows. Emits the final partial batch on generator close. Non-normalized schema only (dynamic columns follow the first row seen).
-
bool has_index() const¶
True if a .dftindex database was found at construction time.
-
std::size_t get_max_bytes()¶
Decompressed size in bytes (0 if a compressed file has no index).
-
std::size_t get_num_lines()¶
Total line count (0 if no index).
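The batching behaviour described for read_arrow (complete batches of batch_size rows, followed by one final partial batch) can be sketched as follows. This is a stand-in over plain vectors rather than Arrow record batches or a coroutine generator; batch_rows is a hypothetical helper, and the early return for a zero batch size is an assumption of the sketch, not documented behaviour.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: group rows into batches of `batch_size`,
// emitting whatever remains as a final, smaller batch.
template <typename Row>
std::vector<std::vector<Row>> batch_rows(const std::vector<Row>& rows,
                                         std::size_t batch_size) {
    std::vector<std::vector<Row>> batches;
    if (batch_size == 0) {
        return batches;  // assumption: a zero batch size yields nothing
    }
    std::vector<Row> current;
    current.reserve(batch_size);
    for (const Row& row : rows) {
        current.push_back(row);
        if (current.size() == batch_size) {
            batches.push_back(std::move(current));  // complete batch
            current.clear();  // reset the moved-from vector to empty
            current.reserve(batch_size);
        }
    }
    if (!current.empty()) {
        batches.push_back(std::move(current));  // final partial batch
    }
    return batches;
}
```

With 25 rows and a batch size of 10, this yields batches of 10, 10, and 5 rows; an input that divides evenly produces no trailing empty batch.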
-
struct TraceReaderConfig¶
File-level configuration for TraceReader.