Lines¶
Namespace: dftracer::utils::utilities::fileio::lines
-
struct Line¶
Represents a single line of text with metadata.
This structure holds a line’s content along with its line number, enabling easy tracking and processing of lines from various sources.
NOTE: Uses string_view for zero-copy performance. The view is only valid until the next call to next() on the iterator.
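The lifetime rule in the note above can be illustrated with a small stand-alone sketch. `LineView`, `ReusingReader`, and `copy_all` are hypothetical stand-ins, not the library's types. The `line_number` field name and the buffer-reuse behavior are assumptions made to show why a view should be copied before the next call to next().

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for Line: a non-owning view plus a line number.
struct LineView {
    std::string_view content;
    std::size_t line_number;
};

// Hypothetical reader that reuses one buffer, so each next() overwrites
// the bytes that the previously returned view points at.
class ReusingReader {
public:
    explicit ReusingReader(std::vector<std::string> lines)
        : lines_(std::move(lines)) {}

    bool has_next() const { return index_ < lines_.size(); }

    LineView next() {
        buffer_ = lines_[index_];  // invalidates the previous view
        ++index_;
        return LineView{buffer_, index_};
    }

private:
    std::vector<std::string> lines_;
    std::string buffer_;
    std::size_t index_ = 0;
};

// Copy each view into an owning std::string before advancing.
inline std::vector<std::string> copy_all(ReusingReader& reader) {
    std::vector<std::string> owned;
    while (reader.has_next()) {
        LineView line = reader.next();
        owned.emplace_back(line.content);  // copy while the view is valid
    }
    return owned;
}
```

Storing the `std::string_view` itself past the next call would leave it pointing at overwritten bytes, so anything kept for later should be an owning copy.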
-
class LineBytesRange¶
Type-erased range of lines from byte boundaries.
This class provides a unified interface for iterating over lines within byte ranges from different sources (indexed files, plain files) using std::variant for type erasure. It mirrors the design of LineRange but operates on byte offsets instead of line numbers.
Line boundaries are automatically aligned to ensure complete lines.
Usage:
    // From indexed file with byte range
    auto reader = ReaderFactory::create("file.gz", "/data/.dftindex");
    LineBytesRange range1 = LineBytesRange::from_indexed_file(reader, 1000, 5000);

    // From plain file with byte range
    LineBytesRange range2 = LineBytesRange::from_plain_file("data.txt", 1000, 5000);

    // Iterate uniformly
    while (range1.has_next()) {
        Line line = range1.next();
        // Process line...
    }
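The description says line boundaries are aligned so that only complete lines are returned, but it does not spell out the alignment rule. The sketch below shows one common policy on an in-memory buffer: a line belongs to the range if its first byte lies in [start_byte, end_byte). This policy and the helper name `lines_in_byte_range` are assumptions for illustration, not the library's documented behavior.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Return the complete lines whose first byte lies in [start_byte, end_byte).
// The ownership rule is an assumption of this sketch.
inline std::vector<std::string> lines_in_byte_range(std::string_view data,
                                                    std::size_t start_byte,
                                                    std::size_t end_byte) {
    std::vector<std::string> out;
    if (end_byte > data.size()) end_byte = data.size();

    std::size_t pos = start_byte;
    // Align forward: if start_byte falls inside a line, skip to the next one.
    if (pos > 0 && pos <= data.size() && data[pos - 1] != '\n') {
        std::size_t nl = data.find('\n', pos);
        pos = (nl == std::string_view::npos) ? data.size() : nl + 1;
    }

    // Emit every line that starts before end_byte, reading each to its end
    // even when that end lies past end_byte.
    while (pos < end_byte) {
        std::size_t nl = data.find('\n', pos);
        std::size_t stop = (nl == std::string_view::npos) ? data.size() : nl;
        out.emplace_back(data.substr(pos, stop - pos));
        pos = (nl == std::string_view::npos) ? data.size() : nl + 1;
    }
    return out;
}
```

Under this rule, adjacent byte ranges such as [0, 3), [3, 7), and [7, 10) each yield whole lines with no line returned twice, which is what makes byte-based chunking safe for parallel workers.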
Public Types
-
using Iterator = LineIterator<LineBytesRange>¶
Public Functions
-
LineBytesRange() = default¶
Default constructor creates empty range.
-
inline bool has_next() const¶
Check if more lines are available.
-
inline Line next()¶
Get the next line.
- Throws:
std::runtime_error – if no more lines available
- Returns:
Line object with content and line number
-
inline std::size_t current_position() const¶
Get the current line position.
-
inline std::vector<Line> collect()¶
Collect all remaining lines into a vector.
- Returns:
Vector of all lines
-
inline std::vector<Line> take(std::size_t n)¶
Collect up to N lines into a vector.
- Parameters:
n – Maximum number of lines to collect
- Returns:
Vector of up to N lines
-
template<typename Func>
inline void for_each(Func &&func)¶
Apply a function to each line.
- Parameters:
func – Function to apply to each line
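The convenience functions above (collect, take, for_each) can all be expressed in terms of has_next() and next(). The following is a minimal sketch of that relationship using free functions over a hypothetical `CountingRange`. The names `sketch_collect`, `sketch_take`, and `sketch_for_each` are illustrative and are not part of the library.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical range that yields 1, 2, ..., limit.
class CountingRange {
public:
    explicit CountingRange(int limit) : limit_(limit) {}
    bool has_next() const { return current_ < limit_; }
    int next() { return ++current_; }

private:
    int limit_;
    int current_ = 0;
};

// Drain everything that is left in the range.
template <typename Range>
auto sketch_collect(Range& range) {
    std::vector<decltype(range.next())> out;
    while (range.has_next()) out.push_back(range.next());
    return out;
}

// Take at most n items; stops early if the range runs out.
template <typename Range>
auto sketch_take(Range& range, std::size_t n) {
    std::vector<decltype(range.next())> out;
    while (out.size() < n && range.has_next()) out.push_back(range.next());
    return out;
}

// Call func once per remaining item, in order.
template <typename Range, typename Func>
void sketch_for_each(Range& range, Func&& func) {
    while (range.has_next()) func(range.next());
}
```

All three consume the range: after `sketch_take(range, 2)`, a later `sketch_collect(range)` returns only what remains.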
Public Static Functions
-
Create LineBytesRange from indexed file (gzip, tar.gz, etc.) with from_indexed_file(reader, start_byte, end_byte).
Reads lines that fall within the specified byte range. Line boundaries are automatically aligned.
- Parameters:
reader – Shared pointer to Reader
start_byte – Starting byte offset (0-based, inclusive)
end_byte – Ending byte offset (0-based, exclusive)
-
static inline LineBytesRange from_plain_file(const std::string &file_path, std::size_t start_byte, std::size_t end_byte)¶
Create LineBytesRange from plain text file.
Reads lines that fall within the specified byte range. Line boundaries are automatically aligned.
- Parameters:
file_path – Path to plain text file
start_byte – Starting byte offset (0-based, inclusive)
end_byte – Ending byte offset (0-based, exclusive)
-
template<typename Container>
class LineIterator¶
Generic input iterator template for line-by-line access.
This template provides a reusable iterator implementation for any container that implements has_next() and next() methods. It eliminates code duplication across LineRange, LineBytesRange, and IndexedFileLineIterator.
Requirements for Container:
bool has_next() const - Check if more items available
Line next() - Get the next line
Usage:
    class MyLineContainer {
    public:
        using Iterator = LineIterator<MyLineContainer>;

        bool has_next() const { ... }
        Line next() { ... }

        Iterator begin() { return Iterator(this, false); }
        Iterator end() { return Iterator(nullptr, true); }
    };

    // Now you can use range-based for loops:
    MyLineContainer container;
    for (const auto& line : container) {
        std::cout << line.content << "\n";
    }
- Template Parameters:
Container – The container type that provides has_next() and next() methods
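To make the has_next()/next() requirement concrete, here is a minimal sketch of how an input iterator of this kind could be built. `SketchLine`, `SketchLineIterator`, and `VectorLines` are hypothetical names. The eager loading of the first line and the end-state comparison are design assumptions of this sketch, not a description of the actual LineIterator internals.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical line type for this sketch (owning, unlike a string_view).
struct SketchLine {
    std::string content;
    std::size_t line_number;
};

// Minimal input iterator over any container with has_next() and next().
template <typename Container>
class SketchLineIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = SketchLine;
    using difference_type = std::ptrdiff_t;
    using pointer = const SketchLine*;
    using reference = const SketchLine&;

    SketchLineIterator(Container* parent, bool is_end)
        : parent_(parent), is_end_(is_end) {
        if (!is_end_) advance();  // load the first line eagerly
    }

    reference operator*() const { return current_; }
    pointer operator->() const { return &current_; }

    SketchLineIterator& operator++() { advance(); return *this; }
    SketchLineIterator operator++(int) { auto tmp = *this; advance(); return tmp; }

    // Compare by end state only, which is all a range-based for loop needs.
    bool operator==(const SketchLineIterator& o) const { return is_end_ == o.is_end_; }
    bool operator!=(const SketchLineIterator& o) const { return !(*this == o); }

private:
    void advance() {
        if (parent_ && parent_->has_next()) current_ = parent_->next();
        else is_end_ = true;  // exhausted: become equal to end()
    }

    Container* parent_;
    bool is_end_;
    SketchLine current_{};
};

// Hypothetical container that serves lines from a vector.
class VectorLines {
public:
    using Iterator = SketchLineIterator<VectorLines>;
    explicit VectorLines(std::vector<std::string> v) : lines_(std::move(v)) {}
    bool has_next() const { return pos_ < lines_.size(); }
    SketchLine next() { ++pos_; return SketchLine{lines_[pos_ - 1], pos_}; }
    Iterator begin() { return Iterator(this, false); }
    Iterator end() { return Iterator(nullptr, true); }

private:
    std::vector<std::string> lines_;
    std::size_t pos_ = 0;
};
```

Because the iterator is single-pass, calling begin() consumes the first line from the container, so a second loop over the same container continues where the first one stopped rather than restarting.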
Public Types
-
using iterator_category = std::input_iterator_tag¶
-
using difference_type = std::ptrdiff_t¶
Public Functions
-
inline LineIterator(Container *parent, bool is_end)¶
Construct an iterator.
- Parameters:
parent – Pointer to the parent container (nullptr for end iterator)
is_end – True if this is an end iterator
-
inline LineIterator &operator++()¶
Pre-increment operator.
- Returns:
Reference to this iterator after incrementing
-
inline LineIterator operator++(int)¶
Post-increment operator.
- Returns:
Copy of this iterator before incrementing
-
inline bool operator==(const LineIterator &other) const¶
Equality comparison.
- Parameters:
other – The iterator to compare with
- Returns:
True if iterators are equal
-
inline bool operator!=(const LineIterator &other) const¶
Inequality comparison.
- Parameters:
other – The iterator to compare with
- Returns:
True if iterators are not equal
-
class LineRange¶
Type-erased range of lines from various sources.
This class provides a unified interface for iterating over lines from different sources (indexed files, plain files, memory, streams) using std::variant for type erasure. It enables composition and interchangeable use of different line sources.
Usage:
    // From indexed file
    auto reader = ReaderFactory::create("file.gz", "file.gz.idx");
    LineRange range1 = LineRange::from_indexed_file(reader, 1, 100);

    // From plain file
    LineRange range2 = LineRange::from_plain_file("data.txt");

    // Iterate uniformly
    while (range1.has_next()) {
        Line line = range1.next();
        // Process line...
    }
Public Types
-
using Iterator = LineIterator<LineRange>¶
Public Functions
-
LineRange() = default¶
Default constructor creates empty range.
-
inline bool has_next() const¶
Check if more lines are available.
-
inline Line next()¶
Get the next line.
- Throws:
std::runtime_error – if no more lines available
- Returns:
Line object with content and line number
-
inline std::size_t current_position() const¶
Get the current line position (1-based).
-
inline std::vector<Line> collect()¶
Collect all remaining lines into a vector.
- Returns:
Vector of all lines
-
inline std::vector<Line> take(std::size_t n)¶
Collect up to N lines into a vector.
- Parameters:
n – Maximum number of lines to collect
- Returns:
Vector of up to N lines
-
template<typename Func>
inline void for_each(Func &&func)¶
Apply a function to each line.
- Parameters:
func – Function to apply to each line
Public Static Functions
-
static inline LineRange from_indexed_file(const sources::IndexedFileLineIteratorConfig &config)¶
Create LineRange from indexed file (gzip, tar.gz, etc.).
- Parameters:
config – Configuration for the indexed file line iterator
-
static inline LineRange from_plain_file(const std::string &file_path)¶
Create LineRange from plain text file.
- Parameters:
file_path – Path to plain text file
-
static inline LineRange from_plain_file(const std::string &file_path, std::size_t start_line, std::size_t end_line)¶
Create LineRange from plain text file with line range.
- Parameters:
file_path – Path to plain text file
start_line – Starting line number (1-based, inclusive)
end_line – Ending line number (1-based, inclusive)
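The 1-based, inclusive line range above can be sketched with the standard library alone. `read_line_range` and `read_line_range_from_text` are hypothetical helpers. Treating 0 as "unbounded" follows the read_plain() defaults documented further down and is otherwise an assumption of this sketch.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read lines start_line..end_line (1-based, inclusive) from a stream.
// A value of 0 means "from the first line" or "to the last line".
inline std::vector<std::string> read_line_range(std::istream& in,
                                                std::size_t start_line,
                                                std::size_t end_line) {
    std::vector<std::string> out;
    std::string text;
    std::size_t number = 0;
    while (std::getline(in, text)) {
        ++number;
        if (end_line != 0 && number > end_line) break;  // past the range
        if (number >= start_line) out.push_back(text);  // inside the range
    }
    return out;
}

// Convenience wrapper for in-memory text.
inline std::vector<std::string> read_line_range_from_text(const std::string& text,
                                                          std::size_t start_line,
                                                          std::size_t end_line) {
    std::istringstream in(text);
    return read_line_range(in, start_line, end_line);
}
```

With both bounds inclusive, a range of (2, 3) yields two lines, and in general a bounded range covers end_line - start_line + 1 lines when the file is long enough.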
-
struct LineReadInput¶
Input for reading a range of lines from an indexed file.
This structure encapsulates all information needed to read a specific range of lines from an indexed archive (gzip, tar.gz, etc.). Used for lazy evaluation and caching strategies.
Usage:
    auto input = LineReadInput::from_file("data.txt")
                     .with_index("/data/.dftindex")
                     .with_range(10, 100);
Public Functions
-
inline LineReadInput()¶
-
inline LineReadInput(std::string file_path_, std::string idx_path_, std::size_t start_line_, std::size_t end_line_)¶
-
inline LineReadInput &with_index(std::string idx)¶
-
inline LineReadInput &with_range(std::size_t start, std::size_t end)¶
-
inline bool operator==(const LineReadInput &other) const¶
-
inline bool operator!=(const LineReadInput &other) const¶
-
inline std::size_t num_lines() const¶
Public Members
-
std::string file_path¶
-
std::string index_path¶
-
std::size_t start_line¶
-
std::size_t end_line¶
Public Static Functions
-
static inline LineReadInput from_file(std::string path)¶
-
struct Lines¶
Multiple lines of text.
A collection of lines with convenient constructors for building from vectors of strings or Line objects.
-
class StreamingLineReader¶
Composable utility for streaming line reading from various sources.
This utility automatically detects the file format and creates the appropriate line iterator. It supports:
Indexed compressed files (.gz, .tar.gz) via Reader
Plain text files
Automatic .dftindex detection for compressed files
Usage:
    // Auto-detect format
    auto range = StreamingLineReader::read("data.gz");  // Uses index if available
    while (range.has_next()) {
        Line line = range.next();
        // Process line...
    }

    // Explicit line range
    auto range2 = StreamingLineReader::read("data.gz", 100, 200);

    // Force plain file reading (no decompression)
    auto range3 = StreamingLineReader::read_plain("data.txt");
Public Static Functions
-
static inline LineRange read(const StreamingLineReaderConfig &config)¶
Read lines from a file, auto-detecting format and .dftindex.
This method automatically:
Detects if a .dftindex store exists
Creates appropriate reader (indexed or plain)
Returns a LineRange for streaming iteration
- Parameters:
config – Configuration for the line reader
- Returns:
LineRange for streaming iteration
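The detection steps listed for read() can be pictured as a small decision function. This is a sketch under stated assumptions, not the library's logic: `SourceKind`, `looks_compressed`, and `choose_source` are hypothetical names, compression is guessed from a `.gz` extension only, and the case of a compressed file without an index store is reported separately rather than resolved.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical outcome of the detection step.
enum class SourceKind { Indexed, CompressedWithoutIndex, Plain };

// Assumption: a ".gz" extension (which also covers ".tar.gz") means compressed.
inline bool looks_compressed(const std::filesystem::path& file) {
    return file.extension() == ".gz";
}

// Pick a source kind from the file name and the presence of an index store.
inline SourceKind choose_source(const std::filesystem::path& file,
                                const std::filesystem::path& index_store) {
    if (!looks_compressed(file)) return SourceKind::Plain;

    std::error_code ec;  // non-throwing existence check
    const bool has_index =
        !index_store.empty() && std::filesystem::exists(index_store, ec);
    return has_index ? SourceKind::Indexed : SourceKind::CompressedWithoutIndex;
}
```

A caller could map `Indexed` to read_indexed(), `Plain` to read_plain(), and decide separately what to do when no index store is present.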
-
static inline LineRange read_indexed(sources::IndexedFileLineIteratorConfig &config)¶
Read lines from a file using indexed reader.
- Parameters:
config – Indexed reader configuration; the file path and line range (1-based, inclusive, 0 meaning start or end of file) are taken from it rather than passed as separate arguments
- Returns:
LineRange for streaming iteration
-
static inline LineRange read_plain(const std::string &file_path, std::size_t start_line = 0, std::size_t end_line = 0)¶
Read lines from a plain text file (no decompression).
- Parameters:
file_path – Path to the plain text file
start_line – Starting line (1-based, inclusive), 0 means start
end_line – Ending line (1-based, inclusive), 0 means end
- Returns:
LineRange for streaming iteration
-
static inline coro::AsyncGenerator<Line> read_async(const StreamingLineReaderConfig &config)¶
Async read lines from a file, auto-detecting format.
Returns an AsyncGenerator<Line> for non-blocking iteration:
    auto gen = StreamingLineReader::read_async(config);
    while (auto line = co_await gen.next()) {
        co_await process(*line);
    }
-
static inline coro::AsyncGenerator<Line> read_indexed_async(sources::IndexedFileLineIteratorConfig config)¶
Async read lines from indexed file.
-
static inline coro::AsyncGenerator<Line> read_plain_async(const std::string &file_path, std::size_t start_line = 0, std::size_t end_line = 0)¶
Async read lines from plain text file.
-
static inline coro::AsyncGenerator<Line> read_streaming_gz_async(const std::string &file_path, std::size_t start_line = 0, std::size_t end_line = 0)¶
Async read lines from compressed file without an index.
Stream-decompresses the file and splits into lines in a single pass, avoiding the overhead of building a .dftindex store.
-
class StreamingLineReaderConfig¶
Configuration for StreamingLineReader with fluent API.
Usage:
    auto config = StreamingLineReaderConfig()
                      .with_file("file.gz")
                      .with_index("trace-root/.dftindex")
                      .with_line_range(1, 100);
    auto range = StreamingLineReader::read(config);
Public Functions
-
StreamingLineReaderConfig() = default¶
-
inline StreamingLineReaderConfig &with_file(const std::string &file_path)¶
-
inline StreamingLineReaderConfig &with_index(const std::string &index_path)¶
-
inline StreamingLineReaderConfig &with_line_range(std::size_t start_line, std::size_t end_line)¶
-
inline const std::string &file_path() const¶
-
inline const std::string &index_path() const¶
-
inline std::size_t start_line() const¶
-
inline std::size_t end_line() const¶
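The setters above return a reference to the config, which is what allows the chained calls in the usage example. A minimal sketch of that fluent pattern follows. `SketchConfig` is a hypothetical class that mirrors the member names listed here. The default values of 0 for both line bounds are an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fluent configuration mirroring the members listed above.
class SketchConfig {
public:
    SketchConfig() = default;

    // Each setter returns *this so calls can be chained.
    SketchConfig& with_file(const std::string& file_path) {
        file_path_ = file_path;
        return *this;
    }
    SketchConfig& with_index(const std::string& index_path) {
        index_path_ = index_path;
        return *this;
    }
    SketchConfig& with_line_range(std::size_t start_line, std::size_t end_line) {
        start_line_ = start_line;
        end_line_ = end_line;
        return *this;
    }

    const std::string& file_path() const { return file_path_; }
    const std::string& index_path() const { return index_path_; }
    std::size_t start_line() const { return start_line_; }
    std::size_t end_line() const { return end_line_; }

private:
    std::string file_path_;
    std::string index_path_;
    std::size_t start_line_ = 0;  // assumed default: no lower bound
    std::size_t end_line_ = 0;    // assumed default: no upper bound
};
```

Writing `auto config = SketchConfig().with_file("file.gz").with_line_range(1, 100);` copies the temporary into `config` before the temporary is destroyed, so chaining on a freshly constructed object is safe as long as the result is stored by value.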