Lines

Namespace: dftracer::utils::utilities::fileio::lines

struct Line

Represents a single line of text with metadata.

This structure holds a line’s content along with its line number, enabling easy tracking and processing of lines from various sources.

NOTE: content is a std::string_view, so reading a line makes no copy. The view is only valid until the next call to next() on the iterator that produced it; copy it into a std::string if it must outlive that call.
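The lifetime rule in the note can be illustrated with a small sketch. It does not use the library headers: Line below is a stand-in with the two documented members, and OwnedLine, to_owned and copy_lines_from_reused_buffer are hypothetical names introduced only for this example. The loop imitates a reader that reuses one buffer per line, which is an assumption about how the view might become invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Stand-in with the same two members as the documented Line struct.
struct Line {
    std::string_view content;
    std::size_t line_number = 0;
};

// Hypothetical owning counterpart: safe to keep after the buffer changes.
struct OwnedLine {
    std::string content;
    std::size_t line_number = 0;
};

inline OwnedLine to_owned(const Line& line) {
    return OwnedLine{std::string(line.content), line.line_number};
}

// Imitates a reader that reuses one buffer for every line it yields.
inline std::vector<OwnedLine> copy_lines_from_reused_buffer() {
    const char* inputs[] = {"first", "second"};
    std::string buffer;
    std::vector<OwnedLine> kept;
    std::size_t number = 0;
    for (const char* text : inputs) {
        buffer = text;  // overwrites whatever the previous view pointed at
        Line view{buffer, ++number};
        assert(view.content.data() == buffer.data());  // a view, not a copy
        kept.push_back(to_owned(view));  // copy before the next overwrite
    }
    return kept;
}
```

Storing the Line objects themselves in the vector would leave every entry pointing at the same reused buffer.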

Public Functions

inline Line()
inline Line(std::string_view content_, std::size_t line_number_)
inline bool empty() const
inline std::size_t size() const

Public Members

std::string_view content
std::size_t line_number

class LineBytesRange

Type-erased range of lines from byte boundaries.

This class provides a unified interface for iterating over lines within byte ranges from different sources (indexed files, plain files) using std::variant for type erasure. It mirrors the design of LineRange but operates on byte offsets instead of line numbers.

Line boundaries are automatically aligned to ensure complete lines.
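The exact alignment rule is not spelled out here. One plausible policy, shown purely as an assumption, is that a line belongs to a byte range when its first byte falls in [start_byte, end_byte). Under that policy adjacent ranges split a file without dropping or duplicating lines. ByteSpan, line_start_at_or_after, align_to_lines and aligned_text are hypothetical helpers that work on an in-memory buffer, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

struct ByteSpan {
    std::size_t begin;
    std::size_t end;  // exclusive
};

// First line start at or after pos (data.size() if there is none).
inline std::size_t line_start_at_or_after(std::string_view data,
                                          std::size_t pos) {
    if (pos == 0) return 0;
    if (pos >= data.size()) return data.size();
    if (data[pos - 1] == '\n') return pos;  // pos already starts a line
    const std::size_t newline = data.find('\n', pos);
    return newline == std::string_view::npos ? data.size() : newline + 1;
}

// Assumed policy: keep every line whose first byte is in [start, end).
inline ByteSpan align_to_lines(std::string_view data, std::size_t start_byte,
                               std::size_t end_byte) {
    assert(start_byte <= end_byte);
    return ByteSpan{line_start_at_or_after(data, start_byte),
                    line_start_at_or_after(data, end_byte)};
}

// Convenience wrapper returning the aligned text itself.
inline std::string_view aligned_text(std::string_view data,
                                     std::size_t start_byte,
                                     std::size_t end_byte) {
    const ByteSpan span = align_to_lines(data, start_byte, end_byte);
    return data.substr(span.begin, span.end - span.begin);
}
```

With the text "aa\nbbb\ncc\n", the requests (0, 1), (1, 4) and (4, 10) yield the first, second and third line respectively under this policy.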

Usage:

// From indexed file with byte range
auto reader = ReaderFactory::create("file.gz", "/data/.dftindex");
LineBytesRange range1 = LineBytesRange::from_indexed_file(reader, 1000, 5000);

// From plain file with byte range
LineBytesRange range2 = LineBytesRange::from_plain_file("data.txt", 1000, 5000);

// Iterate uniformly
while (range1.has_next()) {
    Line line = range1.next();
    // Process line...
}

Public Types

using Iterator = LineIterator<LineBytesRange>

Public Functions

LineBytesRange() = default

Default constructor creates empty range.

inline bool has_next() const

Check if more lines are available.

inline Line next()

Get the next line.

Throws:

std::runtime_error – if no more lines are available

Returns:

Line object with content and line number

inline std::size_t current_position() const

Get the current line position.

inline std::vector<Line> collect()

Collect all remaining lines into a vector.

Returns:

Vector of all lines

inline std::vector<Line> take(std::size_t n)

Collect up to N lines into a vector.

Parameters:

n – Maximum number of lines to collect

Returns:

Vector of up to N lines

template<typename Func>
inline void for_each(Func &&func)

Apply a function to each line.

Parameters:

func – Function to apply to each line

template<typename Predicate>
inline std::vector<Line> filter(Predicate &&predicate)

Filter lines based on a predicate.

Parameters:

predicate – Function that returns true for lines to keep

Returns:

Vector of lines that match the predicate
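Because the range is consumed through has_next() and next(), helpers such as take(), filter() and collect() presumably advance it as they go. The sketch below shows that single-pass behavior on a hypothetical in-memory stand-in, VectorLineRange. It mirrors the documented member names but is not the library class, and 1-based line numbering is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct Line {
    std::string_view content;
    std::size_t line_number = 0;
};

// Hypothetical in-memory range. It owns its strings, so the views stay
// valid while the range object is alive.
class VectorLineRange {
   public:
    explicit VectorLineRange(std::vector<std::string> data)
        : data_(std::move(data)) {}

    bool has_next() const { return pos_ < data_.size(); }

    Line next() {
        if (!has_next()) throw std::runtime_error("no more lines available");
        Line line{data_[pos_], pos_ + 1};  // assumed 1-based numbering
        ++pos_;
        return line;
    }

    // Consumes at most n lines.
    std::vector<Line> take(std::size_t n) {
        std::vector<Line> out;
        while (out.size() < n && has_next()) out.push_back(next());
        assert(out.size() <= n);
        return out;
    }

    // Consumes every remaining line, keeping those that match.
    template <typename Predicate>
    std::vector<Line> filter(Predicate&& predicate) {
        std::vector<Line> out;
        while (has_next()) {
            Line line = next();
            if (predicate(line)) out.push_back(line);
        }
        return out;
    }

    // Consumes and returns every remaining line.
    std::vector<Line> collect() {
        return filter([](const Line&) { return true; });
    }

   private:
    std::vector<std::string> data_;
    std::size_t pos_ = 0;
};
```

After filter() or collect() returns, has_next() is false, so a second call yields an empty vector.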

inline Iterator begin()

Get an iterator to the beginning.

inline Iterator end()

Get an iterator to the end.

Public Static Functions

static inline LineBytesRange from_indexed_file(std::shared_ptr<reader::internal::Reader> reader, std::size_t start_byte, std::size_t end_byte)

Create LineBytesRange from indexed file (gzip, tar.gz, etc.).

Reads lines that fall within the specified byte range. Line boundaries are automatically aligned.

Parameters:
  • reader – Shared pointer to Reader

  • start_byte – Starting byte offset (0-based, inclusive)

  • end_byte – Ending byte offset (0-based, exclusive)

static inline LineBytesRange from_plain_file(const std::string &file_path, std::size_t start_byte, std::size_t end_byte)

Create LineBytesRange from plain text file.

Reads lines that fall within the specified byte range. Line boundaries are automatically aligned.

Parameters:
  • file_path – Path to plain text file

  • start_byte – Starting byte offset (0-based, inclusive)

  • end_byte – Ending byte offset (0-based, exclusive)

template<typename Container>
class LineIterator

Generic input iterator template for line-by-line access.

This template provides a reusable iterator implementation for any container that implements has_next() and next() methods. It eliminates code duplication across LineRange, LineBytesRange, and IndexedFileLineIterator.

Requirements for Container:

  • bool has_next() const - Check if more items available

  • Line next() - Get the next line

Usage:

class MyLineContainer {
   public:
    using Iterator = LineIterator<MyLineContainer>;

    bool has_next() const { ... }
    Line next() { ... }

    Iterator begin() { return Iterator(this, false); }
    Iterator end() { return Iterator(nullptr, true); }
};

// Now you can use range-based for loops:
MyLineContainer container;
for (const auto& line : container) {
    std::cout << line.content << "\n";
}

Template Parameters:

Container – The container type that provides has_next() and next() methods
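How such an adapter can be built on top of has_next() and next() is sketched below. SketchLineIterator and WordLines are hypothetical names, and the internals (caching the current line, flipping to the end state when the container is exhausted) are an assumption about one reasonable design rather than the library's implementation. Post-increment is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct Line {
    std::string_view content;
    std::size_t line_number = 0;
};

template <typename Container>
class SketchLineIterator {
   public:
    using iterator_category = std::input_iterator_tag;
    using value_type = Line;
    using difference_type = std::ptrdiff_t;
    using pointer = const Line*;
    using reference = const Line&;

    SketchLineIterator(Container* parent, bool is_end)
        : parent_(parent), is_end_(is_end) {
        if (!is_end_) advance();  // load the first line eagerly
    }

    reference operator*() const { return current_; }
    pointer operator->() const { return &current_; }
    SketchLineIterator& operator++() {
        advance();
        return *this;
    }

    // All end iterators compare equal; otherwise compare the parent.
    bool operator==(const SketchLineIterator& other) const {
        return is_end_ == other.is_end_ &&
               (is_end_ || parent_ == other.parent_);
    }
    bool operator!=(const SketchLineIterator& other) const {
        return !(*this == other);
    }

   private:
    void advance() {
        if (parent_ != nullptr && parent_->has_next()) {
            current_ = parent_->next();
        } else {
            is_end_ = true;  // exhausted: now equal to end()
        }
    }

    Container* parent_;
    bool is_end_;
    Line current_{};
};

// Hypothetical container that satisfies the two requirements.
class WordLines {
   public:
    using Iterator = SketchLineIterator<WordLines>;
    explicit WordLines(std::vector<std::string> words)
        : words_(std::move(words)) {}
    bool has_next() const { return pos_ < words_.size(); }
    Line next() {
        assert(has_next());
        Line line{words_[pos_], pos_ + 1};
        ++pos_;
        return line;
    }
    Iterator begin() { return Iterator(this, false); }
    Iterator end() { return Iterator(nullptr, true); }

   private:
    std::vector<std::string> words_;
    std::size_t pos_ = 0;
};

// Range-based for works because begin()/end() return the adapter.
inline std::string join_lines(WordLines& container) {
    std::string out;
    for (const auto& line : container) {
        out.append(line.content);
        out.push_back('\n');
    }
    return out;
}
```

Since this is an input iterator, the container can be traversed once; a second loop over the same object sees no lines.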

Public Types

using iterator_category = std::input_iterator_tag
using value_type = Line
using difference_type = std::ptrdiff_t
using pointer = const Line*
using reference = const Line&

Public Functions

inline LineIterator(Container *parent, bool is_end)

Construct an iterator.

Parameters:
  • parent – Pointer to the parent container (nullptr for end iterator)

  • is_end – True if this is an end iterator

inline reference operator*() const

Dereference operator.

Returns:

Reference to the current line

inline pointer operator->() const

Member access operator.

Returns:

Pointer to the current line

inline LineIterator &operator++()

Pre-increment operator.

Returns:

Reference to this iterator after incrementing

inline LineIterator operator++(int)

Post-increment operator.

Returns:

Copy of this iterator before incrementing

inline bool operator==(const LineIterator &other) const

Equality comparison.

Parameters:

other – The iterator to compare with

Returns:

True if iterators are equal

inline bool operator!=(const LineIterator &other) const

Inequality comparison.

Parameters:

other – The iterator to compare with

Returns:

True if iterators are not equal

class LineRange

Type-erased range of lines from various sources.

This class provides a unified interface for iterating over lines from different sources (indexed files, plain files, memory, streams) using std::variant for type erasure. It enables composition and interchangeable use of different line sources.

Usage:

// From indexed file
auto reader = ReaderFactory::create("file.gz", "file.gz.idx");
LineRange range1 = LineRange::from_indexed_file(reader, 1, 100);

// From plain file
LineRange range2 = LineRange::from_plain_file("data.txt");

// Iterate uniformly
while (range1.has_next()) {
    Line line = range1.next();
    // Process line...
}
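The type-erasure idea can be sketched with std::variant and std::visit. Everything below is a stand-in: MemorySource, TextSource and SketchLineRange are hypothetical, and the real class wraps file-backed sources instead. The point is only that callers see a single has_next()/next() interface while the variant selects the concrete source.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

struct Line {
    std::string_view content;
    std::size_t line_number = 0;
};

// Hypothetical source 1: lines already split and held in memory.
struct MemorySource {
    std::vector<std::string> lines;
    std::size_t pos = 0;
    bool has_next() const { return pos < lines.size(); }
    Line next() {
        assert(has_next());
        Line line{lines[pos], pos + 1};
        ++pos;
        return line;
    }
};

// Hypothetical source 2: one text blob, split on '\n' on demand.
struct TextSource {
    std::string text;
    std::size_t pos = 0;
    std::size_t line_number = 0;
    bool has_next() const { return pos < text.size(); }
    Line next() {
        assert(has_next());
        const std::size_t newline = text.find('\n', pos);
        const bool found = newline != std::string::npos;
        const std::size_t end = found ? newline : text.size();
        Line line{std::string_view(text).substr(pos, end - pos), ++line_number};
        pos = found ? newline + 1 : text.size();
        return line;
    }
};

class SketchLineRange {
   public:
    SketchLineRange() = default;  // holds std::monostate: an empty range

    static SketchLineRange from_memory(std::vector<std::string> lines) {
        return SketchLineRange(Source(MemorySource{std::move(lines)}));
    }
    static SketchLineRange from_text(std::string text) {
        return SketchLineRange(Source(TextSource{std::move(text)}));
    }

    bool has_next() const {
        return std::visit(
            [](const auto& source) -> bool {
                using T = std::decay_t<decltype(source)>;
                if constexpr (std::is_same_v<T, std::monostate>) {
                    return false;
                } else {
                    return source.has_next();
                }
            },
            source_);
    }

    Line next() {
        if (!has_next()) throw std::runtime_error("no more lines available");
        return std::visit(
            [](auto& source) -> Line {
                using T = std::decay_t<decltype(source)>;
                if constexpr (std::is_same_v<T, std::monostate>) {
                    return Line{};  // unreachable: has_next() was false
                } else {
                    return source.next();
                }
            },
            source_);
    }

   private:
    using Source = std::variant<std::monostate, MemorySource, TextSource>;
    explicit SketchLineRange(Source source) : source_(std::move(source)) {}
    Source source_;
};

// Works for any source because it only uses the shared interface.
inline std::size_t count_lines(SketchLineRange range) {
    std::size_t count = 0;
    while (range.has_next()) {
        range.next();
        ++count;
    }
    return count;
}
```

Compared with a virtual base class, the variant keeps the range a value type with no heap allocation for the source, at the cost of listing every source type up front.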

Public Types

using Iterator = LineIterator<LineRange>

Public Functions

LineRange() = default

Default constructor creates empty range.

inline bool has_next() const

Check if more lines are available.

inline Line next()

Get the next line.

Throws:

std::runtime_error – if no more lines are available

Returns:

Line object with content and line number

inline std::size_t current_position() const

Get the current line position (1-based).

inline std::vector<Line> collect()

Collect all remaining lines into a vector.

Returns:

Vector of all lines

inline std::vector<Line> take(std::size_t n)

Collect up to N lines into a vector.

Parameters:

n – Maximum number of lines to collect

Returns:

Vector of up to N lines

template<typename Func>
inline void for_each(Func &&func)

Apply a function to each line.

Parameters:

func – Function to apply to each line

template<typename Predicate>
inline std::vector<Line> filter(Predicate &&predicate)

Filter lines based on a predicate.

Parameters:

predicate – Function that returns true for lines to keep

Returns:

Vector of lines that match the predicate

inline Iterator begin()

Get an iterator to the beginning.

inline Iterator end()

Get an iterator to the end.

Public Static Functions

static inline LineRange from_indexed_file(const sources::IndexedFileLineIteratorConfig &config)

Create LineRange from indexed file (gzip, tar.gz, etc.).

Parameters:

config – Configuration for the indexed file line iterator

static inline LineRange from_plain_file(const std::string &file_path)

Create LineRange from plain text file.

Parameters:

file_path – Path to plain text file

static inline LineRange from_plain_file(const std::string &file_path, std::size_t start_line, std::size_t end_line)

Create LineRange from plain text file with line range.

Parameters:
  • file_path – Path to plain text file

  • start_line – Starting line number (1-based, inclusive)

  • end_line – Ending line number (1-based, inclusive)

struct LineReadInput

Input for reading a range of lines from an indexed file.

This structure encapsulates all information needed to read a specific range of lines from an indexed archive (gzip, tar.gz, etc.). Used for lazy evaluation and caching strategies.

Usage:

auto input = LineReadInput::from_file("data.txt")
                 .with_index("/data/.dftindex")
                 .with_range(10, 100);
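The fluent calls work because each with_* member returns a reference to the same object. The stand-in below, SketchLineReadInput, mirrors the documented members to show the chaining and the equality comparison that makes the struct usable as a cache key. The num_lines() formula assumes a 1-based, inclusive range; the source does not state this for LineReadInput, so treat it as a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

struct SketchLineReadInput {
    std::string file_path;
    std::string index_path;
    std::size_t start_line = 0;
    std::size_t end_line = 0;

    SketchLineReadInput() = default;
    SketchLineReadInput(std::string file_path_, std::string idx_path_,
                        std::size_t start_line_, std::size_t end_line_)
        : file_path(std::move(file_path_)),
          index_path(std::move(idx_path_)),
          start_line(start_line_),
          end_line(end_line_) {}

    static SketchLineReadInput from_file(std::string path) {
        SketchLineReadInput input;
        input.file_path = std::move(path);
        return input;
    }

    // Each setter returns *this so calls can be chained.
    SketchLineReadInput& with_index(std::string idx) {
        index_path = std::move(idx);
        return *this;
    }
    SketchLineReadInput& with_range(std::size_t start, std::size_t end) {
        start_line = start;
        end_line = end;
        return *this;
    }

    bool operator==(const SketchLineReadInput& other) const {
        return file_path == other.file_path &&
               index_path == other.index_path &&
               start_line == other.start_line && end_line == other.end_line;
    }
    bool operator!=(const SketchLineReadInput& other) const {
        return !(*this == other);
    }

    // Assumption: both bounds are 1-based and inclusive.
    std::size_t num_lines() const {
        if (start_line == 0 || end_line < start_line) return 0;
        return end_line - start_line + 1;
    }
};

// Usage: the chained form and the constructor describe the same request.
inline bool chained_matches_constructor() {
    auto chained = SketchLineReadInput::from_file("data.txt")
                       .with_index("/data/.dftindex")
                       .with_range(10, 100);
    assert(chained.num_lines() == 91);
    return chained == SketchLineReadInput("data.txt", "/data/.dftindex", 10, 100);
}
```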

Public Functions

inline LineReadInput()
inline LineReadInput(std::string file_path_, std::string idx_path_, std::size_t start_line_, std::size_t end_line_)
inline LineReadInput &with_index(std::string idx)
inline LineReadInput &with_range(std::size_t start, std::size_t end)
inline bool operator==(const LineReadInput &other) const
inline bool operator!=(const LineReadInput &other) const
inline std::size_t num_lines() const

Public Members

std::string file_path
std::string index_path
std::size_t start_line
std::size_t end_line

Public Static Functions

static inline LineReadInput from_file(std::string path)

struct Lines

Multiple lines of text.

A collection of lines with convenient constructors for building from vectors of strings or Line objects.
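The pairing of a lines member with a storage member suggests that, when built from strings, the struct keeps the strings alive and points each Line at them. The sketch below shows that arrangement on a hypothetical SketchLines type. The 1-based numbering and the deleted copy operations are choices of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct Line {
    std::string_view content;
    std::size_t line_number = 0;
};

struct SketchLines {
    std::vector<Line> lines;
    std::vector<std::string> storage;  // owns the text the views point into

    SketchLines() = default;

    explicit SketchLines(std::vector<std::string>&& strings)
        : storage(std::move(strings)) {
        lines.reserve(storage.size());
        for (std::size_t i = 0; i < storage.size(); ++i) {
            lines.push_back(Line{storage[i], i + 1});  // assumed 1-based
        }
        assert(lines.size() == storage.size());
    }

    // Sketch choice: a member-wise copy would leave the copy's views
    // pointing into the original's storage, so copying is disabled here.
    SketchLines(const SketchLines&) = delete;
    SketchLines& operator=(const SketchLines&) = delete;

    // Moving a vector hands over its heap buffer, so the string objects
    // (and the characters the views refer to) stay where they are.
    SketchLines(SketchLines&&) = default;
    SketchLines& operator=(SketchLines&&) = default;

    std::size_t size() const { return lines.size(); }
    bool empty() const { return lines.empty(); }
};
```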

Public Functions

Lines() = default
inline explicit Lines(std::vector<Line> ls)
inline explicit Lines(const std::vector<std::string> &strings)
inline explicit Lines(std::vector<std::string> &&strings)
inline std::size_t size() const
inline bool empty() const

Public Members

std::vector<Line> lines
std::vector<std::string> storage

class StreamingLineReader

Composable utility for streaming line reading from various sources.

This utility automatically detects the file format and creates the appropriate line iterator. It supports:

  • Indexed compressed files (.gz, .tar.gz) via Reader

  • Plain text files

  • Automatic .dftindex detection for compressed files

Usage:

// Auto-detect format
auto range = StreamingLineReader::read("data.gz");  // Uses index if available
while (range.has_next()) {
    Line line = range.next();
    // Process line...
}

// Explicit line range
auto range2 = StreamingLineReader::read("data.gz", 100, 200);

// Force plain file reading (no decompression)
auto range3 = StreamingLineReader::read_plain("data.txt");

Public Static Functions

static inline LineRange read(const StreamingLineReaderConfig &config)

Read lines from a file, auto-detecting format and .dftindex.

This method automatically:

  1. Detects if a .dftindex store exists

  2. Creates appropriate reader (indexed or plain)

  3. Returns a LineRange for streaming iteration
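The decision in the steps above can be approximated as a check for the index store on disk. This is a guess at the shape of the logic, not the library's code: ReaderKind and choose_reader are hypothetical, and the sketch ignores any validation of the index contents.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

enum class ReaderKind { Indexed, Plain };

// Pick the indexed reader only when an index store is present.
inline ReaderKind choose_reader(const fs::path& file,
                                const fs::path& index_store) {
    assert(!file.empty());
    std::error_code ec;  // non-throwing overload: errors count as "no index"
    const bool has_index =
        !index_store.empty() && fs::exists(index_store, ec) && !ec;
    return has_index ? ReaderKind::Indexed : ReaderKind::Plain;
}
```

Using the std::error_code overload of exists() keeps a permissions problem on the index path from throwing; the sketch simply falls back to the plain reader in that case.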

Parameters:

config – Configuration for the line reader

Returns:

LineRange for streaming iteration

static inline LineRange read_indexed(sources::IndexedFileLineIteratorConfig &config)

Read lines from a file using indexed reader.

Parameters:
  • file_path – Path to the compressed file

  • config – Indexed reader configuration

  • start_line – Starting line (1-based, inclusive), 0 means start

  • end_line – Ending line (1-based, inclusive), 0 means end

Returns:

LineRange for streaming iteration

static inline LineRange read_plain(const std::string &file_path, std::size_t start_line = 0, std::size_t end_line = 0)

Read lines from a plain text file (no decompression).

Parameters:
  • file_path – Path to the plain text file

  • start_line – Starting line (1-based, inclusive), 0 means start

  • end_line – Ending line (1-based, inclusive), 0 means end

Returns:

LineRange for streaming iteration
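The 0 sentinel for start_line and end_line can be read as "no bound on this side". A minimal sketch of that reading follows; in_line_range and count_selected are hypothetical helpers, and how the library itself applies the bounds is not shown in this reference.

```cpp
#include <cassert>
#include <cstddef>

// True if a 1-based line number falls inside [start_line, end_line],
// where 0 on either side means that side is unbounded.
inline bool in_line_range(std::size_t line_number, std::size_t start_line,
                          std::size_t end_line) {
    assert(line_number >= 1);
    const bool after_start = start_line == 0 || line_number >= start_line;
    const bool before_end = end_line == 0 || line_number <= end_line;
    return after_start && before_end;
}

// Number of lines selected from a file with total_lines lines.
inline std::size_t count_selected(std::size_t total_lines,
                                  std::size_t start_line,
                                  std::size_t end_line) {
    std::size_t count = 0;
    for (std::size_t n = 1; n <= total_lines; ++n) {
        if (in_line_range(n, start_line, end_line)) ++count;
    }
    return count;
}
```

With both arguments left at their default of 0, every line is selected.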

static inline coro::AsyncGenerator<Line> read_async(const StreamingLineReaderConfig &config)

Async read lines from a file, auto-detecting format.

Returns an AsyncGenerator<Line> for non-blocking iteration:

auto gen = StreamingLineReader::read_async(config);
while (auto line = co_await gen.next()) {
    co_await process(*line);
}

static inline coro::AsyncGenerator<Line> read_indexed_async(sources::IndexedFileLineIteratorConfig config)

Async read lines from indexed file.

static inline coro::AsyncGenerator<Line> read_plain_async(const std::string &file_path, std::size_t start_line = 0, std::size_t end_line = 0)

Async read lines from plain text file.

static inline coro::AsyncGenerator<Line> read_streaming_gz_async(const std::string &file_path, std::size_t start_line = 0, std::size_t end_line = 0)

Async read lines from compressed file without an index.

Stream-decompresses the file and splits into lines in a single pass, avoiding the overhead of building a .dftindex store.
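The line-splitting half of that single pass can be sketched independently of decompression. The hypothetical ChunkLineSplitter below accepts chunks as they would come out of a decompressor and emits complete lines, carrying a partial line across chunk boundaries. The decompression step and the coroutine plumbing are deliberately left out, and split_chunks is only a helper for demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

class ChunkLineSplitter {
   public:
    // Feed one chunk; sink(line, line_number) runs once per complete line.
    template <typename Sink>
    void feed(std::string_view chunk, Sink&& sink) {
        assert(!finished_);
        std::size_t pos = 0;
        while (pos < chunk.size()) {
            const std::size_t newline = chunk.find('\n', pos);
            if (newline == std::string_view::npos) {
                pending_.append(chunk.substr(pos));  // partial line: keep it
                return;
            }
            pending_.append(chunk.substr(pos, newline - pos));
            sink(std::string_view(pending_), ++line_number_);
            pending_.clear();
            pos = newline + 1;
        }
    }

    // Flush a final line that has no trailing newline.
    template <typename Sink>
    void finish(Sink&& sink) {
        finished_ = true;
        if (!pending_.empty()) {
            sink(std::string_view(pending_), ++line_number_);
            pending_.clear();
        }
    }

   private:
    std::string pending_;
    std::size_t line_number_ = 0;
    bool finished_ = false;
};

// Demonstration helper: run all chunks through the splitter.
inline std::vector<std::string> split_chunks(
    const std::vector<std::string>& chunks) {
    std::vector<std::string> lines;
    ChunkLineSplitter splitter;
    auto sink = [&lines](std::string_view line, std::size_t) {
        lines.emplace_back(line);
    };
    for (const std::string& chunk : chunks) splitter.feed(chunk, sink);
    splitter.finish(sink);
    return lines;
}
```

The view handed to the sink refers to the splitter's internal buffer and is cleared right after the call, which matches the short lifetime described for Line::content.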

class StreamingLineReaderConfig

Configuration for StreamingLineReader with fluent API.

Usage:

auto config = StreamingLineReaderConfig()
    .with_file("file.gz")
    .with_index("trace-root/.dftindex")
    .with_line_range(1, 100);

auto range = StreamingLineReader::read(config);

Public Functions

StreamingLineReaderConfig() = default
inline StreamingLineReaderConfig &with_file(const std::string &file_path)
inline StreamingLineReaderConfig &with_index(const std::string &index_path)
inline StreamingLineReaderConfig &with_line_range(std::size_t start_line, std::size_t end_line)
inline const std::string &file_path() const
inline const std::string &index_path() const
inline std::size_t start_line() const
inline std::size_t end_line() const