Text Processing
Namespace: dftracer::utils::utilities::text
-
struct FilterableLine
A line with a predicate function for filtering.
Public Functions
-
FilterableLine() = default
-
class LineFilterUtility : public dftracer::utils::utilities::Utility<FilterableLine, std::optional<Line>>
Utility that filters lines based on a predicate function.
This utility takes a FilterableLine (a line plus a predicate) and returns the line only if it passes the predicate. Otherwise, it returns an empty optional (std::nullopt).
Features:
Flexible predicate-based filtering
Returns std::optional for composability
Can be used in map/filter pipelines
Can be tagged with Retryable and Monitored behaviors (note: Cacheable is not recommended because the input contains a std::function)
Usage:
    auto filter = std::make_shared<LineFilterUtility>();
    auto predicate = [](const Line& line) {
        return line.content.find("ERROR") != std::string::npos;
    };
    FilterableLine input{Line{"ERROR: Something went wrong"}, predicate};
    // process() returns a coro::CoroTask, so await it from inside a coroutine.
    auto result = co_await filter->process(input);
    if (result.has_value()) {
        std::cout << "Matched: " << result->content << "\n";
    }
Filtering multiple lines:
    auto is_error = [](const Line& line) {
        return line.content.find("ERROR") != std::string::npos;
    };
    Lines input = get_lines();
    std::vector<Line> filtered;
    for (const auto& line : input.lines) {
        // Await each call, since process() returns a coro::CoroTask.
        auto result = co_await filter->process(FilterableLine{line, is_error});
        if (result.has_value()) {
            filtered.push_back(*result);
        }
    }
Public Functions
-
LineFilterUtility() = default
-
~LineFilterUtility() override = default
-
inline coro::CoroTask<std::optional<Line>> process(const FilterableLine &input) override
Filter a line based on predicate.
- Parameters:
input – Line with predicate function
- Returns:
Optional line (holds a value only if the predicate returns true)
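The filtering rule described above can be illustrated with a minimal, synchronous sketch. The types in namespace `sketch_filter` are hypothetical stand-ins, not the library's own `Line` and `FilterableLine` (whose exact fields are not shown here), and `filter_line` is an illustrative free function rather than the coroutine-based `process()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

namespace sketch_filter {

// Hypothetical stand-in for the library's Line type; real fields may differ.
struct Line {
    std::string content;
    std::size_t line_number = 0;
};

// Hypothetical stand-in for FilterableLine: a line paired with its predicate.
struct FilterableLine {
    Line line;
    std::function<bool(const Line&)> predicate;
};

// Returns the line when the predicate accepts it, std::nullopt otherwise.
// Plain synchronous function; the documented process() is a coroutine.
inline std::optional<Line> filter_line(const FilterableLine& input) {
    // An empty std::function would throw when called, so reject it early.
    assert(input.predicate && "predicate must be set");
    if (input.predicate(input.line)) {
        return input.line;
    }
    return std::nullopt;
}

}  // namespace sketch_filter
```

Returning `std::optional` rather than a bool keeps the accepted line attached to the result, which is what lets the call be chained in map/filter style pipelines.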
-
class LineSplitterUtility : public dftracer::utils::utilities::Utility<Text, Lines, utilities::tags::Parallelizable>
Utility that splits text into individual lines.
This utility takes multi-line text and splits it into a vector of lines, preserving line numbers. It can be used standalone or composed in pipelines.
Features:
Splits on '\n' characters
Assigns line numbers (1-indexed)
Handles empty lines
Can be tagged with Cacheable, Retryable, and Monitored behaviors
Usage:
    auto splitter = std::make_shared<LineSplitterUtility>();
    Text input("Line 1\nLine 2\nLine 3");
    Lines output = splitter->process(input);
    for (const auto& line : output.lines) {
        std::cout << line.line_number << ": " << line.content << "\n";
    }
With pipeline:
    Pipeline pipeline;
    auto task = use(splitter).emit_on(pipeline);
    auto output = SequentialExecutor().execute(pipeline, Text{"data"});
    auto lines = output.get<Lines>(task.id());
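The splitting behavior listed under Features (split on '\n', 1-indexed line numbers, empty lines kept) can be sketched in a few lines of standalone C++. The types in namespace `sketch_split` are hypothetical stand-ins for `Line` and `Lines`, and `split_lines` is an illustrative function, not the utility's implementation. How the real utility treats a trailing newline is not stated; this sketch assumes it does not produce an extra empty line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch_split {

// Hypothetical stand-ins for the library's Line and Lines types.
struct Line {
    std::string content;
    std::size_t line_number = 0;
};

struct Lines {
    std::vector<Line> lines;
};

// Splits text on '\n' and numbers the resulting lines starting at 1.
// Empty lines between newlines are kept; a single trailing newline is
// assumed not to add an extra empty line (an assumption of this sketch).
inline Lines split_lines(const std::string& text) {
    Lines out;
    std::size_t start = 0;
    std::size_t number = 1;
    while (start < text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string::npos) {
            end = text.size();  // last line has no terminating newline
        }
        out.lines.push_back(Line{text.substr(start, end - start), number++});
        start = end + 1;  // skip past the newline
    }
    // Postcondition: numbering is contiguous and 1-indexed.
    assert(out.lines.empty() || out.lines.back().line_number == out.lines.size());
    return out;
}

}  // namespace sketch_split
```

Storing the line number alongside the content means later filtering steps can drop lines without losing track of where the survivors came from.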
-
class MultiLinesFilterUtility : public dftracer::utils::utilities::Utility<Lines, Lines, utilities::tags::Parallelizable>
Utility that filters multiple lines based on a predicate.
This is a batch version that processes all lines at once.
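A batch filter of this kind might look like the sketch below. It uses hypothetical stand-in types in namespace `sketch_batch`, and it passes the predicate as a function parameter purely for illustration; how MultiLinesFilterUtility actually receives its predicate is not shown in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

namespace sketch_batch {

// Hypothetical stand-ins for the library's Line and Lines types.
struct Line {
    std::string content;
    std::size_t line_number = 0;
};

struct Lines {
    std::vector<Line> lines;
};

// Keeps every line accepted by the predicate, in input order.
// Original line numbers are preserved rather than renumbered.
inline Lines filter_lines(const Lines& input,
                          const std::function<bool(const Line&)>& predicate) {
    assert(predicate && "predicate must be set");
    Lines out;
    std::copy_if(input.lines.begin(), input.lines.end(),
                 std::back_inserter(out.lines), predicate);
    return out;
}

}  // namespace sketch_batch
```

Compared with calling a single-line filter in a loop, the batch form takes and returns a whole `Lines` value, which matches the `Utility<Lines, Lines, ...>` shape of the class.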