Parallel

Namespace: dftracer::utils::utilities::fileio::parallel

struct LayoutInfo

Public Members

FileLayout layout
FilesystemKind fs
std::size_t stripe_size
std::size_t stripe_count

class ParallelWriter

Parallel file writer interface. Concrete implementations (striped, sharded) hide the on-disk layout. For gzip output, the caller must feed standalone gzip members so that each chunk stays valid at any offset.

Public Functions

virtual ~ParallelWriter() = default
virtual coro::CoroTask<int> open(std::string path, std::size_t num_workers, bool gzip_extension, CoroScope *scope) = 0

Create/truncate backing storage. scope may be null for layouts that don’t spawn internal coroutines; padded-striped requires a non-null scope that outlives close().

virtual coro::CoroTask<int> write_header(ByteView data) = 0

Prologue, written before any worker chunk.

virtual coro::CoroTask<int> write_chunk(std::size_t worker_idx, ByteView data) = 0

Striped: the chunk is placed at an atomically reserved offset. Sharded: the chunk is appended to the shard for worker_idx.
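
One way to read the striped placement described above is that each writer reserves a byte range by bumping a shared atomic cursor and then copies its chunk into that range. The following is a minimal sketch of that idea against an in-memory buffer. `StripedBuffer` and its members are hypothetical names for illustration, and this is not the library's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a striped output file: a fixed-size in-memory
// buffer plus a shared cursor. Illustration only, not the dftracer code.
struct StripedBuffer {
    std::vector<char> bytes;
    std::atomic<std::uint64_t> next_offset{0};

    explicit StripedBuffer(std::size_t capacity) : bytes(capacity) {}

    // Reserve [offset, offset + data.size()) with a single fetch_add, then
    // copy the chunk into the reserved range. Returns the chunk's offset.
    std::uint64_t write_chunk(std::string_view data) {
        const std::uint64_t offset =
            next_offset.fetch_add(data.size(), std::memory_order_relaxed);
        assert(offset + data.size() <= bytes.size());
        std::memcpy(bytes.data() + offset, data.data(), data.size());
        return offset;
    }
};
```

Because the reservation is one atomic read-modify-write, concurrent callers receive disjoint ranges and never overwrite each other. The order of chunks in the buffer then follows reservation order rather than worker index.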

virtual coro::CoroTask<int> write_footer(ByteView data) = 0

Epilogue, written after all workers drain.

virtual coro::CoroTask<int> close() = 0
virtual std::vector<std::string> output_paths() const = 0

One entry for striped; N entries (read order) for sharded.

inline virtual std::span<const MemberSpan> member_layout() const

Member offsets recorded by write_chunk, sorted by ascending offset. Returned span is owned by the writer; valid until destruction. Must be called after close() (no concurrent writes). Empty for layouts that don’t expose member boundaries.
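
As an illustration of consuming such a layout, the sketch below cuts an in-memory file image into one view per member. The local `MemberSpan` mirrors the documented fields, and `slice_members` is a hypothetical helper. It takes a `std::vector` rather than `std::span` to keep the example small, and it assumes members do not overlap, which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Local mirror of the documented MemberSpan fields, for illustration only.
struct MemberSpan {
    std::uint64_t offset;
    std::uint64_t length;
};

// Hypothetical helper: return one view per member of `file`.
// `layout` is expected to be sorted by ascending offset.
std::vector<std::string_view> slice_members(
    std::string_view file, const std::vector<MemberSpan>& layout) {
    std::vector<std::string_view> out;
    out.reserve(layout.size());
    std::uint64_t prev_end = 0;
    for (const MemberSpan& m : layout) {
        assert(m.offset >= prev_end);                 // assumed: no overlap
        assert(m.offset + m.length <= file.size());   // span lies in the file
        out.push_back(file.substr(m.offset, m.length));
        prev_end = m.offset + m.length;
    }
    return out;
}
```

For gzip output, each returned view would cover one standalone member that can be handed to a decompressor on its own.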

inline virtual std::optional<MemberSpan> last_member(std::size_t) const

Span of the most recent write_chunk(worker_idx, ...) call on this worker. The caller must invoke this immediately after co_await write_chunk() returns, because subsequent write_chunk calls overwrite the recorded span. For sharded layouts the offset is shard-local; remap it with shard_base_offsets() after close().

inline virtual std::vector<std::uint64_t> shard_base_offsets() const

Per-worker base offset to add to a shard-local MemberSpan.offset to get the merged-file offset. Empty by default (no remap needed for single-stream layouts). Call after close().
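
The remap described above amounts to a single addition per span. A minimal sketch follows, assuming the base-offset vector is indexed by worker_idx. The local `MemberSpan` mirrors the documented fields, and `to_merged` is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local mirror of the documented MemberSpan fields, for illustration only.
struct MemberSpan {
    std::uint64_t offset;
    std::uint64_t length;
};

// Hypothetical helper: translate a shard-local span into merged-file
// coordinates. An empty `bases` vector means no remap is needed.
MemberSpan to_merged(MemberSpan local, std::size_t worker_idx,
                     const std::vector<std::uint64_t>& bases) {
    if (bases.empty()) return local;       // single-stream layout
    assert(worker_idx < bases.size());     // assumed: one base per worker
    return MemberSpan{bases[worker_idx] + local.offset, local.length};
}
```

In practice a caller would collect spans from last_member() during writing, then apply a helper like this once close() has completed and shard_base_offsets() is safe to call.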

struct MemberSpan

Per-write_chunk layout entry: byte offset and length of one independently decompressible gzip member (or of one raw chunk for non-gzip layouts).

Public Members

std::uint64_t offset
std::uint64_t length

struct WriterConfig

Public Members

FileLayout layout = FileLayout::STRIPED
std::size_t stripe_size = 0
bool gzip = false

struct WriterSizing

Public Members

std::size_t num_workers
std::size_t flush_threshold
std::size_t buffer_capacity