The alexber.utils.files module provides helpers for working with files produced by concurrent workers.

join_files(f)

Joins multiple partial files into a single output file. When multiple threads or processes write output to separate files (for example report_0.csv, report_1.csv, report_2.csv), join_files collects all files that match the pattern <stem>_*<suffix> in the same directory as f and concatenates them into f.

from alexber.utils.files import join_files
from pathlib import Path

# Partial files produced by workers:
#   output/results_0.txt
#   output/results_1.txt
#   output/results_2.txt

join_files(Path('output/results.txt'))

# output/results.txt now contains the concatenated content of all three files

| Parameter | Type | Description |
| --- | --- | --- |
| f | str or Path | The path to the output file. Also serves as the glob pattern anchor: all files in the same directory whose name matches <stem>_*<suffix> are joined into this file. |

The output file is always opened in write mode ('w'), so any pre-existing content is overwritten. The partial files are not deleted after joining.
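
To make the behavior described above concrete, here is a minimal sketch of a function that behaves the same way. It is not the library's implementation: the name join_files_sketch is hypothetical, and sorting the partial files by name is an assumption, since this page does not state the order in which they are concatenated.

```python
import tempfile
from pathlib import Path


def join_files_sketch(f):
    """Hypothetical re-creation of the documented behavior (not the library code)."""
    out = Path(f)
    # Build the documented pattern <stem>_*<suffix>, e.g. 'results_*.txt'
    pattern = f"{out.stem}_*{out.suffix}"
    # Assumption: sort by name so the result is deterministic
    parts = sorted(out.parent.glob(pattern))
    # Write mode ('w') overwrites any pre-existing content in the output file
    with open(out, 'w') as dst:
        for part in parts:
            dst.write(part.read_text())
    # The partial files are left in place, as the note above describes
    return parts


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for i in range(3):
        (tmp_dir / f"results_{i}.txt").write_text(f"part {i}\n")
    joined = join_files_sketch(tmp_dir / "results.txt")
    print((tmp_dir / "results.txt").read_text())
    print([p.name for p in joined])
```

Because the output name (results.txt) contains no underscore before the suffix, it does not match its own pattern, so running the join a second time does not fold the output back into itself.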

Usage example

The following example shows a typical producer/consumer pattern where each worker writes to its own numbered file, and the main thread joins them at the end.
1. Workers write to separate partial files

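import threading
from pathlib import Path

OUTPUT = Path('output/report.csv')
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

# Illustrative input (not part of the original example): one list of rows per worker
data_chunks = [
    [['a', '1'], ['b', '2']],
    [['c', '3'], ['d', '4']],
]

def worker(index, rows):
    # Each worker writes to its own numbered file, e.g. output/report_0.csv
    partial = OUTPUT.parent / f"{OUTPUT.stem}_{index}{OUTPUT.suffix}"
    with open(partial, 'w') as fh:
        for row in rows:
            fh.write(','.join(row) + '\n')

threads = [
    threading.Thread(target=worker, args=(i, chunk))
    for i, chunk in enumerate(data_chunks)
]
for t in threads:
    t.start()
# Wait for every worker to finish before joining the partial files
for t in threads:
    t.join()
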
2. Join the partial files into the final output

from alexber.utils.files import join_files

join_files(OUTPUT)
# output/report.csv now contains all rows from all workers

The glob pattern used internally is <stem>_*<suffix>, so partial files must follow the naming convention <base>_<anything><ext>. For example, report_0.csv, report_worker-a.csv, and report_2024-01-01.csv all match when the output file is report.csv.
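
The naming convention can be checked before running the workers. The sketch below derives the pattern from an output path and tests a few candidate names against it. The helper partial_pattern is hypothetical, and fnmatchcase from the standard library is used here as a stand-in for the glob matching that join_files performs; for a simple pattern like this one the two should agree, but that is an assumption rather than something this page confirms.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def partial_pattern(output):
    """Hypothetical helper: build the <stem>_*<suffix> pattern for an output path."""
    output = Path(output)
    return f"{output.stem}_*{output.suffix}"


pattern = partial_pattern("output/report.csv")  # 'report_*.csv'

candidates = [
    "report_0.csv",           # numbered partial file
    "report_worker-a.csv",    # any text may follow the underscore
    "report_2024-01-01.csv",  # dates work too
    "report.csv",             # the output file itself: no underscore
    "reports_1.csv",          # different stem
    "report_1.txt",           # different suffix
]

# Map each candidate name to whether it matches the pattern
matches = {name: fnmatchcase(name, pattern) for name in candidates}

for name, ok in matches.items():
    print(f"{name:24} {'match' if ok else 'no match'}")
```

Note that pathlib treats only the last extension as the suffix, so for an output path such as report.tar.gz the stem is report.tar and the derived pattern is report.tar_*.gz.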
