Dump multiple JSON Lines Files into an Archive (ZIP or TAR) incrementally

  • Support ZIP or TAR archives, including compressed TAR archives (e.g., .tar.gz, .tar.bz2, .tar.xz).
  • Support for both compressed and uncompressed .jsonl files inside the archive. (e.g., *.jsonl.gz or *.jsonl.bz2 or *.jsonl.xz).
  • Optional custom serialization and opener callbacks for advanced use cases.

Warning

If the given archive already exists on the given path, it will be overwritten.

Note

  • Paths provided in the items_by_relpath argument must be relative. Absolute paths are not allowed and will raise an error.
  • If items_by_relpath contains multiple items for the same path, they will be appended to the corresponding file within the archive.

Example usage:

import jsonl

data = [
    # this will create a new file1.jsonl in the archive
    ("file1.jsonl", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]),
    # this will create a new file1.jsonl.gz in the archive
    ("path/to/file2.jsonl.gz", [{"name": "Charlie", "age": 35}, {"name": "David", "age": 40}]),
    # this will append to the file1.jsonl
    ("file1.jsonl", [{"name": "Eve", "age": 28}]),
]
jsonl.dump_archive("my_archive.zip", data)