Dump to multiple jsonlines files
Dump multiple iterables incrementally to the specified jsonlines file paths, optimizing memory usage.
The files can be compressed in gzip, bzip2, or xz format. If the file extension is not recognized, the data is dumped to a plain text file.
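The extension-based behavior described above can be pictured with a small standard-library sketch. It is not the library's implementation: the `OPENERS` table, the suffixes listed in it, and the `open_for_writing` helper are hypothetical names chosen for illustration. Only `gzip.open`, `bz2.open`, and `lzma.open` are real standard-library calls.

```python
import bz2
import gzip
import lzma
import pathlib

# Hypothetical suffix table for illustration; the suffixes the library
# actually recognizes may differ.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_for_writing(path):
    """Open `path` for text writing, compressing when the suffix is known."""
    suffix = pathlib.Path(path).suffix.lower()
    # An unrecognized suffix falls back to the built-in open (plain text).
    opener = OPENERS.get(suffix, open)
    return opener(path, mode="wt", encoding="utf-8")
```

Under these assumptions, a path such as `wins.jsonl.gz` would be written through `gzip.open`, while `wins.jsonl` would be written as an uncompressed text file.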
Example #1
This example uses jsonl.dump_fork to incrementally write structured data to multiple .jsonl files, one per key (in this case, the player name).
Each player gets an individual JSON Lines file that stores that player's wins, which keeps the data for separate entities organized.
import jsonl


def generate_win_data():
    """Yield player wins data for multiple players."""
    data = (
        {
            "name": "Gilbert",
            "wins": [
                {"hand": "straight", "card": "7♣"},
                {"hand": "one pair", "card": "10♥"},
            ]
        },
        {
            "name": "May",
            "wins": [
                {"hand": "two pair", "card": "9♠"},
            ]
        },
        {
            "name": "Gilbert",
            "wins": [
                {"hand": "three of a kind", "card": "A♦"},
            ]
        }
    )
    for player in data:
        name = player["name"]
        yield (f"{name}.jsonl", player["wins"])


# Write the generated data to files in JSON Lines format
jsonl.dump_fork(generate_win_data())
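To make the idea of forking items into per-path files concrete, here is a minimal standard-library sketch of the general technique. It is an illustration under stated assumptions, not the code behind jsonl.dump_fork: the `dump_fork_sketch` name is hypothetical, and it handles only uncompressed text files.

```python
import contextlib
import json


def dump_fork_sketch(path_items):
    """Write each (path, items) pair as JSON Lines, one open file per path."""
    with contextlib.ExitStack() as stack:
        handles = {}  # path -> open file, reused when a path repeats
        for path, items in path_items:
            if path not in handles:
                handles[path] = stack.enter_context(
                    open(path, "w", encoding="utf-8")
                )
            # Items are consumed one at a time, so `items` may be a generator.
            for item in items:
                handles[path].write(json.dumps(item, ensure_ascii=False) + "\n")
```

In this sketch a path that appears more than once keeps writing to the same open file, so feeding it pairs shaped like the ones in Example #1 would put all three of Gilbert's wins in one file and May's single win in another.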
Example #2
This example demonstrates how to dump data using different JSON libraries.
You can install orjson and ujson to run the following example.
pip install orjson ujson  # Ignore this command if these libraries are already installed.

import orjson
import ujson

import jsonl


def worker():
    yield ("num.jsonl", ({"value": 1}, {"value": 2}))
    yield ("foo.jsonl", iter(({"a": "1"}, {"b": 2})))
    yield ("num.jsonl", [{"value": 3}])
    yield ("foo.jsonl", ())


# Dump the data using the default json.dumps function.
jsonl.dump_fork(worker())

# Dump the data using the ujson library.
jsonl.dump_fork(worker(), json_dumps=ujson.dumps, ensure_ascii=False)

# Dump the data using the orjson library.
jsonl.dump_fork(worker(), json_dumps=orjson.dumps)
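The serializers passed above are ordinary callables that turn a Python object into its JSON form, so a comparable callable can be built from the standard library alone. The sketch below shows one such callable; `compact_dumps` is a hypothetical name, and the example only demonstrates the callable itself.

```python
import functools
import json

# Hypothetical helper: the standard json.dumps with compact separators,
# sorted keys, and non-ASCII characters left unescaped.
compact_dumps = functools.partial(
    json.dumps,
    separators=(",", ":"),
    sort_keys=True,
    ensure_ascii=False,
)

print(compact_dumps({"b": 2, "a": "1"}))  # prints {"a":"1","b":2}
```

If dump_fork accepts any callable of this shape, as the ujson.dumps and orjson.dumps calls suggest, `compact_dumps` could presumably be passed as `json_dumps=compact_dumps`. That is an assumption to check against the library's documentation, not something this sketch verifies.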