Commit

Add automatic detection of number of CPU cores
Jakobovski committed Jun 27, 2024
1 parent 9755682 commit 38d68bf
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion data/openwebtext/prepare.py
@@ -1,6 +1,7 @@
# saves the openwebtext dataset to a binary file for training. following was helpful:
# https://github.com/HazyResearch/flash-attention/blob/main/training/src/datamodules/language_modeling_hf.py

+import multiprocessing
import os
from tqdm import tqdm
import numpy as np
@@ -9,7 +10,7 @@

# number of workers in .map() call
# good number to use is ~order number of cpu cores // 2
-num_proc = 8
+num_proc = multiprocessing.cpu_count() // 2

# number of workers in load_dataset() call
# best number might be different from num_proc above as it also depends on NW speed.
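
Note on the new default: `multiprocessing.cpu_count() // 2` evaluates to 0 on a single-core machine, which is unlikely to be a usable worker count, and `cpu_count()` reports every logical CPU on the host rather than the CPUs this process may actually use (for example under a container limit or CPU affinity mask). The following is a minimal sketch of a more defensive variant, not part of this commit. The helper names `half_cores` and `default_num_proc` are hypothetical, and `os.sched_getaffinity` is only available on some platforms (notably Linux), so the sketch falls back to `multiprocessing.cpu_count()` elsewhere.

```python
import multiprocessing
import os


def half_cores(usable: int) -> int:
    """Hypothetical helper: half of `usable` CPUs, but never fewer than 1."""
    return max(1, usable // 2)


def default_num_proc() -> int:
    """Hypothetical helper: pick a worker count for the .map() call."""
    try:
        # CPUs this process is allowed to run on (not available on all platforms)
        usable = len(os.sched_getaffinity(0))
    except AttributeError:
        # fall back to the total number of logical CPUs on the machine
        usable = multiprocessing.cpu_count()
    return half_cores(usable)


# number of workers in .map() call
num_proc = default_num_proc()
```

On a single-core machine this yields `num_proc = 1` instead of 0, while on larger machines it matches the commit's "half the cores" rule whenever the process may use all CPUs.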
