Fix issues during readme validation (#1083)
Fix review comments.

(cherry picked from commit 9e1319f)
libinta authored and mfuntowicz committed Jun 21, 2024
1 parent 0996308 commit 6adad16
Showing 2 changed files with 6 additions and 1 deletion.
examples/summarization/run_summarization.py (3 additions & 0 deletions, mode 100644 → 100755)
```diff
@@ -764,6 +764,9 @@ def compute_metrics(eval_preds):
     else:
         training_args.generation_config.max_length = data_args.val_max_target_length
     if data_args.num_beams is not None:
+        if data_args.num_beams == 1:
+            training_args.generation_config.length_penalty = None
+            training_args.generation_config.early_stopping = False
         training_args.generation_config.num_beams = data_args.num_beams
     elif training_args.generation_num_beams is not None:
         training_args.generation_config.num_beams = training_args.generation_num_beams
```
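For context on the added guard: it presumably exists because `length_penalty` and `early_stopping` only affect beam search, so leaving them set while `num_beams == 1` (greedy decoding) can trigger `transformers` generation-config validation warnings. A minimal standalone sketch of the same logic, with hypothetical values standing in for the script's `data_args` and `training_args`:

```python
from transformers import GenerationConfig

# Hypothetical stand-ins for the values the script reads from its arguments.
num_beams = 1
generation_config = GenerationConfig(length_penalty=2.0, early_stopping=True)

if num_beams is not None:
    if num_beams == 1:
        # Beam-search-only knobs are cleared for greedy decoding,
        # mirroring the change to run_summarization.py above.
        generation_config.length_penalty = None
        generation_config.early_stopping = False
    generation_config.num_beams = num_beams

print(generation_config.num_beams)  # -> 1
```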
examples/text-generation/README.md (3 additions & 1 deletion, mode 100644 → 100755)
````diff
@@ -443,7 +443,9 @@ More information on usage of the unifier script can be found in fp8 Habana docs:
 Some models can fit in HPU DRAM but not in CPU RAM.
 When running a model on a single card without DeepSpeed, the `--disk_offload` flag offloads weights to disk during model quantization in HQT. With this flag set, each weight is first loaded from disk into CPU RAM, then moved to HPU DRAM and quantized there. This way, only one weight at a time resides in CPU RAM rather than the whole model.
 To enable this weight offload mechanism, add the `--disk_offload` flag to the topology command line.
-Here is an example of using disk_offload in quantize command. Please make sure to run the measurement first.
+Here is an example of using disk_offload in the quantize command.
+Please follow the "Running FP8 models on single device" section first before running the command below.
+
 ```bash
 QUANT_CONFIG=./quantization_config/maxabs_quant.json TQDM_DISABLE=1 \
 python run_generation.py \
````
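To make the offload flow described in the diff above concrete, here is an illustrative sketch of the per-weight cycle (disk → CPU RAM → HPU DRAM → quantize). This is not the actual HQT implementation: the per-weight file layout, the helper function, and the float8 cast are stand-ins.

```python
import torch
import habana_frameworks.torch.core  # noqa: F401  (registers the "hpu" device on Gaudi)


def quantize_with_disk_offload(weight_files):
    """Illustrative sketch only, not the real HQT code path."""
    quantized = {}
    for name, path in weight_files.items():
        # Disk -> CPU RAM: only this one weight is resident on the host at a time.
        cpu_tensor = torch.load(path, map_location="cpu")
        # CPU RAM -> HPU DRAM.
        hpu_tensor = cpu_tensor.to("hpu")
        # Quantize on the device; a float8 cast stands in for HQT's maxabs scheme.
        quantized[name] = hpu_tensor.to(torch.float8_e4m3fn)
        del cpu_tensor  # free host memory before loading the next weight
    return quantized
```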
