Benchmarks: Encoding Issue With Some Exercism Files #2664

marvijo-code · 2024-12-19T06:31:08Z

Issue

We seem to have decoding issues with some files from the Exercism repo; I noticed it when running benchmarks on a Windows system. This matters because it can cause test failures even when the LLM never ran.

Small fix, will submit a PR just now.
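The underlying behavior can be reproduced without aider or the Exercism repo. `Path.read_text()` called with no `encoding=` argument uses the platform's default text encoding, and the `cp1252.py` frame in the traceback below shows that the default was cp1252 here. The following is a minimal sketch. It assumes the Exercism file is UTF-8 encoded. The sample character is an arbitrary stand-in, not the actual content of `instructions.md`.

```python
import tempfile
from pathlib import Path

# Assumed stand-in content: "Á" (U+00C1) is stored as the bytes C3 81 in UTF-8,
# and byte 0x81 has no mapping in cp1252.
SAMPLE = "Instructions mention \u00c1.\n"


def reproduce_decode_error() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "instructions.md"
        path.write_text(SAMPLE, encoding="utf-8")

        # Mimic a Windows host whose default text encoding is cp1252:
        # read_text() with no encoding= argument would behave like this call.
        try:
            path.read_text(encoding="cp1252")
            failure = "no error"
        except UnicodeDecodeError as exc:
            failure = str(exc)

        # Naming the encoding explicitly gives the same result on every platform.
        decoded = path.read_text(encoding="utf-8")
    return failure, decoded


if __name__ == "__main__":
    error, text = reproduce_decode_error()
    print(error)
    print(repr(text))
```

If the files are UTF-8, passing `encoding="utf-8"` to the `read_text()` call in `benchmark.py` would make that read independent of the host locale.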

Steps to reproduce:

Clone the repository

git clone git@github.com:Aider-AI/aider.git

Navigate to the project directory

cd aider

Prepare for benchmarking

git clone git@github.com:exercism/python.git

mkdir tmp.benchmarks\exercism-python\all-your-base

robocopy python\exercises\practice\all-your-base tmp.benchmarks\exercism-python\all-your-base /E /COPY:DAT /R:0

python -m venv .venv

.\.venv\Scripts\activate

pip install -r requirements.txt
(If that errors, you might need: .\.venv\Scripts\python.exe -m pip install -r requirements.txt)
pip install -r requirements/requirements-dev.txt

pip install -e .

set AIDER_DOCKER=1
(Not recommended; use Docker instead. Setting this skips the Docker requirement, which makes it quicker to run.)
python benchmark/benchmark.py tmp.benchmarks --keywords all-your-base --model openrouter/qwen/qwen-2.5-coder-32b-instruct

Error:

Copying tmp.benchmarks\exercism-python -> tmp.benchmarks\2024-12-19-08-08-25--tmp.benchmarks ...
...done

Test failed
'charmap' codec can't decode byte 0x81 in position 506: character maps to <undefined>
Traceback (most recent call last):
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 552, in run_test
    return run_test_real(original_dname, testdir, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 620, in run_test_real
    instructions += (testdir / ".docs/instructions.md").read_text()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marvi\.conda\envs\aider\Lib\pathlib.py", line 1028, in read_text
    return f.read()
           ^^^^^^^^
  File "C:\Users\marvi\.conda\envs\aider\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 506: character maps to <undefined>
───────────────────────────────────────────────────────── tmp.benchmarks\2024-12-19-08-08-25--tmp.benchmarks ──────────────────────────────────────────────────────────

dirname: 2024-12-19-08-08-25--tmp.benchmarks
test_cases: 1
commit_hash:
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
Traceback (most recent call last):
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 831, in <module>
    app()
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 340, in __call__
    raise e
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\core.py", line 680, in main
    return _main(
           ^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\core.py", line 198, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 698, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 300, in main
    summarize_results(dirname)
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 488, in summarize_results
    a_model = set(variants["model"]).pop()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'pop from an empty set'
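The second traceback appears to be a knock-on effect rather than a separate encoding bug. The only test case failed before any result was recorded, so `summarize_results` seemingly had no model names to work with, and calling `.pop()` on an empty set raises `KeyError`. Below is a minimal sketch of that behavior. `pick_model` is a hypothetical helper, not aider's code. It shows one way such a lookup could be guarded.

```python
from typing import Iterable, Optional


def empty_pop_message() -> str:
    # Reproduce the exception at the end of the traceback: set.pop() on an
    # empty set raises KeyError rather than returning a sentinel.
    try:
        set().pop()
    except KeyError as exc:
        return exc.args[0]
    return "no error"


def pick_model(models: Iterable[str]) -> Optional[str]:
    # Hypothetical guard: return None when no results recorded a model name,
    # otherwise return an arbitrary one of the (deduplicated) names.
    unique = set(models)
    if not unique:
        return None
    return unique.pop()


if __name__ == "__main__":
    print(empty_pop_message())
    print(pick_model([]))
```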

Version and model info

Model: openrouter/qwen/qwen-2.5-coder-32b-instruct with diff edit format
Git repo: none
Repo-map: disabled

paul-gauthier · 2024-12-19T16:21:34Z

Thanks for trying aider and filing this issue.

The benchmark is intended to be run inside the docker container built from benchmark/Dockerfile. It's not safe to run it on your actual host system, as it runs a lot of LLM-generated code that could do unsafe things.

marvijo-code · 2024-12-20T10:34:51Z

Yes, Paul, hence my comment "not recommended". I raised a PR for this because the issue will happen even in a Docker environment.

marvijo-code · 2024-12-20T22:39:06Z

#2665

paul-gauthier · 2024-12-20T23:06:21Z

Sorry, charmap is an encoding primarily used on Windows. The Dockerfile for benchmarking should not use charmap. Can you actually reproduce this problem inside the provided docker container?
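To check this on a given machine or inside the container, one can print the default text encoding Python falls back to. Below is a small standard-library sketch. On many Windows setups (for example, US-English ones) this reports `cp1252`. A Linux container with a UTF-8 locale typically reports `UTF-8`. Setting `PYTHONUTF8=1`, or running `python -X utf8`, enables Python's UTF-8 mode, which makes the default UTF-8 even on Windows.

```python
import locale
import sys


def default_text_encoding() -> str:
    # The encoding open() and Path.read_text() use when no encoding=
    # argument is given (UTF-8 whenever UTF-8 mode is enabled).
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("default text encoding:", default_text_encoding())
    # UTF-8 mode (PEP 540) is enabled via PYTHONUTF8=1 or `python -X utf8`.
    print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```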

marvijo-code linked a pull request Dec 19, 2024 that will close this issue

fix: encoding/decoding issues reading Exercism benchmark files #2665

Open

github-actions bot added the question Further information is requested label Dec 20, 2024
