The benchmark is intended to be run inside the Docker container built from benchmark/Dockerfile. It's not safe to run it on your actual host system, as it runs a lot of LLM-generated code that could do unsafe things.
Sorry, charmap is an encoding primarily used on Windows. The Dockerfile for benchmarking should not use charmap. Can you actually reproduce this problem inside the provided Docker container?
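One way to answer the question about the container is to print the default text encoding in each environment. The sketch below uses only the standard library. The helper name `describe_text_defaults` is made up for illustration and is not part of aider.

```python
import locale
import sys


def describe_text_defaults() -> dict:
    """Report the settings that decide how open()/Path.read_text() decode
    a file when no explicit encoding argument is passed."""
    return {
        # Locale encoding used by default for text files (e.g. cp1252 on many Windows hosts).
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when Python's UTF-8 mode is active (for example via PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


defaults = describe_text_defaults()
print(defaults)
```

Running this on the Windows host and inside the container should show whether the two environments would decode the same file differently.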
Issue
We seem to have decoding issues with some files from the Exercism repo. I noticed it when running benchmarks on a Windows system. This is significant because it can produce test failures even though the LLM never ran.
Small fix, will submit a PR just now.
Steps to reproduce:
Clone the repository
git clone git@github.com:Aider-AI/aider.git
Navigate to the project directory
cd aider
Prepare for benchmarking
git clone git@github.com:exercism/python.git
mkdir tmp.benchmarks\exercism-python\all-your-base
robocopy python\exercises\practice\all-your-base tmp.benchmarks\exercism-python\all-your-base /E /COPY:DAT /R:0
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt # if errors, you might need: .\.venv\Scripts\python.exe -m pip install -r requirements.txt
pip install -r requirements/requirements-dev.txt
pip install -e .
set AIDER_DOCKER=1 # not recommended (prefer Docker), but quicker to run for this reproduction
python benchmark/benchmark.py tmp.benchmarks --keywords all-your-base --model openrouter/qwen/qwen-2.5-coder-32b-instruct
Error:
Copying tmp.benchmarks\exercism-python -> tmp.benchmarks\2024-12-19-08-08-25--tmp.benchmarks ...
...done
Test failed
'charmap' codec can't decode byte 0x81 in position 506: character maps to <undefined>
Traceback (most recent call last):
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 552, in run_test
    return run_test_real(original_dname, testdir, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 620, in run_test_real
    instructions += (testdir / ".docs/instructions.md").read_text()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marvi\.conda\envs\aider\Lib\pathlib.py", line 1028, in read_text
    return f.read()
           ^^^^^^^^
  File "C:\Users\marvi\.conda\envs\aider\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 506: character maps to <undefined>
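The traceback shows `read_text()` decoding with cp1252, where byte 0x81 has no mapping. Below is a minimal sketch of that failure and of one possible fix, passing `encoding="utf-8"` explicitly. It assumes the instruction files are UTF-8. The helper `read_instructions` and the sample file are hypothetical, and this is not necessarily the change made in the PR.

```python
import tempfile
from pathlib import Path


def read_instructions(path: Path) -> str:
    """Read a text file as UTF-8 regardless of the platform's locale encoding."""
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "instructions.md"
    # "Á" (U+00C1) encodes to the bytes C3 81 in UTF-8; 0x81 is unmapped in cp1252.
    sample.write_bytes("Á".encode("utf-8"))

    try:
        # Simulates what a bare read_text() does on a cp1252 Windows host.
        sample.read_text(encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    text = read_instructions(sample)

print(cp1252_failed, text)
```

Pinning the encoding at the call site keeps the behaviour the same on Windows hosts and in the Docker container, at the cost of failing on files that are not valid UTF-8.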
───────────────────────────────────────────────────────── tmp.benchmarks\2024-12-19-08-08-25--tmp.benchmarks ──────────────────────────────────────────────────────────
test_cases: 1
commit_hash:
percent_cases_well_formed: 100.0
error_outputs: 0
num_malformed_responses: 0
num_with_malformed_responses: 0
user_asks: 0
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
test_timeouts: 0
Traceback (most recent call last):
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 831, in <module>
    app()
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 340, in __call__
    raise e
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\core.py", line 680, in main
    return _main(
           ^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\core.py", line 198, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 698, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 300, in main
    summarize_results(dirname)
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 488, in summarize_results
    a_model = set(variants["model"]).pop()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'pop from an empty set'
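The second traceback is a follow-on failure. No test produced results, so the set of model names is empty and `.pop()` raises. Below is a hedged sketch of one way to guard such a lookup. The helper `pick_any` is hypothetical and not part of aider's benchmark code.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def pick_any(values: Iterable[T], default: Optional[T] = None) -> Optional[T]:
    """Return an arbitrary element of values, or default when it is empty,
    instead of raising KeyError the way set.pop() does on an empty set."""
    return next(iter(values), default)


no_results = pick_any(set())
one_result = pick_any({"openrouter/qwen/qwen-2.5-coder-32b-instruct"})
print(no_results, one_result)
```

With a guard like this, the summary step could report that no results were collected rather than crash after the earlier decoding error.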
Version and model info
Model: openrouter/qwen/qwen-2.5-coder-32b-instruct with diff edit format
Git repo: none
Repo-map: disabled