Benchmarks: Encoding Issue With Some Exercism Files #2664

Open
marvijo-code opened this issue Dec 19, 2024 · 4 comments · May be fixed by #2665
Labels
question Further information is requested

Comments

@marvijo-code

Issue

We seem to have decoding issues with some files from the Exercism repo. I noticed this when running the benchmarks on a Windows system. It matters because a test case can be reported as failed even though the LLM never ran.

It's a small fix; I'll submit a PR shortly.

Steps to reproduce:

Clone the repository

git clone git@github.com:Aider-AI/aider.git

Navigate to the project directory

cd aider

Prepare for benchmarking

git clone git@github.com:exercism/python.git

mkdir tmp.benchmarks\exercism-python\all-your-base

robocopy python\exercises\practice\all-your-base tmp.benchmarks\exercism-python\all-your-base /E /COPY:DAT /R:0

python -m venv .venv

.\.venv\Scripts\activate

pip install -r requirements.txt # if errors, you might need: .\.venv\Scripts\python.exe -m pip install -r requirements.txt
pip install -r requirements/requirements-dev.txt

pip install -e .

set AIDER_DOCKER=1 # not recommended; use Docker instead. Setting this makes it quicker to run
python benchmark/benchmark.py tmp.benchmarks --keywords all-your-base --model openrouter/qwen/qwen-2.5-coder-32b-instruct

Error:

Copying tmp.benchmarks\exercism-python -> tmp.benchmarks\2024-12-19-08-08-25--tmp.benchmarks ...
...done

Test failed
'charmap' codec can't decode byte 0x81 in position 506: character maps to <undefined>
Traceback (most recent call last):
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 552, in run_test
    return run_test_real(original_dname, testdir, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 620, in run_test_real
    instructions += (testdir / ".docs/instructions.md").read_text()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marvi\.conda\envs\aider\Lib\pathlib.py", line 1028, in read_text
    return f.read()
           ^^^^^^^^
  File "C:\Users\marvi\.conda\envs\aider\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 506: character maps to <undefined>
───────────────────────────────────────────────────────── tmp.benchmarks\2024-12-19-08-08-25--tmp.benchmarks ──────────────────────────────────────────────────────────

  • dirname: 2024-12-19-08-08-25--tmp.benchmarks
    test_cases: 1
    commit_hash:
    percent_cases_well_formed: 100.0
    error_outputs: 0
    num_malformed_responses: 0
    num_with_malformed_responses: 0
    user_asks: 0
    lazy_comments: 0
    syntax_errors: 0
    indentation_errors: 0
    exhausted_context_windows: 0
    test_timeouts: 0
Traceback (most recent call last):
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 831, in <module>
    app()
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 340, in __call__
    raise e
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\core.py", line 680, in main
    return _main(
           ^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\core.py", line 198, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\.venv\Lib\site-packages\typer\main.py", line 698, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 300, in main
    summarize_results(dirname)
  File "C:\tmp\tmp-a\aider\benchmark\benchmark.py", line 488, in summarize_results
    a_model = set(variants["model"]).pop()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'pop from an empty set'
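
For context on the first traceback: read_text() is called without an encoding, so Python falls back to the locale's preferred encoding (cp1252 here), which has no mapping for byte 0x81. The following is a minimal, self-contained sketch of that failure and of one possible way to avoid it, assuming the instructions file is UTF-8. The sample text and the temporary file are made up for illustration; this is not the content of the PR.

```python
import tempfile
from pathlib import Path

# Hypothetical sample text: "Á" (U+00C1) is the bytes C3 81 in UTF-8,
# and 0x81 is one of the bytes cp1252 leaves undefined.
sample = "Ábaco: convert a number between bases.\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "instructions.md"
    path.write_bytes(sample.encode("utf-8"))

    # Ask for cp1252 explicitly to mimic the Windows default on any host.
    try:
        path.read_text(encoding="cp1252")
        cp1252_error = None
    except UnicodeDecodeError as err:
        cp1252_error = err

    # Naming the encoding makes the read independent of the host locale.
    text = path.read_text(encoding="utf-8")

print("cp1252 read failed:", cp1252_error)
print("utf-8 read returned the original text:", text == sample)
```

Whether UTF-8 is the right choice for every Exercism file is an assumption here; it matches the file that triggered the error only insofar as that file fails under cp1252.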

Version and model info

Model: openrouter/qwen/qwen-2.5-coder-32b-instruct with diff edit format
Git repo: none
Repo-map: disabled
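
The second traceback (KeyError: 'pop from an empty set') looks like a follow-on effect: the only test case failed before producing a result, so there was no model name to pop. A hedged sketch of how an empty set could be handled, using a hypothetical helper pick_model and a made-up result shape rather than the actual benchmark.py code:

```python
from collections import defaultdict


def pick_model(results):
    """Return one model name found in results, or None if there is none.

    `results` is assumed to be an iterable of dicts that may carry a
    "model" key; this shape is illustrative, not taken from benchmark.py.
    """
    variants = defaultdict(set)
    for result in results:
        if "model" in result:
            variants["model"].add(result["model"])
    # set().pop() raises KeyError on an empty set; next() with a default does not.
    return next(iter(variants["model"]), None)


print(pick_model([]))
print(pick_model([{"model": "openrouter/qwen/qwen-2.5-coder-32b-instruct"}]))
```

With a guard like this, a run in which every test case fails early would surface only the original decoding error instead of a second, unrelated-looking traceback.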

@paul-gauthier
Collaborator

Thanks for trying aider and filing this issue.

The benchmark is intended to be run inside the Docker container built from benchmark/Dockerfile. It's not safe to run it on your actual host system, as it runs a lot of LLM-generated code that could do unsafe things.

github-actions bot added the question label (Further information is requested) on Dec 20, 2024
@marvijo-code
Author

Yes, Paul, hence my comment "not recommended". I raised a PR for this because the issue will happen even in a Docker environment.

@marvijo-code
Author

#2665

@paul-gauthier
Collaborator

Sorry, charmap is an encoding primarily used on Windows. The Dockerfile for benchmarking should not use charmap. Can you actually reproduce this problem inside the provided Docker container?
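
One way to check this in a given environment is to print the encoding Python would use for text files opened without an explicit encoding. This is a small sketch using only the standard library; describe_text_defaults is a hypothetical helper name, and what the benchmark container reports is not stated in this thread.

```python
import locale
import sys


def describe_text_defaults():
    """Collect the settings that decide the default text-file encoding."""
    return {
        "platform": sys.platform,
        # Encoding used by open() and Path.read_text() when none is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when Python's UTF-8 mode is on (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for key, value in describe_text_defaults().items():
        print(f"{key}: {value}")
```

If preferred_encoding comes back as cp1252, a bare read_text() can hit the error above on non-ASCII files; if it comes back as UTF-8, the same call would be expected to succeed on a UTF-8 file.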
