dirpath isn't updated when logger changes dir after first run #20092

Open
ScarWar opened this issue Jul 16, 2024 · 2 comments · May be fixed by #20202
Labels
bug Something isn't working ver: 2.2.x

Comments

ScarWar commented Jul 16, 2024

Bug description

I'm using the great library https://github.com/SkafteNicki/pl_crossvalidate for cross-validation in my project. The library overrides some of the trainer's internal behavior, including the logs directory.

The checkpoint path is resolved once during the first fold. After that, the resolution is short-circuited, so the path never resolves to the new fold's directory.

I suggest moving some of the initialization into the setup method.
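
The short-circuit described above can be illustrated with a minimal sketch. `ToyLogger` and `ToyCheckpoint` are hypothetical stand-ins written for this illustration; they are not Lightning's or pl_crossvalidate's actual classes, and the `checkpoints` subdirectory name is an assumption. The sketch only shows how caching the resolved directory on the instance makes later folds reuse the first fold's path.

```python
from typing import List, Optional


class ToyLogger:
    """Hypothetical stand-in for a logger that exposes its log directory."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir


class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint callback (not Lightning's code)."""

    def __init__(self, dirpath: Optional[str] = None) -> None:
        self.dirpath = dirpath

    def _resolve_dir(self, logger: ToyLogger) -> str:
        # Short circuit: once dirpath is set, the logger is never consulted.
        if self.dirpath is not None:
            return self.dirpath
        return f"{logger.log_dir}/checkpoints"

    def setup(self, logger: ToyLogger) -> None:
        # The resolved value overwrites dirpath, so the next call short-circuits.
        self.dirpath = self._resolve_dir(logger)


ckpt = ToyCheckpoint()  # the user passed no dirpath
resolved: List[str] = []
for fold in range(3):
    # Each fold gets a new log directory, but the same callback instance.
    ckpt.setup(ToyLogger(f"logs/fold_{fold}"))
    resolved.append(ckpt.dirpath)

print(resolved)
# ['logs/fold_0/checkpoints', 'logs/fold_0/checkpoints', 'logs/fold_0/checkpoints']
```

In this toy model every fold ends up writing into the first fold's directory, which matches the behavior reported here.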

What version are you seeing the problem on?

master

How to reproduce the bug

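# Imports are assumed; the original snippet omitted them.
# Use pytorch_lightning.callbacks instead if that is the package in use.
from lightning.pytorch.callbacks import ModelCheckpoint
from pl_crossvalidate import KFoldTrainer

# training_args is the reporter's own config dict (not shown in the issue).
trainer = KFoldTrainer(
    num_folds=training_args['folds'],
    max_epochs=training_args['epochs'],
    accelerator="gpu",
    callbacks=[
        # No dirpath is passed, so the callback derives it from the logger.
        ModelCheckpoint(
            monitor=training_args['monitor_metric'],
            save_top_k=1,
            mode=training_args['metric_mode'],
            verbose=True,
        )
    ],
)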

Error messages and logs

No response

Environment

Current environment
  • CUDA:
    - GPU:
    - NVIDIA GeForce RTX 3050 Laptop GPU
    - available: True
    - version: 11.8
  • Lightning:
    - lightning: 2.3.1
    - lightning-cloud: 0.5.37
    - lightning-utilities: 0.11.3.post0
    - pytorch-lightning: 2.0.4
    - torch: 2.0.0
    - torch-cluster: 1.6.1
    - torch-geometric: 2.3.0
    - torch-scatter: 2.1.1
    - torchaudio: 2.0.0
    - torchmetrics: 1.4.0.post0
    - torchvision: 0.15.2a0
  • Packages:
    - absl-py: 1.4.0
    - addict: 2.4.0
    - aiofiles: 22.1.0
    - aiohttp: 3.8.5
    - aiosignal: 1.3.1
    - aiosqlite: 0.18.0
    - albumentations: 1.3.1
    - alphashape: 1.3.1
    - anyio: 3.5.0
    - appdirs: 1.4.4
    - argon2-cffi: 21.3.0
    - argon2-cffi-bindings: 21.2.0
    - arrow: 1.2.3
    - asttokens: 2.2.1
    - async-timeout: 4.0.2
    - attrs: 23.1.0
    - azure-core: 1.29.4
    - azure-identity: 1.14.0
    - azure-storage-blob: 12.18.2
    - babel: 2.11.0
    - backcall: 0.2.0
    - beautifulsoup4: 4.12.2
    - binaryornot: 0.4.4
    - bleach: 4.1.0
    - blessed: 1.19.1
    - blinker: 1.6.2
    - bottleneck: 1.3.5
    - branca: 0.6.0
    - brotlipy: 0.7.0
    - build: 0.10.0
    - cachecontrol: 0.13.1
    - cached-property: 1.5.2
    - cachetools: 5.3.1
    - certifi: 2024.6.2
    - cffi: 1.16.0
    - chardet: 5.2.0
    - charset-normalizer: 3.3.0
    - chex: 0.1.83
    - cleo: 2.0.1
    - click: 8.1.3
    - click-log: 0.4.0
    - click-plugins: 1.1.1
    - cligj: 0.7.2
    - colorama: 0.4.6
    - coloredlogs: 15.0.1
    - comm: 0.1.2
    - configargparse: 1.5.3
    - contourpy: 1.0.7
    - cookiecutter: 2.4.0
    - crashtest: 0.4.1
    - croniter: 1.3.15
    - cryptography: 41.0.4
    - cycler: 0.11.0
    - cython: 0.29.37
    - daal: 2024.5.0
    - daal4py: 2024.5.0
    - dash: 2.9.1
    - dash-core-components: 2.0.0
    - dash-html-components: 2.0.0
    - dash-table: 5.0.0
    - dataclasses: 0.8
    - dateutils: 0.6.12
    - debugpy: 1.6.6
    - decorator: 5.1.1
    - deepdiff: 6.3.1
    - defusedxml: 0.7.1
    - dill: 0.3.6
    - distlib: 0.3.7
    - dm-tree: 0.1.8
    - docopt: 0.6.2
    - docutils: 0.20.1
    - dulwich: 0.21.6
    - easydict: 1.10
    - elastic-transport: 8.4.1
    - elasticsearch: 8.10.0
    - entrypoints: 0.4
    - exceptiongroup: 1.0.4
    - executing: 1.2.0
    - fastapi: 0.100.0
    - fastjsonschema: 2.16.3
    - filelock: 3.12.4
    - fiona: 1.8.22
    - flask: 2.2.3
    - flatbuffers: 23.5.26
    - flax: 0.6.1
    - folium: 0.14.0
    - fonttools: 4.39.2
    - freetype-py: 2.4.0
    - frozenlist: 1.4.0
    - fsspec: 2023.6.0
    - future: 1.0.0
    - gdal: 3.5.3
    - geomloss: 0.2.6
    - geopandas: 0.12.2
    - gmpy2: 2.1.2
    - google-auth: 2.22.0
    - google-auth-oauthlib: 1.0.0
    - gpustat: 1.0.0
    - grpcio: 1.54.2
    - h11: 0.14.0
    - h5py: 3.8.0
    - hdbscan: 0.8.37
    - html5lib: 1.1
    - humanfriendly: 10.0
    - idna: 3.4
    - imageio: 2.31.1
    - importlib-metadata: 6.8.0
    - importlib-resources: 5.12.0
    - iniconfig: 1.1.1
    - inquirer: 3.1.3
    - insightface: 0.7.3
    - installer: 0.7.0
    - ipykernel: 6.22.0
    - ipython: 8.11.0
    - ipython-genutils: 0.2.0
    - ipywidgets: 8.0.4
    - isodate: 0.6.1
    - itsdangerous: 2.1.2
    - jaraco.classes: 3.3.0
    - jax: 0.4.13
    - jaxlib: 0.4.12
    - jedi: 0.18.2
    - jeepney: 0.8.0
    - jinja2: 3.1.2
    - joblib: 1.2.0
    - json5: 0.9.6
    - jsonpatch: 1.32
    - jsonpointer: 2.1
    - jsons: 1.6.3
    - jsonschema: 4.17.3
    - jupyter-client: 8.1.0
    - jupyter-core: 5.3.0
    - jupyter-events: 0.6.3
    - jupyter-server: 2.5.0
    - jupyter-server-fileid: 0.9.0
    - jupyter-server-terminals: 0.4.4
    - jupyter-server-ydoc: 0.8.0
    - jupyter-ydoc: 0.2.4
    - jupyterlab: 3.6.3
    - jupyterlab-pygments: 0.1.2
    - jupyterlab-server: 2.22.0
    - jupyterlab-widgets: 3.0.5
    - keyring: 24.2.0
    - kivy: 2.2.1
    - kiwisolver: 1.4.4
    - kneed: 0.8.2
    - lazy-loader: 0.3
    - lightning: 2.3.1
    - lightning-cloud: 0.5.37
    - lightning-utilities: 0.11.3.post0
    - llvmlite: 0.40.1
    - lockfile: 0.12.2
    - lxml: 4.9.1
    - mamba-gator: 5.2.0
    - mapclassify: 2.5.0
    - markdown: 3.4.4
    - markdown-it-py: 2.2.0
    - markupsafe: 2.1.2
    - mat73: 0.60
    - matplotlib: 3.8.4
    - matplotlib-inline: 0.1.6
    - mdurl: 0.1.0
    - mistune: 0.8.4
    - mkl-fft: 1.3.6
    - mkl-random: 1.2.2
    - mkl-service: 2.4.0
    - ml-dtypes: 0.4.0
    - more-itertools: 10.1.0
    - mpi4py: 3.1.4
    - mpmath: 1.3.0
    - msal: 1.24.1
    - msal-extensions: 1.0.0
    - msgpack: 1.0.7
    - multidict: 6.0.4
    - munch: 2.5.0
    - munkres: 1.1.4
    - nbclassic: 0.5.5
    - nbclient: 0.5.13
    - nbconvert: 6.5.4
    - nbformat: 5.7.0
    - nest-asyncio: 1.5.6
    - networkx: 3.1
    - notebook: 6.5.4
    - notebook-shim: 0.2.2
    - numba: 0.57.1
    - numexpr: 2.8.4
    - numpy: 1.24.3
    - nvidia-ml-py: 11.495.46
    - oauthlib: 3.2.2
    - onnx: 1.14.1
    - onnxruntime-gpu: 1.16.0
    - open3d: 0.17.0
    - opencv-python-headless: 4.7.0.72
    - opt-einsum: 3.3.0
    - optax: 0.2.2
    - ordered-set: 4.1.0
    - orjson: 3.9.2
    - packaging: 23.2
    - pandas: 1.5.3
    - pandocfilters: 1.5.0
    - parso: 0.8.3
    - patsy: 0.5.3
    - pexpect: 4.8.0
    - pickleshare: 0.7.5
    - pillow: 9.4.0
    - pip: 23.2.1
    - pipreqs: 0.4.11
    - pkginfo: 1.9.6
    - pl-crossvalidate: 0.1.0
    - platformdirs: 3.11.0
    - plotly: 5.13.1
    - pluggy: 1.0.0
    - poetry: 1.6.1
    - poetry-core: 1.7.0
    - poetry-plugin-export: 1.5.0
    - pooch: 1.4.0
    - portalocker: 2.8.2
    - pot: 0.9.0
    - pretty-errors: 1.2.25
    - prettytable: 3.9.0
    - prometheus-client: 0.14.1
    - prompt-toolkit: 3.0.38
    - protobuf: 4.21.12
    - psutil: 5.9.4
    - ptyprocess: 0.7.0
    - pure-eval: 0.2.2
    - py: 1.11.0
    - pyasn1: 0.4.8
    - pyasn1-modules: 0.2.8
    - pybind11: 2.11.1
    - pycparser: 2.21
    - pydantic: 1.10.10
    - pydiffmap: 0.2.0.1
    - pyglet: 1.5.27
    - pygments: 2.14.0
    - pygsp: 0.5.1
    - pyjwt: 2.7.0
    - pynndescent: 0.5.10
    - pyopengl: 3.1.6
    - pyopenssl: 23.2.0
    - pyparsing: 3.0.9
    - pyproj: 3.5.0
    - pyproject-hooks: 1.0.0
    - pyquaternion: 0.9.9
    - pyrender: 0.1.45
    - pyrsistent: 0.19.3
    - pysocks: 1.7.1
    - pytest: 7.4.0
    - python-dateutil: 2.8.2
    - python-editor: 1.0.4
    - python-json-logger: 2.0.7
    - python-multipart: 0.0.6
    - python-slugify: 8.0.1
    - pytorch-lightning: 2.0.4
    - pytz: 2022.7.1
    - pyu2f: 0.1.5
    - pyyaml: 6.0
    - pyzmq: 25.0.2
    - qudida: 0.0.4
    - rapidfuzz: 2.15.2
    - readchar: 4.0.5.dev0
    - requests: 2.31.0
    - requests-oauthlib: 1.3.1
    - requests-toolbelt: 1.0.0
    - rfc3339-validator: 0.1.4
    - rfc3986-validator: 0.1.1
    - rich: 13.3.5
    - rsa: 4.9
    - rtree: 1.0.1
    - scienceplots: 2.1.1
    - scikit-image: 0.22.0
    - scikit-learn: 1.3.0
    - scikit-learn-intelex: 20230131.200013
    - scipy: 1.10.1
    - seaborn: 0.13.2
    - secretstorage: 3.3.3
    - send2trash: 1.8.0
    - setuptools: 68.0.0
    - shapely: 2.0.1
    - shellingham: 1.5.3
    - six: 1.16.0
    - sniffio: 1.2.0
    - soupsieve: 2.4
    - stack-data: 0.6.2
    - starlette: 0.27.0
    - starsessions: 1.3.0
    - statsmodels: 0.14.0
    - sympy: 1.11.1
    - tabulate: 0.9.0
    - tbb: 2021.13.0
    - tenacity: 8.2.2
    - tensorboard: 2.13.0
    - tensorboard-data-server: 0.7.0
    - terminado: 0.17.1
    - text-unidecode: 1.3
    - threadpoolctl: 3.1.0
    - tifffile: 2023.9.26
    - tinycss2: 1.2.1
    - tomli: 2.0.1
    - tomlkit: 0.12.1
    - toolz: 0.12.1
    - torch: 2.0.0
    - torch-cluster: 1.6.1
    - torch-geometric: 2.3.0
    - torch-scatter: 2.1.1
    - torchaudio: 2.0.0
    - torchmetrics: 1.4.0.post0
    - torchvision: 0.15.2a0
    - tornado: 6.2
    - tqdm: 4.66.4
    - traitlets: 5.9.0
    - transforms3d: 0.4.1
    - trimesh: 3.21.5
    - triton: 2.0.0
    - trove-classifiers: 2023.10.17
    - typing-extensions: 4.5.0
    - typish: 1.9.3
    - umap-learn: 0.5.3
    - urllib3: 2.0.7
    - uvicorn: 0.22.0
    - virtualenv: 20.24.5
    - visdom: 0.2.4
    - wcwidth: 0.2.6
    - webencodings: 0.5.1
    - websocket-client: 0.58.0
    - websockets: 11.0.3
    - werkzeug: 2.2.3
    - wheel: 0.38.4
    - widgetsnbextension: 4.0.5
    - xyzservices: 2023.2.0
    - y-py: 0.5.9
    - yarg: 0.1.9
    - yarl: 1.9.2
    - ypy-websocket: 0.8.2
    - zipp: 3.17.0
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.9.16
    - release: 5.15.153.1-microsoft-standard-WSL2
    - version: #1 SMP Fri Mar 29 23:14:13 UTC 2024

More info

Expected behavior

When a new fold is created, the checkpoint path should change to the new fold's directory.

Current behavior

The checkpoint path is resolved once during the first fold. After that, the resolution is short-circuited, so the path never resolves to the new fold's directory.

ScarWar added the "bug" and "needs triage" labels on Jul 16, 2024
awaelchli (Contributor) commented

Hey @ScarWar, I saw the PR. My question, though: does this have to be changed in Lightning, or could it be addressed in the pl_crossvalidate library directly?

awaelchli removed the "needs triage" label on Aug 3, 2024

ScarWar commented Aug 15, 2024

Hey @awaelchli,

After some experimentation, I found that the issue seems to stem from the setup method not handling all of the necessary initialization logic. This causes problems when the same callback instance is reused across multiple runs.

I believe a more robust approach would be to have the setup method encapsulate all the initialization logic and reset the internal state of the instance. This would ensure that each time the callback is used, it starts with a clean state.
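
One way the idea above might look, as a rough sketch only: the callback keeps the user-supplied directory separate from the resolved one and rebuilds its per-run state inside `setup`. The classes `ToyLogger` and `ResettingCheckpoint`, the `_user_dirpath` and `best_score` attributes, and the `checkpoints` subdirectory are all hypothetical names chosen for illustration. This is not the content of PR #20202 and not Lightning's actual `ModelCheckpoint`.

```python
from typing import List, Optional


class ToyLogger:
    """Hypothetical stand-in for a logger that exposes its log directory."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir


class ResettingCheckpoint:
    """Hypothetical callback whose setup() rebuilds all per-run state."""

    def __init__(self, dirpath: Optional[str] = None) -> None:
        # Remember what the user asked for, separately from what gets resolved.
        self._user_dirpath = dirpath
        self.dirpath: Optional[str] = dirpath
        self.best_score: Optional[float] = None

    def setup(self, logger: ToyLogger) -> None:
        # Reset per-run state so a reused instance starts clean.
        self.best_score = None
        if self._user_dirpath is not None:
            # An explicit user choice still wins on every run.
            self.dirpath = self._user_dirpath
        else:
            # Otherwise re-derive the directory from the current logger.
            self.dirpath = f"{logger.log_dir}/checkpoints"


def run_folds(ckpt: ResettingCheckpoint, num_folds: int) -> List[Optional[str]]:
    """Reuse one callback instance across folds and record each resolved dir."""
    resolved: List[Optional[str]] = []
    for fold in range(num_folds):
        ckpt.setup(ToyLogger(f"logs/fold_{fold}"))
        ckpt.best_score = 0.5  # pretend training recorded a score
        resolved.append(ckpt.dirpath)
    return resolved


auto_dirs = run_folds(ResettingCheckpoint(), 3)
fixed_dirs = run_folds(ResettingCheckpoint(dirpath="my_ckpts"), 3)
print(auto_dirs)   # one directory per fold
print(fixed_dirs)  # the user-supplied directory on every fold
```

Storing the user's original argument separately is what lets `setup` tell "the user chose this directory" apart from "a previous run resolved this directory", which is the distinction the short-circuit loses.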

EDIT: Added PR #20202.

ScarWar linked a pull request on Aug 15, 2024 that will close this issue