Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update retrieve_data in era5.py #414

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

TimFuermann
Copy link

@TimFuermann TimFuermann commented Dec 3, 2024

  • changed download_format to .zip
  • added entcrypting of .zip file from download and merging of contained .nc files
  • add url

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

TimFuermann and others added 2 commits December 3, 2024 16:36
- changed download_format to .zip
- added entcrypting of .zip file from download and merging of contained .nc files
- add url
@TimFuermann
Copy link
Author

TimFuermann commented Dec 3, 2024

This pull request should solve the issue mentioned in #413

Copy link
Member

@lkstrp lkstrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @TimFuermann ! I just have a couple of questions

Comment on lines +344 to +346
# Set url for data download, this allows to switch to different data
# sources more easily.
url = "https://cds.climate.copernicus.eu/api"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you switch to other sources? Could you explain?
That should probably be a constant at module level, and maybe an optional argument

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Lukas, you are right, I just added it here for later improvement. For example, I have a local version in which I added the glofas data, but they are stored under a different url, it is not cds, but rather ads. Switching the url in the dataset is then just simpler then going to the "config" file of the cds-api. Furthermore, with a fixed url in the data retrieval, changes can be made more easily. It was just a comment here and will be taken up in future additions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense. But I don't think we want to hardcode the url in here without the ability to pass it as an argument. We are basically overriding any urls a user may have set up manually via the cds api config file without being able to change it. So just move it to an extra argument to get this merged. Future additions we can make anyways

atlite/datasets/era5.py Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
TimFuermann and others added 2 commits December 9, 2024 09:32
This needs to be done, as the new cds api takes only string values for a valid data call.
@lkstrp
Copy link
Member

lkstrp commented Dec 9, 2024

Next to the comments, please add a release note. Afterwards we could merge this. Thank you !

@lkstrp lkstrp linked an issue Dec 9, 2024 that may be closed by this pull request
2 tasks
Co-authored-by: Lukas Trippe <[email protected]>
Copy link
Member

@fneum fneum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TimFuermann!

As outlined in the corresponding issue, I would first like to reproduce the problem to see if this change is necessary.

Second thing is that I would like to see a brief performance check in terms of memory and time.

"month": str(time[0].month),
"day": str(time[0].day),
"year": [str(time[0].year)],
"month": [str(time[0].month).zfill(2)],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If leading zeros are required, perhaps this is better (and "%d", "%Y" for day and year):

Suggested change
"month": [str(time[0].month).zfill(2)],
"month": [time[0].strftime("%m")],

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changed Dataformat from ERA5 Single Levels: .nc to .zip
3 participants