Apply transforms in PreProcessor #2467
base: release/v2.0.0
Conversation
Yes, I also thought about the problem, and what you now did (adding augmentations to the dataset) is the only solution to this.
@alexriedel1 As you've seen, I re-introduced transforms on the dataset/datamodule side, but now named as `augmentations`.

Motivation: in addition to the transforms applied by the `PreProcessor`, users may want to apply custom data augmentations to their images.

This way, we keep the model-specific transforms on the model side, which has the advantages mentioned in my earlier comment. At the same time, the augmentation transforms on the datamodule side make it easier for the user to define their custom augmentation pipeline. The main advantage over earlier designs is that the user now only has to define the augmentations, and does not need to include the model-specific transforms such as resizing and normalization (because these are handled separately by the `PreProcessor`).

Concerning resizing:
Please let me know what you think, any suggestions are more than welcome :)
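To make the proposed split concrete, here is a minimal usage sketch. The `MVTec` datamodule name and its exact constructor are assumptions for illustration; the `train_augmentations` argument is taken from the diff further down in this thread.

```python
from torchvision.transforms import v2

from anomalib.data import MVTec  # datamodule name assumed for illustration

# User-defined augmentations only: no Resize or Normalize needed here, because
# the model-specific transforms are applied separately by the PreProcessor.
augmentations = v2.Compose([
    v2.RandomHorizontalFlip(p=0.5),
    v2.RandomRotation(degrees=10),
])

datamodule = MVTec(train_augmentations=augmentations)
```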
This reverts commit 151a179.
Yes, I (and most users too, I guess) am happy with this! You should make clear in the docs that pre-processing transforms are stored in the model after export while augmentations are not, so people know exactly which one to use for their use case.
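A hedged sketch of that distinction in practice. The `Engine.export` call and the `ExportType` import are assumptions based on Anomalib's deploy tooling, not something this PR defines:

```python
from anomalib.data import MVTec      # datamodule name assumed, as above
from anomalib.deploy import ExportType  # import path assumed
from anomalib.engine import Engine
from anomalib.models import Padim

model = Padim()
datamodule = MVTec(...)  # train_augmentations act during fit() only
engine = Engine()
engine.fit(model=model, datamodule=datamodule)

# The exported model carries the PreProcessor transforms (resize, normalize),
# so raw images can be fed at inference time; the augmentations defined on the
# datamodule are not part of the export.
engine.export(model=model, export_type=ExportType.TORCH)  # signature assumed
```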
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@             Coverage Diff              @@
##  release/v2.0.0    #2467       +/-    ##
==================================================
+ Coverage    78.50%   78.58%    +0.08%
==================================================
  Files          303      306        +3
  Lines        12955    12970       +15
==================================================
+ Hits         10170    10193       +23
+ Misses        2785     2777        -8
val_split_ratio: float,
train_augmentations: Transform | None = None,
val_augmentations: Transform | None = None,
test_augmentations: Transform | None = None,
augmentations: Transform | None = None,
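Based on this signature, a subset-specific argument presumably overrides the shared `augmentations` value for that subset; this precedence is an assumption, not something the diff confirms. A sketch (datamodule name assumed as before):

```python
from torchvision.transforms import v2

from anomalib.data import MVTec  # datamodule name assumed for illustration

datamodule = MVTec(
    # Applied to every subset unless a subset-specific argument is given
    # (assumed precedence, not confirmed by the diff).
    augmentations=v2.RandomHorizontalFlip(p=0.5),
    # Train-only pipeline, overriding the shared default for the train subset.
    train_augmentations=v2.Compose([
        v2.RandomHorizontalFlip(p=0.5),
        v2.ColorJitter(brightness=0.2),
    ]),
)
```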
Looking at the naming, are we 100% clear that `augmentations` is the best naming in this case? The reason for asking is that the annotation for `augmentations` is, obviously, `Transform`. Would it make absolute sense to users?
Ok, so there are transforms that can be pre-defined, are model-specific, and are stored in the exported model. And there are augmentations that are only applied during training. Both can be defined separately for training, testing and validation.

Maybe it would make sense to call the transforms something like `input_image_transforms`? Or something that makes clearer that it is "baked" into the model and is the last thing that transforms images before they are processed by the model.
On the datamodule side, I think `augmentations` is a more accurate and descriptive name than `transforms`. The `augmentations` naming signals that these are meant to define the data augmentation pipeline; `transforms` is more ambiguous and can refer to both data augmentations and model-specific input image pre-processing.

On the model/pre-processor side, we could think of a more descriptive name than `transform`, maybe `input_transforms`? In any case, we should ensure that there is a thorough how-to guide that clearly explains these concepts.
> Both can be defined separately for training, testing and validation.

@alexriedel1 Please note that in the latest version of this PR, we only keep 1 transform on the `PreProcessor` side. Since the model always requires the inputs in the same format, we figured there was no longer a use-case for having separate model transforms between the different subsets (but please let us know if you disagree).
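A sketch of what the single model-side transform could look like under this design; the `PreProcessor` constructor and the model's `pre_processor` argument are assumed for illustration:

```python
import torch
from torchvision.transforms import v2

from anomalib.models import Padim  # model choice is arbitrary
from anomalib.pre_processing import PreProcessor  # module path assumed

# One transform for all subsets: the model always expects the same input
# format, so separate train/val/test transforms are no longer needed here.
pre_processor = PreProcessor(
    transform=v2.Compose([
        v2.Resize((256, 256)),
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]),
)
model = Padim(pre_processor=pre_processor)
```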
> @alexriedel1 Please note that in the latest version of this PR, we only keep 1 transform on the `PreProcessor` side. Since the model always requires the inputs in the same format, we figured there was no longer a use-case for having separate model transforms between the different subsets (but please let us know if you disagree).

Yes, that's a good idea!
📝 Description

The model-specific transforms are now applied by the `PreProcessor`. This is more in line with the other auxiliary components such as post-processing and evaluation, where the full functionality of the aux operation is contained in a single class.

✨ Changes
Select what type of change your PR is:
✅ Checklist
Before you submit your pull request, please make sure you have completed the following steps:
For more information about code review checklists, see the Code Review Checklist.