Hello, I'm trying to understand the details of the multistep sparsity training scheduler; in particular, I'm learning from the config example. Here is the part that confuses me:
"params": {
"schedule": "multistep", // The type of scheduling to use for adjusting the target sparsity level"patience": 3, // A regular patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps."sparsity_target": 0.7, // Target value of the sparsity level for the model"sparsity_target_epoch": 3, // Index of the epoch from which the sparsity level of the model will be equal to spatsity_target value"sparsity_freeze_epoch": 50, // Index of the epoch from which the sparsity mask will be frozen and no longer trained"multistep_steps": [10, 20], // A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only)."multistep_sparsity_levels": [0.2, 0.5, 0.7] // Levels of sparsity to use at each step of the scheduler as specified in the 'multistep_steps' attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of the 'steps' by one. The last sparsity level will function as the ultimate sparsity target, overriding the "sparsity_target" setting if it is present.
},
To me it seems that the multistep_* params are in conflict with the sparsity_* ones. According to sparsity_target and sparsity_target_epoch, the schedule would be something like:
- train the first 3 epochs with an unspecified level of sparsity (I suppose it is 0, but I'm not sure)
- train the remaining epochs with a sparsity level of 0.7 (it's clearly stated: "Index of the epoch from which the sparsity level of the model will be equal to sparsity_target value")
Meanwhile, the multistep_* params describe a schedule that looks like this (sketched in code right after the list):
- train the first 10 epochs with a sparsity level of 0.2
- train from epoch 10 to 20 with a sparsity of 0.5
- train the remaining epochs with a sparsity of 0.7
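If I read the comments correctly, the multistep behaviour amounts to something like this (a rough sketch of my understanding, not actual NNCF code):

def multistep_sparsity(epoch, steps=(10, 20), levels=(0.2, 0.5, 0.7)):
    # Count how many step boundaries the current epoch has passed;
    # that count indexes into the list of scheduled sparsity levels.
    idx = sum(1 for s in steps if epoch >= s)
    return levels[idx]

# epochs 0-9 -> 0.2, epochs 10-19 -> 0.5, epoch 20 onwards -> 0.7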
Since these two behaviours are not really compatible, it is not clear which one takes precedence over the other. As if this were not enough, sparsity_freeze_epoch can be tricky in the following situation:
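Take a configuration along these lines (hypothetical values, with the freeze scheduled before the first multistep transition):

"params": {
    "schedule": "multistep",
    "sparsity_freeze_epoch": 5, // the mask is frozen before the first transition at step 10
    "multistep_steps": [10, 20],
    "multistep_sparsity_levels": [0.3, 0.5, 0.7]
}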
Using this configuration will lead to a network that is only sparsified with a level of 0.3! I guess this is the intended behaviour, but in my opinion it is too error-prone.
I know this is mainly due to the fact that multistep sparsity is not the only kind of schedule, but the example doesn't do a good job of describing it. In my opinion, a better description of the schedule would be achieved using a dictionary with epoch numbers as keys and sparsity levels as values.
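For example (the "sparsity_levels" key name here is just a placeholder to illustrate the idea):

"params": {
    "schedule": "multistep",
    "sparsity_levels": { // hypothetical key: epoch index -> sparsity level
        "0": 0.2,
        "10": 0.5,
        "20": 0.7
    },
    "sparsity_freeze_epoch": 50
}

I think this is just a more immediate way to understand what the training scheduler is going to look like.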
Hi, @lpuglia. Thanks for your feedback and proposal!
The multistep scheduler ignores the sparsity_target and sparsity_target_epoch parameters and calculates them from the multistep_steps and multistep_sparsity_levels parameters. Based on your feedback, we will update the documentation (cc @MaximProshin) to make this clear.
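Roughly speaking, for the config above the effective values are derived as follows (a simplified sketch, not the actual NNCF implementation):

multistep_steps = [10, 20]
multistep_sparsity_levels = [0.2, 0.5, 0.7]

# The explicitly configured sparsity_target / sparsity_target_epoch are ignored;
# the multistep scheduler derives the effective values from the multistep lists:
sparsity_target = multistep_sparsity_levels[-1]  # 0.7, the last scheduled level
sparsity_target_epoch = multistep_steps[-1]      # 20, the last transition step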
Your proposal looks good and we will consider it. I would mention that you can provide a PR with your proposal for review.
Thanks for the reply!