
Sparsity multistep config is confusing and error prone #909

Open
lpuglia opened this issue Aug 26, 2021 · 1 comment

lpuglia commented Aug 26, 2021

Hello, I'm trying to understand the details of the multistep sparsity training scheduler; in particular, I'm learning from the config example. Here is the part that confuses me:

    "params": {
            "schedule": "multistep",  // The type of scheduling to use for adjusting the target sparsity level
            "patience": 3, // A regular patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps.
            "sparsity_target": 0.7, // Target value of the sparsity level for the model
            "sparsity_target_epoch": 3, // Index of the epoch from which the sparsity level of the model will be equal to spatsity_target value
            "sparsity_freeze_epoch": 50, // Index of the epoch from which the sparsity mask will be frozen and no longer trained
            "multistep_steps": [10, 20], // A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only).
            "multistep_sparsity_levels": [0.2, 0.5, 0.7] // Levels of sparsity to use at each step of the scheduler as specified in the 'multistep_steps' attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of the 'steps' by one. The last sparsity level will function as the ultimate sparsity target, overriding the "sparsity_target" setting if it is present.
    },

To me it seems that the multistep_* params are in conflict with the sparsity_* ones. According to sparsity_target and sparsity_target_epoch, the schedule would be something like:

  1. train for the first 3 epochs with an unspecified level of sparsity (I suppose it is 0, but I'm not sure)
  2. train the remaining epochs with a sparsity level of 0.7 (it's clearly stated: "Index of the epoch from which the sparsity level of the model will be equal to the sparsity_target value")

Meanwhile, the multistep_* params describe a schedule that looks like this (see the sketch after the list):

  1. train the first 10 epochs with a sparsity level of 0.2
  2. train from epoch 10 to 20 with a sparsity level of 0.5
  3. train the remaining epochs with a sparsity level of 0.7
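
To make my reading concrete, here is a minimal sketch in plain Python of how the multistep_* params seem to map an epoch index to a sparsity level. This is only my interpretation, not the actual NNCF code:

    # Illustrative only: my reading of the multistep_* params, not NNCF source code.
    multistep_steps = [10, 20]
    multistep_sparsity_levels = [0.2, 0.5, 0.7]

    def sparsity_for_epoch(epoch):
        # The level list has one more entry than the step list: the first level
        # applies immediately, and each step switches to the next level.
        passed = sum(1 for step in multistep_steps if epoch >= step)
        return multistep_sparsity_levels[passed]

    # epochs 0-9 -> 0.2, epochs 10-19 -> 0.5, epochs 20+ -> 0.7
    print([sparsity_for_epoch(e) for e in (0, 9, 10, 19, 20, 30)])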

Since these two behaviours are not really compatible, it is not clear which one takes precedence over the other. As if this was not enough, sparsity_freeze_epoch can be tricky in the following situation:

"sparsity_freeze_epoch": 20
"multistep_steps": [20]
"multistep_sparsity_levels": [0.3, 0.5] 

Using this configuration will lead to a network that is only sparsified to a level of 0.3! I guess this is the intended behaviour, but in my opinion it is too error prone.
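
To spell out the trap (again my interpretation, assuming the mask is frozen before the transition scheduled for the same epoch is applied), a quick sketch:

    sparsity_freeze_epoch = 20
    multistep_steps = [20]
    multistep_sparsity_levels = [0.3, 0.5]

    level = multistep_sparsity_levels[0]
    for epoch in range(50):
        if epoch >= sparsity_freeze_epoch:
            break  # mask frozen: the transition scheduled for epoch 20 never happens
        passed = sum(1 for step in multistep_steps if epoch >= step)
        level = multistep_sparsity_levels[passed]

    print(level)  # 0.3 -- the 0.5 level is never applied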

I know this is mainly due to the fact that multistep sparsity is not the only kind of schedule, but the example doesn't do a good job of describing it. In my opinion, a better description of the schedule would be a dictionary with epoch numbers as keys and sparsity levels as values, for example:

"params" : {
    "multistep_scheduler" : {
        "0" : 0.0,
        "10" : 0.2,
        "20" : 0.5,
        "29" : 0.7,
        "30" : "freeze"
    }
}

I think this is just a more immediate way to understand what the training scheduler is going to look like.
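
For illustration, a minimal sketch of how such a dictionary could be consumed; the multistep_scheduler key and the exact semantics are only my suggestion, not an existing NNCF option:

    proposal = {"0": 0.0, "10": 0.2, "20": 0.5, "29": 0.7, "30": "freeze"}

    def state_for_epoch(epoch, schedule):
        # Use the entry with the largest epoch key not exceeding the current epoch.
        key = max(k for k in map(int, schedule) if k <= epoch)
        return schedule[str(key)]

    print(state_for_epoch(15, proposal))  # 0.2
    print(state_for_epoch(29, proposal))  # 0.7
    print(state_for_epoch(35, proposal))  # 'freeze'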

Thanks for the reply!

@alexsu52
Contributor

Hi, @lpuglia. Thanks for your feedback and proposal!

  • The multistep scheduler ignores the sparsity_target and sparsity_target_epoch parameters and instead calculates them from the multistep_steps and multistep_sparsity_levels parameters (a rough sketch of this follows below). Based on your feedback we will update the documentation (cc @MaximProshin) to make this clear.
  • Your proposal looks good and we will consider it. I would mention that you can submit a PR with your proposal for review.
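
A rough sketch of that relationship, as a paraphrase for clarity rather than the actual NNCF source:

    # With the multistep scheduler the effective target values come from the
    # multistep_* lists; the explicit sparsity_target settings are ignored.
    multistep_steps = [10, 20]
    multistep_sparsity_levels = [0.2, 0.5, 0.7]

    effective_sparsity_target = multistep_sparsity_levels[-1]   # 0.7
    effective_sparsity_target_epoch = multistep_steps[-1]       # 20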
