Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA Out of memory when training SwinIR with L2 Loss #177

Open
IMasI2Cat opened this issue Feb 27, 2024 · 1 comment
Open

CUDA Out of memory when training SwinIR with L2 Loss #177

IMasI2Cat opened this issue Feb 27, 2024 · 1 comment

Comments

@IMasI2Cat
Copy link

Hi, I'm trying to run a SwinIR training. I followed as a template the json in options/swinir/train_swinir_sr_realworld_x4_psnr.json for my own data (RGB) with just some slight modifications (i.e. dataset_type to "sr", LR, checkpoints, paths to my data and dataloader_num_workers to 1). I have been working with this configuration for several experiments with no issues. However, now I tried to run it with L2 loss ("G_lossfn_type": "l2" ) and I'm having a CUDA Out of memory error on a certain step of the first validation. I also tried "l2sum" and the same happens, as well as I tried to reduce batch size to 1. I'm copying the error trace. Any idea would help.

Traceback (most recent call last):                                                                                                                                                                                 
  File "/home/imas/code/Repos/KAIR/main_train_psnr.py", line 281, in <module>                                                                                                                             
    main()                                                                                                                                                                                                         
  File "/home/imas/code/Repos/KAIR/main_train_psnr.py", line 242, in main                                                                                                                                 
    model.test()                                                                                                                                                                                                   
  File "/home/imas/code/Repos/KAIR/models/model_plain.py", line 219, in test                                                                                                                              
    self.netG_forward()                                                                                                                                                                                            
  File "/home/imas/code/Repos/KAIR/models/model_plain.py", line 169, in netG_forward                                                                                                                      
    self.E = self.netG(self.L)                                                                                                                                                                                     
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward                                                                         
    return self.module(*inputs[0], **kwargs[0])                                                                                                                                                                    
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 826, in forward                                                                                                                        
    x = self.conv_after_body(self.forward_features(x)) + x                                                                                                                                                         
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 798, in forward_features                                                                                                               
    x = layer(x, x_size)                                                                                                                                                                                           
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 482, in forward                                                                                                                        
    return self.patch_embed(self.conv(self.patch_unembed(self.residual_group(x, x_size), x_size))) + x                                                                                                             
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 402, in forward                                                                                                                        
    x = blk(x, x_size)                                                                                                                                                                                             
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 262, in forward                                                                                                                        
    attn_windows = self.attn(x_windows, mask=self.calculate_mask(x_size).to(x.device))                                                                                                                             
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 130, in forward                                                                                                                        
    attn = attn + relative_position_bias.unsqueeze(0)                                                                                                                                                              
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.05 GiB (GPU 0; 23.68 GiB total capacity; 17.74 GiB already allocated; 796.50 MiB free; 21.27 GiB reserved in total by PyTorch) If reserved mem
ory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF                                                              
srun: error: i2c01: task 0: Exited with exit code 1 
@YounZh
Copy link

YounZh commented Dec 5, 2024

Hi, I'm trying to run a SwinIR training. I followed as a template the json in options/swinir/train_swinir_sr_realworld_x4_psnr.json for my own data (RGB) with just some slight modifications (i.e. dataset_type to "sr", LR, checkpoints, paths to my data and dataloader_num_workers to 1). I have been working with this configuration for several experiments with no issues. However, now I tried to run it with L2 loss ("G_lossfn_type": "l2" ) and I'm having a CUDA Out of memory error on a certain step of the first validation. I also tried "l2sum" and the same happens, as well as I tried to reduce batch size to 1. I'm copying the error trace. Any idea would help.

Traceback (most recent call last):                                                                                                                                                                                 
  File "/home/imas/code/Repos/KAIR/main_train_psnr.py", line 281, in <module>                                                                                                                             
    main()                                                                                                                                                                                                         
  File "/home/imas/code/Repos/KAIR/main_train_psnr.py", line 242, in main                                                                                                                                 
    model.test()                                                                                                                                                                                                   
  File "/home/imas/code/Repos/KAIR/models/model_plain.py", line 219, in test                                                                                                                              
    self.netG_forward()                                                                                                                                                                                            
  File "/home/imas/code/Repos/KAIR/models/model_plain.py", line 169, in netG_forward                                                                                                                      
    self.E = self.netG(self.L)                                                                                                                                                                                     
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward                                                                         
    return self.module(*inputs[0], **kwargs[0])                                                                                                                                                                    
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 826, in forward                                                                                                                        
    x = self.conv_after_body(self.forward_features(x)) + x                                                                                                                                                         
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 798, in forward_features                                                                                                               
    x = layer(x, x_size)                                                                                                                                                                                           
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 482, in forward                                                                                                                        
    return self.patch_embed(self.conv(self.patch_unembed(self.residual_group(x, x_size), x_size))) + x                                                                                                             
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 402, in forward                                                                                                                        
    x = blk(x, x_size)                                                                                                                                                                                             
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 262, in forward                                                                                                                        
    attn_windows = self.attn(x_windows, mask=self.calculate_mask(x_size).to(x.device))                                                                                                                             
  File "/home/imas/miniconda3/envs/pytorch3d2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl                                                                             
    return forward_call(*input, **kwargs)                                                                                                                                                                          
  File "/home/imas/code/Repos/KAIR/models/network_swinir.py", line 130, in forward                                                                                                                        
    attn = attn + relative_position_bias.unsqueeze(0)                                                                                                                                                              
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.05 GiB (GPU 0; 23.68 GiB total capacity; 17.74 GiB already allocated; 796.50 MiB free; 21.27 GiB reserved in total by PyTorch) If reserved mem
ory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF                                                              
srun: error: i2c01: task 0: Exited with exit code 1 

您好,我也遇到了这个问题,请问有找到问题所在吗?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants