
OTX D-Fine Detection Algorithm Integration #4142

Open
wants to merge 52 commits into base: develop
Conversation

eugene123tw
Contributor

@eugene123tw eugene123tw commented Dec 4, 2024

Summary

OTX D-Fine Detection Algorithm Integration: https://github.com/Peterande/D-FINE

  • Introduced five variants of the D-Fine detection algorithm.
  • Integrated the HGNetv2 backbone from PaddleDetection.
  • Cleaned and optimized the original codebase by:
    • Reducing code duplication where possible.
    • Adding docstrings for all methods and functions.
    • Benchmarking OpenVINO/PyTorch detection results for accuracy and performance.

Next phase

  • Validate potential module combinations that could be unified in future iterations, such as:
    • D-Fine Decoder and RT-DETR Decoder.
    • D-Fine Hybrid Encoder and RT-DETR Decoder.
    • D-Fine Criterion and RT-DETR Criterion.
  • Validate Post-Training Optimization Tool (POT) results and assess potential accuracy drops.
  • Validate the XAI feature.

How to test

  • otx train --config src/otx/recipe/detection/dfine_x.yaml --data_root DATA_ROOT
  • pytest tests/unit/algo/detection/test_dfine.py

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@github-actions github-actions bot added the TEST Any changes in tests label Dec 18, 2024
@eugene123tw eugene123tw changed the title [Draft] D-Fine PoC D-Fine Detection Algorithm Dec 20, 2024
@eugene123tw eugene123tw marked this pull request as ready for review December 20, 2024 15:32
@eugene123tw eugene123tw changed the title D-Fine Detection Algorithm OTX D-Fine Detection Algorithm Integration Dec 20, 2024
@github-actions github-actions bot added the DOC Improvements or additions to documentation label Dec 20, 2024
Collaborator

@kprokofi kprokofi left a comment
Thank you, Eugene, for your great contribution!
I will try D-Fine from your branch with Intel GPUs.

return output.permute(0, 2, 1)


class MSDeformableAttentionV2(nn.Module):
Collaborator

Can we use this for RTDetr as well? Maybe it would be an upgrade for RTDetrV2.

Collaborator

Secondly, I would rather put it in otx/src/otx/algo/common/layers/transformer_layers.py, as done for RTDetr.


PRETRAINED_ROOT: str = "https://github.com/Peterande/storage/releases/download/dfinev1.0/"

PRETRAINED_WEIGHTS: dict[str, str] = {
Collaborator

I wonder whether we need all of these variants. We are currently overwhelmed with detection recipes. Could we choose maybe two models to expose and omit the others? The largest one shows the best performance and is a candidate for the Geti largest-template revamp, but the other templates seem less beneficial compared with the already introduced models.
So, I would consider cleaning up some model versions here (the same concern applies to RTDetr and YOLOX, but that is another story).

)


def distance2bbox(points: Tensor, distance: Tensor, reg_scale: Tensor) -> Tensor:
Collaborator

Maybe put this in utils?
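For context, the generic distance-to-box transform maps an anchor point plus (left, top, right, bottom) offsets to corner coordinates. A minimal scalar sketch of that idea, ignoring D-Fine's reg_scale weighting and the batched-Tensor handling of the real function:

```python
def distance2bbox_scalar(point, distance):
    """Convert an anchor point and (left, top, right, bottom) offsets
    into an (x1, y1, x2, y2) box.

    Scalar illustration only; the actual distance2bbox operates on
    Tensors and applies a reg_scale weighting to the distances.
    """
    x, y = point
    left, top, right, bottom = distance
    return (x - left, y - top, x + right, y + bottom)
```

For example, `distance2bbox_scalar((10, 10), (2, 3, 4, 5))` gives `(8, 7, 14, 15)`.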

return box_convert(bboxes, in_fmt="xyxy", out_fmt="cxcywh")


def deformable_attention_core_func_v2(
Collaborator

Same comment about the location, maybe: otx/src/otx/algo/modules/transformer.py?
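Wherever the function ends up, its core operation is bilinear sampling of value features at predicted offset locations (done with grid_sample on batched Tensors in the real code). A scalar sketch of that sampling step, purely for illustration:

```python
def bilinear_sample(grid, x, y):
    """Bilinearly interpolate a 2-D list of floats at fractional
    coordinates (x, y). Scalar stand-in for the grid_sample call
    inside deformable attention; edge coordinates are clamped.
    """
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    # Interpolate along x on the two rows, then along y between them.
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bot = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bot * dy
```

For example, on the 2×2 grid `[[0.0, 1.0], [2.0, 3.0]]`, sampling at `(0.5, 0.5)` returns `1.5`, the average of the four corners.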

class HybridEncoderModule(nn.Module):
"""HybridEncoder for DFine.

TODO(Eugene): Merge with current rtdetr.HybridEncoderModule in next PR.
Collaborator

👍

@@ -3921,3 +3921,44 @@ def _dispatch_transform(cls, cfg_transform: DictConfig | dict | tvt_v2.Transform
raise TypeError(msg)

return transform


class RandomIoUCrop(tvt_v2.RandomIoUCrop):
Collaborator

If we use the already defined RandomIoUCrop in this file, do performance issues occur?
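For reference, RandomIoUCrop accepts a candidate crop only when box overlap passes a sampled IoU threshold. A minimal IoU sketch with boxes as (x1, y1, x2, y2) tuples, not the torchvision implementation:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) tuples. Scalar illustration of the overlap
    criterion RandomIoUCrop evaluates per candidate crop.
    """
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(t):
        return (t[2] - t[0]) * (t[3] - t[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

For example, two 2×2 boxes offset by one pixel in each direction intersect in a 1×1 square, giving an IoU of 1/7.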

Labels
DOC Improvements or additions to documentation · TEST Any changes in tests
2 participants