Conversation
| } | ||
|
|
||
|
|
||
| class DeformableDetrConfig(PretrainedConfig): |
| encoder_n_points (`int`, *optional*, defaults to 4): | ||
| The number of sampled keys in each feature level for each attention head in the encoder. | ||
| decoder_n_points (`int`, *optional*, defaults to 4): | ||
| The number of sampled keys in each feature level for each attention head in the decoder. |
There was a problem hiding this comment.
deformable attention에서 필요한 config
| two_stage (`bool`, *optional*, defaults to `False`): | ||
| Whether to apply a two-stage deformable DETR, where the region proposals are also generated by a variant of | ||
| Deformable DETR, which are further fed into the decoder for iterative bounding box refinement. | ||
| two_stage_num_proposals (`int`, *optional*, defaults to 300): | ||
| The number of region proposals to be generated, in case `two_stage` is set to `True`. |
There was a problem hiding this comment.
two-stage model 관련 config들
| with_box_refine (`bool`, *optional*, defaults to `False`): | ||
| Whether to apply iterative bounding box refinement, where each decoder layer refines the bounding boxes | ||
| based on the predictions from the previous layer. |
There was a problem hiding this comment.
with_box_refine : 이전 레이어에서 나온 bbox를 초기값으로 사용함
| num_feature_levels (`int`, *optional*, defaults to 4): | ||
| The number of input feature levels. |
| if self.config.two_stage: | ||
| object_query_embedding, output_proposals = self.gen_encoder_output_proposals( | ||
| encoder_outputs[0], ~mask_flatten, spatial_shapes | ||
| ) | ||
|
|
||
| # hack implementation for two-stage Deformable DETR | ||
| # apply a detection head to each pixel (A.4 in paper) | ||
| # linear projection for bounding box binary classification (i.e. foreground and background) | ||
| enc_outputs_class = self.decoder.class_embed[-1](object_query_embedding) | ||
| # 3-layer FFN to predict bounding boxes coordinates (bbox regression branch) | ||
| delta_bbox = self.decoder.bbox_embed[-1](object_query_embedding) | ||
| enc_outputs_coord_logits = delta_bbox + output_proposals | ||
|
|
||
| # only keep top scoring `config.two_stage_num_proposals` proposals | ||
| topk = self.config.two_stage_num_proposals | ||
| topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1] | ||
| topk_coords_logits = torch.gather( | ||
| enc_outputs_coord_logits, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4) | ||
| ) | ||
|
|
||
| topk_coords_logits = topk_coords_logits.detach() | ||
| reference_points = topk_coords_logits.sigmoid() | ||
| init_reference_points = reference_points | ||
| pos_trans_out = self.pos_trans_norm(self.pos_trans(self.get_proposal_pos_embed(topk_coords_logits))) | ||
| query_embed, target = torch.split(pos_trans_out, num_channels, dim=2) |
There was a problem hiding this comment.
two-stage는 일단 모든 픽셀에 대해 bbox들을 뽑고, top-k bbox coordinate의 positional embedding을 query_embed로 줌
| """, | ||
| DEFORMABLE_DETR_START_DOCSTRING, | ||
| ) | ||
| class DeformableDetrForObjectDetection(DeformableDetrPreTrainedModel): |
|
|
||
| @add_start_docstrings_to_model_forward(DEFORMABLE_DETR_INPUTS_DOCSTRING) | ||
| @replace_return_docstrings(output_type=DeformableDetrObjectDetectionOutput, config_class=_CONFIG_FOR_DOC) | ||
| def forward( |
| for level in range(hidden_states.shape[0]): | ||
| if level == 0: | ||
| reference = init_reference | ||
| else: | ||
| reference = inter_references[level - 1] | ||
| reference = inverse_sigmoid(reference) | ||
| outputs_class = self.class_embed[level](hidden_states[level]) | ||
| delta_bbox = self.bbox_embed[level](hidden_states[level]) | ||
| if reference.shape[-1] == 4: | ||
| outputs_coord_logits = delta_bbox + reference | ||
| elif reference.shape[-1] == 2: | ||
| delta_bbox[..., :2] += reference | ||
| outputs_coord_logits = delta_bbox |
There was a problem hiding this comment.
이전 레이어 결과 값의 inverse sigmoid에 delta bbox(dx)를 더한 뒤 다시 sigmoid를 적용 — 별거 아니고 결과를 0~1 범위 값으로 만들어주려는 것임
|
|
||
|
|
||
| # Copied from transformers.models.detr.modeling_detr.DetrHungarianMatcher | ||
| class DeformableDetrHungarianMatcher(nn.Module): |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| """Feature extractor class for Deformable DETR.""" |
There was a problem hiding this comment.
원래 DETR이랑 나머지는 다 같은데 post_process만 다르다고 함!
The postprocessing of Deformable DETR is actually different compared to regular DETR: a sigmoid activation function is used rather than softmax, and the no-object class is included, whereas DETR discards this class.
| return color | ||
|
|
||
|
|
||
| class DeformableDetrFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionMixin): |
There was a problem hiding this comment.
겸사겸사해서 FeatureExtractor를 보자! 이 클래스는 전처리하는 class임!
| do_resize (`bool`, *optional*, defaults to `True`): | ||
| Whether to resize the input to a certain `size`. |
There was a problem hiding this comment.
resize=True! 어떻게 resize해주려나
| size (`int`, *optional*, defaults to 800): | ||
| Resize the input to the given size. Only has an effect if `do_resize` is set to `True`. If size is a | ||
| sequence like `(width, height)`, output size will be matched to this. If size is an int, smaller edge of | ||
| the image will be matched to this number. i.e, if `height > width`, then image will be rescaled to `(size * | ||
| height / width, size)`. |
There was a problem hiding this comment.
width, height로 들어가면 거기에 맞추고, 하나만 들어가면 단축을 size에 맞게 맞춤
| max_size (`int`, *optional*, defaults to 1333): | ||
| The largest size an image dimension can have (otherwise it's capped). Only has an effect if `do_resize` is | ||
| set to `True`. |
There was a problem hiding this comment.
들어올 수 있는 최대(장축)의 image size.
| return image, target | ||
|
|
||
| # Copied from transformers.models.detr.feature_extraction_detr.DetrFeatureExtractor._resize | ||
| def _resize(self, image, size, target=None, max_size=None): |
| def get_size_with_aspect_ratio(image_size, size, max_size=None): | ||
| w, h = image_size | ||
| if max_size is not None: | ||
| min_original_size = float(min((w, h))) | ||
| max_original_size = float(max((w, h))) | ||
| if max_original_size / min_original_size * size > max_size: | ||
| size = int(round(max_size * min_original_size / max_original_size)) |
There was a problem hiding this comment.
max_size가 주어졌을 경우에 size 구함
| def get_size(image_size, size, max_size=None): | ||
| if isinstance(size, (list, tuple)): | ||
| return size | ||
| else: | ||
| # size returned must be (w, h) since we use PIL to resize images | ||
| # so we revert the tuple | ||
| return get_size_with_aspect_ratio(image_size, size, max_size)[::-1] |
There was a problem hiding this comment.
size가 (list/tuple이 아닌) int 하나로 주어지면 비율에 맞춰서 크기 만듦! 즉 비율은 안 바뀜! -> 현재 학습 중인 DETR의 셋팅인듯?
|
|
||
| return encoded_inputs | ||
|
|
||
| def post_process(self, outputs, target_sizes): |
| if target_sizes.shape[1] != 2: | ||
| raise ValueError("Each element of target_sizes must contain the size (h, w) of each image of the batch") | ||
|
|
||
| prob = out_logits.sigmoid() |
원본 코드는 https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/deformable_transformer.py 인데 어차피 transformers 쓸거니 transformers 코드 보자 https://github.com/huggingface/transformers/tree/main/src/transformers/models/deformable_detr