long8v commented on Nov 21, 2022
- code : https://github.com/fundamentalvision/Deformable-DETR.git
- The Hugging Face implementation has many incorrect parts, so I am re-reading the original repo.
| # ------------------------------------------------------------------------ | ||
| # Deformable DETR | ||
| # Copyright (c) 2020 SenseTime. All Rights Reserved. | ||
| # Licensed under the Apache License, Version 2.0 [see LICENSE for details] | ||
| # ------------------------------------------------------------------------ | ||
| # Modified from DETR (https://github.com/facebookresearch/detr) | ||
| # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved | ||
| # ------------------------------------------------------------------------ |
| parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+') | ||
| parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float) |
Detail: the projection parameters matched by `--lr_linear_proj_names` (`reference_points`, `sampling_offsets`) are trained with the base learning rate multiplied by 0.1.
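A minimal sketch of how such a multiplier could be applied, assuming parameters are split into groups by substring match on their names. `build_param_groups`, the parameter names, and the base learning rate of `2e-4` are illustrative assumptions, not code from the repo.

```python
def build_param_groups(named_params, base_lr, proj_names, proj_mult):
    """Split (name, param) pairs into a base group and a reduced-lr group."""
    named_params = list(named_params)  # allow generators such as model.named_parameters()

    def is_proj(name):
        # A parameter counts as a "projection" if any keyword appears in its name.
        return any(key in name for key in proj_names)

    base = [p for n, p in named_params if not is_proj(n)]
    proj = [p for n, p in named_params if is_proj(n)]
    return [
        {"params": base, "lr": base_lr},
        {"params": proj, "lr": base_lr * proj_mult},
    ]


# Stand-in for model.named_parameters(); strings replace tensors here (hypothetical names).
named_params = [
    ("decoder.layers.0.cross_attn.sampling_offsets.weight", "w0"),
    ("decoder.layers.0.cross_attn.value_proj.weight", "w1"),
    ("reference_points.weight", "w2"),
]
groups = build_param_groups(
    named_params,
    base_lr=2e-4,  # assumed example value
    proj_names=["reference_points", "sampling_offsets"],
    proj_mult=0.1,
)
```

A list of dicts with `params` and `lr` keys is the shape PyTorch optimizers accept for per-parameter options, so the same grouping should carry over to real tensors.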
| parser.add_argument('--clip_max_norm', default=0.1, type=float, | ||
| help='gradient clipping max norm') |
So there is gradient clipping? I don't think I saw it in the paper.
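For reference, a rough sketch of what clipping by global norm with `max_norm=0.1` does, written in plain Python on a flat list of gradient values. This is an illustration of the general technique (as I understand it, the same rule `torch.nn.utils.clip_grad_norm_` applies), not the repo's training loop; `clip_by_global_norm` is a hypothetical helper.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale is capped at 1.0, so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_by_global_norm([0.3, -0.4], max_norm=0.1)
norm_after = math.sqrt(sum(g * g for g in clipped))
```

Gradients whose joint norm is already below the threshold pass through unchanged, which is why a small value such as 0.1 mainly affects the occasional large step.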
| # Variants of Deformable DETR | ||
| parser.add_argument('--with_box_refine', default=False, action='store_true') | ||
| parser.add_argument('--two_stage', default=False, action='store_true') |
Bounding box refinement / two-stage.
`store_true`: use `action="store_true"` when an option takes no additional value and only its presence or absence matters.
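A quick check of that behavior, reusing the two flags from the diff above on a fresh parser:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--with_box_refine', default=False, action='store_true')
parser.add_argument('--two_stage', default=False, action='store_true')

# No flags given: both stay at their default of False.
args_none = parser.parse_args([])
# Passing the flag alone, with no value, flips it to True.
args_refine = parser.parse_args(['--with_box_refine'])
```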
| parser.add_argument('--dilation', action='store_true', | ||
| help="If true, we replace stride with dilation in the last convolutional block (DC5)") |
| class DeformableTransformerDecoder(nn.Module): | ||
| def __init__(self, decoder_layer, num_layers, return_intermediate=False): | ||
| super().__init__() | ||
| self.layers = _get_clones(decoder_layer, num_layers) | ||
| self.num_layers = num_layers | ||
| self.return_intermediate = return_intermediate |
| # hack implementation for iterative bounding box refinement and two-stage Deformable DETR | ||
| self.bbox_embed = None | ||
| self.class_embed = None |
I think I can see why this is called a hack: `bbox_embed` and `class_embed` are defined outside the decoder, then brought in and used inside it.
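A toy sketch of that pattern, assuming the outer model owns the heads and assigns them onto the decoder after construction. `Decoder`, `Detector`, and the identity "heads" are hypothetical stand-ins, not the repo's classes.

```python
class Decoder:
    def __init__(self, num_layers):
        self.num_layers = num_layers
        self.bbox_embed = None  # left empty here; filled in from outside, if at all

    def uses_refinement(self):
        return self.bbox_embed is not None


class Detector:
    def __init__(self, decoder, with_box_refine):
        self.decoder = decoder
        # One stand-in "head" per decoder layer (identity functions for illustration).
        self.bbox_embed = [(lambda x: x) for _ in range(decoder.num_layers)]
        if with_box_refine:
            # The outer model reaches into the decoder and attaches its own heads.
            self.decoder.bbox_embed = self.bbox_embed


plain = Detector(Decoder(num_layers=6), with_box_refine=False)
refined = Detector(Decoder(num_layers=6), with_box_refine=True)
```

The decoder's behavior then depends on state that its own constructor never sets, which is probably what makes it feel like a hack.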
| def forward(self, tgt, reference_points, src, src_spatial_shapes, src_level_start_index, src_valid_ratios, | ||
| query_pos=None, src_padding_mask=None): | ||
| output = tgt | ||
|
|
||
| intermediate = [] | ||
| intermediate_reference_points = [] | ||
| for lid, layer in enumerate(self.layers): | ||
| if reference_points.shape[-1] == 4: | ||
| reference_points_input = reference_points[:, :, None] \ | ||
| * torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None] | ||
| else: | ||
| assert reference_points.shape[-1] == 2 | ||
| reference_points_input = reference_points[:, :, None] * src_valid_ratios[:, None] | ||
| output = layer(output, query_pos, reference_points_input, src, src_spatial_shapes, src_level_start_index, src_padding_mask) |
Takes the reference points, applies masking-related processing (scaling by `src_valid_ratios`, once per feature level), and passes the result through the DecoderLayer.
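To make the scaling step concrete, here is a plain-Python sketch for a single query. It assumes each level's valid ratio is a `(ratio_w, ratio_h)` pair, which is my reading of the `torch.cat([src_valid_ratios, src_valid_ratios], -1)` call; `scale_by_valid_ratios` is a hypothetical helper.

```python
def scale_by_valid_ratios(ref_point, valid_ratios):
    """Return one scaled copy of ref_point per feature level.

    ref_point: normalized (x, y) or (x, y, w, h).
    valid_ratios: list of (ratio_w, ratio_h), one per level (assumed ordering).
    """
    scaled = []
    for ratio_w, ratio_h in valid_ratios:
        # For 4-d points the ratio pair is repeated, mirroring the torch.cat call.
        ratios = (ratio_w, ratio_h) * (len(ref_point) // 2)
        scaled.append(tuple(c * r for c, r in zip(ref_point, ratios)))
    return scaled


points_2d = scale_by_valid_ratios((0.5, 0.5), [(1.0, 1.0), (0.8, 0.5)])
boxes_4d = scale_by_valid_ratios((0.5, 0.5, 0.2, 0.4), [(0.5, 0.5)])
```

In the tensor version, the `[:, :, None]` and `[:, None]` indexing adds the level and query axes so that this per-level product happens through broadcasting.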
| # hack implementation for iterative bounding box refinement | ||
| if self.bbox_embed is not None: | ||
| tmp = self.bbox_embed[lid](output) | ||
| if reference_points.shape[-1] == 4: | ||
| new_reference_points = tmp + inverse_sigmoid(reference_points) | ||
| new_reference_points = new_reference_points.sigmoid() | ||
| else: | ||
| assert reference_points.shape[-1] == 2 | ||
| new_reference_points = tmp | ||
| new_reference_points[..., :2] = tmp[..., :2] + inverse_sigmoid(reference_points) | ||
| new_reference_points = new_reference_points.sigmoid() | ||
| reference_points = new_reference_points.detach() |
If `bbox_embed` is set, a bounding box is predicted from the DecoderLayer output, and the reference points are adjusted slightly based on that prediction.
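A scalar sketch of that update, assuming `inverse_sigmoid` is the clamped logit (the `eps` value here is my assumption) and that the head output acts as an offset in logit space. `refine` is a hypothetical helper that covers both the 4-d and 2-d branches of the diff.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Clamped logit; eps keeps the log finite at 0 and 1 (assumed value)."""
    x = min(max(x, 0.0), 1.0)
    return math.log(max(x, eps) / max(1.0 - x, eps))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def refine(reference_point, delta):
    """Add the predicted offset to the reference point in logit space."""
    logits = list(delta)
    # Only the coordinates present in reference_point receive the offset;
    # with a 2-d reference and a 4-d delta, the last two come from delta alone.
    for i, r in enumerate(reference_point):
        logits[i] = delta[i] + inverse_sigmoid(r)
    return [sigmoid(z) for z in logits]


unchanged = refine([0.3, 0.7], [0.0, 0.0, 0.0, 0.0])
shifted = refine([0.5, 0.5, 0.2, 0.2], [1.0, 0.0, 0.0, 0.0])
```

Working in logit space keeps the refined coordinates inside (0, 1) however large the predicted offset is, and the `.detach()` in the original stops gradients from flowing back through earlier layers' reference points.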
| if self.return_intermediate: | ||
| return torch.stack(intermediate), torch.stack(intermediate_reference_points) | ||
|
|
||
| return output, reference_points |
Returns the DecoderLayer output and the reference points. Without two-stage or box refinement, the reference points are identical at the first and last layers, since nothing updates them.