|
|
|
@ -14197,43 +14197,85 @@ def deformable_roi_pooling(input,
|
|
|
|
|
position_sensitive=False,
|
|
|
|
|
name=None):
|
|
|
|
|
"""
|
|
|
|
|
Deformable PSROI Pooling Layer
|
|
|
|
|
Deformable ROI Pooling Layer
|
|
|
|
|
|
|
|
|
|
Performs deformable region-of-interest pooling on inputs. As described
|
|
|
|
|
in `Deformable Convolutional Networks <https://arxiv.org/abs/1703.06211>`_, it will get offset for each bin after
|
|
|
|
|
roi pooling so that pooling at correct region. Batch_size will change to the number of region bounding boxes after deformable_roi_pooling.
|
|
|
|
|
|
|
|
|
|
The operation has three steps:
|
|
|
|
|
|
|
|
|
|
Args:
|
|
|
|
|
input (Variable):The input of Deformable PSROIPooling.The shape of input tensor is
|
|
|
|
|
[N,C,H,W]. Where N is batch size,C is number of input channels,H
|
|
|
|
|
is height of the feature, and W is the width of the feature.
|
|
|
|
|
rois (Variable): ROIs (Regions of Interest) to pool over.It should be
|
|
|
|
|
a 2-D LoDTensor of shape (num_rois, 4), the lod level
|
|
|
|
|
is 1. Given as [[x1, y1, x2, y2], ...], (x1, y1) is
|
|
|
|
|
the top left coordinates, and (x2, y2) is the bottom
|
|
|
|
|
right coordinates.
|
|
|
|
|
trans (Variable): Offset of features on ROIs while pooling.The format is NCHW, where
|
|
|
|
|
N is number of ROIs, C is number of channels, which indicate the offset distance
|
|
|
|
|
in the x and y directions, H is pooled height, and W is pooled width.
|
|
|
|
|
no_trans (bool): Whether to add offset to get new value or not while roi pooling, which
|
|
|
|
|
value is True or False. Default: False.
|
|
|
|
|
spatial_scale (float): Ratio of input feature map height (or width) to raw image height (or width).
|
|
|
|
|
Equals the reciprocal of total stride in convolutional layers, Default: 1.0.
|
|
|
|
|
group_size (list|tuple): The number of groups which input channels are divided.(eg.number of input channels
|
|
|
|
|
is k1*k2*(C+1), which k1 and k2 are group width and height and C+1 is number of output
|
|
|
|
|
chanels. eg.(4, 6), which 4 is height of group and 6 is width of group. Default: [1, 1].
|
|
|
|
|
pooled_height (integer): The pooled output height. Default: 1.
|
|
|
|
|
pooled_width (integer): The pooled output width. Default: 1.
|
|
|
|
|
part_size (list|tuple): The height and width of offset, eg.(4, 6), which height is 4 and width is 6, Default:
|
|
|
|
|
if None, default value is [pooled_height, pooled_width].
|
|
|
|
|
sample_per_part (integer): The number of samples in each bin. Default: 1.
|
|
|
|
|
trans_std (float): Coefficient of offset. Default: 0.1.
|
|
|
|
|
position_sensitive (bool): Whether to choose deformable psroi pooling mode or not. Default: False.
|
|
|
|
|
name (str): Name of layer. Default: None.
|
|
|
|
|
Returns:
|
|
|
|
|
Variable: The tensor variable storing the deformable psroi pooling \
|
|
|
|
|
result.
|
|
|
|
|
|
|
|
|
|
1. Dividing each region proposal into equal-sized sections with the pooled_width and pooled_height.
|
|
|
|
|
|
|
|
|
|
2. Add offset to pixel in ROI to get new location and the new value which are computed directly through
|
|
|
|
|
bilinear interpolation with four nearest pixel.
|
|
|
|
|
|
|
|
|
|
3. Sample several points in each bin to get average values as output.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Args:
|
|
|
|
|
input (Variable):The input of deformable roi pooling and it is tensor which value type is float32. The shape of input is
|
|
|
|
|
[N, C, H, W]. Where N is batch size, C is number of input channels,
|
|
|
|
|
H is height of the feature, and W is the width of the feature.
|
|
|
|
|
rois (Variable): ROIs (Regions of Interest) with type float32 to pool over. It should be
|
|
|
|
|
a 2-D LoDTensor of shape (num_rois, 4), and the lod level
|
|
|
|
|
is 1. Given as [[x1, y1, x2, y2], ...], (x1, y1) is
|
|
|
|
|
the top left coordinates, and (x2, y2) is the bottom
|
|
|
|
|
right coordinates, which value type is float32.
|
|
|
|
|
trans (Variable): Offset of features on ROIs while pooling which value type is float32. The format is [N, C, H, W], where
|
|
|
|
|
N is number of ROIs, C is number of channels, which indicate the offset distance
|
|
|
|
|
in the x and y directions, H is pooled height, and W is pooled width.
|
|
|
|
|
no_trans (bool): Whether to add offset to get new value or not while roi pooling, which value with type bool is True or False.
|
|
|
|
|
If value is True, no offset will be added in operation. Default: False.
|
|
|
|
|
spatial_scale (float): Ratio of input feature map height (or width) to raw image height (or width), which value type is float32.
|
|
|
|
|
Equals the reciprocal of total stride in convolutional layers, Default: 1.0.
|
|
|
|
|
group_size (list|tuple): The number of groups which input channels are divided and the input is list or tuple, which value type is int32. (eg.number of input channels
|
|
|
|
|
is k1 * k2 * (C + 1), which k1 and k2 are group width and height and C+1 is number of output
|
|
|
|
|
chanels.) eg.(4, 6), which 4 is height of group and 6 is width of group. Default: [1, 1].
|
|
|
|
|
pooled_height (int): The pooled output height which value type is int32. Default: 1.
|
|
|
|
|
pooled_width (int): The pooled output width which value type is int32. Default: 1.
|
|
|
|
|
part_size (list|tuple): The height and width of offset which values in list or tuple is int32, eg.(4, 6), which height is 4 and width is 6, and values always equal to pooled_height \
|
|
|
|
|
and pooled_width. Default: if None, default value is [pooled_height, pooled_width].
|
|
|
|
|
sample_per_part (int): The number of samples in each bin which value type is int32. If value is bigger, it will consume more performance. Default: 1.
|
|
|
|
|
trans_std (float): Coefficient of offset which value type is float32. It controls weight of offset. Default: 0.1.
|
|
|
|
|
position_sensitive (bool): Whether to choose deformable psroi pooling mode or not, and value type is bool(True or False). If value is False, input dimension equals to output dimension. \
|
|
|
|
|
If value is True, input dimension shoule be output dimension * pooled_height * pooled_width. Default: False.
|
|
|
|
|
name (str|None): Name of layer. Default: None.
|
|
|
|
|
Returns:
|
|
|
|
|
Variable: Output of deformable roi pooling is that, if position sensitive is False, input dimension equals to output dimension. If position sensitive is True,\
|
|
|
|
|
input dimension should be the result of output dimension divided by pooled height and pooled width.
|
|
|
|
|
|
|
|
|
|
Examples:
|
|
|
|
|
.. code-block:: python
|
|
|
|
|
|
|
|
|
|
# position_sensitive=True
|
|
|
|
|
import paddle.fluid as fluid
|
|
|
|
|
input = fluid.layers.data(name="input",
|
|
|
|
|
shape=[2, 192, 64, 64],
|
|
|
|
|
dtype='float32',
|
|
|
|
|
append_batch_size=False)
|
|
|
|
|
rois = fluid.layers.data(name="rois",
|
|
|
|
|
shape=[4],
|
|
|
|
|
dtype='float32',
|
|
|
|
|
lod_level=1)
|
|
|
|
|
trans = fluid.layers.data(name="trans",
|
|
|
|
|
shape=[2, 384, 64, 64],
|
|
|
|
|
dtype='float32',
|
|
|
|
|
append_batch_size=False)
|
|
|
|
|
x = fluid.layers.nn.deformable_roi_pooling(input=input,
|
|
|
|
|
rois=rois,
|
|
|
|
|
trans=trans,
|
|
|
|
|
no_trans=False,
|
|
|
|
|
spatial_scale=1.0,
|
|
|
|
|
group_size=(1, 1),
|
|
|
|
|
pooled_height=8,
|
|
|
|
|
pooled_width=8,
|
|
|
|
|
part_size=(8, 8),
|
|
|
|
|
sample_per_part=4,
|
|
|
|
|
trans_std=0.1,
|
|
|
|
|
position_sensitive=True)
|
|
|
|
|
|
|
|
|
|
# position_sensitive=False
|
|
|
|
|
import paddle.fluid as fluid
|
|
|
|
|
input = fluid.layers.data(name="input",
|
|
|
|
|
shape=[2, 192, 64, 64],
|
|
|
|
|