region_detector

allennlp.modules.vision.region_detector


RegionDetectorOutput

class RegionDetectorOutput(NamedTuple)

The output type from the forward pass of a RegionDetector.

features

class RegionDetectorOutput(NamedTuple):
 | ...
 | features: List[Tensor] = None

A list of tensors, each with shape (num_boxes, feature_dim).

boxes

class RegionDetectorOutput(NamedTuple):
 | ...
 | boxes: List[Tensor] = None

A list of tensors containing the coordinates for each box. Each has shape (num_boxes, 4).

class_probs

class RegionDetectorOutput(NamedTuple):
 | ...
 | class_probs: Optional[List[Tensor]] = None

An optional list of tensors. These tensors can have shape (num_boxes,) or (num_boxes, *) if probabilities for multiple classes are given.

class_labels

class RegionDetectorOutput(NamedTuple):
 | ...
 | class_labels: Optional[List[Tensor]] = None

An optional list of tensors that give the labels corresponding to the class_probs tensors. This should be non-None whenever class_probs is, and each tensor should have the same shape as the corresponding tensor from class_probs.
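Taken together, the four fields describe one detection result per image in the batch. The shapes can be illustrated with a minimal stand-in sketch; plain Python lists take the place of `torch.Tensor`s here, and all values are hypothetical dummies chosen only to show the expected dimensions:

```python
from typing import List, NamedTuple, Optional

# Stand-in for torch.Tensor: nested lists, used here only to illustrate shapes.
Tensor = list

class RegionDetectorOutput(NamedTuple):
    features: List[Tensor]
    boxes: List[Tensor]
    class_probs: Optional[List[Tensor]] = None
    class_labels: Optional[List[Tensor]] = None

# One image in the batch, two boxes, a 3-dimensional feature vector per box.
output = RegionDetectorOutput(
    features=[[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]],  # (num_boxes=2, feature_dim=3)
    boxes=[[[0, 0, 10, 10], [5, 5, 20, 20]]],       # (num_boxes=2, 4)
    class_probs=[[0.9, 0.7]],                       # (num_boxes=2,)
    class_labels=[[3, 7]],                          # same shape as class_probs
)

print(len(output.boxes[0]))  # → 2
```

Note that each list has one entry per image, so `output.boxes[0]` holds the boxes for the first image in the batch.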

RegionDetector

class RegionDetector(nn.Module, Registrable)

A RegionDetector takes a batch of images, their sizes, and an ordered dictionary of image features as input, and finds regions of interest (or "boxes") within those images.

Those regions of interest are described by three values:

  • features (List[Tensor]): A feature vector for each region, which is a tensor of shape (num_boxes, feature_dim).
  • boxes (List[Tensor]): The coordinates of each region within the original image, with shape (num_boxes, 4).
  • class_probs (Optional[List[Tensor]]): Class probabilities from some object detector that was used to find the regions of interest, with shape (num_boxes,) or (num_boxes, *) if probabilities for more than one class are given.
  • class_labels (Optional[List[Tensor]]): The labels corresponding to class_probs. Each tensor in this list has the same shape as the corresponding tensor in class_probs.

forward

class RegionDetector(nn.Module, Registrable):
 | ...
 | def forward(
 |     self,
 |     images: FloatTensor,
 |     sizes: IntTensor,
 |     image_features: "OrderedDict[str, FloatTensor]"
 | ) -> RegionDetectorOutput

RandomRegionDetector

@RegionDetector.register("random")
class RandomRegionDetector(RegionDetector):
 | def __init__(self, seed: Optional[int] = None)

A RegionDetector that returns two proposals per image, for testing purposes. The features for each proposal are a random 10-dimensional vector, and each box's coordinates span the full size of the image.

forward

class RandomRegionDetector(RegionDetector):
 | ...
 | def forward(
 |     self,
 |     images: FloatTensor,
 |     sizes: IntTensor,
 |     image_features: "OrderedDict[str, FloatTensor]"
 | ) -> RegionDetectorOutput
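The behavior described above can be mimicked without torch. This is a hedged, pure-Python sketch of what the forward pass conceptually produces (list-based stand-ins for tensors; the function name `random_region_detect` is hypothetical, while the two proposals per image and 10-dimensional features come from the class docstring):

```python
import random
from typing import List, Optional, Tuple

def random_region_detect(
    sizes: List[Tuple[int, int]],  # (height, width) per image in the batch
    seed: Optional[int] = None,
    num_proposals: int = 2,        # the class always returns two proposals
    feature_dim: int = 10,         # and a random 10-dimensional feature vector
) -> Tuple[List[list], List[list]]:
    """Illustrative stand-in for RandomRegionDetector.forward (not the real code)."""
    rng = random.Random(seed)
    features, boxes = [], []
    for height, width in sizes:
        # A random feature vector per proposal.
        features.append(
            [[rng.random() for _ in range(feature_dim)] for _ in range(num_proposals)]
        )
        # Each box spans the whole image: (x_min, y_min, x_max, y_max).
        boxes.append([[0, 0, width, height] for _ in range(num_proposals)])
    return features, boxes

features, boxes = random_region_detect([(480, 640)], seed=0)
print(boxes[0])  # → [[0, 0, 640, 480], [0, 0, 640, 480]]
```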

FasterRcnnRegionDetector

@RegionDetector.register("faster_rcnn")
class FasterRcnnRegionDetector(RegionDetector):
 | def __init__(
 |     self,
 |     *,
 |     box_score_thresh: float = 0.05,
 |     box_nms_thresh: float = 0.5,
 |     max_boxes_per_image: int = 100
 | )

A Faster R-CNN pretrained region detector.

Unless you really know what you're doing, this should be used with the image features created from the ResnetBackbone GridEmbedder and on images loaded using the TorchImageLoader with the default settings.

Note

This module does not have any trainable parameters by default. All pretrained weights are frozen.

Parameters

  • box_score_thresh : float, optional (default = 0.05)
    During inference, only proposal boxes / regions with a label classification score greater than box_score_thresh will be returned.

  • box_nms_thresh : float, optional (default = 0.5)
During inference, non-maximum suppression (NMS) will be applied to groups of boxes that share a common label.

    NMS iteratively removes lower scoring boxes which have an intersection-over-union (IoU) greater than box_nms_thresh with another higher scoring box.

  • max_boxes_per_image : int, optional (default = 100)
    During inference, at most max_boxes_per_image boxes will be returned. The number of boxes returned will vary by image and will often be lower than max_boxes_per_image depending on the values of box_score_thresh and box_nms_thresh.
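How the three parameters interact can be sketched in plain Python. This is a simplified, hypothetical greedy-NMS loop, not the torchvision implementation the class actually delegates to; in particular, the real detector applies NMS per class label, which this sketch ignores for brevity:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_boxes(
    boxes: List[Box],
    scores: List[float],
    box_score_thresh: float = 0.05,
    box_nms_thresh: float = 0.5,
    max_boxes_per_image: int = 100,
) -> List[int]:
    """Return indices of kept boxes: score filter, then greedy NMS, then a cap."""
    # 1. Drop low-scoring boxes, then sort the survivors by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > box_score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    # 2. Greedy NMS: keep a box only if its IoU with every already-kept
    #    (higher-scoring) box is at or below box_nms_thresh.
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= box_nms_thresh for j in kept):
            kept.append(i)
    # 3. Cap the number of boxes returned per image.
    return kept[:max_boxes_per_image]

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.04]
# The second box overlaps the first (IoU 0.81 > 0.5) and is suppressed;
# the third falls below box_score_thresh. Only the first survives.
print(filter_boxes(boxes, scores))  # → [0]
```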

forward

class FasterRcnnRegionDetector(RegionDetector):
 | ...
 | def forward(
 |     self,
 |     images: FloatTensor,
 |     sizes: IntTensor,
 |     image_features: "OrderedDict[str, FloatTensor]"
 | ) -> RegionDetectorOutput

Extract regions and region features from the given images.

In most cases image_features should come directly from the ResnetBackbone GridEmbedder. The images themselves should be standardized and resized using the default settings for the TorchImageLoader.