Operations
This module contains optimized deep learning-related operations used in the Ultralytics YOLO framework.
Non-max suppression
Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction | torch.Tensor | A tensor of shape (batch_size, num_boxes, num_classes + 4 + num_masks) containing the predicted boxes, classes, and masks. The tensor should be in the format output by a model, such as YOLO. | required |
conf_thres | float | The confidence threshold below which boxes will be filtered out. Valid values are between 0.0 and 1.0. | 0.25 |
iou_thres | float | The IoU threshold above which overlapping boxes will be suppressed during NMS. Valid values are between 0.0 and 1.0. | 0.45 |
classes | List[int] | A list of class indices to consider. If None, all classes will be considered. | None |
agnostic | bool | If True, NMS is class-agnostic and all classes are treated as one. | False |
multi_label | bool | If True, each box may have multiple labels. | False |
labels | List[List[Union[int, float, torch.Tensor]]] | A list of lists, where each inner list contains the a priori labels for a given image. The list should be in the format output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2). | () |
max_det | int | The maximum number of boxes to keep after NMS. | 300 |
nm | int | The number of masks output by the model. | 0 |
Returns:
Type | Description |
---|---|
List[torch.Tensor] | A list of length batch_size, where each element is a tensor of shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns (x1, y1, x2, y2, confidence, class, mask1, mask2, ...). |
Source code in ultralytics/yolo/utils/ops.py
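The greedy suppression loop at the heart of this function can be sketched in pure Python. This is a simplified, hypothetical re-implementation for illustration only; the library version operates on batched torch.Tensor predictions and additionally handles masks, multiple labels, and class offsets:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45, max_det=300):
    """Greedy NMS: keep the highest-scoring box, drop overlaps above iou_thres."""
    # Discard low-confidence boxes, then visit the rest in descending score order.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
        if len(keep) >= max_det:
            break
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]; the second box overlaps the first and is suppressed
```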
Scale boxes
Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in (img1_shape) to the shape of a different image (img0_shape).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
img1_shape | tuple | The shape of the image that the bounding boxes are for, in the format of (height, width). | required |
boxes | torch.Tensor | The bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2). | required |
img0_shape | tuple | The shape of the target image, in the format of (height, width). | required |
ratio_pad | tuple | A tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be calculated based on the size difference between the two images. | None |
Returns:
Name | Type | Description |
---|---|---|
boxes | torch.Tensor | The scaled bounding boxes, in the format of (x1, y1, x2, y2). |
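A minimal sketch of the rescaling arithmetic, assuming the usual letterbox convention (gain is the smaller of the height and width ratios, with centred padding); the library version operates on a torch.Tensor in place and also clips the result:

```python
def scale_boxes(img1_shape, boxes, img0_shape):
    """Map (x1, y1, x2, y2) boxes from img1_shape back to img0_shape."""
    # Letterbox gain and the padding added on each side to centre the image.
    gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
    pad_w = (img1_shape[1] - img0_shape[1] * gain) / 2
    pad_h = (img1_shape[0] - img0_shape[0] * gain) / 2
    return [[(x1 - pad_w) / gain, (y1 - pad_h) / gain,
             (x2 - pad_w) / gain, (y2 - pad_h) / gain]
            for x1, y1, x2, y2 in boxes]

# A box detected on a 640x640 letterboxed input, mapped back to a 480x640 frame.
print(scale_boxes((640, 640), [[80, 120, 240, 360]], (480, 640)))
# → [[80.0, 40.0, 240.0, 280.0]]
```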
Scale image
Takes a mask and resizes it to the original image size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
im1_shape | tuple | The model input shape, [h, w]. | required |
masks | torch.Tensor | The masks to resize, of shape [h, w, num]. | required |
im0_shape | tuple | The original image shape. | required |
ratio_pad | tuple | The ratio of the padding to the original image. | None |
Returns:
Name | Type | Description |
---|---|---|
masks | torch.Tensor | The resized masks. |
Clip boxes
Takes a list of bounding boxes and an image shape (height, width) and clips the boxes to that shape.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
boxes | torch.Tensor | The bounding boxes to clip. | required |
shape | tuple | The shape of the image. | required |
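The clipping is a simple per-coordinate clamp. A pure-Python sketch on lists (the library clamps a torch.Tensor in place):

```python
def clip_boxes(boxes, shape):
    """Clamp (x1, y1, x2, y2) boxes to a (height, width) image shape."""
    h, w = shape
    return [[min(max(x1, 0), w), min(max(y1, 0), h),
             min(max(x2, 0), w), min(max(y2, 0), h)]
            for x1, y1, x2, y2 in boxes]

# A box spilling past every edge of a 480x640 image is pulled back inside.
print(clip_boxes([[-5, 10, 700, 500]], (480, 640)))  # → [[0, 10, 640, 480]]
```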
Box Format Conversion
xyxy2xywh
Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height) format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input bounding box coordinates in (x1, y1, x2, y2) format. | required |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The bounding box coordinates in (x, y, width, height) format. |
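The conversion is a midpoint-and-extent calculation. A single-box sketch (the library applies the same arithmetic vectorized over an array or tensor):

```python
def xyxy2xywh(box):
    """(x1, y1, x2, y2) → (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]

print(xyxy2xywh([20, 30, 60, 90]))  # → [40.0, 60.0, 40, 60]
```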
xywh2xyxy
Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input bounding box coordinates in (x, y, width, height) format. | required |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The bounding box coordinates in (x1, y1, x2, y2) format. |
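This is the inverse of xyxy2xywh: half the width and height are subtracted from and added to the centre. A single-box sketch (the library version is vectorized):

```python
def xywh2xyxy(box):
    """(x_center, y_center, width, height) → (x1, y1, x2, y2)."""
    x, y, w, h = box
    return [x - w / 2, y - h / 2, x + w / 2, y + h / 2]

# Round-trips with the xyxy2xywh example above.
print(xywh2xyxy([40, 60, 40, 60]))  # → [20.0, 30.0, 60.0, 90.0]
```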
xywhn2xyxy
Convert normalized bounding box coordinates to pixel coordinates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The bounding box coordinates. | required |
w | int | Width of the image. Defaults to 640. | 640 |
h | int | Height of the image. Defaults to 640. | 640 |
padw | int | Padding width. Defaults to 0. | 0 |
padh | int | Padding height. Defaults to 0. | 0 |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The coordinates of the bounding box in the format [x1, y1, x2, y2], where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner. |
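Denormalization multiplies by the image size and then applies the corner arithmetic plus any padding offset. A single-box sketch:

```python
def xywhn2xyxy(box, w=640, h=640, padw=0, padh=0):
    """Normalized (x, y, w, h) → pixel (x1, y1, x2, y2), with optional padding offset."""
    xc, yc, bw, bh = box
    return [w * (xc - bw / 2) + padw, h * (yc - bh / 2) + padh,
            w * (xc + bw / 2) + padw, h * (yc + bh / 2) + padh]

# A box centred in a 640x640 image, a quarter wide and half tall.
print(xywhn2xyxy([0.5, 0.5, 0.25, 0.5]))  # → [240.0, 160.0, 400.0, 480.0]
```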
xyxy2xywhn
Convert bounding box coordinates from (x1, y1, x2, y2) format to normalized (x, y, width, height) format, where x, y, width and height are normalized to the image dimensions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input bounding box coordinates in (x1, y1, x2, y2) format. | required |
w | int | The width of the image. Defaults to 640. | 640 |
h | int | The height of the image. Defaults to 640. | 640 |
clip | bool | If True, the boxes will be clipped to the image boundaries. Defaults to False. | False |
eps | float | The minimum value of the box's width and height. Defaults to 0.0. | 0.0 |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The bounding box coordinates in normalized (x, y, width, height) format. |
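The inverse of xywhn2xyxy: compute centre and extent, then divide by the image size. A single-box sketch without the clip/eps handling:

```python
def xyxy2xywhn(box, w=640, h=640):
    """Pixel (x1, y1, x2, y2) → normalized (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2 / w, (y1 + y2) / 2 / h,
            (x2 - x1) / w, (y2 - y1) / h]

# Round-trips with the xywhn2xyxy example above.
print(xyxy2xywhn([240, 160, 400, 480]))  # → [0.5, 0.5, 0.25, 0.5]
```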
xyn2xy
Convert normalized coordinates to pixel coordinates of shape (n,2)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input tensor of normalized bounding box coordinates. | required |
w | int | The width of the image. Defaults to 640. | 640 |
h | int | The height of the image. Defaults to 640. | 640 |
padw | int | The width of the padding. Defaults to 0. | 0 |
padh | int | The height of the padding. Defaults to 0. | 0 |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The x and y coordinates of the top-left corner of the bounding box. |
xywh2ltwh
Convert the bounding box format from [x, y, w, h] to [x1, y1, w, h], where x1, y1 are the top-left coordinates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input tensor with the bounding box coordinates in the xywh format. | required |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The bounding box coordinates in the xyltwh format. |
xyxy2ltwh
Convert nx4 bounding boxes from [x1, y1, x2, y2] to [x1, y1, w, h], where xy1=top-left, xy2=bottom-right
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input tensor with the bounding box coordinates in the xyxy format. | required |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The bounding box coordinates in the xyltwh format. |
ltwh2xywh
Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | The input tensor. | required |
ltwh2xyxy
It converts the bounding box from [x1, y1, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | np.ndarray or torch.Tensor | The input bounding box coordinates. | required |
Returns:
Name | Type | Description |
---|---|---|
y | np.ndarray or torch.Tensor | The xyxy coordinates of the bounding boxes. |
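The two top-left/width-height conversions above are simple offsets from the corner coordinates. A single-box sketch of both directions:

```python
def ltwh2xyxy(box):
    """(x1, y1, w, h) → (x1, y1, x2, y2): add the extents to the top-left corner."""
    x1, y1, w, h = box
    return [x1, y1, x1 + w, y1 + h]

def xyxy2ltwh(box):
    """(x1, y1, x2, y2) → (x1, y1, w, h): subtract the corners to recover extents."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

print(ltwh2xyxy([20, 30, 40, 60]))  # → [20, 30, 60, 90]
print(xyxy2ltwh([20, 30, 60, 90]))  # → [20, 30, 40, 60]
```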
segment2box
Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
segment | torch.Tensor | The segment label. | required |
width | int | The width of the image. Defaults to 640. | 640 |
height | int | The height of the image. Defaults to 640. | 640 |
Returns:
Type | Description |
---|---|
np.ndarray | The minimum and maximum x and y values of the segment, as (x1, y1, x2, y2). |
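Conceptually this keeps only the points inside the image and takes their min/max extents. A hypothetical list-based sketch (the library works on arrays and handles the no-points-inside case):

```python
def segment2box(segment, width=640, height=640):
    """Min/max of the in-image segment points → one (x1, y1, x2, y2) box."""
    inside = [(x, y) for x, y in segment if 0 <= x <= width and 0 <= y <= height]
    xs, ys = [p[0] for p in inside], [p[1] for p in inside]
    return [min(xs), min(ys), max(xs), max(ys)]

# A triangle with one vertex outside the image; the box covers the inside points.
print(segment2box([(10, 10), (100, 40), (700, 20)]))  # → [10, 10, 100, 40]
```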
Mask Operations
resample_segments
Takes a list of segment arrays of shape (m, 2) and returns them up-sampled to n points each.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
segments | list | A list of (m, 2) arrays, where m is the number of points in the segment. | required |
n | int | The number of points to resample each segment to. Defaults to 1000. | 1000 |
Returns:
Name | Type | Description |
---|---|---|
segments | list | The resampled segments. |
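A simplified pure-Python sketch of the idea, assuming linear interpolation along the closed polygon (the library resamples numpy/torch arrays directly):

```python
def resample_segment(points, n=1000):
    """Up-sample one segment to n evenly spaced points by linear interpolation."""
    pts = points + points[:1]          # close the polygon
    out = []
    for k in range(n):
        t = k * (len(pts) - 1) / n     # fractional index along the polyline
        i, frac = int(t), t - int(t)
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
resampled = resample_segment(square, n=8)
print(len(resampled))  # → 8
```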
crop_mask
It takes a mask and a bounding box, and returns the mask cropped to the bounding box.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
masks | torch.Tensor | [h, w, n] tensor of masks. | required |
boxes | torch.Tensor | [n, 4] tensor of bbox coordinates in relative point form. | required |
Returns:
Type | Description |
---|---|
torch.Tensor | The masks, cropped to the bounding boxes. |
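The effect is to zero out every mask pixel that falls outside its box. A single-mask sketch on a 2-D list (the library does this with broadcasting over [h, w, n] tensors):

```python
def crop_mask(mask, box):
    """Zero out mask pixels outside an (x1, y1, x2, y2) box; mask is a 2-D list."""
    x1, y1, x2, y2 = box
    return [[v if x1 <= c < x2 and y1 <= r < y2 else 0
             for c, v in enumerate(row)]
            for r, row in enumerate(mask)]

mask = [[1] * 4 for _ in range(4)]  # a 4x4 all-ones mask
print(crop_mask(mask, (1, 1, 3, 3)))
# → [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```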
process_mask_upsample
It takes the output of the mask head and applies the masks to the bounding boxes. This produces masks of higher quality but is slower.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protos | torch.Tensor | [mask_dim, mask_h, mask_w] | required |
masks_in | torch.Tensor | [n, mask_dim], where n is the number of masks after NMS. | required |
bboxes | torch.Tensor | [n, 4], where n is the number of masks after NMS. | required |
shape | tuple | The size of the input image, (h, w). | required |
Returns:
Type | Description |
---|---|
torch.Tensor | The upsampled masks. |
process_mask
It takes the output of the mask head and applies the masks to the bounding boxes. This is faster than process_mask_upsample but produces lower-quality (downsampled) masks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protos | torch.Tensor | [mask_dim, mask_h, mask_w] | required |
masks_in | torch.Tensor | [n, mask_dim], where n is the number of masks after NMS. | required |
bboxes | torch.Tensor | [n, 4], where n is the number of masks after NMS. | required |
shape | tuple | The size of the input image, (h, w). | required |
Returns:
Type | Description |
---|---|
torch.Tensor | The processed masks. |
process_mask_native
It takes the output of the mask head and crops it after upsampling to the bounding boxes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protos | torch.Tensor | [mask_dim, mask_h, mask_w] | required |
masks_in | torch.Tensor | [n, mask_dim], where n is the number of masks after NMS. | required |
bboxes | torch.Tensor | [n, 4], where n is the number of masks after NMS. | required |
shape | tuple | The size of the input image, (h, w). | required |
Returns:
Name | Type | Description |
---|---|---|
masks | torch.Tensor | The returned masks, with dimensions [h, w, n]. |
scale_segments
Rescale segment coordinates (xyxy) from img1_shape to img0_shape
Parameters:
Name | Type | Description | Default |
---|---|---|---|
img1_shape | tuple | The shape of the image that the segments are from. | required |
segments | torch.Tensor | The segments to be scaled. | required |
img0_shape | tuple | The shape of the image that the segmentation is being applied to. | required |
ratio_pad | tuple | The ratio of the image size to the padded image size. | None |
normalize | bool | If True, the coordinates will be normalized to the range [0, 1]. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
segments | torch.Tensor | The scaled segments. |
masks2segments
It takes a list of masks (n, h, w) and returns a list of segments (n, xy).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
masks | torch.Tensor | The output of the model, a tensor of shape (batch_size, 160, 160). | required |
strategy | str | 'concat' or 'largest'. Defaults to 'largest'. | 'largest' |
Returns:
Name | Type | Description |
---|---|---|
segments | List | A list of segment masks. |
clip_segments
It takes a list of segments and an image shape (height, width) and clips the segment coordinates to that shape.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
segments | list | A list of segments; each segment is a list of points, and each point is a list of x, y coordinates. | required |
shape | tuple | The shape of the image. | required |