
Operations

This module contains optimized deep learning operations used in the Ultralytics YOLO framework.

Non-max suppression

Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `prediction` | `torch.Tensor` | A tensor of shape (batch_size, num_classes + 4 + num_masks, num_boxes) containing the predicted boxes, classes, and masks, in the format output by a model such as YOLO. | required |
| `conf_thres` | `float` | The confidence threshold below which boxes are filtered out. Valid values are between 0.0 and 1.0. | `0.25` |
| `iou_thres` | `float` | The IoU threshold above which overlapping boxes are suppressed during NMS. Valid values are between 0.0 and 1.0. | `0.45` |
| `classes` | `List[int]` | A list of class indices to consider. If None, all classes are considered. | `None` |
| `agnostic` | `bool` | If True, NMS is class-agnostic and all classes are treated as one. | `False` |
| `multi_label` | `bool` | If True, each box may have multiple labels. | `False` |
| `labels` | `List[List[Union[int, float, torch.Tensor]]]` | A list of lists, where each inner list contains the a priori labels for a given image, in the format output by a dataloader: each label is a tuple of (class_index, x1, y1, x2, y2). | `()` |
| `max_det` | `int` | The maximum number of boxes to keep after NMS. | `300` |
| `nm` | `int` | The number of masks output by the model. | `0` |

Returns:

| Type | Description |
|------|-------------|
| `List[torch.Tensor]` | A list of length batch_size, where each element is a tensor of shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns (x1, y1, x2, y2, confidence, class, mask1, mask2, ...). |

Source code in ultralytics/yolo/utils/ops.py
def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=300,
        nm=0,  # number of masks
):
    """
    Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.

    Arguments:
        prediction (torch.Tensor): A tensor of shape (batch_size, num_classes + 4 + num_masks, num_boxes)
            containing the predicted boxes, classes, and masks. The tensor should be in the format
            output by a model, such as YOLO.
        conf_thres (float): The confidence threshold below which boxes will be filtered out.
            Valid values are between 0.0 and 1.0.
        iou_thres (float): The IoU threshold above which overlapping boxes will be suppressed during NMS.
            Valid values are between 0.0 and 1.0.
        classes (List[int]): A list of class indices to consider. If None, all classes will be considered.
        agnostic (bool): If True, NMS is class-agnostic and all classes will be considered as one.
        multi_label (bool): If True, each box may have multiple labels.
        labels (List[List[Union[int, float, torch.Tensor]]]): A list of lists, where each inner
            list contains the a priori labels for a given image. The list should be in the format
            output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2).
        max_det (int): The maximum number of boxes to keep after NMS.
        nm (int): The number of masks output by the model.

    Returns:
        (List[torch.Tensor]): A list of length batch_size, where each element is a tensor of
            shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns
            (x1, y1, x2, y2, confidence, class, mask1, mask2, ...).
    """

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv8 model in validation mode, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[1] - nm - 4  # number of classes
    mi = 4 + nc  # mask start index
    xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.5 + 0.05 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[:, 2:4] < min_wh) | (x[:, 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x.transpose(0, -1)[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[range(len(lb)), lb[:, 0].long() + 4] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Detections matrix nx6 (xyxy, conf, cls)
        box, cls, mask = x.split((4, nc, nm), 1)
        box = xywh2xyxy(box)  # (center_x, center_y, width, height) to (x1, y1, x2, y2)
        if multi_label:
            i, j = (cls > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = cls.max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            LOGGER.warning(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded

    return output
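
A minimal usage sketch (the tensor shapes and values below are illustrative stand-ins; a real `prediction` comes from a model forward pass):

```python
import torch
from ultralytics.yolo.utils.ops import non_max_suppression

# Illustrative stand-in for a detection model's raw output:
# (batch, 4 + num_classes, num_boxes) with 80 classes and no masks.
prediction = torch.rand(1, 84, 8400)

results = non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, max_det=300)
det = results[0]  # one tensor per image, shape (num_kept, 6): x1, y1, x2, y2, conf, cls
```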



Scale boxes

Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in (img1_shape) to the shape of a different image (img0_shape).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `img1_shape` | `tuple` | The shape of the image that the bounding boxes are for, in the format (height, width). | required |
| `boxes` | `torch.Tensor` | The bounding boxes of the objects in the image, in the format (x1, y1, x2, y2). | required |
| `img0_shape` | `tuple` | The shape of the target image, in the format (height, width). | required |
| `ratio_pad` | `tuple` | A tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad are calculated from the size difference between the two images. | `None` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `boxes` | `torch.Tensor` | The scaled bounding boxes, in the format (x1, y1, x2, y2). |

Source code in ultralytics/yolo/utils/ops.py
def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):
    """
    Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in
    (img1_shape) to the shape of a different image (img0_shape).

    Args:
      img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width).
      boxes (torch.Tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2)
      img0_shape (tuple): the shape of the target image, in the format of (height, width).
      ratio_pad (tuple): a tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be
                         calculated based on the size difference between the two images.

    Returns:
      boxes (torch.Tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2)
    """
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    boxes[..., [0, 2]] -= pad[0]  # x padding
    boxes[..., [1, 3]] -= pad[1]  # y padding
    boxes[..., :4] /= gain
    clip_boxes(boxes, img0_shape)
    return boxes
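
For example, boxes predicted on a 640x640 letterboxed input can be mapped back to a 720x1280 source frame (values here are illustrative):

```python
import torch
from ultralytics.yolo.utils.ops import scale_boxes

boxes = torch.tensor([[100.0, 200.0, 300.0, 400.0]])  # xyxy on the 640x640 letterboxed input
boxes = scale_boxes((640, 640), boxes, (720, 1280))   # xyxy on the original 720x1280 frame
```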



Scale image

Takes a mask and resizes it to the original image size.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `im1_shape` | `tuple` | Model input shape, [h, w]. | required |
| `masks` | `np.ndarray` | Masks of shape [h, w, num]; a numpy array is expected, since cv2.resize is used internally. | required |
| `im0_shape` | `tuple` | The original image shape. | required |
| `ratio_pad` | `tuple` | The ratio of the padding to the original image. | `None` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `masks` | `np.ndarray` | The masks rescaled to the original image size, shape [h, w, num]. |

Source code in ultralytics/yolo/utils/ops.py
def scale_image(im1_shape, masks, im0_shape, ratio_pad=None):
    """
    Takes a mask and resizes it to the original image size.

    Args:
      im1_shape (tuple): model input shape, [h, w]
      masks (np.ndarray): [h, w, num] array of masks (cv2.resize operates on numpy arrays)
      im0_shape (tuple): the original image shape
      ratio_pad (tuple): the ratio of the padding to the original image.

    Returns:
      masks (np.ndarray): the masks rescaled to im0_shape, shape [h, w, num].
    """
    # Rescale coordinates (xyxy) from im1_shape to im0_shape
    if ratio_pad is None:  # calculate from im0_shape
        gain = min(im1_shape[0] / im0_shape[0], im1_shape[1] / im0_shape[1])  # gain  = old / new
        pad = (im1_shape[1] - im0_shape[1] * gain) / 2, (im1_shape[0] - im0_shape[0] * gain) / 2  # wh padding
    else:
        pad = ratio_pad[1]
    top, left = int(pad[1]), int(pad[0])  # y, x
    bottom, right = int(im1_shape[0] - pad[1]), int(im1_shape[1] - pad[0])

    if len(masks.shape) < 2:
        raise ValueError(f'masks should have 2 or 3 dimensions, but got {len(masks.shape)}')
    masks = masks[top:bottom, left:right]
    # masks = masks.permute(2, 0, 1).contiguous()
    # masks = F.interpolate(masks[None], im0_shape[:2], mode='bilinear', align_corners=False)[0]
    # masks = masks.permute(1, 2, 0).contiguous()
    masks = cv2.resize(masks, (im0_shape[1], im0_shape[0]))

    if len(masks.shape) == 2:
        masks = masks[:, :, None]
    return masks
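
A small sketch of the expected layout (note the channel-last [h, w, num] masks and the numpy input, since cv2.resize is used internally; values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import scale_image

masks = np.random.rand(640, 640, 1).astype(np.float32)  # one mask at the 640x640 model input size
out = scale_image((640, 640), masks, (480, 640))        # mapped back to a 480x640 frame
print(out.shape)  # (480, 640, 1)
```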



Clip boxes

Clips bounding boxes to an image shape (height, width).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `boxes` | `torch.Tensor` | The bounding boxes to clip. | required |
| `shape` | `tuple` | The shape of the image. | required |
Source code in ultralytics/yolo/utils/ops.py
def clip_boxes(boxes, shape):
    """
    Clips bounding boxes to the image shape (height, width), in place.

    Args:
      boxes (torch.Tensor): the bounding boxes to clip
      shape (tuple): the shape of the image
    """
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[..., 0].clamp_(0, shape[1])  # x1
        boxes[..., 1].clamp_(0, shape[0])  # y1
        boxes[..., 2].clamp_(0, shape[1])  # x2
        boxes[..., 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2
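
For example, clipping an out-of-bounds box to a 480x640 image (values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import clip_boxes

boxes = np.array([[-5.0, 10.0, 700.0, 500.0]])
clip_boxes(boxes, (480, 640))  # in place: x clamped to [0, 640], y to [0, 480]
# boxes is now [[0., 10., 640., 480.]]
```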



Box Format Conversion

xyxy2xywh

Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height) format.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input bounding box coordinates in (x1, y1, x2, y2) format. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The bounding box coordinates in (x, y, width, height) format. |

Source code in ultralytics/yolo/utils/ops.py
def xyxy2xywh(x):
    """
    Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height) format.

    Args:
        x (np.ndarray) or (torch.Tensor): The input bounding box coordinates in (x1, y1, x2, y2) format.
    Returns:
       y (np.ndarray) or (torch.Tensor): The bounding box coordinates in (x, y, width, height) format.
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = (x[..., 0] + x[..., 2]) / 2  # x center
    y[..., 1] = (x[..., 1] + x[..., 3]) / 2  # y center
    y[..., 2] = x[..., 2] - x[..., 0]  # width
    y[..., 3] = x[..., 3] - x[..., 1]  # height
    return y



xywh2xyxy

Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input bounding box coordinates in (x, y, width, height) format. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The bounding box coordinates in (x1, y1, x2, y2) format. |

Source code in ultralytics/yolo/utils/ops.py
def xywh2xyxy(x):
    """
    Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the
    top-left corner and (x2, y2) is the bottom-right corner.

    Args:
        x (np.ndarray) or (torch.Tensor): The input bounding box coordinates in (x, y, width, height) format.
    Returns:
        y (np.ndarray) or (torch.Tensor): The bounding box coordinates in (x1, y1, x2, y2) format.
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y
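
The two conversions are inverses of each other, as a quick sanity check shows (values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import xyxy2xywh, xywh2xyxy

box = np.array([[10.0, 20.0, 50.0, 80.0]])  # x1, y1, x2, y2
xywh = xyxy2xywh(box)                       # [[30., 50., 40., 60.]] center x, center y, w, h
assert np.allclose(xywh2xyxy(xywh), box)    # round-trips exactly
```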



xywhn2xyxy

Convert normalized bounding box coordinates to pixel coordinates.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The bounding box coordinates. | required |
| `w` | `int` | Width of the image. | `640` |
| `h` | `int` | Height of the image. | `640` |
| `padw` | `int` | Padding width. | `0` |
| `padh` | `int` | Padding height. | `0` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The coordinates of the bounding box in the format [x1, y1, x2, y2], where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner. |

Source code in ultralytics/yolo/utils/ops.py
def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    """
    Convert normalized bounding box coordinates to pixel coordinates.

    Args:
        x (np.ndarray) or (torch.Tensor): The bounding box coordinates.
        w (int): Width of the image. Defaults to 640
        h (int): Height of the image. Defaults to 640
        padw (int): Padding width. Defaults to 0
        padh (int): Padding height. Defaults to 0
    Returns:
        y (np.ndarray) or (torch.Tensor): The coordinates of the bounding box in the format [x1, y1, x2, y2] where
            x1,y1 is the top-left corner, x2,y2 is the bottom-right corner of the bounding box.
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw  # top left x
    y[..., 1] = h * (x[..., 1] - x[..., 3] / 2) + padh  # top left y
    y[..., 2] = w * (x[..., 0] + x[..., 2] / 2) + padw  # bottom right x
    y[..., 3] = h * (x[..., 1] + x[..., 3] / 2) + padh  # bottom right y
    return y
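
For example, a normalized YOLO-format label on a 640x480 image converts to pixel xyxy coordinates like this (values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import xywhn2xyxy

label = np.array([[0.5, 0.5, 0.25, 0.5]])  # normalized cx, cy, w, h
print(xywhn2xyxy(label, w=640, h=480))     # [[240. 120. 400. 360.]]
```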



xyxy2xywhn

Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height, normalized) format. x, y, width and height are normalized to image dimensions

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input bounding box coordinates in (x1, y1, x2, y2) format. | required |
| `w` | `int` | The width of the image. | `640` |
| `h` | `int` | The height of the image. | `640` |
| `clip` | `bool` | If True, the boxes are clipped to the image boundaries. | `False` |
| `eps` | `float` | The minimum value of the box's width and height. | `0.0` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The bounding box coordinates in (x, y, width, height, normalized) format. |

Source code in ultralytics/yolo/utils/ops.py
def xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0):
    """
    Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height, normalized) format.
    x, y, width and height are normalized to image dimensions

    Args:
        x (np.ndarray) or (torch.Tensor): The input bounding box coordinates in (x1, y1, x2, y2) format.
        w (int): The width of the image. Defaults to 640
        h (int): The height of the image. Defaults to 640
        clip (bool): If True, the boxes will be clipped to the image boundaries. Defaults to False
        eps (float): The minimum value of the box's width and height. Defaults to 0.0
    Returns:
        y (np.ndarray) or (torch.Tensor): The bounding box coordinates in (x, y, width, height, normalized) format
    """
    if clip:
        clip_boxes(x, (h - eps, w - eps))  # warning: inplace clip
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = ((x[..., 0] + x[..., 2]) / 2) / w  # x center
    y[..., 1] = ((x[..., 1] + x[..., 3]) / 2) / h  # y center
    y[..., 2] = (x[..., 2] - x[..., 0]) / w  # width
    y[..., 3] = (x[..., 3] - x[..., 1]) / h  # height
    return y
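
This is the inverse of xywhn2xyxy above, normalizing pixel xyxy boxes back to label format (values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import xyxy2xywhn

box = np.array([[240.0, 120.0, 400.0, 360.0]])
print(xyxy2xywhn(box, w=640, h=480))  # [[0.5  0.5  0.25 0.5 ]]
```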



xyn2xy

Convert normalized coordinates of shape (n, 2) to pixel coordinates.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input tensor of normalized (x, y) coordinates. | required |
| `w` | `int` | The width of the image. | `640` |
| `h` | `int` | The height of the image. | `640` |
| `padw` | `int` | The width of the padding. | `0` |
| `padh` | `int` | The height of the padding. | `0` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The x and y pixel coordinates of the points. |

Source code in ultralytics/yolo/utils/ops.py
def xyn2xy(x, w=640, h=640, padw=0, padh=0):
    """
    Convert normalized coordinates of shape (n, 2) to pixel coordinates

    Args:
        x (np.ndarray) or (torch.Tensor): The input tensor of normalized (x, y) coordinates
        w (int): The width of the image. Defaults to 640
        h (int): The height of the image. Defaults to 640
        padw (int): The width of the padding. Defaults to 0
        padh (int): The height of the padding. Defaults to 0
    Returns:
        y (np.ndarray) or (torch.Tensor): The x and y pixel coordinates of the points
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = w * x[..., 0] + padw  # x
    y[..., 1] = h * x[..., 1] + padh  # y
    return y



xywh2ltwh

Convert the bounding box format from [x, y, w, h] to [x1, y1, w, h], where x1, y1 are the top-left coordinates.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input tensor with the bounding box coordinates in the xywh format. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The bounding box coordinates in the ltwh format. |

Source code in ultralytics/yolo/utils/ops.py
def xywh2ltwh(x):
    """
    Convert the bounding box format from [x, y, w, h] to [x1, y1, w, h], where x1, y1 are the top-left coordinates.

    Args:
        x (np.ndarray) or (torch.Tensor): The input tensor with the bounding box coordinates in the xywh format
    Returns:
        y (np.ndarray) or (torch.Tensor): The bounding box coordinates in the xyltwh format
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    return y



xyxy2ltwh

Convert nx4 bounding boxes from [x1, y1, x2, y2] to [x1, y1, w, h], where xy1=top-left, xy2=bottom-right

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input tensor with the bounding box coordinates in the xyxy format. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The bounding box coordinates in the ltwh format. |

Source code in ultralytics/yolo/utils/ops.py
def xyxy2ltwh(x):
    """
    Convert nx4 bounding boxes from [x1, y1, x2, y2] to [x1, y1, w, h], where xy1=top-left, xy2=bottom-right

    Args:
      x (np.ndarray) or (torch.Tensor): The input tensor with the bounding boxes coordinates in the xyxy format
    Returns:
      y (np.ndarray) or (torch.Tensor): The bounding box coordinates in the xyltwh format.
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 2] = x[:, 2] - x[:, 0]  # width
    y[:, 3] = x[:, 3] - x[:, 1]  # height
    return y



ltwh2xywh

Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `torch.Tensor` | The input tensor. | required |
Source code in ultralytics/yolo/utils/ops.py
def ltwh2xywh(x):
    """
    Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center

    Args:
      x (torch.Tensor): the input tensor
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = x[:, 0] + x[:, 2] / 2  # center x
    y[:, 1] = x[:, 1] + x[:, 3] / 2  # center y
    return y



ltwh2xyxy

Converts bounding boxes from [x1, y1, w, h] to [x1, y1, x2, y2], where xy1 is the top-left corner and xy2 is the bottom-right corner.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `x` | `np.ndarray or torch.Tensor` | The input boxes in [x1, y1, w, h] format. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `y` | `np.ndarray or torch.Tensor` | The xyxy coordinates of the bounding boxes. |

Source code in ultralytics/yolo/utils/ops.py
def ltwh2xyxy(x):
    """
    It converts the bounding box from [x1, y1, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right

    Args:
      x (np.ndarray) or (torch.Tensor): the input boxes in [x1, y1, w, h] format

    Returns:
      y (np.ndarray) or (torch.Tensor): the xyxy coordinates of the bounding boxes.
    """
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 2] = x[:, 2] + x[:, 0]  # bottom right x
    y[:, 3] = x[:, 3] + x[:, 1]  # bottom right y
    return y



segment2box

Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `segment` | `torch.Tensor` | The segment label. | required |
| `width` | `int` | The width of the image. | `640` |
| `height` | `int` | The height of the image. | `640` |

Returns:

| Type | Description |
|------|-------------|
| `np.ndarray` | The minimum and maximum x and y values of the segment, as (x1, y1, x2, y2). |

Source code in ultralytics/yolo/utils/ops.py
def segment2box(segment, width=640, height=640):
    """
    Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)

    Args:
      segment (torch.Tensor): the segment label
      width (int): the width of the image. Defaults to 640
      height (int): The height of the image. Defaults to 640

    Returns:
      (np.ndarray): the minimum and maximum x and y values of the segment.
    """
    # Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)
    x, y = segment.T  # segment xy
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    x, y = x[inside], y[inside]
    return np.array([x.min(), y.min(), x.max(), y.max()]) if any(x) else np.zeros(4)  # xyxy
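
For example, a polygon with one vertex outside a 640x640 image loses that vertex before the box is computed (values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import segment2box

segment = np.array([[10.0, 10.0], [100.0, 30.0], [-5.0, 50.0]])
print(segment2box(segment, width=640, height=640))  # [ 10.  10. 100.  30.]
```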



Mask Operations

resample_segments

Takes a list of (m, 2) segment arrays and returns them up-sampled to n points each.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `segments` | `list` | A list of (m, 2) arrays, where m is the number of points in each segment. | required |
| `n` | `int` | Number of points to resample each segment to. | `1000` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `segments` | `list` | The resampled segments. |

Source code in ultralytics/yolo/utils/ops.py
def resample_segments(segments, n=1000):
    """
    Takes a list of (m, 2) segment arrays and returns them up-sampled to n points each.

    Args:
      segments (list): a list of (n,2) arrays, where n is the number of points in the segment.
      n (int): number of points to resample the segment to. Defaults to 1000

    Returns:
      segments (list): the resampled segments.
    """
    for i, s in enumerate(segments):
        s = np.concatenate((s, s[0:1, :]), axis=0)
        x = np.linspace(0, len(s) - 1, n)
        xp = np.arange(len(s))
        segments[i] = np.concatenate([np.interp(x, xp, s[:, j]) for j in range(2)]).reshape(2, -1).T  # segment xy
    return segments
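
A quick sketch: a 4-point square is resampled to 1000 evenly spaced points along its closed outline (values are illustrative):

```python
import numpy as np
from ultralytics.yolo.utils.ops import resample_segments

segments = [np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])]
segments = resample_segments(segments, n=1000)
print(segments[0].shape)  # (1000, 2)
```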



crop_mask

Takes masks and bounding boxes, and returns the masks cropped to the bounding boxes.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `masks` | `torch.Tensor` | [n, h, w] tensor of masks. | required |
| `boxes` | `torch.Tensor` | [n, 4] tensor of bbox coordinates in relative point form. | required |

Returns:

| Type | Description |
|------|-------------|
| `torch.Tensor` | The masks cropped to the bounding boxes. |

Source code in ultralytics/yolo/utils/ops.py
def crop_mask(masks, boxes):
    """
    It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box

    Args:
      masks (torch.Tensor): [n, h, w] tensor of masks
      boxes (torch.Tensor): [n, 4] tensor of bbox coordinates in relative point form

    Returns:
      (torch.Tensor): The masks are being cropped to the bounding box.
    """
    n, h, w = masks.shape
    x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, 1)  # x1 shape(1,1,n)
    r = torch.arange(w, device=masks.device, dtype=x1.dtype)[None, None, :]  # x coordinates, shape(1,1,w)
    c = torch.arange(h, device=masks.device, dtype=x1.dtype)[None, :, None]  # y coordinates, shape(1,h,1)

    return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))
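
For example, cropping an all-ones 8x8 mask to a 4x4 box keeps only the 16 pixels inside the box:

```python
import torch
from ultralytics.yolo.utils.ops import crop_mask

masks = torch.ones(1, 8, 8)                   # one 8x8 mask
boxes = torch.tensor([[2.0, 2.0, 6.0, 6.0]])  # xyxy in mask pixel coordinates
print(int(crop_mask(masks, boxes).sum()))     # 16
```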



process_mask_upsample

Takes the output of the mask head and applies the masks to the bounding boxes. This produces masks of higher quality but is slower.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `protos` | `torch.Tensor` | [mask_dim, mask_h, mask_w]. | required |
| `masks_in` | `torch.Tensor` | [n, mask_dim], where n is the number of masks after NMS. | required |
| `bboxes` | `torch.Tensor` | [n, 4], where n is the number of masks after NMS. | required |
| `shape` | `tuple` | The size of the input image, (h, w). | required |

Returns:

| Type | Description |
|------|-------------|
| `torch.Tensor` | The upsampled masks. |

Source code in ultralytics/yolo/utils/ops.py
def process_mask_upsample(protos, masks_in, bboxes, shape):
    """
    It takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher
    quality but is slower.

    Args:
      protos (torch.Tensor): [mask_dim, mask_h, mask_w]
      masks_in (torch.Tensor): [n, mask_dim], n is number of masks after nms
      bboxes (torch.Tensor): [n, 4], n is number of masks after nms
      shape (tuple): the size of the input image (h,w)

    Returns:
      (torch.Tensor): The upsampled masks.
    """
    c, mh, mw = protos.shape  # CHW
    masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)
    masks = F.interpolate(masks[None], shape, mode='bilinear', align_corners=False)[0]  # CHW
    masks = crop_mask(masks, bboxes)  # CHW
    return masks.gt_(0.5)



process_mask

Takes the output of the mask head and applies the masks to the bounding boxes. This is faster than process_mask_upsample, but produces lower-quality, downsampled masks.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `protos` | `torch.Tensor` | [mask_dim, mask_h, mask_w]. | required |
| `masks_in` | `torch.Tensor` | [n, mask_dim], where n is the number of masks after NMS. | required |
| `bboxes` | `torch.Tensor` | [n, 4], where n is the number of masks after NMS. | required |
| `shape` | `tuple` | The size of the input image, (h, w). | required |
| `upsample` | `bool` | If True, upsample the masks to the input image size. | `False` |

Returns:

| Type | Description |
|------|-------------|
| `torch.Tensor` | The processed masks. |

Source code in ultralytics/yolo/utils/ops.py
def process_mask(protos, masks_in, bboxes, shape, upsample=False):
    """
    It takes the output of the mask head, and applies the mask to the bounding boxes. This is faster than
    process_mask_upsample but produces lower-quality, downsampled masks.

    Args:
      protos (torch.Tensor): [mask_dim, mask_h, mask_w]
      masks_in (torch.Tensor): [n, mask_dim], n is number of masks after nms
      bboxes (torch.Tensor): [n, 4], n is number of masks after nms
      shape (tuple): the size of the input image (h,w)
      upsample (bool): if True, upsample the masks to the input image size. Defaults to False

    Returns:
      (torch.Tensor): The processed masks.
    """

    c, mh, mw = protos.shape  # CHW
    ih, iw = shape
    masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)  # CHW

    downsampled_bboxes = bboxes.clone()
    downsampled_bboxes[:, 0] *= mw / iw
    downsampled_bboxes[:, 2] *= mw / iw
    downsampled_bboxes[:, 3] *= mh / ih
    downsampled_bboxes[:, 1] *= mh / ih

    masks = crop_mask(masks, downsampled_bboxes)  # CHW
    if upsample:
        masks = F.interpolate(masks[None], shape, mode='bilinear', align_corners=False)[0]  # CHW
    return masks.gt_(0.5)
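
A shape-level sketch (random values stand in for real mask coefficients and prototypes from a segmentation model; shapes are illustrative):

```python
import torch
from ultralytics.yolo.utils.ops import process_mask

protos = torch.rand(32, 160, 160)  # [mask_dim, mask_h, mask_w] from the mask head
masks_in = torch.rand(5, 32)       # coefficients for 5 detections kept after NMS
bboxes = torch.tensor([[0.0, 0.0, 320.0, 320.0]]).repeat(5, 1)  # xyxy in input-image pixels
masks = process_mask(protos, masks_in, bboxes, shape=(640, 640), upsample=True)
print(masks.shape)  # torch.Size([5, 640, 640]), binarized at 0.5
```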



process_mask_native

Takes the output of the mask head, upsamples it to the input image size, and crops it to the bounding boxes.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `protos` | `torch.Tensor` | [mask_dim, mask_h, mask_w]. | required |
| `masks_in` | `torch.Tensor` | [n, mask_dim], where n is the number of masks after NMS. | required |
| `bboxes` | `torch.Tensor` | [n, 4], where n is the number of masks after NMS. | required |
| `shape` | `tuple` | The size of the input image, (h, w). | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `masks` | `torch.Tensor` | The returned masks, with dimensions [n, h, w]. |

Source code in ultralytics/yolo/utils/ops.py
def process_mask_native(protos, masks_in, bboxes, shape):
    """
    It takes the output of the mask head, and crops it after upsampling to the bounding boxes.

    Args:
      protos (torch.Tensor): [mask_dim, mask_h, mask_w]
      masks_in (torch.Tensor): [n, mask_dim], n is number of masks after nms
      bboxes (torch.Tensor): [n, 4], n is number of masks after nms
      shape (tuple): the size of the input image (h,w)

    Returns:
      masks (torch.Tensor): The returned masks with dimensions [n, h, w]
    """
    c, mh, mw = protos.shape  # CHW
    masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)
    gain = min(mh / shape[0], mw / shape[1])  # gain  = old / new
    pad = (mw - shape[1] * gain) / 2, (mh - shape[0] * gain) / 2  # wh padding
    top, left = int(pad[1]), int(pad[0])  # y, x
    bottom, right = int(mh - pad[1]), int(mw - pad[0])
    masks = masks[:, top:bottom, left:right]

    masks = F.interpolate(masks[None], shape, mode='bilinear', align_corners=False)[0]  # CHW
    masks = crop_mask(masks, bboxes)  # CHW
    return masks.gt_(0.5)



scale_segments

Rescale segment coordinates (xy) from img1_shape to img0_shape.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `img1_shape` | `tuple` | The shape of the image that the segments are from. | required |
| `segments` | `torch.Tensor` | The segments to be scaled. | required |
| `img0_shape` | `tuple` | The shape of the image that the segmentation is being applied to. | required |
| `ratio_pad` | `tuple` | A tuple of (ratio, pad) for scaling the segments. If not provided, the ratio and pad are calculated from the size difference between the two images. | `None` |
| `normalize` | `bool` | If True, the coordinates are normalized to the range [0, 1]. | `False` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `segments` | `torch.Tensor` | The scaled (and optionally normalized) segment coordinates. |

Source code in ultralytics/yolo/utils/ops.py
def scale_segments(img1_shape, segments, img0_shape, ratio_pad=None, normalize=False):
    """
    Rescale segment coordinates (xy) from img1_shape to img0_shape

    Args:
      img1_shape (tuple): The shape of the image that the segments are from.
      segments (torch.Tensor): the segments to be scaled
      img0_shape (tuple): the shape of the image that the segmentation is being applied to
      ratio_pad (tuple): a tuple of (ratio, pad) for scaling the segments. If not provided, the ratio
        and pad are calculated from the size difference between the two images.
      normalize (bool): If True, the coordinates will be normalized to the range [0, 1]. Defaults to False

    Returns:
      segments (torch.Tensor): the scaled (and optionally normalized) segments.
    """
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    segments[:, 0] -= pad[0]  # x padding
    segments[:, 1] -= pad[1]  # y padding
    segments /= gain
    clip_segments(segments, img0_shape)
    if normalize:
        segments[:, 0] /= img0_shape[1]  # width
        segments[:, 1] /= img0_shape[0]  # height
    return segments
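
Usage mirrors scale_boxes; for example, mapping segment points from a 640x640 letterboxed input back to a 720x1280 frame and normalizing (values are illustrative):

```python
import torch
from ultralytics.yolo.utils.ops import scale_segments

seg = torch.tensor([[320.0, 200.0], [400.0, 260.0]])  # (n, 2) xy points on the 640x640 input
seg = scale_segments((640, 640), seg, (720, 1280), normalize=True)  # normalized xy in [0, 1]
```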



masks2segments

Takes a list of masks (n, h, w) and returns a list of segments (n, xy).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `masks` | `torch.Tensor` | The output of the model: a tensor of shape (n, h, w), e.g. (batch_size, 160, 160). | required |
| `strategy` | `str` | 'concat' or 'largest'. | `'largest'` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `segments` | `List` | A list of segment masks. |

Source code in ultralytics/yolo/utils/ops.py
def masks2segments(masks, strategy='largest'):
    """
    Takes a list of masks (n, h, w) and returns a list of segments (n, xy)

    Args:
      masks (torch.Tensor): the output of the model, a tensor of shape (n, h, w), e.g. (batch_size, 160, 160)
      strategy (str): 'concat' or 'largest'. Defaults to largest

    Returns:
      segments (List): list of segment masks
    """
    segments = []
    for x in masks.int().cpu().numpy().astype('uint8'):
        c = cv2.findContours(x, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[0]
        if c:
            if strategy == 'concat':  # concatenate all segments
                c = np.concatenate([x.reshape(-1, 2) for x in c])
            elif strategy == 'largest':  # select largest segment
                c = np.array(c[np.array([len(x) for x in c]).argmax()]).reshape(-1, 2)
        else:
            c = np.zeros((0, 2))  # no segments found
        segments.append(c.astype('float32'))
    return segments
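
For example, tracing the outline of a single square blob (values are illustrative):

```python
import torch
from ultralytics.yolo.utils.ops import masks2segments

masks = torch.zeros(1, 160, 160)
masks[0, 40:120, 40:120] = 1  # one square blob
segments = masks2segments(masks, strategy='largest')
print(segments[0].shape)  # (N, 2) polygon outlining the blob
```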



clip_segments

Clips segment coordinates (x, y points) to the image shape (height, width).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `segments` | `list` | A list of segments; each segment is a list of points, and each point is a list of (x, y) coordinates. | required |
| `shape` | `tuple` | The shape of the image. | required |

Source code in ultralytics/yolo/utils/ops.py
def clip_segments(segments, shape):
    """
    Clips segment coordinates (x, y points) to the image shape (height, width)

    Args:
      segments (list): a list of segments, each segment is a list of points, each point is a list of x, y
        coordinates
      shape (tuple): the shape of the image
    """
    if isinstance(segments, torch.Tensor):  # faster individually
        segments[:, 0].clamp_(0, shape[1])  # x
        segments[:, 1].clamp_(0, shape[0])  # y
    else:  # np.array (faster grouped)
        segments[:, 0] = segments[:, 0].clip(0, shape[1])  # x
        segments[:, 1] = segments[:, 1].clip(0, shape[0])  # y
