Misaligned Bounding Box Crops in Object Detection: How Do You Get the Correct Cropped Image for a Detection Box?

阿华AIGC实验室

2026-4-29

Fix Inaccurate Bounding Box Cropping After Object Detection

Hey there, let's sort out why your cropped images are misaligned. You're really close: two small code issues are throwing things off.

Key Issues Causing Misalignment

Incorrect Parameter Order for PIL.Image.crop()
PIL's crop() method expects a 4-tuple in the order (left, upper, right, lower), which corresponds to (xmin, ymin, xmax, ymax). Your original code passes (xmin, xmax, ymin, ymax), so xmax is read as the upper edge and ymin as the right edge, and the crop region lands in the wrong place.
Hardcoded Image Dimensions
Setting width=600 and height=900 manually works only if your input image exactly matches those sizes. If the image's real dimensions differ, the normalized bounding box coordinates will convert to wrong pixel positions. Always pull the actual dimensions directly from the loaded image.
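
To make both issues concrete, here is a small arithmetic sketch in plain Python (no image needed). The normalized box values and the 1200x800 "real" image size are invented for illustration; only the 600x900 size comes from the question.

```python
# Hypothetical normalized box in [ymin, xmin, ymax, xmax] order; values chosen for illustration.
ymin_n, xmin_n, ymax_n, xmax_n = 0.25, 0.125, 0.75, 0.5

def scale(width, height):
    """Return (xmin, ymin, xmax, ymax) in pixels for the given image size."""
    return (int(xmin_n * width), int(ymin_n * height),
            int(xmax_n * width), int(ymax_n * height))

hardcoded = scale(600, 900)   # the fixed size from the question
actual = scale(1200, 800)     # assumed real image size, for illustration only
print(hardcoded)  # (75, 225, 300, 675)
print(actual)     # (150, 200, 600, 600)

# Misordered tuple (xmin, xmax, ymin, ymax), read the way crop() reads it:
# as (left, upper, right, lower).
xmin, ymin, xmax, ymax = actual
left, upper, right, lower = (xmin, xmax, ymin, ymax)
print(right - left, lower - upper)  # 50 0
```

The same normalized box maps to different pixels under the two sizes, and the misordered tuple describes a region 50 pixels wide and 0 pixels tall rather than the intended 450x400 region.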

Corrected Cropping Code

Here's the fixed version of your code, with clear explanations:

from PIL import Image

# Load the image first to get its real dimensions
img = Image.open(image_path)
width, height = img.size  # This grabs the actual width and height of your image

# Extract detection boxes (note: the TensorFlow Object Detection API returns normalized boxes as [ymin, xmin, ymax, xmax]; other frameworks may differ)
boxes = detections['detection_boxes']
ymin = int(boxes[0][0][0] * height)
xmin = int(boxes[0][0][1] * width)
ymax = int(boxes[0][0][2] * height)
xmax = int(boxes[0][0][3] * width)

print(f"xmin: {xmin}, ymin: {ymin}, xmax: {xmax}, ymax: {ymax}")

# Crop with the correct parameter order: (left, upper, right, lower)
cropped_img = img.crop((xmin, ymin, xmax, ymax))
cropped_img.save("/content/gdrive/MyDrive/UrduDetection/Croped_images/img8.jpg")
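
As an optional safeguard, you might clamp the pixel box to the image bounds before cropping, and skip boxes that end up with no area (this can happen after int() truncation on very small detections). The helper name clamp_box is hypothetical and is not part of PIL or the detection API; this is a minimal sketch, not a required step.

```python
def clamp_box(box, width, height):
    """Clamp an (xmin, ymin, xmax, ymax) pixel box to the image bounds.

    Returns None when the clamped box has no area, so the caller can skip it.
    """
    xmin, ymin, xmax, ymax = box
    xmin = max(0, min(xmin, width))
    xmax = max(0, min(xmax, width))
    ymin = max(0, min(ymin, height))
    ymax = max(0, min(ymax, height))
    if xmax <= xmin or ymax <= ymin:
        return None
    return (xmin, ymin, xmax, ymax)

# Example values are invented for illustration.
print(clamp_box((-5, 10, 650, 400), width=600, height=900))  # (0, 10, 600, 400)
print(clamp_box((100, 50, 100, 80), width=600, height=900))  # None
```

With this in place, you would pass the returned tuple to img.crop() only when it is not None.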

Bonus: Batch Crop All Valid Detection Boxes

If you want to crop every object that meets your score threshold (such as the 0.5 you used in visualization), loop over all boxes that pass it instead of cropping only the first one:

scores = detections['detection_scores'][0].numpy()
min_score_thresh = 0.5

for idx, score in enumerate(scores):
    if score >= min_score_thresh:
        ymin = int(boxes[0][idx][0] * height)
        xmin = int(boxes[0][idx][1] * width)
        ymax = int(boxes[0][idx][2] * height)
        xmax = int(boxes[0][idx][3] * width)
        
        cropped_img = img.crop((xmin, ymin, xmax, ymax))
        cropped_img.save(f"/content/gdrive/MyDrive/UrduDetection/Croped_images/img8_{idx}.jpg")

Quick Validation Check

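Double-check that your detection framework returns boxes in the normalized [ymin, xmin, ymax, xmax] format. The TensorFlow Object Detection API does, but other frameworks differ (torchvision's detection models, for example, return [xmin, ymin, xmax, ymax] in pixels). If yours uses a different order, adjust the indices accordingly.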
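
One way to keep the index adjustment in a single place is a small conversion helper. The function name to_crop_box and the order labels "yxyx" and "xyxy" are hypothetical names introduced for this sketch, and it assumes normalized coordinates in both cases.

```python
def to_crop_box(box, width, height, order="yxyx"):
    """Convert a normalized box to PIL's (left, upper, right, lower) pixel tuple.

    order="yxyx": box is [ymin, xmin, ymax, xmax]
    order="xyxy": box is [xmin, ymin, xmax, ymax]
    """
    if order == "yxyx":
        ymin, xmin, ymax, xmax = box
    elif order == "xyxy":
        xmin, ymin, xmax, ymax = box
    else:
        raise ValueError(f"unsupported box order: {order!r}")
    return (int(xmin * width), int(ymin * height),
            int(xmax * width), int(ymax * height))

# The same region expressed in both orders (values invented for illustration).
from_yxyx = to_crop_box((0.25, 0.125, 0.75, 0.5), 1200, 800, order="yxyx")
from_xyxy = to_crop_box((0.125, 0.25, 0.5, 0.75), 1200, 800, order="xyxy")
print(from_yxyx)  # (150, 200, 600, 600)
print(from_xyxy)  # (150, 200, 600, 600)
```

Note that the two orders cannot be told apart by inspecting the numbers alone, so the reliable check is still to read your framework's documentation or to draw one box on the image and look at it.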

The question in this post comes from Stack Exchange; it was asked by maryam mehboob.