Object Detection Bounding Box Crop Is Misaligned: How Do I Get the Correct Cropped Image from a Detection Box?
Hey there, let's sort out why your cropped images are misaligned. You're really close; just a couple of small code issues are throwing things off.
Key Issues Causing Misalignment
Incorrect Parameter Order for PIL.Image.crop()
PIL's crop() method expects a tuple in the order (left, upper, right, lower), which translates directly to (xmin, ymin, xmax, ymax). Your original code uses (xmin, xmax, ymin, ymax), which puts xmax in the upper slot and ymin in the right slot, so the crop region lands in the wrong place.
Hardcoded Image Dimensions
Setting width=600 and height=900 manually works only if your input image exactly matches those sizes. If the image's real dimensions differ, the normalized bounding box coordinates will convert to wrong pixel positions. Always pull the actual dimensions directly from the loaded image.
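To make both points concrete, here is a minimal sketch of the conversion on its own, with no image involved. The helper name to_crop_box and the sample box values are made up for illustration, and it assumes the box is normalized [ymin, xmin, ymax, xmax]. It rounds to the nearest pixel, whereas the int() calls in your code truncate; either is fine for cropping.

```python
def to_crop_box(box, width, height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box into the
    (left, upper, right, lower) pixel tuple that PIL's crop() expects.

    Hypothetical helper for illustration; assumes values are fractions
    of the image size, not pixels.
    """
    # float() lets this accept plain numbers as well as scalar array/tensor values
    ymin, xmin, ymax, xmax = (float(v) for v in box)
    # x values scale by width, y values scale by height
    return (
        round(xmin * width),   # left
        round(ymin * height),  # upper
        round(xmax * width),   # right
        round(ymax * height),  # lower
    )


# Made-up box on a 600x900 image
crop_box = to_crop_box([0.25, 0.5, 0.75, 1.0], width=600, height=900)
print(crop_box)  # (300, 225, 600, 675)
```

With a loaded image you would call it as img.crop(to_crop_box(box, *img.size)), so the dimensions always come from the image itself.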
Corrected Cropping Code
Here's the fixed version of your code, with clear explanations:
from PIL import Image

# Load the image first to get its real dimensions
img = Image.open(image_path)
width, height = img.size  # This grabs the actual width and height of your image

# Extract detection boxes (note: the TensorFlow Object Detection API returns
# boxes as normalized [ymin, xmin, ymax, xmax] values)
boxes = detections['detection_boxes']
ymin = int(boxes[0][0][0] * height)
xmin = int(boxes[0][0][1] * width)
ymax = int(boxes[0][0][2] * height)
xmax = int(boxes[0][0][3] * width)
print(f"xmin: {xmin}, ymin: {ymin}, xmax: {xmax}, ymax: {ymax}")

# Crop with the correct parameter order: (left, upper, right, lower)
cropped_img = img.crop((xmin, ymin, xmax, ymax))
cropped_img.save("/content/gdrive/MyDrive/UrduDetection/Croped_images/img8.jpg")
Bonus: Batch Crop All Valid Detection Boxes
If you want to crop every object that meets your score threshold (like the .5 you used in visualization), loop through all valid boxes instead of just the first one:
scores = detections['detection_scores'][0].numpy()
min_score_thresh = 0.5

for idx, score in enumerate(scores):
    if score >= min_score_thresh:
        ymin = int(boxes[0][idx][0] * height)
        xmin = int(boxes[0][idx][1] * width)
        ymax = int(boxes[0][idx][2] * height)
        xmax = int(boxes[0][idx][3] * width)
        cropped_img = img.crop((xmin, ymin, xmax, ymax))
        cropped_img.save(f"/content/gdrive/MyDrive/UrduDetection/Croped_images/img8_{idx}.jpg")
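One thing the loop above doesn't guard against is a box that spills past the image edge or collapses to zero width or height. As far as I know, crop() fills any out-of-bounds area with black rather than raising, and an empty crop is likely to fail when you save it. Here's a rough sketch of a filter that clamps and skips such boxes before cropping. The function name select_crop_boxes and the sample data are made up, and it again assumes normalized [ymin, xmin, ymax, xmax] boxes.

```python
def select_crop_boxes(boxes, scores, width, height, min_score=0.5):
    """Return [(index, (left, upper, right, lower)), ...] for detections
    that pass the score threshold and have a non-empty area.

    Hypothetical helper; assumes normalized [ymin, xmin, ymax, xmax] boxes.
    """
    selected = []
    for idx, (box, score) in enumerate(zip(boxes, scores)):
        if float(score) < min_score:
            continue
        ymin, xmin, ymax, xmax = (float(v) for v in box)
        # Convert to pixels, then clamp to the image bounds
        left = max(0, min(width, round(xmin * width)))
        upper = max(0, min(height, round(ymin * height)))
        right = max(0, min(width, round(xmax * width)))
        lower = max(0, min(height, round(ymax * height)))
        # Skip degenerate boxes that would produce an empty crop
        if right <= left or lower <= upper:
            continue
        selected.append((idx, (left, upper, right, lower)))
    return selected


# Made-up detections for a 600x900 image
sample_boxes = [
    [0.25, 0.5, 0.75, 1.0],  # kept as-is
    [0.0, 0.0, 0.5, 0.5],    # dropped: score below threshold
    [-0.5, 0.5, 0.5, 1.5],   # kept, but clamped to the image
    [0.5, 0.5, 0.5, 0.75],   # dropped: zero height
]
sample_scores = [0.9, 0.3, 0.8, 0.7]

for idx, crop_box in select_crop_boxes(sample_boxes, sample_scores, 600, 900):
    print(idx, crop_box)
```

Each returned tuple can go straight into img.crop(), and the index keeps your output filenames tied to the original detection.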
Quick Validation Check
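Double-check that your detection framework returns boxes in the normalized [ymin, xmin, ymax, xmax] format. The TensorFlow Object Detection API does, but other frameworks differ; torchvision's detection models, for example, return [xmin, ymin, xmax, ymax] in pixels. If yours uses a different order or is already in pixels, adjust the indices and scaling accordingly.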
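A quick way to run that check is to outline the box on a copy of the image and look at it before you crop anything. Below is a small sketch using Pillow's ImageDraw. The function name draw_box_for_inspection is made up, the blank canvas is a stand-in for your real image, and the box values are invented.

```python
from PIL import Image, ImageDraw


def draw_box_for_inspection(img, box, color=(255, 0, 0)):
    """Return a copy of img with a normalized [ymin, xmin, ymax, xmax]
    box outlined, leaving the original image untouched.

    Hypothetical helper for eyeballing whether the box order is right.
    """
    width, height = img.size
    ymin, xmin, ymax, xmax = (float(v) for v in box)
    left, upper = round(xmin * width), round(ymin * height)
    right, lower = round(xmax * width), round(ymax * height)
    annotated = img.copy()
    ImageDraw.Draw(annotated).rectangle(
        [left, upper, right, lower], outline=color, width=3
    )
    return annotated


# Blank white canvas standing in for a real photo
canvas = Image.new("RGB", (600, 900), (255, 255, 255))
annotated = draw_box_for_inspection(canvas, [0.25, 0.5, 0.75, 0.8])
# annotated.save("box_check.jpg")  # open this file and compare with the object
```

If the rectangle sits on the object, your index order is right. If it looks transposed or squashed into a corner, the x/y order or the scaling is off.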
This question comes from Stack Exchange; it was asked by maryam mehboob.




