How do you build a CNN in Keras to locate feature points in whale images? Handling the target format

阿华AIGC实验室

2026-5-26

A guide to whale feature-point localization with a Keras CNN (including the core approach to formatting target data)

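This answer walks through the full workflow for building a Keras CNN that locates two feature points on a whale, with a focus on the part you found most confusing: how to format the target data. Every step comes with code you can adapt.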

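I. Breaking down the core problem: how should the target data be handled?

Your task is regression (the outputs are continuous coordinate values), not classification. For two feature points, each with an x and a y coordinate, the target data should meet the following requirements:

Shape: the target array must have shape (number of samples, 4), so that each sample maps to four continuous values: [x1, y1, x2, y2]
Normalization: scale the coordinates to the 0-1 range by dividing x by the image width and y by the image height. With raw pixel targets the MSE loss is on the order of squared pixels, which tends to make training unstable and slow to converge.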

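A concrete example:
Suppose your images are 224×224 and the two feature points of one image have raw coordinates (100, 50) and (150, 80). The normalized target values are then:

x1_norm = 100 / 224 ≈ 0.446
y1_norm = 50 / 224 ≈ 0.223
x2_norm = 150 / 224 ≈ 0.670
y2_norm = 80 / 224 ≈ 0.357

The corresponding row of the target array is [0.446, 0.223, 0.670, 0.357]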
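
The arithmetic in this example can be checked in a few lines of NumPy. This is only a minimal sketch; the names `raw_points` and `image_size` are illustrative and do not come from the original question.

```python
import numpy as np

# Two keypoints for one image, in pixels: (x1, y1), (x2, y2)
raw_points = np.array([[100, 50], [150, 80]], dtype="float32")
image_size = np.array([224, 224], dtype="float32")  # (width, height)

# Divide x by the width and y by the height, then flatten to [x1, y1, x2, y2]
target_row = (raw_points / image_size).reshape(-1)
print(np.round(target_row, 3))
```

Stacking one such row per image gives the (number of samples, 4) array the model trains on.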

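II. Full implementation steps

1. Data preprocessing (images + targets)

Image preprocessing

Scale your NumPy image array to the 0-1 range and resize it to the network's input shape: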

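import numpy as np
from keras.preprocessing.image import smart_resize

# Assume the raw image array is `images`, with shape (num_samples, height, width, channels)
# Scale pixel values to the 0-1 range
images = images.astype('float32') / 255.0

# Use 1 channel for grayscale or 3 for RGB; this example uses 224×224 RGB
target_shape = (224, 224, 3)
# Resize to the network input size, given as (height, width).
# Note: smart_resize center-crops first when the aspect ratio differs, which shifts keypoints.
images = smart_resize(images, target_shape[:2])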

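Target data formatting

Assume the raw labels are a list in which each element is [(x1, y1), (x2, y2)], with coordinates in pixels on the resized image. Convert it to a normalized NumPy array: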

# target_shape is (height, width, channels)
img_height, img_width = target_shape[0], target_shape[1]
targets = []

# Iterate over the labels of each sample
for (x1, y1), (x2, y2) in original_labels:
    # Normalize: x by the width, y by the height
    x1_norm = x1 / img_width
    y1_norm = y1 / img_height
    x2_norm = x2 / img_width
    y2_norm = y2 / img_height
    targets.append([x1_norm, y1_norm, x2_norm, y2_norm])

# Convert to a NumPy array of shape (num_samples, 4)
targets_array = np.array(targets, dtype='float32')
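
If the labels were annotated on the original images and those images differ in size, each label has to be divided by the size of its own image instead. The sketch below assumes a hypothetical `sizes` list of (width, height) pairs, which is not part of the original question, and assumes the images are later resized by plain stretching with no cropping, so the normalized coordinates stay valid.

```python
import numpy as np

def normalize_keypoints(labels, sizes):
    """Return an (N, 4) float32 array of [x1, y1, x2, y2] scaled to 0-1.

    labels: N entries of [(x1, y1), (x2, y2)] in pixels of each original image.
    sizes:  N entries of (width, height) for the corresponding original image.
    """
    points = np.asarray(labels, dtype="float32")   # shape (N, 2, 2)
    wh = np.asarray(sizes, dtype="float32")        # shape (N, 2)
    normalized = points / wh[:, np.newaxis, :]     # broadcast over both points
    return normalized.reshape(len(points), 4)

labels = [[(100, 50), (150, 80)], [(320, 240), (480, 120)]]
sizes = [(224, 224), (640, 480)]
targets_array = normalize_keypoints(labels, sizes)
```

The vectorized division replaces the explicit loop and produces the same row layout.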

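2. Build the CNN regression model

Because this is regression, the last layer is Dense(4), which outputs four continuous values, with a linear activation (the default for Dense). Here is a simple baseline structure to start from: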

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Convolutional blocks: extract image features
    Conv2D(32, (3,3), activation='relu', input_shape=target_shape),
    MaxPooling2D((2,2)),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D((2,2)),
    Conv2D(128, (3,3), activation='relu'),
    MaxPooling2D((2,2)),
    Conv2D(256, (3,3), activation='relu'),
    MaxPooling2D((2,2)),
    # Dense layers: map the features to coordinates
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),  # reduces overfitting
    Dense(256, activation='relu'),
    # Output layer: four values, the x and y of the two feature points
    Dense(4, activation='linear')
])

# Compile the model: MSE loss for regression, Adam optimizer
model.compile(optimizer='adam', loss='mse')
model.summary()
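
To see where most of this model's parameters sit before running `model.summary()`, the feature-map size can be traced by hand. The sketch below is plain Python and assumes the layer settings shown above: 3×3 convolutions with the default 'valid' padding and stride 1, each followed by 2×2 max pooling.

```python
def conv_pool_output(size, kernel=3, pool=2):
    """Spatial size after one 'valid' convolution followed by max pooling."""
    after_conv = size - (kernel - 1)   # 'valid' padding, stride 1
    return after_conv // pool          # pooling floors odd sizes

size = 224
for _ in range(4):                     # four Conv2D + MaxPooling2D blocks
    size = conv_pool_output(size)

flatten_units = size * size * 256            # 256 filters in the last Conv2D
dense_params = flatten_units * 512 + 512     # weights + biases of Dense(512)
print(size, flatten_units, dense_params)
```

For a 224×224 input this gives a 12×12 feature map, 36,864 flattened units, and about 18.9 million parameters in the first Dense layer alone, which is worth keeping in mind if the dataset is small.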

3. Train the model

Split the data into training and validation sets, then train:

from sklearn.model_selection import train_test_split

# Split the data
X_train, X_val, y_train, y_val = train_test_split(images, targets_array, test_size=0.2, random_state=42)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=16,
    validation_data=(X_val, y_val),
    verbose=1
)
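
MSE on normalized coordinates is hard to interpret, so it can help to report the error in pixels as well. The helper below is a hypothetical addition, not part of the original answer; it assumes targets and predictions are both laid out as [x1, y1, x2, y2] in the 0-1 range.

```python
import numpy as np

def mean_pixel_error(y_true, y_pred, img_width, img_height):
    """Mean Euclidean distance in pixels between true and predicted keypoints."""
    scale = np.array([img_width, img_height], dtype="float32")
    # Reshape (N, 4) to (N, 2 points, 2 coordinates) and scale back to pixels
    true_px = np.asarray(y_true, dtype="float32").reshape(-1, 2, 2) * scale
    pred_px = np.asarray(y_pred, dtype="float32").reshape(-1, 2, 2) * scale
    return float(np.linalg.norm(true_px - pred_px, axis=-1).mean())

# Toy check: the first point is exact, the second is off by 0.25 * 224 = 56 px in y
y_true = np.array([[0.50, 0.50, 0.25, 0.75]])
y_pred = np.array([[0.50, 0.50, 0.25, 0.50]])
error = mean_pixel_error(y_true, y_pred, 224, 224)
```

With a trained model, the same function could be called as `mean_pixel_error(y_val, model.predict(X_val), img_width, img_height)`.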

4. Predict on new images

For a new, unlabeled image, preprocess it, predict, and then denormalize to get pixel coordinates:

def predict_keypoints(model, new_image, img_width, img_height):
    # Preprocess the new image: scale to 0-1, add a batch dimension, resize
    new_image = new_image.astype('float32') / 255.0
    # smart_resize takes the size as (height, width)
    new_image = smart_resize(np.expand_dims(new_image, axis=0), (img_height, img_width))
    
    # Predict normalized coordinates
    pred_norm = model.predict(new_image)[0]
    
    # Denormalize to pixel coordinates in the resized img_width × img_height frame
    x1 = int(round(float(pred_norm[0]) * img_width))
    y1 = int(round(float(pred_norm[1]) * img_height))
    x2 = int(round(float(pred_norm[2]) * img_width))
    y2 = int(round(float(pred_norm[3]) * img_height))
    
    return (x1, y1), (x2, y2)

# Example: assume new_img is your unlabeled image as a NumPy array
keypoint1, keypoint2 = predict_keypoints(model, new_img, img_width, img_height)
print(f"Feature point 1: {keypoint1}, feature point 2: {keypoint2}")
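
To draw the points on the full-resolution image rather than on the resized network input, the normalized prediction can be scaled by the original image size. The sketch below is one possible helper under two assumptions: the image was resized by plain stretching with no cropping, and clipping to 0-1 is acceptable, since a linear output layer can produce values slightly outside that range. The name `to_original_pixels` is hypothetical.

```python
import numpy as np

def to_original_pixels(pred_norm, orig_width, orig_height):
    """Map [x1, y1, x2, y2] in 0-1 to integer pixel coordinates on the original image."""
    # Linear outputs can overshoot, so clamp them to the valid range first
    pred = np.clip(np.asarray(pred_norm, dtype="float32"), 0.0, 1.0)
    scale = np.array([orig_width, orig_height] * 2, dtype="float32")
    x1, y1, x2, y2 = np.rint(pred * scale).astype(int).tolist()
    return (x1, y1), (x2, y2)

# The second point is deliberately out of range to show the clipping
keypoints = to_original_pixels([0.25, 0.5, 1.1, -0.2], 640, 480)
```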

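III. Optimization tips

Data augmentation: rotations, shifts, and flips improve generalization, but every geometric transform must also be applied to the keypoint coordinates. ImageDataGenerator transforms only the images and leaves the targets unchanged, so it is not sufficient on its own for this task
Transfer learning: if you have few samples, use the convolutional part of a pretrained model (such as VGG16 or ResNet50) as a feature extractor, freeze it, and add dense layers on top; this usually trains faster and often gives better results
Early stopping: use EarlyStopping to monitor the validation loss and stop before the model overfits
Learning-rate scheduling: use ReduceLROnPlateau to lower the learning rate when the validation loss plateaus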
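
For keypoint regression, a geometric augmentation has to move the targets together with the pixels. A horizontal flip is the simplest case and can be sketched in NumPy as follows; the function name is hypothetical, and the targets are assumed to be [x1, y1, x2, y2] in the 0-1 range.

```python
import numpy as np

def flip_horizontal(image, target):
    """Mirror an (H, W, C) image left-right and adjust [x1, y1, x2, y2] in 0-1."""
    flipped_image = image[:, ::-1, :]
    x1, y1, x2, y2 = target
    # Treats x as a continuous position; for integer pixel indices the exact
    # mirror is (W - 1 - x) / W, which differs from this by one pixel.
    flipped_target = np.array([1.0 - x1, y1, 1.0 - x2, y2], dtype="float32")
    return flipped_image, flipped_target

image = np.zeros((4, 6, 3), dtype="float32")
image[1, 0, :] = 1.0                       # mark a pixel in the leftmost column
target = np.array([0.0, 0.25, 0.5, 0.75], dtype="float32")
flipped_image, flipped_target = flip_horizontal(image, target)
```

Rotations and shifts follow the same idea, with the matching coordinate transform applied to each point.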

The question in this post comes from Stack Exchange, asked by owhjf98w4ehf890w3hf
