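Java beginner question: recognizing multiple-choice numbering formats and splitting the content for database storage

Recognizing multiple-choice numbering and splitting the content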

Hey there! As a Java developer who's had to wrangle messily formatted quiz content before, I get how frustrating this can be. Let's walk through a practical way to recognize the different question and option numbering styles and split the content cleanly for your database.

Core idea: regular-expression matching + two-stage splitting

The key here is to use regular expressions tailored to match all the numbering patterns you mentioned, then split the content in two stages: first split the entire text into individual questions, then split each question into its prompt and options.

1. Define the matching rules

First, let's break down the patterns we need to catch:

  • Question numbering: one or more digits followed by ), . or ], or digits wrapped in parentheses, like (1)
  • Option numbering: a single letter (upper- or lowercase) followed by ) or ., or a letter wrapped in parentheses, like (A)

Here are the regex patterns we'll use:

  • Question numbering: (\(?\d+\)?[).\]]) matches labels like 1), 2., (3), 4]
  • Option numbering: (\(?[a-zA-Z]\)?[).]) matches labels like a., b), A), (c)
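
As a quick sanity check, the two patterns above can be tried against the example labels. This is only a sketch: the class name LabelPatternDemo and its helper methods are made up for illustration, and matches() tests a whole string, which is stricter than the searching the splitter does later.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only to try the two patterns out.
class LabelPatternDemo {
    // Same expressions as listed above, written as Java string literals.
    static final Pattern QUESTION_LABEL = Pattern.compile("(\\(?\\d+\\)?[).\\]])");
    static final Pattern OPTION_LABEL = Pattern.compile("(\\(?[a-zA-Z]\\)?[).])");

    // True when the whole string is a question label such as "1)" or "(3)".
    static boolean isQuestionLabel(String s) {
        return QUESTION_LABEL.matcher(s).matches();
    }

    // True when the whole string is an option label such as "a." or "(c)".
    static boolean isOptionLabel(String s) {
        return OPTION_LABEL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1)", "2.", "(3)", "4]", "a.", "b)", "(c)", "12"};
        for (String s : samples) {
            System.out.println(s + " question=" + isQuestionLabel(s)
                    + " option=" + isOptionLabel(s));
        }
    }
}
```

A bare number such as 12 is rejected by both patterns because neither a delimiter nor a closing bracket follows it.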

2. Java implementation example

Let's put this into a working Java class. This will take your input text, split it into questions, then split each question into its prompt and options.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuizSplitter {
    // Question labels such as "1)", "2.", "3]" or "(4)". The lookbehind only accepts a label
    // at the start of the input or right after whitespace; the trailing \s* swallows the
    // whitespace that follows the label.
    private static final Pattern QUESTION_PATTERN = Pattern.compile("(?<=^|\\s)(\\(?\\d+\\)?[).\\]])\\s*");
    // Option labels such as "a.", "b)", "A)" or "(c)", guarded by the same lookbehind
    private static final Pattern OPTION_PATTERN = Pattern.compile("(?<=^|\\s)(\\(?[a-zA-Z]\\)?[).])\\s*");

    // Holds one parsed question so it can be written to the database later
    static class Question {
        String questionId;
        String prompt;
        List<String> options = new ArrayList<>();

        @Override
        public String toString() {
            return "Question{" +
                    "questionId='" + questionId + '\'' +
                    ", prompt='" + prompt + '\'' +
                    ", options=" + options +
                    '}';
        }
    }

    public static List<Question> splitQuizContent(String input) {
        List<Question> questions = new ArrayList<>();

        // Step 1: split the whole text into one block per question.
        // The -1 limit keeps trailing empty blocks, so there is always exactly one more block
        // than there are labels. Block 0 is whatever precedes the first label (an empty
        // string when the text starts with a label, on Java 8 and later).
        String[] questionBlocks = QUESTION_PATTERN.split(input, -1);
        // Collect the question labels in order of appearance
        Matcher questionMatcher = QUESTION_PATTERN.matcher(input);
        List<String> questionIds = new ArrayList<>();
        while (questionMatcher.find()) {
            questionIds.add(questionMatcher.group(1));
        }

        // Block i belongs to label i - 1; start at 1 to skip any text before the first label
        for (int i = 1; i < questionBlocks.length && i <= questionIds.size(); i++) {
            Question q = new Question();
            q.questionId = questionIds.get(i - 1);

            // Step 2: split the block into the prompt and its options.
            // The -1 limit guarantees at least one element, even for an empty block.
            String[] optionBlocks = OPTION_PATTERN.split(questionBlocks[i], -1);
            // The first piece is the prompt
            q.prompt = optionBlocks[0].trim();
            // The remaining pieces are the option texts (the option labels are dropped)
            for (int j = 1; j < optionBlocks.length; j++) {
                q.options.add(optionBlocks[j].trim());
            }

            questions.add(q);
        }

        return questions;
    }

    public static void main(String[] args) {
        // Every label must start the text or follow whitespace, otherwise the lookbehind rejects it
        String input = "1) 你叫什么名字? a. Eben b. Derick 2. 你多大了? a) 18岁 b) 20岁 3] 你最好的朋友是谁? a. 程序员 b. 科学家 (4) 你理想的职业是什么? A) 软件工程 B) 计算机科学。";
        List<Question> questions = splitQuizContent(input);

        // Print the result to check it
        for (Question q : questions) {
            System.out.println(q);
        }
    }
}

3. Code walkthrough

  • QUESTION_PATTERN: The lookbehind (?<=^|\s) only accepts a label at the start of the input or right after whitespace, so a digit glued to the end of a word (the 2. in "v2.") is not mistaken for a question number. A number that ends a sentence after a space, as in "I am 18.", still matches, so watch for that in real data. The capture group holds the label itself, brackets and delimiter included.
  • OPTION_PATTERN: Same logic, but the label is a single uppercase or lowercase letter.
  • splitQuizContent: First splits the input into question blocks using the question pattern and collects the labels with a Matcher, then splits each block into a prompt and its options using the option pattern.
  • Question class: A simple POJO holding the question ID, the prompt, and the options. It maps naturally onto database tables, for example one row per question and one row per option.
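
The index bookkeeping in splitQuizContent depends on how Pattern.split treats a match at the very start of the input. The snippet below is a small illustrative sketch (the class name SplitBehaviorDemo and the shortened sample string are assumptions, not part of the original code). On Java 8 and later, a positive-width match at the beginning produces an empty leading element, so block i lines up with label i - 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how blocks and labels line up.
class SplitBehaviorDemo {
    static final Pattern QUESTION_PATTERN =
            Pattern.compile("(?<=^|\\s)(\\(?\\d+\\)?[).\\]])\\s*");

    // Text pieces between the question labels.
    static String[] blocks(String input) {
        return QUESTION_PATTERN.split(input);
    }

    // Question labels in order of appearance.
    static List<String> labels(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUESTION_PATTERN.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "1) First? a. x b. y 2. Second? a) p b) q";
        String[] blocks = blocks(input);
        List<String> labels = labels(input);
        // blocks[0] is the empty string that precedes the first label
        System.out.println(blocks.length + " blocks, " + labels.size() + " labels");
        for (int i = 1; i < blocks.length; i++) {
            System.out.println(labels.get(i - 1) + " -> " + blocks[i].trim());
        }
    }
}
```

If the text may contain an introduction before the first question, block 0 holds that introduction instead of an empty string, which is why the block index and the label index always differ by one.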

4. Test results

When you run the main method with your sample input, you'll get output like this:

Question{questionId='1)', prompt='你叫什么名字?', options=[Eben, Derick]}
Question{questionId='2.', prompt='你多大了?', options=[18岁, 20岁]}
Question{questionId='3]', prompt='你最好的朋友是谁?', options=[程序员, 科学家]}
Question{questionId='(4)', prompt='你理想的职业是什么?', options=[软件工程, 计算机科学。]}

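Further suggestions

  • Multi-line questions or options need no regex change: \s already matches \n and \r, so a label at the start of a line counts as preceded by whitespace. To accept labels only at the start of a line, anchor them with ^ under Pattern.MULTILINE instead of using the lookbehind.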
  • For consistent database storage, you can normalize question/option IDs (e.g., convert (4) to 4, a. to a).
  • Add error handling for edge cases like missing options or malformed numbering to make the code more robust.
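
The normalization and error-handling suggestions above can be sketched as follows. The class and method names (LabelNormalizer, normalize) are hypothetical, and lowercasing option letters is a design choice assumed here, not something the answer prescribes.

```java
import java.util.Locale;

// Hypothetical helper for storing labels in a consistent form.
class LabelNormalizer {
    // Strips brackets and delimiters, e.g. "(4)" becomes "4" and "A)" becomes "a".
    static String normalize(String label) {
        String core = label.replaceAll("[^0-9A-Za-z]", "");
        if (core.isEmpty()) {
            // Malformed numbering: nothing left once the punctuation is removed
            throw new IllegalArgumentException("Not a numbering label: " + label);
        }
        return core.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] labels = {"1)", "2.", "3]", "(4)", "a.", "B)"};
        for (String label : labels) {
            System.out.println(label + " -> " + normalize(label));
        }
    }
}
```

Normalizing before insertion keeps the stored IDs comparable regardless of which numbering style the source text used.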

The question comes from Stack Exchange; it was asked by user1421716.