Java beginner question: recognizing multiple-choice numbering formats and splitting the content for database storage
Hey there! As a Java developer who's had to wrangle messy formatted quiz content before, I totally get how frustrating this can be. Let's walk through a practical approach to identify those varied question/option numbering styles and split the content cleanly for your database.
Core idea: regular expression matching + step-by-step splitting
The key here is to use regular expressions tailored to match all the numbering patterns you mentioned, then split the content in two stages: first split the entire text into individual questions, then split each question into its prompt and options.
1. Define the matching rules
First, let's break down the patterns we need to catch:
- Question numbering: starts with digits, followed by ")", "." or "]", or wrapped in parentheses like (1)
- Option numbering: starts with a single letter (upper or lower case), followed by ")" or ".", or wrapped in parentheses like (c)
Here are the regex patterns we'll use:
- Question numbering pattern: (\(?\d+\)?[).\]]) matches things like 1), 2., (3), 4]
- Option numbering pattern: (\(?[a-zA-Z]\)?[).]) matches things like a., b), A), (c)
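If you want to sanity-check the two patterns on single markers before wiring them into a splitter, a minimal sketch like the following can help. The class name MarkerCheck and its two helper methods are made up for this illustration; only the two regular expressions come from the list above.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying the two numbering patterns on single tokens.
final class MarkerCheck {
    // The same expressions as listed above, written as Java string literals.
    static final Pattern QUESTION_MARKER = Pattern.compile("(\\(?\\d+\\)?[).\\]])");
    static final Pattern OPTION_MARKER = Pattern.compile("(\\(?[a-zA-Z]\\)?[).])");

    // True when the whole token is a question marker such as 1) or (3).
    static boolean isQuestionMarker(String token) {
        return QUESTION_MARKER.matcher(token).matches();
    }

    // True when the whole token is an option marker such as a. or (c).
    static boolean isOptionMarker(String token) {
        return OPTION_MARKER.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"1)", "2.", "(3)", "4]", "a.", "b)", "(c)", "18"};
        for (String token : tokens) {
            System.out.println(token
                    + " question=" + isQuestionMarker(token)
                    + " option=" + isOptionMarker(token));
        }
    }
}
```

One thing this makes visible: because both parentheses are optional, the question pattern also accepts mixed forms such as (1. so tighten it if your data needs stricter matching.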
2. Java implementation example
Let's put this into a working Java class. This will take your input text, split it into questions, then split each question into its prompt and options.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class QuizSplitter {

        // Question markers such as 1) 2. 3] (4). The lookbehind accepts a marker only at the
        // start of the text or after a character that is not an ASCII letter or digit, so a
        // marker may directly follow full-width punctuation such as "?".
        private static final Pattern QUESTION_PATTERN =
                Pattern.compile("(?<=^|[^A-Za-z0-9])(\\(?\\d+\\)?[).\\]])\\s*");

        // Option markers such as a. b) A) (c), guarded by the same lookbehind.
        private static final Pattern OPTION_PATTERN =
                Pattern.compile("(?<=^|[^A-Za-z0-9])(\\(?[a-zA-Z]\\)?[).])\\s*");

        // Holds one parsed question, ready to be stored in the database.
        static class Question {
            String questionId;
            String prompt;
            List<String> options = new ArrayList<>();

            @Override
            public String toString() {
                return "Question{" +
                        "questionId='" + questionId + '\'' +
                        ", prompt='" + prompt + '\'' +
                        ", options=" + options +
                        '}';
            }
        }

        public static List<Question> splitQuizContent(String input) {
            List<Question> questions = new ArrayList<>();

            // Step 1: split the whole text into question blocks.
            String[] questionBlocks = QUESTION_PATTERN.split(input);

            // Collect the question markers in order of appearance.
            Matcher questionMatcher = QUESTION_PATTERN.matcher(input);
            List<String> questionIds = new ArrayList<>();
            while (questionMatcher.find()) {
                questionIds.add(questionMatcher.group(1).trim());
            }

            // questionBlocks[0] is whatever precedes the first marker (an empty string when
            // the text starts with a marker), so block i belongs to marker i - 1.
            for (int i = 1; i < questionBlocks.length && i <= questionIds.size(); i++) {
                Question q = new Question();
                q.questionId = questionIds.get(i - 1);

                // Step 2: split the block into the prompt and the option texts.
                String[] optionBlocks = OPTION_PATTERN.split(questionBlocks[i]);

                // The first piece is the prompt; it is missing only if the block is a bare marker.
                q.prompt = optionBlocks.length > 0 ? optionBlocks[0].trim() : "";

                // The remaining pieces are the option texts.
                for (int j = 1; j < optionBlocks.length; j++) {
                    q.options.add(optionBlocks[j].trim());
                }
                questions.add(q);
            }
            return questions;
        }

        public static void main(String[] args) {
            String input = "1) 你叫什么名字?a. Eben b. Derick 2. 你多大了?a) 18岁 b) 20岁 "
                    + "3] 你最好的朋友是谁?a. 程序员 b. 科学家 (4) 你理想的职业是什么?A) 软件工程 B) 计算机科学。";

            List<Question> questions = splitQuizContent(input);

            // Print the results to verify the split.
            for (Question q : questions) {
                System.out.println(q);
            }
        }
    }
3. Code walkthrough
- QUESTION_PATTERN: uses a lookbehind so that a marker only counts when it stands on its own rather than inside a word or number, then captures the full question marker (the digits plus any surrounding brackets or delimiter).
- OPTION_PATTERN: the same logic, but targeting a single letter, upper or lower case.
- splitQuizContent: first splits the input into question blocks using the question pattern, then splits each block into its prompt and options using the option pattern.
- Question class: a simple POJO that holds the question ID, prompt, and options. Its fields map directly onto database table fields, which keeps insertion simple.
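To make the database mapping a bit more concrete, here is a rough sketch of flattening one question's options into rows. Everything schema-related is an assumption for illustration: the table layout question_option(question_id, position, text), the QuizRows class, and the OptionRow type are not part of the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mapping of parsed options onto an assumed table
// question_option(question_id, position, text).
final class QuizRows {

    // One row of the assumed question_option table.
    static final class OptionRow {
        final String questionId;
        final int position; // 1-based order of the option within its question
        final String text;

        OptionRow(String questionId, int position, String text) {
            this.questionId = questionId;
            this.position = position;
            this.text = text;
        }
    }

    // Produces one row per option, preserving the order in which options were parsed.
    static List<OptionRow> toOptionRows(String questionId, List<String> options) {
        List<OptionRow> rows = new ArrayList<>();
        for (int i = 0; i < options.size(); i++) {
            rows.add(new OptionRow(questionId, i + 1, options.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (OptionRow row : toOptionRows("1", Arrays.asList("Eben", "Derick"))) {
            System.out.println(row.questionId + " | " + row.position + " | " + row.text);
        }
    }
}
```

With JDBC, each row could then be bound to a PreparedStatement (setString, setInt) and sent in one go with addBatch() and executeBatch(); the question prompt would go into its own table in the same way.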
4. Test results
When you run the main method with your sample input, you'll get output like this:
    Question{questionId='1)', prompt='你叫什么名字?', options=[Eben, Derick]}
    Question{questionId='2.', prompt='你多大了?', options=[18岁, 20岁]}
    Question{questionId='3]', prompt='你最好的朋友是谁?', options=[程序员, 科学家]}
    Question{questionId='(4)', prompt='你理想的职业是什么?', options=[软件工程, 计算机科学。]}
Additional optimization tips
- Multi-line questions or options should not need a regex change: a line break counts as whitespace, so a marker at the start of a new line already satisfies the lookbehind. Just trim or collapse line breaks inside the extracted prompt and option text as needed.
- For consistent database storage, you can normalize question and option IDs (for example, convert (4) to 4 and a. to a).
- Add error handling for edge cases like missing options or malformed numbering to make the code more robust.
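The ID normalization mentioned above could look roughly like this. It is only a sketch: the MarkerNormalizer class is a made-up name, and lower-casing option letters is my own assumption about what "consistent" should mean for your schema.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that strips brackets and delimiters from a numbering marker.
final class MarkerNormalizer {
    // Captures the digits or the single letter; brackets and the trailing delimiter are optional.
    private static final Pattern CORE =
            Pattern.compile("\\(?(\\d+|[a-zA-Z])\\)?[).\\]]?");

    // Returns the bare number or lower-cased letter, e.g. "(4)" becomes "4" and "A)" becomes "a".
    static String normalize(String marker) {
        Matcher m = CORE.matcher(marker.trim());
        if (!m.matches()) {
            // Malformed numbering is reported rather than silently stored.
            throw new IllegalArgumentException("Not a numbering marker: " + marker);
        }
        return m.group(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String marker : new String[] {"1)", "2.", "3]", "(4)", "a.", "A)"}) {
            System.out.println(marker + " -> " + normalize(marker));
        }
    }
}
```

Throwing on anything that does not look like a marker also covers part of the error-handling tip: malformed numbering surfaces at parse time instead of ending up in the database.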
The question comes from Stack Exchange; it was asked by user1421716.




