How do you accurately extract a sentence's subject, verb, and object with Java NLP and feed them into SimpleNLG?

阿华AIGC实验室

2026-5-7

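Accurately extracting subject, verb, and object in Java and handing them to SimpleNLG

I get the pain point: you want SimpleNLG to generate sentences, but you're stuck on reliably extracting the subject, verb, object, and other constituents. You've tried CoreNLP, OpenNLP, and similar libraries without getting usable structured output, and you'd rather not cobble rules together from a pile of if-else statements, right? You don't have to write that much conditional logic, because a simpler structured API is available. Here are a couple of workable approaches: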

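Approach 1: Use Stanford CoreNLP's structured dependency-parsing API

You were using TreePrint to dump the dependencies as strings, which forces you to parse the output format yourself. CoreNLP also exposes the dependency graph as objects, so you can read the subject, predicate, and object as structured data without touching any strings.

Example code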

import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.UniversalEnglishGrammaticalRelations;
import edu.stanford.nlp.util.CoreMap;
import java.util.*;

public class DependencyParserDemo {
    public static void main(String[] args) {
        // Initialize the CoreNLP pipeline; depparse produces the dependency graph
        // (it only needs tokenize, ssplit and pos; lemma is added for later use)
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Sentence to analyze
        String inputSentence = "I use a parser";
        Annotation document = new Annotation(inputSentence);
        pipeline.annotate(document);

        // Iterate over the sentences (only one here)
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Basic dependency graph; enhanced variants are available under other annotation keys
            SemanticGraph dependencies = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);

            // The root node is usually the main verb
            IndexedWord rootVerb = dependencies.getFirstRoot();
            System.out.println("Verb: " + rootVerb.word());

            // Relation constants below assume the Universal Dependencies output of recent releases;
            // older Stanford Dependencies releases use EnglishGrammaticalRelations instead.
            // Subject (nsubj relation)
            for (IndexedWord subject : dependencies.getChildrenWithReln(rootVerb, UniversalEnglishGrammaticalRelations.NOMINAL_SUBJECT)) {
                System.out.println("Subject: " + subject.word());
            }

            // Direct object (dobj, named obj in newer Universal Dependencies versions)
            for (IndexedWord object : dependencies.getChildrenWithReln(rootVerb, UniversalEnglishGrammaticalRelations.DIRECT_OBJECT)) {
                System.out.println("Object: " + object.word());
                // Determiner of the object (det relation)
                for (IndexedWord determiner : dependencies.getChildrenWithReln(object, UniversalEnglishGrammaticalRelations.DETERMINER)) {
                    System.out.println("Determiner: " + determiner.word());
                }
            }
        }
    }
}

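Why is this approach better?

You look up each constituent through a grammatical-relation constant (such as NOMINAL_SUBJECT or DIRECT_OBJECT), which avoids the mistakes that come with parsing strings yourself
The returned IndexedWord objects carry the word, its part-of-speech tag, its index, and other structured information, so they are easy to pass on to SimpleNLG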
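
One detail worth sketching here: SimpleNLG inflects verbs itself, so it is usually more useful to hand it the verb's lemma (IndexedWord.lemma()) together with a tense than the surface form. The snippet below is a minimal sketch of deriving a coarse tense from the part-of-speech tag, assuming Penn Treebank tags as produced by the pos annotator. VerbTenseMapper, VerbTense, and tenseFromPosTag are hypothetical names introduced for illustration; translating the result into SimpleNLG's own tense feature is left to the caller.

```java
import java.util.Locale;

/** Hypothetical helper: maps a Penn Treebank verb tag to a coarse tense. */
final class VerbTenseMapper {

    /** Illustrative stand-in, not SimpleNLG's own Tense enum. */
    enum VerbTense { PAST, PRESENT, UNKNOWN }

    static VerbTense tenseFromPosTag(String posTag) {
        if (posTag == null) {
            return VerbTense.UNKNOWN;
        }
        switch (posTag.toUpperCase(Locale.ROOT)) {
            case "VBD":   // simple past, e.g. "used"
                return VerbTense.PAST;
            case "VBP":   // present, not 3rd person singular, e.g. "use"
            case "VBZ":   // present, 3rd person singular, e.g. "uses"
                return VerbTense.PRESENT;
            default:      // VB, VBG, VBN and non-verbs: tense depends on auxiliaries
                return VerbTense.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(tenseFromPosTag("VBD"));
        System.out.println(tenseFromPosTag("VBP"));
        System.out.println(tenseFromPosTag("NN"));
    }
}
```

In a real pipeline the argument would come from IndexedWord.tag() on the root verb; forms such as VBN or VBG are deliberately left as UNKNOWN because their tense is carried by an auxiliary elsewhere in the graph.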

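Approach 2: Feed the constituents extracted by CoreNLP straight into SimpleNLG

Once you have the subject, verb, and object as structured data, you can build the sentence with SimpleNLG's API instead of concatenating strings by hand:

Example code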

import simplenlg.framework.*;
import simplenlg.lexicon.*;
import simplenlg.realiser.english.*;
import simplenlg.phrasespec.*;

public class SimpleNLGIntegration {
    public static void main(String[] args) {
        // Assume these constituents were already extracted with CoreNLP
        String subjectStr = "I";
        String verbStr = "use";
        String detStr = "a";
        String objectStr = "parser";

        // Initialize the SimpleNLG components
        Lexicon lexicon = Lexicon.getDefaultLexicon();
        NLGFactory factory = new NLGFactory(lexicon);
        Realiser realiser = new Realiser(lexicon);

        // Build the clause
        SPhraseSpec sentence = factory.createClause();
        sentence.setSubject(subjectStr);
        sentence.setVerb(verbStr);

        // Build the object noun phrase with its determiner
        NPhraseSpec objectPhrase = factory.createNounPhrase(detStr, objectStr);
        sentence.setObject(objectPhrase);

        // Realise and print the sentence
        String generatedSentence = realiser.realiseSentence(sentence);
        System.out.println(generatedSentence); // Output: "I use a parser."
    }
}

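A note on rules

Doing without rules entirely is not realistic. For a passive sentence such as "The parser is used by me", you need to handle the passive-subject relation (nsubjpass) and the agent relation (agent); the exact labels depend on the dependency scheme your CoreNLP version emits. Compound subjects, coordinated objects, and similar constructions also need a small amount of logic. Even so, rules built on structured dependency relations are far more reliable than checks on raw POS tags (such as NN or VBZ), and much cheaper to maintain.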
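
The passive case can be sketched roughly as follows. To keep the example runnable without CoreNLP models, a hypothetical Edge class stands in for the edges leaving the root verb in a SemanticGraph, and VoiceAwareSvo, Svo, and extract are illustrative names as well. The relation labels are assumptions that should be checked against what your parser actually prints: older Stanford Dependencies output uses nsubjpass and agent, while Universal Dependencies output uses labels such as nsubj:pass and obl:agent (and the agent label may only appear in enhanced graphs).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: recover logical roles (agent, verb, patient) for active and passive clauses. */
final class VoiceAwareSvo {

    /** Hypothetical stand-in for one dependency edge leaving the root verb. */
    static final class Edge {
        final String relation;
        final String dependent;

        Edge(String relation, String dependent) {
            this.relation = relation;
            this.dependent = dependent;
        }
    }

    /** Who acts, the verb lemma, what is acted upon, and the original voice. */
    static final class Svo {
        String agent;
        String verb;
        String patient;
        boolean passive;
    }

    // Label sets are assumptions covering several dependency schemes.
    private static final Set<String> PASSIVE_SUBJECT =
            new HashSet<>(Arrays.asList("nsubjpass", "nsubj:pass"));
    private static final Set<String> AGENT =
            new HashSet<>(Arrays.asList("agent", "nmod:agent", "obl:agent"));
    private static final Set<String> OBJECT =
            new HashSet<>(Arrays.asList("dobj", "obj"));

    static Svo extract(String verbLemma, List<Edge> edgesFromRoot) {
        Svo result = new Svo();
        result.verb = verbLemma;
        for (Edge edge : edgesFromRoot) {
            if ("nsubj".equals(edge.relation)) {
                result.agent = edge.dependent;        // active subject is the actor
            } else if (OBJECT.contains(edge.relation)) {
                result.patient = edge.dependent;      // active object is acted upon
            } else if (PASSIVE_SUBJECT.contains(edge.relation)) {
                result.patient = edge.dependent;      // passive subject is acted upon
                result.passive = true;
            } else if (AGENT.contains(edge.relation)) {
                result.agent = edge.dependent;        // "by ..." phrase names the actor
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Edges as they might look for "The parser is used by me"
        Svo svo = extract("use", Arrays.asList(
                new Edge("nsubj:pass", "parser"),
                new Edge("obl:agent", "me")));
        System.out.println(svo.agent + " / " + svo.verb + " / " + svo.patient
                + " / passive=" + svo.passive);
    }
}
```

With the logical roles separated from the surface voice, the clause can be built as in the SimpleNLG example (agent as subject, patient as object), and SimpleNLG's Feature.PASSIVE can be set on the clause if the passive voice should be preserved in the output.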

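If you would rather not use Stanford CoreNLP, be aware that Apache OpenNLP ships a chunker and a constituency parser rather than a dependency parser, so with it you would have to derive the grammatical relations from the phrase-structure tree yourself. Whichever library you pick, the core idea is the same: work with structured relation objects rather than string output, so you don't reinvent the wheel by parsing a print format.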

The question behind this post comes from Stack Exchange; it was asked by pritul panchal.