Is my Stanford Stanza code for printing dependency parses of multi-sentence documents reasonable?
Great work—your code is totally valid and achieves exactly what you want: adding sentence numbers to each line of the dependency parse output for multi-sentence documents. Let’s confirm why it works, then look at some optional tweaks to make it even cleaner.
Why Your Current Code Works
Let’s break down the key parts that make your code effective:
- You use enumerate(doc.sentences) to track each sentence's position, then add 1 to get a human-readable sentence number (Python starts counting at 0).
- For every word in the sentence, you include a sentence: {i+1} field in the formatted string, which correctly ties each word to its parent sentence.
- The head-word lookup (sentence.words[word.head-1].text if word.head > 0 else "root") is spot-on, just like the official Stanza example. Head ids are 1-based while Python lists are 0-based, hence the -1, and a head of 0 marks the root, which has no word to look up.
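The if word.head > 0 guard matters more than it looks. Python treats index -1 as "the last element", so without the guard the root's lookup words[0 - 1] would silently return the sentence's last word instead of raising an error. The minimal sketch below shows this with stand-in objects (SimpleNamespace in place of real Stanza Word objects, with hand-picked heads rather than parser output); head_text and head_text_unguarded are hypothetical helper names used only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Stanza words: only .id, .text and .head are used.
# Heads are hand-picked for illustration, not real parser output.
words = [
    SimpleNamespace(id=1, text="He", head=2),
    SimpleNamespace(id=2, text="lives", head=0),  # head 0 marks the root
    SimpleNamespace(id=3, text=".", head=2),
]

def head_text(word, words):
    # Ids are 1-based and list indices 0-based, hence head - 1; head 0 means root.
    return words[word.head - 1].text if word.head > 0 else "root"

def head_text_unguarded(word, words):
    # Without the guard, head 0 becomes index -1, i.e. the LAST word.
    return words[word.head - 1].text

root = words[1]
print(head_text(root, words))            # root
print(head_text_unguarded(root, words))  # . (wrong, but no error is raised)
```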
When you run your code, you'll get output like this (truncated for brevity; exact heads and labels can vary with the model version). Note that "Chris Manning" is two words, so "teaches" is word 3:

```
sentence: 1 id: 1 word: Chris head id: 3 head: teaches deprel: nsubj
...
sentence: 1 id: 3 word: teaches head id: 0 head: root deprel: root
...
sentence: 2 id: 1 word: He head id: 2 head: lives deprel: nsubj
sentence: 2 id: 2 word: lives head id: 0 head: root deprel: root
...
```
This is exactly the behavior you’re aiming for.
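Word ids restart at 1 in every sentence (sentence 2 above begins again at id: 1), so the sentence number is what keeps rows from different sentences distinguishable. The loop described above can be checked without downloading any models by feeding it stand-in objects. The sketch below is a reconstruction from the description, not the asker's verbatim code: dep_lines is a hypothetical helper that returns the lines instead of printing them, and the two toy sentences use hand-picked heads and labels, not parser output.

```python
from types import SimpleNamespace as NS

def dep_lines(sentences):
    """Build one tab-separated line per word, prefixed with a 1-based sentence number.

    Works with anything exposing .words, where each word has .id, .text,
    .head and .deprel (the attributes the Stanza example relies on).
    """
    lines = []
    for i, sentence in enumerate(sentences):
        for word in sentence.words:
            head = sentence.words[word.head - 1].text if word.head > 0 else "root"
            lines.append(
                f"sentence: {i+1}\tid: {word.id}\tword: {word.text}\t"
                f"head id: {word.head}\thead: {head}\tdeprel: {word.deprel}"
            )
    return lines

# Two hand-built toy sentences standing in for doc.sentences.
s1 = NS(words=[NS(id=1, text="Chris", head=2, deprel="nsubj"),
               NS(id=2, text="teaches", head=0, deprel="root")])
s2 = NS(words=[NS(id=1, text="He", head=2, deprel="nsubj"),
               NS(id=2, text="lives", head=0, deprel="root")])

lines = dep_lines([s1, s2])
print("\n".join(lines))
```

The word ids repeat across the two sentences, but each (sentence, id) pair is unique, which is exactly what the extra field buys you.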
Optional Refinements
If you want to make your code more maintainable or readable, here are a couple of suggestions:
1. Extract Reusable Logic into a Function
If you plan to generate this output multiple times, wrapping the line-formatting logic in a function will make future updates easier:
```python
import stanza

def format_dep_line(sentence_num, word, sentence):
    # Word ids are 1-based; a head of 0 marks the root, which has no word to look up.
    head_text = sentence.words[word.head - 1].text if word.head > 0 else "root"
    return (f'sentence: {sentence_num}\tid: {word.id}\tword: {word.text}\t'
            f'head id: {word.head}\thead: {head_text}\tdeprel: {word.deprel}')

# English pipeline with the processors dependency parsing needs
# (stanza.download('en') fetches the models if they are not installed yet).
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos,lemma,depparse')
doc = nlp("Chris Manning teaches at Stanford University. He lives in the Bay Area.")

for i, sentence in enumerate(doc.sentences, start=1):  # start=1 gives 1-based numbers directly
    print(*[format_dep_line(i, word, sentence) for word in sentence.words], sep='\n')
```
Here, we use enumerate(..., start=1) to directly get a 1-based sentence number, eliminating the need for i+1.
2. Use Stanza's Built-in Print Utilities (For Quick Debugging)
If you just need to inspect parse results quickly, Stanza has a built-in print_dependencies() method for sentences:
```python
for i, sentence in enumerate(doc.sentences, start=1):
    print(f"--- Sentence {i} ---")
    sentence.print_dependencies()
```
This prints a more compact format (each word as a small tuple of its text, head id and relation), which is perfect for quick checks without building your own custom strings.
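If you want something in between, compact like print_dependencies() but still carrying the sentence number on every row, one option is to yield plain tuples and let the caller decide how to print or save them. The following is a sketch under that assumption: dep_rows is a hypothetical generator, the stand-in sentence uses hand-picked heads rather than parser output, and with a real parse you would pass doc.sentences instead.

```python
import csv
import io
from types import SimpleNamespace as NS

def dep_rows(sentences):
    """Yield (sentence_num, word_id, word, head_id, head_word, deprel) for every word."""
    for num, sentence in enumerate(sentences, start=1):
        for word in sentence.words:
            head = sentence.words[word.head - 1].text if word.head > 0 else "root"
            yield (num, word.id, word.text, word.head, head, word.deprel)

# Hand-built stand-in for doc.sentences (illustrative heads/labels only).
sentences = [NS(words=[NS(id=1, text="He", head=2, deprel="nsubj"),
                       NS(id=2, text="lives", head=0, deprel="root")])]

rows = list(dep_rows(sentences))
print(rows)
# [(1, 1, 'He', 2, 'lives', 'nsubj'), (1, 2, 'lives', 0, 'root', 'root')]

# The same rows as tab-separated text, e.g. for a spreadsheet or another script.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
print(buf.getvalue(), end="")
```

Keeping the rows as tuples separates "what was parsed" from "how it is displayed", so the same generator can feed printing, CSV export or further filtering.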
Final Verdict
Your original code is completely functional and correct. The refinements above are just optional ways to make it more scalable or readable—stick with what works best for your needs!
The question was originally asked on Stack Exchange by mcr.




