PDFBox 3.0.2: Unable to tag an image for accessibility, and the structure tree is not displayed
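I am trying to create a more accessible PDF document. It does not need to pass any official standard, but I would like to add alternate text to images. I can tag text content correctly, but when I follow a similar process for an image, it never gets tagged properly. I wrote my current code based on related Stack Overflow posts.
Here is my code: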
    public static void main(String[] args) throws IOException {
        int mcidCounter = 0;
        int structParentCounter = 0;

        PDDocument document = new PDDocument();
        PDPage page = new PDPage(PDRectangle.A4);
        document.addPage(page);
        page.setStructParents(structParentCounter);

        PDPageContentStream contentStream = null;
        try {
            contentStream = new PDPageContentStream(document, page);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

        PDImageXObject pdImage = PDImageXObject.createFromFile("image_file", document);

        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDStructureTreeRoot structureTreeRoot = new PDStructureTreeRoot();
        catalog.setStructureTreeRoot(structureTreeRoot);

        PDViewerPreferences prefs = new PDViewerPreferences(new COSDictionary());
        prefs.setDisplayDocTitle(true);
        catalog.setViewerPreferences(prefs);

        PDMarkInfo markInfo = new PDMarkInfo();
        markInfo.setMarked(true);
        catalog.setMarkInfo(markInfo);

        PDStructureElement documentElement = new PDStructureElement(StandardStructureTypes.DOCUMENT, structureTreeRoot);
        structureTreeRoot.appendKid(documentElement);

        PDStructureElement paragraphElement = new PDStructureElement(StandardStructureTypes.P, documentElement);
        paragraphElement.setPage(page);
        documentElement.appendKid(paragraphElement);

        COSDictionary markedContentDictionary = new COSDictionary();
        markedContentDictionary.setInt(COSName.MCID, mcidCounter);
        PDMarkedContentReference mcr = new PDMarkedContentReference();
        mcr.setMCID(mcidCounter);
        paragraphElement.appendKid(mcr);

        contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(markedContentDictionary));
        contentStream.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA_BOLD), 12);
        contentStream.beginText();
        contentStream.newLineAtOffset(50, 700);
        contentStream.showText("Document Title");
        contentStream.endText();
        contentStream.endMarkedContent();

        PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
        figureElement.setPage(page);
        figureElement.setAlternateDescription("Alternate Image Description");
        documentElement.appendKid(figureElement);

        COSDictionary markedContentDictionary3 = new COSDictionary();
        markedContentDictionary3.setInt(COSName.MCID, mcidCounter + 2);
        markedContentDictionary3.setString(COSName.ALT, "Alternate Image Description");
        PDMarkedContentReference mcr3 = new PDMarkedContentReference();
        mcr3.setMCID(mcidCounter + 2);
        figureElement.appendKid(mcr3);

        contentStream.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(markedContentDictionary3));
        contentStream.drawImage(pdImage, 50, 0);
        contentStream.endMarkedContent();
        contentStream.close();

        COSDictionary parentTreeRoot = new COSDictionary();
        PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);
        Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
        parentTreeMap.put(structParentCounter, paragraphElement);
        parentTree.setNumbers(parentTreeMap);
        structureTreeRoot.setParentTree(parentTree);

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        document.save(outputStream);
        byte[] pdfBytes = outputStream.toByteArray();
        document.close();

        Path actualPath = Path.of("test.pdf");
        Files.write(actualPath, pdfBytes, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }
Here is a screenshot of the PDF's internal structure:

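I have attached the structure tree root to the document catalog, yet I still cannot see the structure tree. Is something missing from my code, or am I misunderstanding what is supposed to be displayed?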
Note: content sourced from Stack Exchange; question asked by MoonLock68.




