PDFBox 3.0.2: Unable to tag an image for accessibility, and the structure tree is not displayed

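I'm trying to create a more accessible PDF document. It doesn't need to conform to any official standard, but I would like to add alternate text to images. I can already tag text content correctly, but when I follow a similar process for images, the tagging never comes out right. I based my current code on related Stack Overflow posts.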

Here is my code:

public static void main(String[] args) throws IOException {
        int mcidCounter = 0;
        int structParentCounter = 0;
        PDDocument document = new PDDocument();
        PDPage page = new PDPage(PDRectangle.A4);
        document.addPage(page);

        page.setStructParents(structParentCounter);

        PDPageContentStream contentStream = null;
        try {
            contentStream = new PDPageContentStream(document, page);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

        PDImageXObject pdImage = PDImageXObject.createFromFile("image_file", document);

        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDStructureTreeRoot structureTreeRoot = new PDStructureTreeRoot();
        catalog.setStructureTreeRoot(structureTreeRoot);

        PDViewerPreferences prefs = new PDViewerPreferences(new COSDictionary());
        prefs.setDisplayDocTitle(true);
        catalog.setViewerPreferences(prefs);

        PDMarkInfo markInfo = new PDMarkInfo();
        markInfo.setMarked(true);
        catalog.setMarkInfo(markInfo);

        PDStructureElement documentElement = new PDStructureElement(StandardStructureTypes.DOCUMENT, structureTreeRoot);
        structureTreeRoot.appendKid(documentElement);

        PDStructureElement paragraphElement = new PDStructureElement(StandardStructureTypes.P, documentElement);
        paragraphElement.setPage(page);
        documentElement.appendKid(paragraphElement);

        COSDictionary markedContentDictionary = new COSDictionary();
        markedContentDictionary.setInt(COSName.MCID, mcidCounter);

        PDMarkedContentReference mcr = new PDMarkedContentReference();
        mcr.setMCID(mcidCounter);
        paragraphElement.appendKid(mcr);

        contentStream.beginMarkedContent(COSName.P, PDPropertyList.create(markedContentDictionary));
        contentStream.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA_BOLD), 12);
        contentStream.beginText();
        contentStream.newLineAtOffset(50, 700);
        contentStream.showText("Document Title");
        contentStream.endText();
        contentStream.endMarkedContent();

        PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
        figureElement.setPage(page);
        figureElement.setAlternateDescription("Alternate Image Description");
        documentElement.appendKid(figureElement);

        COSDictionary markedContentDictionary3 = new COSDictionary();
        markedContentDictionary3.setInt(COSName.MCID, mcidCounter + 2);
        markedContentDictionary3.setString(COSName.ALT, "Alternate Image Description");

        PDMarkedContentReference mcr3 = new PDMarkedContentReference();
        mcr3.setMCID(mcidCounter + 2);
        figureElement.appendKid(mcr3);

        contentStream.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(markedContentDictionary3));
        contentStream.drawImage(pdImage, 50, 0);
        contentStream.endMarkedContent();

        contentStream.close();

        COSDictionary parentTreeRoot = new COSDictionary();
        PDNumberTreeNode parentTree = new PDNumberTreeNode(parentTreeRoot, COSBase.class);

        Map<Integer, COSObjectable> parentTreeMap = new HashMap<>();
        parentTreeMap.put(structParentCounter, paragraphElement);
        parentTree.setNumbers(parentTreeMap);
        structureTreeRoot.setParentTree(parentTree);

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        document.save(outputStream);

        byte[] pdfBytes = outputStream.toByteArray();
        document.close();

        Path actualPath = Path.of("test.pdf");
        Files.write(actualPath, pdfBytes, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

Here is a screenshot of the PDF's internal structure:

[Screenshot: PDF internal structure]

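I have attached the structure tree root to the document catalog, yet the structure tree still does not show up. Is something missing from my code, or am I misunderstanding what is supposed to be displayed?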
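
For reference when reading the code above: in the PDF specification, the parent tree entry for a page's content stream is an array of parent structure elements, indexed by MCID, rather than a single element. The code above maps the page's StructParents key to `paragraphElement` alone and uses MCIDs 0 and 2, so this may be worth checking. The following is only a plain-Java model of that lookup, not PDFBox code: `ParentTreeModel`, `register`, `parentOf`, and `arrayLength` are hypothetical names, and strings stand in for structure elements. In PDFBox the corresponding value would presumably be a `COSArray` of the structure elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of a structure parent tree; no PDFBox types are used. */
final class ParentTreeModel {
    // StructParents key of a page -> parent structure elements, indexed by MCID.
    private final Map<Integer, List<String>> entries = new TreeMap<>();

    /** Records that marked content `mcid` on the page keyed by `structParents` belongs to `element`. */
    void register(int structParents, int mcid, String element) {
        List<String> parents = entries.computeIfAbsent(structParents, k -> new ArrayList<>());
        while (parents.size() <= mcid) {
            parents.add(null); // a skipped MCID leaves a hole in the array
        }
        parents.set(mcid, element);
    }

    /** Looks up a parent the way a reader would: key first, then MCID as a zero-based index. */
    String parentOf(int structParents, int mcid) {
        List<String> parents = entries.get(structParents);
        if (parents == null || mcid < 0 || mcid >= parents.size()) {
            return null;
        }
        return parents.get(mcid);
    }

    /** Length of the per-page array, including any holes. */
    int arrayLength(int structParents) {
        List<String> parents = entries.get(structParents);
        return parents == null ? 0 : parents.size();
    }

    public static void main(String[] args) {
        ParentTreeModel tree = new ParentTreeModel();
        tree.register(0, 0, "P");      // text block, MCID 0
        tree.register(0, 1, "Figure"); // image, MCID 1 (consecutive)
        System.out.println(tree.parentOf(0, 0));
        System.out.println(tree.parentOf(0, 1));
    }
}
```

In this model, registering the image under MCID 2 instead of 1 produces a three-element array with an empty slot at index 1, which is why consecutive MCIDs per page are the safer choice.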


Note: this content originates from Stack Exchange; the question was asked by MoonLock68.