Seeking an implementation for getting the X/Y coordinates of text in a Base64-encoded image
Solution: Get Text Coordinates from Clicked Base64 Image
Alright, let's solve this problem where you need to click on text in a Base64 image and retrieve its X/Y coordinates. The core challenge here is combining image rendering, click coordinate calculation, and OCR (Optical Character Recognition) to identify text regions and match clicks to them.
Core Approach
- Render the Base64 image on the page (we'll use an <img> tag for simplicity).
- Use Tesseract.js (a lightweight front-end OCR library) to scan the image and extract all text blocks along with their bounding box coordinates.
- Listen for click events on the image, convert the click position to coordinates relative to the original image size (accounting for any CSS scaling).
- Check which text block's bounding box contains the click position, then return that block's coordinates.
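Once OCR has produced its boxes, the last two steps reduce to a point-in-rectangle test. Here is a minimal sketch of that test, assuming each box is stored as { text, x, y, width, height } in original-image pixels. The helper name findBlockAt and the sample boxes are made up for illustration and are not part of Tesseract.js.

```javascript
// Hypothetical helper: returns the first box that contains the point
// (px, py), or null when the point lies outside every box.
// Edges are treated as inside the box.
function findBlockAt(blocks, px, py) {
  const hit = blocks.find(b =>
    px >= b.x && px <= b.x + b.width &&
    py >= b.y && py <= b.y + b.height
  );
  return hit || null;
}

// Made-up sample boxes standing in for OCR output
const sampleBlocks = [
  { text: 'Hello', x: 10, y: 10, width: 100, height: 30 },
  { text: 'World', x: 10, y: 60, width: 120, height: 30 },
];

console.log(findBlockAt(sampleBlocks, 50, 70));   // the 'World' box
console.log(findBlockAt(sampleBlocks, 300, 300)); // null
```

If boxes can overlap, the first match in array order wins; sorting the boxes by area beforehand would be one way to prefer the smallest one.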
Complete Implementation Code
<!DOCTYPE html>
<html>
<head>
  <title>Base64 Image Text Click Coordinates</title>
  <style>
    #target-image {
      max-width: 800px;
      border: 1px solid #ddd;
      cursor: crosshair;
    }
    #result {
      margin-top: 20px;
      padding: 10px;
      background: #f5f5f5;
      border-radius: 4px;
    }
  </style>
</head>
<body>
  <img id="target-image" alt="Base64 Image" />
  <div id="result">Click on text in the image to get coordinates...</div>

  <!-- Load Tesseract.js from CDN -->
  <script src="https://cdn.jsdelivr.net/npm/tesseract.js@5.0.2/dist/tesseract.min.js"></script>
  <script>
    // Replace this with your actual Base64 image string
    const base64Image = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAATAAAAFCAYAAAC8bQeYAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH5gMVESkqQzkXSwAAABl0RVh0Q29tbWVudABDcmVhdGVkIHdpdGggR0lNUFeBDhcAAAAUSURBVBgZBcHPcsAABAAAQ+zWF8wAAAAASUVORK5CYII=";

    const imageElement = document.getElementById('target-image');
    const resultElement = document.getElementById('result');
    let textBlocks = [];
    let ocrDone = false; // Distinguishes "OCR still running" from "OCR found no text"

    // Initialize image and OCR
    async function init() {
      // Attach the handlers before setting src so the load event cannot be missed
      await new Promise((resolve, reject) => {
        imageElement.onload = resolve;
        imageElement.onerror = reject;
        imageElement.src = base64Image;
      });

      // Run OCR to extract text blocks
      const { data: { blocks } } = await Tesseract.recognize(
        base64Image,
        'eng', // Change to your language code if needed
        { logger: m => console.log(m) } // Optional: log OCR progress
      );

      // Keep only blocks with non-empty text (this avoids relying on a
      // block-type field) and store their bounding boxes
      textBlocks = (blocks || [])
        .filter(block => block.text && block.text.trim() !== '')
        .map(block => ({
          text: block.text,
          x: block.bbox.x0,
          y: block.bbox.y0,
          width: block.bbox.x1 - block.bbox.x0,
          height: block.bbox.y1 - block.bbox.y0
        }));
      ocrDone = true;
      console.log('Text blocks detected:', textBlocks);
    }

    // Handle image click
    imageElement.addEventListener('click', (e) => {
      if (!ocrDone) {
        resultElement.textContent = "OCR not completed yet. Wait a moment...";
        return;
      }

      // Calculate click position relative to original image size
      const rect = imageElement.getBoundingClientRect();
      const scaleX = imageElement.naturalWidth / rect.width;
      const scaleY = imageElement.naturalHeight / rect.height;
      const clickX = (e.clientX - rect.left) * scaleX;
      const clickY = (e.clientY - rect.top) * scaleY;

      // Find which text block contains the click
      const clickedBlock = textBlocks.find(block =>
        clickX >= block.x && clickX <= block.x + block.width &&
        clickY >= block.y && clickY <= block.y + block.height
      );

      if (clickedBlock) {
        // Build the result from DOM nodes so OCR output containing "<" or "&"
        // is shown as text instead of being parsed as HTML
        const strong = document.createElement('strong');
        strong.textContent = clickedBlock.text;
        resultElement.textContent = '';
        resultElement.append(
          'Clicked Text: ', strong, document.createElement('br'),
          `Text Block Coordinates (Top-Left): X = ${Math.round(clickedBlock.x)}, Y = ${Math.round(clickedBlock.y)}`,
          document.createElement('br'),
          `Click Position: X = ${Math.round(clickX)}, Y = ${Math.round(clickY)}`
        );
      } else {
        resultElement.textContent = "No text found at this click position.";
      }
    });

    // Start initialization and report failures instead of failing silently
    init().catch(err => {
      console.error(err);
      resultElement.textContent = "Failed to load the image or run OCR.";
    });
  </script>
</body>
</html>
Key Details Explained
- Base64 Image Handling: We directly set the src of the <img> tag to your Base64 string. No server-side processing needed here.
- OCR with Tesseract.js: The library scans the image and returns text blocks with bounding boxes (x0, y0 = top-left corner; x1, y1 = bottom-right corner). We filter these to only keep actual text blocks.
- Coordinate Calculation: Since images might be scaled with CSS, we convert the click's clientX/clientY (relative to the viewport) to coordinates relative to the original image size using the naturalWidth/naturalHeight and the element's bounding rect.
- Click Matching: We check if the click position falls within any text block's bounding box, then display the relevant coordinates and text.
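To make the coordinate calculation concrete, here is a small worked sketch with the DOM lookups replaced by plain numbers so it runs anywhere. The helper name toImageCoords and the sample sizes are assumptions chosen for illustration; rect stands in for the object returned by getBoundingClientRect().

```javascript
// Hypothetical helper: maps a viewport click (clientX, clientY) to pixel
// coordinates in the original, unscaled image.
function toImageCoords(clientX, clientY, rect, naturalWidth, naturalHeight) {
  const scaleX = naturalWidth / rect.width;   // original px per displayed px
  const scaleY = naturalHeight / rect.height;
  return {
    x: (clientX - rect.left) * scaleX,
    y: (clientY - rect.top) * scaleY,
  };
}

// A 1600x1200 image displayed at 800x600, with its top-left corner
// 10px from the left and 20px from the top of the viewport.
const displayedRect = { left: 10, top: 20, width: 800, height: 600 };
const point = toImageCoords(410, 320, displayedRect, 1600, 1200);
console.log(point); // { x: 800, y: 600 }
```

The click lands 400px by 300px into the displayed image, and the scale factor of 2 maps that to (800, 600) in the original. Note that the bounding rect includes any CSS border and padding on the element, so a styled image can be off by a few pixels unless those are subtracted first.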
Notes
- Make sure your Base64 image is clear and has high-contrast text for better OCR accuracy.
- You can change the language code in Tesseract.recognize() (e.g., 'spa' for Spanish) if needed.
- For large images, OCR might take a few seconds, so you can add a loading indicator to improve UX.
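For the loading indicator, one option is to turn the messages passed to the logger callback into a status line. This is a minimal sketch, assuming each message carries a status string and a progress value between 0 and 1. The helper name formatProgress is hypothetical.

```javascript
// Hypothetical helper: converts a logger message into a short status line.
// A missing progress value is treated as 0.
function formatProgress(message) {
  const percent = Math.round((message.progress || 0) * 100);
  return `${message.status}: ${percent}%`;
}

console.log(formatProgress({ status: 'recognizing text', progress: 0.42 }));
// recognizing text: 42%

// In the page, this could replace the console logger, for example:
//   { logger: m => { resultElement.textContent = formatProgress(m); } }
```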
This question originates from Stack Exchange; asked by Nitesh Lad.




