How do you replay user events recorded by a Chrome extension with Puppeteer and generate HAR files?

阿华AIGC实验室

2026-5-22

Hey there! Let's walk through how to implement replay for your recorded user actions and generate HAR files for each request during the process. I'll break this down into actionable steps based on your setup.

1. Core Overview of the Workflow

First, let's map out the big picture:

Fetch all recorded events for a specific recording_id from MongoDB, sorted by their sequence number to maintain execution order
Replay each action (click, input, etc.) in a controlled browser environment
Capture all network requests during replay and generate HAR files (either per-request or a full session HAR)

2. Step 1: Retrieve & Organize Events from MongoDB

First, we need to pull the events tied to a recording and sort them correctly. Here's a quick Node.js example using the MongoDB driver:

const { MongoClient } = require('mongodb');

async function getSortedRecordingEvents(recordingId) {
  const client = await MongoClient.connect('your-mongodb-connection-string');
  const db = client.db('your-database-name');
  
  // Fetch events sorted by sequence to ensure correct execution order
  const events = await db.collection('your-events-collection')
    .find({ recording_id: recordingId })
    .sort({ sequence: 1 })
    .toArray();
  
  await client.close();
  return events;
}

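// Don't forget to fetch the starting URL tied to the recording
async function getRecordingStartUrl(recordingId) {
  const client = await MongoClient.connect('your-mongodb-connection-string');
  try {
    const db = client.db('your-database-name');
    // If _id is stored as an ObjectId, convert recordingId with new ObjectId(recordingId) first
    const recording = await db.collection('your-recordings-collection').findOne({ _id: recordingId });
    if (!recording) {
      throw new Error(`Recording not found: ${recordingId}`);
    }
    return recording.start_url;
  } finally {
    // Close the connection even if the lookup fails
    await client.close();
  }
}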
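Since replay depends entirely on execution order, it can help to sanity-check the fetched events before driving the browser. The sketch below is only an illustration: `prepareEvents` is a hypothetical helper, and it assumes every event carries a numeric `sequence` field as in the query above.

```javascript
// Hypothetical helper: returns a copy of the events sorted by `sequence`,
// and throws if a sequence number is missing or appears twice.
function prepareEvents(events) {
  const sorted = [...events].sort((a, b) => a.sequence - b.sequence);
  const seen = new Set();
  for (const event of sorted) {
    if (!Number.isFinite(event.sequence)) {
      throw new Error(`Event is missing a numeric sequence: ${JSON.stringify(event)}`);
    }
    if (seen.has(event.sequence)) {
      throw new Error(`Duplicate sequence number: ${event.sequence}`);
    }
    seen.add(event.sequence);
  }
  return sorted;
}
```

You could call it on the result of getSortedRecordingEvents so that a corrupted recording fails fast instead of replaying in the wrong order.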

3. Step 2: Replay Actions in a Controlled Browser

For reliable browser automation and network capture, I recommend Playwright or Puppeteer. Both are built for this use case, and Playwright in particular waits for elements to be ready before acting on them. Let's use Playwright for this example:

3.1 Initialize Browser & Load Starting URL

const { chromium } = require('playwright');

async function replayRecording(recordingId) {
  // Fetch events and start URL in parallel
  const [events, startUrl] = await Promise.all([
    getSortedRecordingEvents(recordingId),
    getRecordingStartUrl(recordingId)
  ]);

  // Launch browser (set headless: true for production)
  const browser = await chromium.launch({ headless: false });
  // Enable HAR recording at the context level (we'll tweak this later for per-request HARs)
  const context = await browser.newContext({
    recordHar: {
      path: `full-session-${recordingId}.har`, // Full session HAR
      omitContent: false // Set to true if you don't need request/response bodies
    }
  });
  const page = await context.newPage();

  // Load the starting URL and wait for network to settle
  await page.goto(startUrl, { waitUntil: 'networkidle' });

3.2 Execute Each Recorded Event

Loop through the sorted events and run the corresponding action:

  for (const event of events) {
    switch (event.command) {
      case 'click':
        // Wait for the target element to exist before clicking to avoid errors
        await page.waitForSelector(event.target);
        await page.click(event.target);
        break;
      case 'input':
        await page.waitForSelector(event.target);
        // Make sure your recorded input events include a `value` field with the typed text
        await page.fill(event.target, event.value);
        break;
      // Add cases for other commands (e.g., hover, submit) as needed
      default:
        console.warn(`Skipping unsupported command: ${event.command}`);
    }
    // Add a small delay to mimic real user pacing
    await page.waitForTimeout(500);
  }

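  // Cleanup: close the context first, because Playwright only writes the HAR file when the context closes
  await context.close();
  await browser.close();
}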
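As the number of supported commands grows, the switch statement can be swapped for a lookup table. This is a minimal sketch, not the only way to structure it: `handlers` and `replayEvent` are hypothetical names, the `hover` command is an assumed extra that your recorder may not emit, and `page` is expected to be a Playwright page (or anything exposing the same methods).

```javascript
// Map each recorded command to a function that performs it on the page.
// 'click' and 'input' mirror the loop above; 'hover' is an assumed extra.
const handlers = {
  click: (page, event) => page.click(event.target),
  input: (page, event) => page.fill(event.target, event.value ?? ''),
  hover: (page, event) => page.hover(event.target),
};

// Runs a single event. Resolves to true if it was replayed, false if skipped.
async function replayEvent(page, event) {
  const known = Object.prototype.hasOwnProperty.call(handlers, event.command);
  if (!known) {
    console.warn(`Skipping unsupported command: ${event.command}`);
    return false;
  }
  // Wait for the target element to exist before acting on it
  await page.waitForSelector(event.target);
  await handlers[event.command](page, event);
  return true;
}
```

Inside the loop you would then call `await replayEvent(page, event)` for each event, and adding a new command becomes a one-line change to the table.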

4. Step 3: Generate Per-Request HAR Files

If you need a separate HAR file for each individual request (instead of a full session), you can manually capture request/response details and write them to files:

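// Add this inside the replayRecording function, right after creating the page
const fs = require('fs');
let requestCounter = 0;

// HAR stores headers and query parameters as arrays of { name, value } pairs
const toPairs = (iterable) => Array.from(iterable, ([name, value]) => ({ name, value }));

page.on('response', async (response) => {
  const request = response.request();
  // Timing values are only complete once the response has finished
  await response.finished();
  // startTime is epoch milliseconds; the other fields are offsets from it (-1 if unavailable)
  const timing = request.timing();
  const wait = Math.max(0, timing.responseStart - timing.requestStart);
  const receive = Math.max(0, timing.responseEnd - timing.responseStart);
  // Redirect responses have no body, so fall back to an empty buffer
  const body = await response.body().catch(() => Buffer.alloc(0));

  // Build a HAR entry for this request
  const harEntry = {
    startedDateTime: new Date(timing.startTime).toISOString(),
    time: wait + receive,
    request: {
      method: request.method(),
      url: request.url(),
      httpVersion: '', // not captured in this sketch
      cookies: [],
      headers: toPairs(Object.entries(request.headers())),
      queryString: toPairs(new URL(request.url()).searchParams),
      headersSize: -1,
      bodySize: -1
    },
    response: {
      status: response.status(),
      statusText: response.statusText(),
      httpVersion: '', // not captured in this sketch
      cookies: [],
      headers: toPairs(Object.entries(response.headers())),
      content: {
        size: body.length,
        mimeType: response.headers()['content-type'] || ''
      },
      redirectURL: response.headers()['location'] || '',
      headersSize: -1,
      bodySize: -1
    },
    cache: {},
    timings: {
      wait,
      receive,
      send: 0,
      blocked: -1,
      dns: -1,
      connect: -1,
      ssl: -1
    }
  };

  // Wrap the entry in a valid HAR structure
  const harFileContent = {
    log: {
      version: '1.2',
      creator: { name: 'Action Replay Tool', version: '1.0' },
      entries: [harEntry]
    }
  };

  // Write to a unique file
  fs.writeFileSync(`request-${++requestCounter}-${Date.now()}.har`, JSON.stringify(harFileContent, null, 2));
});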
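An alternative worth considering is to let Playwright record the full-session HAR and split it into one file per request afterwards, which avoids rebuilding entries by hand. The sketch below assumes the session file has already been written (Playwright saves it when the context is closed); `splitHar`, `writePerRequestHars`, and the output file naming are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Turns one HAR object into an array of HAR objects holding one entry each.
// Everything else under `log` (version, creator, pages, ...) is copied as-is.
function splitHar(har) {
  const { entries = [], ...logMeta } = har.log;
  return entries.map((entry) => ({ log: { ...logMeta, entries: [entry] } }));
}

// Reads a session HAR from disk and writes request-1.har, request-2.har, ...
// into outDir. Returns the list of files that were written.
function writePerRequestHars(sessionHarPath, outDir) {
  const har = JSON.parse(fs.readFileSync(sessionHarPath, 'utf8'));
  fs.mkdirSync(outDir, { recursive: true });
  return splitHar(har).map((single, index) => {
    const file = path.join(outDir, `request-${index + 1}.har`);
    fs.writeFileSync(file, JSON.stringify(single, null, 2));
    return file;
  });
}
```

For example, after the replay finishes you might call writePerRequestHars(`full-session-${recordingId}.har`, 'per-request'). The trade-off is that files only appear once the session ends, not while requests are in flight.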

5. Key Notes for Reliability

Element Stability: CSS selectors like button.btn-sm can break if the page structure changes. Consider adding data-testid attributes to key elements during recording for more reliable targeting.
Wait Strategies: Always use waitForSelector or waitForLoadState('networkidle') before actions (in Puppeteer, the equivalent is page.waitForNetworkIdle()). Never assume elements are immediately available.
Input Event Data: Double-check that your recorded input events store the value of what was typed (your example cut off, but this is critical for replay).
HAR Content: If you don’t need request/response bodies, set omitContent: true to reduce file size and improve performance.
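To make the element-stability note concrete, here is a small sketch of how replay could prefer a data-testid over a raw CSS selector. It assumes the recorder stores the attribute in a `testId` field on each event, which is a hypothetical addition to your schema; `resolveSelector` is likewise an illustrative name.

```javascript
// Prefer a stable data-testid selector when the recording captured one
// (the `testId` field is an assumed addition); otherwise use the CSS target.
function resolveSelector(event) {
  if (event.testId) {
    // JSON.stringify quotes the value and escapes embedded quotes and backslashes
    return `[data-testid=${JSON.stringify(event.testId)}]`;
  }
  if (!event.target) {
    throw new Error('Event has neither a testId nor a target selector');
  }
  return event.target;
}
```

In the replay loop you would pass resolveSelector(event) wherever event.target is used today.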

The question this answer responds to comes from Stack Exchange; it was asked by Atul Singh.