How to match Markdown headings with a regex while excluding comments in code blocks (TypeScript)

阿华AIGC实验室

2026-5-13

Nice catch! That regex issue is a classic gotcha when parsing Markdown: a line inside a code block (fenced or indented) looks exactly like a heading to a simple line-based pattern, so a shell comment that starts with # gets reported as a title. Here are two ways to fix it.
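To make the failure concrete, here is a minimal reproduction. The function name naiveHasTitle and the sample document are made up for illustration; the pattern is the kind of simple line-based check described above, not necessarily the exact one from the question.

```typescript
// Hypothetical naive check: any line that starts with optional whitespace and '#'.
function naiveHasTitle(markdown: string): boolean {
  return /^\s*#/m.test(markdown);
}

// Build the fence marker programmatically so the sample is easy to read.
const fence = '`'.repeat(3);

// A document with no heading at all, only a shell comment inside a fenced block.
const sample = [
  'Install the tool:',
  '',
  fence + 'bash',
  '# this is a shell comment, not a heading',
  'npm install',
  fence,
].join('\n');

console.log(naiveHasTitle(sample)); // prints true: a false positive
```

The check cannot tell the comment from a heading because it looks at each line in isolation, with no knowledge of the surrounding fence.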

Option 1: Strip code blocks first, then test with a regex

The core idea here is to strip out all code block content first, then run your original title-check regex on the cleaned content. This avoids any accidental matches inside code blocks.

Here's how you can implement this in TypeScript:

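function hasMarkdownTitle(markdownContent: string): boolean {
  // Remove fenced code blocks (content wrapped in ```...``` or ~~~...~~~)
  const withoutFencedCode = markdownContent.replace(/```[\s\S]*?```|~~~[\s\S]*?~~~/g, '');
  // Remove indented code blocks (whole lines that start with 4 spaces or a tab)
  const withoutIndentedCode = withoutFencedCode.replace(/^( {4}|\t).*$/gm, '');
  // Detect an ATX heading: up to 3 leading spaces, 1 to 6 '#', then a space, a tab, or end of line.
  // (A bare /^\s*#/ would also match "#hashtag" and shebang lines, and \s* can span newlines.)
  return /^ {0,3}#{1,6}(?:[ \t]|$)/m.test(withoutIndentedCode);
}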

Pros & Cons

Pros: No external dependencies, quick to implement for most common Markdown cases.
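Cons: It only approximates the Markdown grammar. It misreads an unclosed fence (which runs to the end of the document), a longer fence such as four backticks wrapping a three-backtick block, and indented lines that are list continuations rather than code. For simple, well-formed input this works well, but it's not bulletproof.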
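If you want to stay dependency-free but handle fences more carefully, one alternative is to scan line by line and track whether a fence is currently open. The sketch below is an assumption-laden illustration, not part of the original answer: the name hasHeadingOutsideFences and the Fence type are hypothetical, it only detects ATX headings (lines starting with #), it skips indented lines by requiring at most 3 leading spaces, and it ignores block quotes, list nesting, and setext (underlined) headings.

```typescript
interface Fence {
  char: string;   // '`' or '~'
  length: number; // number of fence characters that opened the block
}

// Hypothetical helper: returns true if an ATX heading appears outside fenced code blocks.
function hasHeadingOutsideFences(markdown: string): boolean {
  let openFence: Fence | null = null;

  for (const line of markdown.split(/\r?\n/)) {
    // A fence line: up to 3 spaces, then 3+ backticks or tildes, then optional text.
    const fenceMatch = /^ {0,3}(`{3,}|~{3,})(.*)$/.exec(line);

    if (openFence === null) {
      if (fenceMatch) {
        openFence = { char: fenceMatch[1].charAt(0), length: fenceMatch[1].length };
        continue;
      }
      // ATX heading: up to 3 spaces, 1 to 6 '#', then a space, a tab, or end of line.
      if (/^ {0,3}#{1,6}(?:[ \t]|$)/.test(line)) {
        return true;
      }
    } else if (
      fenceMatch &&
      fenceMatch[1].charAt(0) === openFence.char &&
      fenceMatch[1].length >= openFence.length &&
      fenceMatch[2].trim() === ''
    ) {
      // Closing fence: same character, at least as long as the opener, nothing after it.
      openFence = null;
    }
  }
  return false;
}

// Usage: a four-backtick fence wrapping a three-backtick block with a shell comment.
const fence3 = '`'.repeat(3);
const fence4 = '`'.repeat(4);
const nested = [fence4 + 'markdown', fence3 + 'bash', '# shell comment', fence3, fence4].join('\n');

console.log(hasHeadingOutsideFences(nested));                       // false
console.log(hasHeadingOutsideFences(nested + '\n## Real heading')); // true
console.log(hasHeadingOutsideFences(fence3 + '\n# never closed'));  // false
```

Tracking the opening fence's character and length is what lets the inner three-backtick lines pass through as plain content, and leaving the state open at end of input treats an unclosed fence as code until the document ends.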

Option 2: Use a dedicated Markdown parsing library (recommended)

If you want a solution that follows the Markdown grammar instead of approximating it (code blocks inside block quotes, nested list content, and so on), a dedicated parser is the way to go. Libraries like remark or marked parse Markdown into a structured representation (remark produces an Abstract Syntax Tree, marked a list of tokens), letting you inspect the structure without regex guesswork.

Here's an example with remark:

First, install the dependencies:

npm install remark unist-util-visit

Then the TypeScript code:

import { remark } from 'remark';
import { visit, EXIT } from 'unist-util-visit';

function hasMarkdownTitle(markdownContent: string): boolean {
  // remark() already bundles remark-parse, and parse() is synchronous,
  // so no extra plugin, async, or await is needed to get the AST
  const ast = remark().parse(markdownContent);

  let hasTitle = false;
  // Traverse the AST to find any heading node (depth 1 to 6)
  visit(ast, 'heading', () => {
    hasTitle = true;
    return EXIT; // Stop traversing as soon as we find the first heading
  });

  return hasTitle;
}

Pros & Cons

Pros: As accurate as the parser itself (remark follows CommonMark), so code blocks, block quotes, and nested content are handled for you, and it is easier to maintain if you need to add more Markdown-related logic later.
Cons: Adds a few dependencies to your project, which is usually an acceptable tradeoff for production code.

Final Note

Trying to handle all Markdown edge cases with a single regex is almost impossible: Markdown is context-dependent, so whether a line starting with # is a heading depends on the lines around it, and a regex has no notion of that state. For quick scripts, the preprocessing method works, but for any serious application, using a parser library is the smarter choice.

The question comes from Stack Exchange; it was asked by enisz.