How do I block specific URLs with robots.txt? Stopping Google from crawling and configuring the Disallow directive
Hey there! If you want to stop Google (and other search engines) from crawling specific parts of your site, robots.txt is the right tool for the job. Let's walk through exactly how to use the Disallow directive to block those URLs, with concrete examples for common scenarios.
Your robots.txt file lives in the root directory of your website (e.g., https://yourdomain.com/robots.txt). It uses directives to communicate with web crawlers, and Disallow is the key one for blocking access.
You start by specifying which crawler the rule applies to with User-agent:
- Use `User-agent: Googlebot` to target only Google's crawler.
- Use `User-agent: *` to target all crawlers.
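To see how the User-agent line decides which rules a crawler reads, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The file contents, the yourdomain.com URLs, and the `SomeOtherBot` name are illustrative assumptions, not anything taken from a real site. One thing it makes visible: a crawler that has its own group (Googlebot here) follows only that group and skips the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one Googlebot-only group and one catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private-dashboard.html

User-agent: *
Disallow: /blog/unpublished-drafts/
"""

BASE = "https://yourdomain.com"  # placeholder domain

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, BASE + path)


# Googlebot reads only its own group.
print(allowed("Googlebot", "/private-dashboard.html"))
print(allowed("Googlebot", "/blog/unpublished-drafts/post-1"))

# Any other crawler falls back to the "*" group.
print(allowed("SomeOtherBot", "/private-dashboard.html"))
print(allowed("SomeOtherBot", "/blog/unpublished-drafts/post-1"))
```

If a rule should apply to Googlebot as well as everyone else, it has to be repeated inside the Googlebot group.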
Let's cover the most frequent use cases with clear code examples:
Block a single specific URL
If you want to block a single page like https://yourdomain.com/private-dashboard.html, add this to your robots.txt:
    User-agent: Googlebot
    Disallow: /private-dashboard.html
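The path after `Disallow` is relative to your site root. Rules match by prefix, so this one also covers the same URL with a query string appended (e.g., /private-dashboard.html?tab=2).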
Block an entire directory
To block every page inside a directory (e.g., /blog/unpublished-drafts/ and all its subpages), use:
    User-agent: *
    Disallow: /blog/unpublished-drafts/
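The trailing slash limits the rule to the directory and everything under it. Without it, the rule would also match any other path that begins with the same characters, such as /blog/unpublished-drafts-2024.html.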
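The effect of the trailing slash is easy to check locally. The sketch below, again using Python's `urllib.robotparser`, compares the same Disallow path with and without the slash. The helper name `blocked_paths`, the `ExampleBot` agent, and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

BASE = "https://yourdomain.com"  # placeholder domain

# Hypothetical paths to test against each rule.
CANDIDATES = [
    "/blog/unpublished-drafts/",
    "/blog/unpublished-drafts/post-1.html",
    "/blog/unpublished-drafts-2024.html",
    "/blog/published/post.html",
]


def blocked_paths(disallow_path, candidates=CANDIDATES):
    """Return the candidate paths that a single Disallow rule blocks."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return [p for p in candidates if not parser.can_fetch("ExampleBot", BASE + p)]


with_slash = blocked_paths("/blog/unpublished-drafts/")
without_slash = blocked_paths("/blog/unpublished-drafts")

print("with slash:   ", with_slash)
print("without slash:", without_slash)
```

With the slash, only the directory and its contents are blocked; without it, the similarly named /blog/unpublished-drafts-2024.html is caught too.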
Block URLs matching a pattern (with wildcards)
Google supports wildcards for more flexible blocking. Here are useful patterns:
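- Block all files with a specific extension (e.g., PDFs):

      User-agent: Googlebot
      Disallow: /*.pdf$

  The `$` matches the end of the URL, so only URLs ending in `.pdf` are blocked.
- Block all URLs containing a specific string (e.g., `/old-archive/` anywhere in the path):

      User-agent: *
      Disallow: /*old-archive*

  The `*` acts as a wildcard for any characters before or after the target string. Because rules already match by prefix, the trailing `*` is optional.
- Block all URLs with a given query parameter (e.g., any URL whose query string starts with `?tracking=`):

      User-agent: Googlebot
      Disallow: /*?tracking=

  This only catches `tracking` when it is the first parameter. To catch it later in the query string as well, add a second rule: `Disallow: /*&tracking=`.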
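Python's built-in `urllib.robotparser` does plain prefix matching and does not interpret `*` or `$`, so to try out wildcard patterns locally you need your own matcher. The following is a rough sketch of one way to do that by translating a rule into a regular expression. The function names `rule_to_regex` and `is_blocked` are hypothetical, and the sketch deliberately ignores `Allow` rules and rule precedence, so treat it as an approximation of how Google matches a single pattern rather than a full implementation.

```python
import re


def rule_to_regex(rule_path):
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any run of characters. A trailing '$' anchors the match
    at the end of the URL; without it the rule matches as a prefix.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    if anchored:
        pattern += r"\Z"
    return re.compile(pattern)


def is_blocked(rule_path, url_path):
    """Return True if url_path (path plus query string) matches the rule."""
    # re.match anchors at the start of the string, mirroring prefix matching.
    return rule_to_regex(rule_path).match(url_path) is not None


print(is_blocked("/*.pdf$", "/docs/report.pdf"))
print(is_blocked("/*.pdf$", "/docs/report.pdf?download=1"))
print(is_blocked("/*old-archive*", "/2019/old-archive/index.html"))
print(is_blocked("/*?tracking=", "/products?tracking=abc"))
print(is_blocked("/*?tracking=", "/products?page=2&tracking=abc"))
```

The last call returns False, since the literal `?tracking=` never appears when `tracking` is not the first parameter.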
A few things to keep in mind:
- `robots.txt` is advisory: legitimate crawlers like Google will follow it, but malicious bots might ignore it. For truly private content, use password protection or server-side restrictions.
- If content you want removed is already indexed, `Disallow` alone won't delete it from search results. Add a `noindex` meta tag to those pages or request removal through Google Search Console. Google has to crawl a page to see `noindex`, so don't block that page in robots.txt until it has dropped out of the index.
- Always test your robots.txt rules: use Google Search Console to verify that your blocked URLs are correctly restricted.
This question comes from Stack Exchange; it was asked by Mohan Prajapati.




