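How can VB.NET WebClient imitate a real browser?

Making VB.NET WebClient (or HttpClient) look more like a real browser

I ran into exactly the same problem. Many modern sites fingerprint requests in detail to spot non-browser clients, so adding a User-Agent alone is not enough. Below are a few approaches that worked for me and bring your requests closer to what a real user sends: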

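1. Send a complete set of request headers, as a browser would

A real browser sends many headers, and you need to fill them in: Accept, Accept-Language, Accept-Encoding, Referer, Connection, and so on. Here is a WebClient subclass that sets them and also keeps cookies: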

Imports System.Net

Public Class BrowserLikeWebClient
    Inherits WebClient

    ' One container per client instance, so cookies persist across requests
    Private ReadOnly _cookieContainer As New CookieContainer()

    Shared Sub New()
        ' Require a modern TLS version. This setting is process-wide, so set it once.
        ' SecurityProtocolType.Tls13 needs .NET Framework 4.8 or later; drop it on older targets.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 Or SecurityProtocolType.Tls13
    End Sub

    Protected Overrides Function GetWebRequest(address As Uri) As WebRequest
        Dim request = MyBase.GetWebRequest(address)
        Dim httpRequest = TryCast(request, HttpWebRequest)
        If httpRequest IsNot Nothing Then
            ' Chrome-style headers (replace with the real headers of the browser you use)
            httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
            httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
            ' HttpWebRequest has no AcceptLanguage property; set the header directly
            httpRequest.Headers(HttpRequestHeader.AcceptLanguage) = "zh-CN,zh;q=0.9,en;q=0.8"
            ' AutomaticDecompression sends "Accept-Encoding: gzip, deflate" and decodes the response.
            ' "br" is not advertised because this request cannot decode Brotli.
            httpRequest.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
            httpRequest.Referer = address.GetLeftPart(UriPartial.Authority) ' Referer set to the site root
            ' The Connection property rejects "keep-alive"; use KeepAlive instead
            httpRequest.KeepAlive = True
            httpRequest.AllowAutoRedirect = True
            httpRequest.CookieContainer = _cookieContainer
        End If
        Return request
    End Function
End Class

Usage is simple: just instantiate the subclass.

Using client As New BrowserLikeWebClient()
    Try
        Dim content As String = client.DownloadString("https://target-site.com/your-target-url")
        ' Process the downloaded content here
    Catch ex As WebException
        ' Always inspect the response on failure; many sites state why the request was blocked
        Dim response = TryCast(ex.Response, HttpWebResponse)
        If response IsNot Nothing Then
            Console.WriteLine($"Status code: {response.StatusCode}, description: {response.StatusDescription}")
            Using reader As New System.IO.StreamReader(response.GetResponseStream())
                Console.WriteLine("Error response body: " & reader.ReadToEnd())
            End Using
        End If
    End Try
End Using

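2. Switch to HttpClient (recommended, more flexible)

WebClient is an older API. HttpClient is more flexible for imitating browser behavior: default request headers, sessions, and redirects are all easy to configure. Here is an example: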

Imports System.Net
Imports System.Net.Http
Imports System.Threading.Tasks

Module HttpClientBrowserSimulator
    Async Function GetBrowserCompliantContentAsync(url As String) As Task(Of String)
        ' url is a String, so parse it before calling GetLeftPart
        Dim target As New Uri(url)
        Dim siteRoot As New Uri(target.GetLeftPart(UriPartial.Authority))

        ' Handler takes care of cookies, redirects, and gzip/deflate decoding.
        ' It also sends the matching Accept-Encoding header, so "br" is not advertised.
        Dim handler As New HttpClientHandler() With {
            .CookieContainer = New CookieContainer(),
            .AllowAutoRedirect = True,
            .AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        }

        ' Require TLS 1.2/1.3. This applies on .NET Framework (Tls13 needs 4.8 or later);
        ' on .NET Core and later, set handler.SslProtocols instead.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 Or SecurityProtocolType.Tls13

        Using client As New HttpClient(handler)
            ' Default request headers
            client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36")
            client.DefaultRequestHeaders.Accept.ParseAdd("text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9")
            client.DefaultRequestHeaders.AcceptLanguage.ParseAdd("zh-CN,zh;q=0.9,en;q=0.8")
            client.DefaultRequestHeaders.Referrer = siteRoot
            client.DefaultRequestHeaders.Connection.Add("keep-alive")

            ' Act like a real user: visit the home page first to pick up the initial session cookies
            Using homeResponse = Await client.GetAsync(siteRoot)
            End Using

            ' Then request the target URL
            Using response = Await client.GetAsync(target)
                response.EnsureSuccessStatusCode() ' Throws on a non-success status code
                Return Await response.Content.ReadAsStringAsync()
            End Using
        End Using
    End Function
End Module

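3. Follow the path a real user would take

Many sites validate the request context. For example, if you request an inner page directly without first visiting the home page to get a session cookie, the request is blocked. So request the home page first, pick up the initial cookies, and only then request the target URL, following the same path a user would browse (the HttpClient example above already includes this step).

4. Read the body of the error response

When you get a 500 error, don't look only at the status code. Many sites explain in the error response why the request was blocked (a missing header, an invalid session, and so on). The WebClient example above already includes code for reading the error response; use what it returns to adjust your request configuration.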
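The HttpClient example calls EnsureSuccessStatusCode, which throws without exposing the response body. A minimal sketch of reading that body with HttpClient follows; the module name ErrorBodyInspector and the function GetOrExplainAsync are hypothetical names chosen for illustration, and the caller is assumed to pass in an already configured HttpClient.

```vbnet
Imports System.Net.Http
Imports System.Threading.Tasks

Module ErrorBodyInspector
    ' Hypothetical helper (not part of the original answer): returns the body on
    ' success, and on failure throws with the status and the error body included.
    Async Function GetOrExplainAsync(client As HttpClient, url As String) As Task(Of String)
        Using response = Await client.GetAsync(url)
            ' Read the body before checking the status so it is available either way
            Dim body = Await response.Content.ReadAsStringAsync()
            If response.IsSuccessStatusCode Then
                Return body
            End If
            ' Surface the status line together with the server's own explanation
            Throw New HttpRequestException(
                $"Status {CInt(response.StatusCode)} ({response.ReasonPhrase}): {body}")
        End Using
    End Function
End Module
```

Calling it in place of the GetAsync / EnsureSuccessStatusCode pair keeps the same throw-on-failure behavior while putting the server's message in the exception text.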


The question comes from Stack Exchange; it was asked by Ed Jones.
