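How do I decode a percent-encoded URL containing Arabic characters? Help with multilingual URL decoding in VB6

How do I decode a percent-encoded URL containing Arabic characters in VB6?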

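This is a common problem. The original function handles only English text because it converts each %xx escape into a character on its own with Chr$, which works in the system ANSI code page. The Arabic characters in the URL, however, were encoded as UTF-8 first and only then percent-escaped. In UTF-8 an Arabic character is a multi-byte sequence (two %xx escapes for letters in the basic Arabic block, three for some presentation forms), so decoding each byte separately can only produce garbage.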
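To make the failure mode concrete, here is a small illustration written in Python rather than VB6, purely so it can be run anywhere. It is a sketch, not part of the VB6 fix: Windows-1252 is used as a stand-in for "the system ANSI code page", which is an assumption, since the actual code page depends on the machine.

```python
# Two percent-escapes, %D8%AF, that together encode ONE character:
# U+062F (ARABIC LETTER DAL).
raw = bytes([0xD8, 0xAF])

# Decoding the pair as a single UTF-8 sequence yields the intended letter.
as_utf8 = raw.decode("utf-8")

# Decoding each byte on its own in a single-byte code page (Windows-1252
# here, an assumed stand-in for the ANSI code page) yields two unrelated
# characters instead of one Arabic letter.
as_ansi = "".join(bytes([b]).decode("cp1252") for b in raw)

print(len(as_utf8), hex(ord(as_utf8)))
print(len(as_ansi), [hex(ord(c)) for c in as_ansi])
```

The first result is one character, the second is two, which is exactly what a per-escape Chr$ call does to the Arabic text.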

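The fix below works in two stages: first turn the percent-encoded text back into an array of UTF-8 bytes, then call a Windows API to convert those bytes into the Unicode (UTF-16) string that VB6 uses internally: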

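Step 1: Declare the required Windows API

First, declare the MultiByteToWideChar function in the General Declarations section of the form or in a standard module. It converts a multi-byte encoding (UTF-8 here) into VB6's internal Unicode string: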

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, _
    ByVal dwFlags As Long, _
    ByRef lpMultiByteStr As Byte, _
    ByVal cchMultiByte As Long, _
    ByVal lpWideCharStr As Long, _
    ByVal cchWideChar As Long _
) As Long

Step 2: Make the URL decoding function UTF-8 aware

Replace the original URLDecode function with this UTF-8-capable version:

Private Function URLDecodeUTF8(ByVal txt As String) As String
    ' Long rather than Integer, so inputs longer than 32,767 characters do not overflow
    Dim txt_len As Long
    Dim i As Long
    Dim ch As String
    Dim digits As String
    Dim utf8Bytes() As Byte
    Dim byteCount As Long
    
    byteCount = 0
    ReDim utf8Bytes(0 To Len(txt)) ' Decoding never yields more bytes than the input has characters
    
    txt_len = Len(txt)
    i = 1
    Do While i <= txt_len
        ch = Mid$(txt, i, 1)
        If ch = "+" Then
            ' "+" stands for a space in a URL; the UTF-8 byte for a space is &H20
            utf8Bytes(byteCount) = &H20
            byteCount = byteCount + 1
        ElseIf ch <> "%" Then
            ' Plain ASCII character: store its byte directly
            utf8Bytes(byteCount) = Asc(ch)
            byteCount = byteCount + 1
        ElseIf Not (Mid$(txt, i + 1, 2) Like "[0-9A-Fa-f][0-9A-Fa-f]") Then
            ' "%" not followed by two hex digits (including at the end of the string):
            ' keep the "%" itself instead of raising a type mismatch in CInt
            utf8Bytes(byteCount) = Asc(ch)
            byteCount = byteCount + 1
        Else
            ' Parse the two hex digits as one UTF-8 byte
            digits = Mid$(txt, i + 1, 2)
            utf8Bytes(byteCount) = CInt("&H" & digits)
            byteCount = byteCount + 1
            i = i + 2 ' Skip the two hex digits just consumed
        End If
        i = i + 1
    Loop
    
    ' Trim the array to the number of bytes actually written
    If byteCount > 0 Then
        ReDim Preserve utf8Bytes(0 To byteCount - 1)
        ' First call (65001 = CP_UTF8): ask how many UTF-16 characters the result needs
        Dim unicodeLen As Long
        unicodeLen = MultiByteToWideChar(65001, 0, utf8Bytes(0), byteCount, 0, 0)
        If unicodeLen > 0 Then
            ' Second call: allocate a string of that length and convert into it
            Dim result As String
            result = Space$(unicodeLen)
            MultiByteToWideChar 65001, 0, utf8Bytes(0), byteCount, StrPtr(result), unicodeLen
            URLDecodeUTF8 = result
        Else
            ' Conversion failed: fall back to the original ANSI decoding
            URLDecodeUTF8 = URLDecodeANSI(txt)
        End If
    Else
        URLDecodeUTF8 = ""
    End If
End Function

' The original ANSI decoder, kept as a fallback
Private Function URLDecodeANSI(ByVal txt As String) As String
    ' Long rather than Integer, so inputs longer than 32,767 characters do not overflow
    Dim txt_len As Long
    Dim i As Long
    Dim ch As String
    Dim digits As String
    Dim result As String
    result = ""
    txt_len = Len(txt)
    i = 1
    Do While i <= txt_len
        ch = Mid$(txt, i, 1)
        If ch = "+" Then
            result = result & " "
        ElseIf ch <> "%" Then
            result = result & ch
        ElseIf Not (Mid$(txt, i + 1, 2) Like "[0-9A-Fa-f][0-9A-Fa-f]") Then
            ' "%" not followed by two hex digits: keep it as a literal character
            result = result & ch
        Else
            digits = Mid$(txt, i + 1, 2)
            result = result & Chr$(CInt("&H" & digits))
            i = i + 2
        End If
        i = i + 1
    Loop
    URLDecodeANSI = result
End Function

Step 3: Test code

Verify it with your test case:

Private Sub Form_Load()
    Dim THE_ARABIC_URL As String ' Declared explicitly so the code also compiles under Option Explicit
    THE_ARABIC_URL = "%D8%AF%D8%B4%D9%85%D9%86%DB%8C+%D8%AF%D8%B1+%D8%A7%D8%B9%D9%85%D8%A7%D9%82-2019-12-09+01%3A09%3A00"
    MsgBox URLDecodeUTF8(THE_ARABIC_URL)
End Sub

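Run it and you should see the correctly decoded Arabic text. Note that VB6's MsgBox and intrinsic controls are ANSI-based, so the text displays correctly only if the system's non-Unicode code page supports Arabic; otherwise the message box may show question marks even though the string itself was decoded correctly.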
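If you want to confirm what the decoded text should be without relying on the VB6 message box, the same two-stage idea can be cross-checked outside VB6. The following Python sketch is only an illustration: url_decode_utf8 is a hypothetical helper that mirrors the logic of URLDecodeUTF8, and its result is compared against the standard library's urllib.parse.unquote_plus.

```python
from urllib.parse import unquote_plus


def url_decode_utf8(text: str) -> str:
    """Stage 1: percent-decode into raw bytes. Stage 2: decode the bytes as UTF-8."""
    hex_digits = "0123456789abcdefABCDEF"
    raw = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i + 1:i + 3]
        if ch == "+":
            raw.append(0x20)  # "+" means space in form-style URL encoding
            i += 1
        elif ch == "%" and len(pair) == 2 and all(c in hex_digits for c in pair):
            raw.append(int(pair, 16))  # one %xx escape becomes one byte
            i += 3
        else:
            raw.extend(ch.encode("utf-8"))  # literal character, or a stray "%"
            i += 1
    # Invalid UTF-8 is replaced with U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


url = ("%D8%AF%D8%B4%D9%85%D9%86%DB%8C+%D8%AF%D8%B1+"
       "%D8%A7%D8%B9%D9%85%D8%A7%D9%82-2019-12-09+01%3A09%3A00")
decoded = url_decode_utf8(url)
reference = unquote_plus(url)
print(decoded)
print(decoded == reference)
```

For the test URL both approaches give دشمنی در اعماق-2019-12-09 01:09:00, which is the string URLDecodeUTF8 should return in VB6 as well.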

The question originally comes from Stack Exchange; it was asked by Mahdi Jazini.
