You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

SSIS脚本任务:使用C#转换VARCHAR为NVARCHAR的问题求助

Solution for Unicode TSV Import into SQL Server NVARCHAR Columns

Let's break down the issues with your current approach and fix them step by step:

Core Problems

  1. Missing Unicode Prefix in SQL Queries: When inserting into NVARCHAR columns, SQL Server requires string values to be prefixed with N'...' to recognize them as Unicode. Without this, even valid Unicode strings from C# get converted to VARCHAR during insertion, causing non-ASCII characters (like Chinese) to turn into ?????.

  2. Incorrect Encoding Handling: Your manual import uses code page 65001 (UTF-8), which means your TSV files are UTF-8 encoded. However, your code uses Encoding.Unicode (which maps to UTF-16 in .NET) to read/convert the data—this mismatch garbles the text before it even reaches the database.

  3. Unsafe String Concatenation: Building SQL queries by concatenating strings is error-prone (even with quote replacement) and exposes you to SQL injection risks. Parameterized queries are the safer, more reliable alternative.

Fixed Implementation

Here's how to adjust your script task to handle Unicode correctly:

Step 1: Read the TSV File with the Correct Encoding

Use StreamReader with Encoding.UTF8 to read your files (confirmed by your manual import settings):

string filePath = "path/to/your/template.tsv"; // Replace with your actual file path
using (StreamReader reader = new StreamReader(filePath, Encoding.UTF8))
{
    string line;
    // Skip header line if your TSV has one
    // reader.ReadLine();
    
    while ((line = reader.ReadLine()) != null)
    {
        ProcessLine(line, SchemaName, TableName, ColumnList, myADONETConnection);
    }
}

Step 2: Split TSV Line into Columns

Split the tab-separated line into individual values (note: this assumes standard TSV without quoted fields containing tabs; adjust if your files have complex formatting):

string[] columnValues = line.Split('\t');

Step 3: Use Parameterized Queries for Insertion

Parameterized queries automatically handle Unicode and avoid quote issues. Here's the ProcessLine method:

private void ProcessLine(string line, string schemaName, string tableName, string columnList, SqlConnection connection)
{
    string[] columnNames = columnList.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
    string[] columnValues = line.Split('\t');

    // Build parameterized query with placeholders
    string paramPlaceholders = string.Join(", ", columnNames.Select(col => $"@{col}"));
    string query = $"INSERT INTO {schemaName}.[{tableName}] ({columnList}) VALUES ({paramPlaceholders})";

    using (SqlCommand cmd = new SqlCommand(query, connection))
    {
        // Add parameters with Unicode type to match NVARCHAR(MAX)
        for (int i = 0; i < columnNames.Length; i++)
        {
            // -1 corresponds to MAX length for NVARCHAR
            cmd.Parameters.Add($"@{columnNames[i]}", SqlDbType.NVarChar, -1).Value = columnValues[i];
        }

        cmd.ExecuteNonQuery();
    }
}

Key Improvements Explained

  • Correct Encoding: Reading with Encoding.UTF8 ensures we parse the TSV file's actual Unicode content correctly, eliminating the need for manual encoding conversion.
  • Parameterized Queries: By using SqlDbType.NVarChar, we tell SQL Server to treat the values as Unicode—this removes the need for the N' prefix entirely, as the parameter handles Unicode transmission automatically.
  • No Quote Replacement: Parameters safely handle special characters like single quotes, so you don't need line.Replace("'", "''") anymore.

Verify the Encoding

If you're unsure about your file's encoding, use a tool like Notepad++ (check the "Encoding" menu). If your files are actually UTF-16 (labeled as "Unicode" in Notepad++), replace Encoding.UTF8 with Encoding.Unicode in the StreamReader.

内容的提问来源于stack exchange,提问作者Craig

火山引擎 最新活动