SSIS脚本任务:使用C#转换VARCHAR为NVARCHAR的问题求助
Let's break down the issues with your current approach and fix them step by step:
Core Problems
Missing Unicode Prefix in SQL Queries: When inserting into
NVARCHARcolumns, SQL Server requires string values to be prefixed withN'...'to recognize them as Unicode. Without this, even valid Unicode strings from C# get converted toVARCHARduring insertion, causing non-ASCII characters (like Chinese) to turn into?????.Incorrect Encoding Handling: Your manual import uses code page 65001 (UTF-8), which means your TSV files are UTF-8 encoded. However, your code uses
Encoding.Unicode(which maps to UTF-16 in .NET) to read/convert the data—this mismatch garbles the text before it even reaches the database.Unsafe String Concatenation: Building SQL queries by concatenating strings is error-prone (even with quote replacement) and exposes you to SQL injection risks. Parameterized queries are the safer, more reliable alternative.
Fixed Implementation
Here's how to adjust your script task to handle Unicode correctly:
Step 1: Read the TSV File with the Correct Encoding
Use StreamReader with Encoding.UTF8 to read your files (confirmed by your manual import settings):
string filePath = "path/to/your/template.tsv"; // Replace with your actual file path using (StreamReader reader = new StreamReader(filePath, Encoding.UTF8)) { string line; // Skip header line if your TSV has one // reader.ReadLine(); while ((line = reader.ReadLine()) != null) { ProcessLine(line, SchemaName, TableName, ColumnList, myADONETConnection); } }
Step 2: Split TSV Line into Columns
Split the tab-separated line into individual values (note: this assumes standard TSV without quoted fields containing tabs; adjust if your files have complex formatting):
string[] columnValues = line.Split('\t');
Step 3: Use Parameterized Queries for Insertion
Parameterized queries automatically handle Unicode and avoid quote issues. Here's the ProcessLine method:
private void ProcessLine(string line, string schemaName, string tableName, string columnList, SqlConnection connection) { string[] columnNames = columnList.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries); string[] columnValues = line.Split('\t'); // Build parameterized query with placeholders string paramPlaceholders = string.Join(", ", columnNames.Select(col => $"@{col}")); string query = $"INSERT INTO {schemaName}.[{tableName}] ({columnList}) VALUES ({paramPlaceholders})"; using (SqlCommand cmd = new SqlCommand(query, connection)) { // Add parameters with Unicode type to match NVARCHAR(MAX) for (int i = 0; i < columnNames.Length; i++) { // -1 corresponds to MAX length for NVARCHAR cmd.Parameters.Add($"@{columnNames[i]}", SqlDbType.NVarChar, -1).Value = columnValues[i]; } cmd.ExecuteNonQuery(); } }
Key Improvements Explained
- Correct Encoding: Reading with
Encoding.UTF8ensures we parse the TSV file's actual Unicode content correctly, eliminating the need for manual encoding conversion. - Parameterized Queries: By using
SqlDbType.NVarChar, we tell SQL Server to treat the values as Unicode—this removes the need for theN'prefix entirely, as the parameter handles Unicode transmission automatically. - No Quote Replacement: Parameters safely handle special characters like single quotes, so you don't need
line.Replace("'", "''")anymore.
Verify the Encoding
If you're unsure about your file's encoding, use a tool like Notepad++ (check the "Encoding" menu). If your files are actually UTF-16 (labeled as "Unicode" in Notepad++), replace Encoding.UTF8 with Encoding.Unicode in the StreamReader.
内容的提问来源于stack exchange,提问作者Craig




