如何解决含越南语Unicode的平面文件批量导入SQL时的乱码问题
Hey there, let's figure out why your Vietnamese text PM vuông is turning into PM vuĂ´ng during the import—this is almost always an encoding mismatch issue between your flat file and how SQL is parsing it. Here's a step-by-step breakdown of how to fix it:
1. First, confirm your flat file's actual encoding
The root cause here is that SQL is interpreting your file's bytes using the wrong character set. To check what encoding your file uses:
- Open it in Notepad++ and look at the bottom-right corner—it'll show something like
UTF-8,Windows-1258(the standard Vietnamese non-Unicode encoding), orISO-8859-1. - For Vietnamese text,
Windows-1258orUTF-8are the most common correct encodings.
2. Update your XML format file to specify the correct code page
Your XML format file needs to explicitly tell SQL what encoding the flat file uses. Add or adjust the CODEPAGE attribute in the <FIELD> element to match your file's encoding:
- For Windows-1258:
CODEPAGE="1258" - For UTF-8:
CODEPAGE="65001"
Here's an example of a corrected XML format file:
<?xml version="1.0"?> <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="4000" CODEPAGE="1258"/> </RECORD> <ROW> <COLUMN SOURCE="1" NAME="YourTargetColumn" xsi:type="SQLNVARCHAR"/> </ROW> </BCPFORMAT>
Make sure the <COLUMN> uses SQLNVARCHAR to match your target table's nvarchar(4000) type—this ensures SQL stores the text as Unicode, which supports Vietnamese characters perfectly.
3. Specify the code page in your bulk import command
If you're using BULK INSERT or OPENROWSET, you need to mirror the code page setting there too. For example:
BULK INSERT YourTargetTable FROM 'C:\Path\To\Your\FlatFile.txt' WITH ( FORMATFILE = 'C:\Path\To\Your\FormatFile.xml', CODEPAGE = '1258', -- Use '65001' if your file is UTF-8 DATAFILETYPE = 'char' -- Use 'widechar' only if your file is Unicode (UTF-16) );
The DATAFILETYPE parameter matters: use char for non-Unicode encodings like Windows-1258, and widechar only if your flat file is saved as UTF-16 (Unicode).
4. Why changing collation didn't fix it
Just to clarify: collations like SQL_Latin1_General_CP1_CI_AS or Vietnamese_CI_AS control how SQL sorts and compares text, not how it stores or parses incoming data. Since your target column is nvarchar (Unicode), the collation doesn't affect the actual character storage—your problem was always about incorrect encoding parsing during import, not the table's collation settings.
5. Fix the file encoding if needed
If you discover your flat file is saved in the wrong encoding (e.g., UTF-8 was incorrectly saved as Latin-1), use Notepad++ to convert it:
- Open the file, go to Encoding > Convert to UTF-8 (or Convert to Windows-1258), then save it before re-running the import.
Once you align the file's encoding with the settings in your XML format file and import command, the Vietnamese characters should import correctly as PM vuông.
内容的提问来源于stack exchange,提问作者Trần Thế Chương




