How do I remove line breaks from a string? Help with processing a CSV field in tMap
Hey there! Let's work through this tMap string processing issue you're facing. The key problems here are handling multi-line content (line breaks), conflicting logic for quotes and spaces, and ensuring consistent results for cases like "Other vc_7days". Here's a step-by-step fix:
1. First: Remove All Line Breaks
Multi-line cells are caused by \n (Unix), \r\n (Windows), or \r (old Mac) line breaks. Start by cleaning these out, plus trimming any extra whitespace at the start/end of the string:
// Clean line breaks and trim whitespace
String cleaned = row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim();
This regex covers all common line break formats, so you won't miss any hidden newlines.
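As a quick check of that claim, here is a minimal sketch that runs the same regex against each line break style. The class name `LineBreakDemo`, the helper `stripLineBreaks`, and the sample strings are made up for illustration; only the regex itself comes from the step above.

```java
class LineBreakDemo {
    // Same regex as above: matches \r\n or \n first, then a lone \r
    static String stripLineBreaks(String s) {
        return s.replaceAll("\\r?\\n|\\r", "").trim();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Other\nvc_7days",     // Unix
            "Other\r\nvc_7days",   // Windows
            "Other\rvc_7days",     // old Mac
            "  Other vc_7days \n"  // trailing newline plus padding
        };
        for (String s : samples) {
            // Brackets make leftover whitespace visible
            System.out.println("[" + stripLineBreaks(s) + "]");
        }
    }
}
```

Note that the replacement is an empty string, so two words separated only by a line break end up joined. If you'd rather keep them apart, replace with " " instead of "". On Java 8 and later, the `\\R` linebreak matcher should also work in place of the hand-written alternation.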
2. Optimize Quote & Space Logic (Avoid Conflicts)
Your original code has two separate checks that can conflict (e.g., a string with both quotes and spaces). Let's prioritize and combine the logic to ensure predictable results:
Option A: Use a Readable Nested Logic (for tMap Expression)
If you want to write this directly in tMap's expression field, use this nested approach (we'll handle quotes first, then spaces):
// Full expression for tMap
(
  // Step 1: Clean line breaks and trim
  row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().contains("\"")
    // Step 2: Handle quotes (only if paired)
    ? (row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"") != row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\"")
        ? row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().substring(row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"")+1, row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\""))
        : "null")
    : row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim()
).contains(" ")
  // Step 3: Extract text before first space (after quote handling)
  ? (
      row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().contains("\"")
        ? (row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"") != row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\"")
            ? row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().substring(row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"")+1, row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\""))
            : "null")
        : row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim()
    ).substring(0, (
      row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().contains("\"")
        ? (row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"") != row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\"")
            ? row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().substring(row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"")+1, row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\""))
            : "null")
        : row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim()
    ).indexOf(" "))
  // Keep the cleaned string if no space exists
  : (
      row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().contains("\"")
        ? (row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"") != row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\"")
            ? row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().substring(row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().indexOf("\"")+1, row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim().lastIndexOf("\""))
            : "null")
        : row4.Ad_Set_Name.replaceAll("\\r?\\n|\\r", "").trim()
    )
Option B: Use a Custom Java Method (Better Readability & Maintainability)
For cleaner code (especially if you need to reuse this logic), create a custom utility class in Talend:
// Create this class in Talend's Code tab or as a separate Java file
public class AdSetNameProcessor {
    public static String process(String input) {
        // Handle null input upfront
        if (input == null) return "null";

        // Step 1: Remove line breaks and trim whitespace
        String temp = input.replaceAll("\\r?\\n|\\r", "").trim();

        // Step 2: Process paired quotes
        if (temp.contains("\"")) {
            int firstQuote = temp.indexOf("\"");
            int lastQuote = temp.lastIndexOf("\"");
            // Only extract content if quotes are paired
            if (firstQuote != lastQuote) {
                temp = temp.substring(firstQuote + 1, lastQuote);
            } else {
                return "null";
            }
        }

        // Step 3: Extract text before first space
        if (temp.contains(" ")) {
            temp = temp.substring(0, temp.indexOf(" "));
        }

        // Return "null" if the final string is empty
        return temp.isEmpty() ? "null" : temp;
    }
}
Then call this method directly in your tMap expression:
AdSetNameProcessor.process(row4.Ad_Set_Name)
3. Test with Your Example
For input "Other vc_7days" (including the quotes):
- Line breaks are removed (none here), so the string stays: "Other vc_7days"
- Paired quotes are detected, and the content inside is extracted: Other vc_7days
- A space is detected, and the text before the first space is extracted: Other
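The walk-through above can be reproduced with a small trace. This is only a sketch that mirrors the three steps; the class name `AdSetNameTrace` and the `trace` helper are hypothetical and not part of Talend or of the original code, and null input is not handled here.

```java
class AdSetNameTrace {
    // Returns the intermediate result after each of the three steps
    static String[] trace(String input) {
        // Step 1: remove line breaks and trim
        String step1 = input.replaceAll("\\r?\\n|\\r", "").trim();

        // Step 2: no quotes -> unchanged; paired quotes -> inner content;
        // a single unpaired quote -> the literal string "null"
        int first = step1.indexOf('"');
        int last = step1.lastIndexOf('"');
        String step2;
        if (first == -1) {
            step2 = step1;
        } else if (first != last) {
            step2 = step1.substring(first + 1, last);
        } else {
            step2 = "null";
        }

        // Step 3: keep only the text before the first space, if any
        int space = step2.indexOf(' ');
        String step3 = (space >= 0) ? step2.substring(0, space) : step2;

        return new String[] { step1, step2, step3 };
    }

    public static void main(String[] args) {
        String[] steps = trace("\"Other vc_7days\"");
        for (int i = 0; i < steps.length; i++) {
            System.out.println("Step " + (i + 1) + ": " + steps[i]);
        }
    }
}
```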
If you want to keep the full quoted content instead of trimming at the space, just remove the space-handling step from the logic!
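A minimal sketch of that variant, assuming the Option B method as the starting point. The class name `AdSetNameKeepSpaces` is hypothetical; the only intended change is that the space-handling step is gone.

```java
class AdSetNameKeepSpaces {
    static String process(String input) {
        // Handle null input upfront, as in Option B
        if (input == null) return "null";

        // Remove line breaks and trim whitespace
        String temp = input.replaceAll("\\r?\\n|\\r", "").trim();

        // Extract the content between paired quotes, if any
        int firstQuote = temp.indexOf('"');
        int lastQuote = temp.lastIndexOf('"');
        if (firstQuote != -1) {
            if (firstQuote == lastQuote) return "null"; // unpaired quote
            temp = temp.substring(firstQuote + 1, lastQuote);
        }

        // No space handling here: the full quoted content is kept
        return temp.isEmpty() ? "null" : temp;
    }

    public static void main(String[] args) {
        System.out.println(process("\"Other vc_7days\""));
    }
}
```

In tMap this would be called the same way as before, for example `AdSetNameKeepSpaces.process(row4.Ad_Set_Name)`.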
This question originally comes from Stack Exchange; asked by Mara M.




