Implementing the asciifolding Filter in Spring Data Elasticsearch So That çözüm and cozum Match in Search

阿华AIGC实验室

2026-5-9

Hey there! Let's figure out how to make both çözüm and cozum return the same Company document in Spring Data Elasticsearch. The asciifolding token filter is what we need here: it converts accented characters to their plain ASCII equivalents, so both spellings produce the same token. Here's a step-by-step guide to set this up:

Step 1: Update the Company Entity with a Custom Analyzer

First, we need to define a custom analyzer that uses the asciifolding filter, then apply it to the name field. This ensures both indexing and searching use the same logic to process text.

Modify your Company class like this:

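import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "erp")
@Setting(settingPath = "elasticsearch/settings.json") // Classpath location of our custom analyzer config
public class Company {
    @Id
    private String id;

    // Apply our custom analyzer to the name field, at index time and at search time
    @Field(type = FieldType.Text, analyzer = "ascii_folding_analyzer", searchAnalyzer = "ascii_folding_analyzer")
    private String name;

    private String description;

    @Field(type = FieldType.Nested, includeInParent = true)
    private List<Employee> employees;

    // Getters and setters omitted for brevity
}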

Step 2: Create the Custom Settings File

Create a settings.json file in src/main/resources/elasticsearch (make the folder if it doesn't exist). This file defines our analyzer with the filters we need:

{
  "analysis": {
    "analyzer": {
      "ascii_folding_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

Let me break this down:

standard tokenizer: Splits text into word tokens on Unicode word boundaries (works well for most use cases)
lowercase filter: Ensures case insensitivity (so "Cozum" and "cozum" are a match)
asciifolding filter: Converts accented characters to their ASCII counterparts (ç → c, ö → o, ü → u, etc.)
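To make the effect of the last two filters concrete, here is a minimal plain-Java sketch that approximates them. It is not Elasticsearch's implementation: the class name AsciiFoldSketch and the fold helper are made up for illustration, and the approach (Unicode NFD decomposition, then dropping combining marks) only handles characters that decompose into a base letter plus accents. The real filter also covers characters such as ß or ø, which do not decompose this way.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AsciiFoldSketch {

    // Rough stand-in for lowercase + asciifolding (an approximation only):
    // decompose each character, drop the combining marks, then lowercase.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The accented spelling, written with Unicode escapes so the source
        // compiles the same way under any file encoding.
        String accented = "\u00e7\u00f6z\u00fcm";

        System.out.println(fold(accented));                        // cozum
        System.out.println(fold("Cozum"));                         // cozum
        System.out.println(fold(accented).equals(fold("Cozum")));  // true
    }
}
```

Both spellings end up as the same token, which is the property the analyzer relies on.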

Step 3: Recreate the Index (Critical!)

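Elasticsearch doesn't let you change the analyzer of an existing field in place, and documents that are already indexed keep the tokens produced by the old analyzer. The simplest fix is to delete the old erp index and let Spring Data rebuild it with our new configuration; after that, you'll need to index your documents again.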

You can do this via Elasticsearch Dev Tools:

DELETE /erp

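Or programmatically, if you want it handled on app startup. Be aware that this wipes the index every time the application starts, so keep it to development or a one-off migration:

@Autowired
private ElasticsearchOperations elasticsearchOperations;

// WARNING: deletes and recreates the index on every startup,
// which removes all indexed documents.
@PostConstruct
public void initIndex() {
    IndexOperations indexOps = elasticsearchOperations.indexOps(Company.class);
    if (indexOps.exists()) {
        indexOps.delete();
    }
    indexOps.create();     // Creates the index with the settings referenced by @Setting
    indexOps.putMapping(); // Writes the mapping derived from the @Field annotations
}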
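If it isn't obvious why the old index has to go, this toy model may help. It is a sketch in plain Java, not how Elasticsearch stores data: the class StaleIndexSketch, its helpers, and the document id "1" are all hypothetical, the fold method only approximates lowercase plus asciifolding, and each name is treated as a single token. The point it illustrates is that tokens are written at index time, so a document indexed before the analyzer change is still stored under its unfolded token.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class StaleIndexSketch {

    // Approximation of lowercase + asciifolding (not Elasticsearch's implementation).
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "")
                .toLowerCase(Locale.ROOT);
    }

    // Toy inverted index: analyzed token -> ids of the documents that contain it.
    static Map<String, Set<String>> index(Map<String, String> docs, UnaryOperator<String> analyzer) {
        Map<String, Set<String>> inverted = new HashMap<>();
        docs.forEach((id, name) ->
                inverted.computeIfAbsent(analyzer.apply(name), k -> new HashSet<>()).add(id));
        return inverted;
    }

    // Looks up an already analyzed search term.
    static Set<String> search(Map<String, Set<String>> inverted, String analyzedTerm) {
        return inverted.getOrDefault(analyzedTerm, Set.of());
    }

    public static void main(String[] args) {
        // One document whose name is the accented spelling (Unicode escapes).
        Map<String, String> docs = Map.of("1", "\u00e7\u00f6z\u00fcm");

        // Old index: tokens were only lowercased when the document was written.
        Map<String, Set<String>> oldIndex = index(docs, s -> s.toLowerCase(Locale.ROOT));
        // Rebuilt index: tokens are folded when the document is written.
        Map<String, Set<String>> newIndex = index(docs, StaleIndexSketch::fold);

        String query = fold("cozum"); // the search side already uses the new analyzer
        System.out.println(search(oldIndex, query)); // []
        System.out.println(search(newIndex, query)); // [1]
    }
}
```

The search only starts matching once the document has been written again with the folding analyzer, which is what recreating and refilling the index achieves.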

Step 4: Test It Out

Now when you save a Company with name çözüm, the analyzer indexes it as the token cozum behind the scenes (the stored document still contains the original çözüm). When you search for either çözüm or cozum, the search term is also converted to cozum, so it matches the indexed token.

For example, both of these search queries will return your target Company:

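// Assumes Spring Data Elasticsearch 4.x, where NativeSearchQueryBuilder
// (org.springframework.data.elasticsearch.core.query) wraps the QueryBuilders API.

// Search with plain ASCII
Query asciiQuery = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.matchQuery("name", "cozum"))
    .build();
SearchHits<Company> hits1 = elasticsearchOperations.search(asciiQuery, Company.class);

// Search with accented characters
Query accentedQuery = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.matchQuery("name", "çözüm"))
    .build();
SearchHits<Company> hits2 = elasticsearchOperations.search(accentedQuery, Company.class);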
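If a search comes back empty, it can help to ask Elasticsearch directly which tokens the analyzer produces, using the _analyze API of the index. The following is a minimal sketch with the JDK's built-in HTTP client (Java 11+). The class AnalyzeCheck and its helper are hypothetical, and the base URL http://localhost:9200 without authentication or TLS is an assumption about a local development cluster; adjust it to your setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AnalyzeCheck {

    // Builds a POST request for the _analyze API of the given index.
    static HttpRequest analyzeRequest(String baseUrl, String index, String analyzer, String text) {
        // Assumes analyzer and text contain no characters that need JSON escaping.
        String body = "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + text + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body)) // sent as UTF-8
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed local cluster; the text is the accented spelling as Unicode escapes.
        HttpRequest request = analyzeRequest(
                "http://localhost:9200", "erp", "ascii_folding_analyzer", "\u00e7\u00f6z\u00fcm");

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // If the custom analyzer is active on the index, the token list in the
        // JSON response should contain "cozum".
        System.out.println(response.body());
    }
}
```

An error response saying the analyzer cannot be found usually means the index was created without the custom settings, which points back to Step 3.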

That's it! Your search will now work seamlessly with both accented and plain versions of the term.

The question behind this post comes from Stack Exchange; it was asked by enesoral.