Implementing the asciifolding filter in Spring Data Elasticsearch so that searches for çözüm and cozum match the same document
Hey there! Let's figure out how to make both çözüm and cozum return the same Company document in Spring Data Elasticsearch. The asciifolding filter is exactly what we need here: it converts accented characters to their plain ASCII equivalents, so the two terms end up identical in the index. Here's a step-by-step guide to set this up:
First, we need to define a custom analyzer that uses the asciifolding filter, then apply it to the name field. This ensures both indexing and searching use the same logic to process text.
Modify your Company class like this:
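import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "erp")
@Setting(settingPath = "elasticsearch/settings.json") // Points to our custom analyzer config
public class Company {

    @Id
    private String id;

    // Apply our custom analyzer to the name field at index time and at search time
    @Field(type = FieldType.Text, analyzer = "ascii_folding_analyzer", searchAnalyzer = "ascii_folding_analyzer")
    private String name;

    private String description;

    @Field(type = FieldType.Nested, includeInParent = true)
    private List<Employee> employees;

    // Getters and setters omitted for brevity
}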
Create a settings.json file in src/main/resources/elasticsearch (make the folder if it doesn't exist). This file defines our analyzer with the filters we need:
{
  "analysis": {
    "analyzer": {
      "ascii_folding_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}
Let me break this down:
standard tokenizer: splits text into words at the word boundaries defined by Unicode Text Segmentation (UAX #29), which works well for most languages
lowercase filter: makes matching case-insensitive (so "Cozum" and "cozum" match)
asciifolding filter: converts accented characters to their ASCII counterparts (ç → c, ö → o, ü → u, etc.)
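To see roughly what that chain does to a value, here's a stdlib-only Java approximation. It is not Lucene's implementation: a regex split stands in for the standard tokenizer, and NFD decomposition followed by removing combining marks stands in for asciifolding. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough approximation of ascii_folding_analyzer: standard -> lowercase -> asciifolding.
// Hypothetical helper for illustration only; NOT the Lucene code Elasticsearch runs.
class AsciiFoldingSketch {

    // Stand-in for the standard tokenizer: split on runs of characters that are
    // not letters, combining marks, or digits. The real tokenizer follows UAX #29.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for asciifolding: decompose (NFD) and drop combining marks, so
    // ç -> c, ö -> o, ü -> u. Characters with no decomposition (e.g. dotless ı)
    // stay unchanged here, although asciifolding has explicit mappings for many.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Applies the steps in the same order as the analyzer definition.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            // Locale.ROOT avoids locale-specific casing such as Turkish I -> ı.
            terms.add(fold(token.toLowerCase(Locale.ROOT)));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Çözüm Ltd.")); // [cozum, ltd]
        System.out.println(analyze("cozum"));      // [cozum]
    }
}
```

For the exact terms Elasticsearch produces, the `_analyze` API on the erp index (with `"analyzer": "ascii_folding_analyzer"`) is the authoritative check.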
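Elasticsearch won't change the analyzer of a field that already exists in an index's mapping, so you'll need to delete the old erp index first, then let Spring Data rebuild it with the new configuration. Deleting the index also deletes its documents, so save (reindex) your Company data again afterwards.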
You can do this via Elasticsearch Dev Tools:
DELETE /erp
Or programmatically if you want it handled automatically on app startup:
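import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring Boot 3+

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.IndexOperations;

// Inside a Spring bean (e.g. a @Component class)
@Autowired
private ElasticsearchOperations elasticsearchOperations;

// Warning: this drops the erp index and all its documents on every startup
@PostConstruct
public void initIndex() {
    IndexOperations indexOps = elasticsearchOperations.indexOps(Company.class);
    if (indexOps.exists()) {
        indexOps.delete();
    }
    indexOps.create();     // uses the settings from @Setting
    indexOps.putMapping(); // uses the field mappings from @Field
}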
Now when you save a Company named çözüm, the analyzer indexes the term cozum behind the scenes (the original text is still stored unchanged in _source). When you search for either çözüm or cozum, the query text goes through the same analyzer and also becomes cozum, so it matches the indexed term.
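That reasoning can be modeled with a toy inverted index. This is a hypothetical in-memory sketch (the ToyIndex class is made up for illustration, and its analyzer is the same rough JDK approximation of standard + lowercase + asciifolding, not Lucene); Elasticsearch's real index is far more elaborate.

```java
import java.text.Normalizer;
import java.util.*;

// Hypothetical toy inverted index showing why the same analyzer on both sides
// makes çözüm and cozum meet on one term. Not how Elasticsearch is implemented.
class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> doc ids
    private final Map<String, String> originals = new HashMap<>();     // like _source

    // Approximated analyzer: split on non-letters/digits, lowercase, strip diacritics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            String lower = token.toLowerCase(Locale.ROOT);
            terms.add(Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}+", ""));
        }
        return terms;
    }

    // Index time: keep the original text, but post the analyzed terms.
    void index(String id, String name) {
        originals.put(id, name);
        for (String term : analyze(name)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Search time: analyze the query the same way, then collect documents that
    // contain any query term (mirroring a match query's default OR behavior).
    Set<String> search(String query) {
        Set<String> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    String getSource(String id) {
        return originals.get(id);
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.index("1", "çözüm");
        System.out.println(idx.search("cozum")); // [1]
        System.out.println(idx.search("çözüm")); // [1]
    }
}
```

The design point is that `index` and `search` call the same `analyze` method. If the query side skipped the folding, a search for çözüm would look up the term çözüm and find nothing, which is why the field uses the same analyzer for indexing and searching.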
For example, both of these search queries will return your target Company:
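import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.StringQuery;

// StringQuery takes the raw query JSON, so it doesn't depend on a specific client version

// Search with plain ASCII
SearchHits<Company> hits1 = elasticsearchOperations.search(
        new StringQuery("{\"match\": {\"name\": \"cozum\"}}"), Company.class);

// Search with accented characters
SearchHits<Company> hits2 = elasticsearchOperations.search(
        new StringQuery("{\"match\": {\"name\": \"çözüm\"}}"), Company.class);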
That's it! Your search will now work seamlessly with both accented and plain versions of the term.
This question comes from Stack Exchange; it was asked by enesoral.




