How to implement aggregation include/exclude that accepts both a regular expression and an array of exact values in a search API built with Elasticsearch Java and GraphQL-Java

阿华AIGC实验室

2026-5-6

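This is a fairly common pain point in GraphQL input type design, especially when the API has to mirror a service like Elasticsearch that natively accepts several input formats for the same field. I ran into almost the same problem on a similar search API project. Here are a few workable approaches; pick one based on your project's complexity and your team's conventions: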

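Approach 1: A custom input type that splits the two input cases

GraphQL does not support union types for inputs, so the two input forms can be split into optional fields on a single input type, and the caller chooses explicitly whether to send a regex or an array of exact values. This is the most direct approach and the easiest to maintain.

Step 1: Define the GraphQL input type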

input IncludeExcludeInput {
  # Carries a regular expression
  regex: String
  # Carries an array of exact values
  values: [String]
}

# Your main search input type
input SearchQueryInput {
  include: IncludeExcludeInput
  exclude: IncludeExcludeInput
  # Other search parameters (keywords, pagination, etc.)...
}

Step 2: Java business logic

In the DataFetcher, check which field is set and convert it into the IncludeExclude object Elasticsearch needs:

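import graphql.schema.DataFetcher;
import graphql.schema.DataFetchingEnvironment;
import java.util.List;
import java.util.Map;

public class SearchDataFetcher implements DataFetcher<SearchResult> {
    @Override
    @SuppressWarnings("unchecked")
    public SearchResult get(DataFetchingEnvironment env) {
        // graphql-java hands input objects over as Map<String, Object>, not as POJOs.
        // This assumes `include` is a direct field argument; if it is nested inside
        // SearchQueryInput, read it from that argument's map instead.
        Map<String, Object> includeInput = env.getArgument("include");
        IncludeExclude esInclude = null;

        // Handle the include argument.
        // NOTE: IncludeExclude.include(...) is shorthand in this answer. Check it
        // against the constructors that the IncludeExclude class in your
        // Elasticsearch client version actually exposes.
        if (includeInput != null) {
            String regex = (String) includeInput.get("regex");
            List<String> values = (List<String>) includeInput.get("values");
            if (regex != null) {
                esInclude = IncludeExclude.include(regex);
            } else if (values != null && !values.isEmpty()) {
                esInclude = IncludeExclude.include(values);
            }
        }

        // Handle the exclude argument the same way...

        // Build and execute the Elasticsearch query
        SearchRequest searchRequest = buildSearchRequest(esInclude, ...);
        // ...remaining logic
        return new SearchResult(...);
    }
}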

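Pros: The logic is clear and callers are unlikely to confuse the two forms, so mistakes are rare; the implementation is simple and needs no GraphQL extensions.
Cons: Callers have to write one extra level of nesting (for example include: { regex: "P.*" } rather than include: "P.*"), which differs slightly from the native Elasticsearch format.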
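
One gap in this approach is that a caller can set both regex and values, in which case the DataFetcher quietly prefers regex. Below is a minimal sketch of a stricter conversion, assuming the argument arrives as a Map<String, Object>. IncludeExcludeSpec and fromInput are hypothetical names introduced for illustration and are not part of graphql-java or Elasticsearch. Unlike the DataFetcher above, this sketch also rejects an input where neither field is usable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical holder: exactly one of regex / values is non-null.
final class IncludeExcludeSpec {
    final String regex;
    final List<String> values;

    private IncludeExcludeSpec(String regex, List<String> values) {
        this.regex = regex;
        this.values = values;
    }

    boolean isRegex() {
        return regex != null;
    }

    // Converts the raw input-object map; returns null when the argument was omitted.
    static IncludeExcludeSpec fromInput(Map<String, Object> input) {
        if (input == null) {
            return null;
        }
        Object regex = input.get("regex");
        Object values = input.get("values");
        if (regex != null && values != null) {
            throw new IllegalArgumentException("Provide either 'regex' or 'values', not both");
        }
        if (regex != null) {
            return new IncludeExcludeSpec((String) regex, null);
        }
        if (values instanceof List && !((List<?>) values).isEmpty()) {
            List<String> copy = new ArrayList<>();
            for (Object v : (List<?>) values) {
                if (v == null) {
                    throw new IllegalArgumentException("'values' must not contain null");
                }
                copy.add((String) v);
            }
            return new IncludeExcludeSpec(null, Collections.unmodifiableList(copy));
        }
        throw new IllegalArgumentException("Provide a 'regex' or a non-empty 'values' list");
    }
}
```

The DataFetcher could call fromInput once for include and once for exclude, then build the Elasticsearch object from whichever field is set.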

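Approach 2: A custom scalar type that accepts both formats

If you want the calling experience to match the native Elasticsearch API (passing a single string or an array directly), you can define a custom GraphQL scalar and convert both input forms in the parsing layer.

Step 1: Define the scalar in SDL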

# Custom scalar that accepts a single string (regex) or an array of strings (exact values)
scalar IncludeExclude

input SearchQueryInput {
  include: IncludeExclude
  exclude: IncludeExclude
  # Other search parameters...
}

Step 2: Implement the custom scalar in Java

Extend GraphQLScalarType and implement the parsing and serialization logic:

import graphql.language.ArrayValue;
import graphql.language.StringValue;
import graphql.language.Value;
import graphql.schema.Coercing;
import graphql.schema.CoercingParseLiteralException;
import graphql.schema.CoercingParseValueException;
import graphql.schema.CoercingSerializeException;
import graphql.schema.GraphQLScalarType;
import java.util.List;
import java.util.stream.Collectors;

public class IncludeExcludeScalar extends GraphQLScalarType {
    // Note: this constructor is deprecated in later graphql-java releases in favor of
    // GraphQLScalarType.newScalar()...build(); check which one your version supports.
    @SuppressWarnings("unchecked")
    public IncludeExcludeScalar() {
        super(
            "IncludeExclude",
            "Supports either a single regex string or an array of exact values for Elasticsearch aggregations",
            // Coercing<I, O>: I is the parsed input type, O is the serialized output type
            new Coercing<IncludeExclude, Object>() {
                // Serialization: converts a Java object to the GraphQL output form
                // (this scalar is mainly about parsing input)
                @Override
                public Object serialize(Object dataFetcherResult) throws CoercingSerializeException {
                    if (dataFetcherResult instanceof String) {
                        return IncludeExclude.include((String) dataFetcherResult);
                    } else if (dataFetcherResult instanceof List) {
                        return IncludeExclude.include((List<String>) dataFetcherResult);
                    }
                    throw new CoercingSerializeException("Unsupported type for IncludeExclude scalar");
                }

                // Parses values passed in by the client as variables
                @Override
                public IncludeExclude parseValue(Object input) throws CoercingParseValueException {
                    if (input instanceof String) {
                        return IncludeExclude.include((String) input);
                    } else if (input instanceof List) {
                        List<?> list = (List<?>) input;
                        if (list.stream().allMatch(item -> item instanceof String)) {
                            return IncludeExclude.include((List<String>) input);
                        }
                    }
                    throw new CoercingParseValueException("IncludeExclude scalar accepts either a string or an array of strings");
                }

                // Parses literals written inline in the query (a string or an array).
                // graphql-java's AST node for a list literal is ArrayValue.
                @Override
                public IncludeExclude parseLiteral(Object input) throws CoercingParseLiteralException {
                    if (input instanceof StringValue) {
                        return IncludeExclude.include(((StringValue) input).getValue());
                    } else if (input instanceof ArrayValue) {
                        List<Value> values = ((ArrayValue) input).getValues();
                        List<String> stringValues = values.stream()
                            .filter(val -> val instanceof StringValue)
                            .map(val -> ((StringValue) val).getValue())
                            .collect(Collectors.toList());
                        // Reject the literal if any element was not a string
                        if (stringValues.size() == values.size()) {
                            return IncludeExclude.include(stringValues);
                        }
                    }
                    throw new CoercingParseLiteralException("IncludeExclude scalar literal must be a string or array of strings");
                }
            }
        );
    }
}

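Step 3: Register the scalar with the GraphQL schema

Add the scalar when building the GraphQL schema. The snippet below uses the programmatic schema builder; if the schema is generated from SDL as in step 1, the scalar is instead registered on the RuntimeWiring through its scalar(...) method: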

GraphQLSchema schema = GraphQLSchema.newSchema()
    .query(...)
    .additionalType(new IncludeExcludeScalar())
    .build();

After that, the DataFetcher receives an IncludeExclude object directly and no longer needs any type checks:

public SearchResult get(DataFetchingEnvironment env) {
    IncludeExclude include = env.getArgument("include");
    // Use include directly to build the Elasticsearch query
    // ...
}

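Pros: The calling experience matches the native Elasticsearch API and is very intuitive; the conversion logic is encapsulated in the scalar layer, so the business layer stays simpler.
Cons: You have to implement a custom scalar, which takes some familiarity with GraphQL-Java's scalar mechanism, and the first implementation costs a little time, although the scalar can be reused if similar compatibility needs come up later. Note also that the scalar shown here always calls IncludeExclude.include(...), including when it is attached to the exclude field, so the exclude side needs separate handling.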
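
One way to keep the scalar independent of which field it is attached to is to have the coercing functions return a neutral holder and let the DataFetcher decide whether it feeds the include or the exclude side. The sketch below shows only that coercion step in plain Java, under that assumption. TermFilter and coerce are hypothetical names; the method takes the same raw Object that parseValue receives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical side-neutral result of coercing the scalar input.
final class TermFilter {
    final String regex;         // set when the client sent a single string
    final List<String> values;  // set when the client sent an array of strings

    private TermFilter(String regex, List<String> values) {
        this.regex = regex;
        this.values = values;
    }

    // Mirrors parseValue: accepts a String (regex) or a List of Strings (exact values).
    static TermFilter coerce(Object input) {
        if (input instanceof String) {
            return new TermFilter((String) input, null);
        }
        if (input instanceof List) {
            List<String> out = new ArrayList<>();
            for (Object item : (List<?>) input) {
                if (!(item instanceof String)) {
                    throw new IllegalArgumentException("Array elements must all be strings");
                }
                out.add((String) item);
            }
            return new TermFilter(null, Collections.unmodifiableList(out));
        }
        throw new IllegalArgumentException("Expected a string or an array of strings");
    }
}
```

Inside a real scalar the IllegalArgumentException would become a CoercingParseValueException, and the DataFetcher would build the include or exclude side from the TermFilter depending on which argument it came from.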

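Approach 3: An array plus a type tag that makes the input kind explicit

If you want neither split fields nor a custom scalar, you can combine an array with an enum so the caller states explicitly whether the input is a regex or a set of exact values.

Step 1: Define the SDL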

enum IncludeExcludeType {
    REGEX
    VALUES
}

input IncludeExcludeInput {
    type: IncludeExcludeType!
    values: [String]!
}

input SearchQueryInput {
    include: IncludeExcludeInput
    exclude: IncludeExcludeInput
    # Other search parameters...
}

Step 2: Java handling logic

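@SuppressWarnings("unchecked")
public SearchResult get(DataFetchingEnvironment env) {
    // graphql-java delivers the input object as a Map<String, Object>. With an
    // SDL-defined enum and no custom enum wiring, the enum value arrives as its name.
    Map<String, Object> includeInput = env.getArgument("include");
    IncludeExclude esInclude = null;

    if (includeInput != null) {
        // String.valueOf covers both a plain name and a mapped Java enum constant
        String type = String.valueOf(includeInput.get("type"));
        List<String> values = (List<String>) includeInput.get("values");
        if ("REGEX".equals(type)) {
            // The REGEX type requires the array to contain exactly one element
            if (values.size() != 1) {
                throw new IllegalArgumentException("REGEX type requires exactly one value");
            }
            esInclude = IncludeExclude.include(values.get(0));
        } else {
            esInclude = IncludeExclude.include(values);
        }
    }

    // ...remaining logic
    return new SearchResult(...);
}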

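Pros: The logic is clear and avoids the ambiguity of automatic detection; the implementation is simple and does not require extending GraphQL's core features.
Cons: Callers must specify the type explicitly, which makes the input more verbose than the native Elasticsearch format.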
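
Because the schema cannot express "exactly one element when type is REGEX", that rule has to be enforced at runtime. Here is a minimal sketch of doing the check once, up front, assuming the enum arrives as its name and the list as a List<String>. TaggedIncludeExclude and its members are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical validated form of the { type, values } input.
final class TaggedIncludeExclude {
    enum Kind { REGEX, VALUES }

    final Kind kind;
    final List<String> values;

    private TaggedIncludeExclude(Kind kind, List<String> values) {
        this.kind = kind;
        this.values = values;
    }

    // Validates the pair once so later code can rely on the invariants.
    static TaggedIncludeExclude of(String type, List<String> values) {
        Kind kind = Kind.valueOf(type); // IllegalArgumentException on unknown names
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        for (String v : values) {
            if (v == null) {
                throw new IllegalArgumentException("values must not contain null");
            }
        }
        if (kind == Kind.REGEX && values.size() != 1) {
            throw new IllegalArgumentException("REGEX type requires exactly one value");
        }
        return new TaggedIncludeExclude(kind, Collections.unmodifiableList(new ArrayList<>(values)));
    }

    // Only meaningful for REGEX input.
    String regex() {
        if (kind != Kind.REGEX) {
            throw new IllegalStateException("Not a REGEX input");
        }
        return values.get(0);
    }
}
```

With the validation isolated like this, the DataFetcher only has to branch on kind when it builds the Elasticsearch object.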