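How do I replicate indexes between two AWS Elasticsearch clusters/domains? Looking for an AWS-specific solution
Cross-domain index replication for AWS Elasticsearch
Hey, I've handled a similar requirement before: replicating indexes across AWS Elasticsearch domains (now officially called Amazon OpenSearch Service). Here are a few reliable options; pick the one that fits your scenario: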
Option 1: Official Cross-Cluster Replication (CCR), the first choice for real-time sync
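AWS OpenSearch (including legacy AWS Elasticsearch 7.10 domains) does support CCR; maybe you missed the version requirement? CCR needs Elasticsearch 7.10 or OpenSearch 1.1 or later, and both domains must have fine-grained access control and node-to-node encryption enabled. This option suits scenarios that need real-time incremental sync: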
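- First confirm the two domains' versions are compatible: the leader and follower must run the same version, or the follower must be newer than the leader (for example, if the leader runs 7.10, the follower can run 7.10 or later).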
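- Configure the leader domain's access policy to allow cross-cluster requests from the follower domain, e.g. add a statement like this to the leader's access policy (substitute your own account ID, Region, and domain name):
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/aws-service-role/es.amazonaws.com/AWSServiceRoleForAmazonElasticsearchService"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/leader-domain/*"
    }
- Create a cross-cluster connection from the follower domain to the leader domain: request it from the follower (the follower domain's Connections tab in the console, or the CreateOutboundConnection API) and accept it on the leader (AcceptInboundConnection). The connection alias you choose here is the leader_alias used in the next step.
- Start replicating the follower index on the follower domain, e.g. with curl:
    curl -XPUT "https://<follower-domain-endpoint>/_plugins/_replication/my-target-index/_start" \
      -H 'Content-Type: application/json' \
      -d '{
        "leader_alias": "my-leader-cluster",
        "leader_index": "my-source-index",
        "use_roles": {
          "leader_cluster_role": "all_access",
          "follower_cluster_role": "all_access"
        }
      }'
- Check the replication status:
    curl "https://<follower-domain-endpoint>/_plugins/_replication/my-target-index/_status"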
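If you script these calls instead of typing curl, a minimal Python sketch like the one below can build the `_start` request and interpret the `_status` response. It is an illustration under assumptions: the function names are hypothetical, the request is constructed but not sent (authentication such as basic auth or SigV4 is left out), and `replication_lag` assumes the `_status` response carries `status` plus `syncing_details.leader_checkpoint` / `follower_checkpoint`, as shown in the OpenSearch replication plugin docs.

```python
import json
import urllib.request


def build_start_request(endpoint, follower_index, leader_alias, leader_index,
                        leader_role="all_access", follower_role="all_access"):
    """Build (but do not send) the PUT request that starts CCR for one index."""
    body = {
        "leader_alias": leader_alias,  # alias of the cross-cluster connection
        "leader_index": leader_index,
        "use_roles": {
            "leader_cluster_role": leader_role,
            "follower_cluster_role": follower_role,
        },
    }
    return urllib.request.Request(
        url=f"https://{endpoint}/_plugins/_replication/{follower_index}/_start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def replication_lag(status):
    """Return leader_checkpoint - follower_checkpoint, or None if not SYNCING.

    Assumes the _status response shape from the OpenSearch replication docs.
    """
    if status.get("status") != "SYNCING":
        return None
    details = status.get("syncing_details", {})
    leader = details.get("leader_checkpoint")
    follower = details.get("follower_checkpoint")
    if leader is None or follower is None:
        return None
    return leader - follower
```

A lag of 0 suggests the follower has caught up with the leader; any other status (for example `PAUSED`) returns `None`, so the caller can raise an alert instead of computing a meaningless number.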
Option 2: Logstash as a middleman, compatible with older domains
If your AWS ES domain is older than 7.10 (no CCR support), or you need custom sync logic, Logstash is a good choice:
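- Write a Logstash config file (e.g. es-sync.conf) with the source domain as input and the target domain as output:
    input {
      elasticsearch {
        hosts => ["https://<source-domain-endpoint>"]
        index => "my-source-index"
        scroll => "10m"                  # keep-alive of the scroll context between batches; raise it if batches are slow
        size => 1000                     # documents fetched per scroll batch
        docinfo => true                  # expose each hit's _index/_id metadata
        docinfo_target => "[@metadata]"  # so the original _id is available as [@metadata][_id]
        user => "<source-es-user>"
        password => "<source-es-password>"
      }
    }
    filter {
      # add custom logic here, e.g. rename fields or drop documents
    }
    output {
      elasticsearch {
        hosts => ["https://<target-domain-endpoint>"]
        index => "my-target-index"
        document_id => "%{[@metadata][_id]}"  # keep the original document ID to avoid duplicates
        user => "<target-es-user>"
        password => "<target-es-password>"
      }
    }
  By default the elasticsearch input runs its query once and exits, so this is a one-off copy; for periodic incremental sync, add a schedule and a query that only selects recently changed documents.
- Run Logstash to start syncing: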
bin/logstash -f es-sync.conf
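To see what `document_id => "%{[@metadata][_id]}"` buys you, here is a minimal Python sketch of the same copy step done by hand: it turns a batch of hits (the shape returned by a scroll search) into a `_bulk` request body that reuses each source `_id`. `hits_to_bulk` is a hypothetical helper; it only builds the body, and sending it to the target domain (with authentication) is left out.

```python
import json


def hits_to_bulk(hits, target_index):
    """Convert search/scroll hits into an NDJSON _bulk body for target_index.

    Reusing each hit's _id makes re-runs idempotent: a document copied twice
    overwrites itself instead of creating a duplicate.
    """
    if not hits:
        return ""  # nothing to send; an empty _bulk request would be rejected
    lines = []
    for hit in hits:
        # Action line: index into the target, keeping the source document ID.
        lines.append(json.dumps({"index": {"_index": target_index, "_id": hit["_id"]}}))
        # Source line: the document body itself.
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```

Without the explicit `_id`, every run would let the target generate new IDs, so running the copy twice would double the document count.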
Option 3: Snapshot and restore, for a one-time full copy
If you only need a one-time full copy and no ongoing sync, AWS ES snapshots are the least hassle:
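- Register an S3-based snapshot repository on the source domain. On AWS this request must be SigV4-signed by an IAM identity that has iam:PassRole on the snapshot role (the role must trust es.amazonaws.com and be able to read and write the bucket); an unsigned request is rejected:
    curl -XPUT "https://<source-domain-endpoint>/_snapshot/my-s3-snapshot-repo" \
      -H 'Content-Type: application/json' \
      -d '{
        "type": "s3",
        "settings": {
          "bucket": "your-snapshot-bucket",
          "region": "us-east-1",
          "role_arn": "arn:aws:iam::123456789012:role/your-es-snapshot-role"
        }
      }'
- Take a snapshot of the source index:
    curl -XPUT "https://<source-domain-endpoint>/_snapshot/my-s3-snapshot-repo/my-index-snapshot?wait_for_completion=true" \
      -H 'Content-Type: application/json' \
      -d '{
        "indices": "my-source-index",
        "ignore_unavailable": true,
        "include_global_state": false
      }'
- Register the same S3 repository on the target domain (make sure the target domain can access the bucket; the same signing and PassRole requirements apply).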
- Restore from the snapshot into the target index:
    curl -XPOST "https://<target-domain-endpoint>/_snapshot/my-s3-snapshot-repo/my-index-snapshot/_restore" \
      -H 'Content-Type: application/json' \
      -d '{
        "indices": "my-source-index",
        "rename_pattern": "my-source-index",
        "rename_replacement": "my-target-index",
        "ignore_unavailable": true
      }'
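Because `rename_pattern` is a regular expression, it can be worth checking what name a restore will produce before running it against the target domain. The Python sketch below predicts that name offline. It assumes the server applies the pattern the way Java's `String.replaceAll` does (every match replaced, groups referenced as `$1`); `restored_name` is a hypothetical helper, not part of any client library.

```python
import re


def restored_name(index, rename_pattern, rename_replacement):
    """Predict the index name a snapshot restore will create.

    Assumes replaceAll-style semantics with Java-style $1, $2 group references.
    """
    # Translate Java-style $1 group references into Python's \g<1> syntax.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    # Replace every match of the pattern, like Java's String.replaceAll.
    return re.sub(rename_pattern, py_replacement, index)
```

For the restore command above it simply maps `my-source-index` to `my-target-index`; with a capturing pattern such as `logs-(.+)` and replacement `restored-logs-$1`, `logs-2024` would become `restored-logs-2024`. Since matches are replaced anywhere in the name, anchor the pattern with `^...$` if it could also match part of another index name.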
Notes
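- Whichever option you use, make sure the two domains (and any Logstash host) can reach each other: for VPC domains, configure security groups to allow HTTPS traffic on port 443 between them; for public domains, make sure the access policy allows the other side's IP or role. The curl commands above also omit authentication: with fine-grained access control and an internal master user, add -u '<user>:<password>'; with an IAM-based master user, requests must be SigV4-signed (curl 7.75+ supports --aws-sigv4).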
- For large indexes, snapshot and restore is faster than syncing through Logstash, while CCR is better suited to long-term real-time sync.
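When a sync fails with timeouts, a quick first check is whether the initiating host can open a TCP connection to the other domain's endpoint on port 443. This is a minimal Python sketch using only the standard library; `can_reach` is a hypothetical helper name, and the endpoint you pass in is whatever your domain's hostname is.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False
```

Run it from the host that will initiate traffic (the Logstash machine, or an instance in the same VPC and security group). A `True` result only shows that routing and security groups allow port 443; a request can still fail with 403 if the access policy or fine-grained access control rejects it.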
The question was originally asked on Stack Exchange by dvop_g.




