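How do I import CSV data into Gremlin Server to create a graph using Gremlin-Python?
No need to worry. Here is a step-by-step walkthrough for building a Marvel superhero graph on Gremlin Server with Gremlin-Python. The core idea is the same as in the Groovy guide you found; only the syntax changes to Python.
Creating a Marvel superhero graph on Gremlin Server with Gremlin-Python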
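I. Prepare the environment
- Make sure Gremlin Server is running. This walkthrough uses the default TinkerGraph; the same approach should carry over to other backends such as JanusGraph or Neo4j, although ID handling and schema setup differ between them.
- Install the Gremlin-Python dependency: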
pip install gremlinpython
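II. Sort out the data model (using Marvel as the example)
First decide which vertex and edge types to create:
- Vertex types: Hero (superhero), Movie (film), Team (hero team)
- Edge types: appeared_in (a hero appeared in a movie), member_of (a hero belongs to a team), teamed_with (a hero teamed up with another hero)
The code in this walkthrough creates only appeared_in and member_of edges; teamed_with follows the same pattern.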
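The data model above can also be written down as a small lookup table, which makes it possible to reject malformed edge rows before anything is sent to the server. This is only a sketch: `EDGE_SCHEMA` and `edge_is_valid` are hypothetical names introduced here, and the direction of each edge (hero to movie, hero to team) is taken from the descriptions above.

```python
# Edge label -> (label of the source vertex, label of the target vertex).
# Hypothetical helper structure; not part of Gremlin-Python.
EDGE_SCHEMA = {
    "appeared_in": ("Hero", "Movie"),
    "member_of": ("Hero", "Team"),
    "teamed_with": ("Hero", "Hero"),
}


def edge_is_valid(edge_label, from_label, to_label):
    """Return True if the edge label is known and connects the expected vertex types."""
    return EDGE_SCHEMA.get(edge_label) == (from_label, to_label)
```

A check like this is cheap to run over every edge row and catches swapped columns early.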
III. Write the Python import code
1. Connect to Gremlin Server
Create a client connection first; the default port is 8182:
from gremlin_python.driver import client, serializer

# Connect to the local Gremlin Server
gremlin_client = client.Client(
    'ws://localhost:8182/gremlin',
    'g',
    message_serializer=serializer.GraphSONSerializersV2d0()
)
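2. Import the vertex data
Start with a small set of test data and add the vertices with parameterized queries, which keeps values out of the query string and lets the server reuse the compiled script: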
# Superhero data
heroes = [
    {"id": "iron_man", "name": "Tony Stark", "alias": "Iron Man", "universe": "Marvel"},
    {"id": "captain_america", "name": "Steve Rogers", "alias": "Captain America", "universe": "Marvel"},
    {"id": "black_widow", "name": "Natasha Romanoff", "alias": "Black Widow", "universe": "Marvel"}
]

# Movie data
movies = [
    {"id": "avengers_1", "title": "The Avengers", "year": 2012},
    {"id": "captain_civil_war", "title": "Captain America: Civil War", "year": 2016}
]

# Team data
teams = [
    {"id": "avengers", "name": "The Avengers", "founded_year": 1963}
]

# Bindings are referenced by bare name in the script (no '$' prefix).
# The external ID is stored as a 'uid' property: 'id' is reserved in Gremlin
# (it refers to the element's own identifier), both as a binding name and in has().
def to_bindings(row):
    """Rename the 'id' key to 'uid' so the dict can be used as script bindings."""
    return {("uid" if key == "id" else key): value for key, value in row.items()}

# Add the hero vertices
for hero in heroes:
    query = """
        g.addV('Hero')
         .property('uid', uid)
         .property('name', name)
         .property('alias', alias)
         .property('universe', universe)
    """
    gremlin_client.submit(query, to_bindings(hero)).all().result()

# Add the movie vertices
for movie in movies:
    query = """
        g.addV('Movie')
         .property('uid', uid)
         .property('title', title)
         .property('year', year)
    """
    gremlin_client.submit(query, to_bindings(movie)).all().result()

# Add the team vertices
for team in teams:
    query = """
        g.addV('Team')
         .property('uid', uid)
         .property('name', name)
         .property('founded_year', founded_year)
    """
    gremlin_client.submit(query, to_bindings(team)).all().result()
3. Import the edge data
Once the vertices exist, create the relationships between them:
# Hero-appeared-in-movie relationships
appearances = [
    {"hero_id": "iron_man", "movie_id": "avengers_1"},
    {"hero_id": "iron_man", "movie_id": "captain_civil_war"},
    {"hero_id": "captain_america", "movie_id": "avengers_1"},
    {"hero_id": "captain_america", "movie_id": "captain_civil_war"},
    {"hero_id": "black_widow", "movie_id": "avengers_1"},
    {"hero_id": "black_widow", "movie_id": "captain_civil_war"}
]

# Hero-joined-team relationships
memberships = [
    {"hero_id": "iron_man", "team_id": "avengers"},
    {"hero_id": "captain_america", "team_id": "avengers"},
    {"hero_id": "black_widow", "team_id": "avengers"}
]

# Add the appeared_in edges; vertices are looked up by their 'uid' property
for appearance in appearances:
    query = """
        g.V().has('Hero', 'uid', hero_id).as('h')
         .V().has('Movie', 'uid', movie_id).as('m')
         .addE('appeared_in')
         .from('h').to('m')
    """
    gremlin_client.submit(query, appearance).all().result()

# Add the member_of edges
for membership in memberships:
    query = """
        g.V().has('Hero', 'uid', hero_id).as('h')
         .V().has('Team', 'uid', team_id).as('t')
         .addE('member_of')
         .from('h').to('t')
    """
    gremlin_client.submit(query, membership).all().result()
4. Verify the import
Run a few queries to confirm the data was imported correctly:
# Aliases of all Avengers members
result = gremlin_client.submit("""
    g.V().has('Team', 'uid', 'avengers').in('member_of').values('alias')
""").all().result()
print("Avengers members:", result)

# All movies Iron Man appeared in
result = gremlin_client.submit("""
    g.V().has('Hero', 'uid', 'iron_man').out('appeared_in').values('title')
""").all().result()
print("Movies featuring Iron Man:", result)
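The vertex queries all follow one repeated pattern, so the script text and its bindings can be generated from a dict instead of being written out per label. The sketch below is one possible way to do that. `build_add_vertex`, the `uid` property name, and the generated binding names (`vlabel`, `k0`, `v0`, ...) are illustrative choices made here, not part of Gremlin-Python. Generated names are used because, as far as I know, Gremlin Server rejects bindings called `id`, `label`, `key`, or `value`. The function only builds strings and does not contact the server.

```python
def build_add_vertex(label, props):
    """Return (query, bindings) for adding one vertex with the given properties."""
    steps = ["g.addV(vlabel)"]
    bindings = {"vlabel": label}
    for i, (key, value) in enumerate(props.items()):
        # Store the external identifier under 'uid' rather than 'id',
        # since 'id' refers to the element's own identifier in Gremlin.
        prop_key = "uid" if key == "id" else key
        steps.append(f".property(k{i}, v{i})")
        bindings[f"k{i}"] = prop_key
        bindings[f"v{i}"] = value
    return "".join(steps), bindings


query, bindings = build_add_vertex("Hero", {"id": "iron_man", "name": "Tony Stark"})
print(query)
print(bindings)
```

The pair can then be passed straight to `gremlin_client.submit(query, bindings)`. Because the script text depends only on the number of properties, rows with the same columns share one script.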
5. Close the connection
Close the client when you are done to release its resources:
gremlin_client.close()
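If a query raises partway through the import, an explicit `close()` at the end of the script is never reached. One way to guard against that is `contextlib.closing` from the standard library, which calls `close()` on exit whether or not an exception occurred. In this sketch `run_queries` and `make_client` are hypothetical names; `make_client` is assumed to be any zero-argument callable that returns an object with `submit(...)` and `close()`, such as the client created earlier.

```python
from contextlib import closing


def run_queries(make_client, queries):
    """Submit (query, bindings) pairs in order; the client is closed even on error."""
    with closing(make_client()) as gremlin_client:
        return [
            gremlin_client.submit(query, bindings).all().result()
            for query, bindings in queries
        ]
```

With Gremlin-Python this might be called as `run_queries(lambda: client.Client('ws://localhost:8182/gremlin', 'g'), [("g.V().count()", None)])`.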
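IV. Practical tips
- If your data is in CSV format, read it with Python's built-in csv module, convert the rows into lists of dicts like the ones above, and import them in a loop.
- For production workloads, consider batching (for example, a single Gremlin statement that combines inject with a loop over the rows) so that submitting one request per row does not become the bottleneck.
- With JanusGraph or similar databases, create indexes on the vertex ID and on the properties you query by ahead of time; otherwise queries slow down sharply on large data sets.
- The default Gremlin Server configuration accepts connections from the Python client, so no extra changes are needed.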
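For the CSV tip, a minimal sketch of the reading step might look like the following. It assumes a header row whose column names match the dict keys used earlier (`id`, `title`, `year`). `read_rows` and its `int_fields` parameter are hypothetical names introduced here, and the inline sample stands in for a real file.

```python
import csv
import io


def read_rows(source, int_fields=()):
    """Read CSV rows into dicts, converting the named columns to int."""
    rows = []
    for row in csv.DictReader(source):
        for field in int_fields:
            # csv yields every value as a string; convert numeric columns explicitly.
            row[field] = int(row[field])
        rows.append(row)
    return rows


# Inline sample standing in for a hypothetical movies.csv file.
sample = io.StringIO("id,title,year\navengers_1,The Avengers,2012\n")
movies = read_rows(sample, int_fields=("year",))
print(movies)
```

With a real file, the same function can be given an open file handle, for example `with open("movies.csv", newline="", encoding="utf-8") as f: movies = read_rows(f, ("year",))`. The resulting list can then go through the same import loop as the hand-written test data.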
This question originally comes from Stack Exchange; it was asked by Tushar Aggarwal.




