You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

Redis vs PostgreSQL:亿级字符串通配符搜索方案选型咨询

Great question—dealing with 100M+ strings and wildcard suffix searches (like *foo) is exactly where MySQL's default indexing falls flat, so it makes total sense you're exploring PostgreSQL trigram indexes and Redis. Since your data fits entirely in memory, both options will deliver way better performance than MySQL, but they each have sweet spots depending on your needs. Let's break them down:

PostgreSQL Trigram Index Approach

PostgreSQL's trigram indexes (via the pg_trgm extension) are a fantastic fit for fuzzy search scenarios, and with a small workaround, they handle suffix searches like *foo efficiently.

  • The suffix search workaround: Trigram indexes are naturally optimized for prefix matches (foo*) because they look at consecutive 3-character chunks. To make them work for suffixes, we'll reverse our strings and build the index on the reversed values. This turns a suffix search *foo into a prefix search on the reversed string (oof*), which the trigram index can handle perfectly.
  • Step-by-step implementation:
    1. First, enable the trigram extension if you haven't already:
      CREATE EXTENSION IF NOT EXISTS pg_trgm;
      
    2. Add a generated column to store reversed versions of your strings (this is maintained automatically by PostgreSQL):
      ALTER TABLE string_data ADD COLUMN value_reversed TEXT GENERATED ALWAYS AS (REVERSE(value)) STORED;
      
    3. Create a GIN trigram index on the reversed column (GIN is faster for trigram searches than GIST for large datasets):
      CREATE INDEX idx_string_data_reversed_trgm ON string_data USING GIN (value_reversed gin_trgm_ops);
      
    4. Run your suffix search by reversing the query term:
      SELECT value FROM string_data WHERE value_reversed LIKE REVERSE('%foo');
      
  • Performance & tradeoffs: Since your data fits in memory, crank up PostgreSQL's shared_buffers to let it cache most (or all) of the table and index in RAM—this will make queries fly. Trigram indexes typically take 2-3x the space of your raw data, which is manageable for 100M+ short strings. The big win here is that you get ACID compliance, support for all kinds of fuzzy searches (not just suffixes), and easy handling of data updates if you need them.
Redis-Based Approach

Redis is built for in-memory speed, so it's a great option if you want raw performance for simple search patterns. There are two main ways to handle suffix searches here:

Option 1: Sorted Sets with Reversed Strings

This is the most memory-efficient Redis approach, leveraging sorted sets' lexicographical ordering.

  • How it works: Reverse each string and add it as a member to a sorted set (use a score of 0 since we don't care about numeric ordering). To search for *foo, reverse the term to oof and use ZRANGEBYLEX to get all members starting with oof.
  • Implementation steps:
    1. Import your data (example for a single string):
      ZADD string_set 0 "oof"  # Corresponding to original string "foo"
      
    2. Run the suffix search:
      ZRANGEBYLEX string_set [oof (oof{
      
      The [ means "inclusive start" and ( means "exclusive end", so this grabs all members starting with oof. Just reverse the results to get your original strings.
  • Tradeoffs: This is blazingly fast and uses minimal memory, but it only handles prefix/suffix searches (no middle-of-string matches). It's best for static or read-heavy datasets—updates require removing and re-adding reversed strings, which can be tedious if you have frequent changes.

Option 2: Redis Search (RediSearch Module)

If you need more flexible search capabilities (like middle matches, scoring, or pagination), the RediSearch module is worth considering.

  • How it works: RediSearch lets you build full-text indexes on Redis data (hashes, JSON, etc.) and supports wildcard searches natively, including suffixes.
  • Implementation steps:
    1. First, install the RediSearch module (follow Redis's official setup docs for your environment).
    2. Create an index for your strings (stored as hashes here):
      FT.CREATE string_idx ON HASH PREFIX 1 "string:" SCHEMA value TEXT
      
    3. Import your data:
      HSET string:1 value "foo"
      HSET string:2 value "barfoo"
      
    4. Run the suffix search directly:
      FT.SEARCH string_idx "@value:*foo"
      
  • Tradeoffs: This gives you more search power out of the box, but it uses more memory than the sorted set approach and requires installing an extra module. It's a good middle ground if you need more than just suffix searches but still want Redis-level speed.
Which Should You Choose?
  • Go with PostgreSQL trigram indexes if you need:
    • ACID transactions and data integrity
    • Support for complex fuzzy searches (middle matches, partial terms)
    • Easy handling of data updates
  • Go with Redis (sorted sets) if you:
    • Only need suffix/prefix searches
    • Want the absolute fastest query times and minimal memory usage
    • Have static or rarely updated data
  • Go with Redis Search if you want Redis speed plus more advanced search features without switching to a database.

内容的提问来源于stack exchange,提问作者Stuart Yoder

火山引擎 最新活动