
How do I correctly split concatenated words in PHP? Troubleshooting and implementation, using "outofthebox" as an example

Hey there! Let's break down what's going wrong with your current PHP script and fix it to get the proper word splitting you need.

What's Wrong with Your Current Script?

Your current logic has two critical flaws:

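  1. Blindly selecting the longest found word: Using max($words) picks the longest valid word starting at each position. (Every candidate is a prefix of the same remaining text, so the lexicographically largest string, which is what max() returns for strings, is also the longest one.) It never checks whether the rest of the string can still be split into valid words. For example, after "out", choosing "oft" instead of "of" leaves "hebox" instead of "thebox", so "the" can no longer be matched and the remainder either fails to split or comes out as the wrong words (such as "he" + "box").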
  2. Incorrect starting position calculation: The way you reset $start leads to skipping characters or starting at the wrong index, breaking subsequent word checks.
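
To see the first flaw concretely, here is a minimal sketch of greedy longest-match splitting against a hypothetical in-memory dictionary. The array and the greedySplit() function are illustrations, not your original code. "he" is deliberately left out of the dictionary, so once "oft" is taken there is no way to continue:

```php
<?php
// Hypothetical in-memory dictionary standing in for the MySQL `strings` table.
// "he" is deliberately missing so the greedy choice of "oft" leads to a dead end.
$dictionary = array_flip(['out', 'of', 'oft', 'the', 'box']);

/**
 * Greedy longest-match splitting (the flawed approach): at each position take the
 * longest dictionary word and never reconsider it. Returns null when it gets stuck.
 */
function greedySplit(string $text, array $dictionary, int $maxWordLength = 10): ?array
{
    $words = [];
    $pos = 0;
    $n = strlen($text);
    while ($pos < $n) {
        $found = null;
        // Try the longest candidate first, shrinking until a dictionary word matches
        for ($len = min($maxWordLength, $n - $pos); $len >= 1; $len--) {
            $candidate = substr($text, $pos, $len);
            if (isset($dictionary[$candidate])) {
                $found = $candidate;
                break;
            }
        }
        if ($found === null) {
            return null; // Stuck: no dictionary word starts at $pos
        }
        $words[] = $found;
        $pos += strlen($found);
    }
    return $words;
}

var_dump(greedySplit('outofthebox', $dictionary)); // NULL: "out", "oft", then "hebox" has no match
```

By contrast, greedySplit('outofbox', $dictionary) succeeds with out / of / box, because no longer word competes with "of" there. The bug only shows up when the longest choice starves the rest of the string.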

Correct Implementation Logic (Dynamic Programming)

The Python solution you referenced uses dynamic programming to find the most likely word split, and the same approach works in PHP. The core idea is to record a valid split for every prefix of the input, building up to the full string:

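  1. Use a $dp array where $dp[$i] stores a word split for the first $i characters of your input string, or null if those characters cannot be split.
  2. For each end position $i, check every substring that ends at $i and starts at some position $j no more than $maxWordLength characters earlier (your dictionary's longest word length). If $dp[$j] already holds a split and the substring is a valid word, appending that word to $dp[$j] gives a split for $dp[$i].
  3. Because $dp[$i] is only ever built from an already-valid $dp[$j], the entry for the full string is set only if every character is covered by dictionary words. Unlike the greedy approach, a word that leaves an unsplittable remainder is never accepted.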
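
The three steps can be tried without a database. The sketch below is a minimal in-memory version of the recurrence; the dictionary array and the buildSplitTable() name are hypothetical, and for simplicity it keeps the first split it finds for each prefix. It prints the whole $dp table so you can see the state of every prefix:

```php
<?php
/**
 * Minimal in-memory sketch of the $dp recurrence. The dictionary array is a
 * hypothetical stand-in for the MySQL `strings` table. Returns the whole table
 * so the state of every prefix can be inspected.
 */
function buildSplitTable(string $text, array $dictionary, int $maxWordLength = 10): array
{
    $n = strlen($text);
    $dp = array_fill(0, $n + 1, null);
    $dp[0] = []; // Empty prefix: empty split
    for ($i = 1; $i <= $n; $i++) {
        for ($j = max(0, $i - $maxWordLength); $j < $i; $j++) {
            if ($dp[$j] === null || $dp[$i] !== null) {
                continue; // Prefix $j is not splittable, or $i already has a split
            }
            $word = substr($text, $j, $i - $j);
            if (isset($dictionary[$word])) {
                $dp[$i] = array_merge($dp[$j], [$word]);
            }
        }
    }
    return $dp;
}

$text = 'outofthebox';
$dictionary = array_flip(['out', 'of', 'oft', 'the', 'box']);
foreach (buildSplitTable($text, $dictionary) as $i => $split) {
    printf("%2d %-12s %s\n", $i, substr($text, 0, $i), $split === null ? '-' : implode(' ', $split));
}
```

In the printed table, prefix 6 ("outoft") does get the split "out oft", but no dictionary word starts at position 6, so that branch dies out. Only the "out of the" branch (prefix 8) is extended by "box" to reach position 11.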

Fixed PHP Code

First, we'll fix the SQL injection risk in your word check function, then implement the dynamic programming logic:

<?php
// Initialize database connection
$db = new PDO("mysql:host=localhost;dbname=strings", "root", "");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$inputText = "outofthebox";
$textLength = strlen($inputText);
$maxWordLength = 10; // Adjust this to match the longest word in your dictionary

// Dynamic programming array: $dp[$i] holds a word split for the first $i characters (null = not splittable)
$dp = array_fill(0, $textLength + 1, null);
$dp[0] = []; // Base case: 0 characters = empty split

// Iterate through every possible end position of the string
for ($i = 1; $i <= $textLength; $i++) {
    // Check substrings ending at $i, going back up to $maxWordLength characters
    $startRange = max(0, $i - $maxWordLength);
    for ($j = $startRange; $j < $i; $j++) {
        // Only proceed if the first $j characters have a valid split
        if ($dp[$j] === null) {
            continue;
        }
        $currentSubstr = substr($inputText, $j, $i - $j);
        // Skip substrings that are not dictionary words
        if (!isValidWord($currentSubstr, $db)) {
            continue;
        }
        // Build the candidate split by adding the word to the previous split
        $newSplit = $dp[$j];
        $newSplit[] = $currentSubstr;
        // Keep the split with the fewest words; on a tie the first one found
        // (the one whose last word is longest, since $j increases) is kept
        if ($dp[$i] === null || count($newSplit) < count($dp[$i])) {
            $dp[$i] = $newSplit;
        }
    }
}

// Output the final result
if ($dp[$textLength] !== null) {
    print_r($dp[$textLength]);
} else {
    echo "Could not split the string into valid words. Check your dictionary for missing common words!";
}

/**
 * Checks whether $word exists in the `strings` table.
 * Uses a prepared statement to avoid SQL injection, and fetchColumn() instead of
 * rowCount(), because rowCount() is not guaranteed to report rows for SELECT statements.
 */
function isValidWord(string $word, PDO $db): bool
{
    $stmt = $db->prepare("SELECT 1 FROM strings WHERE string = :word LIMIT 1");
    $stmt->execute([':word' => $word]);
    return $stmt->fetchColumn() !== false;
}
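
Each isValidWord() call in the script is a separate database query, one per candidate substring. If the dictionary fits in memory, a common alternative is to load it once and check words with isset(). The sketch below assumes the `strings` table and `string` column from the question; loadDictionary() and isDictionaryWord() are hypothetical helper names:

```php
<?php
/**
 * Hypothetical helper: read every word from the `strings` table once (table and
 * column names taken from the question) and return a set-like array
 * (word => true) for constant-time lookups.
 */
function loadDictionary(PDO $db): array
{
    $words = $db->query("SELECT string FROM strings")->fetchAll(PDO::FETCH_COLUMN);
    return array_fill_keys($words, true);
}

/**
 * In-memory replacement for isValidWord($word, $db): no query per substring.
 */
function isDictionaryWord(string $word, array $dictionary): bool
{
    return isset($dictionary[$word]);
}

// Usage with the script's $db connection:
//   $dictionary = loadDictionary($db);   // once, before the loops
//   if (isDictionaryWord($currentSubstr, $dictionary)) { ... }   // inside the inner loop
```

One behavioral difference to keep in mind: MySQL compares strings according to the column's collation, and the common default collations are case-insensitive, while PHP array keys are case-sensitive. If your table mixes case, normalize both sides (for example with strtolower()).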

Key Notes for Success

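  • Dictionary completeness: Make sure your MySQL strings table includes all common words (like "of", "the", and "box"). If any of them are missing, common phrases like this one cannot be split.
  • Optional: Frequency-based optimization: For ambiguous strings (for example, "outofthebox" can also be read as "out oft he box" if "oft" and "he" are in your dictionary), add a frequency column to your dictionary table and change the dynamic programming step to choose the split whose words are most probable together, i.e. the lowest total cost when each word's cost falls as its frequency rises. wordninja takes this approach, deriving each word's cost from its rank in a frequency-sorted word list.
  • Performance: Setting a reasonable $maxWordLength (e.g., 10) limits each position to at most $maxWordLength lookups, so an n-character string needs roughly n × $maxWordLength checks instead of one for each of its n(n+1)/2 substrings. Don't set it lower than your longest dictionary word, or that word can never be matched.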
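
The frequency-based idea can be sketched as follows. This is a minimal illustration, not wordninja itself: it borrows wordninja's rank-based cost, log((rank + 1) * log(N)), and picks the split with the lowest total cost, following back-pointers to rebuild the words. The word list passed in and the costSplit() name are hypothetical; in practice the list would come from a frequency column in your table, sorted from most to least frequent:

```php
<?php
/**
 * Frequency-aware split, modeled on wordninja's rank-based cost:
 * cost(word) = log((rank + 1) * log(N)), where rank is the word's position in a
 * list sorted from most to least frequent and N is the list length (needs N >= 2).
 * The split with the lowest total cost wins.
 */
function costSplit(string $text, array $wordsByFrequency, int $maxWordLength = 10): ?array
{
    $logN = log(count($wordsByFrequency));
    $cost = [];
    foreach (array_values($wordsByFrequency) as $rank => $word) {
        $cost[$word] = log(($rank + 1) * $logN);
    }

    $n = strlen($text);
    $best = array_fill(0, $n + 1, INF);  // Lowest total cost found for each prefix
    $back = array_fill(0, $n + 1, null); // Start index of the last word in that split
    $best[0] = 0.0;
    for ($i = 1; $i <= $n; $i++) {
        for ($j = max(0, $i - $maxWordLength); $j < $i; $j++) {
            $word = substr($text, $j, $i - $j);
            if ($best[$j] === INF || !isset($cost[$word])) {
                continue; // Prefix $j unsplittable, or $word not in the list
            }
            $candidate = $best[$j] + $cost[$word];
            if ($candidate < $best[$i]) {
                $best[$i] = $candidate;
                $back[$i] = $j;
            }
        }
    }
    if ($best[$n] === INF) {
        return null;
    }
    // Follow the back-pointers from the end of the string to rebuild the words
    $words = [];
    for ($i = $n; $i > 0; $i = $back[$i]) {
        $words[] = substr($text, $back[$i], $i - $back[$i]);
    }
    return array_reverse($words);
}

print_r(costSplit('outofthebox', ['the', 'of', 'out', 'box', 'he', 'oft']));
```

With this ranking the result is out / of / the / box. If "oft" and "he" are moved to the front of the list (['oft', 'he', 'box', 'out', 'of', 'the']), the same call returns out / oft / he / box, so the quality of the split depends entirely on the frequency data.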

The question originally came from Stack Exchange; question author: cihan100.
