
Interview Practice: Trie - Word Search

January 1, 2021 | by YoungTimes


Given an m x n grid of characters board and a list of strings words, find all words that appear both in the grid and in the dictionary.

A word must be constructed from letters of sequentially adjacent cells, where adjacent cells are horizontally or vertically neighboring. The same cell may not be used more than once within a word.

(In other words: starting from the current cell, search left, right, up, and down, and check whether the word formed along the search path appears in words.)

Example 1:

Input: board = [["o","a","a","n"],["e","t","a","e"],["i","h","k","r"],["i","f","l","v"]], words = ["oath","pea","eat","rain"]
Output: ["eat","oath"]

Backtracking Based on a Prefix Tree (Trie)

The key to this problem is how to match words against the dictionary. The most straightforward idea is to put every dictionary word into a Map or Dict and, while traversing the grid, check whether the word formed by the characters searched so far appears in the Map/Dict. This is clearly inefficient: an exact-match lookup cannot tell us whether any word starts with the current prefix, so there is no way to abandon a hopeless path early.

Two improvements: 1) use backtracking, so the search never has to restart from scratch; 2) if no word in the dictionary starts with the current prefix, there is no need to keep searching in that direction, and a Trie makes exactly this kind of prefix query cheap (a small sketch of such a query is shown after the TrieNode structure below).

Source: https://leetcode-cn.com/problems/word-search-ii/solution/dan-ci-sou-suo-ii-by-leetcode/

The implementation is as follows:

1. The TrieNode structure

Each TrieNode stores the word that ends at it, so there is no need to record the path taken while traversing the grid: as soon as the search reaches such a node, the word formed by all the characters along the search path can be read off directly.

class TrieNode {
public:
    bool is_end{false};
    std::unordered_map<char, std::shared_ptr<TrieNode>> childs;
    std::string word;
};
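
To make the prefix-pruning idea above concrete, here is a minimal sketch of a prefix query against this structure. The has_prefix helper is hypothetical and not part of the original solution; if it returns false, no dictionary word starts with the given prefix, so the grid search can abandon that path.

// Hypothetical helper, for illustration only. Assumes the TrieNode above.
bool has_prefix(std::shared_ptr<TrieNode> root, const std::string& prefix) {
    auto node = root;
    for (const char ch : prefix) {
        auto it = node->childs.find(ch);
        if (it == node->childs.end()) {
            return false;  // no dictionary word continues with this character
        }
        node = it->second;
    }
    return true;  // every character of the prefix was matched
}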

2. Building the Trie

Insert all of the words into a Trie; later the Trie is used to check whether the string formed along the current search path is a prefix of some dictionary word.

auto root = std::make_shared<TrieNode>();
for (const auto& word : words) {
    auto node = root;
    for (size_t i = 0; i < word.size(); i++) {
        const auto& ch = word.at(i);

        if (node->childs.count(ch) <= 0) {
            node->childs[ch] = std::make_shared<TrieNode>();
        }

        node = node->childs.at(ch);
    }

    node->word = word;
    node->is_end = true;
}

3. Backtracking search over the grid

The code uses a few tricks:

1) When a word is matched for the first time, its word field in the Trie node is cleared. This keeps duplicates out of the result set and avoids a separate deduplication pass after the search.

if (node->childs.at(ch)->is_end == true) {
    if (!node->childs[ch]->word.empty()) {
        rets.emplace_back(node->childs[ch]->word);
        node->childs[ch]->word.clear();
    }
}

2) Backtracking is built into the search

During the search, the Trie traversal and the backtracking over the grid proceed together, which improves search efficiency (see the sketch below).
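
A minimal sketch of this idea, using the TrieNode from section 1 (a stand-alone dfs written here for illustration; the complete solution below follows the same pattern): each recursive call consumes one board cell and descends one Trie level, so the grid DFS and the dictionary lookup share a single traversal, and a path is abandoned as soon as it no longer matches any dictionary prefix.

void dfs(std::shared_ptr<TrieNode> node,
         std::vector<std::vector<char>>& board,
         const size_t i, const size_t j,
         std::vector<std::string>& rets) {
    const char ch = board[i][j];
    if (ch == '.' || node->childs.count(ch) <= 0) {
        return;  // visited cell, or no dictionary word continues with this letter
    }

    node = node->childs.at(ch);          // descend the Trie in lock-step with the board
    if (!node->word.empty()) {
        rets.emplace_back(node->word);   // a complete word ends at this Trie node
        node->word.clear();              // report it only once
    }

    board[i][j] = '.';                   // mark the cell as used on this path
    if (i > 0) dfs(node, board, i - 1, j, rets);
    if (i + 1 < board.size()) dfs(node, board, i + 1, j, rets);
    if (j > 0) dfs(node, board, i, j - 1, rets);
    if (j + 1 < board[0].size()) dfs(node, board, i, j + 1, rets);
    board[i][j] = ch;                    // restore the cell when backtracking
}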

3) Pruning the Trie during backtracking

Source: https://leetcode-cn.com/problems/word-search-ii/solution/dan-ci-sou-suo-ii-by-leetcode/

Once a leaf node of the Trie has been visited, i.e. its word has been matched, it never needs to be visited again, so it can be pruned from the tree. As leaves are removed, nodes that used to be internal gradually become leaves themselves and can be pruned in turn. In the extreme case, once every word in the dictionary has been matched, the Trie becomes empty, which greatly speeds up the remaining search.

The pruning code is as follows:

if (node->childs[ch]->childs.size() <= 0) {
    node->childs.erase(ch);
}

The benefit of pruning is most visible on the test case below: it takes 1516 ms before pruning and 4 ms after pruning.

[["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"]]
["lllllll","fffffff","ssss","s","rr","xxxx","ttt","eee","ppppppp","iiiiiiiii","xxxxxxxxxx","pppppp","xxxxxx","yy","jj","ccc","zzz","ffffffff","r","mmmmmmmmm","tttttttt","mm","ttttt","qqqqqqqqqq","z","aaaaaaaa","nnnnnnnnn","v","g","ddddddd","eeeeeeeee","aaaaaaa","ee","n","kkkkkkkkk","ff","qq","vvvvv","kkkk","e","nnn","ooo","kkkkk","o","ooooooo","jjj","lll","ssssssss","mmmm","qqqqq","gggggg","rrrrrrrrrr","iiii","bbbbbbbbb","aaaaaa","hhhh","qqq","zzzzzzzzz","xxxxxxxxx","ww","iiiiiii","pp","vvvvvvvvvv","eeeee","nnnnnnn","nnnnnn","nn","nnnnnnnn","wwwwwwww","vvvvvvvv","fffffffff","aaa","p","ddd","ppppppppp","fffff","aaaaaaaaa","oooooooo","jjjj","xxx","zz","hhhhh","uuuuu","f","ddddddddd","zzzzzz","cccccc","kkkkkk","bbbbbbbb","hhhhhhhhhh","uuuuuuu","cccccccccc","jjjjj","gg","ppp","ccccccccc","rrrrrr","c","cccccccc","yyyyy","uuuu","jjjjjjjj","bb","hhh","l","u","yyyyyy","vvv","mmm","ffffff","eeeeeee","qqqqqqq","zzzzzzzzzz","ggg","zzzzzzz","dddddddddd","jjjjjjj","bbbbb","ttttttt","dddddddd","wwwwwww","vvvvvv","iii","ttttttttt","ggggggg","xx","oooooo","cc","rrrr","qqqq","sssssss","oooo","lllllllll","ii","tttttttttt","uuuuuu","kkkkkkkk","wwwwwwwwww","pppppppppp","uuuuuuuu","yyyyyyy","cccc","ggggg","ddddd","llllllllll","tttt","pppppppp","rrrrrrr","nnnn","x","yyy","iiiiiiiiii","iiiiii","llll","nnnnnnnnnn","aaaaaaaaaa","eeeeeeeeee","m","uuu","rrrrrrrr","h","b","vvvvvvv","ll","vv","mmmmmmm","zzzzz","uu","ccccccc","xxxxxxx","ss","eeeeeeee","llllllll","eeee","y","ppppp","qqqqqq","mmmmmm","gggg","yyyyyyyyy","jjjjjj","rrrrr","a","bbbb","ssssss","sss","ooooo","ffffffffff","kkk","xxxxxxxx","wwwwwwwww","w","iiiiiiii","ffff","dddddd","bbbbbb","uuuuuuuuu","kkkkkkk","gggggggggg","qqqqqqqq","vvvvvvvvv","bbbbbbbbbb","nnnnn","tt","wwww","iiiii","hhhhhhh","zzzzzzzz","ssssssssss","j","fff","bbbbbbb","aaaa","mmmmmmmmmm","jjjjjjjjjj","sssss","yyyyyyyy","hh","q","rrrrrrrrr","mmmmmmmm","wwwww","www","rrr","lllll","uuuuuuuuuu","oo","jjjjjjjjj","dddd","pppp","hhhhhhhhh","kk","gggggggg","xxxxx","vvvv","d","qqqqqqqqq","dd","ggggggggg","t","yyyy","bbb","yyyyyyyyyy","tttttt","ccccc","aa","eeeeee","llllll","kkkkkkkkkk","sssssssss","i","hhhhhh","oooooooooo","wwwwww","ooooooooo","zzzz","k","hhhhhhhh","aaaaa","mmmmm"]
[["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"]]
["lllllll","fffffff","ssss","s","rr","xxxx","ttt","eee","ppppppp","iiiiiiiii","xxxxxxxxxx","pppppp","xxxxxx","yy","jj","ccc","zzz","ffffffff","r","mmmmmmmmm","tttttttt","mm","ttttt","qqqqqqqqqq","z","aaaaaaaa","nnnnnnnnn","v","g","ddddddd","eeeeeeeee","aaaaaaa","ee","n","kkkkkkkkk","ff","qq","vvvvv","kkkk","e","nnn","ooo","kkkkk","o","ooooooo","jjj","lll","ssssssss","mmmm","qqqqq","gggggg","rrrrrrrrrr","iiii","bbbbbbbbb","aaaaaa","hhhh","qqq","zzzzzzzzz","xxxxxxxxx","ww","iiiiiii","pp","vvvvvvvvvv","eeeee","nnnnnnn","nnnnnn","nn","nnnnnnnn","wwwwwwww","vvvvvvvv","fffffffff","aaa","p","ddd","ppppppppp","fffff","aaaaaaaaa","oooooooo","jjjj","xxx","zz","hhhhh","uuuuu","f","ddddddddd","zzzzzz","cccccc","kkkkkk","bbbbbbbb","hhhhhhhhhh","uuuuuuu","cccccccccc","jjjjj","gg","ppp","ccccccccc","rrrrrr","c","cccccccc","yyyyy","uuuu","jjjjjjjj","bb","hhh","l","u","yyyyyy","vvv","mmm","ffffff","eeeeeee","qqqqqqq","zzzzzzzzzz","ggg","zzzzzzz","dddddddddd","jjjjjjj","bbbbb","ttttttt","dddddddd","wwwwwww","vvvvvv","iii","ttttttttt","ggggggg","xx","oooooo","cc","rrrr","qqqq","sssssss","oooo","lllllllll","ii","tttttttttt","uuuuuu","kkkkkkkk","wwwwwwwwww","pppppppppp","uuuuuuuu","yyyyyyy","cccc","ggggg","ddddd","llllllllll","tttt","pppppppp","rrrrrrr","nnnn","x","yyy","iiiiiiiiii","iiiiii","llll","nnnnnnnnnn","aaaaaaaaaa","eeeeeeeeee","m","uuu","rrrrrrrr","h","b","vvvvvvv","ll","vv","mmmmmmm","zzzzz","uu","ccccccc","xxxxxxx","ss","eeeeeeee","llllllll","eeee","y","ppppp","qqqqqq","mmmmmm","gggg","yyyyyyyyy","jjjjjj","rrrrr","a","bbbb","ssssss","sss","ooooo","ffffffffff","kkk","xxxxxxxx","wwwwwwwww","w","iiiiiiii","ffff","dddddd","bbbbbb","uuuuuuuuu","kkkkkkk","gggggggggg","qqqqqqqq","vvvvvvvvv","bbbbbbbbbb","nnnnn","tt","wwww","iiiii","hhhhhhh","zzzzzzzz","ssssssssss","j","fff","bbbbbbb","aaaa","mmmmmmmmmm","jjjjjjjjjj","sssss","yyyyyyyy","hh","q","rrrrrrrrr","mmmmmmmm","wwwww","www","rrr","lllll","uuuuuuuuuu","oo","jjjjjjjjj","dddd","pppp","hhhhhhhhh","kk","gggggggg","xxxxx","vvvv","d","qqqqqqqqq","dd","ggggggggg","t","yyyy","bbb","yyyyyyyyyy","tttttt","ccccc","aa","eeeeee","llllll","kkkkkkkkkk","sssssssss","i","hhhhhh","oooooooooo","wwwwww","ooooooooo","zzzz","k","hhhhhhhh","aaaaa","mmmmm"]

4) Avoiding revisits of the same cell

During the search, a cell that has already been used on the current path must not be used again. To enforce this, visited cells need to be marked.

There are many ways to mark cells; here are two. One is to keep a boolean array of the same size as the grid, marking visited cells true and unvisited cells false. The other is to overwrite the cell's value with a sentinel character and restore it once the search returns. The snippet below uses the second approach; a sketch of the boolean-array variant follows it.

board[i][j] = '.';
....
board[i][j] = ch;
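
For comparison, a minimal sketch of the boolean-array approach (the visited array is hypothetical and not used in the solution below); the flag plays the same role as overwriting the cell and must likewise be reset when the call returns:

std::vector<std::vector<bool>> visited(board.size(),
                                       std::vector<bool>(board[0].size(), false));
....
visited[i][j] = true;   // mark before exploring the four neighbours
// recurse, skipping any neighbour with visited[x][y] == true
visited[i][j] = false;  // unmark when backtracking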

The complete code

class Solution {
private:
    class TrieNode {
    public:
        bool is_end{false};
        std::unordered_map<char, std::shared_ptr<TrieNode>> childs;
        std::string word;
    };

    void search(std::shared_ptr<TrieNode> node,
        std::vector<std::vector<char>>& board, 
        const size_t i, 
        const size_t j, 
        std::vector<std::string>& rets) {
        const auto ch = board[i][j];
        if (node->childs.count(ch) <= 0) {
            return;
        }

        if (node->childs.at(ch)->is_end == true) {
            if (!node->childs[ch]->word.empty()) {
                rets.emplace_back(node->childs[ch]->word);
                node->childs[ch]->word.clear();
            }
        }

        board[i][j] = '.';

        if (i > 0) {
            search(node->childs[ch], board, i - 1, j, rets);
        }

        if (j > 0) {
            search(node->childs[ch], board, i, j - 1, rets);
        }

        if (i + 1 < board.size()) {
            search(node->childs[ch], board, i + 1, j, rets);
        }

        if (j + 1 < board[0].size()) {
            search(node->childs[ch], board, i, j + 1, rets);
        }

        board[i][j] = ch;

        if (node->childs[ch]->childs.size() <= 0) {
            node->childs.erase(ch);
        }
    }

public:
    vector<string> findWords(vector<vector<char>>& board, vector<string>& words) {
        std::vector<std::string> rets;
        if (words.size() <= 0) {
            return rets;
        }

        if (board.size() <= 0 || board[0].size() <= 0) {
            return rets;
        }

        auto root = std::make_shared<TrieNode>();
        for (const auto& word : words) {
            auto node = root;
            for (size_t i = 0; i < word.size(); i++) {
                const auto& ch = word.at(i);

                if (node->childs.count(ch) <= 0) {
                    node->childs[ch] = std::make_shared<TrieNode>();
                }

                node = node->childs.at(ch);
            }

            node->word = word;
            node->is_end = true;
        }

        for (size_t i = 0; i < board.size(); i++) {
            for (size_t j = 0; j < board[0].size(); j++) {
                const auto& ch = board[i][j];
                if (root->childs.count(ch) > 0) {
                    search(root, board, i, j, rets);
                }
            }
        }

        return rets;
    }
};
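
A small usage sketch: the main function and the includes are added here only for illustration and are not part of the LeetCode submission. Since the class uses unqualified vector and string in its signature, a using namespace std is included as well. It runs the solution on Example 1 from the problem statement.

#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// ... Solution class from above ...

int main() {
    vector<vector<char>> board = {{'o', 'a', 'a', 'n'},
                                  {'e', 't', 'a', 'e'},
                                  {'i', 'h', 'k', 'r'},
                                  {'i', 'f', 'l', 'v'}};
    vector<string> words = {"oath", "pea", "eat", "rain"};

    Solution solution;
    for (const auto& word : solution.findWords(board, words)) {
        std::cout << word << std::endl;  // prints "oath" and "eat" (order may differ from the example)
    }
    return 0;
}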

https://leetcode-cn.com/problems/word-search-ii/

Unless otherwise noted, this is an original article by [半杯茶的小酒杯]; reposts must credit this article with a link.

Link to this article: http://www.banbeichadexiaojiubei.com/index.php/2021/01/01/%e9%9d%a2%e8%af%95%e5%88%b7%e9%a2%98-trie%e6%a0%91-%e5%8d%95%e8%af%8d%e6%90%9c%e7%b4%a2/

