`

java trie树

 
阅读更多

字典树查找,Trie,又称字典树、单词查找树,是一种树形结构,用于保存大量的字符串。它的优点是:利用字符串的公共前缀来节约存储空间。

 

package com.jwetherell.algorithms.data_structures;

import java.util.Arrays;


/**
 * A trie, or prefix tree, is an ordered tree data structure that is used to
 * store an associative array where the keys are usually strings.
 * 
 * == This is NOT a compact Trie. ==
 * 
 * http://en.wikipedia.org/wiki/Trie
 * 
 * @author Justin Wetherell <phishman3579@gmail.com>
 */
public class Trie<C extends CharSequence> {

    private int size = 0;

    protected INodeCreator creator = null;
    protected Node root = null;


    /**
     * Default constructor.
     */
    public Trie() { }

    /**
     * Constructor with external Node creator.
     */
    public Trie(INodeCreator creator) {
        this.creator = creator;
    }

    /**
     * Create a new node for sequence.
     * 
     * @param parent node of the new node.
     * @param character which represents this node.
     * @param isWord signifies if the node represents a word.
     * @return Node which was created.
     */
    protected Node createNewNode(Node parent, Character character, boolean isWord) {
        return (new Node(parent, character, isWord));
    }

    /**
     * Add sequence to trie.
     * 
     * @param seq to add to the trie.
     * @return True if sequence is added to trie or false if it already exists.
     */
    public boolean add(C seq) {
        return (this.addSequence(seq)!=null);
    }

    /**
     * Add sequence to trie.
     * 
     * @param seq to add to the trie.
     * @return Node which was added to trie or null if it already exists.
     */
    protected Node addSequence(C seq) {
        if (root==null) {
            if (this.creator==null) root = createNewNode(null, null, false);
            else root = this.creator.createNewNode(null, null, false);
        }

        int length = (seq.length() - 1);
        Node prev = root;
        //For each Character in the input, we'll either go to an already define 
        // child or create a child if one does not exist
        for (int i = 0; i < length; i++) {
            Node n = null;
            Character c = seq.charAt(i);
            int index = prev.childIndex(c);
            //If 'prev' has a child which starts with Character c
            if (index >= 0) {
                //Go to the child
                n = prev.getChild(index);
            } else {
                //Create a new child for the character
                if (this.creator==null) n = createNewNode(prev, c, false);
                else n = this.creator.createNewNode(prev, c, false);
                prev.addChild(n);
            }
            prev = n;
        }

        //Deal with the first character of the input string not found in the trie
        Node n = null;
        Character c = seq.charAt(length);
        int index = prev.childIndex(c);
        //If 'prev' already contains a child with the last Character
        if (index >= 0) {
            n = prev.getChild(index);
            //If the node doesn't represent a string already
            if (n.isWord == false) {
                //Set the string to equal the full input string
                n.character = c;
                n.isWord = true;
                size++;
                return n;
            } else {
                //String already exists in Trie
                return null;
            }
        } else {
            //Create a new node for the input string
            if (this.creator==null) n = createNewNode(prev, c, true);
            else n = this.creator.createNewNode(prev, c, true);
            prev.addChild(n);
            size++;
            return n;
        }
    }

    /**
     * Remove sequence from the trie.
     * 
     * @param sequence to remove from the trie.
     * @return True if sequence was remove or false if sequence is not found.
     */
    public boolean remove(C sequence) {
        if (root == null) return false;

        //Find the key in the Trie
        Node previous = null;
        Node node = root;
        int length = (sequence.length() - 1);
        for (int i = 0; i <= length; i++) {
            char c = sequence.charAt(i);
            int index = node.childIndex(c);
            if (index >= 0) {
                previous = node;
                node = node.getChild(index);
            } else {
                return false;
            }
        }
        
        if (node.childrenSize > 0) {
            //The node which contains the input string and has children, just NULL out the string
            node.isWord = false;
        } else {
            //The node which contains the input string does NOT have children
            int index = previous.childIndex(node.character);
            //Remove node from previous node
            previous.removeChild(index);
            //Go back up the trie removing nodes until you find a node which represents a string
            while (previous != null && previous.isWord==false && previous.childrenSize == 0) {
                if (previous.parent != null) {
                    int idx = previous.parent.childIndex(previous.character);
                    if (idx >= 0) previous.parent.removeChild(idx);
                }
                previous = previous.parent;
            }
        }
        size--;
        return true;
    }

    /**
     * Get node which represents the sequence in the trie.
     * 
     * @param seq to find a node for.
     * @return Node which represents the sequence or NULL if not found.
     */
    protected Node getNode(C seq) {
        if (root == null) return null;

        //Find the string in the trie
        Node n = root;
        int length = (seq.length() - 1);
        for (int i = 0; i <= length; i++) {
            char c = seq.charAt(i);
            int index = n.childIndex(c);
            if (index >= 0) {
                n = n.getChild(index);
            } else {
                //string does not exist in trie
                return null;
            }
        }

        return n;
    }

    /**
     * Does the trie contain the sequence.
     * 
     * @param seq to locate in the trie.
     * @return True if sequence is in the trie.
     */
    public boolean contains(C seq) {
        Node n = this.getNode(seq);
        if (n==null || !n.isWord) return false;
        
        //If the node found in the trie does not have it's string 
        // field defined then input string was not found
        return n.isWord;
    }

    /**
     * Number of sequences in the trie.
     * 
     * @return number of sequences in the trie.
     */
    public int size() {
        return size;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public String toString() {
        return TriePrinter.getString(this);
    }


    protected static class Node {

        private static final int MINIMUM_SIZE = 2;

        protected Node[] children = new Node[MINIMUM_SIZE];
        protected int childrenSize = 0;
        protected Node parent = null;
        protected boolean isWord = false; //Signifies this node represents a word
        protected Character character = null; //First character that is different the parent's string


        protected Node(Node parent, Character character, boolean isWord) {
            this.parent = parent;
            this.character = character;
            this.isWord = isWord;
        }

        protected void addChild(Node node) {
            if (childrenSize>=children.length) {
                children = Arrays.copyOf(children, ((children.length*3)/2)+1);
            }
            children[childrenSize++] = node;
        }
        protected boolean removeChild(int index) {
            if (index>=childrenSize) return false;
            children[index] = null;
            childrenSize--;
            System.arraycopy(children, index+1, children, index, childrenSize-index);
            if (childrenSize>=MINIMUM_SIZE && childrenSize<children.length/2) {
                children = Arrays.copyOf(children, childrenSize);
            }
            return true;
        }
        protected int childIndex(Character character) {
            for (int i = 0; i < childrenSize; i++) {
                Node c = children[i];
                if (c.character.equals(character)) return i;
            }
            return Integer.MIN_VALUE;
        }
        protected Node getChild(int index) {
            if (index>=childrenSize) return null;
            return children[index];
        }
        protected int getChildrenSize() {
            return childrenSize;
        }

        /**
         * {@inheritDoc}
         */
        @Override
        public String toString() {
            StringBuilder builder = new StringBuilder();
            if (isWord == true) builder.append("Node=").append(isWord).append("\n");
            for (int i=0; i<childrenSize; i++) {
                Node c = children[i];
                builder.append(c.toString());
            }
            return builder.toString();
        }
    }

    protected static interface INodeCreator {

        /**
         * Create a new node for sequence.
         * 
         * @param parent node of the new node.
         * @param character which represents this node.
         * @param isWord signifies if the node represents a word.
         * @return Node which was created.
         */
        public Node createNewNode(Node parent, Character character, boolean type);
    }

    protected static class TriePrinter {

        public static <C extends CharSequence> void print(Trie<C> trie) {
            System.out.println(getString(trie));
        }

        public static <C extends CharSequence> String getString(Trie<C> tree) {
            return getString(tree.root, "", null, true);
        }

        protected static <C extends CharSequence> String getString(Node node, String prefix, String previousString, boolean isTail) {
            StringBuilder builder = new StringBuilder();
            String string = null;
            if (node.character!=null) {
                String temp = String.valueOf(node.character);
                if (previousString!=null) string = previousString + temp;
                else string = temp;
            }
            builder.append(prefix + (isTail ? "└── " : "├── ") + 
                ((node.isWord == true) ? 
                    ("(" + node.character + ") " + string) 
                : 
                    node.character) +
            "\n");
            if (node.children != null) {
                for (int i = 0; i < node.childrenSize - 1; i++) {
                    builder.append(getString(node.children[i], prefix + (isTail ? "    " : "│   "), string, false));
                }
                if (node.childrenSize >= 1) {
                    builder.append(getString(node.children[node.childrenSize - 1], prefix + (isTail ? "    " : "│   "), string, true));
                }
            }
            return builder.toString();
        }
    }
}

 测试代码

 

 private static boolean testTrie() {
        {
            long count = 0;

            long addTime = 0L;
            long removeTime = 0L;
            long beforeAddTime = 0L;
            long afterAddTime = 0L;
            long beforeRemoveTime = 0L;
            long afterRemoveTime = 0L;

            long memory = 0L;
            long beforeMemory = 0L;
            long afterMemory = 0L;

            //Trie.
            if (debug>1) System.out.println("Trie.");
            testNames[testIndex] = "Trie";

            count++;

            if (debugMemory) beforeMemory = DataStructures.getMemoryUse();
            if (debugTime) beforeAddTime = System.currentTimeMillis();
            Trie<String> trie = new Trie<String>();
            for (int i=0; i<unsorted.length; i++) {
                int item = unsorted[i];
                String string = String.valueOf(item);
                trie.add(string);
                if (validateStructure && !(trie.size()==i+1)) {
                    System.err.println("YIKES!! "+item+" caused a size mismatch.");
                    handleError(trie);
                    return false;
                }
                if (validateContents && !trie.contains(string)) {
                    System.err.println("YIKES!! "+string+" doesn't exist.");
                    handleError(trie);
                    return false;
                }
            }
            if (debugTime) {
                afterAddTime = System.currentTimeMillis();
                addTime += afterAddTime-beforeAddTime;
                if (debug>0) System.out.println("Trie add time = "+addTime/count+" ms");
            }
            if (debugMemory) {
                afterMemory = DataStructures.getMemoryUse();
                memory += afterMemory-beforeMemory;
                if (debug>0) System.out.println("Trie memory use = "+(memory/count)+" bytes");
            }

            String invalid = INVALID.toString();
            boolean contains = trie.contains(invalid);
            boolean removed = trie.remove(invalid);
            if (contains || removed) {
                System.err.println("Trie invalidity check. contains="+contains+" removed="+removed);
                return false;
            } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed);

            if (debug>1) System.out.println(trie.toString());

            long lookupTime = 0L;
            long beforeLookupTime = 0L;
            long afterLookupTime = 0L;
            if (debugTime) beforeLookupTime = System.currentTimeMillis();
            for (int item : unsorted) {
                String string = String.valueOf(item);
                trie.contains(string);
            }
            if (debugTime) {
                afterLookupTime = System.currentTimeMillis();
                lookupTime += afterLookupTime-beforeLookupTime;
                if (debug>0) System.out.println("Trie lookup time = "+lookupTime/count+" ms");
            }

            if (debugTime) beforeRemoveTime = System.currentTimeMillis();
            for (int i=0; i<unsorted.length; i++) {
                int item = unsorted[i];
                String string = String.valueOf(item);
                trie.remove(string);
                if (validateStructure && !(trie.size()==unsorted.length-(i+1))) {
                    System.err.println("YIKES!! "+item+" caused a size mismatch.");
                    handleError(trie);
                    return false;
                }
                if (validateContents && trie.contains(string)) {
                    System.err.println("YIKES!! "+string+" still exists.");
                    handleError(trie);
                    return false;
                }
            }
            if (debugTime) {
                afterRemoveTime = System.currentTimeMillis();
                removeTime += afterRemoveTime-beforeRemoveTime;
                if (debug>0) System.out.println("Trie remove time = "+removeTime/count+" ms");
            }

            contains = trie.contains(invalid);
            removed = trie.remove(invalid);
            if (contains || removed) {
                System.err.println("Trie invalidity check. contains="+contains+" removed="+removed);
                return false;
            } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed);

            count++;

            if (debugMemory) beforeMemory = DataStructures.getMemoryUse();
            if (debugTime) beforeAddTime = System.currentTimeMillis();
            for (int i=unsorted.length-1; i>=0; i--) {
                int item = unsorted[i];
                String string = String.valueOf(item);
                trie.add(string);
                if (validateStructure && !(trie.size()==unsorted.length-i)) {
                    System.err.println("YIKES!! "+item+" caused a size mismatch.");
                    handleError(trie);
                    return false;
                }
                if (validateContents && !trie.contains(string)) {
                    System.err.println("YIKES!! "+string+" doesn't exists.");
                    handleError(trie);
                    return false;
                }
            }
            if (debugTime) {
                afterAddTime = System.currentTimeMillis();
                addTime += afterAddTime-beforeAddTime;
                if (debug>0) System.out.println("Trie add time = "+addTime/count+" ms");
            }
            if (debugMemory) {
                afterMemory = DataStructures.getMemoryUse();
                memory += afterMemory-beforeMemory;
                if (debug>0) System.out.println("Trie memory use = "+(memory/count)+" bytes");
            }

            contains = trie.contains(invalid);
            removed = trie.remove(invalid);
            if (contains || removed) {
                System.err.println("Trie invalidity check. contains="+contains+" removed="+removed);
                return false;
            } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed);

            if (debug>1) System.out.println(trie.toString());

            lookupTime = 0L;
            beforeLookupTime = 0L;
            afterLookupTime = 0L;
            if (debugTime) beforeLookupTime = System.currentTimeMillis();
            for (int item : unsorted) {
                String string = String.valueOf(item);
                trie.contains(string);
            }
            if (debugTime) {
                afterLookupTime = System.currentTimeMillis();
                lookupTime += afterLookupTime-beforeLookupTime;
                if (debug>0) System.out.println("Trie lookup time = "+lookupTime/count+" ms");
            }

            if (debugTime) beforeRemoveTime = System.currentTimeMillis();
            for (int i=0; i<unsorted.length; i++) {
                int item = unsorted[i];
                String string = String.valueOf(item);
                trie.remove(string);
                if (validateStructure && !(trie.size()==unsorted.length-(i+1))) {
                    System.err.println("YIKES!! "+item+" caused a size mismatch.");
                    handleError(trie);
                    return false;
                }
                if (validateContents && trie.contains(string)) {
                    System.err.println("YIKES!! "+string+" still exists.");
                    handleError(trie);
                    return false;
                }
            }
            if (debugTime) {
                afterRemoveTime = System.currentTimeMillis();
                removeTime += afterRemoveTime-beforeRemoveTime;
                if (debug>0) System.out.println("Trie remove time = "+removeTime/count+" ms");
            }

            contains = trie.contains(invalid);
            removed = trie.remove(invalid);
            if (contains || removed) {
                System.err.println("Trie invalidity check. contains="+contains+" removed="+removed);
                return false;
            } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed);

            //sorted
            long addSortedTime = 0L;
            long removeSortedTime = 0L;
            long beforeAddSortedTime = 0L;
            long afterAddSortedTime = 0L;
            long beforeRemoveSortedTime = 0L;
            long afterRemoveSortedTime = 0L;

            if (debugMemory) beforeMemory = DataStructures.getMemoryUse();
            if (debugTime) beforeAddSortedTime = System.currentTimeMillis();
            for (int i=0; i<sorted.length; i++) {
                int item = sorted[i];
                String string = String.valueOf(item);
                trie.add(string);
                if (validateStructure && !(trie.size()==(i+1))) {
                    System.err.println("YIKES!! "+item+" caused a size mismatch.");
                    handleError(trie);
                    return false;
                }
                if (validateContents && !trie.contains(string)) {
                    System.err.println("YIKES!! "+item+" doesn't exist.");
                    handleError(trie);
                    return false;
                }
            }
            if (debugTime) {
                afterAddSortedTime = System.currentTimeMillis();
                addSortedTime += afterAddSortedTime-beforeAddSortedTime;
                if (debug>0) System.out.println("Trie add time = "+addSortedTime+" ms");
            }
            if (debugMemory) {
                afterMemory = DataStructures.getMemoryUse();
                memory += afterMemory-beforeMemory;
                if (debug>0) System.out.println("Trie memory use = "+(memory/(count+1))+" bytes");
            }

            contains = trie.contains(invalid);
            removed = trie.remove(invalid);
            if (contains || removed) {
                System.err.println("Trie invalidity check. contains="+contains+" removed="+removed);
                return false;
            } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed);

            if (debug>1) System.out.println(trie.toString());

            lookupTime = 0L;
            beforeLookupTime = 0L;
            afterLookupTime = 0L;
            if (debugTime) beforeLookupTime = System.currentTimeMillis();
            for (int item : sorted) {
                String string = String.valueOf(item);
                trie.contains(string);
            }
            if (debugTime) {
                afterLookupTime = System.currentTimeMillis();
                lookupTime += afterLookupTime-beforeLookupTime;
                if (debug>0) System.out.println("Trie lookup time = "+lookupTime/(count+1)+" ms");
            }

            if (debugTime) beforeRemoveSortedTime = System.currentTimeMillis();
            for (int i=sorted.length-1; i>=0; i--) {
                int item = sorted[i];
                String string = String.valueOf(item);
                trie.remove(string);
                if (validateStructure && !(trie.size()==i)) {
                    System.err.println("YIKES!! "+item+" caused a size mismatch.");
                    handleError(trie);
                    return false;
                }
                if (validateContents && trie.contains(string)) {
                    System.err.println("YIKES!! "+item+" still exists.");
                    handleError(trie);
                    return false;
                }
            }
            if (debugTime) {
                afterRemoveSortedTime = System.currentTimeMillis();
                removeSortedTime += afterRemoveSortedTime-beforeRemoveSortedTime;
                if (debug>0) System.out.println("Trie remove time = "+removeSortedTime+" ms");
            }

            contains = trie.contains(invalid);
            removed = trie.remove(invalid);
            if (contains || removed) {
                System.err.println("Trie invalidity check. contains="+contains+" removed="+removed);
                return false;
            } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed);

            if (testResults[testIndex]==null) testResults[testIndex] = new long[6];
            testResults[testIndex][0]+=addTime/count;
            testResults[testIndex][1]+=removeTime/count;
            testResults[testIndex][2]+=addSortedTime;
            testResults[testIndex][3]+=removeSortedTime;
            testResults[testIndex][4]+=lookupTime/(count+1);
            testResults[testIndex++][5]+=memory/(count+1);

            if (debug>1) System.out.println();
        }
        
        return true;
    }
0
1
分享到:
评论

相关推荐

    基于Trie树模仿谷歌百度搜索框提示

    我们将深入探讨Trie树数据结构以及如何使用Java来构建这样的系统。 首先,我们要理解什么是Trie树,也被称为前缀树或字典树。Trie树是一种特殊的树形数据结构,用于存储一个字符串集合。它的特点是每个节点代表一个...

    Trie树(字典树)的介绍及Java实现

    在Java中实现Trie树,首先定义节点类`TrieNode`,包括指向26个子节点的数组(表示26个英文字母),字符数据成员`data`,以及词频计数器`freq`。例如: ```java public class TrieNode { public TrieNode[] ...

    java数组-基于java实现的双数组Trie树.zip

    本项目聚焦于利用Java实现的双数组Trie树,这是一种在字符串处理和搜索中广泛使用的数据结构。 Trie树,又称“前缀树”或“字典树”,是一种用于存储键值对的数据结构,特别适用于字符串。它的主要特点是能以O(1)的...

    一棵Java字典树(trie)和它的增删改查

    一棵用List来存储子结点的字典树——当然,也可以用哈希表等形式存储。 这篇笔记记录了构建思路,文末是源码 一、构建思路 Step1 设计结点——数据结构 Step2 实现相应的操作方法——增删改查 Step1 设计结点 我们...

    Java实现字典树TrieTree

    总的来说,Java实现的字典树TrieTree是一个强大的工具,尤其适用于处理大量字符串数据,如四六级高频词汇的存储和查询。通过理解和运用这种数据结构,我们可以提高算法效率,优化应用程序的性能。

    基于Java链表实现的字典树(trie)

    基于Java链表实现的字典树(trie),实现了增删改查等功能,它说摘要必须大于50字我还能说啥啥啥啥

    实现Trie_java_

    Trie,又称为字典树或前缀树,是一种用于存储关联数组的数据结构,它允许我们高效地进行字符串的查找、插入和删除操作。在Java中实现Trie数据结构,可以帮助我们快速处理大量字符串数据,例如在搜索引擎中进行关键词...

    关于tire树

    ### 关于Trie树 #### 一、Trie树简介 Trie树,又称为前缀树或字典树,是一种用于存储字符串的树形数据结构。Trie树的优势在于可以高效地支持字符串的查找操作,尤其是在处理大量词汇时表现尤为突出。在搜索引擎、...

    字典树~java语言

    在IT领域,字典树(Trie,也称为前缀树或字首树)是一种用于存储动态集合或关联数组的数据结构。它以高效的方式处理字符串查询,尤其适用于查找具有共同前缀的单词。字典树的主要优点是它可以快速地通过共享前缀来...

    什么是Tire树1

    在Java中,Trie树的节点通常由一个枚举类型定义节点种类(如内部节点和叶子节点),并使用一个字符数组来存储指向子节点的指针。以下是一个简化的TrieNode类的示例: ```java enum NodeKind {LN, BN} class ...

    递归递归

    可能包含了Trie树的插入、查找、删除等操作的递归版本。 8. **理解递归** - 在学习递归时,重要的是理解递归函数如何工作,包括它的基本情况、递归情况以及如何返回结果。 - 实践是掌握递归的关键,编写递归程序...

    Java敏感词过滤源码

    总结来说,这个Java敏感词过滤源码是基于DFA和Trie树算法实现的,可以高效地在文本中检测和处理敏感词汇。对于需要在Java环境中进行文本审核或内容过滤的开发者来说,这是一个非常有价值的资源。通过深入理解和修改...

    JAVA制作树TREE

    在Java编程中,"制作树TREE"通常是指创建和操作数据结构中的树。树是一种非线性数据结构,由节点...如果需要更深入地学习,可以进一步研究各种特定类型的树,如堆、B树、Trie树等,以及它们在算法和数据处理中的应用。

    java程序敏感词分析

    在这个Java程序中,我们可能会使用Trie树或者KMP算法,因为它们对于大量关键词的查找效率较高。Trie树能够快速定位并匹配关键词,而KMP算法则适用于处理单个字符串的模式匹配问题。 1. **Trie树**:Trie树是一种...

    DoubleArrayTrie:高级结构双数组Trie树(DoubleArrayTrie) java实现

    DoubleArrayTrie Java编写的DoubleArrayTrie介绍用法// construct and buildDoubleArrayTrie dat = new DoubleArrayTrie(); for(String word: words) { dat.Insert(word); } System.out.println(dat.Base.length); ...

    Java 实现文章汉字关键词(违禁词)识别

    总结来说,Java实现文章汉字关键词(违禁词)识别需要结合多种数据结构和算法,如哈希表、Trie树、分词库和过滤算法。通过合理的设计和优化,我们可以构建出高效、准确的违禁词检测系统,满足内容审核的需求。

    字典树实例--java实现

    在IT领域,字典树(Trie,也称为前缀树或字典树)是一种用于存储动态集合或关联数组的数据结构。它允许我们快速查找、插入和删除字符串,特别是对于有公共前缀的字符串,效率非常高。这个实例是用Java语言实现的,...

    WordTrie.java

    Java字典树实现。 精简优化版处理。适合Java屏蔽字并进行。进行检测并处理的情况

    关于java树型结构

    - Trie树(字典树):用于快速查找字符串,常用于自动补全功能。 - B树和B+树:常用于数据库和文件系统的索引结构,能高效地处理大数据量的存储和检索。 4. **Java集合框架中的树结构**: - `java.util.TreeSet`...

Global site tag (gtag.js) - Google Analytics