Posted: 2012-08-14
Last modified: 2012-08-14
Python SuffixTree (suffix tree) with Chinese support

Source: https://github.com/edisonlz/suffixTree_ch

o SuffixTree.SuffixTree -- The suffix tree structure. This is a thin wrapper around strmat's stree data structure. The wrapper is not complete yet; I need to find some time to finish it. It appears to be good enough for simple use.

  Methods of SuffixTree:

  o SuffixTree(alphabet=STREE_ASCII)
    Construct a new SuffixTree. By default, the alphabet used by the SuffixTree is ASCII. Other choices include STREE_DNA, STREE_RNA, and STREE_PROTEIN.

  o add(string, id)
    Adds a string to the suffix tree with an id.

  o root()
    Returns the root SuffixNode of the tree.

  o num_nodes()
    Returns the total number of nodes held in the tree.

  o match(string)
    Given a string, traverse the suffix tree and return a 3-tuple (match_length, suffix_node, endpos).

o SuffixTree.SuffixNode (I need to fix the documentation here)

  Methods of SuffixNode:

    num_children()
    find_child(char ch)
    children()
    next()
    parent()
    suffix_link()
    edgelen()
    edgestr()
    getch()
    labellen()
    labelstr()
    ident()
    num_leaves()
    leaf(int leafnum)

o SuffixTree.SubstringDict -- An application of suffix trees toward substring matching. An example might help:

    >>> #coding=utf-8
    >>> from SuffixTree import SubstringDict
    >>> sd = SubstringDict()
    >>> sd.__setitem__("我是python程序员",1)
    >>> sd.__setitem__("我是ruby程序员",2)
    >>> sd.__setitem__("我是javascript程序员",3)
    >>> sd.__setitem__("我是android程序员",4)
    >>> sd.__setitem__("我还是DBA",4)
    >>> print sd["我是"]
    >>> print sd["我还是"]

  The same thing using item assignment:

    >>> sd = SubstringDict()
    >>> sd["我是python程序员"] = 1
    >>> sd["我是ruby程序员"] = 2
    >>> sd["我是javascript程序员"] = 3
    >>> sd["我是android程序员"] = 4
    >>> sd["我还是DBA"] = 5
    >>> print sd["我还是"]

  SubstringDict provides a mapping that allows lookups by substrings of keys. The keys do need to be strings, though.

Chinese is supported by base64-encoding the strings. This increases the data size by about 30% and costs some performance, but the loss is small.

Installing on a 64-bit system:

    ARCHFLAGS="-arch i386 -arch x86_64" python setup.py install
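The base64 approach mentioned above can be illustrated with the standard library alone. This is only a minimal sketch of the idea, not the code from the repository: the helper names encode_key and decode_key are made up for illustration, and the repository may encode its strings differently. It shows that the encoded key is pure ASCII (so it fits the default STREE_ASCII alphabet) and how much larger it gets.

```python
# -*- coding: utf-8 -*-
import base64


def encode_key(text):
    """Hypothetical helper: base64-encode UTF-8 text into an ASCII-only string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_key(encoded):
    """Hypothetical helper: reverse encode_key."""
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")


key = u"我是python程序员"
encoded = encode_key(key)

# Size of the original key in UTF-8 bytes versus the encoded form.
raw_len = len(key.encode("utf-8"))
overhead = float(len(encoded)) / raw_len - 1.0

# The encoded key contains only ASCII characters.
is_ascii = all(ord(c) < 128 for c in encoded)

# A prefix whose UTF-8 length is a multiple of 3 bytes keeps its encoding
# as a prefix of the full encoded key.
prefix_matches = encoded.startswith(encode_key(u"我是"))

print(raw_len)
print(len(encoded))
print(prefix_matches)
```

One caveat with plain base64: it works on groups of 3 bytes, so the encoding of a substring only shows up inside the encoding of the full key when the substring starts on a 3-byte boundary. Chinese characters in this example are 3 bytes each in UTF-8, which is why the "我是" prefix lines up; keys that mix in ASCII text of other lengths may not.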