A lesson of RegExp: 50x faster with just one line patch -

andrewstz

浏览: 132305 次
性别:
来自: 深圳

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

A lesson of RegExp: 50x faster with just one line patch

博客分类：

javascript前端

2009-05-11

A lesson of RegExp: 50x faster with just one line patch

While I'm developing WebSHi (which is the fastest syntax highlighter written by JavaScript), I also write many performance testings for other rivals. One of them is SyCODE Syntax Highlighter , which is written by silverdrag (水月). It derives from the famous SyntaxHighlighter 1.5.x (dp.SH for short) and as silverdrag's words, it should be 5x to 10x faster than original dp.SH .

But unfortunately, my testings can't prove it. Though it won't trigger the "script slowly" dialog like dp.SH when highlighting large file, in most cases, it only shows 2x faster than dp.SH on IE6. On the other side, when I tested it on FF, I was so surprised that SyCODE is extremely slow, it will cost 5s+ for processing a 700 lines JavaScript file while the original dp.SH only half second.

It's very strange, so I digged into SyCODE. I found that SyCODE highlights more words for JavaScript language (currently all my testcases are to highlight some JavaScript source code files). The original dp.SH (and most other rivals) only highlights the keywords of JavaScript language. SyCODE also highlights global names likeArray , Boolean , String , etc., and properties and methods like alert , charAt , onclick etc.. That means SyCODE need to do more text searching and replacement. I disabled such features and tested again, this time SyCODE is 2x faster than dp.SH.

So you will think the problem is just the extra words replacement. And what interesting is it just affect FF a lot, even SyCODE do more text processing, it's still faster than (or at least as fast as) dp.SH on other browsers (Safari, Chrome, Opera and IE).

I'm curious about the root cause of the problem. After some researching, I located it. Just one simple function:


GetKeywords: function(str) {
 return '\\b' + str.replace(/\s+/g, '\\b|\\b') + '\\b';
},

The function GetKeywords is used to generate a regexp for keywords search and replacement. For example,GetKeywords("abstract break byte case catch") will return a regexp/\babstract\b|\bbreak\b|\bbyte\b|\bcase\b|\bcatch\b/ .

The code is straightforward, but it's bad and generate a very inefficient regexp.

The keypoint is \b , \b is a word boundary assertion. To test whether a position is a word boundary, the regexp engine need to consider both the left character of the position and the right character of the position. If one is a word character (aka a-z, A-Z, 0-9 and the underscore "_") and the other is not, then it's a word boundary. You see it need both look forward one char and look backward one char. Though \b assertion is not very expensive, each failed match of /\babstract\b|\bbreak\b|\bbyte\b|\bcase\b|\bcatch\b/ will do such look forward/backward 10 times, and JavaScript language has 50+ keywords means each failed match will do 100 times, and SyCODE add 400+ properties/methods words means extra 800+ times!

Of coz, \b assertion can be easily optimized, but our test result shows that FF's regexp engine doesn't do a good optimization at all.

Anyway, there is a very cheap way to solve the problem. Most of those \b assertions are unnecessary ./\babstract\b|\bbreak\b|\bbyte\b|\bcase\b|\bcatch\b/ can be rewrite as/\b(?:abstract|break|byte|case|catch)\b/ , those two regexp are equal, the only difference is the latter only need two \b assertion. Yes, we just need two , even SyCODE add 400+ words, we still just need two .

It's trivial to fix GetKeywords :


GetKeywords: function(str) {
 return '\\b' + str.replace(/\s+/g, '\\b|\\b') + '\\b';

 return '\\b(' + str.replace(/\s+/g, '|') + ')\\b';

},

Let's see the result of applying this one line patch:

Test results of FF3 code lines original patched

700	6.7s	0.1s
1600	15.5s	0.3s
4300	41.5s	0.7s

Oops, one line code cause 50x difference.

Besides FF, the patch also help other browsers a lot.

Test results of 4300 lines of code browser original patched

IE6	6.3s	2.8s
Safari3	2.6s	0.8s
Opera9	8.2s	1.7s
Chrome1	2.3s	0.5s

As we can see, even Chrome, which introduce a very optimized regexp engine, also shows at least 4x difference.

SyCODE derives from dp.SH, the GetKeywords function is also the legacy from dp.SH, and even the newSyntaxHighlighter 2 still use the similar code. Because dp.SH only highlight about 50 keywords for JavaScript langauge, you will not see performance issue like SyCODE, but applying this one line patch still introduce 20% faster on most browsers.

But this patch is not the end. In next article, I will discuss a complex technique to get another 10% to 40% faster for keywords search/replacement.

分享到：

less学习 | JavaScript的未来方向之观察

2014-03-13 17:14
浏览 541
评论(0)
分类:Web前端
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

A lesson of RegExp: 50x faster with just one line patch

写对正则：一行代码，速度差50倍

2009-05-11

A lesson of RegExp: 50x faster with just one line patch

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

A lesson of RegExp: 50x faster with just one line patch

写对正则：一行代码，速度差50倍

2009-05-11

A lesson of RegExp: 50x faster with just one line patch

评论

发表评论

相关推荐

解决vue-resource的post的问题

Lisp之根源

css、js干货

js类型, 值, 变量

双向绑定框架学习研究

双向绑定框架学习研究

2015-11~ 12 react

远程js调试工具

苹果授权深圳

Jasmine入门教程

年度回顾：2013年前端技术盘点

JS逗号运算符的用法详解

闭包原理详解——深入浏览器底层解析js的实际过程

后台ContentType参数及对应类型

js版压缩解压算法

jquery的ajax请求忽然总是报错

基于有限状态机与交互组件设计与实现

浏览器底层相关

网站统计中的数据收集原理及实现

第三方应用中间件收集

最近访客更多访客>>