锁定老帖子 主题:cc的sql优化实战
精华帖 (0) :: 良好帖 (0) :: 新手帖 (0) :: 隐藏帖 (0)
|
|
---|---|
作者 | 正文 |
发表时间:2011-11-11
最近帮同事优化了一个SQL,优化前66秒,优化后0.8秒,优化的过程中总结了一些常用的需要注意的事项, 没有时间整理格式,自己留个记录,附件是原始SQL和优化后SQL脚本。
1:拿到原SQL,第一是先运行一下,看看运行时间状况和执行计划,是否有全表扫描的情况。 发现没有全表的情况,首先大致排除索引使用的问题(索引情况很多,这里只是大致排除,因为 前提是知道查询源表数据量很大,而查询结果很小,理论上应该必须使用索引才能提高效率)。 2:开始考虑源数据量的问题。 先分析原有SQL的逻辑目的,即它究竟是想干什么,从哪些表里查询,查询什么条件的什么数据? 查看逻辑的经验是从SQL里层往外剥,一层一层执行(SQL执行的大致原理也是先内后外), 注意这里可能需要调整原有SQL的条件限制,目的只是查看它的查询逻辑,而不是优化调整。 那么看到里层第一个能执行的SQL如下: ① select (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ') C_YDJHCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_JHCJHJ') C_JHCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_XYWCJHJ') C_XYWCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ') C_XYWSYZWCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_KZCZCJ') C_KZCZCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ') C_SF_XYDJCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = '_C_CJZE') C_CJZE, d.region_code from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) 这里有人可能要说了,不对啊,最里层第一个SQL语句类似的是: select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ' 适当调整一下也能执行啊,是,调整一下的确能执行,分析一下也行,该SQL大致是 查询channel.c_award表中符合and ca.state = 1 and ca.element_code = 'C_YDJHCJ' 的数据,至于where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id 这两个条件就需要看外层的SQL了,那么来解释一下外层的SQL①,首先是确定数据源 from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d 然后是过滤数据 where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) 最后是解析表达式 select (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ') C_YDJHCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_JHCJHJ') C_JHCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_XYWCJHJ') C_XYWCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ') C_XYWSYZWCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_KZCZCJ') C_KZCZCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ') C_SF_XYDJCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = '_C_CJZE') C_CJZE, d.region_code 说解析表达式可能不太明白,那就说最后是执行SELECT(注:如果有ORDER BY的话,会在SELECT后执行)。 ①这条SQL在pl/sql developer里执行显示结果很快,但要全部查询显示结果需要27秒左右,说明该查询 结果条目数大(9012条)。 但是条目数大也不应该这么慢,SELECT * FROM channel_dept;查询出13518条数据也就需要11秒左右, 这说明肯定还有其它原因,那么仔细看①SQL语句的SELECT的内容,发现,每一个表达式都是一个SELECT语句,一共有 7个SELECT作为最终的查询表达式来一个一个解析。 具体到这个SQL,也就是说在 from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) 数据源中每查询一条数据出来都需要和①SQL里SELECT里的每一个表达式字段关联再执行该字段对应的自己的 SQL语句(拿出7个中的一个做例子) select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ') C_YDJHCJ, 最后才能跳出符合条件的表达式结果(即查询结果字段),也就是说每一条符合条件的数据源都要在经过7次 表达式的SELECT查询逻辑,但是这个7个SELECT查询逻辑虽然是严格的内连接过滤查询,只查询出一条符合的记录(速度应该会很快), where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1--这里的1也可以用c.state代替 and ca.element_code = 'C_YDJHCJ' 【因为分析channel.c_award表字段可知:字段channel_id,state,cycle_id,element_code联合唯一(具体业务 是不是先不管了,那是题外话)】 而且最后查询出一个字段作为①SQL的一个查询结果表达式字段,理论上和查询单个表达式字段一样,没有很大区别啊。
确实,单独执行以下SQL速度很快:0.047秒(因为太符合采用索引的条件了,大数据源,小结果,channel.c_award有168万数据,只查询出一条记录) ⑨select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ';
即使是7个这样的SQL叠加也不到半秒啊,那么性能问题是不是不在这里,哪里出了问题呢?? 先看看下面这些例子: ②SELECT (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 这条SQL大概要5-7秒, ③SELECT (select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 这条SQL需要1.7秒左右, ④SELECT (select 1 from dual ) C_YDJHCJ from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 这条SQL也是1.7秒左右 ⑤SELECT 1 C_YDJHCJ from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 这条SQL也是1.7秒左右 注意看②-⑤条SQL的区别, 第②条SQL和外层表进行了关联,③和④都没有关联,④把表达式SELECT的表变成了dual伪表, ⑤直接去掉了表达式SELECT,用字面量1代替 但③④⑤的效率一样, 【注:③之所以能和④,⑤效率差不多完全是因为③查询条件的每个字段都采用了index,从大数量中 搂出小数据,由此可见INDEX的强大。】 由此可见表达式的SELECT和外层数据源之间的关联才是性能消耗的原因,这就解释了之前分析⑨SQL时的疑惑了, 性能不是出在单表SQL上,而是内外层关联上,再加上7个字段都是这种内外层关联,一个5-7秒,加起来大概35秒,接近27秒的原因了, 那么到底是内外层关联影响大,还是字段子查询个数的影响大呢?? 看看这个例子: SELECT ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ2 from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 需要2.1秒左右,比一个字段的SELECT多了0.4秒, SELECT ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ2, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ3 from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 需要2.3秒,比两个字段的SQL又多了0.2秒, 可以发现增加字段的SELECT确实影响了效率,但由于都没有和外层关联,所以影响微乎其微。 再看下面几个: SELECT (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ2, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ3 from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 花了5.8秒, SELECT (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ2, ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ3 from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 这个花了10.9秒(如果是3个都和外层关联是14.2秒),由此可见和外层关联是影响效率的主要因素,如有有N个字段,只有其中 一个字段和外层关联,那么其它N-1个字段对效率的影响可以忽略,如果再加一个字段也和外层关联, 那么这个效率影响就是一个字段和外层关联的两倍,以此类推。
那么这里的7个SELECT字段难道就不能只和外层数据源关联一次吗??这样不就节省掉了6/7的时间???
再仔细观察②-⑤条SQL查询的结果都是9012条数据,都和⑤的结果条目数一致,也就是说和外层数据源结果 的条目数一致,和每个表达式SELECT的条目数无关。 以下是证明: select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ'; 结果是1条, 而 SELECT ( select ca.award_amount from channel.c_award ca where ca.cycle_id = 201109 and ca.channel_id = 10045 and ca.state = 1 and ca.element_code = 'C_YDJHCJ' ) C_YDJHCJ from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33) ; 仍然是9012条,和外层的条目数一致。
那这是在干什么,有关联关系,想想??想想外连接,对啊,这不就和外连接类似吗?? 左+号已右表数据条数为准,右+号已左表条目数为准,而SELECT不是也能当做FROM 的数据源吗?? 想到这里就可以进行修改了,将①SQL修改为 ⑥SELECT C_DMLX.channel_id,C_DMLX.region_code,/*C_DMLX.AWARD_AMOUNT,*/ C_YDJHCJ.AWARD_AMOUNT C_YDJHCJ, C_JHCJHJ.AWARD_AMOUNT C_JHCJHJ, C_XYWCJHJ.AWARD_AMOUNT C_XYWCJHJ, C_XYWSYZWCJHJ.AWARD_AMOUNT C_XYWSYZWCJHJ, C_KZCZCJ.AWARD_AMOUNT C_KZCZCJ, C_SF_XYDJCJ.AWARD_AMOUNT C_SF_XYDJCJ, C_CJZE.AWARD_AMOUNT C_CJZE FROM ( SELECT d.channel_id,d.region_code,c.cycle_id,c.award_amount FROM CHANNEL.C_AWARD C, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D WHERE C.CYCLE_ID = A.BILL_CYCLE AND C.STATE = 1 AND C.CHANNEL_ID = D.CHANNEL_ID AND D.STATE = 1 AND C.ELEMENT_CODE = 'C_DMLX' /*AND C.AMOUNT IN (311, 312, 313, 32, 33)*/ ) C_DMLX, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_YDJHCJ' AND ca.award_amount<>0 ) C_YDJHCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_JHCJHJ' AND ca.award_amount<>0 ) C_JHCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWCJHJ' AND ca.award_amount<>0 ) C_XYWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ' AND ca.award_amount<>0 ) C_XYWSYZWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_KZCZCJ' AND ca.award_amount<>0 ) C_KZCZCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ' AND ca.award_amount<>0 ) C_SF_XYDJCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = '_C_CJZE' AND ca.award_amount<>0 ) C_CJZE/*, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D*/ WHERE C_DMLX.CHANNEL_ID=C_YDJHCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_JHCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWSYZWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_KZCZCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_SF_XYDJCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_CJZE.Channel_Id(+)
执行时间为4.7秒左右,乘以7=32秒左右和原始①SQL执行时间27秒接近,而结果一样,达到目的。 那么这4.7秒也很慢啊,这是什么原因啊? 那么现在就来分析⑥SQL,这里是以C_DMLX为主表,其它表没有数据就补空,C_DMLX主表有9012条数据。 整个执行时间4.7秒,如果确定主表只查询出一条数据呢??稍微修改一下⑥SQL变为下面 ⑦SELECT C_DMLX.channel_id,C_DMLX.region_code,/*C_DMLX.AWARD_AMOUNT,*/ C_YDJHCJ.AWARD_AMOUNT C_YDJHCJ, C_JHCJHJ.AWARD_AMOUNT C_JHCJHJ, C_XYWCJHJ.AWARD_AMOUNT C_XYWCJHJ, C_XYWSYZWCJHJ.AWARD_AMOUNT C_XYWSYZWCJHJ, C_KZCZCJ.AWARD_AMOUNT C_KZCZCJ, C_SF_XYDJCJ.AWARD_AMOUNT C_SF_XYDJCJ, C_CJZE.AWARD_AMOUNT C_CJZE FROM ( SELECT d.channel_id,d.region_code,c.cycle_id,c.award_amount FROM CHANNEL.C_AWARD C, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D WHERE C.CYCLE_ID = A.BILL_CYCLE AND C.STATE = 1 AND C.CHANNEL_ID = D.CHANNEL_ID AND D.STATE = 1 AND C.ELEMENT_CODE = 'C_DMLX' /*AND C.AMOUNT IN (311, 312, 313, 32, 33)*/ ) C_DMLX, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_YDJHCJ' AND ca.award_amount<>0 ) C_YDJHCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_JHCJHJ' AND ca.award_amount<>0 ) C_JHCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWCJHJ' AND ca.award_amount<>0 ) C_XYWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ' AND ca.award_amount<>0 ) C_XYWSYZWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_KZCZCJ' AND ca.award_amount<>0 ) C_KZCZCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ' AND ca.award_amount<>0 ) C_SF_XYDJCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = '_C_CJZE' AND ca.award_amount<>0 ) C_CJZE/*, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D*/ WHERE C_DMLX.CHANNEL_ID=C_YDJHCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_JHCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWSYZWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_KZCZCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_SF_XYDJCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_CJZE.Channel_Id(+) AND C_DMLX.CHANNEL_ID=102096;//添加的条件 执行一下,天啊,只需要0.172秒,再多执行几次也一样,不敢相信自己的眼睛吧, ⑦SQL就添加了AND C_DMLX.CHANNEL_ID=102096用于过滤出一条数据,c_award表有168万数据, 从中查询9012条数据需要4.7秒,查询一条数据需要0.172秒,都用了索引,而索引的使用原则是 大数据量中查询小结果。【这页正验证了这个原理,不信的话你可以把条件都去掉,直接查询168万数据 出来,估计你等不到结果,这样查询大数据量的话不用索引反而比用索引快,你可以线下自己试试, 至于为什么,自己去查索引的相关知识吧。】 那么就来考虑这9012条数据是否都有用,其中每一条数据对最终的结果都会产生影响??? 是否可以把对结果不会产生影响的数据在where里给过滤掉,这样查询结果集就会变小,速度也会提升。 【用土话说就是PLSQL/DEVELOPER在查询结果大数据量时列出默认的数据和列出所有结果数据时间肯定不一样, 时间就浪费在列出所有数据上,就是点击那个两个向下箭头带一个横线的图标。(其实这是PLSQL/DEVELOPER的 优化,先缓存一屏幕数据出来,你要看全部了点击按钮它再取出所有需要的数据。那么真正该SQL执行所需要的 时间就是列出所有数据所需要的时间)】
要想知道⑥SQL查询出的9012条数据是否全部有用,就需要往外层看,看外层SQL拿这些数据干了些什么: select sum(nvl(C_YDJHCJ, 0)) C_YDJHCJ, sum(nvl(C_JHCJHJ, 0)) C_JHCJHJ, sum(nvl(C_XYWCJHJ, 0)) C_XYWCJHJ, sum(nvl(C_XYWSYZWCJHJ, 0)) C_XYWSYZWCJHJ, sum(nvl(C_KZCZCJ, 0)) C_KZCZCJ, sum(nvl(C_SF_XYDJCJ, 0)) C_SF_XYDJCJ, sum(nvl(C_CJZE, 0)) C_CJZE, region_code from (select (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_YDJHCJ') C_YDJHCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_JHCJHJ') C_JHCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_XYWCJHJ') C_XYWCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ') C_XYWSYZWCJHJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_KZCZCJ') C_KZCZCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ') C_SF_XYDJCJ, (select ca.award_amount from channel.c_award ca where ca.cycle_id = c.cycle_id and ca.channel_id = c.channel_id and ca.state = 1 and ca.element_code = '_C_CJZE') C_CJZE, d.region_code from channel.c_award c, channel.C_AWARD_BILL_CYCLE a, channel.channel_dept d where c.cycle_id = a.bill_cycle and c.state = 1 and c.channel_id = d.channel_id and d.state = 1 and c.element_code = 'C_DMLX' and c.amount in (311, 312, 313, 32, 33)) group by region_code 这个SQL是我们以上分析完毕的①SQL的外层SQL,我们看它的查询结果字段表达式都是SUM处理,还有空值转换处理, 也就是说是想把内层①SQL查询出来的9012条数据按照region_code字段分组,其它7个字段各自在行分组内按行求和, 注意是行求和,不是列求和。 那么除了region_code的其它需要求和的7个字段都是(0或空)的数据就对最后SUM的结果没有任何影响了。 那这些数据就可以先过滤掉了,使⑥SQL(⑥SQL是我们用于替换①SQL的优化后的SQL,可以看做等价,但效率比①快6/7) 查询结果减少,又不影响最终外层的结果。
符合7个字段都是(0或空)的数据有多少呢?? ⑧SELECT C_DMLX.channel_id FROM ( SELECT d.channel_id,d.region_code,c.cycle_id,c.award_amount FROM CHANNEL.C_AWARD C, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D WHERE C.CYCLE_ID = A.BILL_CYCLE AND C.STATE = 1 AND C.CHANNEL_ID = D.CHANNEL_ID AND D.STATE = 1 AND C.ELEMENT_CODE = 'C_DMLX' /*AND C.AMOUNT IN (311, 312, 313, 32, 33)*/ ) C_DMLX, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_YDJHCJ' AND ca.award_amount<>0 ) C_YDJHCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_JHCJHJ' AND ca.award_amount<>0 ) C_JHCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWCJHJ' AND ca.award_amount<>0 ) C_XYWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ' AND ca.award_amount<>0 ) C_XYWSYZWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_KZCZCJ' AND ca.award_amount<>0 ) C_KZCZCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ' AND ca.award_amount<>0 ) C_SF_XYDJCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = '_C_CJZE' AND ca.award_amount<>0 ) C_CJZE/*, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D*/ WHERE C_DMLX.CHANNEL_ID=C_YDJHCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_JHCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWSYZWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_KZCZCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_SF_XYDJCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_CJZE.Channel_Id(+)
AND C_YDJHCJ.award_amount IS NULL AND C_JHCJHJ.award_amount IS NULL AND C_XYWCJHJ.award_amount IS NULL AND C_XYWSYZWCJHJ.award_amount IS NULL AND C_KZCZCJ.award_amount IS NULL AND C_SF_XYDJCJ.award_amount IS NULL AND C_CJZE.award_amount IS NULL
查询结果为8982条,耗时2.094秒,仔细观察上面的SQL,为什么在7个FROM的数据源中SELECT时就加上 AND ca.award_amount<>0的条件,而不是再最外面添加不为0的判断呢?? 由于这7个ca.award_amount的值很不统一,有的为0值,有的为空值,假如直接在最后的条件中添加过滤 的话,需要添加成类似AND (C_YDJHCJ.award_amount=0 OR C_YDJHCJ.award_amount IS NULL),这样明显 就复杂了,效率上会不会有影响我没试,首先感觉就不清爽,肯定也快不了。 因为反正是外连接,在最后查询结果中显示0和空值对结果的影响又都一样,索性就先在每个子数据源里将为0的数据过滤掉, 这样每个子数据源的结果数据量肯定有所减小,和主表关联也会少,速度会提升,而最后有外连接做保证,没有关联数据的拿空行补齐, 相当于字面量操作,不需从关联表中查询结果填充,又节省了时间(我很抠门儿啊。。。) 这样就相当于把所有0的数据都变成了Null的数据,外面过滤直接统一用null判断就行了,不用再关注0的情况了, 数据正确性也能保证。何乐而不为呢??嘿嘿!
那有人又要问了,为什么只在子数据源中添加AND ca.award_amount<>0,如果把AND ca.award_amount IS NOT NULL 添加上不是更好?因为会过滤掉更多的数据。变成类似下面的写法 select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_YDJHCJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL 是,其实理论上这样会过滤掉更多无效数据,但假如某个子数据源中的数据是三种情况: 1:ca.award_amount字段0和空值都用空值来表现 2:ca.award_amount字段0和空值都用0来表现 3:ca.award_amount字段0代表该条数据在该字段上有意义,只是结果为0,空值代表该条数据在该字段上没有意义 【举例:某家店面有A项酬金,但酬金值为0,和某家店面直接没有参与过A酬金是不一样的意义的。】 那么就需要看每个具体子数据源的该字段当初的定义了,如果确定 为2情况,那么只需要添加AND ca.award_amount<>0就行了,不需要额外添加没有意义的AND ca.award_amount IS NOT NULL 。如果确定是1的情况,那么只需要添加AND ca.award_amount IS NOT NULL,而不需要额外添加没有意义的AND ca.award_amount<>0, 如果该字段是3这种情况的话,只能两个条件都加上了。
那具体到现在的场景到底这里该这么添加条件呢?因为开发人员对于数据操作很不规范, 这7个子数据源的字段没法确定到底是哪种情况,所以只能用最保险的第三种情况的条件了,就是两个都加上, 但我为什么只加了个过滤0的条件呢?其实这里就看实际的测试了,我把两个条件都加上如下: SELECT C_DMLX.channel_id FROM ( SELECT d.channel_id,d.region_code,c.cycle_id,c.award_amount FROM CHANNEL.C_AWARD C, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D WHERE C.CYCLE_ID = A.BILL_CYCLE AND C.STATE = 1 AND C.CHANNEL_ID = D.CHANNEL_ID AND D.STATE = 1 AND C.ELEMENT_CODE = 'C_DMLX' /*AND C.AMOUNT IN (311, 312, 313, 32, 33)*/ ) C_DMLX, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_YDJHCJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_YDJHCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_JHCJHJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_JHCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWCJHJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_XYWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_XYWSYZWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_KZCZCJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_KZCZCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_SF_XYDJCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = '_C_CJZE' AND ca.award_amount<>0 AND ca.award_amount IS NOT NULL ) C_CJZE/*, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D*/ WHERE C_DMLX.CHANNEL_ID=C_YDJHCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_JHCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWSYZWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_KZCZCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_SF_XYDJCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_CJZE.Channel_Id(+)
AND C_YDJHCJ.award_amount IS NULL AND C_JHCJHJ.award_amount IS NULL AND C_XYWCJHJ.award_amount IS NULL AND C_XYWSYZWCJHJ.award_amount IS NULL AND C_KZCZCJ.award_amount IS NULL AND C_SF_XYDJCJ.award_amount IS NULL AND C_CJZE.award_amount IS NULL
查询时间是2.094秒左右浮动,比只加AND ca.award_amount<>0的2.094秒一样,多执行几次也都几乎一样, 为什么会这样,其实好好想想,外关联从子数据源关联出一个空值字段,和自己补充一个空值字段能差多少呢? 关键是只有一个字段,又不是很多,而且关联查询出的值和关联不上自己需要补充的值一样,都是空值, ORACLE对于这种情况肯定有所优化,如果这个字段不空,又或者需要关联出子数据源的很多个字段出来,肯定 会有影响,这里只有空值,还就一个字段,所以不用考虑影响了。 那么说明没有说明区别,所以这里分析到就行了,如果两种执行结果差距很大,再来具体分析有说明问题。 由于两种写法的执行时间一样,不管里面添不添加空判断,外层条件的写法都一样,都需要添加空判断,时间还一样,那还不如里面 不添加AND ca.award_amount IS NOT NULL条件了,还能减小掉SQL得字符数量,让ORACLE解析SQL语句快一点了。 嘿嘿! 其实上面的分析也说明一开始对字段好的规范定义多么重要,由于IS NULL 和 IS NOT NULL都不能使用索引, 所以考虑在设计表的时候就对索引列设置为NOT NULL约束。
好了, 现在先YY一下,全量数据为9012条,其中里面有8982条没用的数据, 剩下30条才是最后决定结果的有用数据,坑爹呢。。。如果把这8982条数据去掉,最后只显示30条有用的数据, 那效率肯定会有质的提高。 要怎么把他们去掉呢???第一个想到的就是not in,对,⑦SQL not in ⑧就是想要的结果了,关联字段是 channel_id。 这样是没有问题,但经验告诉我们not in效率低下,为什么低下,因为它会使索引失效,类似于笛卡尔集的遍历。 但channel_id是有索引的啊,这么利用它得到和not in 一样的结果呢?? -----外连接加Null判断-----这是个替换not in的最常用的方法, 也就是⑦.channel_id=⑧.channel_id(+) and ⑧.channel_id is null
①SQL变为⑥SQL最终优化为下面的SQL ⑩SELECT q.* FROM ( SELECT C_DMLX.channel_id,C_DMLX.region_code,/*C_DMLX.AWARD_AMOUNT,*/ nvl(C_YDJHCJ.AWARD_AMOUNT,0) C_YDJHCJ, nvl(C_JHCJHJ.AWARD_AMOUNT,0) C_JHCJHJ, nvl(C_XYWCJHJ.AWARD_AMOUNT,0) C_XYWCJHJ, nvl(C_XYWSYZWCJHJ.AWARD_AMOUNT,0) C_XYWSYZWCJHJ, nvl(C_KZCZCJ.AWARD_AMOUNT,0) C_KZCZCJ, nvl(C_SF_XYDJCJ.AWARD_AMOUNT,0) C_SF_XYDJCJ, nvl(C_CJZE.AWARD_AMOUNT,0) C_CJZE FROM ( SELECT d.channel_id,d.region_code,c.cycle_id,c.award_amount FROM CHANNEL.C_AWARD C, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D WHERE C.CYCLE_ID = A.BILL_CYCLE AND C.STATE = 1 AND C.CHANNEL_ID = D.CHANNEL_ID AND D.STATE = 1 AND C.ELEMENT_CODE = 'C_DMLX' /*AND C.AMOUNT IN (311, 312, 313, 32, 33)*/ ) C_DMLX, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_YDJHCJ' AND ca.award_amount<>0 ) C_YDJHCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_JHCJHJ' AND ca.award_amount<>0 ) C_JHCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWCJHJ' AND ca.award_amount<>0 ) C_XYWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_XYWSYZWCJHJ' AND ca.award_amount<>0 ) C_XYWSYZWCJHJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_KZCZCJ' AND ca.award_amount<>0 ) C_KZCZCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = 'C_SF_XYDJCJ' AND ca.award_amount<>0 ) C_SF_XYDJCJ, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A where ca.cycle_id = A.BILL_CYCLE
and ca.state = 1 and ca.element_code = '_C_CJZE' AND ca.award_amount<>0 ) C_CJZE/*, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D*/ WHERE C_DMLX.CHANNEL_ID=C_YDJHCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_JHCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_XYWSYZWCJHJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_KZCZCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_SF_XYDJCJ.Channel_Id(+) AND C_DMLX.CHANNEL_ID=C_CJZE.Channel_Id(+)
) q,
( SELECT C_DMLX.channel_id FROM ( SELECT d.channel_id,d.region_code,c.cycle_id,c.award_amount FROM CHANNEL.C_AWARD C, CHANNEL.C_AWARD_BILL_CYCLE A, CHANNEL.CHANNEL_DEPT D WHERE C.CYCLE_ID = A.BILL_CYCLE AND C.STATE = 1 AND C.CHANNEL_ID = D.CHANNEL_ID AND D.STATE = 1 AND C.ELEMENT_CODE = 'C_DMLX' /*AND C.AMOUNT IN (311, 312, 313, 32, 33)*/ ) C_DMLX, ( select ca.channel_id,ca.award_amount from channel.c_award ca, CHANNEL.C_AWARD_BILL_CYCLE A
声明:ITeye文章版权属于作者,受法律保护。没有作者书面许可不得转载。
推荐链接
|
|
返回顶楼 | |
浏览 2194 次