Business Intelligence Research (16): Improving Mondrian Performance with Materialized View + Dimension

Author: jjjava (Wuhan) | Posted: 2007-06-10

Business Intelligence Research (16): Improving Mondrian Performance with Materialized View + Dimension (Part 2)

Continuing from the definitions in the previous post, we define the following two dimensions:

    CREATE DIMENSION PRODUCT_DIM
        LEVEL "product_id"         IS "product"."product_id"
        LEVEL "brand_name"         IS "product"."brand_name"
        LEVEL "product_class_id"   IS "product_class"."product_class_id"
        LEVEL "product_category"   IS "product_class"."product_category"
        LEVEL "product_department" IS "product_class"."product_department"
        LEVEL "product_family"     IS "product_class"."product_family"
        HIERARCHY PRODUCT_ROLLUP (
            "product_id"         CHILD OF
            "brand_name"         CHILD OF
            "product_class_id"   CHILD OF
            "product_category"   CHILD OF
            "product_department" CHILD OF
            "product_family"
            JOIN KEY ("product"."product_class_id") REFERENCES "product_class_id"
        )
        ATTRIBUTE "product_id"       DETERMINES ("product_name")
        ATTRIBUTE "product_class_id" DETERMINES ("product_subcategory");

    CREATE DIMENSION TIME_DIM
        LEVEL time    IS "time_by_day"."time_id"
        LEVEL month   IS "time_by_day"."month_of_year"
        LEVEL quarter IS "time_by_day"."quarter"
        LEVEL year    IS "time_by_day"."the_year"
        HIERARCHY TIME_ROLLUP (
            time    CHILD OF
            month   CHILD OF
            quarter CHILD OF
            year
        )
        ATTRIBUTE time DETERMINES ("time_by_day"."the_date");

Next we create the materialized view. Make sure QUERY_REWRITE_ENABLED and QUERY_REWRITE_INTEGRITY are already set correctly: query rewrite must be enabled (QUERY_REWRITE_ENABLED = TRUE), and because a dimension is a declaration that Oracle does not enforce, QUERY_REWRITE_INTEGRITY must be TRUSTED or STALE_TOLERATED for Oracle to use it during rewrite (under the default ENFORCED setting the dimension is ignored).
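Because TRUSTED (or STALE_TOLERATED) integrity makes Oracle take these hierarchy declarations on faith rather than check them, it pays to confirm that the data really follows each CHILD OF link: every child-level value must map to exactly one parent-level value, or rewritten queries can return wrong results. Oracle provides DBMS_DIMENSION.VALIDATE_DIMENSION for this; the Python below is only a minimal, self-contained sketch of the same idea over in-memory rows, not Oracle's implementation. The function `find_hierarchy_violations`, the `ROWS` data and its tuple layout are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like time_by_day:
# (time_id, month_of_year, quarter, the_year). Made up for illustration.
ROWS = [
    (367, 1, "Q1", 1997),
    (368, 1, "Q1", 1997),
    (732, 1, "Q1", 1998),
]


def find_hierarchy_violations(rows, child_idx, parent_idx):
    """Return {child_value: parent_values} for every child value that maps
    to more than one parent value, i.e. every value that breaks CHILD OF."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child_idx]].add(row[parent_idx])
    return {child: vals for child, vals in parents.items() if len(vals) > 1}


# One check per CHILD OF link in TIME_ROLLUP: time -> month -> quarter -> year.
LINKS = [("time -> month", 0, 1), ("month -> quarter", 1, 2), ("quarter -> year", 2, 3)]

for label, child_idx, parent_idx in LINKS:
    bad = find_hierarchy_violations(ROWS, child_idx, parent_idx)
    print(label, "OK" if not bad else f"violations: {bad}")
```

The sample rows are deliberately chosen so that the quarter -> year link fails: a bare label such as "Q1" recurs in every year, so it cannot have a single parent. A year-qualified key (for example "1997-Q1") is a common way to keep such a level valid.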
    CREATE MATERIALIZED VIEW PRODUCT_SUM
        BUILD IMMEDIATE
        REFRESH ON DEMAND
        ENABLE QUERY REWRITE
    AS
    SELECT "time_by_day"."time_id",
           "product"."product_id",
           "product_class"."product_class_id",
           SUM("sales_fact_1997"."store_sales") AS "sum_store_sales",
           SUM("sales_fact_1997"."store_cost")  AS "sum_store_cost"
    FROM "time_by_day" "time_by_day",
         "product" "product",
         "product_class" "product_class",
         "sales_fact_1997" "sales_fact_1997"
    WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
      AND "sales_fact_1997"."product_id" = "product"."product_id"
      AND "product"."product_class_id" = "product_class"."product_class_id"
    GROUP BY "time_by_day"."time_id",
             "product"."product_id",
             "product_class"."product_class_id";

Now we run SET AUTOTRACE ON in SQL*Plus and execute:

    SELECT "time_by_day"."the_date",
           "product_class"."product_family",
           SUM("sales_fact_1997"."store_sales"),
           SUM("sales_fact_1997"."store_cost")
    FROM "time_by_day" "time_by_day",
         "product" "product",
         "product_class" "product_class",
         "sales_fact_1997" "sales_fact_1997"
    WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
      AND "sales_fact_1997"."product_id" = "product"."product_id"
      AND "product"."product_class_id" = "product_class"."product_class_id"
    GROUP BY "time_by_day"."the_date", "product_class"."product_family";

Figure 1 shows that when product is rolled up to its highest level, product_family, Oracle's execution plan performs the aggregation from PRODUCT_SUM. That is because the dimension we created tells Oracle about the hierarchy within product: PRODUCT_DIM states both that product_id determines product_name and that product_id rolls up to product_family. Similarly, we roll Time up to its highest level:

    SELECT "time_by_day"."the_year",
           "product"."product_name",
           SUM("sales_fact_1997"."store_sales"),
           SUM("sales_fact_1997"."store_cost")
    FROM "time_by_day" "time_by_day",
         "product" "product",
         "product_class" "product_class",
         "sales_fact_1997" "sales_fact_1997"
    WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
      AND "sales_fact_1997"."product_id" = "product"."product_id"
      AND "product"."product_class_id" = "product_class"."product_class_id"
    GROUP BY "time_by_day"."the_year", "product"."product_name";
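Oracle is allowed to answer these queries from PRODUCT_SUM because SUM is distributive: summing the pre-aggregated (time_id, product_id, product_class_id) rows, after joining back to the dimension tables only to climb the hierarchy, gives the same totals as summing the raw fact rows. The sketch below checks this with Python's built-in sqlite3 on a tiny, made-up FoodMart-like schema, rolling both hierarchies up at once (the_year x product_family). SQLite has no materialized views or query rewrite, so an ordinary table named product_sum stands in for the MV and the "rewrite" is written by hand; all table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product_class (product_class_id INTEGER PRIMARY KEY, product_family TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT, product_class_id INTEGER);
CREATE TABLE time_by_day (time_id INTEGER PRIMARY KEY, the_year INTEGER);
CREATE TABLE sales_fact_1997 (time_id INTEGER, product_id INTEGER, store_sales REAL, store_cost REAL);

INSERT INTO product_class VALUES (1, 'Food'), (2, 'Drink');
INSERT INTO product VALUES (10, 'Bread', 1), (11, 'Cheese', 1), (20, 'Juice', 2);
INSERT INTO time_by_day VALUES (367, 1997), (368, 1997);
INSERT INTO sales_fact_1997 VALUES
  (367, 10, 4.0, 2.0), (367, 10, 1.0, 0.5), (367, 11, 3.0, 1.5),
  (368, 20, 2.0, 1.0), (368, 11, 5.0, 2.5);

-- Stand-in for the PRODUCT_SUM materialized view: same grain, same SUMs.
CREATE TABLE product_sum AS
SELECT t.time_id, p.product_id, pc.product_class_id,
       SUM(f.store_sales) AS sum_store_sales,
       SUM(f.store_cost)  AS sum_store_cost
FROM time_by_day t, product p, product_class pc, sales_fact_1997 f
WHERE f.time_id = t.time_id AND f.product_id = p.product_id
  AND p.product_class_id = pc.product_class_id
GROUP BY t.time_id, p.product_id, pc.product_class_id;
""")

# Aggregate the base tables directly: year x product_family.
direct = cur.execute("""
SELECT t.the_year, pc.product_family, SUM(f.store_sales), SUM(f.store_cost)
FROM time_by_day t, product p, product_class pc, sales_fact_1997 f
WHERE f.time_id = t.time_id AND f.product_id = p.product_id
  AND p.product_class_id = pc.product_class_id
GROUP BY t.the_year, pc.product_family
ORDER BY 1, 2""").fetchall()

# Hand-written "rewrite": roll the summary up, joining back to the dimension
# tables only to climb time_id -> the_year and product_class_id -> product_family.
rewritten = cur.execute("""
SELECT t.the_year, pc.product_family, SUM(s.sum_store_sales), SUM(s.sum_store_cost)
FROM product_sum s
JOIN time_by_day t ON t.time_id = s.time_id
JOIN product_class pc ON pc.product_class_id = s.product_class_id
GROUP BY t.the_year, pc.product_family
ORDER BY 1, 2""").fetchall()

print(direct)
print(rewritten == direct)
conn.close()
```

The same argument would not hold for a non-distributive measure such as COUNT(DISTINCT ...), which cannot simply be re-summed from a coarser summary.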
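The execution plan in Figure 2 likewise shows that when only Time is aggregated, Oracle still aggregates from the PRODUCT_SUM table: just as time_id determines the_date, time_id can also be rolled up to year. The last query aggregates product and time together:

    SELECT "time_by_day"."the_year",
           "product_class"."product_family",
           SUM("sales_fact_1997"."store_sales"),
           SUM("sales_fact_1997"."store_cost")
    FROM "time_by_day" "time_by_day",
         "product" "product",
         "product_class" "product_class",
         "sales_fact_1997" "sales_fact_1997"
    WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
      AND "sales_fact_1997"."product_id" = "product"."product_id"
      AND "product"."product_class_id" = "product_class"."product_class_id"
    GROUP BY "time_by_day"."the_year", "product_class"."product_family";

Again, Oracle reads its data from PRODUCT_SUM.

With materialized views, we can build the cube we want to analyze as one or a few very large materialized views. Once the correct dimensions are in place, queries run fast because the data has already been computed in advance. On top of that, the dimensions tell Oracle the hierarchical relationships in the data, which saves us from building unnecessary materialized views, so the pre-computed data is used more fully.

Creating dimensions is fairly easy: you only need to understand the hierarchical relationships in your data. Deciding which materialized views to build is harder. Here is a simple example. In Mondrian's FoodMart sample, we can freely combine thirteen items when building a cube: Measures, Product, Customers, Education Level, Gender, Marital Status, Promotion Media, Promotions, Store, Store Size in SQFT, Store Type, Time, and Yearly Income. A user might analyze along Product, might use Product as a Measure, or might not select it at all, so we cannot build a cube that anticipates every case.

P.S. If you really insist on covering everything, here is a hint. $\binom{13}{3}$ is a combinations count from mathematics: the number of ways to choose 3 of 13 items, regardless of order, $\binom{13}{3} = \frac{13 \times 12 \times 11}{1 \times 2 \times 3} = 286$. Here it is the number of ways to pick 3 of the 13 items to build a fact table (summary) from, with the remaining ten dimensions left out. Summing over every possible number of selected items gives

$$\binom{13}{1} + \binom{13}{2} + \binom{13}{3} + \cdots + \binom{13}{13} = 2^{13} - 1 = 8191$$

non-empty combinations, or $2^{13} = 8192$ if the empty selection is also counted. In other words, you would have to build about 8192 materialized views to cover every case.

How to build the materialized views depends mainly on how you define your analysis dimensions. If you have many fact tables and many dimensions, you cannot build a materialized view over all the dimensions for every fact table. Our product dimension holds relatively little data; with as many products as a large supermarket, you might also need materialized views at the brand_name or subcategory level. So I recommend analyzing the SQL your users actually run to find out which data they query most often.

The next post will continue with how Mondrian can use materialized views to improve performance.

Figure 1: execution plan with product rolled up to the highest level.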
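Figure 2: execution plan with Time rolled up to the highest level.
Figure 3: execution plan with both Time and product rolled up to the highest level.
Figure 4: what drilling down on product looks like.
Figure 5: thirteen kinds of data; how should the materialized views be built?

Notice: the copyright of ITeye articles belongs to their authors and is protected by law. Reproduction without the author's written permission is prohibited.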