- 浏览: 34402 次
- 性别:
- 来自: 南京
最新评论
技术思路整理:
【1】先根据表中数据计算出需要的值,存储在临时表里面,比如计算日活跃度的语句:
另外,计算历史峰值的逻辑比较复杂,例子如下:
该行语句先从zl_daily_kv 表里面读取出历史峰值,通过正则表达式提取出新设备的历史峰值(提取前的格式为:677777||20161026,前面的数值可能带小数点),然后将历史峰值与今天的新数据对比,找出新的最大值,并更新相应的峰值。
【2】拼接JSON字符串
【3】以‘daily_static’为键值存储第二步的结果
下面是完整的程序:
【1】先根据表中数据计算出需要的值,存储在临时表里面,比如计算日活跃度的语句:
with tbl_dau as ( select 'dau' as name, count(distinct deviceid) as cnt from ( select deviceid from src_huidu_zl.event3 where hdfs_par = '20170108' union all select deviceid from src_huidu_zl.page3 where hdfs_par = '20170108' ) as tmp )
另外,计算历史峰值的逻辑比较复杂,例子如下:
historymaxnewdev as( select if(regexp_extract(json_value, '\"maxnewdev\"\:(\\d+(\\.\\d+)*)',1)='','1',regexp_extract(json_value, '\"maxnewdev\"\:(\\d+(\\.\\d+)*)',1)) as historymaxvalue,if(regexp_extract(json_value, '\"maxnewdev\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)='','19710101',regexp_extract(json_value, '\"maxnewdev\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)) as historydate from ana_fx_goal.zl_daily_kv where json_key = 'daily_static' and hdfs_par='20170108' )
该行语句先从zl_daily_kv 表里面读取出历史峰值,通过正则表达式提取出新设备的历史峰值(提取前的格式为:677777||20161026,前面的数值可能带小数点),然后将历史峰值与今天的新数据对比,找出新的最大值,并更新相应的峰值。
【2】拼接JSON字符串
【3】以‘daily_static’为键值存储第二步的结果
下面是完整的程序:
insert overwrite ana_fx_goal.zl_daily_kv partition(hdfs_par='20170108') select * from ( select cast( concat(substr('20170108', 1, 4), '-', substr('20170108', 5, 2), '-', substr('20170108', 7, 2)) as timestamp) as day_begin, NOW() as generate_time, 'daily_static' as json_key, json_output as json_value from ( select concat( '{', group_concat( concat(wrapped_column_key, ':', wrapped_column_value), ','), '}' ) as json_output from ( select concat('"', name, '"') as wrapped_column_key, case when cnt is null then 'null' when 'double' = 'string' then concat('"', cast(cnt as string), '"') else cast(cnt as string) end as wrapped_column_value from ( with tbl_dau as ( select 'dau' as name, count(distinct deviceid) as cnt from ( select deviceid from src_huidu_zl.event3 where hdfs_par = '20170108' union all select deviceid from src_huidu_zl.page3 where hdfs_par = '20170108' ) as tmp ), tbl_launchrecords as ( select 'launchrecords' as name, count(*) as cnt from src_huidu_zl.launch2 where hdfs_par = '20170108' ), tbl_newdev as ( select 'newdev' as name, count(distinct deviceid) as cnt from ana_fx_middle.devappear2 where first_appear_hdfspar = '20170108' and hdfs_par = '20170108' ), tbl_avgusetime as ( select 'avgusetime' as name, avg(use_time) as cnt from src_huidu_zl.launch2 where hdfs_par = '20170108' ), tbl_avgdevtime as ( select 'avgdevtime' as name, sum(use_time)/count(distinct deviceid) as cnt from src_huidu_zl.launch2 where hdfs_par = '20170108' ), tbl_alldev as ( select 'alldev' as name, count(distinct deviceid) as cnt from ana_fx_middle.devappear2 where hdfs_par = '20170108' ), tbl_mau as ( select 'mau' as name, count(distinct deviceid) as cnt from ( select deviceid from src_huidu_zl.event3 where hdfs_par between '20161211' and '20170108' union all select deviceid from src_huidu_zl.page3 where hdfs_par between '20161211' and '20170108' ) as tmp ), tbl_avglaunchrecords as ( select 'avglaunchrecords' as name, count(*)/count(distinct deviceid) as cnt from src_huidu_zl.launch2 where hdfs_par = '20170108' ), tbl_launchdays as ( select sum(cnt) as cnt from ( select hdfs_par, count(distinct deviceid) as cnt from src_huidu_zl.launch2 where hdfs_par between '20161211' and '20170108' group by hdfs_par ) as tmp ), tbl_launchdevs as ( select count(distinct deviceid) as cnt from src_huidu_zl.launch2 where hdfs_par between '20161211' and '20170108' ), tbl_28new7retain as ( select '28new7retain' as name, count(distinct tbl_7day.deviceid) as cnt from ( select deviceid from src_huidu_zl.event3 where hdfs_par between '20170101' and '20170108' union all select deviceid from src_huidu_zl.page3 where hdfs_par between '20170101' and '20170108' ) as tbl_7day left join ( select distinct deviceid as deviceid from ana_fx_middle.devappear2 where first_appear_hdfspar = '20161211' ) as tbl_28day on tbl_7day.deviceid = tbl_28day.deviceid where tbl_28day.deviceid != 'NULL' ), tbl_28new as ( select '28new' as name, count(distinct deviceid) as cnt from ana_fx_middle.devappear2 where first_appear_hdfspar = '20161211' ), historymaxdauvalue as( select if(regexp_extract(json_value, '\"maxdau\"\:(\\d+(\\.\\d+)*)',1)='','1',regexp_extract(json_value, '\"maxdau\"\:(\\d+(\\.\\d+)*)',1)) as historymaxvalue,if(regexp_extract(json_value, '\"maxdau\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)='','19710101',regexp_extract(json_value, '\"maxdau\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)) as historydate from ana_fx_goal.zl_daily_kv where json_key = 'daily_static' and hdfs_par='20170108' ), historymaxlaunchrecords as( select if(regexp_extract(json_value, '\"maxlaunchrecords\"\:(\\d+(\\.\\d+)*)',1)='','1',regexp_extract(json_value, '\"maxlaunchrecords\"\:(\\d+(\\.\\d+)*)',1)) as historymaxvalue,if(regexp_extract(json_value, '\"maxlaunchrecords\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)='','19710101',regexp_extract(json_value, '\"maxlaunchrecords\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)) as historydate from ana_fx_goal.zl_daily_kv where json_key = 'daily_static' and hdfs_par='20170108' ), historymaxnewdev as( select if(regexp_extract(json_value, '\"maxnewdev\"\:(\\d+(\\.\\d+)*)',1)='','1',regexp_extract(json_value, '\"maxnewdev\"\:(\\d+(\\.\\d+)*)',1)) as historymaxvalue,if(regexp_extract(json_value, '\"maxnewdev\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)='','19710101',regexp_extract(json_value, '\"maxnewdev\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)) as historydate from ana_fx_goal.zl_daily_kv where json_key = 'daily_static' and hdfs_par='20170108' ), historymaxavgusetime as ( select if(regexp_extract(json_value, '\"maxavgusetime\"\:(\\d+(\\.\\d+)*)',1)='','1',regexp_extract(json_value, '\"maxavgusetime\"\:(\\d+(\\.\\d+)*)',1)) as historymaxvalue,if(regexp_extract(json_value, '\"maxavgusetime\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)='','19710101',regexp_extract(json_value, '\"maxavgusetime\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)) as historydate from ana_fx_goal.zl_daily_kv where json_key = 'daily_static' and hdfs_par='20170108' ), historymaxavgdevtime as ( select if(regexp_extract(json_value, '\"maxavgdevtime\"\:(\\d+(\\.\\d+)*)',1)='','1',regexp_extract(json_value, '\"maxavgdevtime\"\:(\\d+(\\.\\d+)*)',1)) as historymaxvalue,if(regexp_extract(json_value, '\"maxavgdevtime\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)='','19710101',regexp_extract(json_value, '\"maxavgdevtime\":(\"\\d+(\\.\\d+)*)\\|\\|(\\d{8})\"',3)) as historydate from ana_fx_goal.zl_daily_kv where json_key = 'daily_static' and hdfs_par='20170108' ) select name,cast(cnt as string) as cnt from tbl_dau union all select name,cast(cnt as string) as cnt from tbl_launchrecords union all select name,cast(cnt as string) as cnt from tbl_newdev union all select name,cast(cnt as string) as cnt from tbl_avgusetime union all select name,cast(cnt as string) as cnt from tbl_avgdevtime union all select name,cast(cnt as string) as cnt from tbl_alldev union all select name,cast(cnt as string) as cnt from tbl_mau union all select name,cast(cnt as string) as cnt from tbl_avglaunchrecords union all select 'avglaunchdays' as name, cast((tbl_launchdays.cnt/tbl_launchdevs.cnt) as string) as cnt from tbl_launchdays, tbl_launchdevs union all select 'retainratio' as name, cast((tbl_28new7retain.cnt/tbl_28new.cnt) as string) as cnt from tbl_28new, tbl_28new7retain union all select 'maxdau' as name, if(tbl_dau.cnt>cast(historymaxdauvalue.historymaxvalue as bigint),concat(cast(tbl_dau.cnt as string),'||','20170108'),concat(historymaxdauvalue.historymaxvalue,'||',historydate)) as cnt from tbl_dau,historymaxdauvalue union all select 'maxlaunchrecords' as name, if(tbl_launchrecords.cnt>cast(historymaxlaunchrecords.historymaxvalue as bigint),concat(cast(tbl_launchrecords.cnt as string),'||','20170108'),concat(historymaxlaunchrecords.historymaxvalue,'||',historydate)) as cnt from tbl_launchrecords,historymaxlaunchrecords union all select 'maxnewdev' as name, if(tbl_newdev.cnt>cast(historymaxnewdev.historymaxvalue as bigint),concat(cast(tbl_newdev.cnt as string),'||','20170108'),concat(historymaxnewdev.historymaxvalue,'||',historydate)) as cnt from tbl_newdev,historymaxnewdev union all select 'maxavgusetime' as name, if(tbl_avgusetime.cnt>cast(historymaxavgusetime.historymaxvalue as bigint),concat(cast(tbl_avgusetime.cnt as string),'||','20170108'),concat(historymaxavgusetime.historymaxvalue,'||',historydate)) as cnt from tbl_avgusetime,historymaxavgusetime union all select 'maxavgusetime' as name, if(tbl_avgdevtime.cnt>cast(historymaxavgdevtime.historymaxvalue as bigint),concat(cast(tbl_avgdevtime.cnt as string),'||','20170108'),concat(historymaxavgdevtime.historymaxvalue,'||',historydate)) as cnt from tbl_avgdevtime,historymaxavgdevtime ) as tmp1_B ) as tmp2_B ) as tmp limit 1 ) as t1 union all select * from ( select day_begin, generate_time, json_key, json_value from ana_fx_goal.zl_daily_kv where hdfs_par = '20170108' and json_key != 'daily_static' ) as t2 ;
发表评论
-
redis 五种数据类型的使用场景
2017-01-12 09:46 520String 1、String ... -
深入理解Mysql的四种隔离级别
2017-01-03 17:05 412开发工作中我们会使用 ... -
mysql---udf
2016-12-23 09:05 319http://www.2cto.com/database/20 ... -
sql读取数据库中的字段生成json字符串再存储
2016-12-22 16:27 555[size=medium] --screen ... -
sql多表计算问题
2016-12-16 15:29 584mc_devappear表中的数据可以计算出当天新增的设备数, ... -
SQL Case when 的使用方法
2016-12-15 09:39 419Case具有两种格式。简单Case函数和Case搜索函数。 ... -
Mysql初始化root密码和允许远程访问
2016-03-14 11:41 403mysql默认root用户没有密码,输入mysql –u ro ... -
MySQL 5.6 for Windows 解压缩版配置安装
2016-03-13 11:00 607MySQL安装文件分为两种,一种是msi格式的,一种是zip格 ...
相关推荐
在IT领域,数据储存是至关重要的一个环节,尤其是在如今大数据时代。本地数据库储存是指在用户的设备上,如个人计算机或移动设备(如MobilePhone)上存储数据的一种方式。这种方式允许用户离线访问数据,提高数据...
基于go语言实现的分布式缓存系统源码+项目说明(以键值对的形式存储数据,一致性hash算法选择存储节点,Protobuf通信协议编解码。用户输入查询请求后,会优先在缓存系统查询,查不到则使用回调函数去源数据库查询,...
基于go语言实现的分布式缓存系统源码+项目说明(以键值对的形式存储数据,一致性hash算法选择存储节点,Protobuf通信协议编解码。用户输入查询请求后,会优先在缓存系统查询,查不到则使用回调函数去源数据库查询,...
键值存储是一种NoSQL数据库类型,其中数据以键和对应的值的形式存储,键是唯一的标识符,值可以是任何数据类型。这种数据模型非常适合用于缓存系统,因为查找和插入操作通常非常快速,只需要通过键来定位数据。 ...
例如,在关系型数据库中,数据以表格的形式组织,每张表都有明确的定义和结构,表与表之间通过键值建立连接,这种设计方式极大地提升了数据处理的速度和准确性。 **2. 数据的共享性高、易于扩充** 数据的共享性...
关系型数据库基于关系模型,以表格的形式存储数据,数据之间的关系通过外键来定义,常用的数据库管理系统包括MySQL、PostgreSQL和Microsoft SQL Server等。非关系型数据库,也称为NoSQL数据库,包括键值存储、文档型...
JSON文件中的菜谱数据将以键值对的形式存在,如{"名称": "宫保鸡丁", "做法": "..."}. 这种格式适合在Web应用程序中进行数据交互,可以方便地集成到前端展示或后端处理中。 3. **CSV**:逗号分隔值,是一种简单且...
《汉字简体繁体参照表》是一个非常实用的数据库资源,包含了大约4792条汉字的简体与繁体转换映射数据。这个数据库涵盖了多种格式,包括SQL、JSON、CSV以及XLS,旨在方便用户在不同场景下进行简繁体汉字的互换和处理...
根据给定的文件信息,以下是对计算机三级数据库考试中涉及的关键知识点的详细解析: ### 计算机系统组成与应用领域 计算机系统由硬件和软件两大部分构成,硬件包括运算器、控制器、存储器、输入设备和输出设备,而...
数据库是信息化社会中不可或缺的一部分,它是一个存储在计算机内的、有组织、可共享的信息集合。这一章的笔记主要涵盖了数据库的基础概念、逻辑结构、效率提升方法、数据库模型以及设计步骤,同时也提到了学习数据库...
数据库是一个按照特定数据模型组织的、相互关联的数据集合,存储在二级存储器中,独立于使用它的应用程序,由数据库管理系统(DBMS)进行管理和控制,确保数据的共享、独立性、最小冗余、安全性和完整性。...
键值数据库以键值对的形式存储数据,例如Riak和Redis。列族数据库将数据存储在列族中,例如HBase和Cassandra。文档数据库将数据存储在文档中,例如MongoDB和CouchDB。图形数据库将数据存储在图形中,例如Neo4j和...
它们以键值对、文档、列族或图的形式存储数据,灵活且扩展性强,适用于处理大量非结构化或半结构化数据。 数据库的工作原理通常涉及事务处理、并发控制和恢复机制。事务是一组逻辑操作,必须全部成功执行或全部回滚...
1、文档化存储:MongoDB以文档为单位进行数据存储,一个文档可以包含多个键值对,并且文档之间可以存在嵌套关系,这使得数据存储更加灵活。 2、动态查询:MongoDB支持使用JavaScript表达式进行查询,这使得查询操作...
在《中华古诗词大全》的SQL版本中,很可能以表格的形式存储了诗词的信息,如诗词作者、朝代、诗词内容、格律等字段。用户可以通过SQL查询语句来筛选、排序或聚合数据,例如查找某个诗人所有的作品,或者统计某一朝代...
1. 关系型数据库:如Oracle、MySQL、SQL Server等,基于关系模型,使用表格形式存储数据,支持SQL语言。 2. 非关系型数据库(NoSQL):如MongoDB、Cassandra,适用于大数据、高并发和分布式环境,数据模型多样,...
2. **关系数据库**:在关系模型中,数据以二维表格(关系)的形式存在,每个表格称为一个表或关系。表中的每一行代表一个记录,每一列代表一个数据项。关系数据库中的表具有特定性质,如数据项不可再分,同一列的...
2. **数据库特点**:数据库不仅是一个数据集合,而且这些数据是以一种逻辑上一致且能够被多个应用程序或用户共享的方式组织起来的。这表明数据库的设计注重数据的一致性和共享性。 3. **数据冗余度**:传统意义上的...
1. 列存储数据库:列存储数据库以列簇的形式存储数据,适合处理大数据量和进行大规模分析任务。它能够以列为单位进行高效的数据读写操作,适合应对列式数据的查询和分析需求。典型的列存储数据库有HBase和Cassandra...