- 浏览: 412427 次
文章分类
最新评论
-
lvdccyb:
wuhan_liurui 写道安装这种配置并没有成功,上面说的 ...
Spring Cloud (1)——config server使用SVN作为远程例子的运行与配置 -
wuhan_liurui:
安装这种配置并没有成功,上面说的,需要仔细阅读spring 官 ...
Spring Cloud (1)——config server使用SVN作为远程例子的运行与配置 -
g_man1990:
maven clean后无法生成。class文件
设置JAVA编译程序级别,Maven编译插件(翻译)--(2) -
最佳蜗牛:
非常感谢,我也遇到这个问题,用楼主的方法解决问题了。
Hadoop HDFS配置——UnknownHostException -
mousepc:
今天被这个问题害了...
JAVA时间的一个陷阱
参考 https://drill.apache.org/docs/json-data-model/
假设有原始数据在hdfs上:
hdfs://dc1:8020/xf/mytest/ia/2017/0208/details/part-00000
多条数据,按行存储的json文件,实际上是spark saveAsTextFile方法生成。
格式如下(已删除部分数据)
{
"afterOpenDay": 9,
"basic": {
"availableMoney": 24063.51344060898,
"closeReason": 0,
"cutEarning": 0,
"end_date": "20170222",
"ism_id": "170208206199185",
"losePercentage": 0,
"profitPercentage": 0,
"start_date": "20170209",
"tm_close": -1,
"totalMoney": 23600,
"user_id": "8888"
},
"closeDay": true,
"dailySummary": [
{
"TN": 0,
"annualProfitRate": 0,
"asset": 23515.820100307465,
"commission": 75.1798996925354,
"cost": 21560.179899692535,
"day": "20170209",
"floatProfit": -84.1798996925354,
"freeMoney": 2039.8201003074646,
"maketValue": 21476,
"profitRate": -0.0035669449022260762
},
{
"TN": 1,
"annualProfitRate": 0,
"asset": 23585.904140472412,
"commission": 81.09585952758789,
"cost": 20668.095859527588,
"day": "20170210",
"floatProfit": -14.09585952758789,
"freeMoney": 2931.904140472412,
"maketValue": 20654,
"profitRate": -0.0005972821833723683
},
{
"TN": 2,
"annualProfitRate": 0,
"asset": 23830.72134065628,
"commission": 88.27865934371948,
"cost": 18535.27865934372,
"day": "20170213",
"floatProfit": 230.72134065628052,
"freeMoney": 5064.7213406562805,
"maketValue": 18766,
"profitRate": 0.009776327993910192
},
{
"TN": 3,
"annualProfitRate": 0,
"asset": 23887.72134065628,
"commission": 88.27865934371948,
"cost": 18535.27865934372,
"day": "20170214",
"floatProfit": 287.7213406562805,
"freeMoney": 5064.7213406562805,
"maketValue": 18823,
"profitRate": 0.012191582231198327
},
{
"TN": 4,
"annualProfitRate": 0,
"asset": 23652.72134065628,
"commission": 88.27865934371948,
"cost": 18535.27865934372,
"day": "20170215",
"floatProfit": 52.72134065628052,
"freeMoney": 5064.7213406562805,
"maketValue": 18588,
"profitRate": 0.002233955112554259
},
{
"TN": 5,
"annualProfitRate": 0,
"asset": 23716.917340755463,
"commission": 94.08265924453735,
"cost": 17737.082659244537,
"day": "20170216",
"floatProfit": 116.91734075546265,
"freeMoney": 5862.917340755463,
"maketValue": 17854,
"profitRate": 0.004954124608282316
},
{
"TN": 6,
"annualProfitRate": 0,
"asset": 23595.554340839386,
"commission": 100.44565916061401,
"cost": 16380.445659160614,
"day": "20170217",
"floatProfit": -4.445659160614014,
"freeMoney": 7219.554340839386,
"maketValue": 16376,
"profitRate": -0.00018837538816161075
},
{
"TN": 7,
"annualProfitRate": 0,
"asset": 23780.802600860596,
"commission": 106.1973991394043,
"cost": 15649.197399139404,
"day": "20170220",
"floatProfit": 180.8026008605957,
"freeMoney": 7950.802600860596,
"maketValue": 15830,
"profitRate": 0.007661127155109988
},
{
"TN": 8,
"annualProfitRate": 0,
"asset": 24011.805600643158,
"commission": 113.19439888000488,
"cost": 13659.194399356842,
"day": "20170221",
"floatProfit": 411.80560064315796,
"freeMoney": 9940.805600643158,
"maketValue": 14071,
"profitRate": 0.01744938985776093
}
]
}
(1)下载并启动apache drill
bin/drill-embedded
(2)配置storage plugin,dc1是机器hostname
http://dc1:8047/storage
{
"type": "file",
"enabled": true,
"connection": "hdfs://dc1:8020",
"config": null,
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null
},
"ism": {
"location": "/wx/mytest/ia/2017",
"writable": true,
"defaultInputFormat": "json"
}
},
"formats": {
"psv": {
"type": "text",
"extensions": [
"tbl"
],
"delimiter": "|"
},
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
},
"tsv": {
"type": "text",
"extensions": [
"tsv"
],
"delimiter": "\t"
},
"httpd": {
"type": "httpd",
"logFormat": "%h %t \"%r\" %>s %b \"%{Referer}i\"",
"timestampFormat": null
},
"parquet": {
"type": "parquet"
},
"json": {
"type": "json",
"extensions": [
"json"
]
},
"avro": {
"type": "avro"
},
"sequencefile": {
"type": "sequencefile",
"extensions": [
"seq"
]
},
"csvh": {
"type": "text",
"extensions": [
"csvh"
],
"extractHeader": true,
"delimiter": ","
}
}
}
(3)修改配置
http://dc1:8047/options
store.json.read_numbers_as_double 改为true,这个是因为我这边的json数据,有的浮点数输出为整数如5.0直接输出为5,导致错误”DATA_READ ERROR: You tried to write a Float8 type when you are using a ValueWriter of type ...“
(4)执行sql语句,这里dfs.ism.表示使用的是storage plguin 中的dfs里面配置的工作目录为ism
a.basic.ism_id,表示使用json文件中的basic字段(basic是个OBJECT类型)里面的ism_id字段
0: jdbc:drill:zk=local> select a.basic.ism_id as ism_id,a.dailySummary.asset as asset from dfs.ism.`0208/details/part-00000` a limit 10;
+------------------+---------------------+
| ism_id | asset |
+------------------+---------------------+
| 170208206199185 | 23515.820100307465 |
| 170208206199187 | 23585.904140472412 |
| 170208206199188 | 23830.72134065628 |
| 170208206199189 | 23887.72134065628 |
| 170208206199191 | 23652.72134065628 |
| 170208206199196 | 23716.917340755463 |
| 170208206199199 | 23595.554340839386 |
| 170208206199201 | 23780.802600860596 |
| 170208206199206 | 24011.805600643158 |
| 170208206199209 | 24063.51344060898 |
+------------------+---------------------+
10 rows selected (0.898 seconds)
[思考问题]上述字段中,如果遇到数组应该如何处理?
比如,要查询dailySummary 中的每日资产asset?
参考Drill官方文档,使用子查询(nest query)和FLATTEN函数,
FLATTEN用于将数组扁平化,即1行拆分成多行数据。
0: jdbc:drill:zk=local> select b.ism_id,b.daily.asset as asset from (select a.basic.ism_id as ism_id,FLATTEN(a.dailySummary) as daily from dfs.ism.`0208/details/part-00000` a ) b limit 10;
+------------------+---------------------+
| ism_id | asset |
+------------------+---------------------+
| 170208206199185 | 23515.820100307465 |
| 170208206199185 | 23585.904140472412 |
| 170208206199185 | 23830.72134065628 |
| 170208206199185 | 23887.72134065628 |
| 170208206199185 | 23652.72134065628 |
| 170208206199185 | 23716.917340755463 |
| 170208206199185 | 23595.554340839386 |
| 170208206199185 | 23780.802600860596 |
| 170208206199185 | 24011.805600643158 |
| 170208206199185 | 24063.51344060898 |
| 170208206199187 | 20130.834299087524 |
| 170208206199187 | 19987.834299087524 |
| 170208206199187 | 20333.938299179077 |
| 170208206199187 | 20277.938299179077 |
| 170208206199187 | 20153.938299179077 |
| 170208206199187 | 20321.938299179077 |
| 170208206199187 | 20165.137598991394 |
| 170208206199187 | 20376.137598991394 |
| 170208206199187 | 20496.137598991394 |
| 170208206199187 | 20428.81975889206 |
+------------------+---------------------+
20 rows selected (0.978 seconds)
上述查询也可以通过web方式
http://dc1:8047/query
查询获得。
假设有原始数据在hdfs上:
hdfs://dc1:8020/xf/mytest/ia/2017/0208/details/part-00000
多条数据,按行存储的json文件,实际上是spark saveAsTextFile方法生成。
格式如下(已删除部分数据)
{
"afterOpenDay": 9,
"basic": {
"availableMoney": 24063.51344060898,
"closeReason": 0,
"cutEarning": 0,
"end_date": "20170222",
"ism_id": "170208206199185",
"losePercentage": 0,
"profitPercentage": 0,
"start_date": "20170209",
"tm_close": -1,
"totalMoney": 23600,
"user_id": "8888"
},
"closeDay": true,
"dailySummary": [
{
"TN": 0,
"annualProfitRate": 0,
"asset": 23515.820100307465,
"commission": 75.1798996925354,
"cost": 21560.179899692535,
"day": "20170209",
"floatProfit": -84.1798996925354,
"freeMoney": 2039.8201003074646,
"maketValue": 21476,
"profitRate": -0.0035669449022260762
},
{
"TN": 1,
"annualProfitRate": 0,
"asset": 23585.904140472412,
"commission": 81.09585952758789,
"cost": 20668.095859527588,
"day": "20170210",
"floatProfit": -14.09585952758789,
"freeMoney": 2931.904140472412,
"maketValue": 20654,
"profitRate": -0.0005972821833723683
},
{
"TN": 2,
"annualProfitRate": 0,
"asset": 23830.72134065628,
"commission": 88.27865934371948,
"cost": 18535.27865934372,
"day": "20170213",
"floatProfit": 230.72134065628052,
"freeMoney": 5064.7213406562805,
"maketValue": 18766,
"profitRate": 0.009776327993910192
},
{
"TN": 3,
"annualProfitRate": 0,
"asset": 23887.72134065628,
"commission": 88.27865934371948,
"cost": 18535.27865934372,
"day": "20170214",
"floatProfit": 287.7213406562805,
"freeMoney": 5064.7213406562805,
"maketValue": 18823,
"profitRate": 0.012191582231198327
},
{
"TN": 4,
"annualProfitRate": 0,
"asset": 23652.72134065628,
"commission": 88.27865934371948,
"cost": 18535.27865934372,
"day": "20170215",
"floatProfit": 52.72134065628052,
"freeMoney": 5064.7213406562805,
"maketValue": 18588,
"profitRate": 0.002233955112554259
},
{
"TN": 5,
"annualProfitRate": 0,
"asset": 23716.917340755463,
"commission": 94.08265924453735,
"cost": 17737.082659244537,
"day": "20170216",
"floatProfit": 116.91734075546265,
"freeMoney": 5862.917340755463,
"maketValue": 17854,
"profitRate": 0.004954124608282316
},
{
"TN": 6,
"annualProfitRate": 0,
"asset": 23595.554340839386,
"commission": 100.44565916061401,
"cost": 16380.445659160614,
"day": "20170217",
"floatProfit": -4.445659160614014,
"freeMoney": 7219.554340839386,
"maketValue": 16376,
"profitRate": -0.00018837538816161075
},
{
"TN": 7,
"annualProfitRate": 0,
"asset": 23780.802600860596,
"commission": 106.1973991394043,
"cost": 15649.197399139404,
"day": "20170220",
"floatProfit": 180.8026008605957,
"freeMoney": 7950.802600860596,
"maketValue": 15830,
"profitRate": 0.007661127155109988
},
{
"TN": 8,
"annualProfitRate": 0,
"asset": 24011.805600643158,
"commission": 113.19439888000488,
"cost": 13659.194399356842,
"day": "20170221",
"floatProfit": 411.80560064315796,
"freeMoney": 9940.805600643158,
"maketValue": 14071,
"profitRate": 0.01744938985776093
}
]
}
(1)下载并启动apache drill
bin/drill-embedded
(2)配置storage plugin,dc1是机器hostname
http://dc1:8047/storage
{
"type": "file",
"enabled": true,
"connection": "hdfs://dc1:8020",
"config": null,
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null
},
"ism": {
"location": "/wx/mytest/ia/2017",
"writable": true,
"defaultInputFormat": "json"
}
},
"formats": {
"psv": {
"type": "text",
"extensions": [
"tbl"
],
"delimiter": "|"
},
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
},
"tsv": {
"type": "text",
"extensions": [
"tsv"
],
"delimiter": "\t"
},
"httpd": {
"type": "httpd",
"logFormat": "%h %t \"%r\" %>s %b \"%{Referer}i\"",
"timestampFormat": null
},
"parquet": {
"type": "parquet"
},
"json": {
"type": "json",
"extensions": [
"json"
]
},
"avro": {
"type": "avro"
},
"sequencefile": {
"type": "sequencefile",
"extensions": [
"seq"
]
},
"csvh": {
"type": "text",
"extensions": [
"csvh"
],
"extractHeader": true,
"delimiter": ","
}
}
}
(3)修改配置
http://dc1:8047/options
store.json.read_numbers_as_double 改为true,这个是因为我这边的json数据,有的浮点数输出为整数如5.0直接输出为5,导致错误”DATA_READ ERROR: You tried to write a Float8 type when you are using a ValueWriter of type ...“
(4)执行sql语句,这里dfs.ism.表示使用的是storage plguin 中的dfs里面配置的工作目录为ism
a.basic.ism_id,表示使用json文件中的basic字段(basic是个OBJECT类型)里面的ism_id字段
0: jdbc:drill:zk=local> select a.basic.ism_id as ism_id,a.dailySummary.asset as asset from dfs.ism.`0208/details/part-00000` a limit 10;
+------------------+---------------------+
| ism_id | asset |
+------------------+---------------------+
| 170208206199185 | 23515.820100307465 |
| 170208206199187 | 23585.904140472412 |
| 170208206199188 | 23830.72134065628 |
| 170208206199189 | 23887.72134065628 |
| 170208206199191 | 23652.72134065628 |
| 170208206199196 | 23716.917340755463 |
| 170208206199199 | 23595.554340839386 |
| 170208206199201 | 23780.802600860596 |
| 170208206199206 | 24011.805600643158 |
| 170208206199209 | 24063.51344060898 |
+------------------+---------------------+
10 rows selected (0.898 seconds)
[思考问题]上述字段中,如果遇到数组应该如何处理?
比如,要查询dailySummary 中的每日资产asset?
参考Drill官方文档,使用子查询(nest query)和FLATTEN函数,
FLATTEN用于将数组扁平化,即1行拆分成多行数据。
0: jdbc:drill:zk=local> select b.ism_id,b.daily.asset as asset from (select a.basic.ism_id as ism_id,FLATTEN(a.dailySummary) as daily from dfs.ism.`0208/details/part-00000` a ) b limit 10;
+------------------+---------------------+
| ism_id | asset |
+------------------+---------------------+
| 170208206199185 | 23515.820100307465 |
| 170208206199185 | 23585.904140472412 |
| 170208206199185 | 23830.72134065628 |
| 170208206199185 | 23887.72134065628 |
| 170208206199185 | 23652.72134065628 |
| 170208206199185 | 23716.917340755463 |
| 170208206199185 | 23595.554340839386 |
| 170208206199185 | 23780.802600860596 |
| 170208206199185 | 24011.805600643158 |
| 170208206199185 | 24063.51344060898 |
| 170208206199187 | 20130.834299087524 |
| 170208206199187 | 19987.834299087524 |
| 170208206199187 | 20333.938299179077 |
| 170208206199187 | 20277.938299179077 |
| 170208206199187 | 20153.938299179077 |
| 170208206199187 | 20321.938299179077 |
| 170208206199187 | 20165.137598991394 |
| 170208206199187 | 20376.137598991394 |
| 170208206199187 | 20496.137598991394 |
| 170208206199187 | 20428.81975889206 |
+------------------+---------------------+
20 rows selected (0.978 seconds)
上述查询也可以通过web方式
http://dc1:8047/query
查询获得。
发表评论
-
java.lang Enum Thread.State,jstack显示的线程状态说明
2016-05-20 12:06 2043java docs 写道 http://docs.ora ... -
备忘下:flume、es、配置
2015-06-09 13:41 0启动flume: nohup bin/flume-ng ... -
几个java 面试题备忘
2015-06-03 21:21 783java: (1)希望so/dl ... -
JAVA GC日志输出
2014-09-09 10:37 18522014-09-09T09:45:41.560+0800: 3 ... -
eclipselink,MOXy,JAXB,Json工具类
2014-06-05 12:32 2969从String中转成JSON对象 最近使用json来传递 ... -
再说jdk7的substring——从O(1)到O(N)
2013-09-26 21:12 1632JDK7的String重新设计。 确实,重新设计后, ... -
apache的math库——Frequency的使用
2013-09-06 10:37 1877Frequency用于统计数据频率,其功能类似于如下lin ... -
apache的math库中的回归——regression(翻译)
2013-08-23 17:12 3883这个Math库,虽然不向weka那样专业的ML库,但是用户 ... -
TopCoder SRM583 GameOnBoard
2013-06-20 08:28 10782013-06-19没有注册:没有去做做题,不知道对 ... -
HTTP连接错误,Unexpected end of file from server at sun.
2013-05-08 16:31 2746Caused by: java.net.SocketExc ... -
java的substring,split,trim容易出错——内存长期占用不回收
2013-04-18 15:02 4117注:本文当时写的是基于jdk6或者更早版本;jdk7 ... -
LINUX
2013-04-18 14:55 040.45% statall [kernel.kall ... -
JAVA多线程创建与退出过程
2013-01-30 14:46 15680Java多个线程,什么时候被创建,什么时候退出? ja ... -
JAVA Socket的方法说明
2012-12-03 19:09 3088/** * Enable/disable ... -
设置JAVA编译程序级别,Maven编译插件(翻译)--(2)
2012-11-29 19:01 8454通过Maven compiler Plugin选项可以设置j ... -
Maven encoding,Maven涉及到的文件编码问题(翻译)——(1)
2012-11-29 18:28 3321指定文件编码 Specifying a characte ... -
JAVA时间的一个陷阱
2012-11-27 13:29 1911import java.text.DateFormat; i ... -
guava
2012-11-21 18:51 2624主要翻译GUAVA的介绍 ... -
基于JUNG开发,用于图展示于计算
2012-09-04 16:38 1209JUNG http://jung.sourceforg ... -
多少个对称日期?
2011-11-09 18:05 1879一共有331个,所有的对称日期都不是质数,但331是个质 ...
相关推荐
Apache Drill 是一个低延迟的分布式海量数据交互式查询引擎,使用户能够使用 ANSI SQL 兼容语法查询多种类型的数据存储系统,包括本地文件、HDFS、Hive、HBase、MongoDB 等。Drill 的设计目标是为了支持更广泛的数据...
1. **查询性能**:Apache Drill的设计重点之一是提高查询效率,能够在短时间内处理PB级别的数据量。 2. **扩展性**:支持数千个节点的集群部署,可以根据需要动态扩展资源。 3. **易用性**:对于大多数用户来说,只...
Apache Drill 是一个开源的无模式SQL查询引擎,它在大数据分析领域扮演着重要的角色。与传统的Hive不同,Drill不依赖MapReduce作业,并且它并不完全基于Hadoop生态系统。实际上,Drill的设计灵感来源于Google的...
《Learning Apache Drill 2019》是一本关于如何使用Apache Drill进行分布式数据源查询和分析的书籍。Apache Drill是一个开源的SQL查询引擎,它能够查询各种数据源,包括Hadoop上的数据、NoSQL数据库、云存储服务和...
Apache Drill是Google BigQuery团队发起的一个开源项目,它是一个分布式、低延迟的SQL查询引擎,设计用于处理大规模的非结构化和半结构化数据。Apache Drill的目标是提供一种简单、快速的方式来查询和分析大规模的...
Apache Drill是一款强大的、跨平台的数据查询引擎,专为大数据分析设计。它支持SQL查询语言,使得用户能够方便地处理各种不同类型的数据源,如Hadoop、NoSQL数据库、云存储等。在Apache Drill 1.18版本中,我们找到...
Learning Apache Drill Queryand Analyze Distributed Data Sources with SQL
Apache Drill是一款开源的分布式SQL查询引擎,专门设计用于大规模数据集的分析,尤其适用于现代大数据存储格式,如Hadoop Distributed File System (HDFS)、云存储服务以及NoSQL数据库。这款工具无需预先定义schema...
apache-drill-jdbc-plugin 适用于Apache Drill的JDBC插件 下载Apache Drill 0.9。 将代码添加到contrib中,然后用此文件夹中的pom文件替换现有的pom文件。 用mvn构建。 要仅生成软件包,请使用与以下类似的符号:...
Apache Drill 是一个开源的分布式大数据查询引擎,设计用于无模式(schema-less)的数据湖环境,支持多种文件格式,包括 Parquet、JSON、CSV等。它提供了SQL接口,使得用户能够轻松地对大规模分布式存储的数据进行...
Apache Drill是一个分布式MPP查询层,支持针对NoSQL和Hadoop数据存储系统SQL和替代查询语言。 它的部分灵感来自 。 开发者 请阅读以设置和运行Apache Drill。 有关完整的开发人员文档,请参见 更多信息 请参阅或以...
Apache Drill是一款开源的分布式SQL查询引擎,主要用于大数据分析。它设计的目标是提供低延迟的交互式查询能力,支持多种数据源,包括Hadoop的HDFS、Amazon S3、Cassandra、MongoDB等,以及文件系统如本地文件系统或...
一组用于处理Internet域名的Apache Drill UDF UDFs 有一个UDF: suffix_extract(domain-string) :给定一个有效的互联网域名(FQDN或其他方式),这将返回一个地图的领域tld , assigned , subdomain和hostname的...
Apache Drill UDF用于检索和使用HTML文本 基于库。 注意:这绝对是一个在制品。 UDFs soup_read_html(url-string, timeout-ms) :此UDF要求网络可到达预期的URL目标。 给定一个URL和一个连接超时(以毫秒为单位)...
一个Apache Drill UDF,用于通过 Java库处理Twitter tweet文本。 UDFs tw_parse_tweet(string) :解析tweet文本并返回具有以下命名值的地图列: weightedLength :(整数)tweet的总长度,其中代码点按配置文件中...
然后,你可以创建一个SQLAlchemy引擎实例,就像连接其他数据库一样,但这次是连接到Apache Drill: ```python from sqlalchemy import create_engine engine = create_engine('drill+...
Apache Drill的Node.js客户端Drillnode是使用REST API从Node.js连接并在Apache Drill上执行查询的客户端。安装使用NPM软件包管理器安装Drillone npm install drillnode入门// Requirementsconst Drill = require ( '...
用于 Apache Drill 的 Ruby 客户端 安装 首先, 。 对于 Homebrew,请使用: brew install apache-drill drill-embedded 并将这一行添加到您的应用程序的 Gemfile 中: gem 'drill-sergeant' 如何使用 创建...
史上最全大数据技术全套教程,包括: 分布式存储系统 大数据基础 大数据处理框架 大数据管理与监控 实时计算 数据仓库 数据分析工具 数据湖 数据集成工具 消息队列 等流行技术的系列教程
网址工具一组用于URL的Apache Drill UDF 它使用 Java库进行解析。UDFs 包括以下UDF: url_parse(url-string) :输入URL / URI字符串后,将在地图中返回一组字段( url , scheme , username , password , host ,...