- 浏览: 127060 次
- 性别:
- 来自: 杭州
-
最新评论
文章列表
THREAD TEST
- 博客分类:
- scala
// Number of worker threads backing the shared scheduler.
val THREAD_POOL_SIZE = 10

// Scheduled executor used to drive the periodic OTS queue processing.
val THREAD_POOL = Executors.newScheduledThreadPool(THREAD_POOL_SIZE)

// Kick off otsQueueProcess immediately (initial delay 0), then re-run it
// 60 000 ms after each previous run completes (fixed delay, not fixed rate).
THREAD_POOL.scheduleWithFixedDelay(
  new Runnable {
    override def run(): Unit = otsQueueProcess
  },
  0L,
  60000L,
  TimeUnit.MILLISECONDS
)
./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=9998 --hiveconf hive.server2.thrift.bind.host=ip --master yarn --deploy-mode client --conf spark.shuffle.service.enabled=true --conf spark.shuffle.service.port=7337 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAlloc ...
val gson: Gson = new GsonBuilder().create
def jsonToMap(jsonstring: String): java.util.Map[String, String] = {
val typeOfHashMap: Type = new TypeToken[java.util.Map[String, String]]() {
}.getType
val newMap: java.util.Map[String, String] = gson.fromJson(jsonstring, typeOfHashMap)
...
test code2
- 博客分类:
- spark 学习
package org.test.udf
import com.google.gson.{Gson, GsonBuilder}
import org.apache.spark.sql.Row
import org.apache.spark.sql.api.java.UDF2
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
import scala.collection.mutab ...
def taskcal(data:Array[(String,Long)],rt:Array[String],wd:Int):Array[Boolean]={
val result = Array.fill[Boolean](rt.length)(false)
val sortData = data.sortBy(_._2)
val indexArrayLength = rt.length - 1
var startTimeArray = Array.fill[Long](rt.length)(0l)
val indexMap = rt.map(item ...
spark aggregator
class HllcdistinctByte extends Aggregator[Row, HLLCounter, Array[Byte]] {
// A zero value for this aggregation. Should satisfy the property that any b + zero = b
def zero: HLLCounter = new HLLCounter(14)
// Combine two values to produce a new value. For performance ...
pipeline tf token
- 博客分类:
- spark 学习
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
// Prepare training documents from a list of (id, text, ...
object StructuredNetworkWordCount {
def main(args: Array[String]) {
if (args.length < 2) {
System.err.println("Usage: StructuredNetworkWordCount <hostname> <port>")
System.exit(1)
}
val host = args(0)
val port = args(1).toInt
val spark = Sp ...
spark , jar
- 博客分类:
- spark 学习
cat conf/spark-defaults.conf
spark.yarn.jars hdfs:/app/jars/*.jar
currying function
- 博客分类:
- scala
benchmark2("hllc")(10000000)(hcclcodeanddecode2)
benchmark("hllc")(10000000)(hcclcodeanddecode)
def hcclcodeanddecode() :Unit = {
val hllc = new HLLCounter(14)
hllc.add("adsfasdfawerwfadfs")
val bytes1 = ByteBuffer.allocate(hllc.maxLength())
h ...
import org.apache.commons.math3.stat.descriptive.moment._
def vLTreeDigesttest = {
val ttCnt = 10000
val myDigestOrg: AVLTreeDigest = TDigest.createAvlTreeDigest(100).asInstanceOf[AVLTreeDigest]
val orgCollection = new mutable.ArrayBuffer[Double]()
for(i <- 0 to ttCnt){
...
class HllcdistinctByte extends Aggregator[Row, HLLCounter, Array[Byte]] {
// A zero value for this aggregation. Should satisfy the property that any b + zero = b
def zero: HLLCounter = new HLLCounter(14)
// Combine two values to produce a new value. For performance, the function may mod ...
pip
146 sudo python ez_setup.py
147 python setup.py intall
148 python setup.py install
149 pip list
tensorflow
150 pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0-cp27-none-linux_x86_64.whl
notebook
158 pip i ...
Windows 下通过 Source 读文件会遇到各种问题,改用 BufferedReader
用于删除文件中中文
object ChineseDrop extends App {
// val stArray = Array("胜多负少","abadsf","13123123")
// stArray.foreach( word => println(s" $word is ${isChinese(word)} "))
//G:\\fromHD\\勇敢的心\\勇敢的心.srt
...
$M2_HOME/conf/settings.xml
尼玛 ,这个库真的快 阿里云做了件好事
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
...