Article list
Download from https://github.com/xetorthio/jedis/releases
#tar xzf jedis-jedis-2.5.1.tar.gz
#cd jedis-jedis-2.5.1
#./gradlew jar
References
https://github.com/xetorthio/jedis#jedis
http://yale.iteye.com/blog/1022186
High-quality blogs
- Blog category:
- Redis
References
http://strata.oreilly.com/2013/03/large-scale-data-collection-and-real-time-analytics-using-redis.html
Solutions to common errors
- Blog category:
- Hadoop
1. When I create a Hive table in Hue, an error occurs.
Solution: #hadoop dfsadmin -safemode leave
http://www.linkedin.com/groups/Creating-table-in-Hive-getting-4547204.S.225243871
2. Error
Solution:
Edit hadoop-env.sh
# The maximum amount of heap to use, in MB. Default is 1000.
export ...
Chain MapReduce Jobs
- Blog category:
- Hadoop
References
http://stackoverflow.com/questions/2499585/chaining-multiple-mapreduce-jobs-in-hadoop
https://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
http://blogs.msdn.com/b/avkashchauhan/archive/2012/03/29/how-to-chain-multiple-mapreduce-jobs-in-hadoop ...
Zero reducers means the reduce step is skipped and the mapper output is the final output.
An identity reducer means shuffling and sorting still take place.
If you do not need the map results sorted, set the number of reducers to 0; the job is then called map-only.
If you need to sort the mapping results, but do ...
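The difference between zero reducers and an identity reducer can be sketched with a toy, single-process model. This is not the Hadoop API; `run_job`, `word_mapper`, and `identity_reducer` are hypothetical names used only for illustration, and the model assumes a single reducer.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer=None, num_reducers=1):
    """Toy single-process model of a MapReduce job (not the Hadoop API)."""
    mapped = [pair for record in records for pair in mapper(record)]
    if num_reducers == 0:
        # Map-only job: no shuffle and no sort; mapper output is final.
        return mapped
    # Shuffle/sort phase: pairs are ordered by key before the reducer sees them.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def identity_reducer(key, values):
    # Emit every (key, value) pair unchanged.
    return [(key, value) for value in values]


lines = ["pear apple", "fig apple"]
map_only = run_job(lines, word_mapper, num_reducers=0)
identity = run_job(lines, word_mapper, identity_reducer, num_reducers=1)
print(map_only)
print(identity)
```

In this model the map-only output keeps the order in which the mapper produced it, while the identity-reducer output comes out sorted by key even though the reducer itself changes nothing.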
The number of map tasks for a given job is driven by the number of input splits and not by the mapred.map.tasks parameter. For each input split a map task is spawned. So, over the lifetime of a MapReduce job the number of map tasks is equal to the number of input splits. mapred.map.tasks is just a ...
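As a rough illustration of "one map task per input split", the sketch below estimates the task count from file sizes. It is a simplified model, not Hadoop's actual split logic: it assumes every file is splittable, uses one fixed split size, and ignores empty files and any min/max split-size settings. `estimate_map_tasks` is a hypothetical helper name.

```python
import math


def estimate_map_tasks(file_sizes, split_size):
    """Estimate map tasks as the total number of input splits.

    Each file is split independently, so a small file still costs
    one split (and therefore one map task) of its own.
    """
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


MB = 1024 * 1024
sizes = [300 * MB, 64 * MB, 10 * MB]  # three input files
tasks = estimate_map_tasks(sizes, 128 * MB)
print(tasks)
```

With a 128 MB split size the 300 MB file contributes three splits and each smaller file contributes one, which is why many small input files tend to produce many map tasks.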
MySQL full join
- Blog category:
- mysql
full join
http://stackoverflow.com/questions/7978663/mysql-full-join
https://dev.mysql.com/worklog/task/?id=1604
self join
http://databases.about.com/od/sql/a/selfjoin.htm
TopK problem in Hadoop
- Blog category:
- Hadoop
Some example code is available here:
https://github.com/adamjshook/mapreducepatterns/tree/master/MRDP/src/main/java/mrdp
https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch3/TopTenDriver.java
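A common way to approach top-K in MapReduce is to let every mapper keep only its local top K and have a single reducer merge those candidates. The sketch below models that idea in plain Python; it is not taken from the linked repository, and `local_top_k` and `global_top_k` are hypothetical names.

```python
import heapq


def local_top_k(records, k):
    """Mapper side: keep only the k largest records from one input split."""
    return heapq.nlargest(k, records)


def global_top_k(partial_results, k):
    """Single reducer: merge the per-split candidates and keep the k largest."""
    merged = [record for partial in partial_results for record in partial]
    return heapq.nlargest(k, merged)


splits = [[5, 42, 7, 19], [88, 3, 61], [42, 100, 1, 2]]
partials = [local_top_k(split, 3) for split in splits]
top3 = global_top_k(partials, 3)
print(top3)
```

Because each mapper forwards at most K records, the reducer only has to handle K records per split instead of the full input.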
Since MapReduce and HDFS are complex distributed systems that run arbitrary user code, there’s no hard and fast set of rules to achieve optimal performance; instead, I tend to think of tuning a cluster or job much like a doctor would treat a sick human being. There are a number of key symptoms to ...
I want to import data with three fields from a MySQL table into HDFS with Hive metastore integration. I want the data stored in HDFS in the following format:
field1,field2,field3
....
where the fields in a record are delimited by a comma (,). To achieve this format, two things need attention.
a. sqo ...
1. Configure the parameters via Run -> Run Configurations -> Java Application -> Arguments
--input hdfs://192.168.122.1:2014/user/zhaohj/mahout/item.txt --output hdfs://192.168.122.1:2014/user/zhaohj/mahout/output1 --booleanData true -s com.inoknok.math.similarity.cooccurrence.measures.Cooccu ...
When I run #yum repolist, an error occurs:
yum Error: Cannot retrieve metalink for repository: epel.
Solution: vi /etc/yum.repos.d/epel.repo and change https to http
=====
Set mysqld to start on boot
#chkconfig mysqld --list
#chkconfig mysqld on
#chkconfig mysqld --list
Searching and Replacing
:s/string
:s/pattern/replace/
:%s/pattern/replace/
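The scope of the last two commands can be modeled in Python as a rough analogy. This assumes Vim's default settings (no `g` flag, 'gdefault' off), and Python's regex syntax differs from Vim's; the function names are hypothetical.

```python
import re


def substitute_current_line(lines, current, pattern, replacement):
    """Like :s/pattern/replace/ (first match, current line only)."""
    result = list(lines)
    result[current] = re.sub(pattern, replacement, result[current], count=1)
    return result


def substitute_all_lines(lines, pattern, replacement):
    """Like :%s/pattern/replace/ (first match on every line)."""
    return [re.sub(pattern, replacement, line, count=1) for line in lines]


buffer = ["foo foo", "bar foo"]
one_line = substitute_current_line(buffer, 0, "foo", "X")
every_line = substitute_all_lines(buffer, "foo", "X")
print(one_line)
print(every_line)
```

In both cases only the first match per line is replaced; in Vim, appending the `g` flag replaces every match on the affected lines.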
Mahout: quality blogs
- Blog category:
- Mahout
http://blog.csdn.net/zwan0518/article/details/9100329
https://www.ibm.com/developerworks/library/j-mahout-scaling/
http://mail-archives.apache.org/mod_mbox/mahout-user/201202.mbox/%3C1328197877.20898.YahooMailNeo@web39405.mail.mud.yahoo.com%3E
Clustering a collection involves three things:
An algorithm
A notion of both similarity and dissimilarity
A stopping condition
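The three ingredients above can be seen together in a minimal k-means sketch. This is plain Python for illustration, not Mahout's implementation; the initial centroids, the `tolerance` value, and the iteration cap are assumptions chosen for the example.

```python
import math


def euclidean(a, b):
    # Notion of (dis)similarity: a smaller distance means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, max_iterations=100, tolerance=1e-9):
    """Algorithm: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        moved = max(euclidean(old, new)
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        # Stopping condition: centroids no longer move (or the iteration cap is hit).
        if moved <= tolerance:
            break
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```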
Measuring the similarity of items
The most important issue in clustering is finding a function that quantifies the similarity between any two data points as a ...
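Two commonly used measures illustrate how the choice of function changes what "similar" means. The sketch below is plain Python with hypothetical function names, not a specific library's API; `cosine_distance` assumes neither vector is all zeros.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cos(angle between a and b); ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = (1.0, 0.0)
b = (2.0, 0.0)  # same direction as a, twice the length
c = (0.0, 3.0)  # perpendicular to a

print(euclidean_distance(a, b), cosine_distance(a, b))
print(euclidean_distance(a, c), cosine_distance(a, c))
```

Here `a` and `b` are a full unit apart under Euclidean distance but identical under cosine distance, because they point in the same direction.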