Download from https://github.com/xetorthio/jedis/releases, then build the jar:
#tar xzf jedis-jedis-2.5.1.tar.gz
#cd jedis-jedis-2.5.1
#./gradlew jar
References: https://github.com/xetorthio/jedis#jedis http://yale.iteye.com/blog/1022186

High-quality blogs

References: http://strata.oreilly.com/2013/03/large-scale-data-collection-and-real-time-analytics-using-redis.html

Common errors and solutions

1. When I create a Hive table in Hue, an error occurs.
Solution: take HDFS out of safe mode:
#hadoop dfsadmin -safemode leave
http://www.linkedin.com/groups/Creating-table-in-Hive-getting-4547204.S.225243871
2. Error:
Solution: edit hadoop-env.sh
# The maximum amount of heap to use, in MB. Default is 1000.
export ...

Chain MapReduce Jobs

References: http://stackoverflow.com/questions/2499585/chaining-multiple-mapreduce-jobs-in-hadoop https://developer.yahoo.com/hadoop/tutorial/module4.html#chaining http://blogs.msdn.com/b/avkashchauhan/archive/2012/03/29/how-to-chain-multiple-mapreduce-jobs-in-hadoop ...
With 0 reducers, the reduce step is skipped and the mapper output is the final output. With an identity reducer, shuffling and sorting still take place. If you do not need the map results sorted, set the number of reducers to 0; such a job is called map-only. If you need to sort the mapping results, but do ...
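A toy, in-memory way to see the difference (plain Java, not the Hadoop API; in a real job the switch is the reducer count, e.g. job.setNumReduceTasks(0) on an org.apache.hadoop.mapreduce Job). The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReducerModes {
    // 0 reducers (map-only): mapper output is written as-is, in emission order, unsorted.
    static List<Map.Entry<String, Integer>> mapOnly(List<Map.Entry<String, Integer>> mapOutput) {
        return new ArrayList<>(mapOutput);
    }

    // Identity reducer: the shuffle/sort still groups and sorts by key,
    // then each value is written back out unchanged.
    static List<Map.Entry<String, Integer>> identityReduce(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(); // sorted by key, like the shuffle
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            // Hadoop does not guarantee value order within a key; this toy keeps emission order.
            for (int value : group.getValue()) {
                out.add(Map.entry(group.getKey(), value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput =
                List.of(Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));
        System.out.println(mapOnly(mapOutput));        // [b=1, a=2, b=3]
        System.out.println(identityReduce(mapOutput)); // [a=2, b=1, b=3]
    }
}
```

The identity reducer costs a full shuffle and sort only to reproduce the same records, which is why 0 reducers is the cheaper choice when order does not matter.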
The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. One map task is spawned for each input split, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits. mapred.map.tasks is just a ...
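A sketch of how the split count (and therefore the map-task count) follows from file size, modeled on the split loop in Hadoop's new-API FileInputFormat as I understand it: split size = max(minSize, min(maxSize, blockSize)), and a full split is only cut while more than 1.1 split sizes remain. The 128 MB block size is an assumed example value, and the sketch assumes one non-empty, splittable file (a gzip file, for instance, is not splittable and gets a single split):

```java
class SplitCount {
    // Slack factor: the last split may be up to 10% larger than splitSize
    // instead of leaving a tiny extra split at the end of the file.
    static final double SPLIT_SLOP = 1.1;

    // max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits, i.e. map tasks, for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        System.out.println(countSplits(300 * mb, splitSize)); // 3
        System.out.println(countSplits(140 * mb, splitSize)); // 1
    }
}
```

The 140 MB case shows the slop at work: 140/128 is below 1.1, so the file becomes one map task rather than two.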

MySQL full join

Full join: http://stackoverflow.com/questions/7978663/mysql-full-join https://dev.mysql.com/worklog/task/?id=1604
Self join: http://databases.about.com/od/sql/a/selfjoin.htm
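MySQL has no FULL OUTER JOIN, so the usual workaround combines a LEFT JOIN and a RIGHT JOIN with a UNION. The in-memory Java sketch below mirrors one common formulation: all LEFT JOIN rows, plus only those RIGHT JOIN rows whose left side is NULL, which avoids relying on the duplicate removal that plain UNION performs. Tables are modeled as hypothetical id -> value maps (one row per id):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FullJoinSketch {
    // Emulates, in memory:
    //   SELECT ... FROM l LEFT JOIN r ON l.id = r.id
    //   UNION ALL
    //   SELECT ... FROM l RIGHT JOIN r ON l.id = r.id WHERE l.id IS NULL
    // Each row is rendered as "id:leftValue,rightValue", with "null" for a missing side.
    static List<String> fullJoin(Map<Integer, String> left, Map<Integer, String> right) {
        List<String> rows = new ArrayList<>();
        // Part 1: LEFT JOIN keeps every left row; the right side is null when unmatched.
        for (Map.Entry<Integer, String> l : left.entrySet()) {
            rows.add(l.getKey() + ":" + l.getValue() + "," + right.get(l.getKey()));
        }
        // Part 2: RIGHT JOIN rows with no left match (the "WHERE l.id IS NULL" filter).
        for (Map.Entry<Integer, String> r : right.entrySet()) {
            if (!left.containsKey(r.getKey())) {
                rows.add(r.getKey() + ":null," + r.getValue());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> left = new TreeMap<>(Map.of(1, "a", 2, "b"));
        Map<Integer, String> right = new TreeMap<>(Map.of(2, "x", 3, "y"));
        System.out.println(fullJoin(left, right)); // [1:a,null, 2:b,x, 3:null,y]
    }
}
```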
Some example code: https://github.com/adamjshook/mapreducepatterns/tree/master/MRDP/src/main/java/mrdp https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch3/TopTenDriver.java
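As I read it, the TopTenDriver linked above implements the "top ten" pattern: each mapper keeps a local top N in a sorted map, and a single reducer merges those local lists into the global top N. A plain-Java sketch of the same idea without Hadoop (names are illustrative; like any score-keyed TreeMap, it keeps only one record per score):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopTenSketch {
    // Add a record and evict the smallest score once more than n are held.
    // Keyed by score, so two records with the same score overwrite each other.
    static void offer(TreeMap<Integer, String> top, int n, int score, String id) {
        top.put(score, id);
        if (top.size() > n) {
            top.remove(top.firstKey());
        }
    }

    // "Mapper" side: each input split computes its own local top n.
    static TreeMap<Integer, String> localTop(List<Map.Entry<String, Integer>> split, int n) {
        TreeMap<Integer, String> top = new TreeMap<>();
        for (Map.Entry<String, Integer> record : split) {
            offer(top, n, record.getValue(), record.getKey());
        }
        return top;
    }

    // "Reducer" side: a single reducer merges all local lists into the global top n,
    // returned from highest to lowest score.
    static List<String> globalTop(List<List<Map.Entry<String, Integer>>> splits, int n) {
        TreeMap<Integer, String> top = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> split : splits) {
            for (Map.Entry<Integer, String> e : localTop(split, n).entrySet()) {
                offer(top, n, e.getKey(), e.getValue());
            }
        }
        return new ArrayList<>(top.descendingMap().values());
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> splits = List.of(
                List.of(Map.entry("a", 5), Map.entry("b", 1), Map.entry("c", 9)),
                List.of(Map.entry("d", 7), Map.entry("e", 3)));
        System.out.println(globalTop(splits, 2)); // [c, d]
    }
}
```

Because every mapper forwards at most n records, the single reducer only ever sees n times the number of splits records, which is what makes one reducer acceptable here.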
  Since MapReduce and HDFS are complex distributed systems that run arbitrary user code, there’s no hard and fast set of rules to achieve optimal performance; instead, I tend to think of tuning a cluster or job much like a doctor would treat a sick human being. There are a number of key symptoms to ...
I want to import data with three fields from a MySQL table into HDFS, with Hive metastore integration. I want the data stored in HDFS to look like field1,field2,field3, with the fields of each record delimited by a comma (,). To achieve this format, two things need attention. a. sqo ...
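Why the delimiter matters: Hive's default field delimiter is Ctrl-A (character code 1), not a comma. Assuming the table uses Hive's default text SerDe, comma-delimited files under a table declared with the default delimiter are read as one field per line, so the whole line lands in the first column. A tiny Java illustration of the split itself:

```java
import java.util.regex.Pattern;

class DelimiterCheck {
    // Hive's default field delimiter: Ctrl-A (character code 1).
    static final String HIVE_DEFAULT_DELIMITER = "\u0001";

    // Split one text line into fields; Pattern.quote makes the delimiter literal,
    // and limit -1 keeps trailing empty fields.
    static String[] splitFields(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String line = "field1,field2,field3";
        System.out.println(splitFields(line, ",").length);                    // 3
        System.out.println(splitFields(line, HIVE_DEFAULT_DELIMITER).length); // 1
    }
}
```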
1. Configure the parameters via Run -> Run Configurations -> Java Application -> Arguments:
--input hdfs:// --output hdfs:// --booleanData true -s com.inoknok.math.similarity.cooccurrence.measures.Cooccu ...
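Whatever goes into the Arguments tab (Program arguments) reaches the program as main's String[] args, split on whitespace. A minimal, hypothetical parser showing how options like the ones above would arrive; real Mahout jobs use their own option handling, so this is only an illustration, and the hdfs://namenode/... paths in the test are placeholders:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ArgsSketch {
    // Hypothetical helper: pairs "--name value" / "-n value" into a map.
    // An option followed by another option (or by nothing) is stored with an empty value.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                continue; // stray positional argument; ignored in this sketch
            }
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
            options.put(args[i], hasValue ? args[++i] : "");
        }
        return options;
    }

    public static void main(String[] args) {
        // Eclipse passes the "Program arguments" text here, split on whitespace.
        System.out.println(parse(args));
    }
}
```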
When I run #yum repolist, yum fails with:
Error: Cannot retrieve metalink for repository: epel.
Solution: vi /etc/yum.repos.d/epel.repo and change https to http.
=====
Set mysqld to start on boot:
#chkconfig mysqld --list
#chkconfig mysqld on
#chkconfig mysqld --list
Searching and replacing:
:s/string
:s/pattern/replace/    replaces the first match on the current line
:%s/pattern/replace/   replaces the first match on every line
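The difference between the substitute forms, mimicked with Java's String.replaceFirst and replaceAll. Java regex syntax is not identical to Vim's, so treat this as an analogy rather than an emulator. Adding the g flag, as in :%s/pattern/replace/g, replaces every match on each line:

```java
import java.util.ArrayList;
import java.util.List;

class SubstituteSketch {
    // :s/pattern/replace/  -> first match on the current line only.
    static List<String> onCurrentLine(List<String> lines, int current, String pattern, String replace) {
        List<String> out = new ArrayList<>(lines);
        out.set(current, out.get(current).replaceFirst(pattern, replace));
        return out;
    }

    // :%s/pattern/replace/ -> first match on every line (% means the whole file).
    static List<String> onEveryLine(List<String> lines, String pattern, String replace) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.replaceFirst(pattern, replace));
        }
        return out;
    }

    // :%s/pattern/replace/g -> every match on every line.
    static List<String> onEveryLineGlobal(List<String> lines, String pattern, String replace) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.replaceAll(pattern, replace));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("foo foo", "foo");
        System.out.println(onCurrentLine(lines, 0, "foo", "bar"));  // [bar foo, foo]
        System.out.println(onEveryLine(lines, "foo", "bar"));       // [bar foo, bar]
        System.out.println(onEveryLineGlobal(lines, "foo", "bar")); // [bar bar, bar]
    }
}
```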

Mahout: quality blogs

  http://blog.csdn.net/zwan0518/article/details/9100329 https://www.ibm.com/developerworks/library/j-mahout-scaling/ http://mail-archives.apache.org/mod_mbox/mahout-user/201202.mbox/%3C1328197877.20898.YahooMailNeo@web39405.mail.mud.yahoo.com%3E
Clustering a collection involves three things:
1. an algorithm
2. a notion of both similarity and dissimilarity
3. a stopping condition

Measuring the similarity of items

The most important issue in clustering is finding a function that quantifies the similarity between any two data points as a ...
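The three ingredients in one small example: a 1-D k-means sketch in which the algorithm is k-means, similarity is absolute distance (closer means more similar), and the stopping condition is "no centroid moved more than epsilon, or maxIterations reached". The points, initial centroids, and thresholds are made-up values:

```java
import java.util.Arrays;

class KMeans1D {
    // Similarity notion: absolute (1-D Euclidean) distance; smaller means more similar.
    static double distance(double a, double b) {
        return Math.abs(a - b);
    }

    static double[] cluster(double[] points, double[] initialCentroids, double epsilon, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sums = new double[centroids.length];
            int[] counts = new int[centroids.length];
            // Assignment step: each point joins its nearest centroid.
            for (double p : points) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (distance(p, centroids[c]) < distance(p, centroids[best])) {
                        best = c;
                    }
                }
                sums[best] += p;
                counts[best]++;
            }
            // Update step: move each centroid to the mean of its points.
            double maxShift = 0;
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old centroid
                }
                double updated = sums[c] / counts[c];
                maxShift = Math.max(maxShift, distance(updated, centroids[c]));
                centroids[c] = updated;
            }
            // Stopping condition: no centroid moved more than epsilon.
            if (maxShift <= epsilon) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centroids = cluster(points, new double[] {1, 12}, 1e-6, 100);
        System.out.println(Arrays.toString(centroids)); // [2.0, 11.0]
    }
}
```

Swapping distance() for another measure changes which points count as similar without touching the rest of the loop, which is why the similarity function is the part worth thinking hardest about.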