sunwinner
  • Views: 202491
  • From: Shanghai
Article list
  Shellsort is a simple extension of insertion sort that gains speed by allowing exchanges of array entries that are far apart, to produce partially sorted arrays that can be efficiently sorted, eventually by insertion sort.   The idea is to rearrange the array to give it the property that takin ...
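A minimal Java sketch of the idea, assuming the 3x+1 increment sequence (1, 4, 13, 40, ...); the class name ShellSortSketch is hypothetical, and other gap sequences also work. Each pass is an insertion sort over entries h apart, and the final pass with h = 1 is plain insertion sort on an array that is already partially sorted.

```java
final class ShellSortSketch {
    // Sorts a in ascending order using Shellsort with the (assumed) 3x+1 gap sequence.
    static <T extends Comparable<? super T>> void sort(T[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) h = 3 * h + 1; // largest gap in 1, 4, 13, 40, 121, ... below n/3
        while (h >= 1) {
            // h-sort the array: insertion sort among entries that are h apart.
            for (int i = h; i < n; i++) {
                for (int j = i; j >= h && a[j].compareTo(a[j - h]) < 0; j -= h) {
                    T tmp = a[j];
                    a[j] = a[j - h];
                    a[j - h] = tmp;
                }
            }
            h /= 3; // shrink the gap; the last pass (h = 1) is ordinary insertion sort
        }
    }
}
```

Large early gaps let an item travel far in a single exchange, which is where the speedup over plain insertion sort comes from.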
Generally we compare algorithms by: implementing and debugging them; analyzing their basic properties; formulating a hypothesis about comparative performance; and running experiments to validate the hypothesis. These steps are nothing more than the time-honored scientific method, applied to ...
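The experiment step can be sketched as a small timing harness. This is an illustration under assumptions: the class SortTimer and its time method are hypothetical, inputs are uniformly random doubles, and every result is checked for sortedness so that a buggy candidate fails loudly instead of producing a misleading timing.

```java
import java.util.Random;
import java.util.function.Consumer;

final class SortTimer {
    // Returns the total seconds `sort` spends on `trials` arrays of n random doubles.
    // The same seed yields the same inputs, so two candidates see identical data.
    static double time(Consumer<double[]> sort, int n, int trials, long seed) {
        Random rnd = new Random(seed);
        double total = 0.0;
        for (int t = 0; t < trials; t++) {
            double[] a = new double[n];
            for (int i = 0; i < n; i++) a[i] = rnd.nextDouble();
            long start = System.nanoTime();
            sort.accept(a);
            total += (System.nanoTime() - start) / 1e9;
            // Debugging step: reject a candidate that does not actually sort.
            if (!isSorted(a)) throw new IllegalStateException("candidate did not sort its input");
        }
        return total;
    }

    static boolean isSorted(double[] a) {
        for (int i = 1; i < a.length; i++) if (a[i] < a[i - 1]) return false;
        return true;
    }
}
```

For example, dividing the result of SortTimer.time for one candidate by the result for another, with the same n and seed, gives a ratio that can support or refute a hypothesis such as "A is about twice as fast as B on random input." JVM timings are noisy (JIT warm-up, garbage collection), so a warm-up run and several trials are advisable.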
As in selection sort, the items to the left of the current index are in sorted order during the sort, but they are not in their final position, as they may have to be moved to make room for smaller items encountered later. The array is, however, fully sorted when the index reaches the right end.   ...
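A minimal insertion sort sketch in Java (the class name is hypothetical) that makes the invariant from the paragraph explicit: before iteration i, the entries a[0..i-1] are sorted relative to each other but may still shift right when a later, smaller item is inserted.

```java
final class InsertionSortSketch {
    // Invariant at the top of each outer iteration: a[0..i-1] is in sorted order.
    static <T extends Comparable<? super T>> void sort(T[] a) {
        for (int i = 1; i < a.length; i++) {
            // Move a[i] left past every larger entry, one exchange at a time.
            for (int j = i; j > 0 && a[j].compareTo(a[j - 1]) < 0; j--) {
                T tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
    }
}
```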
One of the simplest sorting algorithms works as follows: First, find the smallest item in the array and exchange it with the first entry (itself if the first entry is already the smallest). Then, find the next smallest item and exchange it with the second entry. Continue in this way until the ...
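The procedure just described, as a short Java sketch (the class name is hypothetical). The exchange at the end of each pass may swap a[i] with itself, exactly as the text allows.

```java
final class SelectionSortSketch {
    static <T extends Comparable<? super T>> void sort(T[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            // Find the index of the smallest item in a[i..n-1].
            int min = i;
            for (int j = i + 1; j < n; j++) {
                if (a[j].compareTo(a[min]) < 0) min = j;
            }
            // Exchange it into position i (a self-exchange when min == i).
            T tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```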
  The problem that we consider is not a toy problem; it is a fundamental computational task, and the solution that we develop is of use in a variety of applications, from percolation in physical chemistry to connectivity in communications networks. We start with a simple solution, then seek to und ...
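For the connectivity problem, one standard solution is weighted quick-union; the sketch below is offered as an illustration and is not necessarily the "simple solution" the post starts from. The class name WeightedQuickUnionSketch is hypothetical. Linking the root of the smaller tree under the root of the larger one keeps every tree's height at most lg n, so find takes O(log n) steps.

```java
final class WeightedQuickUnionSketch {
    private final int[] parent; // parent[i] = parent of site i (a root points to itself)
    private final int[] size;   // size[r] = number of sites in the tree rooted at r
    private int count;          // number of connected components

    WeightedQuickUnionSketch(int n) {
        parent = new int[n];
        size = new int[n];
        count = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    int count() { return count; }

    // Follow parent links up to the root of p's tree.
    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    boolean connected(int p, int q) { return find(p) == find(q); }

    // Merge the components containing p and q, attaching the smaller tree to the larger.
    void union(int p, int q) {
        int rp = find(p), rq = find(q);
        if (rp == rq) return;
        if (size[rp] < size[rq]) {
            parent[rp] = rq;
            size[rq] += size[rp];
        } else {
            parent[rq] = rp;
            size[rp] += size[rq];
        }
        count--;
    }
}
```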
Availability: in the context of HBase, availability can be defined as the ability of the system to handle failures. The most common failures cause one or more nodes in the HBase cluster to fall off the cluster and stop serving requests. This could be because of hardware on the node failing or the so ...
Pig version ([root@n8 examples]# pig -version): Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21. Hadoop version ([root@n8 examples]# hadoop version): Hadoop 2.0.0-cdh4.3.0, Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2. ...
Env: Cloudera Manager 4.6.1 with CDH4.3, Hadoop 2.0.0-CDH4.3, Hive 0.10.0-CDH4.3, CentOS 6.4 x86_64. Hive started successfully: [root@n8 hive]# netstat -anlp | grep 10000 shows tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 21739/java. What I want to do: connect to Hive server via Hive Th ...
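While learning Hive, a problem I kept running into was the lack of suitable data files; for example, while reading the book Programming Hive I was frustrated that no sample data was provided for the Employees table. By default Hive uses '\001' (Ctrl+A) as the field separator, '\002' (Ctrl+B) as the collection items separator, and '\003' as the map key/value separator, and inserting these control characters in an editor requires special handling that differs from editor to editor. Below I describe two ways to generate data files that Hive can use: one is to insert the control characters directly in Vi, and the other ...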
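Such files can also be produced programmatically. The Java sketch below is an assumption-laden illustration, not one of the post's two methods: the class HiveDelimitedRow and its three-column layout (a STRING, an ARRAY&lt;STRING&gt;, and a MAP&lt;STRING,FLOAT&gt;) are hypothetical. Writing the default separators '\001', '\002', and '\003' as string escapes sidesteps editor-specific keystrokes entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HiveDelimitedRow {
    static final String FIELD = "\001";      // Ctrl+A: separates top-level fields
    static final String COLLECTION = "\002"; // Ctrl+B: separates array items and map entries
    static final String MAP_KV = "\003";     // Ctrl+C: separates a map key from its value

    // Hypothetical layout: name STRING, subordinates ARRAY<STRING>, deductions MAP<STRING,FLOAT>.
    static String row(String name, List<String> subordinates, Map<String, Float> deductions) {
        String array = String.join(COLLECTION, subordinates);
        String map = deductions.entrySet().stream()
                .map(e -> e.getKey() + MAP_KV + e.getValue())
                .collect(Collectors.joining(COLLECTION));
        return name + FIELD + array + FIELD + map;
    }

    // Writes one row per line with '\n' terminators, independent of the platform line separator.
    static void write(Path file, List<String> rows) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String r : rows) sb.append(r).append('\n');
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

Passing a LinkedHashMap for the map column keeps entry order deterministic; a plain HashMap would load just as well, only in arbitrary order.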
Runtime environment: Cloudera Hive 0.10-CDH4. The Hive installation on my machine has the following table: hive (human_resources)> describe formatted employees; OK col_name data_type comment # col_name data_type comment name string None ...
Cascading is a data processing API and processing query planner used for defining, sharing, and executing data-processing workflows on a single computing node or distributed computing cluster. On a single node, Cascading's "local mode" can be used to efficiently test code and process loca ...
If you know Hadoop, you have undoubtedly seen WordCount before; it serves as the "hello world" of Hadoop apps. This simple program provides a great test case for parallel processing: it requires a minimal amount of code, it demonstrates the use of both symbolic and numeric values, it shows a ...
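To make the map, shuffle, and reduce structure concrete without a cluster, here is a plain-Java simulation of WordCount's logic. It is a sketch under assumptions, not Hadoop code: the class WordCountSketch is hypothetical, tokens are whitespace-separated via StringTokenizer as in the classic Hadoop example, and a TreeMap stands in for Hadoop's sorting and grouping of map output keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class WordCountSketch {
    // "Map" phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(line);
        while (tok.hasMoreTokens()) out.add(Map.entry(tok.nextToken(), 1));
        return out;
    }

    // "Shuffle" + "reduce": group pairs by word (sorted by key) and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(line)); // each line could be mapped in parallel
        return reduce(pairs);
    }
}
```

The per-line map calls are independent of one another, which is exactly the property that lets Hadoop spread them across many nodes.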
Apache Crunch is a Java library for creating MapReduce pipelines that is based on Google's FlumeJava library. Like other high-level tools for creating MapReduce jobs, such as Apache Hive, Apache Pig, and Cascading, Crunch provides a library of patterns to implement common tasks like joining data, p ...
The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run. Running on top of Hadoop MapReduce, the Apache Crunch librar ...
When a job is in the sort or merge phase, Hadoop uses a RawComparator for the map output key to compare keys. Built-in Writable classes such as IntWritable have byte-level implementations that are fast because they don't require the byte form of the object to be unmarshalled to Object form for th ...
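The idea behind a byte-level comparator can be shown without Hadoop on the classpath. In this hedged sketch the class RawIntCompareSketch is hypothetical, and its compare method only mirrors the parameter shape of RawComparator.compare(b1, s1, l1, b2, s2, l2). An int written with DataOutput.writeInt occupies 4 big-endian bytes, so two serialized keys can be ordered by decoding the ints straight from the buffers, with no object allocated per comparison.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class RawIntCompareSketch {
    // Serialize an int the way DataOutput.writeInt does: 4 bytes, big-endian.
    static byte[] serialize(int v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(v);
        }
        return bos.toByteArray();
    }

    // Decode a big-endian int directly from the byte array starting at offset s.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    // Compare two serialized ints in place; the lengths are unused since an int is always 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }
}
```

Decoding the int matters: comparing the raw bytes lexicographically as unsigned values would place negative numbers after positive ones, because the sign bit is the high bit of the first byte.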