Basic Data Structures and Algorithms 4: Shell sort

Blog category:

Algorithm

Shellsort is a simple extension of insertion sort that gains speed by allowing exchanges of array entries that are far apart, to produce partially sorted arrays that can be efficiently sorted, eventually by insertion sort. The idea is to rearrange the array to give it the property that takin ...
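The idea in the excerpt above can be sketched as follows. This is a minimal sketch, not necessarily the code from the post: the class name `ShellSortSketch` is made up for illustration, and the 3x+1 gap sequence (1, 4, 13, 40, ...) is an assumption, since the excerpt does not say which increments are used.

```java
import java.util.Arrays;

public class ShellSortSketch {
    // Sorts a in ascending order by h-sorting with decreasing gaps.
    // The 3x+1 gap sequence is an assumption; the excerpt does not name one.
    public static void sort(int[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) {
            h = 3 * h + 1; // 1, 4, 13, 40, ...
        }
        while (h >= 1) {
            // Insertion sort over entries h apart: each exchange moves an
            // entry h positions, so far-apart entries can swap early on.
            for (int i = h; i < n; i++) {
                for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                    int t = a[j];
                    a[j] = a[j - h];
                    a[j - h] = t;
                }
            }
            h /= 3;
        }
        // The final pass (h == 1) is plain insertion sort on a nearly sorted array.
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

The last pass always uses a gap of 1, so correctness does not depend on the gap sequence; only the running time does.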

2013-11-20 19:11
Views: 1179
Comments: 0
Category: Programming Languages

Comparing two sorting algorithms

Blog category:

Algorithm

Benchmark Algorithm

Generally we compare algorithms by
■ Implementing and debugging them
■ Analyzing their basic properties
■ Formulating a hypothesis about comparative performance
■ Running experiments to validate the hypothesis
These steps are nothing more than the time-honored scientific method, applied to ...
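The "running experiments" step might look roughly like the harness below. It is a sketch under stated assumptions: the class name `SortCompareSketch`, the method `timeRandomInput`, and the choice of insertion sort and selection sort as the two candidates are all illustrative and not taken from the post.

```java
import java.util.Random;
import java.util.function.Consumer;

public class SortCompareSketch {
    static void insertion(int[] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }

    static void selection(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j;
            }
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }

    // Total seconds spent sorting `trials` random arrays of length n.
    // The same seed gives both algorithms the same inputs.
    public static double timeRandomInput(Consumer<int[]> alg, int n, int trials, long seed) {
        Random rnd = new Random(seed);
        double total = 0.0;
        for (int t = 0; t < trials; t++) {
            int[] a = rnd.ints(n).toArray();
            long start = System.nanoTime();
            alg.accept(a);
            total += (System.nanoTime() - start) / 1e9;
        }
        return total;
    }

    public static void main(String[] args) {
        double t1 = timeRandomInput(SortCompareSketch::insertion, 2000, 20, 42L);
        double t2 = timeRandomInput(SortCompareSketch::selection, 2000, 20, 42L);
        System.out.printf("insertion: %.4fs, selection: %.4fs, ratio: %.2f%n", t1, t2, t2 / t1);
    }
}
```

Timings from a harness like this vary from run to run and are affected by JIT warm-up, so the ratio is only meaningful as a rough check of a hypothesis.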

2013-11-19 21:16
Views: 841
Comments: 0
Category: Programming Languages

Basic Data Structures and Algorithms 3: Insertion Sort

Blog category:

Algorithm

Insertion sort Algorithm

As in selection sort, the items to the left of the current index are in sorted order during the sort, but they are not in their final position, as they may have to be moved to make room for smaller items encountered later. The array is, however, fully sorted when the index reaches the right end. ...
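A minimal sketch of the behavior described above, assuming an `int[]` input; the class name `InsertionSortSketch` is illustrative and the post's own code may differ.

```java
import java.util.Arrays;

public class InsertionSortSketch {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            // Invariant: a[0..i-1] is in sorted order, but not necessarily
            // in final position; a smaller item found later shifts them right.
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int t = a[j];
                a[j] = a[j - 1];
                a[j - 1] = t;
            }
        }
        // When i reaches the right end, the whole array is sorted.
    }

    public static void main(String[] args) {
        int[] a = {4, 2, 5, 1, 3};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```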

2013-11-19 21:06
Views: 997
Comments: 0
Category: Programming Languages

Basic Data Structures and Algorithms 2: Selection sort

Blog category:

Algorithm

Selection sort Algorithm

One of the simplest sorting algorithms works as follows: First, find the smallest item in the array and exchange it with the first entry (itself if the first entry is already the smallest). Then, find the next smallest item and exchange it with the second entry. Continue in this way until the ...
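The steps in the excerpt translate almost directly into code. The following is a minimal sketch on an `int[]`; the class name `SelectionSortSketch` is illustrative only.

```java
import java.util.Arrays;

public class SelectionSortSketch {
    public static void sort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            // Find the smallest item among a[i..n-1].
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Exchange it with entry i (with itself if already the smallest).
            int t = a[i];
            a[i] = a[min];
            a[min] = t;
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 5, 1, 4, 2};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```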

2013-11-19 20:57
Views: 1169
Comments: 1
Category: Programming Languages

Basic Data Structures and Algorithms 1: UnionFind

Blog category:

Algorithm

Algorithm Union find

The problem that we consider is not a toy problem; it is a fundamental computational task, and the solution that we develop is of use in a variety of applications, from percolation in physical chemistry to connectivity in communications networks. We start with a simple solution, then seek to und ...
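One simple solution to the connectivity problem is the approach commonly called quick-find, sketched below. Whether this is the "simple solution" the post starts with is an assumption, and the class name `QuickFindSketch` is made up for illustration.

```java
public class QuickFindSketch {
    private final int[] id; // id[i] is the component identifier of site i
    private int count;      // number of components

    public QuickFindSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every site starts in its own component
        }
        count = n;
    }

    public int find(int p) {
        return id[p];
    }

    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Merges the components of p and q by relabeling p's component.
    // This takes time proportional to the number of sites per call.
    public void union(int p, int q) {
        int pid = find(p);
        int qid = find(q);
        if (pid == qid) {
            return;
        }
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
        count--;
    }

    public int count() {
        return count;
    }

    public static void main(String[] args) {
        QuickFindSketch uf = new QuickFindSketch(5);
        uf.union(0, 1);
        uf.union(3, 4);
        System.out.println(uf.connected(0, 1) + " " + uf.connected(1, 3) + " " + uf.count());
    }
}
```

The linear-time `union` is what makes this version too slow for large inputs, which is the usual motivation for looking at better variants.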

2013-11-19 20:47
Views: 1277
Comments: 0
Category: Programming Languages

Availability and Reliability with HBase

Blog category:

HBase
Hadoop

Availability Availability in the context of HBase can be defined as the ability of the system to handle failures. The most common failures cause one or more nodes in the HBase cluster to fall off the cluster and stop serving requests. This could be because of hardware on the node failing or the so ...

2013-08-25 10:53
Views: 911
Comments: 0
Category: Enterprise Architecture

Failed to Run Pig Script with Macro

Blog category:

Hadoop
Pig

Pig version:
[root@n8 examples]# pig -version
Apache Pig version 0.11.0-cdh4.3.0 (rexported)
compiled May 27 2013, 20:48:21

Hadoop version:
[root@n8 examples]# hadoop version
Hadoop 2.0.0-cdh4.3.0
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2. ...

2013-08-16 19:44
Views: 1597
Comments: 0
Category: Enterprise Architecture

Solution to Hive Thrift Client Hang without Any Return

Blog category:

Hive
Hadoop

Env:
Cloudera Manager 4.6.1 with CDH4.3
Hadoop 2.0.0-CDH4.3
Hive 0.10.0-CDH4.3
CentOS 6.4 X86_64

Hive started successfully:
[root@n8 hive]# netstat -anlp | grep 10000
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 21739/java

What I want to do: connect to Hive server via Hive Th ...

2013-08-12 19:38
Views: 2493
Comments: 0
Category: Enterprise Architecture

How to Create Hive Data Files

Blog category:

Hadoop
Hive

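While learning Hive, a problem I often run into is the lack of suitable data files. For example, when reading Programming Hive I was frustrated that no sample data is provided for the Employees table. By default Hive uses '\001' (Ctrl+A) as the field separator, '\002' (Ctrl+B) as the collection item separator, and '\003' as the key/value separator for the Map type. Inserting these control characters in an editor requires special handling, and every editor does it differently. Below I introduce two ways to generate data files that Hive can use: one is to insert the control characters directly in Vi, and the other ...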
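As an illustration of the default delimiters only (this is not one of the two methods the post goes on to describe), a row can also be assembled programmatically. In the sketch below the class name `HiveRowSketch`, the method `row`, and the four-column record layout (a string, a number, an array of strings, a map) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiveRowSketch {
    static final String FIELD = "\u0001"; // Ctrl+A: separates fields
    static final String ITEM = "\u0002";  // Ctrl+B: separates collection items
    static final String KV = "\u0003";    // separates a map key from its value

    // Builds one text line for a hypothetical table with columns
    // (name STRING, salary, subordinates ARRAY, deductions MAP).
    public static String row(String name, double salary,
                             List<String> subordinates, Map<String, Double> deductions) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, Double> e : deductions.entrySet()) {
            pairs.add(e.getKey() + KV + e.getValue());
        }
        return String.join(FIELD,
                name,
                String.valueOf(salary),
                String.join(ITEM, subordinates),
                String.join(ITEM, pairs));
    }

    public static void main(String[] args) {
        Map<String, Double> deductions = new LinkedHashMap<>();
        deductions.put("Federal Taxes", 0.2);
        deductions.put("State Taxes", 0.05);
        String line = row("John Doe", 100000.0, Arrays.asList("Mary Smith", "Todd Jones"), deductions);
        // Print with visible markers, since the control characters do not display.
        System.out.println(line.replace(FIELD, "^A").replace(ITEM, "^B").replace(KV, "^C"));
    }
}
```

Writing such lines to a text file, one record per line, should give a file in the default delimited format described above.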

2013-08-10 12:05
Views: 2630
Comments: 0
Category: Enterprise Architecture

Hive - Index Creation Failed, Cause Still Unknown

Blog category:

Hadoop
Hive

Environment: Cloudera Hive 0.10-CDH4

The Hive installation on my machine contains the following table:

hive (human_resources)> describe formatted employees;
OK
col_name data_type comment
# col_name data_type comment
name string None ...

2013-08-10 00:08
Views: 3257
Comments: 0
Category: Enterprise Architecture

Cascading Terminology and Concepts

Blog category:

Hadoop
Cascading

Cascading is a data processing API and processing query planner used for defining, sharing, and executing data-processing workflows on a single computing node or distributed computing cluster. On a single node, Cascading's "local mode" can be used to efficiently test code and process loca ...

2013-08-02 23:17
Views: 1026
Comments: 0
Category: Enterprise Architecture

Cascading Kick Start: Word Counting

Blog category:

Cascading
Hadoop

If you know Hadoop, you have undoubtedly seen WordCount before; it serves as a hello world for Hadoop apps. This simple program provides a great test case for parallel processing: It requires a minimal amount of code. It demonstrates use of both symbolic and numeric values. It shows a ...

2013-07-31 19:36
Views: 899
Comments: 0
Category: Enterprise Architecture

Joins with Apache Crunch

Blog category:

Hadoop
Crunch

Apache Crunch is a Java library for creating MapReduce pipelines that is based on Google's FlumeJava library. Like other high-level tools for creating MapReduce jobs, such as Apache Hive, Apache Pig, and Cascading, Crunch provides a library of patterns to implement common tasks like joining data, p ...

2013-07-30 19:46
Views: 1387
Comments: 0
Category: Enterprise Architecture

Getting Started with Apache Crunch

Blog category:

Crunch
Hadoop

The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run. Running on top of Hadoop MapReduce, the Apache Crunch librar ...

2013-07-29 23:10
Views: 1588
Comments: 0
Category: Enterprise Architecture

Accelerating Comparison by Providing RawComparator

Blog category:

Hadoop

When a job is in the sorting or merging phase, Hadoop leverages the RawComparator for the map output key to compare keys. Built-in Writable classes such as IntWritable have byte-level implementations that are fast because they don't require the byte form of the object to be unmarshalled to Object form for th ...
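The byte-level idea can be illustrated without any Hadoop dependency. The sketch below is plain Java and does not implement Hadoop's `RawComparator` interface; the class name `RawIntCompareSketch` and its helpers are hypothetical, and the six-argument `compare` only mirrors the shape of a raw comparison over two byte ranges. It assumes keys are 4-byte big-endian integers.

```java
import java.nio.ByteBuffer;

public class RawIntCompareSketch {
    // Reads a big-endian int directly from the byte range, without
    // building an intermediate object.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
                | ((b[start + 1] & 0xff) << 16)
                | ((b[start + 2] & 0xff) << 8)
                | (b[start + 3] & 0xff);
    }

    // Compares two serialized keys in place; the lengths l1 and l2 are
    // unused here because every key is assumed to be exactly 4 bytes.
    public static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Helper for the demo: serializes an int as 4 big-endian bytes.
    static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        byte[] a = serialize(-5);
        byte[] b = serialize(3);
        System.out.println(compare(a, 0, 4, b, 0, 4) < 0);
    }
}
```

The saving comes from skipping object creation and deserialization for each comparison, which adds up when a sort performs many comparisons.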

2013-07-27 21:25
Views: 1238
Comments: 0
Category: Enterprise Architecture
