Article list
High-Quality PPT Online
- Blog category:
- ML
http://www.slideshare.net/yuhuang/large-scale-machine-learning-for-big-data
Spark: Spark Streaming
- Blog category:
- Spark
Spark Streaming uses a “micro-batch” architecture, where the streaming computation is treated as a continuous series of batch computations on small batches of data. Spark Streaming receives data from various input sources and groups it into small batches. New batches are created at regular time int ...
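As a rough illustration of the micro-batch idea described above, here is a minimal sketch in plain Java. It does not use the Spark Streaming API; the class `MicroBatchSketch`, the `Event` type, and the 1000 ms interval are all hypothetical names and values chosen for the example. It only shows how timestamped records can be grouped into fixed-interval batches, each of which could then be handled as a small batch job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, not the Spark Streaming API.
class MicroBatchSketch {

    // A timestamped record (hypothetical type for this example).
    static final class Event {
        final long timestampMillis;
        final String payload;

        Event(long timestampMillis, String payload) {
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    // Groups events into consecutive batches of batchIntervalMillis,
    // keyed by the start time of the interval each event falls into.
    static Map<Long, List<String>> toBatches(List<Event> events, long batchIntervalMillis) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchStart = Math.floorDiv(e.timestampMillis, batchIntervalMillis) * batchIntervalMillis;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.payload);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(100, "a"), new Event(900, "b"),
                new Event(1200, "c"), new Event(2500, "d"));
        // Each map entry stands for one small batch, processed in time order.
        toBatches(events, 1000).forEach((start, batch) ->
                System.out.println("batch starting at " + start + " ms: " + batch));
    }
}
```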
In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java p ...
Hosts: 192.168.0.135 192.168.0.136 192.168.0.137
Master: 137; workers: 135, 136
1. Install Spark on all hosts in the /opt directory
2. Set up SSH remote access
137#ssh-keygen
137#ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.135
137#ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.136
3. Conf ...
https://spark.apache.org/docs/latest/cluster-overview.html
This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the application submission guide to submit applications to a cluster.
Components
Spark application ...
To flow data across multiple agents or hops, the sink of the previous agent and the source of the current hop must both be of type avro, with the sink pointing to the hostname (or IP address) and port of the source.
Hop 1:
a1.channels.ch1.type = memory
a1.sources.avro-source1.channel ...
Flume: hbase sink
- Blog category:
- Flume
flume.conf
a1.sinks.hbase-sink1.channel = ch1
a1.sinks.hbase-sink1.type = hbase
a1.sinks.hbase-sink1.table = users
a1.sinks.hbase-sink1.columnFamily = info
a1.sinks.hbase-sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.hbase-sink1.serializer.regex = ^(.+)\t(.+)\t( ...
http://kitesdk.org/docs/1.0.0/morphlines/
http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
http://kitesdk.org/docs/current/
Neo4j: fulltext search
- Blog category:
- Neo4j
Model
@Indexed(indexType = IndexType.FULLTEXT, indexName = "TaskTile")
private String title;
Repository
@Query("START n=node:TaskTile({0}) return n")
Iterable<Task> findTasksByTitle(String query);
query string parameter
title:*software*
In ...
In my project, I provide Neo4j extensions to clients, which send a JSON string to the extension, and Jackson automatically parses that JSON string into my POJO model. However, I want to simplify the JSON string sent by the client.
The POJO model class looks like
@NodeEntity
@XmlRootElement
@JsonAutoDetect
@JsonIgnor ...
I want to search for nodes by date and time. In the model,
@Indexed
private int startDate;
@Indexed
private int startTime;
@Indexed
private int endDate;
@Indexed
private int endTime;
One issue to note is that you should write your own custom deserializer JSON t ...
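The int fields above need some encoding of dates and times, and the excerpt does not say which one is used. A minimal sketch, assuming a `yyyyMMdd` encoding for dates and `HHmm` for times (both encodings and the class name `DateTimeIntSketch` are my assumptions, not taken from the post): with these encodings, numeric order matches chronological order, so range comparisons on the int values behave as expected.

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Illustrative only: the yyyyMMdd / HHmm encodings are assumptions.
class DateTimeIntSketch {

    // Encodes a date as yyyyMMdd, e.g. 2014-03-09 becomes 20140309.
    static int encodeDate(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Encodes a time of day as HHmm, e.g. 09:30 becomes 930.
    static int encodeTime(LocalTime time) {
        return time.getHour() * 100 + time.getMinute();
    }

    // Reverses encodeDate.
    static LocalDate decodeDate(int value) {
        return LocalDate.of(value / 10000, (value / 100) % 100, value % 100);
    }

    // Reverses encodeTime.
    static LocalTime decodeTime(int value) {
        return LocalTime.of(value / 100, value % 100);
    }

    public static void main(String[] args) {
        int startDate = encodeDate(LocalDate.of(2014, 3, 9));
        int startTime = encodeTime(LocalTime.of(9, 30));
        System.out.println(startDate + " " + startTime);
        System.out.println(decodeDate(startDate) + " " + decodeTime(startTime));
    }
}
```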
I followed the Spring Data Neo4j reference guide to add geospatial functionality to my project, like this:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-spatial</artifactId>
<version>0.13-neo4j-2.0.1</version>
</dependency>
...
To improve server performance: when a query returns a huge result, a large amount of memory is used to build the result JSON string.
One solution is to use streaming JSON responses.
@Path("/fof/{userName}")
@GET
@Produces(MediaType.APPLICATION_JSON)
publ ...
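The idea behind a streaming response can be sketched with the standard library alone. This is not the code from the post, which is cut off above; the class `StreamingJsonSketch` and its methods are hypothetical. It writes a JSON array of strings to an `OutputStream` one element at a time, so the full result string is never held in memory. In a JAX-RS resource, a method like this could presumably be called from a `StreamingOutput` implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

// Illustrative only: a hand-rolled writer, not Jackson or JAX-RS.
class StreamingJsonSketch {

    // Writes the names as a JSON array of strings, one element at a time.
    static void writeJsonArray(Iterator<String> names, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write('[');
        boolean first = true;
        while (names.hasNext()) {
            if (!first) {
                writer.write(',');
            }
            writer.write('"');
            writer.write(escape(names.next()));
            writer.write('"');
            first = false;
        }
        writer.write(']');
        writer.flush();
    }

    // Escapes quotes, backslashes, and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeJsonArray(Arrays.asList("alice", "bob").iterator(), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In real code a JSON library's streaming generator would be a safer choice than manual escaping; the sketch only shows the element-by-element write pattern.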