Scribe can be configured with:
- the file specified in the -c command line option
- the file at DEFAULT_CONF_FILE_LOCATION in env_default.h
Global Configuration Variables
port: assigned to variable “port”
- which port the scribe server will listen on
- default 0, passed at command line with -p, can also be set in conf file
max_msg_per_second:
- used in scribeHandler::throttleDeny
- default 0
- this parameter is ignored if the value is 0. With recent changes it has become less relevant, and max_queue_size should be used for throttling instead
max_queue_size: in bytes
- used in scribeHandler::Log
- default 5,000,000 bytes
check_interval: in seconds
- used to control how often to check each store
- default 5
new_thread_per_category: yes/no
- If yes, will create a new thread for every category seen. Otherwise, will only create a single thread for every store defined in the configuration.
- For prefix stores or the default store, setting this parameter to “no” will cause all messages that match this category to get processed by a single store. Otherwise, a new store will be created for each unique category name.
- default yes
num_thrift_server_threads:
- Number of threads listening for incoming messages
- default 3
Example:
port=1463
max_msg_per_second=2000000
max_queue_size=10000000
check_interval=3
Store Configuration
Scribe Server determines how to log messages based on the Stores defined in the configuration. Every store must specify what message category it handles with three exceptions:
default store: The ‘default’ category handles any category that is not handled by any other store. There can only be one default store.
- category=default
prefix stores: If the specified category ends in a *, the store will handle all categories that begin with the specified prefix.
- category=web*
multiple categories: Can use ‘categories=’ to create multiple stores with a single store definition.
- categories=rock paper* scissors
In the above three cases, Scribe will create a subdirectory for each unique category in File Stores (unless new_thread_per_category is set to no).
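The three category forms above can be illustrated with a small matching routine. This is only a sketch of the rules as described here, not Scribe's actual code: the function names `expand_categories` and `pick_store` are hypothetical, and the precedence used (exact name first, then prefix, then default) is an assumption.

```python
from typing import List, Optional


def expand_categories(value: str) -> List[str]:
    """Split a 'categories=' value into one pattern per store."""
    return value.split()


def pick_store(patterns: List[str], category: str) -> Optional[str]:
    """Return the pattern of the store that would handle `category`.

    Assumed precedence (not confirmed by the text above):
    exact match, then prefix match, then the default store.
    """
    # 1. A store whose category equals the message category.
    for p in patterns:
        if p != "default" and not p.endswith("*") and p == category:
            return p
    # 2. A prefix store: the pattern ends in '*' and the category
    #    begins with everything before the '*'.
    for p in patterns:
        if p.endswith("*") and category.startswith(p[:-1]):
            return p
    # 3. The single default store, if one is configured.
    return "default" if "default" in patterns else None


patterns = expand_categories("rock paper* scissors") + ["web*", "default"]
for name in ("rock", "paper_towel", "web_access", "lizard"):
    print(name, "->", pick_store(patterns, name))
```

With these patterns, "lizard" matches no specific store and falls through to the default store.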
Store Configuration Variables
category: Determines which messages are handled by this store
type:
- file
- buffer
- network
- bucket
- thriftfile
- null
- multi
target_write_size: 16,384 bytes by default
- determines how large to let the message queue grow for a given category before processing the messages
max_batch_size: 1,024,000 bytes by default (may not be in open-source yet)
- determines the amount of data from the in-memory store queue to be handled at a time. In practice, this (together with buffer file rotation size) controls how big a thrift call can be.
max_write_interval: 10 seconds by default
- determines how long to let the messages queue for a given category before processing the messages
must_succeed: yes/no
- Whether to requeue messages and retry if a store failed to process messages.
- If set to ‘no’, messages will be dropped if the store cannot process them.
- Note: We recommend using Buffer Stores to specify a secondary store to handle logging failures.
- default yes
Example:
<store>
category=statistics
type=file
target_write_size=20480
max_write_interval=2
</store>
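How target_write_size and max_write_interval interact can be sketched as a single flush check. This is a minimal illustration under assumptions: the function `should_flush` is hypothetical, and whether the real comparisons are strict or inclusive is not stated above.

```python
def should_flush(queued_bytes: int,
                 seconds_since_last_write: float,
                 target_write_size: int = 16384,
                 max_write_interval: float = 10) -> bool:
    """Decide whether a category's queued messages should be processed.

    Messages are processed once the queue has grown to target_write_size
    bytes, or once max_write_interval seconds have passed, whichever
    happens first. Inclusive comparisons are an assumption.
    """
    return (queued_bytes >= target_write_size
            or seconds_since_last_write >= max_write_interval)


# Using the values from the 'statistics' store above:
print(should_flush(1000, 0.5, target_write_size=20480, max_write_interval=2))
print(should_flush(1000, 2.0, target_write_size=20480, max_write_interval=2))
```

A small target_write_size or a short max_write_interval lowers latency at the cost of more, smaller writes.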
File Store Configuration
File Stores write messages to a file.
file_path: defaults to “/tmp”
base_filename: defaults to category name
use_hostname_sub_directory: yes/no, default no
- Create a subdirectory using the server’s hostname
sub_directory: string
- Create a subdirectory with the specified name
rotate_period: “hourly”, “daily”, “never”, or number[suffix]; “never” by default
- determines how often to create new files
- suffix may be “s”, “m”, “h”, “d”, “w” for seconds (the default), minutes, hours, days and weeks, respectively
rotate_hour: 0-23, 1 by default
- if rotate_period is daily, determines what hour of day to rotate
rotate_minute: 0-59, 15 by default
- if rotate_period is daily or hourly, determines how many minutes after the hour to rotate
max_size: 1,000,000,000 bytes by default
- determines approximately how large to let a file grow before rotating to a new file
write_meta: “yes” or anything else; false by default
- if the file was rotated, the last line will contain "scribe_meta: " followed by the next filename
fs_type: supports two types “std” and “hdfs”. “std” by default
chunk_size: 0 by default
- if a chunk size is specified, no messages within the file will cross chunk boundaries unless there are messages larger than the chunk size
add_newlines: 0 or 1, 0 by default
- if set to 1, will write a newline after every message
create_symlink: “yes” or anything else; “yes” by default
- if true, will maintain a symlink that points to the most recently written file
write_stats: yes/no, yes by default
- whether to create a scribe_stats file for each store to keep track of files written
max_write_size: 1,000,000 bytes by default
- the file store tries to flush data to the file system in chunks of max_write_size bytes; max_write_size cannot be larger than max_size
- for example, when messages that were buffered because of target_write_size are handed to the file store, it writes them in chunks of at least max_write_size bytes; only the last write may be smaller
Example:
<store>
category=sprockets
type=file
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
add_newlines=1
rotate_period=daily
rotate_hour=0
rotate_minute=10
max_write_size=4096
</store>
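The number[suffix] form of rotate_period can be made concrete with a small converter. This sketch follows only the suffix rules listed above; the function name `rotate_period_seconds` is hypothetical, and treating the keywords "hourly", "daily" and "never" as calendar-based (returning None) is a choice made for this illustration.

```python
import re
from typing import Optional

# Seconds per unit for each suffix listed in the rotate_period description.
_SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def rotate_period_seconds(value: str) -> Optional[int]:
    """Convert a number[suffix] rotate_period to seconds.

    Returns None for the keywords 'hourly', 'daily' and 'never', which
    are governed by rotate_hour / rotate_minute rather than a fixed span.
    """
    if value in ("hourly", "daily", "never"):
        return None
    match = re.fullmatch(r"(\d+)([smhdw]?)", value)
    if match is None:
        raise ValueError(f"unrecognized rotate_period: {value!r}")
    number, suffix = match.groups()
    # A bare number is interpreted as seconds (the default suffix).
    return int(number) * _SUFFIX_SECONDS[suffix or "s"]


for example in ("90", "10m", "2h", "1w", "daily"):
    print(example, "->", rotate_period_seconds(example))
```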
Network Store Configuration
Network Stores forward messages to other Scribe Servers. Scribe keeps persistent connections open as long as it is able to send messages. (It will only re-open a connection on error or if the downstream machine is overloaded.) Scribe will send messages in batches during normal operation based on how many messages are currently sitting in the queue waiting to be sent. (If Scribe is backed up and buffering messages to local disk, Scribe will send messages in chunks based on the buffer file sizes.)
remote_host: name or ip of remote host to forward messages
remote_port: port number on remote host
timeout: socket timeout, in milliseconds; defaults to DEFAULT_SOCKET_TIMEOUT_MS, which is set to 5000 in store.h
use_conn_pool: “yes” or anything else; defaults to false
- whether to use connection pooling instead of opening up multiple connections to each remote host
Example:
<store>
category=default
type=network
remote_host=hal
remote_port=1465
</store>
Buffer Store Configuration
Buffer Stores must have two sub-stores named “primary” and “secondary”. Buffer Stores will first attempt to log messages to the primary store and only log to the secondary if the primary is not available. Once the primary store comes back online, a Buffer Store will read messages out of the secondary store and send them to the primary store (unless replay_buffer=no). Only stores that are readable (stores that implement the readOldest() method) may be used as the secondary store. Currently, the only readable stores are File Stores and Null Stores.
max_queue_length: 2,000,000 messages by default
- if the number of messages in the queue exceeds this value, the buffer store will switch to writing to the secondary store
buffer_send_rate: 1 by default
- determines, for each check_interval, how many times to read a group of messages from the secondary store and send them to the primary store
retry_interval: 300 seconds by default
- how long to wait to retry sending to the primary store after failing to write to the primary store
retry_interval_range: 60 seconds by default
- will randomly pick a retry interval that is within this range of the specified retry_interval
replay_buffer: yes/no, default yes
- If set to ‘no’, Buffer Store will not remove messages from the secondary store and send them to the primary store
Example:
<store>
category=default
type=buffer
buffer_send_rate=1
retry_interval=30
retry_interval_range=10
<primary>
type=network
remote_host=wopr
remote_port=1456
</primary>
<secondary>
type=file
file_path=/tmp
base_filename=thisisoverwritten
max_size=10000000
</secondary>
</store>
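The effect of retry_interval_range can be sketched as picking a randomized wait around retry_interval, which keeps many upstream servers from retrying a recovered primary at the same moment. This is an illustration only: the function `next_retry_interval` is hypothetical, and centering the range on retry_interval with whole-second values is an assumption, since the text above only says the interval falls "within this range".

```python
import random


def next_retry_interval(retry_interval: int = 300,
                        retry_interval_range: int = 60,
                        rng=random) -> int:
    """Pick how many seconds to wait before retrying the primary store.

    Assumption: the range is centered on retry_interval, so the result
    lies in [retry_interval - range/2, retry_interval + range/2].
    """
    low = retry_interval - retry_interval_range // 2
    return rng.randint(low, low + retry_interval_range)


# With the values from the example above (retry_interval=30,
# retry_interval_range=10), each wait falls between 25 and 35 seconds.
waits = [next_retry_interval(30, 10) for _ in range(5)]
print(waits)
```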
Bucket Store Configuration
Bucket Stores will hash messages to multiple files using a prefix of each message as the key.
You can define each bucket implicitly (using a single ‘bucket’ definition) or explicitly (using a bucket definition for every bucket). Bucket Stores that are defined implicitly must have a substore named “bucket” that is either a File Store, Network Store or ThriftFile Store (see examples).
num_buckets: defaults to 1
- number of buckets to hash into
- messages that cannot be hashed into any bucket will be put into a special bucket number 0
bucket_type: “key_hash”, “key_modulo”, or “random”
delimiter: must be an ascii code between 1 and 255; otherwise the default delimiter is ‘:’
- The message prefix up to (but not including) the first occurrence of the delimiter will be used as the key to do the hash/modulo. ‘random’ hashing does not use a delimiter.
remove_key: yes/no, defaults to no
- whether to remove the key prefix from each message.
bucket_subdir: the name of each subdirectory will be this name followed by the bucket number if a single ‘bucket’ definition is used
Example:
<store>
category=bucket_me
type=bucket
num_buckets=5
bucket_subdir=bucket
bucket_type=key_hash
delimiter=58
<bucket>
type=file
fs_type=std
file_path=/tmp/scribetest
base_filename=bucket_me
</bucket>
</store>
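How a message ends up in a bucket can be sketched as follows. This is a rough illustration under assumptions, not Scribe's implementation: the function `bucketize` is hypothetical, CRC32 stands in for whatever hash function Scribe really uses, numbering real buckets from 1 is inferred from bucket 0 being the special bucket, and sending non-numeric keys to bucket 0 under key_modulo is also an assumption.

```python
import random
import zlib
from typing import Tuple


def bucketize(message: str, num_buckets: int,
              bucket_type: str = "key_hash",
              delimiter: int = 58,          # ASCII code, 58 is ':'
              remove_key: bool = False) -> Tuple[int, str]:
    """Return (bucket number, message to store)."""
    if bucket_type == "random":
        # 'random' ignores the delimiter entirely.
        return random.randint(1, num_buckets), message

    # The key is the prefix before the first occurrence of the delimiter.
    key, found, rest = message.partition(chr(delimiter))
    if not found:
        return 0, message  # no key: special bucket 0

    if bucket_type == "key_modulo":
        if not (key.isascii() and key.isdigit()):
            return 0, message  # assumption: non-numeric key goes to bucket 0
        bucket = int(key) % num_buckets + 1
    elif bucket_type == "key_hash":
        # Stand-in hash; the real hash function is not given above.
        bucket = zlib.crc32(key.encode("utf-8")) % num_buckets + 1
    else:
        raise ValueError(f"unknown bucket_type: {bucket_type!r}")

    return bucket, (rest if remove_key else message)


print(bucketize("42:hello", 5, "key_modulo"))          # (3, '42:hello')
print(bucketize("42:hello", 5, "key_modulo", remove_key=True))
print(bucketize("no delimiter here", 5))
```

In the first call, 42 modulo 5 is 2, and the offset of 1 places the message in bucket 3.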
Instead of using a single ‘bucket’ definition for all buckets, you can specify each bucket explicitly:
<store>
category=bucket_me
type=bucket
num_buckets=2
bucket_type=key_hash
<bucket0>
type=file
fs_type=std
file_path=/tmp/scribetest/bucket0
base_filename=bucket0
</bucket0>
<bucket1>
...
</bucket1>
<bucket2>
...
</bucket2>
</store>
You can also bucket into network stores:
<store>
category=bucket_me
type=bucket
num_buckets=2
bucket_type=random
<bucket0>
type=file
fs_type=std
file_path=/tmp/scribetest/bucket0
base_filename=bucket0
</bucket0>
<bucket1>
type=network
remote_host=wopr
remote_port=1463
</bucket1>
<bucket2>
type=network
remote_host=hal
remote_port=1463
</bucket2>
</store>
Null Store Configuration
Null Stores can be used to tell Scribe to ignore all messages of a given category.
(no configuration parameters)
Example:
<store>
category=tps_report*
type=null
</store>
Multi Store Configuration
A Multi Store is a store that will forward all messages to multiple sub-stores.
A Multi Store may have any number of substores named “store0”, “store1”, “store2”, etc.
report_success: “all” or “any”, defaults to “all”
- whether all substores or any substores must succeed in logging a message in order for the Multi Store to report the message logging as successful
Example:
<store>
category=default
type=multi
target_write_size=20480
max_write_interval=1
<store0>
type=file
file_path=/tmp/store0
</store0>
<store1>
type=file
file_path=/tmp/store1
</store1>
</store>
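The difference between report_success=all and report_success=any can be shown with a short sketch. The function `multi_store_succeeded` is hypothetical and models only the success rule described above, with each substore's outcome represented as a boolean.

```python
from typing import Iterable


def multi_store_succeeded(substore_results: Iterable[bool],
                          report_success: str = "all") -> bool:
    """Combine per-substore outcomes into the Multi Store's result."""
    results = list(substore_results)
    if report_success == "any":
        return any(results)   # one successful substore is enough
    if report_success == "all":
        return all(results)   # every substore must succeed
    raise ValueError(f"unknown report_success: {report_success!r}")


# store0 wrote the messages, store1 failed:
print(multi_store_succeeded([True, False], "all"))
print(multi_store_succeeded([True, False], "any"))
```

With “all”, a single failing substore makes the whole Multi Store report failure, which matters when the Multi Store is nested inside a store that retries or buffers on failure.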
Thriftfile Store Configuration
A Thriftfile store is similar to a File store except that it stores messages in a Thrift TFileTransport file.
file_path: defaults to “/tmp”
base_filename: defaults to category name
rotate_period: “hourly”, “daily”, “never”, or number[suffix]; “never” by default
- determines how often to create new files
- suffix may be “s”, “m”, “h”, “d”, “w” for seconds (the default), minutes, hours, days and weeks, respectively
rotate_hour: 0-23, 1 by default
- if rotate_period is daily, determines what hour of day to rotate
rotate_minute: 0-59, 15 by default
- if rotate_period is daily or hourly, determines how many minutes after the hour to rotate
max_size: 1,000,000,000 bytes by default
- determines approximately how large to let a file grow before rotating to a new file
fs_type: currently only “std” is supported; “std” by default
chunk_size: 0 by default
- if a chunk size is specified, no messages within the file will cross chunk boundaries unless there are messages larger than the chunk size
create_symlink: “yes” or anything else; “yes” by default
- if true, will maintain a symlink that points to the most recently written file
flush_frequency_ms: milliseconds, will use TFileTransport default of 3000ms if not specified
- determines how frequently to sync the Thrift file to disk
msg_buffer_size: in bytes, will use TFileTransport default of 0 if not specified
- if non-zero, store will reject any writes larger than this size
Example:
<store>
category=sprockets
type=thriftfile
file_path=/tmp/sprockets
base_filename=sprockets_log
max_size=1000000
flush_frequency_ms=2000
</store>