First, a quick look at Hue's architecture (diagram omitted):
(1) What is Hue?
Hue is a browser-based graphical user interface for quickly developing and debugging applications across the Hadoop ecosystem.
(2) What can Hue do?
1. Browse HDFS and manage files
2. Develop and debug Hive queries in the browser and view the results
3. Query Solr, view results, and generate reports
4. Develop and debug interactive Impala SQL queries in the browser
5. Develop and debug Spark jobs
6. Develop and debug Pig scripts
7. Develop, monitor, and schedule Oozie workflows
8. Query, modify, and display HBase data
9. Browse Hive metadata (the metastore)
10. Track MapReduce job progress and follow logs
11. Create and submit MapReduce, Streaming, and Java jobs
12. Develop and debug with Sqoop2
13. Browse and edit ZooKeeper
14. Query and display databases (MySQL, PostgreSQL, SQLite, Oracle)
(3) How and when should you use Hue?
If your company runs CDH Hadoop, you are in luck: Hue also comes from Cloudera, so the pieces naturally fit together.
If your company runs Apache Hadoop or HDP, that is fine too: Hue is open source and works with any Hadoop distribution.
As for when to use it, Hue is strictly a nice-to-have. You can do without it, because every open-source project has its own tools and interfaces; Hue simply unifies them behind a single interface, which is convenient. Without it, you open a Hive CLI when you want Hive, a Pig grunt shell when you want Pig, and an HBase shell when you want to query HBase. If you use many components of the Hadoop ecosystem, Hue is quite handy, and it also gives you a web UI for developing and debugging jobs, so you no longer have to keep logging in to Linux machines.
As long as you can reach the network, you can develop and debug through Hue at any time, without installing a Linux client for remote logins; that is the benefit of a browser/server (B/S) architecture.
(4) How do you download, install, and build Hue?
On CentOS, install the build dependencies with:
yum install -y asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel
1. Hue's build dependencies on CentOS. The full list (note that ant and mvn are not covered by the yum command above) is:
ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy (for unit tests only) libxml2-devel libxslt-devel make mvn (from maven package or maven3 tarball) mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel (for version 7+)
2. Before installing Hue, I already had the JDK, Maven, Ant, Hadoop, Hive, Oozie, and so on installed on this CentOS machine; the environment variables were as follows:
user="search" # java export JAVA_HOME="/usr/local/jdk" export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib export PATH=$PATH:$JAVA_HOME/bin # ant export ANT_HOME=/usr/local/ant export CLASSPATH=$CLASSPATH:$ANT_HOME/lib export PATH=$PATH:$ANT_HOME/bin # maven export MAVEN_HOME="/usr/local/maven" export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib export PATH=$PATH:$MAVEN_HOME/bin ##Hadoop2.2的变量设置 export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin export HADOOP_HOME=/home/search/hadoop export HADOOP_MAPRED_HOME=$HADOOP_HOME export HADOOP_COMMON_HOME=$HADOOP_HOME export HADOOP_HDFS_HOME=$HADOOP_HOME export YARN_HOME=$HADOOP_HOME export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOMEi/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME:$HADOOP_HDFS_HOME # Hive export HIVE_HOME=/home/search/hive export HIVE_CONF_DIR=/home/search/hive/conf export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf export OOZIE_HOME="/home/search/oozie-4.1.0" export PATH=$PATH:$OOZIE_HOME/sbin:$OOZIE_HOME/bin
3. This article installs Hue from the release tarball. Hue can also be installed through Cloudera Manager (CM), but that ties you much more tightly to CDH.
The latest Hue release at the time of writing is 3.8.1; I use 3.7.0 here.
Download: https://github.com/cloudera/hue/releases
Hue on GitHub: https://github.com/cloudera/hue
4. After downloading, extract the tarball, change into the Hue root directory, and run
make apps to build it.
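Put together, step 4 looks roughly like this; the tarball name below assumes the 3.7.0 release, so substitute whatever file you actually downloaded:

tar -zxvf hue-3.7.0.tgz
cd hue-3.7.0
make apps        # builds all Hue apps under build/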
5. Once the build succeeds, edit /home/search/hue/desktop/conf/pseudo-distributed.ini. It contains the host and port settings for HDFS, YARN, MapReduce, Hive, Oozie, Pig, Spark, Solr, and so on; set them according to your own environment. Anything you have not installed can simply be left unconfigured: that app will not be usable in the web UI, but it will not affect the other frameworks.
An example:
##################################### # DEVELOPMENT EDITION ##################################### # Hue configuration file # =================================== # # For complete documentation about the contents of this file, run # $ <hue_root>/build/env/bin/hue config_help # # All .ini files under the current directory are treated equally. Their # contents are merged to form the Hue configuration, which can # can be viewed on the Hue at # http://<hue_host>:<port>/dump_config ########################################################################### # General configuration for core Desktop features (authentication, etc) ########################################################################### [desktop] send_dbug_messages=1 # To show database transactions, set database_logging to 1 database_logging=0 # Set this to a random string, the longer the better. # This is used for secure hashing in the session store. secret_key=search # Webserver listens on this address and port http_host=0.0.0.0 http_port=8000 # Time zone name time_zone=Asia/Shanghai # Enable or disable Django debug mode ## django_debug_mode=true # Enable or disable backtrace for server error ## http_500_debug_mode=true # Enable or disable memory profiling. ## memory_profiler=false # Server email for internal error messages ## django_server_email='hue@localhost.localdomain' # Email backend ## django_email_backend=django.core.mail.backends.smtp.EmailBackend # Webserver runs as this user server_user=search server_group=search # This should be the Hue admin and proxy user default_user=search # This should be the hadoop cluster admin default_hdfs_superuser=search # If set to false, runcpserver will not actually start the web server. # Used if Apache is being used as a WSGI container. ## enable_server=yes # Number of threads used by the CherryPy web server ## cherrypy_server_threads=10 # Filename of SSL Certificate ## ssl_certificate= # Filename of SSL RSA Private Key ## ssl_private_key= # List of allowed and disallowed ciphers in cipher list format. # See http://www.openssl.org/docs/apps/ciphers.html for more information on cipher list format. ## ssl_cipher_list=DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2 # LDAP username and password of the hue user used for LDAP authentications. # Set it to use LDAP Authentication with HiveServer2 and Impala. ## ldap_username=hue ## ldap_password= # Default encoding for site data ## default_site_encoding=utf-8 # Help improve Hue with anonymous usage analytics. # Use Google Analytics to see how many times an application or specific section of an application is used, nothing more. ## collect_usage=true # Support for HTTPS termination at the load-balancer level with SECURE_PROXY_SSL_HEADER. ## secure_proxy_ssl_header=false # Comma-separated list of Django middleware classes to use. # See https://docs.djangoproject.com/en/1.4/ref/middleware/ for more details on middlewares in Django. ## middleware=desktop.auth.backend.LdapSynchronizationBackend # Comma-separated list of regular expressions, which match the redirect URL. # For example, to restrict to your local domain and FQDN, the following value can be used: # ^\/.*$,^http:\/\/www.mydomain.com\/.*$ ## redirect_whitelist= # Comma separated list of apps to not load at server startup. # e.g.: pig,zookeeper ## app_blacklist= # The directory where to store the auditing logs. Auditing is disable if the value is empty. # e.g. /var/log/hue/audit.log ## audit_event_log_dir= # Size in KB/MB/GB for audit log to rollover. 
## audit_log_max_file_size=100MB #poll_enabled=false # Administrators # ---------------- [[django_admins]] ## [[[admin1]]] ## name=john ## email=john@doe.com # UI customizations # ------------------- [[custom]] # Top banner HTML code #banner_top_html=Search Team Hadoop Manager # Configuration options for user authentication into the web application # ------------------------------------------------------------------------ [[auth]] # Authentication backend. Common settings are: # - django.contrib.auth.backends.ModelBackend (entirely Django backend) # - desktop.auth.backend.AllowAllBackend (allows everyone) # - desktop.auth.backend.AllowFirstUserDjangoBackend # (Default. Relies on Django and user manager, after the first login) # - desktop.auth.backend.LdapBackend # - desktop.auth.backend.PamBackend # - desktop.auth.backend.SpnegoDjangoBackend # - desktop.auth.backend.RemoteUserDjangoBackend # - libsaml.backend.SAML2Backend # - libopenid.backend.OpenIDBackend # - liboauth.backend.OAuthBackend # (New oauth, support Twitter, Facebook, Google+ and Linkedin ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend # The service to use when querying PAM. ## pam_service=login # When using the desktop.auth.backend.RemoteUserDjangoBackend, this sets # the normalized name of the header that contains the remote user. # The HTTP header in the request is converted to a key by converting # all characters to uppercase, replacing any hyphens with underscores # and adding an HTTP_ prefix to the name. So, for example, if the header # is called Remote-User that would be configured as HTTP_REMOTE_USER # # Defaults to HTTP_REMOTE_USER ## remote_user_header=HTTP_REMOTE_USER # Ignore the case of usernames when searching for existing users. # Only supported in remoteUserDjangoBackend. ## ignore_username_case=false # Ignore the case of usernames when searching for existing users to authenticate with. # Only supported in remoteUserDjangoBackend. ## force_username_lowercase=false # Users will expire after they have not logged in for 'n' amount of seconds. # A negative number means that users will never expire. ## expires_after=-1 # Apply 'expires_after' to superusers. ## expire_superusers=true # Configuration options for connecting to LDAP and Active Directory # ------------------------------------------------------------------- [[ldap]] # The search base for finding users and groups ## base_dn="DC=mycompany,DC=com" # URL of the LDAP server ## ldap_url=ldap://auth.mycompany.com # A PEM-format file containing certificates for the CA's that # Hue will trust for authentication over TLS. # The certificate for the CA that signed the # LDAP server certificate must be included among these certificates. # See more here http://www.openldap.org/doc/admin24/tls.html. 
## ldap_cert= ## use_start_tls=true # Distinguished name of the user to bind as -- not necessary if the LDAP server # supports anonymous searches ## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com" # Password of the bind user -- not necessary if the LDAP server supports # anonymous searches ## bind_password= # Pattern for searching for usernames -- Use <username> for the parameter # For use when using LdapBackend for Hue authentication ## ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com" # Create users in Hue when they try to login with their LDAP credentials # For use when using LdapBackend for Hue authentication ## create_users_on_login = true # Synchronize a users groups when they login ## sync_groups_on_login=false # Ignore the case of usernames when searching for existing users in Hue. ## ignore_username_case=false # Force usernames to lowercase when creating new users from LDAP. ## force_username_lowercase=false # Use search bind authentication. ## search_bind_authentication=true # Choose which kind of subgrouping to use: nested or suboordinate (deprecated). ## subgroups=suboordinate # Define the number of levels to search for nested members. ## nested_members_search_depth=10 [[[users]]] # Base filter for searching for users ## user_filter="objectclass=*" # The username attribute in the LDAP schema ## user_name_attr=sAMAccountName [[[groups]]] # Base filter for searching for groups ## group_filter="objectclass=*" # The username attribute in the LDAP schema ## group_name_attr=cn [[[ldap_servers]]] ## [[[[mycompany]]]] # The search base for finding users and groups ## base_dn="DC=mycompany,DC=com" # URL of the LDAP server ## ldap_url=ldap://auth.mycompany.com # A PEM-format file containing certificates for the CA's that # Hue will trust for authentication over TLS. # The certificate for the CA that signed the # LDAP server certificate must be included among these certificates. # See more here http://www.openldap.org/doc/admin24/tls.html. ## ldap_cert= ## use_start_tls=true # Distinguished name of the user to bind as -- not necessary if the LDAP server # supports anonymous searches ## bind_dn="CN=ServiceAccount,DC=mycompany,DC=com" # Password of the bind user -- not necessary if the LDAP server supports # anonymous searches ## bind_password= # Pattern for searching for usernames -- Use <username> for the parameter # For use when using LdapBackend for Hue authentication ## ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com" ## Use search bind authentication. ## search_bind_authentication=true ## [[[[[users]]]]] # Base filter for searching for users ## user_filter="objectclass=Person" # The username attribute in the LDAP schema ## user_name_attr=sAMAccountName ## [[[[[groups]]]]] # Base filter for searching for groups ## group_filter="objectclass=groupOfNames" # The username attribute in the LDAP schema ## group_name_attr=cn # Configuration options for specifying the Desktop Database. For more info, # see http://docs.djangoproject.com/en/1.4/ref/settings/#database-engine # ------------------------------------------------------------------------ [[database]] # Database engine is typically one of: # postgresql_psycopg2, mysql, sqlite3 or oracle. # # Note that for sqlite3, 'name', below is a a path to the filename. For other backends, it is the database name. # Note for Oracle, options={'threaded':true} must be set in order to avoid crashes. 
# Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>". ## engine=sqlite3 ## host= ## port= ## user= ## password= ## name=desktop/desktop.db ## options={} # Configuration options for specifying the Desktop session. # For more info, see https://docs.djangoproject.com/en/1.4/topics/http/sessions/ # ------------------------------------------------------------------------ [[session]] # The cookie containing the users' session ID will expire after this amount of time in seconds. # Default is 2 weeks. ## ttl=1209600 # The cookie containing the users' session ID will be secure. # Should only be enabled with HTTPS. ## secure=false # The cookie containing the users' session ID will use the HTTP only flag. ## http_only=false # Use session-length cookies. Logs out the user when she closes the browser window. ## expire_at_browser_close=false # Configuration options for connecting to an external SMTP server # ------------------------------------------------------------------------ [[smtp]] # The SMTP server information for email notification delivery host=localhost port=25 user= password= # Whether to use a TLS (secure) connection when talking to the SMTP server tls=no # Default email address to use for various automated notification from Hue ## default_from_email=hue@localhost # Configuration options for Kerberos integration for secured Hadoop clusters # ------------------------------------------------------------------------ [[kerberos]] # Path to Hue's Kerberos keytab file ## hue_keytab= # Kerberos principal name for Hue ## hue_principal=hue/hostname.foo.com # Path to kinit ## kinit_path=/path/to/kinit # Configuration options for using OAuthBackend (Core) login # ------------------------------------------------------------------------ [[oauth]] # The Consumer key of the application ## consumer_key=XXXXXXXXXXXXXXXXXXXXX # The Consumer secret of the application ## consumer_secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # The Request token URL ## request_token_url=https://api.twitter.com/oauth/request_token # The Access token URL ## access_token_url=https://api.twitter.com/oauth/access_token # The Authorize URL ## authenticate_url=https://api.twitter.com/oauth/authorize ########################################################################### # Settings to configure SAML ########################################################################### [libsaml] # Xmlsec1 binary path. This program should be executable by the user running Hue. ## xmlsec_binary=/usr/local/bin/xmlsec1 # Entity ID for Hue acting as service provider. # Can also accept a pattern where '<base_url>' will be replaced with server URL base. ## entity_id="<base_url>/saml2/metadata/" # Create users from SSO on login. ## create_users_on_login=true # Required attributes to ask for from IdP. # This requires a comma separated list. ## required_attributes=uid # Optional attributes to ask for from IdP. # This requires a comma separated list. ## optional_attributes= # IdP metadata in the form of a file. This is generally an XML file containing metadata that the Identity Provider generates. ## metadata_file= # Private key to encrypt metadata with. ## key_file= # Signed certificate to send along with encrypted metadata. ## cert_file= # A mapping from attributes in the response from the IdP to django user attributes. ## user_attribute_mapping={'uid':'username'} # Have Hue initiated authn requests be signed and provide a certificate. 
## authn_requests_signed=false # Have Hue initiated logout requests be signed and provide a certificate. ## logout_requests_signed=false # Username can be sourced from 'attributes' or 'nameid'. ## username_source=attributes # Performs the logout or not. ## logout_enabled=true ########################################################################### # Settings to configure OpenId ########################################################################### [libopenid] # (Required) OpenId SSO endpoint url. ## server_endpoint_url=https://www.google.com/accounts/o8/id # OpenId 1.1 identity url prefix to be used instead of SSO endpoint url # This is only supported if you are using an OpenId 1.1 endpoint ## identity_url_prefix=https://app.onelogin.com/openid/your_company.com/ # Create users from OPENID on login. ## create_users_on_login=true # Use email for username ## use_email_for_username=true ########################################################################### # Settings to configure OAuth ########################################################################### [liboauth] # NOTE: # To work, each of the active (i.e. uncommented) service must have # applications created on the social network. # Then the "consumer key" and "consumer secret" must be provided here. # # The addresses where to do so are: # Twitter: https://dev.twitter.com/apps # Google+ : https://cloud.google.com/ # Facebook: https://developers.facebook.com/apps # Linkedin: https://www.linkedin.com/secure/developer # # Additionnaly, the following must be set in the application settings: # Twitter: Callback URL (aka Redirect URL) must be set to http://YOUR_HUE_IP_OR_DOMAIN_NAME/oauth/social_login/oauth_authenticated # Google+ : CONSENT SCREEN must have email address # Facebook: Sandbox Mode must be DISABLED # Linkedin: "In OAuth User Agreement", r_emailaddress is REQUIRED # The Consumer key of the application ## consumer_key_twitter= ## consumer_key_google= ## consumer_key_facebook= ## consumer_key_linkedin= # The Consumer secret of the application ## consumer_secret_twitter= ## consumer_secret_google= ## consumer_secret_facebook= ## consumer_secret_linkedin= # The Request token URL ## request_token_url_twitter=https://api.twitter.com/oauth/request_token ## request_token_url_google=https://accounts.google.com/o/oauth2/auth ## request_token_url_linkedin=https://www.linkedin.com/uas/oauth2/authorization ## request_token_url_facebook=https://graph.facebook.com/oauth/authorize # The Access token URL ## access_token_url_twitter=https://api.twitter.com/oauth/access_token ## access_token_url_google=https://accounts.google.com/o/oauth2/token ## access_token_url_facebook=https://graph.facebook.com/oauth/access_token ## access_token_url_linkedin=https://api.linkedin.com/uas/oauth2/accessToken # The Authenticate URL ## authenticate_url_twitter=https://api.twitter.com/oauth/authorize ## authenticate_url_google=https://www.googleapis.com/oauth2/v1/userinfo?access_token= ## authenticate_url_facebook=https://graph.facebook.com/me?access_token= ## authenticate_url_linkedin=https://api.linkedin.com/v1/people/~:(email-address)?format=json&oauth2_access_token= # Username Map. Json Hash format. # Replaces username parts in order to simplify usernames obtained # Example: {"@sub1.domain.com":"_S1", "@sub2.domain.com":"_S2"} # converts 'email@sub1.domain.com' to 'email_S1' ## username_map={} # Whitelisted domains (only applies to Google OAuth). CSV format. 
## whitelisted_domains_google= ########################################################################### # Settings for the RDBMS application ########################################################################### [librdbms] # The RDBMS app can have any number of databases configured in the databases # section. A database is known by its section name # (IE sqlite, mysql, psql, and oracle in the list below). [[databases]] # sqlite configuration. ## [[[sqlite]]] # Name to show in the UI. ## nice_name=SQLite # For SQLite, name defines the path to the database. ## name=/tmp/sqlite.db # Database backend to use. ## engine=sqlite # Database options to send to the server when connecting. # https://docs.djangoproject.com/en/1.4/ref/databases/ ## options={} # mysql, oracle, or postgresql configuration. ## [[[mysql]]] # Name to show in the UI. ## nice_name="My SQL DB" # For MySQL and PostgreSQL, name is the name of the database. # For Oracle, Name is instance of the Oracle server. For express edition # this is 'xe' by default. ## name=mysqldb # Database backend to use. This can be: # 1. mysql # 2. postgresql # 3. oracle ## engine=mysql # IP or hostname of the database to connect to. ## host=localhost # Port the database server is listening to. Defaults are: # 1. MySQL: 3306 # 2. PostgreSQL: 5432 # 3. Oracle Express Edition: 1521 ## port=3306 # Username to authenticate with when connecting to the database. ## user=example # Password matching the username to authenticate with when # connecting to the database. ## password=example # Database options to send to the server when connecting. # https://docs.djangoproject.com/en/1.4/ref/databases/ ## options={} ########################################################################### # Settings to configure your Hadoop cluster. ########################################################################### [hadoop] # Configuration for HDFS NameNode # ------------------------------------------------------------------------ [[hdfs_clusters]] # HA support by using HttpFs [[[default]]] # Enter the filesystem uri fs_defaultfs=hdfs://h1:8020 # NameNode logical name. logical_name=h1 # Use WebHdfs/HttpFs as the communication mechanism. # Domain should be the NameNode or HttpFs host. # Default port is 14000 for HttpFs. webhdfs_url=http://h1:50070/webhdfs/v1 # Change this if your HDFS cluster is Kerberos-secured security_enabled=false # Default umask for file and directory creation, specified in an octal value. umask=022 hadoop_conf_dir=/home/search/hadoop/etc/hadoop # Configuration for YARN (MR2) # ------------------------------------------------------------------------ [[yarn_clusters]] [[[default]]] # Enter the host on which you are running the ResourceManager resourcemanager_host=h1 # The port where the ResourceManager IPC listens on resourcemanager_port=8032 # Whether to submit jobs to this cluster submit_to=True # Resource Manager logical name (required for HA) ## logical_name= # Change this if your YARN cluster is Kerberos-secured ## security_enabled=false # URL of the ResourceManager API resourcemanager_api_url=http://h1:8088 # URL of the ProxyServer API proxy_api_url=http://h1:8088 # URL of the HistoryServer API history_server_api_url=http://h1:19888 # HA support by specifying multiple clusters # e.g. 
# [[[ha]]] # Resource Manager logical name (required for HA) ## logical_name=my-rm-name # Configuration for MapReduce (MR1) # ------------------------------------------------------------------------ [[mapred_clusters]] [[[default]]] # Enter the host on which you are running the Hadoop JobTracker jobtracker_host=h1 # The port where the JobTracker IPC listens on #jobtracker_port=8021 # JobTracker logical name for HA ## logical_name= # Thrift plug-in port for the JobTracker ## thrift_port=9290 # Whether to submit jobs to this cluster submit_to=False # Change this if your MapReduce cluster is Kerberos-secured ## security_enabled=false # HA support by specifying multiple clusters # e.g. # [[[ha]]] # Enter the logical name of the JobTrackers # logical_name=my-jt-name ########################################################################### # Settings to configure the Filebrowser app ########################################################################### [filebrowser] # Location on local filesystem where the uploaded archives are temporary stored. ## archive_upload_tempdir=/tmp ########################################################################### # Settings to configure liboozie ########################################################################### [liboozie] # The URL where the Oozie service runs on. This is required in order for # users to submit jobs. Empty value disables the config check. ## oozie_url=http://localhost:11000/oozie oozie_url=http://h1:11000/oozie # Requires FQDN in oozie_url if enabled ## security_enabled=false # Location on HDFS where the workflows/coordinator are deployed when submitted. remote_deployement_dir=/user/hue/oozie/deployments ########################################################################### # Settings to configure the Oozie app ########################################################################### [oozie] # Location on local FS where the examples are stored. local_data_dir=apps/oozie/examples/ # Location on local FS where the data for the examples is stored. ## sample_data_dir=...thirdparty/sample_data # Location on HDFS where the oozie examples and workflows are stored. remote_data_dir=apps/oozie/workspaces # Maximum of Oozie workflows or coodinators to retrieve in one API call. oozie_jobs_count=100 # Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit. ## enable_cron_scheduling=true enable_cron_scheduling=true ########################################################################### # Settings to configure Beeswax with Hive ########################################################################### [beeswax] # Host where HiveServer2 is running. # If Kerberos security is enabled, use fully-qualified domain name (FQDN). hive_server_host=h1 # Port where HiveServer2 Thrift server runs on. hive_server_port=10000 # Hive configuration directory, where hive-site.xml is located hive_conf_dir=/home/search/hive/conf # Timeout in seconds for thrift calls to Hive service server_conn_timeout=120 # Set a LIMIT clause when browsing a partitioned table. # A positive value will be set as the LIMIT. If 0 or negative, do not set any limit. browse_partitioned_table_limit=250 # A limit to the number of rows that can be downloaded from a query. # A value of -1 means there will be no limit. # A maximum of 65,000 is applied to XLS downloads. download_row_limit=1000000 # Hue will try to close the Hive query when the user leaves the editor page. 
# This will free all the query resources in HiveServer2, but also make its results inaccessible. ## close_queries=false # Thrift version to use when communicating with HiveServer2 ## thrift_version=5 [[ssl]] # SSL communication enabled for this server. ## enabled=false # Path to Certificate Authority certificates. ## cacerts=/etc/hue/cacerts.pem # Path to the private key file. ## key=/etc/hue/key.pem # Path to the public certificate file. ## cert=/etc/hue/cert.pem # Choose whether Hue should validate certificates received from the server. ## validate=true ########################################################################### # Settings to configure Pig ########################################################################### [pig] # Location of piggybank.jar on local filesystem. local_sample_dir=/home/search/hue/apps/pig/examples # Location piggybank.jar will be copied to in HDFS. remote_data_dir=/home/search/pig/examples ########################################################################### # Settings to configure Sqoop ########################################################################### [sqoop] # For autocompletion, fill out the librdbms section. # Sqoop server URL server_url=http://h1:12000/sqoop ########################################################################### # Settings to configure Proxy ########################################################################### [proxy] # Comma-separated list of regular expressions, # which match 'host:port' of requested proxy target. ## whitelist=(localhost|127\.0\.0\.1):(50030|50070|50060|50075) # Comma-separated list of regular expressions, # which match any prefix of 'host:port/path' of requested proxy target. # This does not support matching GET parameters. ## blacklist= ########################################################################### # Settings to configure Impala ########################################################################### [impala] # Host of the Impala Server (one of the Impalad) ## server_host=localhost # Port of the Impala Server ## server_port=21050 # Kerberos principal ## impala_principal=impala/hostname.foo.com # Turn on/off impersonation mechanism when talking to Impala ## impersonation_enabled=False # Number of initial rows of a result set to ask Impala to cache in order # to support re-fetching them for downloading them. # Set to 0 for disabling the option and backward compatibility. ## querycache_rows=50000 # Timeout in seconds for thrift calls ## server_conn_timeout=120 # Hue will try to close the Impala query when the user leaves the editor page. # This will free all the query resources in Impala, but also make its results inaccessible. ## close_queries=true # If QUERY_TIMEOUT_S > 0, the query will be timed out (i.e. cancelled) if Impala does not do any work # (compute or send back results) for that query within QUERY_TIMEOUT_S seconds. ## query_timeout_s=600 ########################################################################### # Settings to configure HBase Browser ########################################################################### [hbase] # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'. # Use full hostname with security. ## hbase_clusters=(Cluster|localhost:9090) # HBase configuration directory, where hbase-site.xml is located. ## hbase_conf_dir=/etc/hbase/conf # Hard limit of rows or columns per row fetched before truncating. 
## truncate_limit = 500 # 'buffered' is the default of the HBase Thrift Server and supports security. # 'framed' can be used to chunk up responses, # which is useful when used in conjunction with the nonblocking server in Thrift. ## thrift_transport=buffered ########################################################################### # Settings to configure Solr Search ########################################################################### [search] # URL of the Solr Server solr_url=http://172.21.50.41:8983/solr/ # Requires FQDN in solr_url if enabled ## security_enabled=false ## Query sent when no term is entered ## empty_query=*:* ########################################################################### # Settings to configure Solr Indexer ########################################################################### [indexer] # Location of the solrctl binary. ## solrctl_path=/usr/bin/solrctl # Location of the solr home. ## solr_home=/usr/lib/solr # Zookeeper ensemble. ## solr_zk_ensemble=localhost:2181/solr # The contents of this directory will be copied over to the solrctl host to its temporary directory. ## config_template_path=/../hue/desktop/libs/indexer/src/data/solr_configs ########################################################################### # Settings to configure Job Designer ########################################################################### [jobsub] # Location on local FS where examples and template are stored. ## local_data_dir=..../data # Location on local FS where sample data is stored ## sample_data_dir=...thirdparty/sample_data ########################################################################### # Settings to configure Job Browser ########################################################################### [jobbrowser] # Share submitted jobs information with all users. If set to false, # submitted jobs are visible only to the owner and administrators. ## share_jobs=true ########################################################################### # Settings to configure the Zookeeper application. ########################################################################### [zookeeper] [[clusters]] [[[default]]] # Zookeeper ensemble. Comma separated list of Host/Port. # e.g. localhost:2181,localhost:2182,localhost:2183 host_ports=zk1:2181 # The URL of the REST contrib service (required for znode browsing) ## rest_url=http://localhost:9998 ########################################################################### # Settings to configure the Spark application. ########################################################################### [spark] # URL of the REST Spark Job Server. server_url=http://h1:8080/ ########################################################################### # Settings for the User Admin application ########################################################################### [useradmin] # The name of the default user group that users will be a member of ## default_user_group=default ########################################################################### # Settings for the Sentry lib ########################################################################### [libsentry] # Hostname or IP of server. ## hostname=localhost # Port the sentry service is running on. ## port=8038 # Sentry configuration directory, where sentry-site.xml is located. ## sentry_conf_dir=/etc/sentry/conf
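The header of that file points out two built-in ways to inspect the configuration, which help when a setting does not seem to take effect:

build/env/bin/hue config_help      # documents every available option
# with Hue running, http://<hue_host>:8000/dump_config shows the merged, effective configuration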
The directory after a successful build looks like this:
-rw-rw-r--  1 search search  2782 5月 19 06:04 app.reg
-rw-rw-r--  1 search search  2782 5月 19 05:41 app.reg.bak
drwxrwxr-x 22 search search  4096 5月 20 01:05 apps
drwxrwxr-x  3 search search  4096 5月 19 05:41 build
drwxr-xr-x  2 search search  4096 5月 19 05:40 data
drwxrwxr-x  7 search search  4096 5月 20 01:29 desktop
drwxrwxr-x  2 search search  4096 5月 19 05:41 dist
drwxrwxr-x  7 search search  4096 5月 19 05:40 docs
drwxrwxr-x  3 search search  4096 5月 19 05:40 ext
-rw-rw-r--  1 search search 11358 5月 19 05:38 LICENSE.txt
drwxrwxr-x  2 search search  4096 5月 20 01:29 logs
-rw-rw-r--  1 search search  8121 5月 19 05:41 Makefile
-rw-rw-r--  1 search search  8505 5月 19 05:41 Makefile.sdk
-rw-rw-r--  1 search search  3093 5月 19 05:40 Makefile.tarball
-rw-rw-r--  1 search search  3498 5月 19 05:41 Makefile.vars
-rw-rw-r--  1 search search  2302 5月 19 05:41 Makefile.vars.priv
drwxrwxr-x  2 search search  4096 5月 19 05:41 maven
-rw-rw-r--  1 search search   801 5月 19 05:40 NOTICE.txt
-rw-rw-r--  1 search search  4733 5月 19 05:41 README.rst
-rw-rw-r--  1 search search    52 5月 19 05:38 start.sh
-rw-rw-r--  1 search search    65 5月 19 05:41 stop.sh
drwxrwxr-x  9 search search  4096 5月 19 05:38 tools
-rw-rw-r--  1 search search   932 5月 19 05:41 VERSION
6. Start Hue with: build/env/bin/supervisor
[search@h1 hue]$ build/env/bin/supervisor
[INFO] Not running as root, skipping privilege drop
starting server with options {'ssl_certificate': None, 'workdir': None, 'server_name': 'localhost', 'host': '0.0.0.0', 'daemonize': False, 'threads': 10, 'pidfile': None, 'ssl_private_key': None, 'server_group': 'search', 'ssl_cipher_list': 'DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2', 'port': 8000, 'server_user': 'search'}
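Run this way, supervisor stays in the foreground and ties up the terminal. One simple way to keep it alive after you log out is nohup (just a sketch; the log path is arbitrary):

nohup build/env/bin/supervisor > logs/supervisor.out 2>&1 &
tail -f logs/supervisor.out    # watch the startup log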
Then open port 8000 on the machine where Hue is installed in a browser:
The app toolbox view (screenshot omitted):
The Hive view (screenshot omitted):
When configuring Hive (version 0.13 in my case), pay attention to the following points:
Both the Hive metastore service and the HiveServer2 service must be running.
Start them with the following commands:
bin/hive --service metastore
bin/hiveserver2
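If you want both services to outlive your shell session, one simple pattern (the log file names here are just examples) is:

nohup bin/hive --service metastore > metastore.log 2>&1 &
nohup bin/hiveserver2 > hiveserver2.log 2>&1 &
jps    # both services show up as RunJar processes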
In addition, Hive's SASL authentication must be disabled; otherwise Hue will run into problems when it connects.
Pay attention to the following settings in hive-site.xml:
<property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> <description>location of default database for the warehouse</description> </property> <property> <name>hive.server2.thrift.port</name> <value>10000</value> <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description> </property> <property> <name>hive.server2.thrift.bind.host</name> <value>h1</value> <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description> </property> <property> <name>hive.server2.authentication</name> <value>NOSASL</value> <description> Client authentication types. NONE: no authentication check LDAP: LDAP/AD based authentication KERBEROS: Kerberos/GSSAPI authentication CUSTOM: Custom authentication provider (Use with property hive.server2.custom.authentication.class) PAM: Pluggable authentication module. </description> </property>
Beyond the settings above, you also need to change hive.server2.long.polling.timeout from its default of 5000L to 5000; otherwise Beeline connections fail. This is a known Hive bug.
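A quick way to confirm the configured value (run from $HIVE_HOME; this simply echoes what hive-site.xml resolves to):

bin/hive -e "set hive.server2.long.polling.timeout;"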
The Pig view (screenshot omitted):
The Solr view (screenshot omitted):
One final note: Hue also needs the corresponding proxy-user entries in Hadoop's core-site.xml, for example:
<property> <name>hadoop.proxyuser.hue.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hue.groups</name> <value>*</value> </property>
OK, at this point Hue is working nicely. You can also build custom app plugins on top of it as needed, which makes it very flexible.