DevOps(5) Spark Deployment on VM
1. Old Environment
1.1 Jdk
java version "1.6.0_45"
Switch the active Java version on an Ubuntu system:
>sudo update-alternatives --config java
Set JAVA_HOME on Ubuntu:
>vi ~/.profile
export JAVA_HOME="/usr/lib/jvm/java-6-oracle"
Java Compile Version Problem
[warn] Error reading API from class file : java.lang.UnsupportedClassVersionError: com/digby/localpoint/auth/util/Base64$OutputStream : Unsupported major.minor version 51.0
This error means classes compiled with Java 7 (class-file version 51.0) are being run on an older JRE. Point both java and javac at the same JDK:
>sudo update-alternatives --config java
>sudo update-alternatives --config javac
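As a side note on what the number in the error means: class-file major versions map to JDK releases as major minus 44, so 51.0 is Java 7 bytecode. A quick sketch (the helper name is mine, not a standard tool):

```shell
# class-file major version N corresponds to JDK release N - 44
# e.g. 50 -> Java 6, 51 -> Java 7
major_to_jdk() {
  echo $(( $1 - 44 ))
}
major_to_jdk 51   # the failing jar was compiled with Java 7, too new for a Java 6 JRE
```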
1.2 Cassandra
Cassandra version 1.2.13.
> sudo mkdir -p /var/log/cassandra
> sudo chown -R carl /var/log/cassandra
(carl is my username)
> sudo mkdir -p /var/lib/cassandra
> sudo chown -R carl /var/lib/cassandra
Change the config if needed, then start Cassandra in single-node mode:
> cassandra -f conf/cassandra.yaml
Test it from a client:
> cassandra-cli -host ubuntu-dev1 -port 9160
To set up multiple nodes, change the following in conf/cassandra.yaml:
listen_address: ubuntu-dev1
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "ubuntu-dev1,ubuntu-dev2"
Change that on both nodes on ubuntu-dev1, ubuntu-dev2.
Start both nodes in the background:
> nohup cassandra -f conf/cassandra.yaml &
Verify that the cluster is working
> nodetool -h ubuntu-dev1 ring
Datacenter: datacenter1
==========
Address         Rack   Status  State   Load       Owns     Token
                                                           7068820527558753619
10.190.191.195  rack1  Up      Normal  132.34 KB  36.12%   -4714763636920163240
10.190.190.190  rack1  Up      Normal  65.18 KB   63.88%   7068820527558753619
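For scripted health checks, the ring output can be grepped for nodes in the Up state. A minimal sketch against a captured sample (the sample lines mirror the output above; in practice capture with ring=$(nodetool -h ubuntu-dev1 ring)):

```shell
# count the nodes that nodetool reports as Up
ring='10.190.191.195  rack1  Up  Normal  132.34 KB  36.12%  -4714763636920163240
10.190.190.190  rack1  Up  Normal  65.18 KB   63.88%  7068820527558753619'
up_count=$(printf '%s\n' "$ring" | grep -c ' Up ')
echo "$up_count nodes Up"
```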
1.3 Spark
I am choosing this old version:
spark-0.9.0-incubating-bin-hadoop1.tgz
Unpack it under /opt/spark (the config below assumes that path).
Set up passwordless SSH access between the master and the slaves.
On Master
> ssh-keygen -t rsa
> cat ~/.ssh/id_rsa.pub
On slave
> mkdir ~/.ssh
> vi ~/.ssh/authorized_keys
Paste in the public key from the master's id_rsa.pub.
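sshd is strict about permissions on these files, so the slave-side steps are worth scripting. A sketch, assuming the master's public key has been copied into the variable PUBKEY:

```shell
# append the master's public key and tighten permissions, which sshd requires
# PUBKEY is assumed to hold the contents of the master's id_rsa.pub
mkdir -p ~/.ssh
printf '%s\n' "$PUBKEY" >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```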
Configure Spark in /opt/spark/conf/spark-env.sh:
SCALA_HOME=/opt/scala/scala-2.10.3
SPARK_WORKER_MEMORY=512m
#SPARK_CLASSPATH='/opt/localpoint-profiles-spark/*jar'
#SPARK_JAVA_OPTS="-Dbuild.env=lmm.sdprod"
USER=carl
List the workers in /opt/spark/conf/slaves:
ubuntu-dev1
ubuntu-dev2
Command to start the Spark Server
>sbin/start-all.sh
Commands to run the job in single (local) mode, or against the master:
>java -Dbuild.env=sillycat.dev -cp "/opt/YOU_PROJECT/lib/*" com.sillycat.YOUR_CLASS
>java -Dbuild.env=sillycat.dev -Dsparkcontext.Master="spark://YOURSERVER:7070" -cp "/opt/YOU_PROJECT/lib/*" com.sillycat.YOUR_CLASS
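One detail worth noting with -cp: since Java 6 the JVM itself expands a classpath entry like lib/* to every jar in that directory, so the wildcard must reach java unexpanded (quote it). An unquoted lib/*.jar would instead be expanded by the shell into multiple arguments, and only the first would become the classpath. A small demonstration:

```shell
# the JVM expands the classpath entry "lib/*" to every jar in lib;
# quote it so the shell passes it through literally
mkdir -p lib && touch lib/a.jar lib/b.jar
cp_quoted='lib/*'           # stays literal, the JVM will expand it
cp_unquoted=$(echo lib/*)   # the shell expands it first
echo "$cp_quoted"
echo "$cp_unquoted"
```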
Visit the Spark master web UI (port 8080 by default) to verify that the workers registered.
3. Prepare Mysql
>sudo apt-get install software-properties-common
>sudo add-apt-repository ppa:ondrej/mysql-5.6
>sudo apt-get update
>sudo apt-get install mysql-server
Commands to grant access and set the password (run inside the mysql client):
>use mysql;
>grant all privileges on test.* to root@"%" identified by 'kaishi';
>flush privileges;
On client machines, only the MySQL client is needed:
>sudo apt-get install mysql-client-core-5.6
To accept remote connections, change the bind address in /etc/mysql/my.cnf (sudo vi /etc/mysql/my.cnf) from the default
bind-address = 127.0.0.1
to
bind-address = 0.0.0.0
Then restart MySQL:
>sudo service mysql stop
>sudo service mysql start
4. Install Grails
Download it; I am using an old version.
>wget
5. Install tomcat on Master
>wget
Config the database in this file, TOMCAT_HOME/conf/context.xml
<Resource name="jdbc/lmm" auth="Container" type="javax.sql.DataSource"
    maxIdle="30" maxWait="-1" maxActive="100"
    factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
    testOnBorrow="true"
    validationQuery="select 1"
    logAbandoned="true"
    username="root"
    password="kaishi"
    driverClassName="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/lmm?autoReconnect=true&amp;useServerPrepStmts=false&amp;rewriteBatchedStatements=true"/>
Download and place the right mysql driver
> ls -l lib | grep mysql
-rw-r--r-- 1 carl carl 786484 Dec 10 09:30 mysql-connector-java-5.1.16.jar
Change the config to avoid OutOfMemoryError
> vi bin/catalina.sh
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:PermSize=256m -XX:MaxPermSize=512m"
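An alternative that avoids editing catalina.sh directly: Tomcat's startup scripts source bin/setenv.sh if it exists, so the JVM flags can live there and survive upgrades. A sketch, run from TOMCAT_HOME:

```shell
# catalina.sh sources bin/setenv.sh automatically when present
mkdir -p bin
cat > bin/setenv.sh <<'EOF'
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:PermSize=256m -XX:MaxPermSize=512m"
EOF
grep Xmx bin/setenv.sh
```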
6. Running Assembly Jar File
Build the assembly jar, place it in the lib directory, and create a start script in the bin directory:
> cat bin/startup.sh
#!/bin/bash
java -Xms512m -Xmx1024m -Dbuild.env=lmm.sparkvm -Dspray.can.server.request-timeout=300s -Dspray.can.server.idle-timeout=360s -cp "/opt/YOUR_MODULE/lib/*" com.sillycat.YOURPACKAGE.YOUR_MAIN_CLASS
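To run the script in the background with a pid file for later shutdown, a minimal sketch (start_bg is my own helper, and sleep stands in for the real java command so the sketch is self-contained):

```shell
# run a long-lived command in the background, capture its pid for a stop script
start_bg() {
  nohup "$@" > app.log 2>&1 &
  echo $! > app.pid
}
# in the real startup.sh this would be the java command above
start_bg sleep 2
kill -0 "$(cat app.pid)"   # succeeds while the process is alive
```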
Set up the Bouncy Castle jar. Copy the provider jar into the JRE ext directory:
>cd /usr/lib/jvm/java-6-oracle/jre/lib/ext
Then register the provider:
>cd /usr/lib/jvm/java-6-oracle/jre/lib/security
>sudo vi java.security
security.provider.9=org.bouncycastle.jce.provider.BouncyCastleProvider
7. JCE Problem
Download jce_policy-6.zip from http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html
Unzip it and place the policy jars (local_policy.jar and US_export_policy.jar) into the jre/lib/security directory.
8. Command to Check data in cqlsh
Connect to cassandra
> cqlsh localhost 9160
Check the key space
cqlsh> select * from system.schema_keyspaces;
Check the version
cqlsh> show version
[cqlsh 3.1.8 | Cassandra 1.2.13 | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use the keyspace (the analogue of a database):
cqlsh> use device_lookup;
Check the table:
cqlsh:device_lookup> select count(*) from profile_devices limit 300000;
During testing, if you need to clear the data:
delete from profile_devices where deviceid = 'ios1009528' and brandcode = 'spark' and profileid = 5;
delete from profile_devices where brandcode = 'spark' and profileid = 5;
Deployment Option One
1. Add a Kryo serialization registrator class:
package com.sillycat.easyspark.profile

import org.apache.spark.serializer.KryoRegistrator
import com.esotericsoftware.kryo.Kryo
import com.sillycat.easyspark.model.Attributes
import com.sillycat.easyspark.model.Profile

class ProfileKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Attributes])
    kryo.register(classOf[Profile])
  }
}
Change the configuration and the SparkContext start-up as follows:
val config = ConfigFactory.load()
val conf = new SparkConf()
conf.setMaster(config.getString("sparkcontext.Master"))
conf.setAppName("Profile Device Update")
conf.setSparkHome(config.getString("sparkcontext.Home"))
if (config.hasPath("jobJar")) {
  conf.setJars(List(config.getString("jobJar")))
} else {
  conf.setJars(SparkContext.jarOfClass(this.getClass).toSeq)
}
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "com.sillycat.easyspark.profile.ProfileKryoRegistrator")
val sc = new SparkContext(conf)
It works.
Tips
1. Command to Unzip the jar file
>jar xf jar-file