Classification(2)NLP and Classifier Implementation
1. Generate the FeatureMap
NLP - Natural Language Processing
Remove the noise: strip the HTML tags and remove the stop words (for example, "of" and "a" in English; 的 and 啊 in Chinese).
Stem the remaining tokens (for example, "stopped" becomes "stop").
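The cleaning steps above can be sketched in a few lines of Scala. The stop-word set here is a tiny placeholder for illustration, not the full stop-word list linked below:

```scala
// Minimal preprocessing sketch: strip HTML tags, tokenize, drop stop words.
// The stop-word set is a toy placeholder, not a real list.
val stopWords = Set("of", "a", "the", "in")

def preprocess(raw: String): Seq[String] = {
  val noHtml = raw.replaceAll("<[^>]*>", " ") // remove HTML tags
  noHtml.toLowerCase
    .split("\\W+")                            // tokenize on non-word characters
    .filter(t => t.nonEmpty && !stopWords.contains(t))
    .toSeq
}
```

Stemming would run as one more map step after the stop-word filter.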
NLP for Chinese
https://github.com/xpqiu/fnlp/
NLP for English
Stanford
http://nlp.stanford.edu/software/index.shtml
http://nlp.stanford.edu/software/corenlp.shtml
http://nlp.stanford.edu/software/segmenter.shtml
http://nlp.stanford.edu/software/tagger.shtml
http://nlp.stanford.edu/software/CRF-NER.shtml
http://nlp.stanford.edu/software/lex-parser.shtml
http://nlp.stanford.edu/software/classifier.shtml
apache NLP
http://opennlp.apache.org/
Remove Stop Word
One source of stop words:
https://raw.githubusercontent.com/muhammad-ahsan/WebSentiment/master/mit-Stopwords.txt
PorterStemmer
Strips common English suffixes to reduce inflected forms to a shared stem (for example, "stopped" -> "stop"). Mapping "ate" to "eat" would be lemmatization, which needs a dictionary rather than suffix rules.
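The real Porter algorithm has several rule phases; a crude suffix-stripping sketch (a few made-up rules, nowhere near the full algorithm) shows the idea:

```scala
// Crude suffix-stripping sketch -- NOT the real Porter algorithm,
// just a few illustrative rules.
def stem(word: String): String = {
  def undouble(s: String): String = // "stopp" -> "stop"
    if (s.length > 1 && s.last == s(s.length - 2)) s.dropRight(1) else s
  word match {
    case w if w.endsWith("ies")                      => w.dropRight(3) + "y"
    case w if w.endsWith("ing") && w.length > 5      => undouble(w.dropRight(3))
    case w if w.endsWith("ed") && w.length > 4       => undouble(w.dropRight(2))
    case w if w.endsWith("s") && !w.endsWith("ss")   => w.dropRight(1)
    case w                                           => w
  }
}
```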
coalesce function in Spark
Decreases the number of partitions in the RDD to numPartitions.
TF-IDF
http://spark.apache.org/docs/latest/mllib-feature-extraction.html#tf-idf
Term Frequency- Inverse Document Frequency
Denote a term by t, a document by d, and the corpus by D. Term frequency TF(t,d) is the number of times that term t appears in document d.
The document frequency DF(t,D) is the number of documents that contains term t.
Inverse document frequency is a numerical measure of how much information a term provides:
IDF(t,D) = log ((|D| + 1) / (DF(t, D) + 1))
|D| is the total number of documents in the corpus.
DF: String -> Int (term -> document frequency)
IDF: String -> Double (term -> log value)
IDFsWithIndex: String -> (Double, Index) (term -> (IDF value, feature index))
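The definitions above worked out by hand on a toy corpus (pure Scala, no Spark; the corpus is made up for illustration):

```scala
// TF-IDF matching the definitions above:
// TF(t,d)  = count of t in d
// IDF(t,D) = log((|D| + 1) / (DF(t,D) + 1))
val corpus: Seq[Seq[String]] = Seq(
  Seq("spark", "mllib", "tfidf"),
  Seq("spark", "rdd"),
  Seq("scala", "rdd"))

def tf(t: String, d: Seq[String]): Double = d.count(_ == t).toDouble
def df(t: String): Int = corpus.count(_.contains(t))
def idf(t: String): Double = math.log((corpus.size + 1.0) / (df(t) + 1.0))
def tfidf(t: String, d: Seq[String]): Double = tf(t, d) * idf(t)
```

A term appearing in one of three documents gets IDF = log(4/2); a term in every document gets log(4/4) = 0, which is how the +1 smoothing keeps IDF non-negative and finite.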
2. Generate Training Data
It seems that Zeppelin can load the JAR from a remote repository:
z.load("com.amazonaws:aws-java-sdk:1.10.4.1")
Amazon S3 Operation
import java.io.File
import com.amazonaws.services.s3._
import com.amazonaws.services.s3.model._
import com.amazonaws.services.s3.transfer.TransferManager

/**
 * Upload a file to S3
 */
def uploadToS3(client: AmazonS3Client, bucket: String, key: String, file: File): Unit = {
  val tm = new TransferManager(client) // reuse the provided client
  val upload = tm.upload(bucket, key, file)
  upload.waitForCompletion()
}

/**
 * Read a file's contents from S3
 */
def readFileContentsFromS3(client: AmazonS3Client, bucket: String, key: String): String = {
  val getObjectRequest = new GetObjectRequest(bucket, key)
  val responseHeaders = new ResponseHeaderOverrides()
  responseHeaders.setCacheControl("No-cache")
  getObjectRequest.setResponseHeaders(responseHeaders)
  val objectStream = client.getObject(getObjectRequest).getObjectContent()
  scala.io.Source.fromInputStream(objectStream).getLines().mkString("\n")
}
FeatureMap and Job
FeatureMap reads the feature files.
Job parses the raw XML data into objects and extracts the features (GetFeatures).
BinaryFeatureExtractor
Local Vector
Vectors.sparse(size, sortedElems)
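Vectors.sparse stores only the non-zero entries. A hypothetical stand-in (not MLlib's class) that shows the representation:

```scala
// Stand-in for MLlib's Vectors.sparse(size, elems): only non-zero
// (index, value) pairs are stored, sorted by index; the rest are 0.0.
case class SparseVec(size: Int, elems: Seq[(Int, Double)]) {
  def toDense: Array[Double] = {
    val arr = Array.fill(size)(0.0)
    elems.foreach { case (i, v) => arr(i) = v }
    arr
  }
}
```

For high-dimensional feature maps where most terms are absent from any given document, this is far smaller than the dense array.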
Calculate the binary labels and upload them to S3.
TFFeatureExtractor
TFIDFFeatureExtractor
TFIDF(t,d,D) = TF(t,d)*IDF(t,D)
3. Classifier
UniformFoldingMechanism
The validation code:
val msg = (positive, negative) match {
  case _ if folds <= 0 =>
    s"Invalid number of folds ($folds); Must be a positive integer."
  case _ if negative.isEmpty || positive.isEmpty =>
    "Insufficient number of samples " +
      s"(# positive: ${positive.size}, # negative: ${negative.size})!"
  case _ if positive.size < folds =>
    s"Insufficient number of positive samples (${positive.size}); " +
      s"Must be >= number of folds ($folds)!"
  case _ if negative.size < folds =>
    s"Insufficient number of negative samples (${negative.size}); " +
      s"Must be >= number of folds ($folds)!"
  case _ =>
    ""
}
if (isNullOrEmpty(msg)) {
  logger.info("Fold validation succeeded!")
  None
} else {
  logger.error("Fold validation failed!")
  Some(new RuntimeException(msg))
}
Merge the data and format them.
KFoldCrossValidator
Generate the TrainableSVM -> TrainedSVM
Validate -> ModelMetrics
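A sketch of what the folding step might do (UniformFoldingMechanism is the project's name; this implementation is a guess): shuffle the samples, then deal them round-robin into k folds so fold sizes differ by at most one.

```scala
// Deal shuffled samples round-robin into k folds (sizes differ by <= 1).
def makeFolds[A](samples: Seq[A], k: Int): Seq[Seq[A]] = {
  require(k > 0 && samples.size >= k, s"need k > 0 and at least $k samples")
  val shuffled = util.Random.shuffle(samples.toList)
  shuffled.zipWithIndex
    .groupBy { case (_, i) => i % k }   // fold id = position mod k
    .toSeq.sortBy(_._1)
    .map { case (_, pairs) => pairs.map(_._1) }
}
```

Cross-validation then trains on k-1 folds and validates on the held-out fold, rotating k times.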
Scala Tips:
1. String Tail and Init
scala> val s = "123456"
s: String = 123456
scala> val s1 = s.tail
s1: String = 23456
scala> val s2 = s.init
s2: String = 12345
2. Tuple2
scala> val stuff = (42, "fish")
stuff: (Int, String) = (42,fish)
scala> stuff.getClass
res2: Class[_ <: (Int, String)] = class scala.Tuple2
scala>
scala> stuff._1
res3: Int = 42
scala> stuff._2
res4: String = fish
3. Scala Shuffle
scala> util.Random.shuffle(List(1, 2, 3, 4, 5, 6, 7, 8, 9))
res8: List[Int] = List(7, 1, 3, 9, 5, 8, 2, 6, 4)
scala> util.Random.shuffle(List(1, 2, 3, 4, 5, 6, 7, 8, 9))
res9: List[Int] = List(5, 1, 2, 6, 9, 4, 8, 7, 3)
4. Scala Grouped
scala> List(1,2,3,4,5,6,7,8,9,10,11,12,13).grouped(4).toList
res11: List[List[Int]] = List(List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10, 11, 12), List(13))
5. Scala List Zip
scala> List(1,2,3).zip(List("one","two","three"))
res12: List[(Int, String)] = List((1,one), (2,two), (3,three))
scala> List(1,2,3).zip(List("one","two","three", "four"))
res13: List[(Int, String)] = List((1,one), (2,two), (3,three))
6. List Operation
scala> val s1 = List(1, 2, 3, 4, 5, 6, 7).splitAt(3)
s1: (List[Int], List[Int]) = (List(1, 2, 3),List(4, 5, 6, 7))
scala> val t1 = s1._1.last
t1: Int = 3
scala> val t2 = s1._1.init
t2: List[Int] = List(1, 2)
scala> val t2 = s1._2
t2: List[Int] = List(4, 5, 6, 7)
References:
http://www.fnlp.org/archives/4231
Examples:
http://www.cnblogs.com/linlu1142/p/3292982.html
http://fuhao-987.iteye.com/blog/891697
内容概要:本文档介绍了英特尔2021年至2024年的网络连接性产品和智能处理单元(IPU)的战略和技术路线图。涵盖了从10GbE到200GbE的不同系列以太网适配器的特性、性能和发布时间。详细列出了各个产品的关键功能,如PCIe接口、安全特性、RDMA支持等。同时,介绍了IPU的发展计划,包括200G、400G和800G的不同代次产品的性能提升和新的功能特点。 适合人群:从事网络工程、数据中心管理、IT架构设计的专业技术人员。 使用场景及目标:本文档主要用于了解英特尔未来几年在以太网适配器和IPU领域的技术和产品规划,帮助企业在采购和部署网络设备时做出决策。同时,为研究人员提供最新技术发展趋势的参考。 其他说明:文档内容涉及的技术细节和时间表可能会有变动,请以英特尔官方发布的最新信息为准。