A few days ago I submitted an application to participate in Apache Mahout, and I have now received a reply from the guru of that project. It gave us a lot of encouragement. We decided that if I am selected by the ASF, we will integrate redpoll into Mahout, which has the same end goals and the same license as our project. We believe there would be a lot of synergy in working together with the ASF.
However, all of this depends on an "if", so we will keep working while they make their decision. Our short-term goals are listed below:
April 19th, finish learning the Hadoop coding style and implementing the Naive Bayes classifier.
May 3rd, finish parallelizing the EM clustering algorithm so that it can work together with Canopy.
May 24th, deadline for the SVM classifier implementation.
At present, we are also doing some preparation: learning more about data mining, thinking about how to parallelize the algorithms, and looking for some large data sets, which we do not have yet.
BTW, while reading the Mahout source code these days, I found that its two clustering algorithms can only deal with double values. IMHO, the text data format can be unified for most data mining algorithms. If we had an infrastructure for parsing data types such as numeric, nominal, date, etc. and organizing them into suitable data structures, Mahout would be more efficient and more practical, because it would support more data types.
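To make the idea a bit more concrete, here is a rough sketch of what such a parsing layer might look like. Everything in it is hypothetical: the names `AttributeParserSketch`, `Kind`, `Attribute` and `parseLine` are mine, not part of Mahout or redpoll, and I am assuming comma-separated text with ISO dates (`yyyy-MM-dd`). Each typed field is simply encoded as a double so that code which only understands doubles can still consume it.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AttributeParserSketch {

    /** The attribute kinds mentioned in the post. */
    enum Kind { NUMERIC, NOMINAL, DATE }

    /** Hypothetical attribute description: a name, a kind, and the nominal labels seen so far. */
    static final class Attribute {
        final String name;
        final Kind kind;
        final List<String> labels = new ArrayList<>();

        Attribute(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }

        /** Encode one raw text field as a double. */
        double encode(String raw) {
            String value = raw.trim();
            switch (kind) {
                case NUMERIC:
                    return Double.parseDouble(value);
                case NOMINAL:
                    // Assumption: a label is mapped to the index at which it was first seen.
                    int index = labels.indexOf(value);
                    if (index < 0) {
                        labels.add(value);
                        index = labels.size() - 1;
                    }
                    return index;
                case DATE:
                    // Assumption: ISO dates, stored as days since 1970-01-01.
                    return LocalDate.parse(value).toEpochDay();
                default:
                    throw new IllegalStateException("unknown kind: " + kind);
            }
        }
    }

    /** Parse one comma-separated line into a row of doubles according to the schema. */
    static double[] parseLine(String line, List<Attribute> schema) {
        String[] fields = line.split(",", -1);
        if (fields.length != schema.size()) {
            throw new IllegalArgumentException(
                "expected " + schema.size() + " fields but got " + fields.length);
        }
        double[] row = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            row[i] = schema.get(i).encode(fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        List<Attribute> schema = Arrays.asList(
            new Attribute("age", Kind.NUMERIC),
            new Attribute("color", Kind.NOMINAL),
            new Attribute("seen", Kind.DATE));
        System.out.println(Arrays.toString(parseLine("3.5, red, 1970-01-02", schema)));
    }
}
```

Note that turning a nominal label into an index gives it an artificial ordering, so a distance-based algorithm would probably need a different encoding (one column per label, for example). The sketch is only meant to show where a shared schema and parser could sit in front of the algorithms.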