Google
1. nosqldbs: NoSQL Introduction and Overview
2. System and method for data distribution (2009)
3. System and method for large-scale data processing using an application-independent framework (2010)
4. MapReduce: Simplified Data Processing on Large Clusters
5. MapReduce: A Flexible Data Processing Tool (2010)
6. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
7. MapReduce and Parallel DBMSs: Friends or Foes? (2010)
8. Presentation: MapReduce and Parallel DBMSs: Together at Last (2010)
9. Twister: A Runtime for Iterative MapReduce (2010)
10. MapReduce Online (2009)
11. Megastore: Providing Scalable, Highly Available Storage for Interactive Services (CIDR, 2011)
12. Interpreting the Data: Parallel Analysis with Sawzall
13. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (technical report, 2010)
14. Large-scale Incremental Processing Using Distributed Transactions and Notifications (2010)
15. Improving MapReduce Performance in Heterogeneous Environments
16. Dremel: Interactive Analysis of Web-Scale Datasets (2011)
17. Chukwa: A Scalable Cloud Monitoring System (presentation)
18. The Chubby lock service for loosely-coupled distributed systems
19. Paxos Made Simple (Lamport, 2001)
20. Fast Paxos (2006)
21. Paxos Made Live: An Engineering Perspective (2007)
22. Classic Paxos vs. Fast Paxos: Caveat Emptor
23. On the Coordinator’s Rule for Fast Paxos (2005)
24. Paxos Made Code: Implementing a High Throughput Atomic Broadcast (2009)
25. Bigtable: A Distributed Storage System for Structured Data (2006)
26. The Google File System
Google patent papers
1. Data processing system and method for financial debt instruments (1999)
2. Data processing system and method to enforce payment of royalties when copying softcopy books (1996)
3. Data processing systems and methods (2005)
4. Large-scale data processing in a distributed and parallel processing environment (2010)
5. Methods and systems for management of data
6. Search over structured data (2011)
7. System and method for maintaining replicated data coherency in a data processing system (1995)
8. System and method of using data mining prediction methodology (2006)
9. System and methodology for data processing combining stream processing and spreadsheet computation (2011)
10. Patent Factor Index report on "System and method of using data mining prediction methodology"
11. Pregel: A System for Large-Scale Graph Processing (2010)
Hadoop
1. A simple totally ordered broadcast protocol
2. ZooKeeper: Wait-free coordination for Internet-scale systems
3. Zab: High-performance broadcast for primary-backup systems (2011)
4. Wait-free synchronization (1991)
5. On Self-Stabilizing Wait-Free Clock Synchronization (1997)
6. Wait-free clock synchronization (PostScript format)
7. Programming with ZooKeeper: A basic tutorial
8. Hive: A Petabyte Scale Data Warehouse Using Hadoop
9. Thrift: Scalable Cross-Language Services Implementation (Facebook)
10. Other Hive files: HiveMetaStore class diagram, Chinese-language docs
11. Scaling out data preprocessing with Hive (2011)
12. HBase: The Definitive Guide (2011)
13. Nova: Continuous Pig/Hadoop Workflows (Yahoo, 2011)
14. Pig Latin: A Not-So-Foreign Language for Data Processing (2008)
15. Analyzing Massive Astrophysical Datasets: Can Pig/Hadoop or a Relational DBMS Help? (2009)
a. Some docs about HStreaming and Zebra
16. HIPI: A Hadoop Image Processing Interface for Image-based MapReduce Tasks
17. System Anomaly Detection in Distributed Systems through MapReduce-Based Log Analysis (2010)
18. Benchmarking Cloud Serving Systems with YCSB (2010)
19. Low-Latency, High-Throughput Access to Static Global Resources within the Hadoop Framework (2009)
Small-file combining in the Hadoop world
1. TidyFS: A Simple and Small Distributed File System (Microsoft)
2. Improving the storage efficiency of small files in cloud storage (Chinese, 2011)
3. Comparing Hadoop and Fat-Btree Based Access Method for Small File I/O Applications (2010)
4. RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems (Facebook)
5. A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: a Case Study by PowerPoint Files (IBM, 2010)
Job scheduling
1. Job Scheduling for Multi-User MapReduce Clusters (Facebook)
2. MapReduce Scheduler Using Classifiers for Heterogeneous Workloads (2011)
3. Performance-Driven Task Co-Scheduling for MapReduce Environments
4. Towards a Resource Aware Scheduler in Hadoop (2009)
5. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling (Yahoo, 2010)
6. Dynamic Proportional Share Scheduling in Hadoop (HP)
7. Adaptive Task Scheduling for MultiJob MapReduce Environments (2010)
8. A Dynamic MapReduce Scheduler for Heterogeneous Workloads (2009)
HStreaming
1. HStreaming Cloud Documentation
2. S4: Distributed Stream Computing Platform (Yahoo, 2010)
3. Complex Event Processing (2009)
4. HStreaming: http://www.hstreaming.com/resources/manuals/
5. StreamBase: http://streambase.com/developers-docs-pdfindex.htm
6. Twitter Storm: http://www.infoq.com/cn/news/2011/09/twitter-storm-real-time-hadoop
7. Bulk Synchronous Parallel (BSP) computing
8. MPI
SQL/MapReduce
1. Aster Data whitepaper: Deriving Deep Insights from Large Datasets with SQL-MapReduce (2004)
2. SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions (Aster, 2009)
3. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads (2009)
4. HadoopDB in Action: Building Real World Applications (2010)
5. Aster Data presentation: Making Advanced Analytics on Big Data Fast and Easy (2010)
6. A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses (2009)
7. Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce (2010)
8. Greenplum whitepaper: A Unified Engine for RDBMS and MapReduce (2004)
9. A Comparison of Approaches to Large-Scale Data Analysis (2009)
10. MAD Skills: New Analysis Practices for Big Data (2009)
11. C-Store: A Column-oriented DBMS (2005)
12. Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations (Microsoft)
Microsoft
1. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (2007)
Amazon
1. Dynamo: Amazon’s Highly Available Key-value Store (2007)
2. Efficient Reconciliation and Flow Control for Anti-Entropy Protocols
3. The Eucalyptus Open-source Cloud-computing System
4. Eucalyptus: An Open-source Infrastructure for Cloud Computing (presentation)
5. Eucalyptus: A Technical Report on an Elastic Utility Computing Architecture Linking Your Programs to Useful Systems (2008)
6. Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms (2011)
7. Database-Agnostic Transaction Support for Cloud Infrastructures
8. CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems (2011)
9. ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures
Books
1. Distributed Systems Concepts and Design (5th Edition)
2. Principles of Computer Systems (7-11)
3. Distributed Systems (chapter)
4. Data-Intensive Text Processing with MapReduce (2010)
5. Hadoop in Action
6. 21 Recipes for Mining Twitter
7. Hadoop: The Definitive Guide, 2nd Edition
8. Pro Hadoop
Other papers about distributed systems
1. Flexible Update Propagation for Weakly Consistent Replication (1997)
2. Providing High Availability Using Lazy Replication (1992)
3. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System (1995)
4. XMIDDLE: A Data-Sharing Middleware for Mobile Computing (2002)
5. Design and Implementation of the Sun Network Filesystem
6. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications (2001)
7. A Survey and Comparison of Peer-to-Peer Overlay Network Schemes (2004)
8. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing (2001)
BI
1. 21 Recipes for Mining Twitter (book)
2. Web Data Mining (book)
3. Web Mining and Social Networking (book)
4. Mining the Social Web (book)
5. Textual Business Intelligence (Inmon)
6. Social Network Analysis and Mining for Business Applications (Yahoo, 2011)
7. Data Mining in Social Networks (2002)
8. Natural Language Processing with Python (book)
9. data_mining-10_methods (Chinese edition)
10. Mahout in Action (book)
11. Text Mining Infrastructure in R (2008)
12. Text Mining Handbook (2010)
Web search engine
1. Building Efficient Multi-Threaded Search Nodes (Yahoo, 2010)
2. The Anatomy of a Large-Scale Hypertextual Web Search Engine (Google)