
Spring Batch Framework: Introduction Chapter (Part 2)


Extract, Transform, and Load (ETL)

Briefly stated, ETL is a process in the database and data-warehousing world that performs the following steps:

  1. Extracts data from an external data source
  2. Transforms the extracted data to match a specific purpose
  3. Loads the transformed data into a data target: a database or data warehouse.
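The three steps above can be sketched in plain Java. This is only an illustration: the class name EtlSketch, the in-memory lists standing in for the data source and data target, and the trim-and-upper-case transformation are all assumptions made for the example, not part of any ETL tool or of Spring Batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the three ETL steps; not a Spring Batch API.
class EtlSketch {

    // Extract: copy the raw records out of the source
    // (an in-memory list stands in for a file, feed, or database).
    static List<String> extract(List<String> source) {
        return new ArrayList<>(source);
    }

    // Transform: trim each record, drop empty ones, and upper-case the rest.
    static List<String> transform(List<String> raw) {
        List<String> transformed = new ArrayList<>();
        for (String record : raw) {
            String cleaned = record.trim();
            if (!cleaned.isEmpty()) {
                transformed.add(cleaned.toUpperCase(Locale.ROOT));
            }
        }
        return transformed;
    }

    // Load: append the transformed records to the target
    // (another list stands in for a database or data warehouse).
    static void load(List<String> transformed, List<String> target) {
        target.addAll(transformed);
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        load(transform(extract(List.of(" pen ", "", "notebook"))), target);
        System.out.println(target); // prints [PEN, NOTEBOOK]
    }
}
```

A real ETL process would add the concerns discussed in this section, such as restartability and parallelism, which this sketch deliberately leaves out.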

Many products, both free and commercial, can help create ETL processes. This is a bigger topic than we can address here, but it isn’t always as simple as these three steps. Writing an ETL process can present its own set of challenges involving parallel processing, rerunnability, and recoverability. The ETL community has developed its own set of best practices to meet these and other requirements.

For the purpose of our discussion, this ETL process is a black box; it could be implemented with an ETL tool (like Talend) or even with another Spring Batch job.

Spring Batch includes many ready-to-use components to read from and write to data stores like files and databases.

Chunk processing is particularly well suited to large data operations because a job handles items in small chunks instead of processing them all at once. Practically speaking, a large file won’t be loaded into memory; instead it’s streamed, which is more efficient in terms of memory consumption. Chunk processing also allows more flexibility to manage the data flow in a job. Spring Batch handles transactions and errors around read and write operations.
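The idea of chunk processing can be sketched as a simple loop in plain Java. This is not Spring Batch’s actual implementation: the class ChunkLoopSketch and its process method are hypothetical, and the sketch leaves out the transaction and error handling that Spring Batch wraps around each chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a chunk-oriented loop; not the Spring Batch implementation.
class ChunkLoopSketch {

    // Reads items one at a time and hands them to the writer in chunks of at
    // most chunkSize items. Returns the number of chunks written.
    static <T> int process(Iterator<T> reader, int chunkSize, Consumer<List<T>> writer) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        int chunks = 0;
        List<T> buffer = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            buffer.add(reader.next());
            if (buffer.size() == chunkSize) {
                // A full chunk: write a copy, then reuse the buffer.
                writer.accept(new ArrayList<>(buffer));
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            // The last chunk may be smaller than chunkSize.
            writer.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }
}
```

Because only one chunk is buffered at a time, memory use depends on the chunk size rather than on the total number of items. With seven items and a chunk size of three, the writer is called three times, with three, three, and one item.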

Spring Batch provides the FlatFileItemReader class to read records from a flat file. To use a FlatFileItemReader, you need to configure some Spring beans and implement a component that creates domain objects from what the FlatFileItemReader reads; Spring Batch will handle the rest.
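The component that creates domain objects is essentially a function from one line of the file to one object. In Spring Batch this role is typically played by a FieldSetMapper implementation; the sketch below uses plain Java only so that it stays self-contained. The Product class, the ProductLineMapper name, and the comma-separated "id,name,price" layout are assumptions made for the example.

```java
import java.math.BigDecimal;

// Hypothetical domain object; the fields are assumed for this example.
class Product {
    final String id;
    final String name;
    final BigDecimal price;

    Product(String id, String name, BigDecimal price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }
}

// Hypothetical mapper: turns one line of an assumed "id,name,price" file into a Product.
class ProductLineMapper {

    static Product mapLine(String line) {
        // The -1 limit keeps trailing empty fields so that the count check is reliable.
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields but got " + fields.length + ": " + line);
        }
        return new Product(fields[0].trim(), fields[1].trim(), new BigDecimal(fields[2].trim()));
    }
}
```

Keeping the mapping in a small, side-effect-free component like this makes it easy to unit test independently of the reader configuration.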

Choosing a chunk size and commit interval

First, the size of a chunk and the commit interval are the same thing! Second, there’s no definitive value to choose. Our recommendation is a value between 10 and 200. Too small a chunk size creates too many transactions, which is costly and makes the job run slowly. Too large a chunk size makes transactional resources (like databases) run slowly too, because a database must be able to roll back operations. The best value for the commit interval depends on many factors: data, processing, nature of the resources, and so on. The commit interval is a parameter in Spring Batch, so don’t hesitate to change it to find the most appropriate value for your jobs.
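The trade-off can be made concrete with a little arithmetic: one transaction is committed per chunk, so the number of chunk transactions is the item count divided by the commit interval, rounded up. The helper below is a hypothetical illustration; it counts chunk transactions only and ignores any other transactions a job may use.

```java
// Hypothetical helper illustrating how the commit interval drives the transaction count.
class CommitIntervalMath {

    // Number of chunk transactions for itemCount items: ceil(itemCount / commitInterval).
    static long transactions(long itemCount, int commitInterval) {
        if (itemCount < 0 || commitInterval < 1) {
            throw new IllegalArgumentException("itemCount must be >= 0 and commitInterval >= 1");
        }
        return (itemCount + commitInterval - 1) / commitInterval;
    }

    public static void main(String[] args) {
        long items = 1_000_000L;
        for (int interval : new int[] {1, 10, 200}) {
            System.out.println("commit interval " + interval + " -> "
                    + transactions(items, interval) + " transactions");
        }
    }
}
```

For 1,000,000 items, a commit interval of 1 means 1,000,000 transactions, 10 means 100,000, and 200 means 5,000. A larger interval reduces the count further, but each transaction then holds more uncommitted work that the database must be able to roll back.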

Decompressing a file isn’t a read-write step, but Spring Batch is flexible enough to implement such a task as part of a job. A 1-GB flat file can compress to 100 MB, which is a more reasonable size for file transfers over the internet.

Note that you could encrypt the file as well, ensuring that no one could read the product data if the file were intercepted during transfer. The encryption could be done before the compression or as part of it. Spring Batch provides an extension point to handle processing in a batch process step: the Tasklet. You implement a Tasklet that decompresses a ZIP archive into its source flat file.
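The decompression logic that such a Tasklet would run can be sketched with the JDK’s java.util.zip classes. To stay self-contained, the sketch below does not implement the Spring Batch Tasklet interface; in a real job, a call like this would sit inside the Tasklet’s execute method. The class name DecompressSketch and the choice to extract only the first file entry are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch of the work a decompressing Tasklet would do; plain JDK only.
class DecompressSketch {

    // Extracts the first file entry of the ZIP stream into targetFile,
    // replacing targetFile if it already exists.
    static void decompress(InputStream zipped, Path targetFile) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zipped)) {
            ZipEntry entry = zis.getNextEntry();
            // Skip directory entries until a regular file entry is found.
            while (entry != null && entry.isDirectory()) {
                entry = zis.getNextEntry();
            }
            if (entry == null) {
                throw new IOException("No file entry found in archive");
            }
            // The stream is now positioned on the entry's data and reports
            // end-of-stream at the end of that entry, so this copies one file.
            Files.copy(zis, targetFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The entry name is ignored and the caller chooses the output path, so a crafted archive cannot make the sketch write outside the intended location.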

How does a job refer to the job repository?

You may have noticed that we say a job needs the job repository to run but we don’t make any reference to the job repository bean in the job configuration. The XML job element can have its job-repository attribute refer to a job repository bean. This attribute isn’t mandatory, because by default the job uses a bean named jobRepository. As long as you declare a jobRepository bean of type JobRepository, you don’t need to explicitly refer to it in your job configuration.


