List of Figures – Spring Batch in Action

Chapter 1. Introducing Spring Batch

Figure 1.1. A typical batch application: system A exports data to flat files, and system B uses a batch process to read the files into a database.

Figure 1.2. Scaling by partitioning: a single step partitions records and autonomous substeps handle processing.

Figure 1.3. Thanks to this new application, anyone can buy ACME’s products online. The system sends catalogs to a server where a batch process reads them and writes product records into the online store database.

Figure 1.4. Because ACME doesn’t want its internal catalog system to be directly accessible from the outside world, it doesn’t allow the two applications to communicate directly and exchange data.

Figure 1.5. An extract, transform, and load (ETL) process extracts and transforms the catalog system data into a flat file, which ACME sends every night to a Spring Batch process. This Spring Batch process is in charge of reading the flat file and importing the data into the online store database.

Figure 1.6. The Spring Batch job consists of the following steps: decompression and read-write.

Figure 1.7. In read-write scenarios, Spring Batch uses chunk processing. Spring Batch reads items one by one from an ItemReader, collects the items in a chunk of a given size, and sends that chunk to an ItemWriter.

Figure 1.8. Chunk processing combined with item processing: an item processor can transform input items before calling the item writer.

Figure 1.9. The FlatFileItemReader reads the flat file and delegates the mapping between a line and a domain object to a LineMapper. The LineMapper implementation delegates the splitting of lines and the mapping between split lines and domain objects.

Figure 1.10. Launching the test in Eclipse. Despite all its features, Spring Batch remains lightweight, making jobs easy to test.

Chapter 2. Spring Batch concepts

Figure 2.1. The main Spring Batch components. The framework provides a job repository to store job metadata and a job launcher to launch jobs, and the application developer configures and implements jobs. The infrastructure components—provided by Spring Batch—are in gray, and application components—implemented by the developer—are in white.

Figure 2.2. A Spring Batch application interacts with systems like schedulers and data sources (databases, files, or JMS queues).

Figure 2.3. The Spring Batch Admin web application lists all job instances for a given job, in this case, import products. Spring Batch Admin uses job metadata stored in a database to monitor job executions.

Figure 2.4. A Spring Batch job is a sequence of steps, such as this import products job, which includes three steps: decompress, read-write, and cleanup.

Figure 2.5. A Spring Batch job can be a nonlinear sequence of steps, like this version of the import products job, which sends a report if some records were skipped.

Figure 2.6. A job flow in the SpringSource Tool Suite. The tool displays a graph based on the job model defined in Spring Batch XML.

Figure 2.7. A job can have several job instances, which can have several job executions. The import products job executes daily, so it should have one instance per day and one or more corresponding executions, depending on success or failure.

Figure 2.8. After the run for June 27, 2010, Spring Batch created a job instance in the job repository. The instance is marked as COMPLETED, and its first and only execution completed successfully.

Figure 2.9. Details (duration, number of steps executed, and so on) of the first and only job execution for June 27, 2010. You can also learn about the job instance because the job parameters appear in the table.

Figure 2.10. The two June 28, 2010, executions. The first failed because of a corrupted archive, but the second completed successfully, thereby completing the job instance.

Chapter 3. Batch configuration

Figure 3.1. Interactions between Spring Batch and Spring XML vocabularies. The batch vocabulary defines the structure of batches. Some batch entities, such as a job, can refer to Spring beans defined with the beans vocabulary, such as item readers and writers.

Figure 3.2. The entity configuration hierarchy for batch processing. The Spring configuration contains the job configuration, which contains the step configuration, which contains the tasklet configuration, which contains the chunk configuration.

Figure 3.3. The steps of the import products job: decompress and readWrite. The decompress step first reads a zip file and decompresses it to another file. The readWrite step reads the decompressed file.

Figure 3.4. The import product tasklet configuration and chunk configuration define three steps: import, process, and store products.

Figure 3.5. Registration of entities as streams. Spring Batch automatically registers readers, processors, and writers if they implement the ItemStream interface. Explicit registration is necessary if Spring Batch doesn’t know about the streams to register, such as the writers in the figure used through a composite writer.

Figure 3.6. Using the filename to import from the job configuration

Figure 3.7. Notifications of lifecycle events and errors during job execution

Figure 3.8. Using configuration inheritance lets jobs inherit from an abstract job and steps inherit from an abstract step.

Chapter 4. Running batch jobs

Figure 4.1. The job launcher is synchronous by default. The client waits until the job execution ends (successfully or not) before the job launcher returns the corresponding job execution object. Synchronous execution can be problematic, for example, when the client is a controller from a web application.

Figure 4.2. The job launcher can use a task executor to launch job executions asynchronously. The task executor handles the threading strategy, and the client has immediate access to the job execution object.

Figure 4.3. You can launch a Spring Batch job as a plain JVM process. The triggering system can be a scheduler or a human operator. This solution is simple but implies initializing the batch environment for each run.

Figure 4.4. You can embed Spring Batch in a container along with a Java scheduler. A web container is a good candidate because Spring integrates easily in web applications.

Figure 4.5. An external system submits a job request to the container where the Spring Batch environment is deployed. An example is a cron scheduler submitting an HTTP request to a web controller. The web controller would use the job launcher API to start the job execution.

Figure 4.6. The command-line job runner uses an exit code mapper to translate the string exit status of a Spring Batch job into an integer system exit code. The triggering system—a system scheduler here—can then use this system exit code to decide what to do next.

Figure 4.7. An entry in the crontab file has three parts: (1) the cron expression, which schedules the job execution; (2) the user who runs the command; and (3) the command to execute. Some cron implementations don’t have the user option.

Figure 4.8. A web application can contain a Spring application context. This Spring application context can host Spring Batch’s infrastructure (job launcher, job repository) and jobs. The context can also host a Java-based scheduler (like Spring scheduler or Quartz) and any Spring beans related to the web application (data access objects, business services).

Figure 4.9. Once Spring Batch is in a web application, you can use an embedded Java scheduler such as Spring scheduler to launch jobs periodically.

Figure 4.10. Once Spring Batch is in a web application, you can add a web layer to launch Spring Batch jobs on incoming HTTP requests. This solution is convenient when the triggering system is external to Spring Batch (like cron).

Figure 4.11. The web controller is defined in the servlet’s application context. The root application context defines the job registry and the job launcher. Because the two application contexts share a parent-child relationship, you can inject beans from the root application context into the web controller.

Figure 4.12. You can expose the job operator bean to JMX and then call its methods remotely from a JMX client like JConsole. An operator can learn about the Spring Batch runtime and stop or restart jobs.

Chapter 5. Reading data

Figure 5.1. A chunk tasklet reads, processes, and writes data.

Figure 5.2. The supported file formats in the case study are separator-based text, fixed-length text, and JSON.

Figure 5.3. ItemReader processing for flat files. The item reader first identifies records, and then creates data objects.

Figure 5.4. Classes and interfaces involved in reading and parsing flat files

Figure 5.5. Interactions between LineTokenizer and FieldSetMapper in a DefaultLineMapper. The LineTokenizer parses record lines, and the FieldSetMapper creates objects from field sets.

Figure 5.6. Inheritance relationships between different kinds of products. The MobilePhoneProduct and BookProduct classes inherit from the Product class.

Figure 5.7. Spring OXM components

Figure 5.8. How Spring Batch reads data from multiple files. A multiresource reader delegates to a resource reader that reads files from an input directory.

Figure 5.9. High-level JDBC architecture. An application uses the vendor-neutral JDBC API. A database-specific JDBC driver implements communication with the database system.

Figure 5.10. Getting input data from a JDBC ResultSet corresponding to the result of a SQL request within the JdbcCursorItemReader class

Figure 5.11. Using JDBC batch-based fetching to provide input data to an ItemReader in fixed-size pages

Figure 5.12. Reusing methods of existing entities and services to supply input data to batch processes

Chapter 6. Writing data

Figure 6.1. A chunk-oriented tasklet implements the chunk-oriented processing pattern in Spring Batch. This chapter focuses on the writing phase.

Figure 6.2. Spring Batch supports writing to multiple file formats thanks to the various item writer implementations it provides.

Figure 6.3. Spring Batch extracts and aggregates fields for each item when writing to a flat file. The framework also handles writing a header and footer to the file (both are optional).

Figure 6.4. The interfaces and classes involved in writing items to a flat file with the FlatFileItemWriter. The FlatFileItemWriter also provides optional callbacks for the header and the footer of a flat file. The LineAggregator transforms an item into a string. The ExtractorLineAggregator implementation uses a FieldExtractor to “split” an item into an array of objects and calls business logic to render the array as a string.

Figure 6.5. Mapping between bean property, format expression, and output line

Figure 6.6. The domain model of the batch application uses several classes. The flat file item writer can use a custom LineAggregator to delegate aggregation to dedicated LineAggregators (one for each product subclass).

Figure 6.7. The multiresource item writer rolls over files after writing a given number of items. This creates multiple small files instead of a single large file.

Figure 6.8. Sending a batch of SQL statements to a relational database is more efficient than sending one query at a time.

Figure 6.9. An application puts messages on a JMS queue with a JMS item writer. Applications often use JMS to communicate with each other in a decoupled and asynchronous way.

Figure 6.10. Sending an email message for each customer in an input file. Because Spring Batch’s email item writer only takes care of sending email, it’s common practice to use an item processor to convert read items into ready-to-be-sent SimpleMailMessage or MimeMessage objects.

Figure 6.11. A composite item writer delegates writing to a list of item writers. Use this pattern to send items to several targets, like a database and a file.

Figure 6.12. Routing a Product item to a specific writer

Chapter 7. Processing data

Figure 7.1. Spring Batch allows insertion of an optional processing phase between the reading and writing phases of a chunk-oriented step. The processing phase usually contains some business logic implemented as an item processor.

Figure 7.2. In the processing phase of a chunk-oriented step, you can choose to only change the state of read items. In this case, the item reader, processor, and writer all use the same type of object (illustrated by the small squares).

Figure 7.3. The item processor of a chunk-oriented step can produce objects of a different type (represented by circles) than the read items (squares). The item writer then receives and handles these new objects.

Figure 7.4. The driving query pattern implemented in Spring Batch. The item reader executes the driving query. The item processor receives the IDs and loads the objects. The item writer then receives these objects to, for example, write a file or update the database or an index.

Figure 7.5. An item processor filters read items. It implements logic to decide whether to send a read item to the item writer.

Figure 7.6. The filtering item processor discards products that are already in the database. This item writer only inserts new records and doesn’t interfere with the online application. A different job updates existing records when there’s less traffic in the online store.

Figure 7.7. The relationships between Spring Batch, Spring, and your validation logic. Spring Batch provides a level of abstraction with its Validator interface and an implementation (SpringValidator) that uses the Spring Validator interface. The ValangValidator implementation, from Spring Modules, depends on the Spring Validator interface. Both Validator interfaces are potential extension points for your own implementations.

Figure 7.8. Using a composite item processor allows item processors to be chained in order to apply a succession of business rules, transformations, or validations.

Figure 7.9. Applying the composite item processor pattern to the import products job. The first delegate item processor converts partner product objects into online store product objects. The second delegate item processor maps partner IDs to ACME IDs. You reuse and combine item processors without any modification.

Chapter 8. Implementing bulletproof jobs

Figure 8.1. The include element specifies an exception class and all its subclasses. If you want to exclude part of the hierarchy, use the exclude element. The exclude element also works transitively, as it excludes a class and its subclasses.

Figure 8.2. When skip is on, Spring Batch asks a skip policy whether it should skip an exception thrown by an item reader, processor, or writer. The skip policy’s decision can depend on the type of the exception and on the number of skipped items so far in the step.

Figure 8.3. When a writer throws a skippable exception, Spring Batch can’t know which item triggered the exception. Spring Batch then rolls back the transaction and processes the chunk item by item. Note that Spring Batch doesn’t read the items again, by default, because it maintains a chunk-scoped cache.

Figure 8.4. Spring Batch lets you register skip listeners. Whenever a chunk-oriented step throws a skippable exception, Spring Batch calls the listener accordingly. A listener can then log the skipped item for later processing.

Figure 8.5. Spring Batch configured to retry exceptions: the include tag includes an exception class and all its subclasses. By using the exclude tag, you specify a part of the hierarchy that Spring Batch shouldn’t retry. Here, Spring Batch retries any transient exception except pessimistic locking exceptions.

Figure 8.6. Spring Batch retries only for exceptions thrown during item processing or item writing. Retry triggers a rollback, so retrying is costly: don’t abuse it! Note that Spring Batch doesn’t read the items again, by default, because it maintains a chunk-scoped cache.

Figure 8.7. If a job fails in the middle of processing, Spring Batch can restart it exactly where it left off.

Figure 8.8. Restart is possible thanks to batch metadata that Spring Batch maintains during job executions.

Figure 8.9. Spring Batch lets you choose if it should re-execute already completed steps on restart. Spring Batch doesn’t re-execute already completed steps by default.

Figure 8.10. A chunk-oriented step can restart exactly where it left off. The figure shows an item reader that restarts on the line where the previous execution failed (it assumes the line has been corrected). To do so, the item reader uses batch metadata to store its state.

Chapter 9. Transaction management

Figure 9.1. Be careful when using Spring’s declarative transactions in a Spring Batch job. Depending on the transaction attributes, the Spring-managed transaction can participate (or not) in the Spring Batch–managed transaction.

Figure 9.2. The difference between nontransactional and transactional readers. By default, Spring Batch maintains a cache of read items for retries. You must disable this cache when the reader is transactional, so Spring Batch can read the items again in case of a rollback. A JMS item reader is an example of a transactional reader because reading a message from a JMS queue removes it from the queue. A database reader is a nontransactional reader, because reading rows from a database doesn’t modify the database.

Figure 9.3. A global transaction spanning multiple resources. The system must enforce ACID properties on all participating resources.

Figure 9.4. Local transactions between an application and a resource. The application directly communicates with the resource to demarcate transactions. Try to use local transactions as much as possible: they’re fast, simple, and reliable.

Figure 9.5. An application can use a JTA transaction manager to handle global transactions. The resources must provide XA drivers to communicate with the transaction manager using the XA protocol.

Figure 9.6. Use the shared resource transaction pattern when a common resource hosts the transactional resources. In this example, two Oracle database schemas exist in the same database instance. The first schema refers to the second schema’s tables using synonyms. This allows the application to use local transactions.

Figure 9.7. The best effort pattern can apply when reading from a JMS queue and writing to a database.

Figure 9.8. The best effort pattern. Spring automatically synchronizes the local JMS transaction commit with the commit of an ongoing transaction (the chunk transaction in the context of a Spring Batch job).

Figure 9.9. When the best effort pattern fails. The database commit works, but the JMS commit fails. JMS puts the message back on the queue, and the batch job processes it again. Even if such duplicate messages are rare, they can corrupt data because of repeated processing.

Figure 9.10. Detecting duplicate messages and filtering them out with an item processor in a chunk-oriented job. The item writer must track whether a processor processed each message. The best effort pattern combined with this filtering technique prevents a processor from processing duplicate messages.

Figure 9.11. When performing an idempotent operation on the reception of a message, there’s no need to detect duplicate messages. The best effort pattern combined with an idempotent operation is an acceptable solution.

Chapter 10. Controlling execution

Figure 10.1. Jobs with linear and nonlinear flows. Job A, on the left, has a simple linear flow. Job B, on the right, isn’t linear because there are multiple execution paths.

Figure 10.2. In this advanced version of the import products job, the flow of steps isn’t linear anymore and requires more complex features: job execution can end immediately after the first step, steps need to share data, and part of the configuration must be reusable by other jobs.

Figure 10.3. A nonlinear flow. If the read-write step fails, the execution shouldn’t end; instead, the job should generate a report. For all other cases, the execution proceeds directly to the cleanup step.

Figure 10.4. A nonlinear flow using a custom exit status. Custom exit statuses carry more semantics than standard exit statuses (COMPLETED, FAILED, and so on), which is helpful in making complex flow decisions.

Figure 10.5. A step execution listener can change the exit status of a step. The job can then use the exit status for the transition decision to the next step.

Figure 10.6. A job execution decider is registered after a step to modify the step’s exit status. The job then uses the exit status in its transition decision.

Figure 10.7. The verify step checks the integrity of the extracted files and extracts import metadata needed by the track import step. These steps need a way to communicate this information.

Figure 10.8. A job execution has its own execution context. Within a job execution, each step also has its own execution context. Spring Batch stores execution contexts across executions.

Figure 10.9. Sharing data through the job execution context. A first step writes data in the job execution context for a subsequent step to read.

Figure 10.10. The succession of calls needed to access the job execution context from a tasklet

Figure 10.11. A batch artifact such as a tasklet shouldn’t embed too much business logic. Instead, it should use a dedicated business component. Such delegation allows for better reusability of business logic and makes business components easier to test because they don’t depend on the batch environment.

Figure 10.12. Sharing data by writing into the step execution context and promoting data to the job execution context. The receiving step then has access to the data.

Figure 10.13. Using a Spring bean as a holder to share data. Spring injects the holder as a dependency into the batch artifacts that want to share data. An artifact then writes data in the holder, and the receiving artifact reads the data from the holder.

Figure 10.14. A flow of steps can be defined as a standalone entity so that other jobs can reuse it. This promotes reusability of code and configuration.

Figure 10.15. When a step ends, Spring Batch lets you choose if you want to complete, fail, or stop the job execution. In the case of the import products job, if the job didn’t download a ZIP archive, it makes sense to consider the job execution complete.

Chapter 11. Enterprise integration

Figure 11.1. The online store application uses enterprise integration techniques to import product data from other systems.

Figure 11.2. An information portal is a typical enterprise integration project. It gathers data from several applications and makes it available in one place to the end user.

Figure 11.3. Data replication in an enterprise integration project. In this example, the customer directory holds the reference data for anything related to customers. The billing and shipping applications need this information. The system replicates data from the customer directory to the shipping and billing data stores.

Figure 11.4. The online store application uses transfer file integration to synchronize its catalog with partner catalogs. The corresponding batch process is the basis of our enterprise integration scenario.

Figure 11.5. A client submits products to import over HTTP. The system copies the import data into a file. A triggering system—like a scheduler—can then launch a Spring Batch job to read the file and update the database accordingly. Clients update the store catalog more frequently, but the application controls the frequency.

Figure 11.6. Spring Integration enables messaging within Spring applications. It can be embedded in any Spring application and can integrate with external systems using built-in adapters for various technologies, such as HTTP, JMS, file systems, and RMI.

Figure 11.7. The Spring Integration quick-start uses Spring Integration messages to trigger Spring Batch jobs.

Figure 11.8. The job launching message handler receives job launch request messages and uses Spring Batch’s job launcher to effectively launch jobs. It retrieves job beans from the job registry—an infrastructure bean that Spring Batch provides.

Figure 11.9. The flow of messages represented with enterprise integration pattern icons. Job launch requests are sent and received by the service activator, which unwraps them from messages and calls the job launching message handler. The service activator retrieves the job executions and sends them on a dedicated channel, where Spring Batch outputs them to the console.

Figure 11.10. Job submissions come in as HTTP PUT requests. The Spring MVC controller uses a repository to record each submission in a database. The controller then sends submissions to the Spring Integration messaging infrastructure through a gateway.

Figure 11.11. In a web application, a DispatcherServlet (the heart of Spring MVC) has its own Spring application context, which can see beans from the root application context. The scope of the root application context is the entire web application and contains beans for data access, business services, and all the beans from frameworks like Spring Batch and Spring Integration.

Figure 11.12. The job submission enforces REST rules. Clients must send an HTTP PUT to import products on a specific URL, which identifies the import as a resource. The REST controller sends back appropriate status codes: 202 if the submission is accepted, 409 if the server already accepted the import.

Figure 11.13. The file-writing channel adapter receives messages and copies their payload to the file system. It delegates filename creation to a filename generator (an example of the strategy pattern).

Figure 11.14. Triggering the import job when an import file is created in the submission directory. The Spring Integration inbound file message source polls the submission directory and sends messages for each new file. A custom message handler converts the file messages into job launch messages. Our message-driven job launcher receives these messages to trigger Spring Batch jobs.

Figure 11.15. The conversion between a file message and a job launch request message. The latter then goes to the generic job launcher to trigger the import job.

Figure 11.16. The import job has two steps. The first step maps the import to the job instance, which gives you access to the import status. The second step reads the XML file and updates the database.

Figure 11.17. A client can find out the status of an import from the REST controller. The controller consults the repository, which uses system data and Spring Batch metadata to communicate the status of import jobs.

Chapter 12. Monitoring jobs

Figure 12.1. Monitoring a web application and checking its availability

Figure 12.2. Monitoring batch jobs using execution data from the job repository database

Figure 12.3. Database schema for the Spring Batch job repository

Figure 12.4. Job execution classes

Figure 12.5. Interactions between job and repository during batch execution

Figure 12.6. Contents of the BATCH_JOB_INSTANCE table

Figure 12.7. Contents of the BATCH_JOB_EXECUTION table

Figure 12.8. Contents of the BATCH_STEP_EXECUTION table

Figure 12.9. Overview of the Spring Batch Admin architecture

Figure 12.10. Navigating Spring Batch Admin

Figure 12.11. Job names registered

Figure 12.12. Job instances for a given job

Figure 12.13. Recent and current job executions

Figure 12.14. Details for a job execution

Figure 12.15. Step execution list for the job execution

Figure 12.16. Details of the step execution

Figure 12.17. Recent and current job executions containing a failed job execution

Figure 12.18. Details of a failed job execution

Figure 12.19. Details of a step execution failure include a stack trace in the exit message.

Figure 12.20. JMX architecture

Figure 12.21. Using JMX with Spring Batch

Figure 12.22. Monitoring batch jobs using JConsole

Figure 12.23. Viewing JobOperator operations (methods) in JConsole

Figure 12.24. Getting job instance identifiers for a job name

Figure 12.25. Getting job execution identifiers for a job instance

Figure 12.26. Displaying the summary of a job execution

Figure 12.27. Displaying summaries for all step executions of a job execution

Chapter 13. Scaling and parallel processing

Figure 13.1. Vertical scaling (scaling up) migrates an application to more powerful hardware.

Figure 13.2. Horizontal scaling splits application processing on different nodes and requires load balancing.

Figure 13.3. Local scaling in a single process executes batch job steps in parallel.

Figure 13.4. Remote scaling in more than one process executes batch job steps in parallel.

Figure 13.5. A step reading and writing using multiple threads

Figure 13.6. Implementation of the process indicator pattern in a step

Figure 13.7. The process indicator pattern for a step using an ItemProcessor

Figure 13.8. Executing steps in parallel using dedicated threads

Figure 13.9. Remote chunking with a master machine reading and dispatching data to slave machines for processing

Figure 13.10. Spring Integration–based implementation of remote chunking using a messaging gateway and a listener to communicate between master and slaves through channels

Figure 13.11. Partitioning splits input data processing into several step executions processed on the master or remote slave machines.

Figure 13.12. Partitioning SPI objects involved in partitioning and processing data for a partitioned step

Figure 13.13. Using dedicated threads to process data when importing product files with partitioning

Figure 13.14. Partitioning using Spring Integration: the master and slaves communicate using channels, a messaging gateway, and a listener.

Figure 13.15. Partitioning based on database column values

Chapter 14. Testing batch applications

Figure 14.1. High-level view of our batch application workflow. Our testing strategy applies to all the components shown in the figure.

Figure 14.2. Components unit tested by our examples. We unit test all components except the statistics step.

Figure 14.3. Components covered by integration tests

Figure 14.4. Components covered in our functional test examples

Appendix A. Setting up the development environment

Figure A.1. STS can import a Maven project. STS then configures the project by using the project POM.

Figure A.2. The blank Maven project imported in STS. STS automatically includes the dependencies specified in the POM.

Figure A.3. When creating a Spring configuration file with the wizard, STS lets you choose which XML namespaces you want to include in the file declaration. You can also change the namespaces once the wizard creates the file, on the Namespaces tab.

Figure A.4. The XML editor for Spring configuration files is arguably STS’s most useful tool for the Spring developer. It provides validation, code completion, and graphical visualization for Spring Batch.

Figure A.5. Visualizing a Spring Batch job inside STS, thanks to the Batch-Graph tab. You can also edit the job definition by dragging components from the left to the main editor area. You can edit each component by double-clicking it.

Appendix B. Managing Spring Batch Admin

Figure B.1. The home page of a Spring Batch Admin instance lists the services the application provides.