Spring Batch Basics
Before writing any code, it helps to understand what happens behind the scenes when you build an
application and when you run it. Once we understand that, we can write this code.
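import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class DemoApplication {
    // Entry point: starts the Spring application context.
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}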
When the JDK is installed, it comes with a JRE, which is the JVM plus the class libraries.
Java Variables, Data types & Literals:
Java is a statically and strongly typed language: every variable and constant in a program must be
declared with one of the Java data types, and types are checked at compile time.
Java data types differ in size and in the range of values a variable of that type can store. They fall
into two categories:
1. Primitive Data Type: such as boolean, char, int, short, byte, long, float, and double. The
Boolean with uppercase B is a wrapper class for the primitive data type boolean in Java.
2. Non-Primitive Data Type or Object Data type: such as String, Array, etc.
Now, let us explore the different primitive and non-primitive data types.

Type     Description                 Default  Size     Example literals                   Range of values
byte     two’s-complement integer    0        8 bits   (none)                             -128 to 127
short    two’s-complement integer    0        16 bits  (none)                             -32,768 to 32,767
int      two’s-complement integer    0        32 bits  -2, -1, 0, 1, 2                    -2,147,483,648 to 2,147,483,647
long     two’s-complement integer    0        64 bits  -2L, -1L, 0L, 1L, 2L               -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
float    IEEE 754 floating point     0.0      32 bits  1.23e10f, -1.23e-10f, .3f, 3.14F   up to 7 significant decimal digits
double   IEEE 754 floating point     0.0      64 bits  1e1d                               about 15 to 16 significant decimal digits
The boolean data type represents a logical value that can be either true or false. Conceptually, it
represents a single bit of information, but the actual size used by the virtual machine is
implementation-dependent and typically at least one byte (eight bits) in practice. Values of the
boolean type cannot be converted to or from any other type, either implicitly or with a cast.
However, programmers can write conversion code if needed.
Syntax:
boolean booleanVar;
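Since no cast exists between boolean and the numeric types, any conversion has to be written out by hand. The following is a minimal sketch; the class and method names are illustrative and not part of any library, and treating every non-zero int as true is a convention chosen here.

```java
// Illustrative helper: Java has no cast between boolean and int,
// so the mapping is spelled out explicitly.
class BooleanConversion {
    static int toInt(boolean flag) {
        return flag ? 1 : 0;
    }

    static boolean fromInt(int value) {
        return value != 0; // convention chosen for this sketch
    }

    public static void main(String[] args) {
        System.out.println(toInt(true));  // prints 1
        System.out.println(fromInt(0));   // prints false
    }
}
```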
The byte data type is an 8-bit signed two’s complement integer. The byte data type is useful for
saving memory in large arrays.
Syntax:
byte byteVar;
The short data type is a 16-bit signed two’s complement integer. Similar to byte, a short is used
when memory savings matter, especially in large arrays where space is constrained.
Syntax:
short shortVar;
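The int data type is a 32-bit signed two’s complement integer, with values from -2,147,483,648 to
2,147,483,647.
Syntax: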
int intVar;
The long data type is a 64-bit signed two’s complement integer. It is used when an int is not large
enough to hold a value, offering a much broader range.
Syntax:
long longVar;
Remember: In Java SE 8 and later, you can use the long data type to represent an unsigned 64-bit
long, which has a minimum value of 0 and a maximum value of 2^64 - 1. The Long class also contains
methods such as compareUnsigned, divideUnsigned, etc. to support arithmetic operations on
unsigned long values.
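As a small illustration of the unsigned helpers on the Long class, the sketch below reads the bit pattern of -1L as the largest unsigned value. The class and method names are made up for the example; only the Long methods come from the JDK.

```java
class UnsignedLongDemo {
    // All 64 bits set: -1 when read as signed, 2^64 - 1 when read as unsigned.
    static final long MAX_UNSIGNED = -1L;

    static String maxAsText() {
        return Long.toUnsignedString(MAX_UNSIGNED);
    }

    static long halfOfMax() {
        return Long.divideUnsigned(MAX_UNSIGNED, 2L);
    }

    static boolean maxIsGreaterThanOne() {
        return Long.compareUnsigned(MAX_UNSIGNED, 1L) > 0;
    }

    public static void main(String[] args) {
        System.out.println(maxAsText());            // prints 18446744073709551615
        System.out.println(halfOfMax());            // prints 9223372036854775807
        System.out.println(maxIsGreaterThanOne());  // prints true
    }
}
```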
The float data type is a single-precision 32-bit IEEE 754 floating-point. Use a float (instead of double)
if you need to save memory in large arrays of floating-point numbers. The size of the float data type
is 4 bytes (32 bits).
Syntax:
float floatVar;
The double data type is a double-precision 64-bit IEEE 754 floating-point. For decimal values, this
data type is generally the default choice. The size of the double data type is 8 bytes or 64 bits.
Syntax:
double doubleVar;
Note: Both float and double are binary floating-point types intended for scientific and engineering
calculations, where small approximation errors are acceptable. If exact decimal accuracy is the
primary concern (for example, with currency), use the BigDecimal class instead.
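The approximation error is easy to observe. In this sketch (class and method names are illustrative), adding 0.1 and 0.2 as double does not give exactly 0.3, while the same sum with BigDecimal values built from strings does.

```java
import java.math.BigDecimal;

class DecimalPrecisionDemo {
    static boolean doubleSumIsExact() {
        // 0.1 and 0.2 have no exact binary representation.
        return 0.1 + 0.2 == 0.3;
    }

    static boolean bigDecimalSumIsExact() {
        // Build from strings so the decimal values are represented exactly.
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doubleSumIsExact());      // prints false
        System.out.println(bigDecimalSumIsExact());  // prints true
    }
}
```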
The char data type is a single 16-bit Unicode character with the size of 2 bytes (16 bits).
Syntax:
char charVar;
Unlike C and C++, whose char is a single byte traditionally used for ASCII text, Java uses the Unicode
character set to support internationalization. Unicode needs more than 8 bits to cover characters
from different languages, including Latin, Greek, Cyrillic, Chinese, Arabic, and more. As a result, Java
uses 2 bytes to store a char. A char holds one UTF-16 code unit, so characters outside the Basic
Multilingual Plane (many emoji, for example) take two char values.
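A char holds one 16-bit UTF-16 code unit. The sketch below (illustrative names) compares the number of char values in a string with the number of Unicode code points: a Greek letter needs one char, while the character U+1F600 needs two.

```java
class CharDemo {
    static int utf16Units(String s) {
        return s.length(); // length() counts char values, not characters
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String omega = "\u03A9";        // Greek capital omega, one char
        String emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair
        System.out.println(utf16Units(omega) + " " + codePoints(omega)); // prints 1 1
        System.out.println(utf16Units(emoji) + " " + codePoints(emoji)); // prints 2 1
    }
}
```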
Non-primitive (reference) data types hold a reference to an object (in effect, its memory address)
rather than the value itself. Examples are strings, objects, arrays, etc.
1. Strings
A String represents a sequence of characters. The difference between a character array and a string
in Java is that a string holds the whole sequence in a single immutable object, whereas a character
array is a mutable collection of separate char elements. Unlike in C/C++, Java strings are not
terminated with a null character.
Example:
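A minimal sketch of the difference, with illustrative class and method names: a String cannot be modified in place, so methods such as toUpperCase() return a new object, while a char array copied from the string can be changed freely.

```java
class StringDemo {
    static String shout(String s) {
        // Returns a new String; the argument is left unchanged.
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String name = "batch";
        String upper = shout(name);

        char[] letters = name.toCharArray(); // independent, mutable copy
        letters[0] = 'B';

        System.out.println(name);                 // prints batch
        System.out.println(upper);                // prints BATCH
        System.out.println(new String(letters));  // prints Batch
    }
}
```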
2. Class
A Class is a user-defined blueprint or prototype from which objects are created. It represents the set
of properties or methods that are common to all objects of one type. In general, class declarations
can include these components, in order:
1. Modifiers: A class can be public or have default (package-private) access.
2. Class name: The name should begin with an initial letter (capitalized by convention).
3. Superclass(if any): The name of the class’s parent (superclass), if any, preceded by the
keyword extends. A class can only extend (subclass) one parent.
3. Object
An Object is a basic unit of Object-Oriented Programming and represents real-life entities. A typical
Java program creates many objects, which interact by invoking methods. An object consists of:
1. State: It is represented by the attributes (fields) of an object.
2. Behavior: It is represented by the methods of an object.
3. Identity: It gives a unique identity to an object and enables one object to interact with other
objects.
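To make the class and object descriptions concrete, here is a small sketch. The Account and SavingsAccount classes are invented for illustration: the fields hold an object's state, the methods are its behavior, and two objects with equal state are still distinct identities.

```java
class ClassAndObjectDemo {
    public static void main(String[] args) {
        SavingsAccount a = new SavingsAccount("Ann");
        SavingsAccount b = new SavingsAccount("Ann");
        a.deposit(1000);
        a.addInterest(5);
        System.out.println(a.balance()); // prints 1050
        System.out.println(a == b);      // prints false: two distinct objects
    }
}

class Account {
    private final String owner;  // state
    private long balanceCents;   // state

    Account(String owner) {
        this.owner = owner;
    }

    void deposit(long cents) {   // behavior
        balanceCents += cents;
    }

    long balance() {
        return balanceCents;
    }

    String owner() {
        return owner;
    }
}

// A class can extend exactly one superclass.
class SavingsAccount extends Account {
    SavingsAccount(String owner) {
        super(owner);
    }

    void addInterest(int percent) {
        deposit(balance() * percent / 100);
    }
}
```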
4. Interface
Like a class, an interface can have methods and variables, but the methods declared in an interface
are abstract by default (only a method signature, no body). Since Java 8, an interface can also
declare default and static methods that do have bodies.
Interfaces specify what a class must do, not how. An interface is the blueprint of a class.
An interface is about capabilities: for example, Player may be an interface, and any class
implementing Player must be able to (that is, must implement) move(). So it specifies a set of
methods that the class has to implement.
If a class implements an interface and does not provide method bodies for all the abstract methods
specified in the interface, then the class must be declared abstract.
A Java library example is the Comparator interface. An object of a class that implements this
interface can be used to sort a collection.
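The two points above can be sketched as follows. Player and move() follow the example in the text; the Knight class, the return value of move(), and the sortByLength helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class InterfaceDemo {
    // Sorting with a Comparator: shorter strings first.
    static List<String> sortByLength(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        Player piece = new Knight();
        System.out.println(piece.move());
        System.out.println(sortByLength(Arrays.asList("carol", "al", "bob"))); // prints [al, bob, carol]
    }
}

// The interface says what an implementer can do, not how.
interface Player {
    String move();
}

class Knight implements Player {
    @Override
    public String move() {
        return "jumps in an L shape";
    }
}
```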
5. Array
An Array is a group of like-typed variables that are referred to by a common name. Arrays in Java
work differently than they do in C/C++. The following are some important points about Java arrays.
Since arrays are objects in Java, we can find their length using the member length. This is
different from C/C++, where the size of an array is found using sizeof.
A Java array variable can also be declared like other variables with [] after the data type.
The variables in the array are ordered and each has an index beginning with 0.
Java array can also be used as a static field, a local variable, or a method parameter.
The size of an array must be specified by an int value and not long or short.
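A short sketch of these points, using illustrative names: the array is declared with [] after the element type, sized with an int, indexed from 0, and its size is read from the length member.

```java
class ArrayDemo {
    // Builds an array holding 0*0, 1*1, ..., (count-1)*(count-1).
    static int[] squares(int count) {
        int[] result = new int[count];      // the size is an int value
        for (int i = 0; i < result.length; i++) {
            result[i] = i * i;              // indexes start at 0
        }
        return result;
    }

    static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] s = squares(5);
        System.out.println(s.length); // prints 5
        System.out.println(sum(s));   // prints 30
    }
}
```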
Strong Typing: Java enforces strict type checking at compile-time, reducing runtime errors.
Memory Efficiency: Choosing the right data type based on the range and precision needed
helps in efficient memory management.
Immutability of Strings: Strings in Java cannot be changed once created, ensuring safety in
multithreaded environments.
Array Length: The length of an array in Java is fixed once the array is created, and it can be read
from the length field.
This Spring Batch tutorial explains the programming model and the domain language of batch
applications in general and, in particular, shows some useful approaches to the design and
development of batch applications using the current Spring Batch 3.0.7 version.
By way of example, this article considers source code from a sample project that loads an XML-
formatted customer file, filters customers by various attributes, and outputs the filtered entries to a
text file. The source code for our Spring Batch example (which makes use of Lombok annotations) is
available on GitHub and requires Java SE 8 and Maven.
It is important for any batch developer to be familiar and comfortable with the main concepts of
batch processing. The diagram below is a simplified version of the batch reference architecture that
has been proven through decades of implementations on many different platforms. It introduces the
key concepts and terms relevant to batch processing, as used by Spring Batch.
As shown in our batch processing example, a batch process is typically encapsulated by
a Job consisting of multiple Steps. Each Step typically has a single ItemReader, ItemProcessor,
and ItemWriter. A Job is executed by a JobLauncher, and metadata about configured and executed
jobs is stored in a JobRepository.
Each Job may be associated with multiple JobInstances, each of which is defined uniquely by its
particular JobParameters that are used to start a batch job. Each run of a JobInstance is referred to
as a JobExecution. Each JobExecution typically tracks what happened during a run, such as current
and exit statuses, start and end times, etc.
A Step is an independent, specific phase of a batch Job, such that every Job is composed of one or
more Steps. Similar to a Job, a Step has an individual StepExecution that represents a single attempt
to execute a Step. StepExecution stores the information about current and exit statuses, start and
end times, and so on, as well as references to its corresponding Step and JobExecution instances.
JobRepository is the mechanism in Spring Batch that makes all this persistence possible. It provides
CRUD operations for JobLauncher, Job, and Step instantiations. Once a Job is launched,
a JobExecution is obtained from the repository and, during the course of
execution, StepExecution and JobExecution instances are persisted to the repository.
The actual startup of the application happens in a class looking something like the following:
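@EnableBatchProcessing
@SpringBootApplication
public class BatchApplication {
    public static void main(String[] args) {
        prepareTestData(1000);  // helper from the sample project (not shown here)
        SpringApplication.run(BatchApplication.class, args);
    }
}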
The @EnableBatchProcessing annotation enables Spring Batch features and provides a base
configuration for setting up batch jobs.
The @SpringBootApplication annotation comes from the Spring Boot project that provides
standalone, production-ready, Spring-based applications. It specifies a configuration class that
declares one or more Spring beans and also triggers auto-configuration and Spring’s component
scanning.
Our sample project has only one job that is configured by CustomerReportJobConfig with an
injected JobBuilderFactory and StepBuilderFactory. The minimal job configuration can be defined
in CustomerReportJobConfig as follows:
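@Configuration
public class CustomerReportJobConfig {
    @Autowired
    private JobBuilderFactory jobBuilders;
    @Autowired
    private StepBuilderFactory stepBuilders;

    @Bean
    public Job customerReportJob() {
        return jobBuilders.get("customerReportJob")
            .start(taskletStep())
            .next(chunkStep())  // chunkStep() is configured later in this article
            .build();
    }

    @Bean
    public Step taskletStep() {
        return stepBuilders.get("taskletStep")
            .tasklet(tasklet())
            .build();
    }

    @Bean
    public Tasklet tasklet() {
        // A trivial tasklet that does no work and finishes immediately.
        return (contribution, chunkContext) -> {
            return RepeatStatus.FINISHED;
        };
    }
}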
One approach, as shown in the above example, is tasklet-based. A Tasklet supports a simple
interface that has only one method, execute(), which is called repeatedly until it either
returns RepeatStatus.FINISHED or throws an exception to signal a failure. Each call to the Tasklet is
wrapped in a transaction.
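The repeat-until-finished contract can be sketched in plain Java. This is not Spring Batch code: Status and MiniTasklet are simplified stand-ins invented for the example, and the per-call transaction is left out.

```java
class TaskletLoopSketch {
    // Stand-in for the status a tasklet reports after each call.
    enum Status { CONTINUABLE, FINISHED }

    // Stand-in for a tasklet: a single method, called repeatedly.
    interface MiniTasklet {
        Status execute();
    }

    // Calls the tasklet until it reports FINISHED and returns the number of calls.
    // An exception thrown by execute() propagates and signals a failure.
    static int runToCompletion(MiniTasklet tasklet) {
        int calls = 0;
        Status status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        int[] remaining = {3};
        int calls = runToCompletion(
            () -> --remaining[0] > 0 ? Status.CONTINUABLE : Status.FINISHED);
        System.out.println(calls); // prints 3
    }
}
```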
Another approach, chunk-oriented processing, refers to reading the data sequentially and creating
“chunks” that will be written out within a transaction boundary. Each individual item is read in from
an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals
the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is
committed. A chunk-oriented step can be configured as follows:
@Bean
public Job customerReportJob() {
    return jobBuilders.get("customerReportJob")
        .start(taskletStep())
        .next(chunkStep())
        .build();
}

@Bean
public Step chunkStep() {
    return stepBuilders.get("chunkStep")
        .<Customer, Customer>chunk(20)  // commit interval: 20 items per chunk
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .build();
}
The chunk() method builds a step that processes items in chunks of the given size (the commit
interval, 20 here): items are read and processed one at a time using the specified reader and
processor, and each completed chunk is passed to the writer. These methods are
discussed in more detail in the next sections of this article.
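The read, process, aggregate, write cycle described above can be modelled in a few lines of plain Java. This is a simplified sketch rather than Spring Batch's implementation: a Supplier stands in for the reader (returning null when the input is exhausted), a Function for the processor (returning null to filter an item), a Consumer for the writer, and transactions are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

class ChunkLoopSketch {
    // Runs the chunk loop and returns the number of chunks handed to the writer.
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        int chunksWritten = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            int readCount = 0;
            while (readCount < commitInterval) {
                I item = reader.get();
                if (item == null) {          // no more input
                    exhausted = true;
                    break;
                }
                readCount++;
                O output = processor.apply(item);
                if (output != null) {        // null means "filtered out"
                    chunk.add(output);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);        // a real step would commit here
                chunksWritten++;
            }
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        Supplier<Integer> reader = () -> source.hasNext() ? source.next() : null;
        Function<Integer, String> processor = n -> n % 2 == 0 ? null : "item-" + n;
        Consumer<List<String>> writer = chunk -> System.out.println(chunk);
        // With a commit interval of 3 this prints [item-1, item-3], [item-5], [item-7].
        System.out.println(run(reader, processor, writer, 3)); // prints 3
    }
}
```

Note that the commit interval counts items read, so a chunk can contain fewer items than the interval when the processor filters some of them out.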
Custom Reader
For our Spring Batch sample application, in order to read a list of customers from an XML file, we
need to provide an implementation of the interface org.springframework.batch.item.ItemReader,
which declares a single read() method.
An ItemReader provides the data and is expected to be stateful. It is typically called multiple times
for each batch, with each call to read() returning the next value and finally returning null when all
input data has been exhausted.
Spring Batch provides some out-of-the-box implementations of ItemReader, which can be used for a
variety of purposes such as reading collections, files, integrating JMS and JDBC as well as multiple
sources, and so on.
In our sample application, the CustomerItemReader class delegates actual read() calls to a lazily
initialized instance of the IteratorItemReader class:
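public class CustomerItemReader implements ItemReader<Customer> {
    private final String filename;
    private ItemReader<Customer> delegate;
    public CustomerItemReader(String filename) {
        this.filename = filename;
    }
    @Override
    public Customer read() throws Exception {
        if (delegate == null) {
            // customers() is an assumed helper (not shown) that parses the XML file into a list
            delegate = new IteratorItemReader<>(customers());
        }
        return delegate.read();
    }
}
@StepScope
@Bean
public ItemReader<Customer> reader() {
    return new CustomerItemReader(XML_FILE);  // XML_FILE: assumed constant holding the input path
}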
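The lazy-delegate idea can be shown without Spring Batch on the classpath. In this plain-Java sketch, MiniReader is a stand-in for the reader contract (return the next item, or null when the input is exhausted), and the Supplier plays the role of the code that loads the file; all names are invented for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class LazyReaderSketch {
    // Stand-in for the reader contract: next item, or null when exhausted.
    interface MiniReader<T> {
        T read();
    }

    static class LazyListReader<T> implements MiniReader<T> {
        private final Supplier<List<T>> loader; // e.g. parses a file
        private Iterator<T> delegate;           // created on the first read()

        LazyListReader(Supplier<List<T>> loader) {
            this.loader = loader;
        }

        @Override
        public T read() {
            if (delegate == null) {
                delegate = loader.get().iterator(); // load only once, on first use
            }
            return delegate.hasNext() ? delegate.next() : null;
        }
    }

    public static void main(String[] args) {
        MiniReader<String> reader = new LazyListReader<>(() -> Arrays.asList("a", "b"));
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item); // prints a, then b
        }
    }
}
```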
Custom Processors
ItemProcessors transform input items and introduce business logic in an item-oriented processing
scenario. They must provide an implementation of the
interface org.springframework.batch.item.ItemProcessor<I, O>, which declares a single process()
method.
The method process() accepts one instance of the input type I and returns an instance of the output
type O, which may or may not be the same type. Returning null indicates that the item should not
continue to be processed. As usual, Spring provides a few standard processors, such as
CompositeItemProcessor, which passes the item through a sequence of injected ItemProcessors, and
ValidatingItemProcessor, which validates input.
In the case of our sample application, processors are used to filter customers by the following
requirements:
A customer must be born in the current month (e.g., to flag for birthday specials, etc.)
A customer must have fewer than five completed transactions (e.g., to identify newer
customers)
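public class BirthdayFilterProcessor implements ItemProcessor<Customer, Customer> {  // class name assumed
    @Override
    public Customer process(Customer item) throws Exception {
        boolean bornThisMonth = new GregorianCalendar().get(Calendar.MONTH) == item.getBirthday().get(Calendar.MONTH);
        return bornThisMonth ? item : null;  // null filters the customer out
    }
}
public class TransactionValidatingProcessor extends ValidatingItemProcessor<Customer> {  // class name assumed
    public TransactionValidatingProcessor(int limit) {
        super(item -> {
            if (item.getTransactions() >= limit) {  // getTransactions() is an assumed accessor
                throw new ValidationException("Customer has " + limit + " or more transactions");
            }
        });
        setFilter(true);  // filter invalid items instead of failing the step
    }
}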
This pair of processors is then encapsulated within a CompositeItemProcessor that implements the
delegate pattern:
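@StepScope
@Bean
public ItemProcessor<Customer, Customer> processor() {
    CompositeItemProcessor<Customer, Customer> processor = new CompositeItemProcessor<>();
    // Delegate class names are assumed; 5 is the transaction limit from the requirements above.
    processor.setDelegates(Arrays.asList(new BirthdayFilterProcessor(), new TransactionValidatingProcessor(5)));
    return processor;
}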
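The delegate pattern behind a composite processor can be sketched in plain Java. This is an illustration, not the Spring Batch class: each delegate is a Function that returns null to filter an item, and the composite stops at the first null. The names and the two sample filters are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class CompositeProcessorSketch {
    // Passes the item through each delegate in order; a null result
    // from any delegate filters the item out and ends the chain.
    static <T> Function<T, T> compose(List<Function<T, T>> delegates) {
        return item -> {
            T current = item;
            for (Function<T, T> delegate : delegates) {
                current = delegate.apply(current);
                if (current == null) {
                    return null;
                }
            }
            return current;
        };
    }

    public static void main(String[] args) {
        Function<Integer, Integer> keepPositive = n -> n > 0 ? n : null;
        Function<Integer, Integer> keepBelowFive = n -> n < 5 ? n : null;
        Function<Integer, Integer> composite = compose(Arrays.asList(keepPositive, keepBelowFive));
        System.out.println(composite.apply(3)); // prints 3
        System.out.println(composite.apply(7)); // prints null
    }
}
```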
Custom Writers
Output is handled by implementations of the interface org.springframework.batch.item.ItemWriter.
The write() method is responsible for making sure that any internal buffers are flushed. If a
transaction is active, it will also usually be necessary to discard the output on a subsequent rollback.
The resource to which the writer is sending data should normally be able to handle this itself. There
are standard implementations such as CompositeItemWriter, JdbcBatchItemWriter, JmsItemWriter,
JpaItemWriter, SimpleMailMessageItemWriter, and others.
In our sample application, the list of filtered customers is written out as follows:
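public class CustomerItemWriter implements ItemWriter<Customer>, Closeable {
    private final PrintWriter writer;
    public CustomerItemWriter() {
        OutputStream out;
        try {
            out = new FileOutputStream("output.txt");  // output file name is an assumption
        } catch (FileNotFoundException e) {
            out = System.out;  // fall back to standard output
        }
        this.writer = new PrintWriter(out);
    }
    @Override
    public void write(List<? extends Customer> items) throws Exception {
        for (Customer item : items) {
            writer.println(item.toString());
        }
    }
    @PreDestroy
    @Override
    public void close() throws IOException {
        writer.close();
    }
}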
By default, Spring Batch executes all jobs it can find (i.e., that are configured as
in CustomerReportJobConfig) at startup. To change this behavior, disable job execution at startup by
adding the following property to application.properties:
spring.batch.job.enabled=false
@Scheduled(fixedRate = 5000)
public void run() throws Exception {
    JobExecution execution = jobLauncher.run(
        customerReportJob(),
        new JobParametersBuilder().toJobParameters()
    );
}
There is a problem with the above example though. At run time, the job will succeed the first time
only. When it launches the second time (i.e. after five seconds), it will generate the following
messages in the logs (note that in previous versions of Spring Batch
a JobInstanceAlreadyCompleteException would have been thrown):
This happens because only unique JobInstances may be created and executed and Spring Batch has
no way of distinguishing between the first and second JobInstance.
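The identity rule can be modelled with a small plain-Java sketch. It is a simplification, not how the JobRepository is implemented: a job instance is keyed here by the job name plus its parameters, and a key that has already completed is refused. All names are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class JobInstanceSketch {
    private final Set<String> completed = new HashSet<>();

    // Returns true if the run is accepted, false if an instance with the
    // same name and parameters has already completed.
    boolean launch(String jobName, Map<String, Object> parameters) {
        // TreeMap gives a stable key regardless of insertion order.
        String instanceKey = jobName + new TreeMap<>(parameters);
        return completed.add(instanceKey);
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        Map<String, Object> none = Collections.emptyMap();
        System.out.println(repository.launch("customerReportJob", none)); // prints true
        System.out.println(repository.launch("customerReportJob", none)); // prints false

        Map<String, Object> unique = new HashMap<>();
        unique.put("launchTime", System.nanoTime());
        System.out.println(repository.launch("customerReportJob", unique)); // prints true
    }
}
```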
There are two ways of avoiding this problem when you schedule a batch job.
One is to be sure to introduce one or more unique parameters (e.g., actual start time in
nanoseconds) to each job:
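@Scheduled(fixedRate = 5000)
public void run() throws Exception {
    jobLauncher.run(
        customerReportJob(),
        // "launchTime" is an assumed parameter name; any value unique to each run works
        new JobParametersBuilder().addLong("launchTime", System.nanoTime()).toJobParameters()
    );
}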
Alternatively, you can launch the next job in a sequence of JobInstances determined by
the JobParametersIncrementer attached to the specified job
with SimpleJobOperator.startNextInstance():
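@Autowired
private JobOperator operator;
@Autowired
private JobExplorer jobs;
@Scheduled(fixedRate = 5000)
public void run() throws Exception {
    // Look up the most recent instance of the job, if any.
    List<JobInstance> lastInstances = jobs.getJobInstances(JOB_NAME, 0, 1);
    if (lastInstances.isEmpty()) {
        jobLauncher.run(customerReportJob(), new JobParametersBuilder().toJobParameters());
    } else {
        operator.startNextInstance(JOB_NAME);
    }
}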
Usually, to run unit tests in a Spring Boot application, the framework must load a
corresponding ApplicationContext. Two annotations are used for this purpose:
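@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {...})
The tests below also need a JobLauncherTestUtils bean, which a test configuration can expose (the class name is assumed):
@Configuration
public class BatchTestConfiguration {
    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() { return new JobLauncherTestUtils(); }
}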
A typical test for a job and a step looks as follows (and can use any mocking frameworks as well):
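@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {BatchApplication.class, BatchTestConfiguration.class})  // class list assumed
public class CustomerReportJobConfigTest {  // test class and method names assumed
    @Autowired
    private JobLauncherTestUtils testUtils;
    @Autowired
    private CustomerReportJobConfig config;
    @Test
    public void testEntireJob() throws Exception {
        JobExecution result = testUtils.getJobLauncher().run(config.customerReportJob(), testUtils.getUniqueJobParameters());
        Assert.assertNotNull(result);
        Assert.assertEquals(BatchStatus.COMPLETED, result.getStatus());
    }
    @Test
    public void testSpecificStep() {
        Assert.assertEquals(BatchStatus.COMPLETED, testUtils.launchStep("taskletStep").getStatus());
    }
}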
Spring Batch introduces additional scopes for step and job contexts. Objects in these scopes use the
Spring container as an object factory, so there is only one instance of each such bean per execution
step or job. In addition, support is provided for late binding of references accessible from
the StepContext or JobContext. The components that are configured at runtime to be step- or job-
scoped are tricky to test as standalone components unless you have a way to set the context as if
they were in a step or job execution. That is the goal of
the org.springframework.batch.test.StepScopeTestExecutionListener and
org.springframework.batch.test.StepScopeTestUtils components in Spring Batch, as well
as JobScopeTestExecutionListener and JobScopeTestUtils.
The TestExecutionListeners are declared at the class level, and the job of the
StepScopeTestExecutionListener is to create a step execution context for each test method. For example:
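@RunWith(SpringRunner.class)
@TestExecutionListeners({DependencyInjectionTestExecutionListener.class,
    StepScopeTestExecutionListener.class})
public class BirthdayFilterProcessorTest {  // test class and processor class names assumed
    @Autowired
    private BirthdayFilterProcessor processor;
    // Factory method used by StepScopeTestExecutionListener to build the step context.
    public StepExecution getStepExecution() {
        return MetaDataInstanceFactory.createStepExecution();
    }
    @Test
    public void filter() throws Exception {
        Customer customer = new Customer();
        customer.setId(1);
        customer.setName("name");
        customer.setBirthday(new GregorianCalendar());
        Assert.assertNotNull(processor.process(customer));
    }
}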
There are two TestExecutionListeners. One is from the regular Spring Test framework and handles
dependency injection from the configured application context. The other is the Spring
Batch StepScopeTestExecutionListener that sets up step-scope context for dependency injection into
unit tests. A StepContext is created for the duration of a test method and made available to any
dependencies that are injected. The default behavior is just to create a StepExecution with fixed
properties. Alternatively, the StepContext can be provided by the test case as a factory method
returning the correct type.
Another approach is based on the StepScopeTestUtils utility class. This class is used to create and
manipulate StepScope in unit tests in a more flexible way without using dependency injection. For
example, reading the ID of the customer filtered by the processor above could be done as follows:
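@Test
public void filterId() throws Exception {  // method name assumed
    Customer customer = new Customer();
    customer.setId(1);
    customer.setName("name");
    customer.setBirthday(new GregorianCalendar());
    int id = StepScopeTestUtils.doInStepScope(
        getStepExecution(),
        () -> processor.process(customer).getId()
    );
    Assert.assertEquals(1, id);
}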
This article introduces some of the basics of design and development of Spring Batch applications.
However, there are many more advanced topics and capabilities—such as scaling, parallel
processing, listeners, and more—that are not addressed in this article. Hopefully, this article
provides a useful foundation for getting started.
Information on these more advanced topics can then be found in the official Spring
documentation for Spring Batch.