Module 3: Working with Files
Overview
Talend Open Studio provides built-in components to read from and write to various file formats.
This module introduces the supported file types, how to work with them, and how to preview
and process their data using Talend components.
3.1 Supported File Formats
Talend supports a variety of file types. In this module, we focus on the most common formats:
Format Description
CSV (Delimited) Text file where fields are separated by a delimiter such as , or ;.
Excel Spreadsheet files (.xls, .xlsx) commonly used for structured tabular data.
JavaScript Object Notation format, used to store structured, hierarchical
JSON
data.
3.2 Reading Flat Files
🔹 tFileInputDelimited (for CSV/TSV)
Used to read delimited text files line by line.
Supports setting delimiters, headers, and row/column parsing.
Works best with structured, tabular data.
Key settings:
File name/Stream: Path to CSV file.
Field Separator: e.g., , for CSV or \t for TSV.
Header: Number of header rows to skip.
Schema: Defines structure of the data (column names and data types).
🔹 tFileInputExcel
Used to read .xls or .xlsx files.
Allows selection of sheet name, header rows, and data range.
Key settings:
File Name: Excel file path.
Sheet List: Sheet to read from.
Header: Row number containing column headers.
Limit/Start Row: Controls the row range to read.
🔹 tFileInputJSON
Reads JSON data and extracts fields using JSONPath expressions.
Supports nested structures and arrays.
Key settings:
File name/stream: JSON file path.
Loop JsonPath query: Specifies the node to iterate over (e.g., $.transactions[*]).
Mapping: Maps JSON fields to schema columns.
3.3 Writing Files
🔹 tFileOutputDelimited
Writes output in CSV or delimited format.
Can append or overwrite files.
Supports custom delimiter, header writing, and encoding.
Key settings:
File Name: Output file path.
Field Separator: , or other delimiter.
Row Separator: \n or \r\n.
Include Header: Write column names as header.
🔹 tFileOutputExcel
Exports data to .xls or .xlsx.
Sheet name, cell formatting, and write mode can be customized.
Key settings:
File Name: Excel file to write to.
Sheet Name: Target sheet name.
Append Mode: Whether to overwrite or append.
🔹 tFileOutputJSON
Converts and writes rows to JSON format.
Can structure JSON hierarchically using a root node and nested fields.
Key settings:
File Name: Output file path.
Encoding: Usually UTF-8.
Root Tag / Row Tag: Define structure of the resulting JSON.
3.4 File Schema Definition and Preview
Schemas define the structure of data being read or written:
Each component allows you to define column names, data types, and lengths.
Use Edit Schema button to define schema manually or retrieve from file.
You can preview sample data after setting the file path using the Preview button.
💡 Tip: Use schema propagation to share schema between components to avoid duplication.
3.5 Debugging with tLogRow
tLogRow is used to display row data in the console during job execution.
Modes:
Table: Structured view.
Basic: Simple text output.
Vertical: Field-by-field breakdown.
Use cases:
Verifying output of file readers.
Debugging transformations.
Monitoring intermediate steps in your job flow.
3.6 Iterating Files with tFileList
tFileList allows looping through a set of files in a directory.
Common usage:
Batch-processing multiple files (e.g., daily exports, log files).
Input to tFileInputDelimited or tFileInputJSON.
Key settings:
Directory: Folder containing files.
Include Subdirectories: Recursive scan.
File Mask: Wildcard pattern like *.csv.
Common pattern:
1. Use tFileList to iterate.
2. Use tFlowToIterate to pass filename.
3. Use ${tFileList.CURRENT_FILEPATH} in reader components.
Summary
By the end of this module, you will be able to:
Read and write CSV, Excel, and JSON files using appropriate Talend components.
Define schemas and preview file contents.
Use tLogRow to debug job flows.
Process multiple files using iteration with tFileList.
🔹 Exercise 1: Reading CSV File using tFileInputDelimited
Objective:
Learn how to read a delimited (CSV) file using tFileInputDelimited and preview its data using
tLogRow.
Steps:
1. Open your Talend project.
2. Create a new Job:
Job Name: ReadCSVFile
Purpose: Read and display CSV data
3. Drag the following components from the Palette:
o tFileInputDelimited
o tLogRow
4. Connect tFileInputDelimited to tLogRow using a Row → Main link.
5. Configure tFileInputDelimited:
o File Name: Browse to [Link]
o Field Separator: ,
o Header: 1 (skip header row)
o Click Edit Schema → Add columns:
id_produk (String)
nama_produk (String)
kategori (String)
harga (Integer)
o Click Preview to verify the data.
6. Configure tLogRow:
o Mode: Table
7. Run the Job and observe the output.
🔹 Exercise 2: Reading Excel File using tFileInputExcel
Objective:
Learn how to read structured tabular data from an Excel file (.xlsx) using tFileInputExcel.
Steps:
1. Create a new Job: ReadExcelFile
2. Add components:
o tFileInputExcel
o tLogRow
3. Link tFileInputExcel → tLogRow via Row → Main.
4. Configure tFileInputExcel:
o File Name: Browse to [Link]
o Sheet List: Sheet1
o Header: 1
o Start: 2
o Limit: 0 (no limit)
o Define Schema:
id_pelanggan (String)
nama (String)
email (String)
kota (String)
5. Set tLogRow to Table mode and Run the job.
🔹 Exercise 3: Reading JSON File using tFileInputJSON
Objective:
Understand how to extract structured data from a JSON file using JSONPath.
Steps:
1. Create a new Job: ReadJSONFile
2. Components:
o tFileInputJSON
o tLogRow
3. Link tFileInputJSON to tLogRow
4. Configure tFileInputJSON:
o File Name: Browse to [Link]
o Loop JsonPath query: $.transaksi[*]
o Mapping:
id_transaksi → $.id_transaksi
id_produk → $.id_produk
id_pelanggan → $.id_pelanggan
tanggal → $.tanggal
jumlah → $.jumlah
5. Click Edit Schema to match these fields.
6. Set tLogRow to Basic or Table mode.
7. Run the job and verify the output.
🔹 Exercise 4: Writing CSV using tFileOutputDelimited
Objective:
Write processed data to a CSV file using tFileOutputDelimited.
Steps:
1. Duplicate ReadCSVFile job → rename to WriteCSVFile
2. Replace tLogRow with tFileOutputDelimited
3. Configure tFileOutputDelimited:
o File Name: output/produk_export.csv
o Include Header: true
o Field Separator: ,
o Row Separator: \n
o Append: false
4. Run the job and check the exported file.
🔹 Exercise 5: Writing Excel using tFileOutputExcel
Objective:
Export data to an Excel file using tFileOutputExcel.
Steps:
1. Duplicate ReadExcelFile → Rename to WriteExcelFile
2. Replace tLogRow with tFileOutputExcel
3. Configure tFileOutputExcel:
o File Name: output/pelanggan_export.xlsx
o Sheet Name: DataPelanggan
o Append: false
4. Run and verify the output file.
🔹 Exercise 6: Writing JSON using tFileOutputJSON
Objective:
Export structured data to a JSON file using tFileOutputJSON.
Steps:
1. Duplicate ReadJSONFile → Rename to WriteJSONFile
2. Replace tLogRow with tFileOutputJSON
3. Configure tFileOutputJSON:
o File Name: output/transaksi_export.json
o Encoding: UTF-8
o Root Tag: transaksi
o Row Tag: item
4. Run the job and inspect the JSON structure.
🔹 Exercise 7: Iterating Over Multiple CSV Files using tFileList
Objective:
Use tFileList to loop through a folder of multiple CSV files and process them one-by-one.
Steps:
1. Create a new Job: ProcessMultipleCSV
2. Add components:
o tFileList
o tFileInputDelimited
o tLogRow
3. Link:
o tFileList → tFileInputDelimited (trigger: iterate)
o tFileInputDelimited → tLogRow (row → main)
4. Configure tFileList:
o Directory: data/csv/
o File Mask: *.csv
5. Configure tFileInputDelimited:
o File Name: ${tFileList.CURRENT_FILEPATH}
o Field Separator: ,
o Header: 1
o Define a sample schema (e.g., produk fields)
6. Run the job — it will process all CSV files in the folder.