
Working with Delimited Files

A delimited text file represents a table of data as plain text, using designated characters to mark the boundaries between columns and rows. Common types of delimited text files include Comma Separated Values (CSV) and Tab Separated Values (TSV).

Delimited file support in Guzzle lets you specify file format details and other properties that make the data easier to work with. This article outlines how to work with delimited files as the source and target in an Ingestion activity.

Delimited Text File Properties in Guzzle

| Property | Description | Default Value |
| --- | --- | --- |
| Character Set | The character set used to read and write text files. Allowed values include UTF-8, UTF-16, etc. | UTF-8 |
| Column Delimiter | The characters used to separate columns in a file. | , |
| Quote Delimiter | The single character used to quote a column value that contains the column delimiter or a new line (row delimiter). | " |
| Escape Character | The single character used to escape quotes inside a quoted value. If the Quote Delimiter also appears in a column value, it can be escaped with this character. | \ |
| Trim Whitespace | Trims whitespace from one end or both ends of each value. Applied to all columns irrespective of their values or data type. Options: none (no trimming), both (remove whitespace on both sides), leading (remove whitespace at the start of the value), trailing (remove whitespace at the end of the value). | None |
| Contains Headers | Choose whether the file includes a header row of column names. | Yes |
| Infer Schema | Choose whether to infer and apply a schema to the data. | No |
| Configure processed path | Lets the user specify a directory into which Guzzle moves the data. When a processed file path is configured, Guzzle creates three subfolders: processed, rejected and partial. | NULL |
| Configure control file settings | Cross-checks whether a file is valid by comparing the number of records in the original file with the control file, which is identified by its extension. Guzzle provides this feature for all local file formats, including Delimited, JSON, XML, Excel and Fixed Length files. | NULL |
| Partial Load | Specify whether files may be partially loaded. | False |
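
To make the parsing properties above concrete, here is a minimal sketch in Python using the standard `csv` module. It is not Guzzle's implementation; the function name `read_delimited`, its parameter names, and the generated `col_N` names for headerless files are assumptions chosen to mirror the property names in the table. Note that Python's `csv` module accepts only a single-character delimiter, which may be narrower than what Guzzle allows.

```python
import csv
import io

# Hypothetical helper: illustrates the table's properties, not Guzzle's API.
def read_delimited(text, column_delimiter=",", quote_delimiter='"',
                   escape_character="\\", trim_whitespace="none",
                   contains_headers=True):
    # The four Trim Whitespace options map onto plain string methods.
    trimmers = {
        "none": lambda s: s,
        "both": str.strip,
        "leading": str.lstrip,
        "trailing": str.rstrip,
    }
    trim = trimmers[trim_whitespace]

    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=column_delimiter,    # single character in Python's csv
        quotechar=quote_delimiter,
        escapechar=escape_character,
        doublequote=False,             # quotes are escaped, not doubled
    )
    # Trimming is applied to every column, regardless of value or type.
    rows = [[trim(value) for value in row] for row in reader]
    if not rows:
        return []

    if contains_headers:
        header, data = rows[0], rows[1:]
    else:
        # Assumed naming scheme for files without a header row.
        header = [f"col_{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    return [dict(zip(header, row)) for row in data]


sample = 'id,name,note\n1, Alice ,"said \\"hi\\", then left"\n2,Bob,plain\n'
records = read_delimited(sample, trim_whitespace="both")
```

In this sketch the quoted `note` value keeps its embedded comma and escaped quotes, and `trim_whitespace="both"` strips the spaces around `Alice`.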

Interface for the Delimited File Format

Column Mapping in Delimited Files

We can also add Column Mapping to specify how columns in the source file are mapped. This applies to files with or without headers. The functionality is meant to achieve either or both of the following:

  • Reduce the number of columns to be read

  • Map the columns to a new field name

All we need to do is add the Column Name and the Index of the source column we would like to map it to. Example: "first_name" is mapped to index 4, which is the 4th column in the source file ("gender"), and "age" is mapped to index 1, which is the 1st column ("id").
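
The mapping in this example can be sketched in Python as follows. This is an illustration only, not Guzzle's code: the function name `apply_column_mapping` is hypothetical, the index is assumed to be 1-based (as "1st column" and "4th column" suggest), and the sample row and the "last_name" column are invented for demonstration.

```python
# Hypothetical helper: maps new field names to 1-based source column indexes.
def apply_column_mapping(rows, mapping):
    # Only mapped columns are kept, which also reduces the columns read.
    return [
        {name: row[index - 1] for name, index in mapping.items()}
        for row in rows
    ]


# Assumed source layout: id, first_name, last_name, gender
source_rows = [["1", "John", "Doe", "Male"]]

# "first_name" takes the 4th column (gender); "age" takes the 1st column (id).
mapped = apply_column_mapping(source_rows, {"first_name": 4, "age": 1})
```

Because the mapping works by position, it applies equally to files with and without a header row.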

Before Column Mapping:

After Column Mapping:

In target: The properties on the target are the same as those described for the source, with two additional properties:

  1. Generate Single File: select this option to generate a single file in the target path.
  2. Preserve Hierarchy: select this option to maintain the same hierarchy as the source file.
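
As a rough illustration of what preserving hierarchy could mean for output paths, the sketch below keeps a source file's directory structure relative to the source root when building the target path. This is an assumption about the option's intent, not a description of Guzzle's behavior; the function name `target_path` and all paths are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical helper: assumed interpretation of the Preserve Hierarchy option.
def target_path(source_file, source_root, target_root, preserve_hierarchy):
    source_file = PurePosixPath(source_file)
    if preserve_hierarchy:
        # Keep the sub-directories that sit between the source root and the file.
        relative = source_file.relative_to(PurePosixPath(source_root))
        return str(PurePosixPath(target_root) / relative)
    # Otherwise place the file directly under the target root.
    return str(PurePosixPath(target_root) / source_file.name)


kept = target_path("/data/in/2024/01/sales.csv", "/data/in", "/data/out", True)
flat = target_path("/data/in/2024/01/sales.csv", "/data/in", "/data/out", False)
```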