
Working with Big Data

This article outlines how to work with the Big Data file formats Parquet, Avro and ORC in Guzzle’s Ingestion activity. Guzzle supports these file formats as both source and target in the Ingestion activity.

Big Data File Formats as a Source

| Property | Description | Default Value | Required |
| --- | --- | --- | --- |
| Character Set | The character set used to read and write text files. Allowed values include UTF-8, UTF-16 etc. | UTF-8 | Yes |
| File Pattern | The file name pattern used to find matching files in the data store. Refer to Working with Multiple Files for more details on defining the pattern. Example: customer/data/*.orc | None | Yes |
| Configure processed path | Lets the user specify a directory into which Guzzle moves the data. When creating a processed file path, Guzzle creates 3 subfolders: processed, rejected and partial. For more information, click here. | NULL | No |
| Configure control file settings | Cross-checks whether a file is valid by comparing the number of records in the original file with the file that has the control file extension. Guzzle provides the Configure Control File feature for all local file formats, including Delimited, JSON, XML, Excel and Fixed Length files. For more information, click here. | NULL | No |
| Partial Load | Specifies partial loading of files. | False | No |
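
To illustrate how a File Pattern such as customer/data/*.orc selects files, here is a minimal sketch using Python's standard library. It is not Guzzle's implementation, and Guzzle's own matching rules may differ. The helper `match_files` and the candidate paths are hypothetical names introduced only for this example.

```python
from pathlib import PurePosixPath


def match_files(paths, pattern):
    """Return the paths that match a glob-style file pattern.

    Assumption: '*' matches within a single path segment only,
    which is how PurePosixPath.match treats it.
    """
    return [p for p in paths if PurePosixPath(p).match(pattern)]


# Hypothetical listing of files in a data store.
candidates = [
    "customer/data/2021.orc",
    "customer/data/2022.orc",
    "customer/data/readme.txt",
    "customer/archive/2020.orc",
]

matched = match_files(candidates, "customer/data/*.orc")
print(matched)
```

Only the two .orc files directly under customer/data are selected. The text file and the file under customer/archive are skipped.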

Interface for Big Data Format

Big Data File Formats as a Target and the Target Section

| Property | Description | Default Value | Required |
| --- | --- | --- | --- |
| Character Set | The character set used to read and write text files. Allowed values include UTF-8, UTF-16 etc. | UTF-8 | Yes |
| Path | The file path where the user wants to store the data. | None | Yes |
| Compression | The compression codec used when writing Parquet, ORC and Avro files. When reading from Big Data files, Guzzle determines the compression codec from the file metadata. Supported types include Snappy, Brotli, Lzo etc. | Snappy | No |
| Generate Single File | Generates a single file at the given path. | False | No |
| Preserve Hierarchy | Select this option to maintain the same hierarchy as the source files. | False | No |

Interface for the Big Data Formats

info

Because these files contain binary data, you can view the data in table form by clicking "Sample Data" in the right-hand corner of the UI.