Skip to main content

Working with JSON Files

JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications. JSON is a language-independent data format.

note

JSON file support in Guzzle provides extensive features to specify file format details and many other properties which make it easier to work with our Data. This article outlines how to work with JSON files for source and target in Ingestion activity.

JSON File Properties in Guzzle#

PropertyDescriptionDefault ValueAvailable in Source SectionAvailable in Target Section
Character SetIt refers to the Set of Characters used to Read/Write test files. Allowed Values include: UTF-8, UTF-16 etc.UTF-8✔X
Override JSON root pathIt can be used to specify an object or location where we want to take our data from. For e.g.- If we want to focus on one particular column of our data we can specify the Column name here.
Example :
{
"id" : 1,
"name" : {
"first_name" : "ABCD",
"last_name" : "XYZ"
},
"age" : 10
}
if we give name as a value than it will fetch data first_name and last_name and consider name as a root node.
None✔X
Multi-LineThis is applicable when a single data record spans across multiple lines. This applies to both JSON files which are in array form or separate JSON documents.False✔X
Configure processed pathThe Configure Processed Paths feature allows the user to specify the directory and Guzzle moves the Data into that directory. When creating a processed file path Guzzle creates 3 subfolders: processed, rejected and partial.
For more information click here.
NULL✔X
Configure control file settingsThe Configure Control File feature cross check whether a file is valid or not. It compares the number of records in the original file and the control file extension. Guzzle provides the Configure Control File feature for all local file formats including Delimited, JSON, XML, Excel and Fixed Length Files.
For more information click here.
NULL✔✔
Partial LoadSpecify partial loading of files.False✔X

The Interface for the JSON Format for both Source#

The Interface for the JSON Format for both Target#

Multiline:#

Each line must contain a separate, self-contained valid JSON object.

Example:

{"id": 0001, "name": "John", "address": “Broward County” }
{"id": 0002, "name": "Lynda", "address": “Velcore line”}
{"id": 0003, "name": "Roan", "address": “Tarell street”}

When Multi-line is true Guzzle expects Data in the following way:

  • regular multi-line JSON file or entire file is a singular JSON object / array

  • it should be one singular valid JSON object / array

note

every object in that array or object is separated by commas (because that single JSON object/array should valid)

for example:

[

{"id": 0001, "name": "Lynda", "address": “Broward County” }
{"id": 0002, "name": "Lynda", "address": “Velcore line”}
{"id": 0003, "name": "Roan", "address": “Tarell street”}

]

**Fig 1.1** Entire file is a singular Json Array

{

"Addresse1": {
"address":
[
{ "id": "1001", "street": "Broward County" },
{ "id": "1002", "street": "Velcore line" }
]
},
"Addresses2": {
"address":
[
{ "id": "1001", "street": "Broward County" },
{ "id": "1002", "street": "Velcore line" }
]
},
"Addresses3": {
"address":
[
{ "id": "1001", "street": "Broward County" },
{ "id": "1002", "street": "Velcore line" }
]
}

}

**Fig 1.2** File as Singular Json object

Illustrations#

JSON file containing with single object separated

{"id": 0001, "name": "John", "address": “Broward County”}

{"id": 0002, "name": "Lynda", "address": “Velcore line”}

{"id": 0003, "name": "Roan", "address": “Tarbell street”}

Multiline : true

Sample Output when multiline is true is shown below:

IDnameaddress
001JohnBroward County

Sample Output when multiline is false is shown below:

IDnameaddress
001JohnBroward County
002LyndaVelcore line
003RoanTarbell street

JSON file containing array with multi line#

Sample source data (Customer.json) :

[

{
"id": "0001",
"name": "John",
"addresses":
{
"address":
[
{ "id": "1001", "street": "Broward County" },
{ "id": "1002", "street": "Velcore line" }
]
}
},
{
"id": "0002",
"name": "Lynda",
"addresses":
{
"address":
[
{ "id": "1001", "street": "Broward County" },
{ "id": "1002", "street": "Velcore line" }
]
}
},
{
"id": "0003",
"name": "Roan",
"addresses":
{
"address":
[
{ "id": "1001", "street": "Broward County" },
{ "id": "1002", "street": "Velcore line" }
]
}
}

]

When multiline is set to false Guzzle will treat the entire file like a single JSON object.

Job Config (json_with_multiline_false):

version: 1 job: type: ingestion source: endpoint: local files properties: source_file_pattern: json/customer1.json format: json charset: UTF-8 partial_load: false format_properties: clean_column_name: false multiline: false

Sample Output when multiline is false is shown below:

IDnameaddresses
0001John[[1001, Broward County],[1002, Velcore line]]

When multiline is set to true Guzzle will treat the entire file like a single JSON object.

Job Config (json_with_multiline_true):

version: 1

job:

type: ingestion

source:

endpoint: local files

properties:

source_file_pattern: json/customer1.json
format: json
charset: UTF-8
partial_load: false
format_properties:
clean_column_name: false
multiline: true

Sample Output when multiline is True is shown below:

IDnameaddresses
0001John[[1001, Broward County], [1002, Velcore line]]
0002Lynda[[1001, Broward County], [1002, Velcore line]]
0003Roan[[1001, Broward County], [1002, Velcore line]]
note

Every object in the array is separated by a comma

Column Mapping in JSON Files#

We can also add Column Mapping to specify how to map columns in the source file. This is applicable for files which are having headers or without header. Column Mapping in JSON helps in mapping the columns to a new Column Name as shown here:

Here we have mapped the columns to a new Column Name. name is mapped to first_name and created_time is mapped to time.#

x

{"id": 0001, "name": "John", "address": “Broward County”}

{"id": 0002, "name": "Lynda", "address": “Velcore line”}

{"id": 0003, "name": "Roan", "address": “Tarell street”}

Config:

version: 1

job:

type: ingestion

source:

endpoint: json_files

properties:

source_file_pattern: customer.json
format: json
charset: UTF-8
partial_load: false
format_properties:
clean_column_name: false
multiline: true

column_mapping:

name: first_name
address: address_line1
id: customer_id

Sample output:

customer_idfirst_nameaddress_line1
0001JohnBroward County
0002LyndaVelcore line
0003RoanTarrel street

If file has multiline (Customer.json)#

column_mapping:

addresses.address: address_line1

Sample Output

IDnameaddress_line1addresses
0001JohnBroward County[[1001, Broward County], [1002, Velcore line]]
0002LyndaVelcore line[[1001, Broward County], [1002, Velcore line]]
0003RoanTarrel street[[1001, Broward County], [1002, Velcore line]]