
Parameter

  • This article explains Guzzle parameters and how to use them to control the behavior of your Guzzle activities.
  • Parameters can control the behavior of a batch, pipeline, or activity, for example by passing a dynamic table name or file name, or by overriding Spark configurations.
  • Parameters are used to make pipelines or activities more flexible and configurable.
  • They allow you to define values that can be passed into the pipeline or activities at runtime, enabling dynamic behavior.
  • Parameters can be used to customize source and destination connections, specify file paths, set data transformation rules, and more.
  • By utilizing these parameters effectively, you can create reusable pipelines that can be easily configured for different scenarios, making your data integration processes more flexible and manageable.

Guzzle internal parameters

  • Guzzle internal parameters exist mainly to change the behavior of a Guzzle job. You can modify them in the Runtime dialog box.
  • Internal parameters let you override the behavior of an Activity, Pipeline, or Batch.
Parameter | Default Value | Activity | Pipeline | Batch | Description
batch_id | -1 | Yes | No | No | Overrides the batch parameter in the job.
guzzle.batchpipeline.threads | 4 | Yes | Yes | Yes | When Guzzle reads files from the source, it creates multiple threads to process them. For example, with 12 files, this parameter lets Guzzle create 3 threads of 4 files each to process all files in parallel.
guzzle.ingestion.load_type | Incremental | Yes | Yes | Yes | Guzzle supports watermarks for incremental data processing in the ingestion module. This parameter changes that behavior. For example, to skip incremental loading, select full load from the dropdown.
hive.storage_format | ORC | Yes | Yes | Yes | Guzzle can auto-create tables in a Hive datastore, and by default it creates them in ORC format. This parameter changes the data format used in Hive.
job_instance_id | Generated by Guzzle | Yes | No | No | Overrides the job instance id.
stage_id | -1 | Yes | No | No | Overrides the stage_id value of the job.
guzzle.job_group.partial | False | No | Yes | No | A Pipeline can be configured for partial load so that pipeline execution continues even if any job within the pipelines it calls fails.
guzzle.job_group.resume | False | No | Yes | Yes | Resumes the job group or pipeline from the point where it failed.
guzzle.stage.partial | False | No | Yes | Yes | A Batch can be configured for partial load so that batch execution continues even if any job in the Batch fails.
guzzle.stage.resume | False | No | Yes | Yes | Resumes the Batch from the point where it failed.
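
The threading example given for guzzle.batchpipeline.threads (12 files handled by 3 threads of 4 files each) can be pictured with the short Python sketch below. It is only an illustration of the idea, not Guzzle's implementation: the helper names, the round-robin grouping, and the file names are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(files, thread_count):
    """Distribute files round-robin into at most thread_count groups."""
    groups = [files[i::thread_count] for i in range(thread_count)]
    return [group for group in groups if group]  # drop empty groups


def process_group(group):
    # Hypothetical stand-in for real file processing:
    # it only reports how many files the thread handled.
    return len(group)


def process_in_parallel(files, thread_count):
    """Process each group of files on its own worker thread."""
    groups = split_into_groups(files, thread_count)
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(process_group, groups))


files = [f"file_{n:02d}.csv" for n in range(1, 13)]  # 12 input files
counts = process_in_parallel(files, thread_count=3)
print(counts)  # [4, 4, 4]
```

With 12 files and 3 worker threads, each thread receives 4 files, which matches the example in the table.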

Parameter precedence

note
  • If you define a Spark parameter while running a batch, Guzzle passes it to all pipelines and activities associated with that batch.
  • If Spark configurations are defined in individual activities in a pipeline and Spark configurations are also passed while running the batch, the configurations defined in the activities take precedence.
  • In a REST datastore, Guzzle lets you pass request parameters, but it does not allow the same parameter to be passed twice.
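
As a rough illustration of the notes above, the following Python sketch merges batch-level and activity-level Spark settings so that the activity-level value wins, and rejects a request parameter that is passed twice. The function names and sample values are hypothetical and only model the described behavior; they are not part of Guzzle.

```python
def effective_spark_conf(batch_conf, activity_conf):
    """Merge Spark settings; activity-level values win over batch-level ones."""
    merged = dict(batch_conf)      # start from what the batch run passed down
    merged.update(activity_conf)   # activity-level settings take precedence
    return merged


def check_unique_params(pairs):
    """Reject a list of (name, value) request parameters that repeats a name."""
    seen = set()
    for name, _ in pairs:
        if name in seen:
            raise ValueError(f"duplicate request parameter: {name}")
        seen.add(name)
    return dict(pairs)


batch_conf = {"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": "200"}
activity_conf = {"spark.executor.memory": "8g"}

conf = effective_spark_conf(batch_conf, activity_conf)
print(conf["spark.executor.memory"])         # 8g (from the activity)
print(conf["spark.sql.shuffle.partitions"])  # 200 (inherited from the batch)
```

Calling check_unique_params([("id", "1"), ("id", "2")]) raises a ValueError in this sketch, mirroring the rule that the same request parameter cannot be passed twice.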

Additional parameters Order of Precedence

Pipeline
  • Additional parameters passed in individual activities in a pipeline take the highest precedence.

Runtime
  • If parameters are not passed at the activity level in the pipeline, Guzzle uses the parameters passed while running the pipeline.
  • These parameters apply to all activities inside the pipeline.

Environment Parameter
  • If a parameter is passed neither at the activity level in the pipeline nor at pipeline runtime, Guzzle takes it from the environment parameters. You can define environment variables in Admin => My Profile => Environment
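
The lookup order described above (activity level first, then runtime, then environment) can be modeled with Python's collections.ChainMap, which returns the value from the first mapping that contains a key. This is a minimal sketch of the precedence rule only; the function name and the sample parameters are assumptions for illustration, not Guzzle's API.

```python
from collections import ChainMap


def resolve_parameters(activity_params, runtime_params, environment_params):
    """Return the effective parameters: activity > runtime > environment."""
    # ChainMap searches its mappings left to right, so earlier ones win.
    return dict(ChainMap(activity_params, runtime_params, environment_params))


# Hypothetical parameter values for one activity inside a pipeline.
activity_params = {"table_name": "orders_stg"}
runtime_params = {"table_name": "orders", "file_name": "orders.csv"}
environment_params = {"file_name": "default.csv", "region": "eu"}

resolved = resolve_parameters(activity_params, runtime_params, environment_params)
print(resolved["table_name"])  # orders_stg (activity level wins)
print(resolved["file_name"])   # orders.csv (runtime beats environment)
print(resolved["region"])      # eu (only defined in the environment)
```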

Order of Precedence

Parameter Type | Activity inside pipeline (Highest Precedence) | Runtime Parameter | Environment Parameter
Additional Parameters (User Provided Params)
- Additional parameters passed in individual activities in a pipeline take the highest precedence.
NA
- If parameters are not passed at the activity level in the pipeline, Guzzle uses the parameters passed while running the pipeline.
- These parameters apply to all activities inside the pipeline.

Business Date precedence

  • While running an activity, you can change the business date through the Guzzle UI (which provides a date picker) or by passing it as an additional parameter in a key-value pair. The business_date priority in the run activity dialog is:
    • 1) Override job parameter (parameter extracted from the activity)
    • 2) Business date UI selection option
    • 3) Additional parameters (key-value param)
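
Reading the numbered list above as "1 is the highest priority", the choice of business date can be sketched in Python as a first-match lookup. The function name, argument names, and the ISO date strings below are assumptions for this example, not Guzzle identifiers or formats.

```python
def resolve_business_date(override_job_param=None, ui_selection=None,
                          additional_params=None):
    """Pick the business date from the first source that provides one."""
    if override_job_param is not None:   # 1) override job parameter
        return override_job_param
    if ui_selection is not None:         # 2) date chosen in the UI date picker
        return ui_selection
    # 3) business_date passed as an additional key-value parameter
    return (additional_params or {}).get("business_date")


# All three sources set: the override job parameter wins.
print(resolve_business_date("2024-01-31", "2024-02-15",
                            {"business_date": "2024-03-01"}))  # 2024-01-31

# No override: the UI selection beats the additional parameter.
print(resolve_business_date(ui_selection="2024-02-15",
                            additional_params={"business_date": "2024-03-01"}))  # 2024-02-15
```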

Context Column precedence

  • In the run stage dialog, when the user passes the batch name and an additional context column value as key-value params, Guzzle overrides them and passes the actual batch and context column provided in the UI.

Click here to check more info about the Guzzle parameters.