How overwrite and append operations work in Guzzle
Overwrite and append operations are supported for the following technologies in the target section:
- Hive
- Delta
- Azure SQL
- Azure Synapse Analytics
Append operation - Append mode atomically adds new data to an existing target table.
Example: Below is a scenario where the user wants to perform an ingestion activity, the source and target are Delta tables, and the writer mode is append. The data in the source and target tables before job execution is as follows:
- Records in the source table:
- Records in the target table:
While executing the job, Guzzle adds the new data to the existing target table.
- Records in the target table after the job runs successfully:
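Under the hood this corresponds to a Spark write in append mode. The PySpark sketch below shows the equivalent operation; the table names (staging.customer_source, analytics.customer_target) are hypothetical and are not part of Guzzle's configuration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source Delta table
source_df = spark.read.table("staging.customer_source")

# Append mode atomically adds the new rows to the existing target table;
# rows already present in the target are left untouched.
source_df.write.format("delta").mode("append").saveAsTable("analytics.customer_target")
```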
Overwrite operation - The overwrite operation works differently for tables with and without partition columns.
Overwrite operation when the target table does not have partition columns - Overwrite mode atomically replaces all the data in the target table with the source data.
Example: Below is a scenario where the user wants to perform an ingestion activity, the source and target are Hive tables, and the writer mode is overwrite. The data in the source and target tables before job execution is as follows:
- Records in the source table:
- Records in the target table:
While executing the job, Guzzle replaces all the target records with the source data.
- Records in the target table after the job runs successfully:
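For a non-partitioned target, this matches a Spark write in overwrite mode, which replaces the entire content of the table. A minimal PySpark sketch, again with hypothetical table names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source table
source_df = spark.read.table("staging.orders_source")

# Overwrite mode replaces all existing rows in the target table with the source data.
source_df.write.mode("overwrite").saveAsTable("analytics.orders_target")
```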
Overwrite operation when the target table has partition columns - Overwrite mode replaces the existing data in the target table with the partitioned source records. The data to be overwritten can be specified by an arbitrary expression.
Example: Below is a scenario where the user wants to perform an ingestion activity, the source and target are Hive tables, and the writer mode is overwrite. The data in the source and target tables before job execution is as follows:
- Records in the source table:
- Records in the target table, with location and system columns as partition columns:
While executing the job, Guzzle replaces the target data that matches an arbitrary expression with the source data. In this example the expression is ((location=sg AND system=default) OR (location=in AND system=demo)), and the target records matching this expression are overwritten by the source data. As a result, the record with id 4 is deleted from the target table. Guzzle's overwrite works the same way as Spark's overwrite; for more information on how Spark overwrite works, follow this link.
- Records in the target table after the job runs successfully:
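Since Guzzle follows Spark's overwrite semantics, the partitioned case behaves like Spark's dynamic partition overwrite: only the partitions present in the source data are replaced, and all other target partitions are kept. The PySpark sketch below shows this Spark-level equivalent with hypothetical table names; it is not Guzzle configuration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With dynamic partition overwrite, only the partitions that appear in the
# source data are replaced; other target partitions are left untouched.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# Hypothetical source table containing rows only for the partitions
# (location=sg, system=default) and (location=in, system=demo).
source_df = spark.read.table("staging.events_source")

# The target is partitioned by location and system. insertInto matches columns
# by position, with partition columns last. Writing the source rows replaces
# exactly those two partitions - equivalent to the expression
# ((location=sg AND system=default) OR (location=in AND system=demo)).
source_df.write.insertInto("analytics.events_target", overwrite=True)
```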