Processing behaviour changes (Guzzle 2.4.0)

  • Before Guzzle v2.4.0, the Processing module used a two-step process to insert data into the target table:
    • It truncated the target table or partition, as configured in the truncate partition section, using a TRUNCATE TABLE query.
    • It then inserted the processed source data into the target table using an INSERT INTO query.
  • From Guzzle v2.4.0, the Processing module uses a single atomic INSERT OVERWRITE operation to truncate and insert data into the target table.
  • This change results in the following behaviour changes in Guzzle v2.4.0:

| Feature description | Before v2.4.0 | From v2.4.0 |
| --- | --- | --- |
| The partition column is present in the source data and the same column is configured in the truncate partition section | The partition column is allowed in the source data, and the source partition column value is respected. | The job fails if the partition column is found in the source data and the same partition column is configured in the truncate partition section. To prevent this, the user has to manually exclude the partition column from the source. |
| A subset of the partition columns is configured in the truncate partition section for the hive technology | A subset of the partition columns is allowed in the truncate partition section. | The job fails if a subset of the partition columns is configured in the truncate partition section. To prevent this, the user has to configure either all partition columns or none. This applies only to the hive technology. For the delta technology, the user can still configure a subset of the partition columns in the truncate partition section. |
| The truncate partition section is not configured for the hive technology (the output differs between versions) | The full target table is truncated, and the source data is then inserted into the target table. | The full target table is not truncated. Only the partitions for which the source contains data are truncated; data in the other partitions remains intact. |
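
The difference described in the last row can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not Guzzle's implementation: the function names, the `load_date` partition column, and the sample rows are all hypothetical, and the target table is modelled as a dictionary that maps a partition value to its rows.

```python
def group_by_partition(rows, partition_col):
    """Group source rows by the value of their partition column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[partition_col], []).append(row)
    return grouped


def load_two_step(target, source_rows, partition_col):
    """Model of the pre-v2.4.0 behaviour with no truncate partition configured.

    Step 1 (TRUNCATE TABLE) discards every existing partition, so `target`
    is deliberately ignored; step 2 (INSERT INTO) writes the source rows.
    """
    return group_by_partition(source_rows, partition_col)


def load_insert_overwrite(target, source_rows, partition_col):
    """Model of the v2.4.0 behaviour with no truncate partition configured.

    Only the partitions that appear in the source are replaced; all other
    partitions of the target are kept as they were.
    """
    result = {part: list(rows) for part, rows in target.items()}
    result.update(group_by_partition(source_rows, partition_col))
    return result


# Hypothetical target table with two existing partitions.
target = {
    "2021-01-01": [{"id": 1, "load_date": "2021-01-01"}],
    "2021-01-02": [{"id": 2, "load_date": "2021-01-02"}],
}
# Hypothetical source data that only covers one partition.
source = [{"id": 3, "load_date": "2021-01-02"}]

before = load_two_step(target, source, "load_date")
after = load_insert_overwrite(target, source, "load_date")

print(sorted(before))  # ['2021-01-02']
print(sorted(after))   # ['2021-01-01', '2021-01-02']
```

In this model the `2021-01-01` partition disappears under the old two-step load but survives under the new one, which is why jobs that relied on a full-table truncate may produce different output after upgrading.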