Skip to main content

Reconcilation of data accross two tables

Data reconciliation is used during a data migration where the target data is compared against original source data.

Guzzle Support below datastores for reconciliations.

  1. JDBC
  2. Hive
  3. Delta

Guzzle provides below functionality and parameters for reconciliation of data across source and target resources.

PropertyDescriptionDefault ValueRequired
DatastoreYou can choose any of the available datastore from drop-down (as appropriate). If the connection is not available in drop-down then you have to create new.NoneYes
Table + FilterSpecify the table from where to read data. The table can contain schema and also database name. We can specify the Schema Name along with the Table Name. If it is not specified it is taken from the Datastore.

Also, when the Table option is selected, you can also specify the Filter. The filter can be used to select only a part of the table. We may basically enter a condition and only the part of the table that satisfies the condition will be seen in the Target Section.
SQLAlternatively Specify the SQL query which will be run as-is on the sourceNoneYes


The recon section will give you ability to provide comparison logic between source and target datastore

PropertyDescriptionDefault ValueRequired
Primary KeysSpecify primary keys for source and target endpoint based on that reference; Guzzle will show comparisons between two aggregated metrics.NoneYes
Grouping ColumnsSpecify columns to be used for grouping source data and target data for aggregating metrics to be compared between source and target for recon.NoneNo
Execute on sourceSpecify if you want to push down the source and target SQL queries along with filters to be applied on respective source and target endpoints for execution. Only aggregation and result comparison would be performed by Guzzle using Spark dataframe.FalseNo
Partial MatchSpecify If you want to show specific status with recon metrics.
Guzzle follow below partial match conditionsFalseNo

Partial Match#

Guzzle follow below Partial Match Conventions to show status of job

Partial MatchInvalid RecordsStatus
True= Total RecordsFAILED
True0 < && > Total RecordsWARNING
False> 0FAILED

Recon Metrics#

You can specify the source aggregation column and target aggregation column which will be compared, and recon results will be stored in the Guzzle repository table recon_summary and recon_detail.

PropertyDescriptionDefault ValueRequired
NameSpecify the name of the recon metricsNoneYes
Source AggregationSpecify source aggregation functionNoneYes
Target AggregationSpecify Target aggregation functionNoneYes
Detail Check EnabledSpecify if you want to bring detailed information on reconciliation records, if it’s true then guzzle will calculate results for the mismathced records and store them in the Guzzle repository table recon_detail.FalseNo