Skip to main content

Housekeeping

Guzzle’s Housekeeping module allows the user to implement granular level housekeeping on the target tables according to different data retention policies across various data storage layers within the data lake.

Housekeeping Properties#

PropertyDescriptionDefault ValueRequired
TableSpecify the table from where to read data. The table can contain schema or even database name. If you select Table as input, data flow fetches all the data from the table specified in the dataset.NoneYes
Housekeeping ColumnSpecify column name to be used for housekeeping the table. Mostly this is going to be the partition columns of the table.NoneYes
OperationSelect if you want to perform delete or move on the Housekeeping Table by selecting an option from the drop-down menu.DeleteYes
Configure Table DependencyThis property is used to manually specify source table dependency and to manually configure Apache Atlas lineage.NoneNo
Reference PointSpecify any of the available options in the drop-down list — Current Time, Max Value, Value as a reference point for applying the housekeeping rules.
Current Time: This will housekeep treating the current system timestamp as a reference point to apply the housekeeping rules defined under Retention Strategies.
Max Value: it will calculate the maximum value for the Housekeeping Column specified in the above section as a reference point.
Value: It allows the user to specify a custom value or pass a value through a runtime parameter as a reference point for housekeeping.
Current TimeYes
Retention Strategies