Skip to main content

Working with Azure Synapse Analytics Database

note

Only supported on Databricks compute

Azure Synapse Analytics is a cloud-based enterprise data warehouse platform that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data.

This article outlines how to use Azure Synapse Analytics as source or target in Ingestion activity. Guzzle leverages Databrick’s Synapse Spark Connector detail of which can be found here which uses a common Azure Blob storage container to exchange data between two systems (Databricks Cluster and Azure Synapse). Ingestion Active lets you leverage this native connector offered by Databricks and allows you to specify different configurations that are supported by this connector.

Azure Synapse Analytics as a Source#

Note: In the Datastore section we can choose to read from Azure Synapse either as a Table or through SQL (in the form of a query).

PropertyDescriptionDefault ValueRequired
Table + FilterSpecify the table from where to read data. The table can contain schema or even database name. Example:
customerdb.[dbo].[customer]
[customer] (default schema name dbo will be assumed)
[dbo].[customer] (database name shall be as per Datastore config)
Also when the Table option is selected, you can also specify the Filter which needs to be included when querying data from synapse.
NoneYes.
SQLSpecify the SQL query which will be run as-is on the source
Clean Column NameTo specify whether to clean the column names of source data (both file headers and column names from table). All the characters other than alphanumeric will be converted to _.
Ex. 'col.1' will become 'col_1', 'col@1' will become 'col_1'
FalseNo
Configure Pre and Post SQLGuzzle supports Pre-SQL and Post-SQL for source and target and their execution in ingestion. It is mainly used for pre and post formatting of data in database.
For more information click here.
NULLNo
Configure watermarkA watermark represents tracking the last loaded value for one or more columns for a given source table or source SQL to enable loading data incrementally. Using watermark columns is one of the mechanisms used for changed data capture (CDC).
For more information click here.
NULLNo
Configure columns restrictionMention column names and choose from the options to exclude and include columns.
For more information click here.
NULLNo
Configure table dependencyNULLNo

Interface for Azure Synapse Analytics database as source is#

Azure Synapse Analytics as a Target or Reject#

PropertyDescriptionDefault ValueRequired
TableThis is used to specify the Target Table we would like to create based on our Source.NoneYes
OperationProvides two options that determines whether the source content or records are expected to be appended or overwritten in order to disallow existing data-
Append: To atomically add new data to an existing target table.
Overwrite: To atomically replace all the data in a target table with source table.
For more information click here.
AppendYes
Max String LengthMax String Length to set the string length for all VARCHAR/NVARCHAR type columns when writing data into Azure Synapse.
Any data which is greater than this will be clipped4000Yes
Write SemanticsThere are two possible values that can be selected :
Polybase:This option shall use Polybase feature of Azure Synapse to load or unload data to from common Blob storage.
Copy:The Copy statement offers a more convenient way of loading data into Azure Synapse without the need to create an external table, requires fewer permissions to load data, and provides an improved performance for high-throughput data ingestion into Azure Synapse.
PolybaseYes
Configure pre and post sqlGuzzle supports Pre-SQL and Post-SQL for source and target and their execution in ingestion. It is mainly used for pre and post formatting of data in database.
For more information click here.
NULLNo

Interface for Azure Synapse Analytics database as targte is#