Working with Azure Synapse Analytics Database
Note: Only supported on Databricks compute.
Azure Synapse Analytics is a cloud-based enterprise data warehouse platform that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data.
This article outlines how to use Azure Synapse Analytics as a source or target in the Ingestion activity. Guzzle leverages Databricks' Synapse Spark connector (details of which can be found here), which uses a common Azure Blob storage container to exchange data between the two systems (the Databricks cluster and Azure Synapse). The Ingestion activity lets you leverage this native connector offered by Databricks and allows you to specify the different configurations supported by it.
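To make the staging mechanism concrete, the following is a minimal PySpark sketch of a read through the Databricks Synapse connector; the storage account, container, and JDBC URL are placeholder values, not settings from this article, and Guzzle normally sets these up for you.

```python
# Minimal sketch of a read through the Databricks Synapse connector.
# "mystorageacct", the "staging" container, and the JDBC URL are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Storage account key used by both Databricks and Synapse to access the staging container
spark.conf.set(
    "fs.azure.account.key.mystorageacct.blob.core.windows.net",
    "<storage-account-access-key>")

df = (spark.read
      .format("com.databricks.spark.sqldw")                   # Databricks Synapse connector
      .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;database=customerdb")
      .option("tempDir", "wasbs://staging@mystorageacct.blob.core.windows.net/guzzle-temp")
      .option("forwardSparkAzureStorageCredentials", "true")  # let Synapse reuse the key above
      .option("dbTable", "dbo.customer")
      .load())
```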
Azure Synapse Analytics as a Source
Note: In the Datastore section you can choose to read from Azure Synapse either as a Table or through SQL (in the form of a query).
Property | Description | Default Value | Required |
---|---|---|---|
Table + Filter | Specify the table to read data from. The table name can include the schema or even the database name. Examples: customerdb.[dbo].[customer]; [customer] (the default schema name dbo is assumed); [dbo].[customer] (the database name is taken from the Datastore config). When the Table option is selected, you can also specify a Filter to be applied when querying data from Synapse (see the read sketch after this table). | None | Yes |
SQL | Specify the SQL query, which will be run as-is on the source. | | |
Clean Column Name | Specifies whether to clean the column names of the source data (both file headers and column names from tables). All characters other than alphanumeric are converted to _. For example, 'col.1' becomes 'col_1' and 'col@1' becomes 'col_1'. | False | No |
Configure Pre and Post SQL | Guzzle supports Pre-SQL and Post-SQL for source and target and executes them as part of ingestion. They are mainly used for pre- and post-formatting of data in the database. For more information click here. | NULL | No |
Configure watermark | A watermark tracks the last loaded value of one or more columns for a given source table or source SQL, enabling data to be loaded incrementally. Using watermark columns is one of the mechanisms used for change data capture (CDC). For more information click here. | NULL | No |
Configure columns restriction | Specify column names and choose whether to include or exclude them. For more information click here. | NULL | No |
Configure table dependency | | NULL | No |
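The Table and SQL options above map naturally onto the connector's dbTable and query read options. A minimal sketch, assuming synapse_jdbc_url and staging_dir hold the JDBC URL and staging path from the earlier example; the table, column, and filter values are illustrative only:

```python
# Table option: read [dbo].[customer] directly (a Filter would typically end up as a WHERE clause)
table_df = (spark.read
            .format("com.databricks.spark.sqldw")
            .option("url", synapse_jdbc_url)
            .option("tempDir", staging_dir)
            .option("forwardSparkAzureStorageCredentials", "true")
            .option("dbTable", "[dbo].[customer]")
            .load())

# SQL option: the query is run as-is on the source, e.g. an incremental (watermark-style) load
sql_df = (spark.read
          .format("com.databricks.spark.sqldw")
          .option("url", synapse_jdbc_url)
          .option("tempDir", staging_dir)
          .option("forwardSparkAzureStorageCredentials", "true")
          .option("query", "SELECT * FROM [dbo].[customer] WHERE modified_dt > '2023-01-01'")
          .load())
```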
Interface for Azure Synapse Analytics database as source:
Azure Synapse Analytics as a Target or Reject
Property | Description | Default Value | Required |
---|---|---|---|
Table | Specify the target table to be created and loaded based on the source. | None | Yes |
Operation | Determines whether source records are appended to or overwrite the existing data in the target. Append: atomically adds new data to an existing target table. Overwrite: atomically replaces all the data in the target table with the source data. For more information click here. | Append | Yes |
Max String Length | Sets the string length for all VARCHAR/NVARCHAR type columns when writing data into Azure Synapse. Any data longer than this will be clipped. | 4000 | Yes |
Write Semantics | There are two possible values. Polybase: uses the PolyBase feature of Azure Synapse to load or unload data to and from the common Blob storage. Copy: the COPY statement offers a more convenient way of loading data into Azure Synapse without the need to create an external table, requires fewer permissions to load data, and provides improved performance for high-throughput data ingestion into Azure Synapse. See the sketch after this table. | Polybase | Yes |
Configure pre and post sql | Guzzle supports Pre-SQL and Post-SQL for source and target and executes them as part of ingestion. They are mainly used for pre- and post-formatting of data in the database. For more information click here. | NULL | No |
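As a rough illustration of how these target options surface in the underlying connector, the sketch below shows a write with copy semantics, a custom max string length, and pre/post SQL. It assumes df, synapse_jdbc_url, and staging_dir from the earlier examples; the table names and SQL statements are placeholders, not part of Guzzle's configuration.

```python
# Choose write semantics for the connector: "polybase" (default) or "copy"
spark.conf.set("spark.databricks.sqldw.writeSemantics", "copy")

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", synapse_jdbc_url)
   .option("tempDir", staging_dir)
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.customer_target")
   .option("maxStrLength", 4000)                   # length for VARCHAR/NVARCHAR string columns
   .option("preActions", "TRUNCATE TABLE dbo.customer_stage")                 # Pre-SQL before the load
   .option("postActions", "UPDATE dbo.load_audit SET loaded_at = GETDATE()")  # Post-SQL after the load
   .mode("append")                                 # Operation: "append" or "overwrite"
   .save())
```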