
Datastore Overview

A datastore in Guzzle represents an on-premises or cloud data service that can act as a source or a sink (target) for Guzzle activities. A datastore can be a cloud file service such as S3, Azure Blob, or DBFS; a database such as Azure SQL or MySQL; or an application accessible via a REST API.

Guzzle also supports datastores that run external activities, such as a procedure, an Azure Data Factory pipeline, or a notebook.

Supported Datastores

Guzzle broadly supports three types of datastores: File, Database, and API. The matrix below lists the supported datastores for each compute.

Datastore and Compute Matrix

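Compute groups: Local (Local Spark); Azure (Azure Databricks, Azure Synapse Analytics); AWS (AWS Glue, AWS Databricks, AWS EMR (EC2), AWS EMR Serverless).

| Category | Connector | Local Spark | Azure Databricks | Azure Synapse Analytics | AWS Glue | AWS Databricks | AWS EMR (EC2) | AWS EMR Serverless |
|---|---|---|---|---|---|---|---|---|
| File | DBFS | x | ✓ | x | x | ✓ | x | x |
| File | ADLS Gen2 | x | ✓ | ✓ | x | ✓ | x | x |
| File | Server file system | ✓ | ✓ | x | x | ✓ | x | x |
| File | HDFS | x | x | x | | | | |
| File | AWS S3 | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Database | Delta | x | ✓ | ✓ | ✓ | ✓ | x | x |
| Database | Hive | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Database | Azure SQL | x | ✓ | ✓ | x | ✓ | x | x |
| Database | Azure Synapse Connector for Databricks | x | ✓ | x | x | x | x | x |
| Database | Azure Synapse Analytics Native | x | x | ✓ | x | x | x | x |
| Database | JDBC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Database | Snowflake | x | ✓ | x | x | ✓ | x | x |
| API | Rest API | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Others | Azure Data Factory | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Others | Databricks | x | ✓ | x | x | ✓ | x | x |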
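The compute matrix can also be held as a small lookup table, which is handy when a script needs to check a connector against a target compute before a job is configured. The sketch below is illustrative only and is not part of Guzzle: the names `COMPUTES`, `_COMPUTE_ROWS`, `is_supported`, and `computes_for` are hypothetical, the Synapse column is written as "Azure Synapse Analytics", and the HDFS and Azure Data Factory rows are left out because their values are incomplete or N/A in the matrix.

```python
# Compute names in matrix column order (hypothetical constant, not a Guzzle API).
COMPUTES = [
    "Local Spark", "Azure Databricks", "Azure Synapse Analytics",
    "AWS Glue", "AWS Databricks", "AWS EMR (EC2)", "AWS EMR Serverless",
]

# One flag per compute, in COMPUTES order: "y" = supported, "n" = not supported.
# HDFS (incomplete row) and Azure Data Factory (N/A row) are intentionally omitted.
_COMPUTE_ROWS = {
    "DBFS": "nynnynn",
    "ADLS Gen2": "nyynynn",
    "Server file system": "yynnynn",
    "AWS S3": "nyyyyyy",
    "Delta": "nyyyynn",
    "Hive": "yyyyyyy",
    "Azure SQL": "nyynynn",
    "Azure Synapse Connector for Databricks": "nynnnnn",
    "Azure Synapse Analytics Native": "nnynnnn",
    "JDBC": "yyyyyyy",
    "Snowflake": "nynnynn",
    "Rest API": "yyyyyyy",
    "Databricks": "nynnynn",
}


def is_supported(connector, compute):
    """Return True if the matrix marks the connector as supported on the compute."""
    flags = _COMPUTE_ROWS[connector]
    return flags[COMPUTES.index(compute)] == "y"


def computes_for(connector):
    """List the computes on which the connector is marked as supported."""
    return [c for c, flag in zip(COMPUTES, _COMPUTE_ROWS[connector]) if flag == "y"]
```

For example, `is_supported("Snowflake", "AWS Databricks")` is true, while `is_supported("DBFS", "Local Spark")` is false.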

The matrix below lists the supported datastores for each activity type.

Datastore and Activity Matrix

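| Category | Connector | Ingestion (Source/Target) | Processing (Source/Target) | Reconciliation (Source/Target) | Constraint Checks | Housekeeping | External |
|---|---|---|---|---|---|---|---|
| File | DBFS | ✓ | x | x | x | x | x |
| File | ADLS Gen2 | ✓ | x | x | x | x | x |
| File | Server file system | ✓ | x | x | x | x | x |
| File | HDFS | ✓ | x | x | x | x | x |
| File | AWS S3 | ✓ | x | x | x | x | x |
| Database | Delta | ✓ | ✓ | ✓ | ✓ | ✓ | x |
| Database | Hive | ✓ | ✓ | ✓ | ✓ | ✓ | x |
| Database | Azure SQL | ✓ | ✓ | ✓ | ✓ | x | x |
| Database | Azure Synapse Analytics | ✓ | ✓ | ✓ | ✓ | x | x |
| Database | Azure Synapse Analytics Native | ✓ | ✓ | ✓ | ✓ | x | x |
| Database | JDBC | ✓ | ✓ | ✓ | ✓ | x | ✓ |
| Database | Snowflake | ✓ | ✓ | ✓ | ✓ | x | x |
| API | Rest API | ✓ | x | x | x | x | x |
| Others | Azure Data Factory | x | x | x | x | x | ✓ |
| Others | Databricks | x | x | x | x | x | ✓ |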

*In development phase
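Read column by column, the activity matrix answers the question "which connectors can this activity use?". The following sketch encodes the rows as flags and filters them by activity. It is an illustration rather than a Guzzle API: `ACTIVITIES`, `_ACTIVITY_ROWS`, and `connectors_for` are hypothetical names.

```python
# Activity names in matrix column order (hypothetical constant, not a Guzzle API).
ACTIVITIES = [
    "Ingestion", "Processing", "Reconciliation",
    "Constraint Checks", "Housekeeping", "External",
]

# One flag per activity, in ACTIVITIES order: "y" = supported, "n" = not supported.
_ACTIVITY_ROWS = {
    "DBFS": "ynnnnn",
    "ADLS Gen2": "ynnnnn",
    "Server file system": "ynnnnn",
    "HDFS": "ynnnnn",
    "AWS S3": "ynnnnn",
    "Delta": "yyyyyn",
    "Hive": "yyyyyn",
    "Azure SQL": "yyyynn",
    "Azure Synapse Analytics": "yyyynn",
    "Azure Synapse Analytics Native": "yyyynn",
    "JDBC": "yyyyny",
    "Snowflake": "yyyynn",
    "Rest API": "ynnnnn",
    "Azure Data Factory": "nnnnny",
    "Databricks": "nnnnny",
}


def connectors_for(activity):
    """List connectors that the matrix marks as supported for the given activity."""
    column = ACTIVITIES.index(activity)
    return [name for name, flags in _ACTIVITY_ROWS.items() if flags[column] == "y"]
```

With this encoding, `connectors_for("External")` yields JDBC, Azure Data Factory, and Databricks, and `connectors_for("Housekeeping")` yields only Delta and Hive.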

Note
  1. The JDBC connector is a generic connector that lets you connect to any database that supports the JDBC interface.
  2. The Rest API connector lets you connect to any API or cloud application that provides a REST API interface.
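What makes a JDBC connector generic is that the target database is identified mostly by its connection URL and driver. As a rough illustration, the sketch below builds URLs in the formats used by the MySQL, PostgreSQL, and Microsoft SQL Server drivers. The helper `jdbc_url` and the host and database names are hypothetical, and this is not how Guzzle itself stores connection settings.

```python
def jdbc_url(subprotocol, host, port, database):
    """Build a JDBC connection URL for a few common drivers (illustrative helper)."""
    if subprotocol == "sqlserver":
        # The Microsoft SQL Server driver takes semicolon-separated properties.
        return f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    if subprotocol in ("mysql", "postgresql"):
        # MySQL and PostgreSQL drivers use a path-style database name.
        return f"jdbc:{subprotocol}://{host}:{port}/{database}"
    raise ValueError(f"no URL template for subprotocol: {subprotocol}")


# Hypothetical host and database names, for illustration only.
print(jdbc_url("mysql", "db.example.com", 3306, "sales"))
print(jdbc_url("sqlserver", "sql.example.com", 1433, "warehouse"))
```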

Support for External activity

Guzzle supports running external procedures, notebooks, and pipelines on remote systems and data services through the External activity. The supported connectors for calling external activities are listed below:

| Connector | Remark |
|---|---|
| Databricks | To call a Databricks Notebook, Databricks Jar, or Python task |
| Azure Data Factory (ADF) | To call an ADF pipeline or an Azure Synapse* pipeline |
| JDBC | To trigger a stored procedure or run a JDBC datastore |

Supported File Formats

  • Delimited format

  • Fixed length files

  • Text files (using Grok or Regex)

  • JSON format

  • XML format

  • Excel format

  • ORC format

  • Avro format

  • Parquet format
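The first three formats in the list differ mainly in how a line is split into fields: by a separator character, by fixed character positions, or by a pattern. The sketch below shows the three approaches with the Python standard library. It is illustrative only and does not reflect how Guzzle parses files; the function names, field names, and the log pattern are assumptions made for the example.

```python
import csv
import io
import re


def parse_delimited(text, delimiter=","):
    """Delimited format: the first line is a header, fields are split on a separator."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def parse_fixed_length(line, layout):
    """Fixed-length format: each field occupies a known number of characters.

    layout is a list of (field_name, width) pairs in file order.
    """
    record, start = {}, 0
    for name, width in layout:
        record[name] = line[start:start + width].strip()
        start += width
    return record


# Text format: a regular expression with named groups extracts the fields.
LOG_PATTERN = re.compile(r"(?P<level>[A-Z]+) (?P<message>.*)")


def parse_text(line):
    """Return the named groups for a matching line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None
```

For instance, `parse_fixed_length("001Ann  ", [("id", 3), ("name", 5)])` splits the line at positions 3 and 8 and strips the padding from each field.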