Datastore Overview
In Guzzle, a datastore represents an on-premises or cloud data service that can act as a source or a sink (target) for Guzzle activities. A datastore can be a cloud file service such as S3, Azure Blob Storage, or DBFS; a database such as Azure SQL or MySQL; or an application that is accessible through a REST API.
Guzzle also uses datastores to run external activities such as stored procedures, Azure Data Factory pipelines, or notebooks.
Supported Datastores

Guzzle broadly supports three types of datastores: File, Database, and API. The matrix below shows which datastores are supported (✓) or not supported (x) on each compute.
Datastore and Compute Matrix

Category | Connector | Local Spark | Azure Databricks | Azure Synapse Analytics | AWS Glue | AWS Databricks | AWS EMR (EC2) | AWS EMR Serverless |
---|---|---|---|---|---|---|---|---|
File | DBFS | x | ✓ | x | x | ✓ | x | x |
File | ADLS Gen2 | x | ✓ | ✓ | x | ✓ | x | x |
File | Server file system | ✓ | ✓ | x | x | ✓ | x | x |
File | HDFS | x | x | x | | | | |
File | AWS S3 | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Database | Delta | x | ✓ | ✓ | ✓ | ✓ | x | x |
Database | Hive | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Database | Azure SQL | x | ✓ | ✓ | x | ✓ | x | x |
Database | Azure Synapse Connector for Databricks | x | ✓ | x | x | x | x | x |
Database | Azure Synapse Analytics Native | x | x | ✓ | x | x | x | x |
Database | JDBC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Database | Snowflake | x | ✓ | x | x | ✓ | x | x |
API | Rest API | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Others | Azure Data Factory | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Others | Databricks | x | ✓ | x | x | ✓ | x | x |
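If you need to check the compute matrix from a script (for example, to validate a planned configuration before deploying it), one option is to encode it as a lookup table. The sketch below is hypothetical and not part of Guzzle: `COMPUTES`, `SUPPORT`, `computes_supporting`, and `is_supported` are illustrative names, and only a few rows of the matrix are transcribed.

```python
# Hypothetical lookup helper; not part of Guzzle.
# Column order matches the compute columns of the matrix.
COMPUTES = [
    "Local Spark",
    "Azure Databricks",
    "Azure Synapse Analytics",
    "AWS Glue",
    "AWS Databricks",
    "AWS EMR (EC2)",
    "AWS EMR Serverless",
]

# A few rows transcribed from the matrix: True = supported, False = not supported.
SUPPORT = {
    "DBFS":      [False, True, False, False, True, False, False],
    "AWS S3":    [False, True, True, True, True, True, True],
    "Delta":     [False, True, True, True, True, False, False],
    "JDBC":      [True] * 7,
    "Snowflake": [False, True, False, False, True, False, False],
}


def computes_supporting(connector: str) -> list[str]:
    """Return the computes on which the connector is marked as supported."""
    return [c for c, ok in zip(COMPUTES, SUPPORT[connector]) if ok]


def is_supported(connector: str, compute: str) -> bool:
    """Look up one cell; raises KeyError/ValueError for unknown names."""
    return SUPPORT[connector][COMPUTES.index(compute)]


if __name__ == "__main__":
    print(computes_supporting("Snowflake"))  # ['Azure Databricks', 'AWS Databricks']
```

Keeping one boolean list per connector mirrors the table row by row, so adding another connector means copying one more row.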
The matrix below shows which datastores are supported (✓) or not supported (x) for each type of Guzzle activity.

Datastore and Activity Matrix

Category | Connector | Ingestion (Source/Target) | Processing (Source/Target) | Reconciliation (Source/Target) | Constraint Checks | Housekeeping | External |
---|---|---|---|---|---|---|---|
File | DBFS | ✓ | x | x | x | x | x |
File | ADLS Gen2 | ✓ | x | x | x | x | x |
File | Server file system | ✓ | x | x | x | x | x |
File | HDFS | ✓ | x | x | x | x | x |
File | AWS S3 | ✓ | x | x | x | x | x |
Database | Delta | ✓ | ✓ | ✓ | ✓ | ✓ | x |
Database | Hive | ✓ | ✓ | ✓ | ✓ | ✓ | x |
Database | Azure SQL | ✓ | ✓ | ✓ | ✓ | x | x |
Database | Azure Synapse Analytics | ✓ | ✓ | ✓ | ✓ | x | x |
Database | Azure Synapse Analytics Native | ✓ | ✓ | ✓ | ✓ | x | x |
Database | JDBC | ✓ | ✓ | ✓ | ✓ | x | ✓ |
Database | Snowflake | ✓ | ✓ | ✓ | ✓ | x | x |
API | Rest API | ✓ | x | x | x | x | x |
Others | Azure Data Factory | x | x | x | x | x | ✓ |
Others | Databricks | x | x | x | x | x | ✓ |
*In development phase
Note
- The JDBC connector is a generic connector that lets you connect to any database that supports the JDBC interface.
- The REST API connector lets you connect to any API or cloud application that exposes a REST API.
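Because JDBC is a standard interface, switching databases is mostly a matter of supplying the right driver and connection URL. The sketch below shows the documented URL shapes for three common drivers (MySQL Connector/J, the PostgreSQL JDBC driver, and the Microsoft JDBC Driver for SQL Server, which is also used for Azure SQL). The `jdbc_url` helper, its parameters, and the host names are hypothetical; this is not Guzzle's datastore configuration format.

```python
from typing import Optional

# Hypothetical helper; not Guzzle's datastore configuration format.
# Maps a driver name to (URL template, default port), following each vendor's documented URL shape.
TEMPLATES = {
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
    # Microsoft JDBC Driver for SQL Server (also used for Azure SQL Database).
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
}


def jdbc_url(driver: str, host: str, database: str, port: Optional[int] = None) -> str:
    """Build a JDBC connection URL, falling back to the driver's default port."""
    template, default_port = TEMPLATES[driver]
    effective_port = port if port is not None else default_port
    return template.format(host=host, port=effective_port, database=database)


if __name__ == "__main__":
    print(jdbc_url("mysql", "db.example.com", "sales"))
    # jdbc:mysql://db.example.com:3306/sales
    print(jdbc_url("sqlserver", "myserver.database.windows.net", "sales"))
    # jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=sales
```

Only the URL differs between databases; the JDBC driver JAR for the target database still has to be available on the compute's classpath.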
Support for External Activities

Guzzle can run external procedures, notebooks, and pipelines on remote systems and data services through its External activity. The following connectors can be used to call external activities:

Connector | Remark |
---|---|
Databricks | To call a Databricks notebook, JAR, or Python task |
Azure Data Factory (ADF) | To call an ADF pipeline or an Azure Synapse* pipeline |
JDBC | To trigger a stored procedure or run a JDBC datastore |
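Guzzle's internals for external activities are not described here. For orientation only, the sketch below builds the Azure Data Factory REST call that starts a pipeline run ("Pipelines - Create Run": `POST .../pipelines/{pipelineName}/createRun?api-version=2018-06-01`, with pipeline parameters as the JSON body). It prepares the request without sending it, and obtaining the Azure AD bearer token is left out. `adf_create_run_url`, `build_create_run_request`, and all example values are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# API version of the Azure Data Factory REST API ("Pipelines - Create Run").
API_VERSION = "2018-06-01"


def adf_create_run_url(subscription_id: str, resource_group: str,
                       factory: str, pipeline: str) -> str:
    """Build the createRun URL for one ADF pipeline (hypothetical helper)."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory)}"
        f"/pipelines/{quote(pipeline)}"
        f"/createRun?api-version={API_VERSION}"
    )


def build_create_run_request(url: str, token: str, parameters: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST; pipeline parameters go in the JSON body."""
    body = json.dumps(parameters).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Azure AD access token (not obtained here)
            "Content-Type": "application/json",
        },
    )
```

Sending the prepared request with `urllib.request.urlopen(req)` returns a JSON body containing the `runId` of the new pipeline run, which can then be polled for status.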
Supported File Formats

- Delimited files
- Fixed-length files
- Text files (parsed using Grok or regex)
- JSON
- XML
- Excel
- ORC
- Avro
- Parquet
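Fixed-length files and free-text files are the two formats in this list that need extra parsing rules. As a rough illustration of what those rules do (not Guzzle's configuration syntax), the sketch below cuts a fixed-length record into fields by column width and parses a text line with a regular expression that uses named groups, which is what Grok patterns expand to. The function names, column widths, and log-line pattern are hypothetical.

```python
import re
from typing import Dict, List, Optional


def parse_fixed_width(line: str, widths: List[int]) -> List[str]:
    """Cut a fixed-length record into fields by column width and trim the padding."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Grok patterns expand to regular expressions with named captures; this
# hand-written pattern stands in for a simple "timestamp level message" log format.
LOG_LINE = re.compile(r"(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)")


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a log line, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_fixed_width("00042Alice     NY", [5, 10, 2]))
    # ['00042', 'Alice', 'NY']
    print(parse_log_line("2024-01-15 10:32:01 ERROR Disk quota exceeded"))
    # {'timestamp': '2024-01-15 10:32:01', 'level': 'ERROR', 'message': 'Disk quota exceeded'}
```

In both cases the output is a set of named or positional columns, which is what lets these unstructured formats be loaded like any other tabular source.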