Ingestion and Data Connectors
Overview
The ingestion section allows users to create data connectors and import tables from various sources.
Creating Data Connectors
Watch this video for detailed instructions on creating connectors for AWS, GCP, Azure, and Cloudflare.
AWS Connector
- Click on "AmazonS3"
- Input configurations:
storage_url
aws_region
aws_access_key_id
aws_secret_access_key
- Click "Create"
- Save the credentials generated during IAM user creation (the secret access key is shown only once)
AWS Configuration Steps
Refer to this guide for detailed instructions on how to create access keys for a bucket on AWS.
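If you prefer the command line, the sketch below is one way to provision such credentials with the AWS CLI. The user name is a placeholder, and the broad AmazonS3ReadOnlyAccess managed policy can be replaced with a policy scoped to a single bucket:
# Create a dedicated IAM user for the connector (name is a placeholder)
aws iam create-user --user-name my-connector-user
# Grant read-only S3 access; swap in a bucket-scoped policy if preferred
aws iam attach-user-policy --user-name my-connector-user \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
# Generate the key pair; the SecretAccessKey is shown only once, so save it
aws iam create-access-key --user-name my-connector-user
The returned AccessKeyId and SecretAccessKey map to aws_access_key_id and aws_secret_access_key in the connector form; for storage_url, an S3 bucket is conventionally addressed as s3://&lt;bucket-name&gt;.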
GCP Connector
- Click on "GoogleCloudStorage"
- Input configurations:
storage_url
google_service_account_key_file
- Click "Create"
GCP Configuration Steps
Refer to this guide for detailed instructions on how to create access keys for a bucket on GCP.
- Create Service Account:
1. Go to the Google Cloud Console
2. Navigate to IAM & Admin > Service Accounts
3. Click "Create Service Account"
4. Fill in the Service Account details
5. Add roles (Storage Admin or Storage Object Viewer)
- Generate Key File (in the Service Account details page; a gcloud CLI alternative is sketched below):
1. Click the "Keys" tab
2. Select "Add Key" > "Create new key"
3. Choose JSON format
4. Download the key file, which is a JSON document like the following:
{
"type": "service_account",
"project_id": "my-project-id",
"private_key_id": "8f2a84c22b0030bc4f7f0780c731679654f71",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQCxIUfydAvs1sYe\naeZkQ+TKkkw8d8N4Zvq3BAwcPqEdaPnOiV+20b3Xgix2SE8KGdfJWmIOj+t8\nF/ABVKJXfp6gomnXeiPjVDJJDZRuYTWum+ZG/QDVIP14R9xYucx8VZ6shxEpbRz5\nzns+/Smx5kMjI+GpKARLa9nzqUt6u9uto9/jMpi33hxOjd3ZJIE/7b4U7JGB5Nc1\nrOXVHgi6sELdzTGS5cevDvSLCO5wcCkza8m11oUbp/ouxjC5pM2BpbBENRU/dOof\n8YTBVBAzhDeGJbFQL+7GRRTAqlk7/HWRbzxYyghc69ViLEKAFRQFog4dBWjiED1M\nPgRzQ4+/AgMBAAECggEADJNtVURx+E3DYTNpnSLI4q0CZqSmJy1Aja/prcRyk6rK\nq5jha42LUyKPFPb0Cy3VFsWA8ZXOMnR7ZqMYci+9GvMtdgAx/Y95btFyhIdHS5Lx\nScVd0xpxbbBa7qhDd7UxVH6LtnoceNF7Zi0HyRAWXsAXci5A/k6OB4DPx6H0bvE7\nL5ELVte9ieAZmea98nCFGI0o21yKJMcbxymV2hekKAXjiqR7p0NtUB61GGan\n/5EXoD0Y0dHpesmSaxrKEaXNrUF2yiyfR1CKOnKuk7h1PD+y2KMg0kBTidvmMB7m\nSL0sUQ6vh0+GpRwcfqJ4+mRKpZZujn6QEvXZr02v4QKBgQDnMDuI7wZajaJ7dQzn\nR5vIqDyc+UsE35eUYN0K8i0/TKFPf8jUV2Ajl5DBc+kd11CItu4m5lO97cFevFGe\n2cHvjk27l1FIVMJRWCAKPzd9j/9YGA4rnGTy1EkhBZGVUZFGUVNgWonK3nFXoU1m\nlWOila6tCMLbKMuUkjz+1z8/jwKBgQDEI9M+zpgFc4xNte1ElZYnY6yIidGOzlmA\nmW7v8Y4xfLidAc3PXb32ohReI7jrOJSEzMHnLho9eSAjzIR1cqnbx+Lb6fPO6tcM\nit42gZ8TI6eXe3cHXKU3Ziy1tw5rVc89kzddjGsDHrBNMTszkykvJdOFggRgAfDV\nwfxdaP6U0QKBgAl7cPWs2BXeuUtXAbB6v2j7fYDyuKD6ir0LPAW26SQvgG5CT3pm\nGwtarBVDK8yNiEATQLFXwReJKOU51B8vz0SEEawgCLVuxImRk77X2O7NeSuj0PD4\n+Sr8igNQtyfosIyxyTmqfPxVI1D0zLfoaK3Cdeei9FsI0VDGrrnFGlMBAoGBAKtl\n+xXph2NMJBFMp7jFV1+4dG8ksGGw5PnCGvXHCtEoAlQB3Y4Whwhdfpr9cHztBqw+\nGjwhR4Dsti9Sa3YO62xJ8m7mtM3e3mnxeFn9T7tz7uIrXEqspRwSR4PMIeeeJunS\nGhG/wUwKp1ntaaSaNuUikwMaKSSUzZaeXCBsvfvhAoGBAMg4d5ogRH2Wte3QCpv8\n0LWfqX36G8KEMWgpZ5Rgs/CvyThR5YwdAOPilncs47+v6GFWj8u+gWaWwf3sEH//\np8G0/FDrlw+H8/QdhyCkpVx3aVE0ppFrndrhu777w5lfLKdkTczaC0uicPRBpZdp\nHfG01J1sbMK2O98l+Q07wqeS\n-----END PRIVATE KEY-----\n",
"client_email": "zinkml-v1-demo@my-project.iam.gserviceaccount.com",
"client_id": "118360750539988880508",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/zinkml-v1-demo%40my-project.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}
- Storage URL Format:
gs://[BUCKET_NAME]
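As a command-line alternative to the console steps above, a minimal gcloud sketch; the service account name and project ID are placeholders, and roles/storage.objectViewer can be swapped for roles/storage.admin if the connector needs write access:
# Create the service account (name and project are placeholders)
gcloud iam service-accounts create my-connector-sa --project my-project-id
# Grant read access to Cloud Storage objects
gcloud projects add-iam-policy-binding my-project-id \
  --member "serviceAccount:my-connector-sa@my-project-id.iam.gserviceaccount.com" \
  --role "roles/storage.objectViewer"
# Download the JSON key to use as google_service_account_key_file
gcloud iam service-accounts keys create key.json \
  --iam-account my-connector-sa@my-project-id.iam.gserviceaccount.com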
Azure Connector
- Click on "MicrosoftAzure"
- Input configurations:
storage_url
azure_storage_account_name
azure_storage_sas_key
- Click "Create"
Azure Configuration Steps
- Get Storage Account URL:
storage_url = https://<storage-account-name>.blob.core.windows.net/<container-name>
- Generate SAS Token (Choose one method):
Method 1: Azure Portal
1. Go to Azure Portal
2. Navigate to Storage Account
3. Select "Shared access signature"
4. Configure:
- Select Blob service
- Set resource types
- Set permissions
- Set validity period
5. Generate SAS and connection string
Method 2: Azure CLI
# Authenticate with Azure
az login
# Generate a SAS token for the container (save it; Azure does not store it)
az storage container generate-sas \
  --account-name <storage-account-name> \
  --name <container-name> \
  --permissions racwdl \
  --expiry <YYYY-MM-DD>T<HH:MM:SS>Z \
  --https-only \
  --account-key <storage-account-key>
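For illustration, the resulting connector values might look like this; the account, container, and token values are placeholders, and azure_storage_sas_key is the SAS token query string returned by either method:
storage_url = https://mystorageaccount.blob.core.windows.net/mycontainer
azure_storage_account_name = mystorageaccount
azure_storage_sas_key = sv=2024-05-04&sr=c&sp=racwdl&se=2025-12-31T23:59:59Z&sig=...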
Cloudflare R2 Connector
- Click on "CloudflareR2"
- Input configurations:
storage_url
r2_access_key_id
r2_secret_access_key
- Click "Create"
Cloudflare Configuration Steps
- Create R2 Bucket:
1. Log in to the Cloudflare Dashboard
2. Navigate to the R2 section
3. Create a new bucket
- Create API Token:
1. Go to the R2 section
2. Select "Manage R2 API Tokens"
3. Click "Create API Token"
4. Configure permissions
5. Create the token
- Get Storage URL:
Format: https://<account-id>.r2.cloudflarestorage.com
Find the Account ID under Dashboard > Profile > Account ID
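Since R2 exposes an S3-compatible API, you can optionally sanity-check the token before creating the connector by pointing the AWS CLI at the R2 endpoint; the bucket name and account ID below are placeholders:
# Store the R2 key pair in a dedicated AWS CLI profile
aws configure set aws_access_key_id <r2_access_key_id> --profile r2
aws configure set aws_secret_access_key <r2_secret_access_key> --profile r2
# List the bucket through the S3-compatible endpoint
aws s3 ls s3://my-bucket --profile r2 \
  --endpoint-url https://<account-id>.r2.cloudflarestorage.com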
Import Flows
Importing from Cloud Connectors
Watch this video for detailed instructions on importing tables from cloud storage through connectors.
1. Navigate to the connectors table
2. Select a connector and click "Start Table Import Flow"
3. Select Data Format:
- Apache Parquet
- CSV
- TSV
- JSON
- NDJSON
- Excel/OpenOffice
- Apache Arrow
- Apache Avro
- Apache Feather (v2)/IPC
4. Choose Partition Scheme:
- None (unpartitioned)
- Hive
- Custom
File selection differs by Data Format and Partition Scheme; refer to the video above for how to set up partitions in different scenarios, and see the example Hive layout after this list.
5. Select files
6. Select the Dataset in which you want to put the table. More info on Datasets.
7. Complete import
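For reference, a minimal sketch of what a Hive-partitioned layout looks like on disk; the bucket, table, and column names are placeholders. Each key=value directory level becomes a partition column:
s3://my-bucket/events/year=2024/month=01/part-0000.parquet
s3://my-bucket/events/year=2024/month=02/part-0000.parquet
s3://my-bucket/events/year=2025/month=01/part-0000.parquet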
Importing from URLs and Local Files
Watch this video for detailed instructions on importing tables from URLs and Local Files.
1. Click "HTTP URL" or "Local Files"
2. Select data format
3. Choose partition scheme
4. File selection:
- For URLs, paste the URL of the table into the box and click "Add URLs". Multiple URLs must be separated by commas (',') or newlines ('\n'); see the example after this list.
- For local files:
- Drag and drop the file or use the file browser to select it
- Click on "Select" to double-check the file you are about to upload (for Excel file and sheet selection, refer to the video above)
- Click "Upload"
- Once uploaded, click "Continue"
5. Select the Dataset in which you want to put the table. More info on Datasets.
6. Complete import
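For example, a batch of URLs pasted into the "Add URLs" box could look like either of these (addresses are placeholders), comma-separated:
https://example.com/data/sales_2023.csv, https://example.com/data/sales_2024.csv
or newline-separated:
https://example.com/data/sales_2023.csv
https://example.com/data/sales_2024.csv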
Next Steps
- Verify table schema
- Commit and release Dataset
- Use tables in Dataflows