ZinkML Graph Dataflow Studio
Overview
The Graph Dataflow Studio is ZinkML's integrated environment for data preparation, processing, and modeling. It lets users build data pipelines visually by connecting nodes on a graph canvas.
Video Tutorial
For a visual guide to Dataflows, watch our step-by-step tutorial:
Table of Contents
- Getting Started
- Loading Tables
- Data Processing
- Modeling Operations
- Execution Management
- Dataflow Operations
- Deployment
- Dataflow Management
Getting Started
Initial Setup
- Navigate to 'Graph Dataflow Studio'
- Create New Dataflow:
a. Enter a unique dataflow name
b. Click 'Create New Dataflow'
c. Access the blank canvas workspace
Interface Overview
The studio interface consists of:
- Left Panel: Data, Core, and Model tabs
- Central Canvas: Dataflow workspace
- Top Panel: Execution logs and results
Loading Tables
Data Tab Navigation
- Access the left vertical tab structure:
  - Data Tab: Available datasets and tables
  - Core Tab: Processing operators
  - Model Tab: Machine learning algorithms
- Table Access Options:
  - Owned datasets
  - Shared datasets
  - Public datasets
  - Search functionality
  - Filter options
Table Loading Process
- Locate the desired table in the Data tab
- Implementation:
  a. Drag and drop the selected table node onto the canvas
  b. Click 'Execute' to load the data
  c. Wait for the execution to complete
- Data Visualization:
  - Table View: Raw data examination
  - Plots View: Visual data analysis
  - Sample data schema and statistics preview
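The schema and statistics preview above can be illustrated with a small pandas sketch; the sample DataFrame and column names below are hypothetical stand-ins for a loaded table node's output, not ZinkML internals.

```python
import pandas as pd

# Hypothetical sample standing in for a loaded table node's output.
df = pd.DataFrame({
    "age": [34, 51, 28, 45],
    "income": [52000.0, 88000.0, 41000.0, 67000.0],
    "churned": [False, True, False, True],
})

# Schema preview: column names and dtypes, as a Table View would surface them.
schema = df.dtypes.astype(str).to_dict()

# Summary statistics for the numeric columns.
stats = df.describe()

print(schema)
print(stats.loc[["mean", "min", "max"]])
```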
Data Processing
Creating Data Pipelines
- Operator Selection:
  a. Access the Core tab
  b. Browse available operators
  c. Search for specific functions
- Node Connection Process:
  a. Add processing nodes to the canvas
  b. Connect output sockets to input sockets
  c. Create a logical data flow
  d. Verify connections
- Data Flow Architecture:
  - Source node → Edge → Target nodes
  - Multiple target connections possible
  - Branching workflows supported
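The source → edge → target architecture is a directed acyclic graph, so a valid execution order always runs each node after all of its inputs. A minimal sketch, using Python's standard-library `graphlib` and illustrative node names (not ZinkML APIs):

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow: one source table branching into two target nodes,
# which merge into a model node. Each node maps to its direct inputs.
edges = {
    "load_table": [],
    "clean_nulls": ["load_table"],
    "scale_features": ["load_table"],   # second branch from the same source
    "train_model": ["clean_nulls", "scale_features"],
}

# A valid execution order runs every node after all of its inputs.
order = list(TopologicalSorter(edges).static_order())
print(order)
```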
Execution and Validation
- Processing Steps:
  a. Click the 'Execute' button
  b. Monitor the processing status
  c. Review execution logs
  d. Verify output results
- Parameter Management:
  - Adjust node parameters
  - Re-run the dataflow after parameter changes
  - A new version is created for each completed execution
Modeling Operations
Model Implementation
- Model Node Addition:
  a. Select a model from the Model tab
  b. Add it to the canvas
  c. Connect it to the data pipeline
- Parameter Configuration:
  - Default settings
  - Custom parameters, for example:
    - Cross-validation settings
    - Train-test split settings
    - Learning rate
    - Batch size
    - Optimization settings
  - Grid Search Options:
    - Parameter ranges
    - Cross-validation settings
    - Train-test split settings
    - Search strategies
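Conceptually, a grid search enumerates every combination of the configured parameter ranges, scores each one (typically with cross-validation), and keeps the best-scoring setting. A sketch of the enumeration step, with illustrative parameter names that do not correspond to ZinkML's actual option labels:

```python
from itertools import product

# Illustrative parameter ranges, analogous to the grid search settings
# you would enter on a model node.
param_grid = {
    "learning_rate": [0.01, 0.1],
    "batch_size": [32, 64],
}

# Exhaustive grid search enumerates every combination; each candidate
# would then be scored with cross-validation.
combinations = [dict(zip(param_grid, values))
                for values in product(*param_grid.values())]

print(len(combinations))   # 2 x 2 = 4 candidate settings
for combo in combinations:
    print(combo)
```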
Execution Management
Execution Table Details
| Column | Description | Example |
|---|---|---|
| Version | Dataflow iteration | v1.2.3 |
| Nodes | Active node count | 15 nodes |
| Rows Processed | Data volume | 1M rows |
| User | Execution initiator | john@zinkml.com |
| Status | Current state | Running/Complete/Failed/Queued |
| Deployment Actions | Available operations | Deploy/Predict |
Dataflow Operations
See the video tutorial for details on the following operations:
- Add nodes
- Select graph portions
- Delete nodes and edges
- Replicate selected portion of graph
- Change parameters for respective nodes
- Check all nodes on the left tab
- See collaborators (users) with access to this dataflow
- Download the dataflow
- Reposition the dataflow in 'Pretty format' on the canvas
- Refresh to pull the latest version of this dataflow
- Fork the dataflow: copy it as a new dataflow and make edits in the copy
- Node Operations:
  - Visualize input tables
  - Visualize output tables
  - Right-click to:
    - Check the error reason (if the execution failed)
    - Execute this node only (executes all, and only, the nodes required for this node's execution)
    - Replicate this node on the canvas
    - Delete the node
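"Execute this node only" runs the selected node plus everything upstream of it. The required set can be sketched as the node's ancestors in the dataflow graph; the dict of inputs below uses illustrative node names, not ZinkML internals.

```python
# Hypothetical dataflow: each node maps to its direct inputs.
inputs = {
    "load_table": [],
    "clean_nulls": ["load_table"],
    "scale_features": ["load_table"],
    "train_model": ["clean_nulls", "scale_features"],
}

def required_nodes(node, inputs):
    """All (and only) the nodes needed to execute `node`, itself included."""
    needed, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in needed:
            needed.add(current)
            stack.extend(inputs[current])   # walk upstream through inputs
    return needed

print(sorted(required_nodes("clean_nulls", inputs)))
# Executing only 'clean_nulls' does not touch 'scale_features' or 'train_model'.
```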
Deployment
Deployment Process
a. Go to Execution ('Runs') table
b. Select Executed version with 'Completed' status
c. Click on 'Deploy' action button
d. Once deployed, click 'Predict' to process new tables with the deployed dataflow
Dataflow Management
Dataflow Table Information
| Feature | Description |
|---|---|
| Name | Dataflow identifier |
| Version | Current version |
| Nodes & Edges | Structure details |
| Latest Run | Most recent execution |
| Status | Current state |
| Updated/Created | Timestamps |
| Access Status | Private/Public/Shared |
| Actions | Collaboration tools |
Management Options
- Download dataflow
- Delete dataflow
- Collaboration tools
- Access control
Best Practices
Development Guidelines
- Design Principles:
  - Modular design
  - Clear documentation
  - Regular testing
  - Version control
- Performance Optimization:
  - Resource efficiency
  - Pipeline optimization
  - Caching strategies
  - Error handling
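One common caching strategy for dataflows is to reuse a node's output whenever its parameters and input data are unchanged. A minimal sketch, assuming a cache keyed on a hash of the node name, its parameters, and fingerprints of its inputs (all names here are illustrative, not ZinkML's implementation):

```python
import hashlib
import json

_cache = {}

def cache_key(node_name, params, input_fingerprints):
    # Deterministic key: same node + params + inputs -> same digest.
    payload = json.dumps(
        {"node": node_name, "params": params, "inputs": input_fingerprints},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_cached(node_name, params, input_fingerprints, compute):
    key = cache_key(node_name, params, input_fingerprints)
    if key not in _cache:        # recompute only on a cache miss
        _cache[key] = compute()
    return _cache[key]

calls = []
def compute_output():
    calls.append(1)              # track how many times real work runs
    return 42

first = run_cached("scale_features", {"method": "zscore"}, ["abc123"], compute_output)
second = run_cached("scale_features", {"method": "zscore"}, ["abc123"], compute_output)
print(first, second, len(calls))   # same result, computed only once
```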