Datasources

Datasources are the ingestion layer of SolutionEngine: they connect external systems to workflows and convert incoming events into executable workflow input.

Without a datasource, production workflows have no live input path.

In production, treat every datasource as an operational service. It must be configured, monitored, and validated like any other runtime component.


Role in the Execution Pipeline

A datasource is responsible for four things:

  • Listening to an external source continuously or at intervals
  • Validating or normalizing incoming payloads
  • Emitting events to the workflow runtime
  • Preserving project isolation for every emitted event

End-to-end flow:

External Source -> Datasource -> Workflow Trigger Node -> Processing Graph -> Output
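
As a rough illustration, this contract can be sketched as a minimal interface. All names below (Datasource, WorkflowEvent, emit) are hypothetical, not part of SolutionEngine's API:

    # Hypothetical sketch of the four datasource responsibilities.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class WorkflowEvent:
        project_id: str          # preserves project isolation per event
        payload: dict[str, Any]  # normalized workflow input

    class Datasource:
        def __init__(self, project_id: str,
                     emit: Callable[[WorkflowEvent], None]):
            self.project_id = project_id
            self.emit = emit  # hands events to the workflow trigger node

        def normalize(self, raw: dict[str, Any]) -> dict[str, Any]:
            # Validate/normalize the payload before it reaches workflows.
            if "data" not in raw or not isinstance(raw["data"], dict):
                raise ValueError("payload must contain a 'data' object")
            return raw["data"]

        def on_raw_event(self, raw: dict[str, Any]) -> None:
            # Listen -> validate -> emit, always tagged with the project.
            self.emit(WorkflowEvent(self.project_id, self.normalize(raw)))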

Scope and Ownership

Datasource resources are project-scoped.

Key rules:

  • A datasource belongs to one project
  • It can trigger one or more workflows within that same project
  • It cannot trigger workflows in other projects
  • Its runtime status is tracked independently

Typical statuses:

  • inactive
  • active
  • error

Datasource Configuration Model

All datasource types follow the same design contract:

  • Connection details: endpoint, topic, URL, path, broker
  • Access details: auth, tokens, credentials, secrets
  • Input behavior: polling interval, subscription mode, frame/sample rate
  • Recovery behavior: reconnect delay, retries, timeout handling

Even when protocol details differ, operational behavior remains consistent: receive event, create workflow input, trigger execution.
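
As a concrete illustration, a polling-style configuration might group those four concerns as below. Field names here are hypothetical; use the datasource reference pages for the exact schemas:

    config = {
        # Connection details
        "url": "https://api.example.com/orders",
        # Access details (resolved from a secret store, never inlined)
        "auth_token_secret": "ORDERS_API_TOKEN",
        # Input behavior
        "poll_interval_seconds": 30,
        # Recovery behavior
        "timeout_seconds": 10,
        "max_retries": 3,
        "reconnect_delay_seconds": 5,
    }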


Core Datasource Types

HTTP Webhook

Push-based ingestion from external services.

Use when:

  • Third-party systems send event callbacks
  • You need near real-time API-to-workflow triggering

Common config:

  • endpoint path
  • authentication method
  • payload format expectations

Operational notes:

  • Validate request shape early
  • Add signature/token validation for public endpoints (see the sketch below)
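
A minimal sketch of that signature check: it assumes a hex-encoded HMAC-SHA256 of the raw request body, a common webhook signing scheme, though the exact header and algorithm vary by provider:

    import hashlib
    import hmac

    def verify_signature(raw_body: bytes, header_signature: str,
                         secret: bytes) -> bool:
        expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
        # compare_digest resists timing attacks on the comparison
        return hmac.compare_digest(expected, header_signature)

Reject the request before any workflow is triggered if this check fails.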

HTTP Polling (HTTP GET)

Interval-based pull from an API endpoint.

Use when:

  • Source system cannot push events
  • You need periodic synchronization

Common config:

  • URL
  • poll interval
  • headers and auth

Operational notes:

  • Avoid aggressive polling in production
  • Include timeout and retry controls (sketched below)
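
A polling loop with an explicit timeout and bounded retries might look like this; the URL and intervals are placeholders, to be tuned to the source API's rate limits:

    import time
    import requests

    URL = "https://api.example.com/items"
    POLL_INTERVAL = 60   # seconds; avoid aggressive polling in production
    TIMEOUT = 10         # seconds; always set an explicit request timeout
    MAX_RETRIES = 3

    def poll_once():
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                resp = requests.get(URL, timeout=TIMEOUT)
                resp.raise_for_status()
                return resp.json()
            except requests.RequestException:
                time.sleep(2 ** attempt)  # exponential backoff
        return None  # surface as an ingestion error after retries

    while True:
        event = poll_once()
        if event is not None:
            pass  # hand the payload to the workflow trigger here
        time.sleep(POLL_INTERVAL)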

HTTP Event Stream (SSE)

Persistent event stream ingestion.

Use when:

  • Source provides server-sent events
  • You need long-lived low-latency event intake

Common config:

  • stream URL
  • auth mode
  • reconnect delay

Operational notes:

  • Always configure reconnect behavior (see the sketch below)
  • Log stream disconnect reasons
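
A reconnecting SSE reader can be sketched with plain line streaming; a dedicated SSE client library works equally well. The URL and delay are placeholders:

    import time
    import requests

    STREAM_URL = "https://events.example.com/stream"
    RECONNECT_DELAY = 5  # seconds between reconnect attempts

    while True:
        try:
            with requests.get(STREAM_URL, stream=True,
                              timeout=(5, 60)) as resp:
                resp.raise_for_status()
                for line in resp.iter_lines():
                    if line.startswith(b"data:"):
                        payload = line[len(b"data:"):].strip()
                        # forward payload to the workflow trigger here
        except requests.RequestException as exc:
            print(f"stream disconnected: {exc}")  # log the disconnect reason
        time.sleep(RECONNECT_DELAY)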

MQTT Topic

Low-overhead pub/sub ingestion for IoT and telemetry.

Use when:

  • Devices publish small, frequent payloads
  • You need topic-based routing

Common config:

  • broker URL
  • topic
  • QoS
  • client credentials

Operational notes:

  • Use deterministic topic naming
  • Separate production and test topics
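
A subscription sketch using paho-mqtt (1.x callback signatures; the broker, topic, and credentials are placeholders):

    import paho.mqtt.client as mqtt

    BROKER = "broker.example.com"
    TOPIC = "prod/site-a/telemetry"  # deterministic, environment-prefixed

    def on_connect(client, userdata, flags, rc):
        client.subscribe(TOPIC, qos=1)  # re-subscribe on every (re)connect

    def on_message(client, userdata, msg):
        # msg.payload is bytes; validate before triggering a workflow
        print(msg.topic, msg.payload)

    client = mqtt.Client()  # paho-mqtt 1.x style constructor
    client.username_pw_set("datasource-user", "token-from-secret-store")
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(BROKER, 1883, keepalive=60)
    client.loop_forever()  # blocks and handles automatic reconnects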

RTSP Camera Stream

Video ingestion from cameras and NVRs.

Use when:

  • Building computer vision workflows
  • Streaming live visual data

Common config:

  • RTSP URL
  • FPS sampling
  • optional camera credentials

Operational notes:

  • Start with lower FPS to control compute cost (see the sketch below)
  • Persist critical frames to buckets for incident replay
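
A frame-sampling sketch with OpenCV, starting at a deliberately low sample rate (the URL and rate are placeholders):

    import time
    import cv2

    RTSP_URL = "rtsp://user:pass@camera.example.com/stream1"
    SAMPLE_FPS = 1.0  # start low; raise only if the use case demands it

    cap = cv2.VideoCapture(RTSP_URL)
    last_emit = 0.0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break  # stream dropped; reconnect logic would go here
        now = time.monotonic()
        if now - last_emit >= 1.0 / SAMPLE_FPS:
            last_emit = now
            # hand the sampled frame to the workflow trigger; persist
            # critical frames to a bucket for incident replay
    cap.release()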

File Stream

Replays stored media as event input.

Use when:

  • Testing with historical footage
  • Repeatable model validation

Common config:

  • bucket/path reference
  • playback FPS
  • looping options

Operational notes:

  • Keep benchmark datasets immutable
  • Use fixed playback settings for regression tests (sketched below)
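
A replay sketch that plays a stored file at a fixed FPS, with optional looping (the path and settings are placeholders):

    import time
    import cv2

    PATH = "benchmarks/incident-2024.mp4"  # immutable benchmark asset
    PLAYBACK_FPS = 5.0
    LOOP = True

    while True:
        cap = cv2.VideoCapture(PATH)
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of file
            # feed the frame to the workflow under test here
            time.sleep(1.0 / PLAYBACK_FPS)  # fixed pacing, repeatable runs
        cap.release()
        if not LOOP:
            break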

Selecting the Right Datasource

Use this decision guide:

  • Need push events from apps: HTTP Webhook
  • Need periodic sync from APIs: HTTP Polling
  • Need real-time IoT feed: MQTT
  • Need camera analytics: RTSP
  • Need replay/testing: File Stream
  • Need long-lived event feed: HTTP Event Stream

Runtime Reliability Guidelines

To run datasources safely in production:

  • Define timeout values explicitly
  • Configure retry/reconnect strategy
  • Validate incoming payload schema near ingress
  • Capture critical ingestion errors in logs
  • Monitor datasource status and event throughput

Do not move malformed payload handling deep into the workflow. Reject or normalize as close to ingress as possible.
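
An ingress-side check can be as simple as verifying required fields before the event enters the graph; the field names below are illustrative:

    REQUIRED_FIELDS = {"device_id", "timestamp", "data"}

    def validate_payload(payload: dict) -> dict:
        missing = REQUIRED_FIELDS - payload.keys()
        if missing:
            # reject at ingress; log and count as an ingestion error
            raise ValueError(f"malformed payload, missing: {sorted(missing)}")
        return payload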


Security Guidelines

  • Never hardcode secrets in datasource config
  • Prefer managed secrets or environment variables
  • Restrict webhook exposure and apply auth validation
  • Use least-privilege credentials for brokers and APIs
  • Rotate credentials on a schedule
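
For example, a credential can be resolved from the environment rather than inlined in config (the variable name is a placeholder):

    import os

    token = os.environ.get("ORDERS_API_TOKEN")
    if token is None:
        raise RuntimeError("ORDERS_API_TOKEN is not set")  # fail fast at startup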

Performance and Cost Controls

  • Use sampling for high-volume RTSP streams
  • Filter early in trigger nodes to reduce unnecessary execution
  • Avoid over-polling external APIs
  • Use edge environments for latency-sensitive camera ingestion
  • Benchmark throughput before scaling to full production traffic

Troubleshooting Checklist

If a datasource's status is error, validate in this order:

  1. Connectivity: host, port, DNS, firewall
  2. Auth: token validity, username/password, permissions
  3. Payload: valid JSON/object structure
  4. Rate and timeout settings
  5. Workflow trigger mapping and environment health

Common symptoms:

  • No events: wrong endpoint/topic/path or auth failure
  • Intermittent events: unstable network or reconnect misconfiguration
  • High runtime load: over-sampling, over-polling, missing filtering

Production Pattern Example

RTSP Datasource -> Datasource Trigger -> Pre-Validation -> Run Model
               -> Confidence Filter -> Save Media -> Alert API

Why this pattern is stable:

  • Ingress is isolated
  • Validation protects downstream nodes
  • Filtering reduces false positives and cost
  • Output path supports traceability and alerting
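
A sketch of the confidence-filter step in this pattern; the threshold and detection fields are illustrative:

    CONFIDENCE_THRESHOLD = 0.8

    def passes_filter(detection: dict) -> bool:
        return detection.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD

    detections = [{"label": "person", "confidence": 0.91},
                  {"label": "person", "confidence": 0.42}]
    alerts = [d for d in detections if passes_filter(d)]  # keeps only 0.91

Only high-confidence detections reach the save and alert nodes, which keeps compute cost down and false positives out of the alert path.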

Datasource Reference Pages

Use these pages for field-by-field setup guidance: