Datasources

Datasources are the ingestion layer of SolutionEngine: they connect external systems to workflows and convert incoming events into executable workflow input.

Without a datasource, production workflows have no live input path.

In production, treat every datasource as an operational service. It must be configured, monitored, and validated like any other runtime component.


Role in the Execution Pipeline

A datasource is responsible for four things:

  • Listening to an external source continuously or at intervals
  • Validating or normalizing incoming payloads
  • Emitting events to the workflow runtime
  • Preserving project isolation for every emitted event

End-to-end flow:

External Source -> Datasource -> Workflow Trigger Node -> Processing Graph -> Output
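
As a rough illustration, this contract can be sketched as a minimal interface. All names below (Datasource, WorkflowEvent, emit) are hypothetical, not part of SolutionEngine's API:

    # Hypothetical sketch of the four datasource responsibilities.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class WorkflowEvent:
        project_id: str          # preserves project isolation per event
        payload: dict[str, Any]  # normalized workflow input

    class Datasource:
        def __init__(self, project_id: str,
                     emit: Callable[[WorkflowEvent], None]):
            self.project_id = project_id
            self.emit = emit  # hands events to the workflow trigger node

        def normalize(self, raw: dict[str, Any]) -> dict[str, Any]:
            # Validate/normalize the payload before it reaches workflows.
            if "data" not in raw or not isinstance(raw["data"], dict):
                raise ValueError("payload must contain a 'data' object")
            return raw["data"]

        def on_raw_event(self, raw: dict[str, Any]) -> None:
            # Listen -> validate -> emit, always tagged with the project.
            self.emit(WorkflowEvent(self.project_id, self.normalize(raw)))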

Scope and Ownership

Datasource resources are project-scoped.

Key rules:

  • A datasource belongs to one project
  • It can trigger one or more workflows within that same project
  • It cannot trigger workflows in other projects
  • Its runtime status is tracked independently

Typical statuses:

  • inactive
  • active
  • error

Datasource Configuration Model

All datasource types follow the same design contract:

  • Connection details: endpoint, topic, URL, path, broker
  • Access details: auth, tokens, credentials, secrets
  • Input behavior: polling interval, subscription mode, frame/sample rate
  • Recovery behavior: reconnect delay, retries, timeout handling

Even when protocol details differ, operational behavior remains consistent: receive event, create workflow input, trigger execution.
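
As a concrete illustration, a polling-style configuration might group those four concerns as below. Field names here are hypothetical; use the datasource reference pages for the exact schemas:

    config = {
        # Connection details
        "url": "https://api.example.com/orders",
        # Access details (resolved from a secret store, never inlined)
        "auth_token_secret": "ORDERS_API_TOKEN",
        # Input behavior
        "poll_interval_seconds": 30,
        # Recovery behavior
        "timeout_seconds": 10,
        "max_retries": 3,
        "reconnect_delay_seconds": 5,
    }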


Core Datasource Types

HTTP Webhook

Push-based ingestion from external services.

Use when:

  • Third-party systems send event callbacks
  • You need near real-time API-to-workflow triggering

Common config:

  • endpoint path
  • authentication method
  • payload format expectations

Operational notes:

  • Validate request shape early
  • Add signature/token validation for public endpoints (see the sketch below)
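
A minimal sketch of that signature check: it assumes a hex-encoded HMAC-SHA256 of the raw request body, a common webhook signing scheme, though the exact header and algorithm vary by provider:

    import hashlib
    import hmac

    def verify_signature(raw_body: bytes, header_signature: str,
                         secret: bytes) -> bool:
        expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
        # compare_digest resists timing attacks on the comparison
        return hmac.compare_digest(expected, header_signature)

Reject the request before any workflow is triggered if this check fails.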

HTTP Polling (HTTP GET)

Interval-based pull from an API endpoint.

Use when:

  • Source system cannot push events
  • You need periodic synchronization

Common config:

  • URL
  • poll interval
  • headers and auth

Operational notes:

  • Avoid aggressive polling in production
  • Include timeout and retry controls (sketched below)
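
A polling loop with an explicit timeout and bounded retries might look like this; the URL and intervals are placeholders, to be tuned to the source API's rate limits:

    import time
    import requests

    URL = "https://api.example.com/items"
    POLL_INTERVAL = 60   # seconds; avoid aggressive polling in production
    TIMEOUT = 10         # seconds; always set an explicit request timeout
    MAX_RETRIES = 3

    def poll_once():
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                resp = requests.get(URL, timeout=TIMEOUT)
                resp.raise_for_status()
                return resp.json()
            except requests.RequestException:
                time.sleep(2 ** attempt)  # exponential backoff
        return None  # surface as an ingestion error after retries

    while True:
        event = poll_once()
        if event is not None:
            pass  # hand the payload to the workflow trigger here
        time.sleep(POLL_INTERVAL)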

HTTP Event Stream (SSE)

Persistent event stream ingestion.

Use when:

  • Source provides server-sent events
  • You need long-lived low-latency event intake

Common config:

  • stream URL
  • auth mode
  • reconnect delay

Operational notes:

  • Always configure reconnect behavior (see the sketch below)
  • Log stream disconnect reasons
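
A reconnecting SSE reader can be sketched with plain line streaming; a dedicated SSE client library works equally well. The URL and delay are placeholders:

    import time
    import requests

    STREAM_URL = "https://events.example.com/stream"
    RECONNECT_DELAY = 5  # seconds between reconnect attempts

    while True:
        try:
            with requests.get(STREAM_URL, stream=True,
                              timeout=(5, 60)) as resp:
                resp.raise_for_status()
                for line in resp.iter_lines():
                    if line.startswith(b"data:"):
                        payload = line[len(b"data:"):].strip()
                        # forward payload to the workflow trigger here
        except requests.RequestException as exc:
            print(f"stream disconnected: {exc}")  # log the disconnect reason
        time.sleep(RECONNECT_DELAY)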

MQTT Topic

Low-overhead pub/sub ingestion for IoT and telemetry.

Use when:

  • Devices publish small, frequent payloads
  • You need topic-based routing

Common config:

  • broker URL
  • topic
  • QoS
  • client credentials

Operational notes:

  • Use deterministic topic naming
  • Separate production and test topics
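
A subscription sketch using paho-mqtt (1.x callback signatures; the broker, topic, and credentials are placeholders):

    import paho.mqtt.client as mqtt

    BROKER = "broker.example.com"
    TOPIC = "prod/site-a/telemetry"  # deterministic, environment-prefixed

    def on_connect(client, userdata, flags, rc):
        client.subscribe(TOPIC, qos=1)  # re-subscribe on every (re)connect

    def on_message(client, userdata, msg):
        # msg.payload is bytes; validate before triggering a workflow
        print(msg.topic, msg.payload)

    client = mqtt.Client()  # paho-mqtt 1.x style constructor
    client.username_pw_set("datasource-user", "token-from-secret-store")
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(BROKER, 1883, keepalive=60)
    client.loop_forever()  # blocks and handles automatic reconnects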

RTSP Camera Stream

Video ingestion from cameras and NVRs.

Use when:

  • Building computer vision workflows
  • Streaming live visual data

Common config:

  • RTSP URL
  • FPS sampling
  • optional camera credentials

Operational notes:

  • Start with lower FPS to control compute cost (see the sketch below)
  • Persist critical frames to buckets for incident replay
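
A frame-sampling sketch with OpenCV, starting at a deliberately low sample rate (the URL and rate are placeholders):

    import time
    import cv2

    RTSP_URL = "rtsp://user:pass@camera.example.com/stream1"
    SAMPLE_FPS = 1.0  # start low; raise only if the use case demands it

    cap = cv2.VideoCapture(RTSP_URL)
    last_emit = 0.0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break  # stream dropped; reconnect logic would go here
        now = time.monotonic()
        if now - last_emit >= 1.0 / SAMPLE_FPS:
            last_emit = now
            # hand the sampled frame to the workflow trigger; persist
            # critical frames to a bucket for incident replay
    cap.release()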

File Stream

Replays stored media as event input.

Use when:

  • Testing with historical footage
  • Repeatable model validation

Common config:

  • bucket/path reference
  • playback FPS
  • looping options

Operational notes:

  • Keep benchmark datasets immutable
  • Use fixed playback settings for regression tests (sketched below)
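
A replay sketch that plays a stored file at a fixed FPS, with optional looping (the path and settings are placeholders):

    import time
    import cv2

    PATH = "benchmarks/incident-2024.mp4"  # immutable benchmark asset
    PLAYBACK_FPS = 5.0
    LOOP = True

    while True:
        cap = cv2.VideoCapture(PATH)
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of file
            # feed the frame to the workflow under test here
            time.sleep(1.0 / PLAYBACK_FPS)  # fixed pacing, repeatable runs
        cap.release()
        if not LOOP:
            break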

Selecting the Right Datasource

Use this decision guide:

  • Need push events from apps: HTTP Webhook
  • Need periodic sync from APIs: HTTP Polling
  • Need real-time IoT feed: MQTT
  • Need camera analytics: RTSP
  • Need replay/testing: File Stream
  • Need long-lived event feed: HTTP Event Stream

Runtime Reliability Guidelines

To run datasources safely in production:

  • Define timeout values explicitly
  • Configure retry/reconnect strategy
  • Validate incoming payload schema near ingress
  • Capture critical ingestion errors in logs
  • Monitor datasource status and event throughput

Do not move malformed payload handling deep into the workflow. Reject or normalize as close to ingress as possible.
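
An ingress-side check can be as simple as verifying required fields before the event enters the graph; the field names below are illustrative:

    REQUIRED_FIELDS = {"device_id", "timestamp", "data"}

    def validate_payload(payload: dict) -> dict:
        missing = REQUIRED_FIELDS - payload.keys()
        if missing:
            # reject at ingress; log and count as an ingestion error
            raise ValueError(f"malformed payload, missing: {sorted(missing)}")
        return payload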


Security Guidelines

  • Never hardcode secrets in datasource config
  • Prefer managed secrets or environment variables
  • Restrict webhook exposure and apply auth validation
  • Use least-privilege credentials for brokers and APIs
  • Rotate credentials on a schedule
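
For example, a credential can be resolved from the environment rather than inlined in config (the variable name is a placeholder):

    import os

    token = os.environ.get("ORDERS_API_TOKEN")
    if token is None:
        raise RuntimeError("ORDERS_API_TOKEN is not set")  # fail fast at startup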

Performance and Cost Controls

  • Use sampling for high-volume RTSP streams
  • Filter early in trigger nodes to reduce unnecessary execution
  • Avoid over-polling external APIs
  • Use edge environments for latency-sensitive camera ingestion
  • Benchmark throughput before scaling to full production traffic

Troubleshooting Checklist

If a datasource's status is error, validate in this order:

  1. Connectivity: host, port, DNS, firewall
  2. Auth: token validity, username/password, permissions
  3. Payload: valid JSON/object structure
  4. Rate and timeout settings
  5. Workflow trigger mapping and environment health

Common symptoms:

  • No events: wrong endpoint/topic/path or auth failure
  • Intermittent events: unstable network or reconnect misconfiguration
  • High runtime load: over-sampling, over-polling, missing filtering

Production Pattern Example

RTSP Datasource -> Datasource Trigger -> Pre-Validation -> Run Model
               -> Confidence Filter -> Save Media -> Alert API

Why this pattern is stable:

  • Ingress is isolated
  • Validation protects downstream nodes
  • Filtering reduces false positives and cost
  • Output path supports traceability and alerting
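
A sketch of the confidence-filter step in this pattern; the threshold and detection fields are illustrative:

    CONFIDENCE_THRESHOLD = 0.8

    def passes_filter(detection: dict) -> bool:
        return detection.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD

    detections = [{"label": "person", "confidence": 0.91},
                  {"label": "person", "confidence": 0.42}]
    alerts = [d for d in detections if passes_filter(d)]  # keeps only 0.91

Only high-confidence detections reach the save and alert nodes, which keeps compute cost down and false positives out of the alert path.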

Datasource Reference Pages

Use these pages for field-by-field setup guidance: