What goes into running a Chainlink Data Streams node

On April 12, Chainlink Data Streams started carrying near-real-time pricing for U.S. stocks and ETFs on a 24/5 basis. That moves the service past crypto-only pricing into tokenized traditional finance. The infrastructure behind that service is not abstract. Someone runs it.

This post is about what that work actually involves.

What Data Streams is, operationally

Most oracle descriptions stop at "delivers prices on-chain". That is the product view. The operations view is different.

Data Streams is a pull-based oracle. Reports are generated continuously in a permissioned Data DON (decentralized oracle network). Subscribers fetch them on demand when they need them for a transaction. Each report is signed by the operators in the DON and carries cryptographic proofs that consuming contracts verify before acting.
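
A minimal sketch of the verify-before-acting step, assuming a toy four-operator set with a three-signature quorum. HMAC with shared secrets stands in for the asymmetric signatures a real DON uses. The operator names, keys, quorum size, and report payload are all hypothetical, and none of this reflects the actual Data Streams report format or verifier contract.

```python
import hashlib
import hmac

# Hypothetical operator keys. A real DON uses asymmetric key pairs, not shared secrets.
OPERATOR_KEYS = {
    "op-a": b"key-a",
    "op-b": b"key-b",
    "op-c": b"key-c",
    "op-d": b"key-d",
}
QUORUM = 3  # assumed threshold for this toy four-operator set


def sign(operator: str, report: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 over the report bytes."""
    return hmac.new(OPERATOR_KEYS[operator], report, hashlib.sha256).digest()


def verify_report(report: bytes, signatures: dict[str, bytes]) -> bool:
    """Accept the report only if a quorum of known operators signed it."""
    valid = 0
    for operator, signature in signatures.items():
        key = OPERATOR_KEYS.get(operator)
        if key is None:
            continue  # unknown signer, does not count toward the quorum
        expected = hmac.new(key, report, hashlib.sha256).digest()
        if hmac.compare_digest(expected, signature):
            valid += 1
    return valid >= QUORUM
```

The consumer pulls a report, checks the signatures against the known operator set, and only then uses the price. A tampered payload or a report with too few signers is rejected.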

Two implications follow from that design. First, report generation happens whether or not anyone pulls a report. Operators carry the full cost of running the DON around the clock. Second, pull-based retrieval removes the latency tax of waiting for the next scheduled on-chain update. Latency-sensitive use cases (perpetuals, derivatives, tokenized equities) adopted it quickly.

What the node runs

A Data Streams node is a Chainlink node with additional binaries and a different workload profile. The core pieces:

OCR (off-chain reporting) as the consensus mechanism between operators. A round gathers observations, aggregates them, and produces a signed report. OCR is designed to complete rounds in under a second when operators are well connected and coordination overhead is light.
Data Streams-specific components on top of the standard Chainlink node. These handle report formatting, Mercury transport, and signature generation.
Connections to data providers. OCR nodes consume price data from multiple upstream feed providers to form their observations. The LinkPool OCR deployments consume feeds from dxFeed, Framework, Tiingo, XBTO, Bitscrunch, Therundown, and Enet. These are data sources the nodes pull from to produce signed observations. They are not commercial clients of LinkPool. The distinction matters because conflating the two misrepresents both parties.
Tight clock synchronisation across the DON. OCR rounds depend on operators agreeing on time within a small tolerance. Drift breaks rounds.
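
The gather-and-aggregate step, and why clock drift matters to it, can be sketched as follows. This is an illustrative model only: the median aggregation, the 0.5-second skew tolerance, and the minimum observation count are assumptions made for the example, not OCR's actual protocol parameters.

```python
from dataclasses import dataclass
from statistics import median

MAX_SKEW_SECONDS = 0.5  # hypothetical tolerance, not an OCR constant


@dataclass
class Observation:
    operator: str
    price: float
    timestamp: float  # operator's local clock, in seconds


def aggregate_round(observations, round_time, min_observations=3):
    """Return the median price of in-tolerance observations, or None if too few remain."""
    fresh = [
        o for o in observations
        if abs(o.timestamp - round_time) <= MAX_SKEW_SECONDS
    ]
    if len(fresh) < min_observations:
        return None  # the round fails: not enough operators agree on the time
    return median(o.price for o in fresh)
```

In this model, one operator with a drifted clock is simply dropped from the round. Two drifted operators out of four leave too few observations, and the round produces no report at all.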

What the infrastructure has to do

The requirements are unforgiving in three places, and each one translates directly to hardware and network design.

Network latency to relayers and peers. OCR round time is bounded by the slowest operator's round-trip to its peers. An operator on shared public cloud infrastructure carries hypervisor overhead of roughly 0.3 to 1.2 milliseconds per hop. On top of that, the host's noisy neighbours add their own jitter. That does not sound like much until it compounds across a coordination loop that runs many times per second.
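
As a rough worked example of that compounding, the sketch below applies the per-hop overhead range quoted above to a simplified model in which a round waits on its slowest operator. The base round-trip times and the four hops per round are hypothetical numbers chosen for illustration.

```python
HYPERVISOR_OVERHEAD_MS = (0.3, 1.2)  # per-hop range quoted in the text


def with_hypervisor_overhead(base_rtt_ms: float, hops: int, per_hop_ms: float) -> float:
    """Round-trip time for an operator that pays per-hop hypervisor overhead."""
    return base_rtt_ms + hops * per_hop_ms


def round_time_ms(peer_rtts_ms: list[float]) -> float:
    """Simplified model: the round is bounded by the slowest operator."""
    return max(peer_rtts_ms)


# Three operators on dedicated hardware (hypothetical RTTs), one on shared cloud.
dedicated = [2.0, 2.1, 1.9]
best_case = round_time_ms(dedicated + [with_hypervisor_overhead(2.0, 4, HYPERVISOR_OVERHEAD_MS[0])])
worst_case = round_time_ms(dedicated + [with_hypervisor_overhead(2.0, 4, HYPERVISOR_OVERHEAD_MS[1])])
```

Under these assumptions the cloud operator lifts the round from about 2.1 ms to somewhere between 3.2 ms and 6.8 ms, before any noisy-neighbour jitter is counted.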

Storage throughput. Each operator retains a local history of signed reports, observation data, and OCR state. On busy feeds the write load is continuous. The read pattern is latency-sensitive when the node is catching up after a restart or handling an integrity check. Gen5 NVMe with a dedicated I/O path handles it without drama. Shared block storage usually does not.

Uptime. A node that misses OCR rounds loses rewards. More importantly, it degrades the DON's quality score and makes the feed less reliable for everyone consuming it. The target at LinkPool is 99.99%, anchored in failover across three availability zones in Manchester. etcd quorum survives a single zone going dark. Cilium BGP reconverges in under ten seconds after a spine failure, based on the hold timer configuration. MLAG at every leaf pair means a switch reload is a non-event for the pods running above it.
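
Two pieces of arithmetic sit behind those numbers: the downtime budget implied by an availability target, and the majority quorum rule etcd uses. The sketch below works through both. The one-member-per-zone layout is an assumption made for the example, not a description of the actual cluster.

```python
def downtime_budget_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of allowed downtime over the period for a given availability target."""
    return (1 - availability_pct / 100) * days * 24 * 60


def quorum_size(members: int) -> int:
    """etcd needs a strict majority of members to accept writes."""
    return members // 2 + 1


def survives_zone_loss(members_per_zone: list[int]) -> bool:
    """True if quorum holds after losing the zone with the most members."""
    total = sum(members_per_zone)
    remaining = total - max(members_per_zone)
    return remaining >= quorum_size(total)
```

A 99.99% target leaves roughly 52.6 minutes of downtime per year. With one etcd member in each of three zones, losing any zone leaves two of three members, which still meets the quorum of two. Placing two members in one zone and one in another would not survive the loss of the larger zone.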

These three requirements are why Data Streams is not a cloud-native workload in the usual sense. Burstable CPU, shared network interfaces, and overcommitted memory all trade tail latency for price. Oracle operations pay for tail latency in missed rounds. The trade does not work.

The same coordination argument as DVT

Running Chainlink OCR feels structurally similar to running distributed validator technology clusters. Both are threshold-signing systems where a fixed set of operators has to agree on a signed message within a deadline. Both degrade rather than fail cleanly when one operator is slow. Both punish hypervisor overhead and reward consistent network latency.

The operational principle is the same in both cases: the weakest operator sets the ceiling. In a four-operator DVT cluster, imagine three operators on owned infrastructure and one on burstable cloud. The cluster's attestation timing tracks the cloud operator. OCR rounds work the same way. There is no reward for being the fastest operator. There is a cost for being the slowest.

For a deeper look at the same pattern in the staking context, see "What is distributed validator technology?"

What LinkPool runs

By category: Chainlink OCR nodes, CCIP lanes, Data Streams (Mercury), Automation, and Keepers. All on owned infrastructure across three availability zones in Manchester. All on the same dedicated Kubernetes platform that runs our validator work, our RPC endpoints, and our DVT clusters.

We do not publish the exact namespace counts or chain counts. The surface covers the major EVM and non-EVM networks, and has done so for years.

For the architecture pattern behind the platform, see "Bare-metal Kubernetes hosting: cloud vs dedicated infrastructure".

Why the TradFi expansion matters operationally

The April 12 expansion into U.S. stocks and ETFs on a 24/5 basis changes the duty cycle. Crypto markets never close, but crypto-only pricing is forgiving about brief degradation during low-volume windows. Equity pricing is not. During a U.S. cash session, any oracle downtime directly affects tokenized equity settlement and ETF share pricing. It also affects the automated strategies that trade on those prices.

That raises the stakes on uptime without changing the math on infrastructure. The same dedicated hardware, three-zone failover, and target SLA that carry crypto feeds will carry equity feeds. They just carry more financial consequence per minute.

This is critical financial infrastructure, not a side-car to validator ops.

Common questions

What are the hardware requirements for a Chainlink Data Streams node?

Dedicated infrastructure with guaranteed QoS. The workload requires consistent CPU access without throttling, low-latency network paths, and NVMe storage with no I/O contention. Shared cloud with burstable CPU or shared network interfaces introduces jitter that violates the sub-second latency requirement.

Can Data Streams and price feed nodes share a cluster?

No. Price feeds tolerate a 15-second heartbeat gap. Data Streams runs sub-second with no jitter budget. Sharing a cluster means the Data Streams workload competes for CPU and network with a workload that has fundamentally different latency tolerances.

What is the difference between Chainlink price feeds and Data Streams?

Price feeds deliver on-chain aggregated data on a heartbeat or deviation threshold. Data Streams delivers verifiable low-latency market data off-chain for protocols that need sub-second price updates, such as perpetuals, options, and order books. Different SLA, different operational posture.
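
The heartbeat-or-deviation trigger on the price feed side can be sketched in a few lines. The 15-second heartbeat matches the gap mentioned above. The 0.5% deviation threshold is a hypothetical value for illustration, and the function name is not taken from any Chainlink API.

```python
def should_push_update(
    last_price: float,
    new_price: float,
    seconds_since_update: float,
    deviation_threshold: float = 0.005,  # hypothetical 0.5% threshold
    heartbeat_seconds: float = 15.0,     # heartbeat gap mentioned in the text
) -> bool:
    """Push a new on-chain value if the price moved enough or the heartbeat elapsed."""
    deviation = abs(new_price - last_price) / last_price
    return deviation >= deviation_threshold or seconds_since_update >= heartbeat_seconds
```

A small move inside the heartbeat window produces no update, which is the gap a push-based feed tolerates. A pull-based consumer instead fetches the latest signed report at the moment it needs one.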

Question for operators

What do your oracle node operations look like today? Dedicated hardware with a defined failover zone, or shared capacity with cloud-bounded SLA ceilings?
