Whenever a stakeholder says they want data in real time, you shouldn't default to dreams of Flink, Kafka, and watermarks.
You should clarify with precision what an acceptable amount of latency is for this use case.
Many times when a stakeholder asks for real-time data, it can be solved with an hourly batch pipeline. The incremental benefit to jump from hourly batch to streaming isn't worth it because it impacts the homogeneity of your suite of pipelines and makes the overhead maintenance much higher!
Sometimes stakeholders say real-time and what they mean is âpredictable refresh rates.â This is a sign you need to do better as a data engineer at setting SLAs for your pipelines about when they'll refresh.
#dataengineering