Real Latency vs Perceived Latency in GenAI Systems

Raw latency and perceived latency are different engineering problems. Production GenAI systems feel fast when they expose progress early, overlap backend work, and avoid silent waiting.

By Cyril Noirot

May 13, 2026

8 min read

The answer is that raw latency and perceived latency are different engineering problems.

Raw latency is backend execution time. It includes retrieval, reranking, orchestration, model inference, queueing, serialization, and network overhead.
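One way to make this breakdown visible is to time each stage separately and sum the results. The sketch below is a minimal illustration, not the author's instrumentation: `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls merely stand in for real retrieval and inference work.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per backend stage."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so the timer can be tested deterministically
        self.stages: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Add to any earlier measurement so a stage entered twice is summed.
            elapsed = self._clock() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    @property
    def raw_latency(self) -> float:
        """Total backend time, assuming the stages run one after another."""
        return sum(self.stages.values())


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("retrieval"):
        time.sleep(0.01)  # placeholder for a vector-store lookup
    with timer.stage("inference"):
        time.sleep(0.02)  # placeholder for a model call
    print(timer.stages, timer.raw_latency)
```

Summing the stages only equals end-to-end latency when the stages are sequential. Once backend work overlaps, the sum overstates what the user actually waits for.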

A silent 1.5-second wait can feel worse than a streamed 2.5-second answer because the first interface creates uncertainty and the second creates progress.
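The comparison can be sketched by tracking two numbers per response: the time until the user sees anything, and the time until the answer is complete. The example below simulates both interfaces with a fake clock so it runs instantly. The 1.5 s and 2.5 s totals come from the text above; the 0.5 s delay before the first streamed chunk, the token list, and all function names are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class LatencyReport:
    first_output_s: float  # time until the user sees anything
    total_s: float         # time until the full answer has arrived
    chunks: int


class FakeClock:
    """Deterministic clock: time moves only when advance() is called."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def measure(chunks: Iterable[str],
            clock: Callable[[], float] = time.perf_counter) -> LatencyReport:
    """Consume a response and record first-output and total time."""
    start = clock()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock() - start
        count += 1
    total = clock() - start
    return LatencyReport(first if first is not None else total, total, count)


def silent_response(clock: FakeClock, total_s: float, text: str) -> Iterator[str]:
    """Nothing is shown until the whole answer is ready."""
    clock.advance(total_s)
    yield text


def streamed_response(clock: FakeClock, first_delay_s: float, total_s: float,
                      tokens: List[str]) -> Iterator[str]:
    """The first token arrives early; the rest are spread over the remaining time."""
    gap = (total_s - first_delay_s) / max(len(tokens) - 1, 1)
    for i, token in enumerate(tokens):
        clock.advance(first_delay_s if i == 0 else gap)
        yield token


silent_clock = FakeClock()
silent = measure(silent_response(silent_clock, 1.5, "full answer"), silent_clock)

stream_clock = FakeClock()
tokens = ["Raw", "latency", "is", "backend", "time"]
streamed = measure(streamed_response(stream_clock, 0.5, 2.5, tokens), stream_clock)

print(silent)
print(streamed)
```

Under these assumptions the streamed response loses on total time (2.5 s against 1.5 s) but produces its first output a full second earlier, and that earlier signal is what the user reacts to.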

This distinction matters because teams often optimize the wrong number. They focus only on total execution time, while the user is reacting to something more specific:

About the author

Cyril Noirot

Lead Data Scientist

Freelance data scientist. I design and deploy decision systems: forecasting, pricing, marketing measurement, optimization.
