Real Latency vs Perceived Latency in GenAI Systems

Raw latency and perceived latency are different engineering problems. Production GenAI systems feel fast when they expose progress early, overlap backend work, and avoid silent waiting.

May 13, 2026

Every production GenAI system eventually runs into the same uncomfortable question:

The answer is that raw latency and perceived latency are different engineering problems.

Raw latency is backend execution time. It includes retrieval, reranking, orchestration, model inference, queueing, serialization, and network overhead.
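To make that breakdown concrete, here is a minimal sketch of per-stage timing. The stage names follow the list above, but the `timed` helper, the `handle_request` function, and the sleep durations are hypothetical stand-ins for real backend work, not measurements from any actual system.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of one backend stage (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed


def handle_request() -> Dict[str, float]:
    """Simulate one request; each sleep stands in for real work."""
    timings: Dict[str, float] = {}
    with timed("retrieval", timings):
        time.sleep(0.02)   # assumed duration, for illustration only
    with timed("reranking", timings):
        time.sleep(0.01)   # assumed duration, for illustration only
    with timed("inference", timings):
        time.sleep(0.05)   # assumed duration, for illustration only
    return timings


breakdown = handle_request()
raw_latency = sum(breakdown.values())  # raw latency as the sum of sequential stages

for stage, seconds in breakdown.items():
    print(f"{stage:<10} {seconds * 1000:6.1f} ms")
print(f"{'total':<10} {raw_latency * 1000:6.1f} ms")
```

A breakdown like this only describes sequential stages. Once stages overlap, the sum overstates the wall-clock time, so the end-to-end duration has to be measured separately.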

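Perceived latency is how long the wait feels to the user: the time until the interface shows that something is happening.

The two are related, but they are not the same metric. A silent 1.5-second wait can feel worse than a streamed 2.5-second answer because the first interface creates uncertainty and the second creates progress.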
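The comparison above can be sketched by measuring two numbers per response: time to first output and total time. The example below is a simulation under stated assumptions. `fake_stream` and `measure` are hypothetical helpers, and the delays are the 1.5 s and 2.5 s figures scaled down by ten so the script runs quickly; the 0.03 s first-token delay is an arbitrary choice.

```python
import asyncio
import time
from typing import AsyncIterator, Tuple


async def fake_stream(n_chunks: int, first_delay: float, gap: float) -> AsyncIterator[str]:
    """Hypothetical response stream: wait, emit a chunk, then emit the rest at a fixed gap."""
    await asyncio.sleep(first_delay)
    for i in range(n_chunks):
        if i > 0:
            await asyncio.sleep(gap)
        yield f"chunk-{i}"


async def measure(stream: AsyncIterator[str]) -> Tuple[float, float]:
    """Return (time to first chunk, total time) for a stream, in seconds."""
    start = time.perf_counter()
    first = None
    async for _ in stream:
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return (first if first is not None else total), total


async def main() -> Tuple[Tuple[float, float], Tuple[float, float]]:
    # Silent interface: nothing is shown until the whole answer arrives (~0.15 s).
    silent = await measure(fake_stream(n_chunks=1, first_delay=0.15, gap=0.0))
    # Streamed interface: first chunk at ~0.03 s, last chunk at ~0.25 s.
    streamed = await measure(fake_stream(n_chunks=12, first_delay=0.03, gap=0.02))
    return silent, streamed


(silent_first, silent_total), (streamed_first, streamed_total) = asyncio.run(main())
print(f"silent:   first output {silent_first:.3f}s, total {silent_total:.3f}s")
print(f"streamed: first output {streamed_first:.3f}s, total {streamed_total:.3f}s")
```

In this simulation the streamed response has the higher total time but a much lower time to first output, which is the gap between raw and perceived latency that the paragraph describes.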

About the author

Cyril Noirot

Lead Data Scientist

Freelance data scientist. I design and deploy decision systems: forecasting, pricing, marketing measurement, optimization.
