Network Jitter and Delay
Real-time voice communications are sensitive to delay and to variation in packet arrival times. Codecs require a steady, dependable stream of packets to provide reasonable playback quality. Packets arriving too early, too late, or out of sequence result in jerky, jumbled playback. This variation in packet arrival times is called jitter.
Because no network can guarantee a perfectly steady stream of packets under real-world conditions, VoIP phones use jitter buffers to smooth out the kinks. A jitter buffer is simply a First-In, First-Out (FIFO) memory cache that collects packets as they arrive, forwarding them to the codec evenly spaced and in proper sequence for accurate playback.
While a jitter buffer can successfully mask mild delay and jitter problems, severe jitter can overwhelm it, resulting in packet loss (see below). Increasing the size of the jitter buffer can help, but only to a point: a jitter buffer that pushes overall round-trip delay to 300 ms will make normal conversation difficult.
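To make the mechanism concrete, here is a minimal jitter buffer sketch in Python. It is an illustration only: the class and field names are hypothetical, and a real phone implements this in firmware.

    import heapq

    class JitterBuffer:
        # Minimal FIFO jitter buffer: hold a few packets, then release
        # them to the codec in sequence order.
        def __init__(self, depth=4):
            self.depth = depth   # packets held before playout begins
            self.heap = []       # (sequence number, payload) pairs

        def push(self, seq, payload):
            # Packets may arrive early, late, or out of order; the heap
            # re-sequences them by RTP sequence number.
            heapq.heappush(self.heap, (seq, payload))

        def pop(self):
            # Release the oldest buffered packet once the buffer is
            # primed, giving the codec an evenly spaced, correctly
            # ordered stream.
            if len(self.heap) >= self.depth:
                return heapq.heappop(self.heap)
            return None          # still priming, or an underrun

Each extra packet of depth absorbs more jitter but adds one packet interval (typically 20 ms) of one-way delay, which is why an oversized buffer pushes round-trip delay toward the 300 ms limit noted above.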
 
Figure 56: Jitter buffering and packet loss concealment
 
VoIP packets can arrive at a receiving phone out of sequence, late, or not at all. IP phones use a jitter buffer to reconstruct the packet stream at the receiving end, duplicating missing packets or filling the gaps with comfort noise (low-level white noise) when necessary.
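As a rough sketch of this concealment step (the function and parameter names are hypothetical), the receiver can replay the previous packet's audio or substitute low-level noise when it detects a gap:

    import random

    def conceal_gap(last_payload, samples=160):
        # Prefer replaying the previous packet's audio; 160 samples is
        # one 20 ms frame at an 8,000 Hz sampling rate.
        if last_payload is not None:
            return last_payload
        # Otherwise synthesize low-amplitude white noise as 16-bit PCM
        # samples, so the listener hears a steady background rather
        # than a click or dead silence.
        return [random.randint(-64, 64) for _ in range(samples)]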
Differences between Jitter (RTP) and Jitter (ms)
Jitter is calculated in two different ways: the first uses Real-time Transport Protocol (RTP) time stamps, and the second uses packet capture time stamps. When jitter is expressed in milliseconds (ms), the calculation is made using packet time stamps rather than RTP time stamps.
 
Observer performs this calculation on every packet. Jitter max is the highest jitter value observed, while Jitter itself is the most recently calculated value. Note that at least 16 packets must be seen before Observer displays a jitter value.
 
 
Observer displays values from both jitter calculation methods. Here is how Observer computes the numbers:
1. Translate the packet time stamp to RTP time units.
packet rtp-based time = ((packet time stamp in ns) / 1000) * (sampling rate in Hz) / 1,000,000
2. This time is used to calculate the difference between this packet and the previous packet.
packet time difference = (packet rtp-based time) - (previous packet rtp-based time)
3. The jitter is then calculated from these RTP-based times, exactly as the RTP specification (RFC 3550) describes.
jitter = jitter + ((packet time difference) - jitter) / 16.0
4. For the jitter in milliseconds (ms), the time difference is converted from RTP units using the sampling rate.
jitter_ms = jitter_ms + ((1000 * packet time difference) / (sampling rate) - jitter_ms) / 16.0
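Worked through with numbers: for an 8,000 Hz codec, a packet captured at 20,000,000 ns translates in step 1 to ((20,000,000 / 1000) * 8000) / 1,000,000 = 160 RTP time units, which is one 20 ms packet interval.

The following sketch strings the four steps together in Python; the names are hypothetical, and Observer's internal bookkeeping is not published. It also models the display behavior described above: the latest value, the running maximum, and the 16-packet minimum.

    class JitterTracker:
        # Per-stream jitter estimator following steps 1-4 above.
        MIN_PACKETS = 16                  # no value reported before this

        def __init__(self, sampling_rate_hz):
            self.rate = sampling_rate_hz  # e.g. 8000 for G.711
            self.prev_rtp_time = None
            self.jitter = 0.0             # latest value, RTP time units
            self.jitter_ms = 0.0          # latest value, milliseconds
            self.jitter_max = 0.0
            self.count = 0

        def update(self, timestamp_ns):
            # Step 1: translate the capture time stamp to RTP time units.
            rtp_time = (timestamp_ns / 1000.0) * self.rate / 1000000.0
            if self.prev_rtp_time is not None:
                # Step 2: difference from the previous packet.
                diff = rtp_time - self.prev_rtp_time
                # Step 3: smooth with the 1/16 gain from the RTP spec.
                self.jitter += (diff - self.jitter) / 16.0
                # Step 4: the same smoothing, converted to milliseconds.
                self.jitter_ms += (1000.0 * diff / self.rate - self.jitter_ms) / 16.0
                self.jitter_max = max(self.jitter_max, self.jitter)
            self.prev_rtp_time = rtp_time
            self.count += 1

        def value(self):
            # Mirror the display rule: nothing until 16 packets are seen.
            return self.jitter if self.count >= self.MIN_PACKETS else None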