Wireshark Basics – Complete Beginner to Advanced Guide
CHAPTER 15 Beginner

VoIP and Streaming Analysis

Updated: May 16, 2026

1. Introduction

Modern enterprise networks are no longer just for email and spreadsheets. They are dominated by real-time media: Zoom calls, Microsoft Teams meetings, Netflix streams, and corporate IP phones. As we learned in Chapter 8, interactive voice and video traffic relies heavily on UDP for speed. However, troubleshooting a choppy phone call or a buffering video requires a completely different approach from reading an HTTP web page. In this chapter, we will use Wireshark's telecommunications tools to separate the "Signaling" (SIP) from the "Media Payload" (RTP), analyze audio jitter, and actually listen to recorded VoIP calls.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Differentiate between SIP (Session Initiation Protocol) and RTP (Real-time Transport Protocol).
  • Filter and analyze a VoIP call setup process using SIP.
  • Identify RTP packets carrying the actual encoded audio/video payload.
  • Use Wireshark's "VoIP Calls" analysis tool to map a conversation.
  • Explain how jitter and packet loss degrade real-time audio quality.

3. Beginner-friendly Explanations

The Telephone Switchboard (SIP vs. RTP): Imagine making a traditional phone call.
  1. SIP (The Operator): You dial a number. The operator connects your line to your friend's line. The operator handles the ringing, the busy signals, and the hanging up. Once the connection is made, the operator steps away.
  2. RTP (The Voices): This is the actual sound of your voice traveling down the copper wire directly to your friend.

In VoIP networking, SIP (Session Initiation Protocol) handles the text-based setup and teardown of the call. RTP (Real-time Transport Protocol) carries the heavy, continuous stream of encoded audio (and video) data, typically over UDP.
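To put "heavy, continuous stream" into numbers, here is a back-of-the-envelope calculation written as a small Python sketch. It assumes the G.711 codec (8,000 samples per second, 8 bits per sample), a 20 ms packetization interval, and IPv4 without header compression; Ethernet framing overhead is ignored. The function name `voip_stream_rate` is purely illustrative.

```python
def voip_stream_rate(sample_rate_hz=8000, bits_per_sample=8, ptime_ms=20):
    """Estimate packets/s, bytes per packet, and IP-layer kbit/s for ONE direction of a call."""
    samples_per_packet = sample_rate_hz * ptime_ms // 1000     # 160 samples for G.711 at 20 ms
    payload_bytes = samples_per_packet * bits_per_sample // 8  # 160 bytes of encoded audio
    rtp_udp_ipv4_headers = 12 + 8 + 20                         # fixed RTP + UDP + IPv4 headers
    packet_bytes = payload_bytes + rtp_udp_ipv4_headers        # 200 bytes at the IP layer
    packets_per_second = 1000 / ptime_ms                       # one packet per interval
    kbps = packet_bytes * 8 * packets_per_second / 1000        # bits per second -> kbit/s
    return packets_per_second, packet_bytes, kbps


pps, size, rate = voip_stream_rate()
print(f"{pps:.0f} packets/s, {size} bytes each, {rate:.0f} kbit/s per direction")
# 50 packets/s, 200 bytes each, 80 kbit/s per direction
```

A two-way call doubles these figures, which is why even a handful of simultaneous calls shows up as a dense wall of UDP packets in a capture.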

4. Analyzing SIP (The Call Setup)

SIP looks remarkably similar to HTTP: it uses plain-text headers. Apply the display filter sip in Wireshark and you will see the chronological flow of a call being set up and torn down:
  1. INVITE sip:user@domain.com (I am calling you).
  2. 180 Ringing (The phone on the desk is physically ringing).
  3. 200 OK (The person picked up the phone).
  4. ACK (The caller confirms the 200 OK; the call is established and audio begins).
  5. BYE (Someone hung up).
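The sequence of messages above can be modeled as a tiny state machine. The following Python sketch is a simplified, hypothetical helper (`summarize_call` is not a Wireshark feature): it takes the start line of each SIP message for a single call, in capture order, and reports how far the call got. Many captures also contain a provisional 100 Trying before 180 Ringing, so the sketch tolerates it; retransmissions, CANCEL, re-INVITEs, and forked calls are ignored.

```python
import re

# A SIP response start line looks like "SIP/2.0 180 Ringing".
STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3}) (.*)$")


def classify_start_line(line):
    """Return ("response", code, reason) or ("request", method, request_uri)."""
    match = STATUS_LINE.match(line)
    if match:
        return ("response", int(match.group(1)), match.group(2))
    # A request start line looks like "INVITE sip:user@domain.com SIP/2.0".
    method, request_uri, _version = line.split(" ", 2)
    return ("request", method, request_uri)


def summarize_call(start_lines):
    """Walk one call's SIP start lines (capture order) and report the call's final state."""
    state = "idle"
    for line in start_lines:
        kind, first, second = classify_start_line(line)
        if kind == "request":
            if first == "INVITE":
                state = "calling"
            elif first == "ACK" and state == "answered":
                state = "established"  # the 2xx was acknowledged; media can flow
            elif first == "BYE":
                state = "terminated"   # someone hung up
        elif state in ("calling", "ringing"):
            code = first
            if code == 180:
                state = "ringing"
            elif 200 <= code < 300:
                state = "answered"
            elif code >= 300:
                return f"failed ({code} {second})"  # e.g. 403 Forbidden, 486 Busy Here
    return state


normal_call = [
    "INVITE sip:bob@example.com SIP/2.0",
    "SIP/2.0 100 Trying",
    "SIP/2.0 180 Ringing",
    "SIP/2.0 200 OK",
    "ACK sip:bob@example.com SIP/2.0",
    "BYE sip:bob@example.com SIP/2.0",
    "SIP/2.0 200 OK",
]
print(summarize_call(normal_call))  # terminated
print(summarize_call(["INVITE sip:bob@example.com SIP/2.0", "SIP/2.0 403 Forbidden"]))
# failed (403 Forbidden)
```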

*Forensic Value:* By reading the SIP headers (such as From and To), you can see the phone numbers or SIP addresses, IP addresses, and caller ID names involved in the call setup, provided the signaling itself is not encrypted with TLS.
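Because SIP is plain text, pulling out these forensic details is mostly string splitting. The Python sketch below uses hypothetical helper names and an invented sample INVITE with documentation addresses (192.0.2.x). Besides the From, To, and Call-ID headers, it reads the SDP body of the message, where the phone advertises the IP address (`c=` line) and UDP port (`m=audio` line) on which it expects to receive RTP. It deliberately ignores compact header forms (such as `f:` for From), repeated headers, and IPv6.

```python
def parse_sip_message(raw):
    """Split a raw SIP message into (start_line, headers, body); header names are lowercased."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line separates headers from the body
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")   # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def sdp_audio_endpoint(body):
    """Return (ip, port) where the sender wants to receive audio, from a simple IPv4 SDP body."""
    ip, port = None, None
    for line in body.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[-1]                # "c=IN IP4 192.0.2.10" -> "192.0.2.10"
        elif line.startswith("m=audio ") and port is None:
            port = int(line.split()[1])          # "m=audio 49170 RTP/AVP 0" -> 49170
    return ip, port


sample_invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    'From: "Alice" <sip:alice@example.com>;tag=1928301774\r\n'
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.com\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

start, headers, body = parse_sip_message(sample_invite)
print(start)                     # INVITE sip:bob@example.com SIP/2.0
print(headers["from"])           # "Alice" <sip:alice@example.com>;tag=1928301774
print(sdp_audio_endpoint(body))  # ('192.0.2.10', 49170)
```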

5. Analyzing RTP (The Audio Stream)

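Once the SIP 200 OK is sent, a steady flood of UDP traffic begins. This is the RTP stream. Voice endpoints typically send one RTP packet every 20 milliseconds (a common packetization interval; the actual value depends on the codec and configuration), so a single call produces about 50 packets per second in each direction. Filter for rtp, then expand the Real-Time Transport Protocol layer in the Details pane. You will see a Sequence Number and a Timestamp. Because UDP doesn't guarantee delivery or ordering, RTP carries its own sequence numbers so the receiving phone can put packets back in order and detect which ones never arrived; lost packets are not retransmitted.

Wireshark normally recognizes RTP by reading the SIP/SDP call setup. If the signaling was not captured, the media may appear as plain UDP until you enable the rtp_udp heuristic under Analyze -> Enabled Protocols.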
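To see exactly where those fields live, here is a minimal Python sketch that decodes the 12-byte fixed RTP header defined in RFC 3550 and lists sequence numbers missing from a stream. It reads only the fixed header (any CSRC list or header extension that follows is ignored) and assumes the packets are supplied in arrival order without reordering; the function names are illustrative, not part of any Wireshark API. For G.711 audio at 8 kHz with 20 ms packets, the Timestamp typically advances by 160 per packet while the Sequence Number advances by 1.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    # Network byte order: two single bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,         # top two bits; 2 for current RTP
        marker=bool(b1 & 0x80),  # M bit
        payload_type=b1 & 0x7F,  # e.g. 0 = PCMU (G.711 mu-law)
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,               # identifies the stream's source
    )


def missing_sequence_numbers(seqs):
    """Sequence numbers absent between consecutive in-order packets, handling the 16-bit wrap."""
    missing = []
    for prev, cur in zip(seqs, seqs[1:]):
        gap = (cur - prev) % 65536
        for step in range(1, gap):
            missing.append((prev + step) % 65536)
    return missing


packet = struct.pack("!BBHII", 0x80, 0x00, 1234, 160000, 0xDEADBEEF) + bytes(160)
print(parse_rtp_header(packet))
print(missing_sequence_numbers([65533, 65534, 0, 1, 3]))  # [65535, 2]
```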

6. Wireshark's Built-in VoIP Player

Wireshark has a magical feature for telecom engineers. It can reconstruct the audio.
  1. Capture a VoIP call.
  2. Go to the top menu: Telephony -> VoIP Calls.
  3. Wireshark will automatically scan the PCAP and display a list of all detected phone calls.
  4. Select a call and click Play Streams.
  5. The RTP Player window will open. Press Play, and you will literally hear the recorded conversation through your computer speakers, provided the call was not encrypted and uses a codec Wireshark can decode (such as G.711).
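Under the hood, "playing" an RTP stream means decoding its payload back into audio samples. As an illustration (not Wireshark's own code), the Python sketch below handles the simplest common case, G.711 μ-law (PCMU, RTP payload type 0): each payload byte expands to one 16-bit linear sample, and the samples are written to an 8 kHz mono WAV file with the standard-library `wave` module. It assumes the payloads have already been extracted and sorted by sequence number; the function names are hypothetical.

```python
import struct
import wave


def ulaw_to_pcm16(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    code = ~code & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07       # 3-bit segment number
    mantissa = code & 0x0F              # 4-bit step within the segment
    magnitude = ((mantissa << 3) + 0x84) << exponent
    magnitude -= 0x84                   # remove the encoding bias (132)
    return -magnitude if sign else magnitude


def payloads_to_wav(payloads, path: str) -> None:
    """Decode PCMU payloads (already in sequence order) into a mono 8 kHz, 16-bit WAV file."""
    samples = [ulaw_to_pcm16(b) for payload in payloads for b in payload]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)             # 2 bytes = 16-bit samples
        wav.setframerate(8000)          # G.711 sampling rate
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


print(ulaw_to_pcm16(0x00), ulaw_to_pcm16(0xFF), ulaw_to_pcm16(0x80))  # -32124 0 32124
# payloads_to_wav(sorted_payloads, "call.wav")  # sorted_payloads: hypothetical list of bytes
```

Other codecs (G.722, Opus, and so on) need their own decoders, which is why playback depends on the codec as well as on encryption.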

7. Troubleshooting Audio Issues (Jitter)

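When a user complains, "The call sounds robotic and choppy," look at RTP. Go to Telephony -> RTP -> RTP Streams. Wireshark calculates health metrics for each stream:
  • Packet Loss: Each missing RTP packet leaves a gap in the audio (typically 20 ms) that the receiver must conceal or skip. Even a few percent of loss is usually heard as dropouts and clipped words; heavy loss makes speech unintelligible.
  • Jitter: Ideally, RTP packets arrive at the same steady interval at which they were sent (for example, every 20 ms). If network congestion causes one packet to take 10 ms to cross the network and the next to take 80 ms, that variation in transit time is called "Jitter." The receiving phone's jitter buffer can absorb small variations, but packets that arrive too late to be played are discarded, which produces the robotic, garbled voice effect.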
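Both metrics can be computed from fields already visible in the RTP header. The Python sketch below estimates loss from sequence numbers and jitter with the running estimator defined in RFC 3550 (section 6.4.1), where each packet's transit-time difference D updates the estimate as J = J + (|D| - J)/16. It assumes the sequence numbers are already "unwrapped" (no 65535 -> 0 rollover), arrival times in milliseconds, and the 8 kHz RTP clock used by G.711; the function names are hypothetical, and Wireshark's reported figures may differ in detail.

```python
def loss_percentage(seqs):
    """Percent of expected packets never received, from unwrapped sequence numbers."""
    expected = max(seqs) - min(seqs) + 1
    received = len(set(seqs))  # duplicates are counted once
    return 100.0 * (expected - received) / expected


def interarrival_jitter_ms(arrivals_ms, rtp_timestamps, clock_rate=8000):
    """RFC 3550 interarrival jitter, converted from RTP timestamp units to milliseconds."""
    jitter = 0.0
    for i in range(1, len(arrivals_ms)):
        # How far apart the packets arrived, expressed in RTP clock ticks...
        arrival_gap = (arrivals_ms[i] - arrivals_ms[i - 1]) * clock_rate / 1000.0
        # ...compared with how far apart they were sent.
        send_gap = rtp_timestamps[i] - rtp_timestamps[i - 1]
        d = arrival_gap - send_gap
        jitter += (abs(d) - jitter) / 16.0
    return jitter * 1000.0 / clock_rate


print(loss_percentage([1, 2, 3, 5]))  # 20.0

# The example from the text: packets sent 20 ms apart (160 ticks at 8 kHz);
# the first takes 10 ms to arrive, the second takes 80 ms.
print(interarrival_jitter_ms([10, 100], [0, 160]))  # 4.375
```

The 1/16 gain deliberately smooths the estimate, so one late packet moves it only slightly while sustained variation pushes it up steadily.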

8. Best Practices

  • Mirroring Switch Ports: To capture VoIP traffic in an office, you cannot just plug your laptop into a spare wall port, because a switch forwards the phone's traffic only to the phone's own port. You must log into the managed network switch and configure "Port Mirroring" (SPAN). This copies all traffic from the IP phone's port to your laptop's port so Wireshark can capture it.

9. Common Mistakes

  • Trying to Play Encrypted Media: While Wireshark can easily play unencrypted RTP audio, modern enterprise conferencing and VoIP systems typically encrypt their media (for example, with SRTP, Secure RTP). The media payload is cryptographically encrypted, so even when the VoIP Calls window lists the streams, the RTP Player cannot play them without the encryption keys. Note also that the RTP Player plays audio only; it does not render video.

10. Mini Project: Map a SIP Call Flow

If you have a sample SIP PCAP file (available on the Wireshark Wiki):
  1. Open the file.
  2. Go to Telephony -> VoIP Calls.
  3. Select the call and click Flow Sequence.
  4. Wireshark will generate a graphical ladder diagram with arrows between the two IP addresses, visually mapping the INVITE, Ringing, OK, and RTP audio stream phases. This diagram is ideal for IT troubleshooting reports.
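A ladder diagram is simply a time-ordered list of (sender, message) events drawn between two endpoints. The following Python sketch renders such a list as plain text, which can be handy when pasting into a ticket that does not accept images. It is a hypothetical helper, not a Wireshark export format, and the addresses are documentation examples.

```python
def ladder_diagram(left, right, events, width=40):
    """Render (sender, label) events as a text ladder between two endpoints."""
    rows = [left.ljust(width // 2) + right.rjust(width // 2)]
    for sender, label in events:
        if sender == left:
            # Arrow pointing from the left endpoint to the right one.
            rows.append(("-- " + label + " ").ljust(width - 1, "-") + ">")
        else:
            # Arrow pointing from the right endpoint back to the left one.
            rows.append("<" + (" " + label + " --").rjust(width - 1, "-"))
    return "\n".join(rows)


caller, callee = "192.0.2.10", "192.0.2.20"
print(ladder_diagram(caller, callee, [
    (caller, "INVITE"),
    (callee, "180 Ringing"),
    (callee, "200 OK"),
    (caller, "ACK"),
    (caller, "RTP audio"),
    (callee, "RTP audio"),
    (caller, "BYE"),
    (callee, "200 OK"),
]))
```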

11. Practice Exercises

  1. Explain the distinct architectural roles of the SIP protocol and the RTP protocol in a Voice over IP (VoIP) phone call.
  2. Why do high levels of "Jitter" on a network cause severe degradation of real-time audio quality, even if no packets are permanently lost?

12. MCQs with Answers

Question 1

In a VoIP environment, which specific protocol is responsible for carrying the actual digitized audio and video payload across the network?

Answer: RTP (Real-time Transport Protocol).

Question 2

Which Wireshark top-menu feature automatically scans a capture file, identifies SIP/RTP conversations, and allows an analyst to graphically map the call flow or listen to unencrypted audio?

Answer: Telephony -> VoIP Calls.

13. Interview Questions

  • Q: A client reports dropped VoIP calls. In Wireshark, you filter for SIP and see a sequence of INVITE, followed immediately by a 403 Forbidden response. What does this indicate?
  • Q: Explain the mechanical difference between Packet Loss and Jitter in the context of an RTP audio stream. How does Wireshark assist in diagnosing these metrics?
  • Q: Why does RTP utilize Sequence Numbers within its header, given that it is encapsulated within the connectionless UDP protocol?

14. FAQs

Q: Can Wireshark capture my Netflix stream? A: Yes, it can capture the packets. However, commercial streaming services use TCP-based adaptive bitrate streaming (such as MPEG-DASH or HLS) delivered over HTTPS, and the video itself is additionally DRM-protected. You will see large amounts of encrypted TLS data, but you cannot reconstruct or watch the movie using Wireshark.

15. Summary

In Chapter 15, we tackled the challenges of real-time telecommunications. We separated the control plane from the data plane, using SIP for text-based call signaling and RTP for UDP media delivery. We used Wireshark's Telephony tools to graphically map call flows and decode raw RTP payloads into playable audio. Finally, we identified the main enemies of VoIP, Packet Loss and Jitter, and saw that even when packets eventually arrive, inconsistent delivery timing degrades real-time communication.

16. Next Chapter Recommendation

We have analyzed standard corporate traffic. Now it is time to look for the attackers. Proceed to Chapter 16: Malware and Security Analysis.
