What's the difference between UDP and TCP?
A Gentle Introduction: something's missing
In a previous hub, I discussed the foundations of computer networking by breaking down problem domains along physical, logical, and network boundaries. Here's a quick review:
- Layer 1: the physical layer guides the signal along a physical path.
- Layer 2: the logical layer describes how the signal is encoded and takes care of individually addressing the local area network (LAN) participants.
- Layer 3: the network layer facilitates communication between LANs and WANs by defining addressing schemes and reachability between routers.
The network layer is an abstraction that allows communication between two hosts, regardless of whether those two hosts are on the same LAN or separated by many routed hops.
So what's missing? Isn't that what the Internet is all about? Yes and no. Without this foundation, there would be no Internet. But routing information between two hosts is only the beginning.
More about ports and protocols
Layer 4: the transport layer
If we only communicate on a one-to-one basis between Internet hosts, we can stop at layer 3. But what happens when there is more than one application on your computer that needs to talk to more than one application on a given remote IP address?
Consider the way you're reading this hub. Perhaps you landed directly on HubPages.com, or maybe you found your way here through a search engine. But how did you begin? You had a concept in mind, a label, a series of words. Let's go with "HubPages.com" as a simple example. How does your browser take a human-readable label like HubPages.com and translate it into a routable IP address? With a DNS query: ask the question "hubpages.com", get back the answer 18.104.22.168. Now what? Now the browser knows to contact the web server running on that address. Your browser, the web server hosting hubpages.com, and the DNS server are all software applications running on different computers. What if DNS and the web server run on the same server, the same IP address? How does the information get to the right place?
Layer 4, the transport layer, provides a way to address individual processes per computer, allowing for interprocess communication. Each interprocess stream designates a protocol, a client port, and a server port. The transmission control protocol (TCP) and user datagram protocol (UDP) are two different types of transports within the TCP/IP family. Client programs initiate the communication, and server programs passively listen and respond to client requests. How does a client know which protocol and port to specify? Established services such as DNS and web have well-known ports (udp/53 and tcp/80, respectively).
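To make the idea of port addressing concrete, here is a minimal Python sketch, not a production server. It assumes the loopback address 127.0.0.1 stands in for a shared server IP, and it lets the operating system pick free ports instead of using well-known ones. The names service_a and service_b are made up for illustration.

```python
import socket

def demo_port_demultiplexing():
    """Two 'services' share one IP address; the port picks the recipient."""
    host = "127.0.0.1"  # assumption: loopback stands in for the shared server IP
    service_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    service_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for any free port; real services use well-known ones.
        service_a.bind((host, 0))
        service_b.bind((host, 0))
        for service in (service_a, service_b):
            service.settimeout(2.0)

        # Same destination IP both times; only the destination port differs.
        client.sendto(b"for A", service_a.getsockname())
        client.sendto(b"for B", service_b.getsockname())

        data_a, sender_a = service_a.recvfrom(1024)
        data_b, sender_b = service_b.recvfrom(1024)
        # Both datagrams left from the same client (IP, port) pair.
        return data_a, data_b, sender_a == sender_b
    finally:
        for sock in (service_a, service_b, client):
            sock.close()
```

Each service only ever sees the datagram addressed to its own port, even though all three sockets live on the same IP address.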
Brief history review
Consider the historical context of the development of the protocols that make up today's Internet. The existing global network of the 1970s was the telephone network. End nodes were pretty dumb: a telephone only needs to signal what destination it wants to reach, and transduce audio to and from electrical signals. All the smarts were in the network to route calls via end-to-end circuit switching.
By contrast, Internet communication consists of the exchange of packets. This packet-based networking concept was revolutionary when it was first introduced: by breaking up information into independent datagrams, it defied the dominant concept of dedicated circuits maintained by the global telephone networks. At the early stages of the Internet, engineers recognized that the network didn't need to be smart if the nodes are. The network just needs to be fast and dumb. But mostly fast. So they engineered the protocols to push the status (state) of point-to-point communications to be maintained by the participating nodes, not by the network devices in between.
When a packet arrives on an intermediary network device, the only question it has to ask is "what's the fastest way to move this packet closer to its destination?" and then take appropriate action. All the state of communication is maintained on the client and on the server. The network in between is essentially an unknown, a black box.
User Datagram Protocol
Like a telegram, a datagram is a self-contained packet containing details of who sent it and to whom, and of course, message content. Consider our previous example of a DNS query: the packet sent out will have all the information needed to route to the DNS server, ask the question about the hostname, and route the response to the querier.
On an Ethernet segment, this packet will have several layers of encapsulation:
- source and destination of local area network participants (my device and its default router)
- source and destination of IP endpoints (my device and the DNS server)
- source and destination port of DNS query (my dns client and the DNS service port)
- finally, the actual DNS query is the UDP payload
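The innermost two layers in that list can be sketched in a few lines of Python. The UDP header is eight bytes: source port, destination port, length, and checksum. The client port 50000 and the payload below are placeholders I made up for illustration; the payload is not a real DNS message, and the checksum is left at zero, which UDP over IPv4 permits.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port, dst_port, payload):
    """Wrap a payload in a UDP header (checksum left uncomputed)."""
    length = UDP_HEADER.size + len(payload)  # header plus payload, in octets
    checksum = 0  # 0 means "no checksum computed", allowed for UDP over IPv4
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram):
    """Split a datagram back into its ports and payload."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, payload

# Hypothetical values: an arbitrary client port talking to the DNS port.
datagram = build_udp_datagram(50000, 53, b"placeholder query")
```

In a real network stack, this datagram would in turn become the payload of an IP packet, which becomes the payload of an Ethernet frame.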
The process of sending packets by UDP is similar to sending parcels by postal service. Once sent, there is no guarantee beyond "best effort" that they arrive at the destination, or that they arrive in the order they were sent.
All in all, UDP doesn't provide much beyond layer 3 IP except for the designation of source and destination ports. Because it is connection-less (packet-based), it doesn't impose any overhead for tracking status. However, the trade-off for lower latency (faster performance) is that applications must deal with the consequences of any packets that are lost or received in a different order than they were sent.
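What "dealing with the consequences" looks like depends on the application. One common approach, sketched below under my own assumptions, is for the application to put its own sequence number in each datagram so the receiver can reorder them and notice gaps. The function name and the (sequence, payload) tuple format are hypothetical, not part of UDP itself.

```python
def reassemble(datagrams, expected_count):
    """Reorder application-numbered datagrams and report which are missing.

    datagrams: iterable of (seq, payload) tuples in arrival order.
    Returns (payloads_in_order, missing_sequence_numbers).
    """
    by_seq = {}
    for seq, payload in datagrams:
        by_seq.setdefault(seq, payload)  # keep the first copy, drop duplicates
    missing = [s for s in range(expected_count) if s not in by_seq]
    ordered = [by_seq[s] for s in range(expected_count) if s in by_seq]
    return ordered, missing

# Datagram 1 was lost, datagram 0 arrived twice, and 2 arrived first.
arrived = [(2, b"c"), (0, b"a"), (3, b"d"), (0, b"a")]
ordered, missing = reassemble(arrived, expected_count=4)
# ordered == [b"a", b"c", b"d"] and missing == [1]
```

Whether to request the missing piece again or simply carry on without it is the application's call.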
Reads like an action novel
Recommended by my comp sci prof, once I started this book I couldn't put it down. Very accessible narrative describing the problems encountered by the engineers involved in the original Internet project, ARPANET.
Transmission Control Protocol
For a relatively small price in overhead (processing power) and performance (latency), TCP offers the advantages of connection-oriented behavior. For the duration of a "connection," packets are guaranteed to be delivered in the same order as they were sent.
As shown in the diagram labeled "TCP States", the logic behind TCP can be overwhelming. [image withdrawn pending request to publish sent to copyright holder] We'll briefly touch on the highlights, but if you're interested in a deeper dive, post your question in the comment section below.
Three Way Handshake
To open a connection, the client sends an empty payload to the server with the SYN flag, initial SEQ number, local port, and server port set in the TCP header. If the server is listening for new connections on the specified port, it will respond with an ACK for the client's SEQ number, a SYN for its own, and typically an empty payload. (If no server application is registered as listening on the given port, a TCP RST [reset] flag will be returned instead of SYN/ACK to tell the client that the requested service is not available.) To complete the three-way handshake, the client returns an ACK for the server's initial SEQ. Once the connection has been established, TCP payloads can flow in either direction (full duplex).
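Applications never build the SYN and ACK segments themselves; the operating system does the handshake when a program asks to connect. The Python sketch below shows both outcomes described above on the loopback address: a completed handshake when something is listening, and a refused connection when nothing is. It assumes a port that was just released has no listener, which is true on most systems but not guaranteed.

```python
import socket

def handshake_outcomes():
    """Return (established, refused) for a listening and a closed port."""
    host = "127.0.0.1"

    # Case 1: a server is listening, so the three-way handshake completes.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(2.0)
    listener.bind((host, 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    open_port = listener.getsockname()[1]
    # create_connection() returns only after SYN, SYN/ACK, ACK have been exchanged.
    client = socket.create_connection((host, open_port), timeout=2.0)
    server_side, peer = listener.accept()
    established = peer == client.getsockname()
    for sock in (client, server_side, listener):
        sock.close()

    # Case 2: nothing is listening, so the reply is a reset (RST).
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind((host, 0))
    closed_port = probe.getsockname()[1]
    probe.close()  # assumption: nobody else grabs this port in the meantime
    try:
        socket.create_connection((host, closed_port), timeout=2.0).close()
        refused = False
    except ConnectionRefusedError:
        refused = True  # Python's name for "the other end answered with RST"
    return established, refused
```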
The following description of sender and receiver can refer to either client or server. As octets are received from the TCP stream, the receiver increments the sender's SEQ by the receive count to send back as an ACK. The sender sets the FIN flag to signal end of transmission, and either party may signal RST to reset the connection in the event of an unrecoverable error.
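The arithmetic behind an ACK is simple enough to show directly. This is a sketch of the bookkeeping only, with made-up numbers; the one detail added beyond the description above is that TCP sequence numbers are 32 bits wide, so they wrap around to zero.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def next_ack(seq, octets_received):
    """ACK value for data starting at `seq`: the next octet the receiver expects."""
    return (seq + octets_received) % SEQ_SPACE

# Sender's segment starts at SEQ 1000 and carries 500 octets.
print(next_ack(1000, 500))          # prints 1500

# Near the top of the sequence space, the ACK wraps past zero.
print(next_ack(2 ** 32 - 100, 300)) # prints 200
```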
To achieve guaranteed delivery and connection-like performance (like a telephone call), TCP uses a combination of timers and SEQ numbers (and related ACKs) to determine whether to retransmit segments given up for lost. This timeout before retransmit leads to a potential latency issue over lossy networks.
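To see where the latency comes from, here is a toy simulation in Python. It is not how a real TCP stack works: it sends a single segment at a time and uses a fixed timeout, whereas real implementations adapt their timers to the measured round-trip time. All parameter names and default values are assumptions chosen for illustration.

```python
def send_with_retransmit(lost_attempts, timeout=1.0, rtt=0.1, max_attempts=5):
    """Simulate delivering one segment over a lossy network.

    lost_attempts: set of 1-based attempt numbers that the network drops.
    Returns (attempts_used, elapsed_seconds).
    """
    elapsed = 0.0
    for attempt in range(1, max_attempts + 1):
        if attempt in lost_attempts:
            elapsed += timeout  # no ACK came back, so the timer runs out
            continue            # retransmit the same segment
        elapsed += rtt          # ACK arrives one round trip after sending
        return attempt, elapsed
    raise TimeoutError("segment given up for lost")

clean = send_with_retransmit(set())   # delivered on the first try
lossy = send_with_retransmit({1, 2})  # first two attempts dropped
```

With these numbers, the clean path costs one round trip, while two losses cost two full timeouts before the third attempt gets through.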
The client and server processes do not control and are not aware of where the packet boundaries occur in the transmission of data. To say that another way, there's no way for a receiver to tell the end of one packet from the beginning of another. Applications are left to encode boundaries in-line, within the data stream.
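One common way to encode boundaries in-line is to prefix each message with its length. The sketch below assumes a 4-byte big-endian length prefix; that choice, and the function names, are mine for illustration rather than anything TCP prescribes.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")  # 4-byte unsigned length, network byte order

def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return LENGTH_PREFIX.pack(len(message)) + message

def unframe(buffer: bytes):
    """Extract every complete message from a buffer of received bytes.

    Returns (complete_messages, leftover_bytes). The leftover holds a
    partial message that needs more data from the stream.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= LENGTH_PREFIX.size:
        (length,) = LENGTH_PREFIX.unpack_from(buffer, offset)
        start = offset + LENGTH_PREFIX.size
        if len(buffer) - start < length:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]

# The stream may hand over bytes in arbitrary chunks, for example split mid-message.
stream = frame(b"hello") + frame(b"world!")
first, leftover = unframe(stream[:7])             # nothing complete yet
second, leftover = unframe(leftover + stream[7:]) # both messages recovered
```

The receiver keeps appending whatever arrives to the leftover bytes and calls unframe again, so it does not matter where the stream happened to be split.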
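Pros and Cons of TCP vs UDP
- Latency
UDP: low (same as underlying network). TCP: can be perceptibly higher on lossy networks.
- Packet boundaries
UDP: preserved, since each datagram is self-contained. TCP: lost in the streaming overhead.
- Delivery
UDP: best effort, no guarantee. TCP: guaranteed, with lost segments retransmitted.
- Order of arrival
UDP: best effort, no guarantee. TCP: guaranteed to arrive in same order as sent.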
Examples of well-known TCP and UDP services
The Internet we all know and love today would not be possible apart from the workhorse protocols and software implementations that keep packets flowing. Below is a brief list of some well-known protocols and applications.
- Web Server (tcp/80)
World Wide Web servers offer access to HTML documents.
- Domain Name System (udp/53)
Name lookup to resolve hostnames to Internet Protocol addresses.
- Simple Mail Transfer Protocol (tcp/25)
Server-to-server communication forms the backbone of email service.
Wikipedia has in-depth discussions on TCP and UDP, as well as other transport protocols not covered in this hub.
The late Richard Stevens wrote definitive references on the principles of TCP/IP and writing programs that use these protocols.
Van Jacobson is a principal architect behind much of the logic of TCP.