Fragmentation is a huge issue with IP networks. The real difference between fragmentation and a lot of the other big issues is that no one realizes what a big deal it is. It is very common to run into fragmentation problems. However, since much of the time they only cause a decrease in performance as opposed to a full-blown outage, users will become accustomed to them and eventually quit complaining. Fortunately, with a basic understanding of fragmentation, you can understand what is happening and how to either handle the fragmentation or prevent it from occurring in the first place. This article attempts to touch on what fragmentation is, why it happens, some of the issues it causes, and how it can be handled and/or avoided.
Fragmentation from the perspective of an IP network is just about like it sounds. It’s taking the payload from one big IP packet and splitting it into multiple smaller IP packets. Each fragment has the headers required for transmission over IP networks added to it.
As with many technical topics, the first thing that you need to do is define a few common and important terms relating to fragmentation.
The first important term to define is MTU (maximum transmission unit). The MTU is the largest payload, in bytes, that an interface will pass. A typical unmodified Ethernet interface has an MTU of 1500 bytes. Payload as opposed to total size is an important distinction to make here, and we will be touching on this again in different ways throughout the article. An Ethernet interface with an MTU of 1500 bytes allows 1500 bytes of payload to pass through it. This 1500-byte limit does not include the layer 2 overhead. Adding the Ethernet header and frame check sequence (18 bytes together) and the optional 802.1q tag (4 bytes) can make the frame itself as large as 1522 bytes. So, if you’re looking at packets passing through a trunk interface via a Wireshark capture, you could actually see frames as large as 1522 bytes. This can be confusing if you don’t understand what is really going on. Remember that MTU doesn’t mean that every packet is exactly equal to the MTU. Many of the frames will be smaller than the MTU. In this way, the behavior of Ethernet is different from other protocols such as SDH/SONET, where every frame is the same size. SDH/SONET accomplishes this by padding “incomplete” frames with filler data to ensure each and every one is the same size.
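To make the payload-versus-frame distinction concrete, here is a minimal Python sketch of the arithmetic above. The function name is made up for illustration, and the 14-byte header plus 4-byte frame check sequence split of the 18-byte Ethernet overhead is the standard Ethernet layout rather than something spelled out in this article.

```python
# All sizes are in bytes.
ETH_HEADER = 14  # destination MAC, source MAC, EtherType
ETH_FCS = 4      # frame check sequence (trailer)
DOT1Q_TAG = 4    # optional 802.1Q VLAN tag


def max_frame_size(mtu: int, tagged: bool = False) -> int:
    """Largest Ethernet frame that can carry a full MTU-sized payload."""
    size = mtu + ETH_HEADER + ETH_FCS
    if tagged:
        size += DOT1Q_TAG
    return size


print(max_frame_size(1500))               # untagged frame: 1518
print(max_frame_size(1500, tagged=True))  # tagged frame on a trunk: 1522
```

Keep in mind that many capture setups never see the frame check sequence, so the sizes shown in a capture may be 4 bytes smaller than these totals.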
The next term to define is the MSS (maximum segment size). Though this definition might not land in a college dictionary, simply stated, the MSS is the maximum amount of data a single TCP segment is allowed to carry. So, let’s say you have 2600 bytes of data to send. With an MSS of 1300, you send two TCP segments, each one carrying 1300 bytes of payload data. If you had 2800 bytes of data, you’d have to send three segments: two with 1300 bytes, and one with 200 bytes. MSS was specifically designed to help prevent, or at least minimize, fragmentation for TCP traffic. This is, in fact, an important distinction. MSS is only applicable to TCP traffic. UDP traffic does not use MSS. So MSS will not help you if you are trying to solve a fragmentation problem where the data is UDP.
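The two examples above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not of how a real TCP stack decides when to send; the function name is hypothetical.

```python
from typing import List


def segment_sizes(data_len: int, mss: int) -> List[int]:
    """Split data_len bytes into TCP payload sizes no larger than mss."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    full, rest = divmod(data_len, mss)
    # 'full' segments carry exactly mss bytes; a final one carries the remainder.
    return [mss] * full + ([rest] if rest else [])


print(segment_sizes(2600, 1300))  # [1300, 1300]
print(segment_sizes(2800, 1300))  # [1300, 1300, 200]
```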
MSS does not indicate the allowable total size of the packet. It indicates how much data each segment is permitted to carry. So, with an MSS of 1300 bytes, the largest packets carrying that session’s data will normally be 1340 bytes (1300 bytes of payload, 20 bytes for the IP header, and 20 bytes for the TCP header). MSS is exchanged between two hosts when they go through the three-way handshake that sets up a TCP session. Each side informs the other side what MSS value it should use when sending data to it. Then, during that session, the host will never send a TCP segment whose data payload exceeds the MSS value it received from the other side during the three-way handshake. It is very important to understand that MSS is not a single number agreed upon between two hosts. MSS can be, and often is, different in each direction.
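The following Python sketch ties these two points together: the packet size that results from a given MSS, and the fact that each direction is governed by the value the other side advertised. The host names and the advertised values are hypothetical, and the header sizes assume no IP or TCP options.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options


def packet_size(mss: int) -> int:
    """Largest IP packet produced by a full-sized segment for this MSS."""
    return mss + IP_HEADER + TCP_HEADER


# Hypothetical MSS values each host advertised during the handshake.
advertised = {"host_a": 1460, "host_b": 1300}


def send_limit(sender: str) -> int:
    """A sender is bound by the MSS the other side advertised, not its own."""
    (peer,) = [host for host in advertised if host != sender]
    return advertised[peer]


print(packet_size(1300))     # 1340
print(send_limit("host_a"))  # 1300: host_a must respect host_b's value
print(send_limit("host_b"))  # 1460: host_b must respect host_a's value
```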
Throughout this article, we’ll make some assumptions. Assume that we are running IP over Ethernet, with the standard 1500 byte MTU, unless otherwise specified.
Just in case you have not already drawn this conclusion, fragmentation is required whenever the packet to be transmitted is larger than the MTU of the interface it needs to leave through. When this occurs, one of two things must happen: either the packet is fragmented and transmitted, or it is not fragmented and is dropped.
If you are wondering why a packet would ever not be fragmented, you are right on target. To answer that question, we have to take a look at the IP header. Within the IP header is a flag known as the DF bit. DF stands for “Don’t Fragment.” When the DF bit is set to 1, it indicates that the packet cannot be fragmented. If it is set to 0, it indicates that the packet can be fragmented. Packets too large for transmission with the DF bit set to 0 are fragmented and sent. Packets too large for transmission with the DF bit set to 1 are dropped.
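Here is a small Python sketch of that decision. It reads the DF bit from a raw IPv4 header (the flags share a 16-bit field with the fragment offset, starting at byte 6) and then applies the forward, fragment, or drop rule described above. The function names and the sample header are made up for illustration; the addresses come from the documentation range.

```python
import struct

DF_MASK = 0x4000  # DF bit inside the 16-bit flags / fragment-offset field


def df_is_set(ip_header: bytes) -> bool:
    """Return True if the DF bit is set in a raw IPv4 header."""
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    return bool(flags_and_offset & DF_MASK)


def forwarding_decision(packet_len: int, df: bool, egress_mtu: int) -> str:
    """What a router does with a packet headed out an interface."""
    if packet_len <= egress_mtu:
        return "forward"
    return "drop" if df else "fragment"


# A hand-built 20-byte IPv4 header for a 1600-byte ICMP packet with DF set.
# The header checksum is left at 0 because it is not needed for this check.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1600,   # version/IHL, TOS, total length
    1, DF_MASK,      # identification, flags + fragment offset
    64, 1, 0,        # TTL, protocol (ICMP), checksum
    bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
)

print(df_is_set(header))                                  # True
print(forwarding_decision(1600, df_is_set(header), 1500)) # drop
print(forwarding_decision(1600, False, 1500))             # fragment
```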
Normally, when a router has to drop a packet that it needs to fragment but isn’t allowed to, it will send an ICMP message back to the sender, telling it that the packet was dropped because it needed to be fragmented and the DF bit did not allow it.
It is very easy to see this happening. Below, I’m attempting to send a 1600 byte ping to Google, with the DF bit set. In this case, I am actually being notified by my own operating system that fragmentation is required, but you would see a similar reply to your ping if it were some other device sending the notification.
The format of a true ICMP fragmentation message (type 3, code 4), is as follows:
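As a rough stand-in, the layout defined in RFC 792 and extended by RFC 1191 can be sketched in Python: an 8-byte ICMP header (type, code, checksum, an unused 16-bit field, and the next-hop MTU) followed by the IP header and first 8 bytes of the packet that was dropped. The function names are hypothetical, and the sketch assumes the dropped packet's IP header has no options.

```python
import struct

# Byte layout of the 8-byte ICMP header for "fragmentation needed":
#   0: type (3)   1: code (4)   2-3: checksum
#   4-5: unused (0)             6-7: next-hop MTU
# It is followed by the original IP header plus 8 bytes of its payload.


def icmp_checksum(data: bytes) -> int:
    """Standard Internet checksum (16-bit one's complement sum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_packet: bytes) -> bytes:
    """Build a type 3, code 4 message quoting the start of the dropped packet."""
    quoted = original_packet[:28]  # 20-byte IP header (assumed) + 8 bytes
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    checksum = icmp_checksum(header + quoted)
    return struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + quoted


def parse_next_hop_mtu(message: bytes) -> int:
    """Extract the next-hop MTU, rejecting any other ICMP type or code."""
    icmp_type, code, _checksum, _unused, mtu = struct.unpack_from("!BBHHH", message)
    if (icmp_type, code) != (3, 4):
        raise ValueError("not a fragmentation-needed message")
    return mtu


message = build_frag_needed(1476, bytes(range(40)))
print(len(message), parse_next_hop_mtu(message))
```

The next-hop MTU field is what lets the sender learn how small its packets need to be; very old devices may leave it set to zero.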
Although we are using ICMP traffic in this case, the same thing would happen if you were sending TCP or UDP traffic.
Now that we understand how ICMP is used to notify a sender of a packet being too big, we can better understand a standard used by most modern devices to help them determine the largest supportable MTU between themselves and where they are sending. This is called PMTUD—yes, yet another acronym.
PMTUD stands for Path MTU Discovery. With PMTUD, virtually every TCP and UDP packet sent from the host has the DF bit set, which again tells any device processing the packet that it should not be fragmented. The theory behind PMTUD is that whenever a host sends a packet that is too big, it will receive an ICMP notification informing it of the maximum supported MTU. That host then makes a note of that number in its routing table. For all subsequent sessions to that specific destination, the updated lower MTU is used.
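The following is a simplified Python simulation of that feedback loop, not an implementation of any real stack. It assumes every hop reliably returns its MTU in the ICMP notification, and it models the path as a plain list of hop MTUs; both function names are made up for illustration.

```python
from typing import List, Optional


def first_bottleneck(packet_len: int, hop_mtus: List[int]) -> Optional[int]:
    """MTU of the first hop that cannot forward the packet, or None."""
    for mtu in hop_mtus:
        if packet_len > mtu:
            return mtu  # this hop drops the packet and reports its MTU
    return None


def discover_path_mtu(local_mtu: int, hop_mtus: List[int]) -> int:
    """Send with DF set and shrink whenever a hop reports a smaller MTU."""
    path_mtu = local_mtu
    while True:
        reported = first_bottleneck(path_mtu, hop_mtus)
        if reported is None:
            return path_mtu  # the packet made it end to end
        path_mtu = reported  # remember the lower value and try again


# Hypothetical path: two smaller links hidden behind a 1500-byte first hop.
print(discover_path_mtu(1500, [1500, 1476, 1400, 1500]))  # 1400
```

Notice that the host needs one dropped packet per smaller link before it settles on the right value, which is part of why lost notifications hurt so much.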
While PMTUD may sound great, there are a lot of things that can go wrong with it. Part of PCI-DSS compliance recommends that ICMP unreachable messages (the messages used to notify hosts when the packets they send are too big) be rate-limited. And many administrators disable ICMP messages completely on their devices. When a host doesn’t get these notifications, it won’t know to lower its MTU for that destination. In addition, many firewalls (especially those whose filtering focuses mainly on whether or not there is state information present) will drop these messages, rather than forwarding them on to hosts.
By default, a host assumes that the entire path from it to wherever it is sending traffic has the same MTU as its outgoing interface. In other words, if its interface supports a 1500-byte MTU, it assumes that every device in the path to the destination also supports 1500 bytes of payload.
So why does fragmentation occur most of the time? There are two main reasons. First, and we have already touched on this, is when some interface supports an MTU less than the MTU supported by the sender.
Second is overhead. Overhead, as related to fragmentation issues, usually becomes a problem when a tunneling mechanism such as GRE, L2TP, IPinIP, or IPsec is used. When you use a tunneling protocol, you take the data being sent, “wrap it up” in another protocol, and send it on. That “wrapping” increases the size of the data being sent, which often results in it being too big to be sent without fragmenting it first.
Let’s say host A wants to send host B 3000 bytes of data. Host A’s WAN connection has a 1500-byte MTU, so, ignoring headers for simplicity, the host sends two 1500-byte packets. As long as no additional overhead is added, this is not a problem. However, let’s say there is a GRE tunnel somewhere in between. When the GRE overhead (usually 24 bytes) gets added, we are now attempting to send two 1524-byte packets rather than two 1500-byte packets. If our interface has an MTU of 1500 bytes, fragmentation is required.
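To see what happens to one of those 1524-byte packets, here is a minimal Python sketch of IPv4 fragmentation. It assumes 20-byte IP headers without options and that the oversized outer packet is the one being fragmented; real routers may instead fragment before encapsulating. The function name is hypothetical.

```python
from typing import List, Tuple

IP_HEADER = 20     # IPv4 header without options
GRE_OVERHEAD = 24  # 20-byte outer IP header + 4-byte GRE header


def fragment(packet_len: int, mtu: int) -> List[Tuple[int, int, bool]]:
    """Return (total_len, offset_in_8_byte_units, more_fragments) per fragment."""
    if packet_len <= mtu:
        return [(packet_len, 0, False)]
    payload = packet_len - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    chunk = (mtu - IP_HEADER) // 8 * 8
    if chunk <= 0:
        raise ValueError("mtu too small to carry any payload")
    fragments = []
    sent = 0
    while sent < payload:
        size = min(chunk, payload - sent)
        more = sent + size < payload
        fragments.append((size + IP_HEADER, sent // 8, more))
        sent += size
    return fragments


print(fragment(1500 + GRE_OVERHEAD, 1500))
# [(1500, 0, True), (44, 185, False)]
```

A mere 24 bytes of overhead turns every full-sized packet into two, and the second one is almost all header.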
If you have not already figured this out, the best way to deal with fragmentation is to avoid it completely. Fortunately, there are options here. MSS is one way to attempt to avoid fragmentation. The idea behind a smaller MSS is quite simple. A smaller payload means a smaller packet, which means a smaller chance that fragmentation is necessary. You can configure a router to change the MSS as traffic passes through it. TCP handshake segments passing through the router have their MSS rewritten by the router to whatever you specify. In this way, the network administrator can make fragmentation much less likely. Again, as we touched on earlier, this is only applicable to TCP traffic.
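As a sketch of how you might pick the value to configure, the Python below subtracts the tunnel overhead and the IP and TCP headers from the link MTU. It also assumes the router only ever lowers an advertised MSS and never raises it; that is an assumption of this sketch, and both function names are hypothetical.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options


def mss_for_path(link_mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest MSS whose full-sized packets still fit after encapsulation."""
    return link_mtu - tunnel_overhead - IP_HEADER - TCP_HEADER


def rewrite_mss(advertised_mss: int, clamp: int) -> int:
    """MSS the router writes into a passing handshake segment."""
    return min(advertised_mss, clamp)  # assumed: lower only, never raise


clamp = mss_for_path(1500, tunnel_overhead=24)  # GRE example from earlier
print(clamp)                     # 1436
print(rewrite_mss(1460, clamp))  # 1436: lowered to fit the tunnel
print(rewrite_mss(1300, clamp))  # 1300: already small enough
```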
Another way to deal with fragmentation is to always fragment traffic when it needs to be fragmented, even if the sending host told you not to (that is, the DF bit is set in the IP header). This is accomplished by using policy-based routing (commonly referred to as PBR). With PBR, you can actually clear the DF bit that was set by the sender in the IP header. This then allows you to fragment the packet as required.
Doing so would look something like this:
route-map cleardf
 set ip df 0
interface eth0/0
 ip policy route-map cleardf
Note, however, this should be used with caution, as PBR means you are process switching all packets passing through the router, which can open you up to performance problems that are even worse than fragmentation to deal with.
So, now that we have dug into fragmentation a bit, why is fragmentation so bad? There are several reasons why IP fragmentation has a negative impact on the performance of a network. First is CPU. When a router needs to fragment a packet, it has to allocate memory and CPU resources to do the fragmentation. Then, when that packet is reassembled, the receiver must hold each fragment of the original packet until all fragments have been received, and do the reassembly. This also takes CPU/memory resources. And, depending on where the fragmentation is done, reassembly can be either on the true end host or on some intermediary router-type device.
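To give a feel for the receiver's side of this work, here is a deliberately simplified Python sketch of reassembly. It buffers fragments by byte offset until the whole payload is present. A real implementation would also key buffers by source, destination, protocol, and IP identification, and would time out incomplete packets; all of that is omitted, and the class name is made up.

```python
from typing import Dict, Optional


class Reassembler:
    """Holds the fragments of one original packet until all have arrived."""

    def __init__(self) -> None:
        self.parts: Dict[int, bytes] = {}  # byte offset -> fragment payload
        self.total: Optional[int] = None   # known once the last fragment arrives

    def add(self, offset: int, data: bytes, more_fragments: bool) -> Optional[bytes]:
        """Store a fragment; return the full payload once it is complete."""
        if not data:
            raise ValueError("empty fragment")
        self.parts[offset] = data
        if not more_fragments:
            self.total = offset + len(data)
        if self.total is None:
            return None
        payload = bytearray()
        position = 0
        while position < self.total:
            chunk = self.parts.get(position)
            if chunk is None:
                return None  # a gap remains, keep waiting
            payload += chunk
            position += len(chunk)
        return bytes(payload)


r = Reassembler()
print(r.add(1480, b"b" * 24, more_fragments=False))         # None: still waiting
print(len(r.add(0, b"a" * 1480, more_fragments=True)))      # 1504
```

The memory held in that buffer, multiplied across many in-flight packets, is the cost being described above.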
Another reason that fragmentation is not desirable has to do with firewalls. Depending on the type and configuration of the firewall, fragments may not be handled correctly. If the second fragment of a packet reaches a firewall before the first (in other words, they are out of order), the firewall will often drop that fragment, which results in the entire original packet needing to be retransmitted. Out-of-order packets are an everyday occurrence in today’s packet-switched IP networks.
In summary, fragmentation is a largely understated and massively misunderstood topic that, despite being ignored as much as possible, causes huge problems for virtually every network. This article has attempted to give you enough of an understanding about fragmentation to recognize that an issue is related to fragmentation as well as having some tools to either prevent it or mitigate it.