SFQ patches for Linux 2.6 These patches provides the functionality of ESFQ within the existing SFQ queuing discipline. They have been entirely rewritten for (hopefully) eventual inclusion in the main Linux kernel and iproute2. I do not intend to develop ESFQ any further. The SFQ patches are (hopefully) cleaner and bug-free. Migration from ESFQ is easy: just change your scripts to use 'sfq' instead of 'esfq'--the syntax is the same. Also, if you want to compile SFQ and ESFQ at the same time, that's no problem. * the kernel ESFQ patch applies on top of the SFQ patch (with fuzz=2) * the iproute2 ESFQ patch applies on top of the SFQ patch For current information on the SFQ patches, go to http://fatooh.org/esfq-2.6/ The SFQ packages are named according to the current version of the Linux kernel. They may or may not work with earlier versions; I only test using the latest. If you find that the most recent SFQ package available does not work with the newest kernel release, please send me an email. My address is bugfood-c [at] fatooh [dot] org. ============= INSTALLATION ============= 1. Download the SFQ tar file from http://fatooh.org/esfq-2.6/ current: http://fatooh.org/esfq-2.6/sfq-2.6.24.1.tar.bz2 2. Download iproute2 from http://linux-net.osdl.org/index.php/Iproute2 current: http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.24-rc7.tar.gz 3. Download Linux 2.6 from http://www.kernel.org/ current: http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.24.1.tar.bz2 4. Untar the files you downloaded, somewhere. Commands in the following instructions are examples; you'll have to modify them depending on where you untarred the files. 5. Patch the kernel. $ cd /usr/src/linux $ patch -p1 --dry-run < ~/sfq-2.6.24.1/sfq-kernel.patch (and, if it works...) $ patch -p1 < ~/sfq-2.6.24.1/sfq-kernel.patch 6. Configure the kernel to use CONNTRACK. The conntrack hash types in SFQ depend on Netfilter connection tracking. If you want to use them, you can use 'make menuconfig' to enable connection tracking in: Networking ---> Networking options ---> [ ] Network packet filtering framework (Netfilter) ---> Core Netfilter Configuration ---> < > Netfilter connection tracking support Layer 3 Independent Connection tracking 7. Configure the kernel to include SFQ. SFQ is buried in: Networking ---> Networking options ---> QoS and/or fair queueing ---> < > SFQ queue If you enabled CONNTRACK in step 6 above, you will be able to select "Connection Tracking Hash Types" immediately below "SFQ queue". If you enabled CONNTRACK as a module, "Connection Tracking Hash Types" will only be available if SFQ is a module as well. 8. Compile and install the kernel as usual. You can reboot to the new kernel now if you want to. You will not be able to use the new features in SFQ, though, until you compile a patched version of tc (in iproute2). 9. Patch the iproute2 package. cd /usr/local/src/iproute-2.6.24-rc7 patch -p1 --dry-run < ~/sfq-2.6.24.1./sfq-iproute2.patch (and, if it works...) patch -p1 < ~/sfq-2.6.24.1/sfq-iproute2.patch 10. Compile iproute2. $ make a. You don't necessarily have to install the entire patched version of iproute2 if you don't want to (perhaps you would rather keep your distribution's package). All you need is the tc binary. # cp -p tc/tc /sbin/tc-sfq # chown root:root /sbin/tc-sfq Just modify your scripts to use /sbin/tc-sfq instead of /sbin/tc. b. Otherwise, go ahead and install iproute2. $ make install 11. Reboot to your new kernel if you haven't already, and have fun! ============= USAGE ============== ... sfq [ perturb SECS ] [ quantum BYTES ] [ limit PKTS ] [ depth SLOTS ] [ divisor HASHBITS ] [ hash HASHTYPE] Where: HASHTYPE := { classic | src | dst | fwmark | ctorigdst | ctorigsrc | ctreplsrc | ctnatchg } Options: perturb SECS Default: 0 (no perturbation) SFQ doesn't actually allocate traffic to queues in a one-to-one manner; instead, traffic is distributed among a large (but limited) number of hash slots. Sometimes the hashing algorithm results in a collision between two flows of traffic, resulting in them sharing a slot. Both flows have to fight over the bandwidth that slot is allocated. Setting perturb to some nonzero value causes the flows to be redistributed at an interval of that many seconds. There is no set method for choosing a perturb interval, though it is good to use one (see the description of hash collisions under 'divisor', below). A good perturb interval will redistribute flows frequently, but not so frequently that SFQ spends a large proportion of the total time redistributing--during that time, SFQ will not necessarily be fair. The worst case scenario for redistributing time is: time = limit * mtu / rate time: the length of time, in seconds, that SFQ will spend redistributing limit: the limit parameter of SFQ (see the limit section, below) mtu: the interface's maximum transmission unit (probably 1500 bytes) rate: the interface's (or parent class') rate in bytes/sec quantum BYTES Default: 1 MTU-sized packet Recommended: default Number of bytes a slot is allowed to dequeue before the next slot gets a turn. Do not set this to a value lower than the MTU! limit PKTS Default: depth - 1 (default 127) Recommended: default The total number of packets that will be queued before packets start getting dropped. When SFQ is saturated, a larger limit may result in slightly fewer packets being dropped, but packets will be queued for a longer time. In other words, you can get slightly better bandwidth at the expense of worse latency. Unless you have a reason to maximize bandwidth and don't care about latency at all, you should leave limit alone. For a flow that is sending packets as quickly as it can, the worst-case latency can be calculated as: time = rate / (limit * mtu) For a flow that sends a single packet when no other packets of that flow are queued, the worst-case latency is: time = rate / (flows * mtu) time: the latency in seconds rate: the interface's (or parent class') rate in bytes/sec limit: the limit parameter given to SFQ (default 128 packets) flows: the number of concurrently active flows mtu: the interface's maximum transmission unit (probably 1500 bytes) Limit must be less than depth; if the specified limit is greater than or or equal to depth, SFQ will automatically set limit to depth-1. depth SLOTS Default: 128 Depth sets the number of slots. If the number of active flows is greater than the number of slots, flows will end up sharing slots and SFQ will no longer be fair. If you anticipate more than 128 concurrently active flows, you should use a larger depth, a larger limit, and probably a larger divisor (see below). If you expect there to be far fewer than 128 concurrent flows, you may want to use a lower depth and limit in order to benefit from slightly better latency of having fewer packets queued. divisor HASHBITS Default: 10 Maximum: 14 Divisor sets the number of bits to use for the hash table. A larger hash table decreases the likelihood of collisions. When a collision occurs, two or more flows will end up sharing a slot, which isn't fair for either of those flows. The expected number of collisions can be calculated: flows - size + size * (1 - 1/size)^flows flows: the number of concurrent flows as determined by the chosen hash type size: the hash tables size, or 2^divisor There is very little disadvantage to using a large hash table--only slightly higher memory usage (tens of kilobytes). Unless your expected number of collisions is extremely small, you might as well use the maximum divisor. Regardless, setting a perturbation period will ensure that any collisions that do occur will not affect any flows for long. Note also that if the number of concurrent flows ever exceeds the specified depth (see above), flows will end up sharing slots regardless of the hash table size. hash HASHTYPE Default: classic The hash option determines the basis upon which SFQ separates traffic into flows, and hence, on what basis bandwidth is fairly allocated. See below for descriptions of the available hash types. =============HASH TYPES============== classic This is original SFQ hash. Traffic is separated by connection according to source and destination IP and port (or protocol, for protocols that do not use ports). dst, src, fwmark Hash by the packet's destination IP, source IP, or iptables mark, respectively. In early ESFQ versions, these hash types were prone to collisions when hashing similar values; that is no longer a great problem, since jhash is used. See the "perturb" parameter. ctorigdst, ctorigsrc, ctrepldst, ctreplsrc These hash types use the original and reply address from IP conntrack, so they are useful when the router running SFQ is also handling NAT/masquerade. Non-IPV4 packets, which don't have connection tracking information, are hashed based on "regular" source or destination. Note: netfilter connection tracking must be enabled in the kernel. Consider several workstations behind a router that uses SNAT or masquerade to change the source address of outgoing packets; on the router, the inside (LAN) interface is eth0 and the outside (WAN) interface is eth1. All packets leaving eth1 will have eth1's IP address, so hashing by source IP is useless. Instead, use ctorigsrc, which will correspond to the workstations' IPs. ctnatchg "Conntrack NAT change," for lack of a better name. This is a combination of ctorigsrc and ctreplsrc that is useful when a local host is configured to accept incoming connections (via DNAT "port forwarding"), but outgoing traffic must be shaped regardless of which host initiated the connection. In this situation, ctorigsrc and ctreplsrc could refer to either host, and, because the local host is NATted, src will always refer to the NAT router. Consider this example: local host = 10.0.0.2 NAT router = 1.2.3.4 remote host = 87.65.43.21 ctorigsrc = 10.0.0.2 ctorigdst = 87.65.43.21 ctreplsrc = 87.65.43.21 ctrepldst = 1.2.3.4 ctorigsrc = 87.65.43.21 ctorigdst = 1.2.3.4 ctreplsrc = 10.0.0.2 ctrepldst = 87.65.43.21 The logic for ctnatchg is simple. If ctorigdst == ctreplsrc, then the connection is outgoing, so hash by ctorigsrc. Otherwise, the connection is incoming, so hash by ctreplsrc. ==============EXAMPLES============= # Example 1: Set up an "original" SFQ as root for eth0. Note that "hash # classic" is the default. tc qdisc add dev eth0 root sfq perturb 10 # Example 2: Attach SFQ to a parent class (such as HTB) and hash by destination # IP. Very useful if eth0 is on a LAN with bandwidth-consuming clients. # This example assumes you already have a classful qdisc set up. tc qdisc add dev eth0 parent 1:12 handle 12: sfq perturb 10 hash dst # Example 3: Like the example above, but for the WAN interface (eth1 instead of # eth0). This example is intended for usage on a NAT router, so it uses # ctnatchg instead of src). tc qdisc add dev eth1 parent 1:12 handle 12: sfq perturb 10 hash ctnatchg # Example 4: Hash by netfilter mark. Note that this won't do anything unless you # actually use iptables to add different marks to packets. tc qdisc add dev eth0 root sfq perturb 10 hash fwmark ========PROBLEMS/SOLUTIONS========= Problem: SFQ doesn't seem to have an effect. Possible reason: SFQ can only work properly if it can control which packets are dropped. If the bottleneck of a link is elsewhere, such as in a DSL modem, SFQ will not drop any packets because the DSL modem is dropping enough packets that the link does not appear saturated. Diagnostic: Is SFQ attached directly to the slowest link along a route? Is SFQ attached to a classful qdisc that is shaping traffic instead? Solution: In order for SFQ to control fairness, it must be attached either to the slowest link on a route or to a classful, shaping qdisc (such as HTB). When a shaping qdisc is used, SFQ's parent class must have a maximum rate less than or equal to the actual available bandwidth. See the HTB documentation for examples. Problem: SFQ isn't being very fair. Possible reason: SFQ divides traffic into a number of smaller queues ("slots"), one for each flow. Flows are distinguished based on whatever aspect of the packets is hashed, such as source or destination. The maximum number of slots is set by the 'depth' parameter, which defaults to 128. If there are more active flows than slots, some flows will actually start sharing slots. Obviously, this is not good, and fairness will suffer. Diagnostic: Do you have more flows than slots? For example, if you have SFQ set to hash by destination IP, is your number of concurrent downloaders higher than 128 (the default number of slots)? Solution: Set SFQ to use a larger number of slots with the 'depth' parameter. Problem: I have SFQ set to 'hash src', but it isn't doing anything. Possible reason: If SFQ is on the WAN (Internet-side) interface of a router that uses NAT or masquerade, the source IP of outgoing packets will always be the IP of the router. Diagnostic: Are you using NAT or masquerade? Solution: Use 'hash ctorigsrc' or 'hash ctnatchg' instead of 'hash src'. Problem: RTNETLINK answers: No buffer space available Possible reason: You're using a depth value that is too high. SFQ can't set a hard limit for depth because the maximum value depends on the size of an external structure. The only way is to try to allocate memory and see if it fails. Diagnostic: Does using a smaller depth value (such as the default) work? Solution: Use a smaller depth value. If you actually have a need for a larger depth than SFQ can currently provide, send me a feature request. ================BUGS=============== None known. If you have any to report, send them to bugfood-c [at] fatooh [dot] org.