QoS on Linux

From The sin within

QoS on Linux with tc, ipset and iptables

The purpose of this article is to describe how to do QoS on Linux in one specific case: you have two upstream peers with which you run BGP, you receive from each of them only their customer routes plus the default route, and each of them allocates you a different amount of bandwidth.

Prerequisites

  • 1 Linux machine with 3 network cards (eth1 for ISP1, eth2 for ISP2 and eth0 for the LAN connection)
  • knowledge of shell scripting and quagga configuration
  • iptables 1.3 and ipset 2.2.9

Getting the prefixes from each provider

After you have configured the BGP sessions with the ISPs on the Linux box, running ip route list at the command prompt should give output like this:

89.120.0.0/16 via 80.86.123.193 dev eth1  proto zebra equalize
141.85.0.0/16 via 80.86.123.193 dev eth1  proto zebra equalize
89.136.0.0/15 via 80.86.123.193 dev eth1  proto zebra equalize
82.76.0.0/14 via 82.76.246.69 dev eth2  proto zebra equalize
86.120.0.0/13 via 82.76.246.69 dev eth2  proto zebra equalize

The example output above shows only the actual Linux FIB. If you want to see the RIB, log on to the bgpd daemon (usually localhost, port 2605) and do a sh ip bgp to see what routes you get from your peers.


Creating the ip sets

We create two ipsets for each provider (one primary, one secondary):

/sbin/ipset -N ISP1-secondary nethash --hashsize 524288
/sbin/ipset -N ISP2-secondary nethash --hashsize 524288
/sbin/ipset -N ISP1-primary nethash --hashsize 524288
/sbin/ipset -N ISP2-primary nethash --hashsize 524288

ipset can swap the contents of two sets atomically. We use this feature so that updating the tables never causes inconsistencies in the QoS process: the primary set is the one iptables matches against, and the secondary set is only a staging area for the new data.


Populating the IP sets

Now we need to get the data from each provider. The shell script below is still a work in progress, so you will see a lot of hardcoded values, but the principle of operation will stay the same once the script is rewritten properly.

#!/bin/sh
# (c) 2005-2006 by sin@imacandi.net & adixor@pvs.ro

TMP_FILE="/tmp/qos_update"

# Get the data from ISP1: the prefixes that have the ISP1 gateway as next hop.
# grep -F matches the address literally (a plain "." would match any character).
/sbin/ip route list | grep -F "via 80.86.123.193 " | grep -iv default > "$TMP_FILE" 2> /dev/null
for ip in `cut -f1 -d" " "$TMP_FILE"`; do
       /sbin/ipset -A ISP1-secondary "$ip"
done

# Get the data from ISP2: the prefixes that have the ISP2 gateway as next hop.
/sbin/ip route list | grep -F "via 82.76.246.69 " | grep -iv default > "$TMP_FILE" 2> /dev/null
for ip in `cut -f1 -d" " "$TMP_FILE"`; do
       /sbin/ipset -A ISP2-secondary "$ip"
done
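
The prefix extraction step can also be tried without touching ipset at all. The following is only a sketch: the function names prefixes_via and sample_routes are made up for illustration, the sample routes are a subset of the routing table shown earlier, and the default route line is an assumed example added to show that it is skipped.

```shell
#!/bin/sh
# prefixes_via NEXTHOP: read `ip route list` output on stdin and print the
# prefix (first field) of every non-default route whose gateway is NEXTHOP.
# Hypothetical helper, not part of the original script.
prefixes_via() {
    awk -v gw="$1" '$1 != "default" && $2 == "via" && $3 == gw { print $1 }'
}

# Sample input: three routes from the listing earlier in this article plus
# an assumed default route.
sample_routes() {
    cat <<'EOF'
89.120.0.0/16 via 80.86.123.193 dev eth1  proto zebra equalize
141.85.0.0/16 via 80.86.123.193 dev eth1  proto zebra equalize
82.76.0.0/14 via 82.76.246.69 dev eth2  proto zebra equalize
default via 80.86.123.193 dev eth1
EOF
}

# Print the prefixes learned from ISP1.
sample_routes | prefixes_via 80.86.123.193
```

On a live box the same function would be fed from the kernel table, for example /sbin/ip route list | prefixes_via 80.86.123.193. Comparing whole fields in awk avoids matching an address that merely contains the gateway address as a substring.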


After this, we populate the primary sets by swapping each secondary set with its primary one. The secondary sets then hold the old data, so we flush them.

/sbin/ipset -W ISP1-secondary ISP1-primary
/sbin/ipset -W ISP2-secondary ISP2-primary

/sbin/ipset -F ISP1-secondary
/sbin/ipset -F ISP2-secondary
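
The whole fill, swap and flush cycle can be wrapped in one function per provider. This is a minimal sketch, not the authors' script: refresh_set is a hypothetical name, the IPSET variable is an assumption that defaults to printing the commands instead of running them, and the extra flush at the start is a defensive addition in case a previous run was interrupted.

```shell
#!/bin/sh
# IPSET defaults to "echo ipset" so this sketch only prints the commands.
# Set IPSET=/sbin/ipset to run them for real (needs root).
IPSET="${IPSET:-echo ipset}"

# refresh_set ISP: read prefixes on stdin, rebuild ISP-secondary, then swap
# it with ISP-primary. Hypothetical helper.
refresh_set() {
    isp="$1"
    $IPSET -F "$isp-secondary"                  # start from an empty staging set
    while read -r prefix; do
        [ -n "$prefix" ] && $IPSET -A "$isp-secondary" "$prefix"
    done
    $IPSET -W "$isp-secondary" "$isp-primary"   # atomic swap: new data goes live
    $IPSET -F "$isp-secondary"                  # the staging set now holds old data
}

# Dry run with two prefixes from the ISP1 example.
printf '%s\n' 89.120.0.0/16 141.85.0.0/16 | refresh_set ISP1
```

Because iptables only ever references the primary sets, traffic keeps matching the old prefixes until the single swap command runs.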

Marking traffic with iptables

Now we use iptables to set the marks necessary for tc to match the traffic flows.

/sbin/iptables -t mangle -A PREROUTING -m set --set ISP1-primary src -j MARK --set-mark 0x1
/sbin/iptables -t mangle -A PREROUTING -m set --set ISP1-primary dst -j MARK --set-mark 0x2

/sbin/iptables -t mangle -A PREROUTING -m set --set ISP2-primary src -j MARK --set-mark 0x3
/sbin/iptables -t mangle -A PREROUTING -m set --set ISP2-primary dst -j MARK --set-mark 0x4

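The src rules mark traffic coming from an ISP's prefixes and the dst rules mark traffic going to them, so we can shape in both directions (ingress and egress).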
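
The mark values follow a simple pattern: for provider number n, the source mark is 2n-1 and the destination mark is 2n. The sketch below only illustrates that numbering and prints the four rules instead of running them. The helper name mark_for is hypothetical, and extending the pattern to a third provider is an assumption, not something the original setup does.

```shell
#!/bin/sh
# mark_for ISP_INDEX DIRECTION: print the firewall mark used in this article.
# src (traffic from the ISP's prefixes) gets the odd value,
# dst (traffic to the ISP's prefixes) gets the even one.
# Hypothetical helper.
mark_for() {
    case "$2" in
        src) printf '0x%x\n' $(( 2 * $1 - 1 )) ;;
        dst) printf '0x%x\n' $(( 2 * $1 )) ;;
        *)   echo "direction must be src or dst" >&2; return 1 ;;
    esac
}

# Print (do not run) the mangle rules for both providers.
for n in 1 2; do
    for dir in src dst; do
        echo "/sbin/iptables -t mangle -A PREROUTING -m set --set ISP$n-primary $dir -j MARK --set-mark $(mark_for "$n" "$dir")"
    done
done
```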

Creating tc shaping classes

And now, the actual tc rules for shaping:

#!/bin/sh

TC="/sbin/tc"

# remove any existing root qdisc from the LAN interface before rebuilding it
$TC qdisc del dev eth0 root
$TC qdisc add dev eth0 root handle 1: htb default 2
$TC class add dev eth0 parent 1: classid 1:1 est 1sec 8sec htb rate 15000Kbit ceil 100000Kbit

# default class - no one matches this class (so far)
$TC class add dev eth0 parent 1:1 classid 1:2 est 1sec 8sec htb rate 15000Kbit ceil 100000Kbit

# ICMP class, so ICMP protocol (especially ping) works well ;)
$TC class add dev eth0 parent 1:1 classid 1:3 est 1sec 8sec htb rate 8Kbit ceil 100000Kbit

#ISP1 class
$TC class add dev eth0 parent 1:1 classid 1:4 est 1sec 8sec htb rate 10000Kbit ceil 40000Kbit

#ISP2 class
$TC class add dev eth0 parent 1:1 classid 1:5 est 1sec 8sec htb rate 1000Kbit ceil 4000Kbit

#Internet class (traffic that is not matched by the above classes will be considered Internet traffic and shaped accordingly)
$TC class add dev eth0 parent 1:1 classid 1:6 est 1sec 8sec htb rate 1000Kbit ceil 1500Kbit

The above example sets the stage for shaping by creating the required classes.
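
One thing worth checking in a class tree like this is how the guaranteed rates add up. As far as we understand HTB, the rate of a class is what it is guaranteed and the ceil is what it may borrow up to, and the usual advice is to keep the sum of the child rates within the parent's rate. The sketch below just does the arithmetic for the values used above. sum_kbit is a hypothetical helper, and whether the numbers should be changed depends on your actual bandwidth.

```shell
#!/bin/sh
# sum_kbit RATE...: add up rates given as plain Kbit numbers.
# Hypothetical helper.
sum_kbit() {
    total=0
    for r in "$@"; do
        total=$(( total + r ))
    done
    echo "$total"
}

parent_rate=15000                               # rate of class 1:1, in Kbit
children=$(sum_kbit 15000 8 10000 1000 1000)    # rates of classes 1:2 to 1:6

echo "children are guaranteed $children Kbit, parent is guaranteed $parent_rate Kbit"
if [ "$children" -gt "$parent_rate" ]; then
    echo "warning: child rates add up to more than the parent rate"
fi
```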

Shaping example for a client

The actual shaping of one customer (here 1.2.3.4) looks like this:

# ISP1 and ISP2 shapes
$TC filter add dev eth0 parent 1: protocol ip prio 1 u32 match mark 0x1 0xffff match ip dst 1.2.3.4/32 police drop flowid 1:4
$TC filter add dev eth0 parent 1: protocol ip prio 1 u32 match mark 0x3 0xffff match ip dst 1.2.3.4/32 police drop  flowid 1:5
# Internet access
$TC class add dev eth0 parent 1:6 classid 1:10 est 1sec 8sec htb rate 64Kbit ceil 192Kbit
$TC filter add dev eth0 parent 1: protocol ip prio 2 u32 match ip dst 1.2.3.4/32 police drop flowid 1:10
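
With more than a handful of clients, the four commands above are easier to generate than to type. The following is a sketch only: shape_client is a hypothetical helper that repeats the commands from the example with the client address, class number and rates as parameters, and TC defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# TC defaults to "echo tc" so this sketch only prints the commands.
# Set TC=/sbin/tc to run them for real (needs root).
TC="${TC:-echo tc}"
DEV="eth0"

# shape_client IP CLASSNUM RATE CEIL: hypothetical helper that emits the
# per-client filters and Internet class from the example above.
shape_client() {
    ip="$1"; id="$2"; rate="$3"; ceil="$4"
    # traffic from ISP1 and ISP2 prefixes goes to the shared provider classes
    $TC filter add dev "$DEV" parent 1: protocol ip prio 1 u32 match mark 0x1 0xffff match ip dst "$ip/32" police drop flowid 1:4
    $TC filter add dev "$DEV" parent 1: protocol ip prio 1 u32 match mark 0x3 0xffff match ip dst "$ip/32" police drop flowid 1:5
    # everything else is Internet traffic and gets a per-client class under 1:6
    $TC class add dev "$DEV" parent 1:6 classid "1:$id" est 1sec 8sec htb rate "$rate" ceil "$ceil"
    $TC filter add dev "$DEV" parent 1: protocol ip prio 2 u32 match ip dst "$ip/32" police drop flowid "1:$id"
}

# Reproduce the example client from this section.
shape_client 1.2.3.4 10 64Kbit 192Kbit
```

A second client would then be one more call with the next free class number, for example shape_client 1.2.3.5 11 64Kbit 192Kbit (the address is made up).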

Explanation of how the classes have been set up:

Class 1:1  - root class
Class 1:2  - default (nothing is being matched here) - parent is root
Class 1:3  - ICMP traffic - parent is root
Class 1:4  - Traffic from ISP1 - parent is root
Class 1:5  - Traffic from ISP2 - parent is root
Class 1:6  - Traffic from the Internet - parent is root
Class 1:10 ... 1:x - Internet traffic per client - parent is 1:6 for each

Notes

There is another project for doing this, called mipclasses, which uses iptables subchaining to simulate a tree so that matching is done as fast as possible. The solution presented here is faster because each forwarded IP packet needs only one lookup in the ipset hash instead of a walk through a tree of chains.

This article is based on the work done by sin@imacandi.net and adixor@pvs.ro.
