Network Flows
Introduction
Network flows are most often discussed in connection with network denial of service attacks. Indeed, they make distributed denial of service attacks very simple to detect, thanks to the sheer number of flows generated by the zombie army attacking you.
In this article, we’ll present other applications of these network flows: detection of intrusions or policy violations, of hidden channels, or even of worm proliferation, to name just a few examples. While the detection of distributed denial of service is relatively specific to operators and Internet service providers, are the other applications well suited to an internal network?
Netflow
Netflow is Cisco’s implementation of the technology. Originally, Netflow was a routing technology that consisted of “routing” by flows, an approach that is not widely used nowadays. Each packet that passes through a router either creates a new flow or becomes part of an existing one. The main characteristics of the IP datagram form the basic seven-tuple: source and destination IP addresses, source and destination ports (or other features such as message type and code for ICMP), transport protocol, ToS (Type of Service), and input interface. These were the main characteristics by the standards of the time; this is not necessarily the case today, since many applications are encapsulated in other protocols. Other information is added, such as data and packet counters, output interface, TCP flags (the characteristics of version 1 end here), AS number (version 5), and the address of the next hop (i.e. the next router traversed). The lifetime of a flow in the cache is limited: on a router, it is removed after 15 seconds of inactivity, after 30 minutes of activity, upon a TCP RST/FIN, or when the cache is full. Also note that a flow is unidirectional, so a TCP session generates two flows (one inbound, one outbound).
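To make the seven-tuple and the cache lifetime rules more concrete, here is a minimal Python sketch of a flow cache. It is not Cisco's implementation: the class and field names are hypothetical, the timeouts simply reuse the 15 second and 30 minute values quoted above, and the "cache full" condition is left out.

```python
from dataclasses import dataclass
from typing import NamedTuple

INACTIVE_TIMEOUT = 15       # seconds without any packet (value quoted in the text)
ACTIVE_TIMEOUT = 30 * 60    # seconds since the flow was created
TCP_FIN, TCP_RST = 0x01, 0x04


class FlowKey(NamedTuple):
    """The basic seven-tuple that identifies a unidirectional flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    tos: int
    input_if: int


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0      # cumulative OR of the flags seen in the flow


class FlowCache:
    def __init__(self):
        self.flows = {}
        self.exported = []  # (key, record) pairs that left the cache

    def observe(self, key, now, size, tcp_flags=0):
        """Account one packet: create the flow or update the existing one."""
        self.expire(now)
        rec = self.flows.get(key)
        if rec is None:
            rec = self.flows[key] = FlowRecord(first_seen=now, last_seen=now)
        rec.last_seen = now
        rec.packets += 1
        rec.octets += size
        rec.tcp_flags |= tcp_flags
        if tcp_flags & (TCP_FIN | TCP_RST):
            self._export(key)

    def expire(self, now):
        """Export flows that hit the inactive or the active timeout."""
        for key, rec in list(self.flows.items()):
            idle = now - rec.last_seen >= INACTIVE_TIMEOUT
            too_old = now - rec.first_seen >= ACTIVE_TIMEOUT
            if idle or too_old:
                self._export(key)

    def _export(self, key):
        self.exported.append((key, self.flows.pop(key)))
```

Because the key includes the direction (source and destination are not interchangeable), the two halves of a TCP session end up as two separate entries, as described above.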
With the rise of MPLS networks, the packet is no longer routed but switched based on labels. Netflow has been adapted to allow accounting in such environments (Netflow version 9). The most commonly deployed version is version 5. Version 8 adds support for flow aggregation on the router side, while version 9, besides MPLS support and greater flexibility, adds IPv6 support and is closely aligned with IPFIX (or vice versa), the IETF standard for network flows.
Until recently, Netflow was ingress-only, meaning that only packets entering an interface were accounted for. In recent IOS versions (if the hardware supports it), egress Netflow can be configured. This makes it possible to detect incoming and outgoing attacks without having to configure Netflow on every router interface, and without having to verify that a flow is not counted several times as it traverses a large network.
There are three sampling methods: “full”, “sampled”, “random sampled”.
Full generates information for every network flow, all of which is exported. This is the oldest method and is supported on almost all routers, but it is no longer very common among operators because the router load and the amount of accounting information generated, especially during a distributed denial of service, are too high. However, in an internal network, it is almost mandatory if you want to detect slow reconnaissance or policy violations that try to remain discreet.
Sampled allows defining the proportion of flows to export out of the total number of generated flows. Generally, operators limit themselves to 1 in 100, or even 1 in 1000. Even at 1 in 1000, a distributed denial of service remains relatively easy to detect. The advantage of this method is the reduction of the router’s CPU load and of the amount of Netflow data exported. The disadvantage is that the selection is deterministic, which makes it a poor sample from a statistical point of view.
Random Sampled was introduced relatively recently on platforms of the 72xx/75xx type (while sampled was only available on GSRs and 76xx, i.e., routers that support distributed CEF). The difference between sampled and random sampled is that the latter selects one datagram at random within each group of the configured size, which is statistically better.
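The difference between the two sampling modes can be illustrated with a short Python sketch. It works on a plain list of packets and is only an assumption about how the selection could be modeled, not a description of what IOS does internally; the function names are hypothetical.

```python
import random


def deterministic_sample(packets, n):
    """Keep every n-th packet (positions n, 2n, 3n, ...)."""
    return [p for i, p in enumerate(packets, start=1) if i % n == 0]


def random_sample(packets, n, rng=None):
    """Keep one packet picked at random in each consecutive group of n."""
    rng = rng or random.Random()
    kept = []
    for start in range(0, len(packets), n):
        group = packets[start:start + n]
        kept.append(rng.choice(group))
    return kept
```

With traffic that happens to be periodic, for example nine packets from host A followed by one from host B, repeated, a deterministic 1 in 10 selection only ever sees B, whereas the random variant sees both hosts in roughly the right proportions.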
Until recently, sampling was per router; in very recent IOS versions, it is possible to classify datagrams according to different attributes (IP header fields, NBAR, etc.) and have sampling levels per class.
Examples of configuration for a router and multi-level switch are available in issue 4 of MISC. Note that in recent IOS versions for switches of the 65xx/76xx family, versions 5 and 8 are supported (the TCP flag was not present in version 7, among other things), but the configuration, differences in functionality depending on the supervisor and routing cards, and choosing the right IOS remain a bit of a puzzle.
In networks without routers or with equipment that cannot generate Netflow, an alternative is to connect a PC to a port in listening mode (SPAN or mirror port) and use it to export Netflow information. A list of tools (Netflow clients and servers, PCAP->Netflow generators) is available here: https://www.switch.ch/tf-tant/floma/software.html. This approach obviously doesn’t really scale, even on an internal enterprise network once it becomes large and not all traffic passes through a well-defined core.
We have discussed Netflow probes and sources, but what about the server side?
The storage method (text file, database, etc.) and the retention policy have an impact on the size of the disks you will need, but another important element is flow aggregation. This function is also available in Netflow version 8 and can therefore be activated at the source, on the router, but it is better not to activate it there, so as not to lose granularity. On the server, the consolidation policy must take into account active/non-active flows and time: if you aggregate over a day, the same source/destination IP pair and source/destination ports (especially for heavily used services like DNS) could appear multiple times and be consolidated into a single flow, which would be incorrect.
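One way to keep time in the consolidation key is sketched below in Python. The flow records are plain dictionaries with hypothetical field names, and the 5 minute window is an arbitrary assumption to be tuned for your environment.

```python
from collections import defaultdict


def aggregate(flows, window_seconds=300):
    """Consolidate flows that share the same five-tuple AND time window."""
    buckets = defaultdict(lambda: {"flows": 0, "packets": 0, "octets": 0})
    for f in flows:
        # Start of the time window this flow falls into.
        window = int(f["start"] // window_seconds) * window_seconds
        key = (window, f["src"], f["dst"], f["sport"], f["dport"], f["proto"])
        bucket = buckets[key]
        bucket["flows"] += 1
        bucket["packets"] += f["packets"]
        bucket["octets"] += f["octets"]
    return dict(buckets)
```

Keeping a "flows" counter in each bucket preserves the information that several distinct flows were merged, which a plain sum of octets would hide.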
Netflow or PCAP?
The best answer is probably: both. Indeed, Netflow ignores the application content transported (only elements of the IP header and headers of transport layer protocols are processed), while a complete network trace allows you to have all this information. Netflow provides a macroscopic view of the network, PCAP a microscopic view. By focusing too much on the microscopic, we often tend to forget the global vision, especially in the context of network problem resolution. Additionally, as with logs generated by systems and applications, it is advisable to define a centralization policy for network “logs” - this is a prerequisite for post-mortem analysis of incidents.
For cost and management reasons, to name just these two factors, it quickly becomes clear that you cannot deploy a listening probe on all switches in a network. So why not simply make PCAP captures in strategic locations of the network? This alternative is a fairly good approach in many cases, but a major problem must be solved first: what is the capacity of my storage system? Indeed, a complete capture takes up a lot of space, and the amount of GB (or TB) available (and their cost) will quickly limit the number of probes and retention period…
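A rough order of magnitude helps here. The Python helper below is only back-of-the-envelope arithmetic; the function name and the example figures (a link averaging 100 Mbit/s, one day of retention) are assumptions chosen for illustration, not measurements.

```python
def capture_size_bytes(avg_bits_per_second, retention_seconds, overhead=0.0):
    """Estimate the disk space needed for a full capture.

    overhead is the fraction added by the capture file format
    (per-packet headers); 0.0 ignores it.
    """
    return avg_bits_per_second / 8 * retention_seconds * (1 + overhead)


# One probe on a link averaging 100 Mbit/s, kept for 24 hours:
one_day = capture_size_bytes(100e6, 24 * 3600)
print(one_day / 1e12, "TB")
```

Under these assumptions a single probe already needs about one terabyte per day, before multiplying by the number of probes and the number of days you want to keep.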
The most interesting approach is to deploy Netflow across the entire network and listening probes at strategic locations: communication between the inside and outside of the network (Internet access, remote maintenance, VPN access, partner extranet, etc.), the core of the network (if you concentrate all your traffic on a few “large” switches), in front of critical servers, etc. If you decide to centralize this information, you will also have to take into account that your network traffic will double (normal traffic + PCAP traffic between probes and storage base), or set up a dedicated network for this purpose (which will also avoid re-sniffing the PCAP traces).
Nothing prevents you from linking these two systems. For example, it may be interesting to connect a Netflow-based anomaly detection system with a network intrusion detection tool (NIDS). With this approach, if Netflow detects an anomaly, you can either activate PCAP probes to try to get more information (if the event or attack is still ongoing), or if you have a rolling snapshot stored in a database for a few hours, launch a query. Or conversely, the NIDS raises an alert and you retrieve Netflow history from your database.
Detection
Before talking about detection, it is appropriate to address the topic of discovery… How many administrators or network managers know their network architecture and the traffic it carries well?
Network Discovery
A very interesting application of network flows is network discovery: they allow you to map it, discover the applications using it, characterize “usual” behavior (baseline), etc. This discovery stage often leads to a “cleaning” stage, or even the definition of a new architecture based on security segmentation needs (networks with different levels of sensitivity).
Scanning
A scan (especially if it’s not a slow scan) is relatively simple to detect: many small flows to the same destination (IP address or port) with, if it’s a TCP scan, few established sessions (by monitoring TCP flags) and for a UDP scan, many ICMP return messages to signal that the port is closed (or no response if the firewall or host is strict).
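The TCP part of this heuristic could look like the following Python sketch. The flow records are dictionaries with hypothetical field names, and all thresholds are arbitrary assumptions; a UDP variant would instead count the ICMP "port unreachable" messages coming back.

```python
from collections import defaultdict

TCP_ACK = 0x10


def find_scanners(flows, min_targets=50, max_packets=3,
                  max_established_ratio=0.2):
    """Return sources with many small TCP flows and few established sessions."""
    targets = defaultdict(set)      # source -> {(destination, port)}
    established = defaultdict(int)  # source -> small flows that carried an ACK
    total = defaultdict(int)        # source -> small flows seen
    for f in flows:
        if f["proto"] != 6 or f["packets"] > max_packets:
            continue                # only small TCP flows are of interest
        src = f["src"]
        targets[src].add((f["dst"], f["dport"]))
        total[src] += 1
        if f["tcp_flags"] & TCP_ACK:
            established[src] += 1
    return sorted(
        src for src in targets
        if len(targets[src]) >= min_targets
        and established[src] / total[src] <= max_established_ratio
    )
```

A slow scan spreads its flows over hours or days, so this only works if the function is fed a sufficiently long history, which is one more argument for unsampled Netflow on an internal network.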
Viruses and Worms
The detection of viruses and worms remains a relatively simple application to implement. Indeed, most of them are not very discreet and seek to spread very quickly, most often either by direct infection attempt or after a search for open ports. An infected machine will generate many very short and very small flows to a large number of machines and/or ports.
Many worms include an engine that will look for new machines to infect following a more or less random algorithm. Most often, the network prefix of the source machine is the first to be scanned, then the upper subnet, and finally addresses taken more or less randomly (pseudo-random). At this point, datagrams may have as their destination a so-called “bogon” network prefix (https://www.cymru.com/Documents/bogon-dd.html#dd-route-non), meaning it has not yet been allocated or is not supposed to be present in the BGP routing tables of the Internet.
Often, when an attacker decides to launch a denial of service, they communicate to the agents that are part of their zombie army to forge (spoof) their source IP address. If you see outgoing flows with source addresses that are not part of your internal addressing plan, then the probability of infection is high.
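Both checks, destinations in bogon prefixes and sources outside the internal addressing plan, are easy to express with Python's standard ipaddress module. In the sketch below the internal prefixes are made-up examples, and the bogon list is only a small illustrative subset of well-known special-use prefixes; a real deployment would load the complete, regularly updated list.

```python
import ipaddress

# Assumption: example internal addressing plan, replace with your own.
INTERNAL_PREFIXES = [ipaddress.ip_network(p)
                     for p in ("10.10.0.0/16", "192.168.50.0/24")]

# Assumption: tiny illustrative subset, NOT a complete bogon list.
BOGON_PREFIXES = [ipaddress.ip_network(p)
                  for p in ("0.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
                            "192.0.2.0/24", "240.0.0.0/4")]


def in_any(address, prefixes):
    """True if the address belongs to at least one of the prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in prefixes)


def classify_outgoing(flow):
    """Return the list of alerts raised by one outgoing flow."""
    alerts = []
    if not in_any(flow["src"], INTERNAL_PREFIXES):
        alerts.append("spoofed-source")
    if in_any(flow["dst"], BOGON_PREFIXES):
        alerts.append("bogon-destination")
    return alerts
```

The function is meant to be applied only to flows leaving the internal network, for instance those seen on the interfaces facing the Internet access.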
To detect viruses that connect to sites (often via HTTP) to retrieve a new payload, it can be interesting to place in a special rule all sites listed by antivirus editors in the public descriptions they make available on their sites.
Hidden channels and backdoors based on encapsulation in ICMP or DNS are commonly found on WiFi networks. In enterprise environments, tunneling over HTTP is the most common. These flows generally go to and from the same IP addresses, the destination port is fixed, and the flow size is significant: an ICMP or DNS flow that lasts more than a few seconds and exceeds a few kilobytes quickly becomes suspicious, as does an HTTP flow that is long and relatively symmetrical in size (whereas an HTTP flow is typically short, except for downloads, and very asymmetric between what is sent and what is received).
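These criteria translate fairly directly into a rule. The Python sketch below is one possible reading of them; the field names are hypothetical and every threshold ("a few seconds", "a few kilobytes", what counts as "long" or "symmetrical") is an assumption to be calibrated against your own baseline. Since a flow is unidirectional, the HTTP test needs the matching reverse flow to compare sizes.

```python
def suspicious_tunnel(flow, reverse=None, min_duration=5.0, min_octets=4096,
                      symmetry=0.5, long_http=300.0):
    """Flag flows that look like ICMP, DNS or HTTP tunnels."""
    duration = flow["end"] - flow["start"]
    is_icmp = flow["proto"] == 1
    is_dns = (flow["proto"] in (6, 17)
              and 53 in (flow["sport"], flow["dport"]))
    if is_icmp or is_dns:
        # Long-lived and unusually large for these protocols.
        return duration > min_duration and flow["octets"] > min_octets
    is_http = flow["proto"] == 6 and flow["dport"] == 80
    if is_http and reverse is not None:
        # Ratio between the smaller and the larger direction (1.0 = symmetric).
        small, large = sorted((flow["octets"], reverse["octets"]))
        return duration > long_http and large > 0 and small / large >= symmetry
    return False
```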
A backdoor often translates into a flow to a non-standard port on the infected system, especially if this flow appears at odd hours. It can also be interesting to set up rules to detect Trojans that listen on their default port. Another type of backdoor is the “alternative” access to the Internet: in many companies, the very strict filtering policy is no longer to the liking of certain people. Flows to or from public IP addresses that do not transit through your Internet access gateway deserve to be analyzed.
Compromised machines, especially servers, can be detected while the attacker is controlling them remotely: this often happens through a connection to a non-standard listening port. However, some exploits use a mechanism that lets the attacker come back through the same port as the application, or even reuse the same session. These are not easy to detect. That said, unless a backdoor is installed, it is not always easy to re-exploit the same vulnerability several times in a row (the service may crash during the exploitation or because of it), and keeping the session open is itself something that could be detected.
It is also important to observe the TCP flags (be careful to check that your equipment actually exports them, which is not always the case), which often make it possible to determine the direction of communication, namely which side is the client and which side is the server.
Defining a flow policy for different systems (clients and servers) makes it easy to detect an event that deviates from the baseline. Systems are grouped according to their network behavior and the different exchanges, either between these groups (for a coarse view) or between all systems (for a very fine view). The complexity of the system increases with the number of flows and the number of systems, which must be taken into account during the design phase (complexity of detection, but especially of policy definition).
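A coarse, group-based policy of this kind can be sketched in a few lines of Python. The groups, prefixes, and allowed exchanges below are invented for the example; only the principle (map each system to a group, then compare each flow with the list of permitted exchanges) comes from the text.

```python
import ipaddress

# Assumption: example grouping of systems by network behavior.
GROUPS = {
    "10.1.0.0/24": "clients",
    "10.2.0.10/32": "web",
    "10.2.0.20/32": "db",
}

# Allowed exchanges: (client group, server group, protocol, server port).
ALLOWED = {
    ("clients", "web", 6, 80),
    ("clients", "web", 6, 443),
    ("web", "db", 6, 5432),
}


def group_of(address):
    """Return the group of an address, or "unknown" if it matches none."""
    ip = ipaddress.ip_address(address)
    for prefix, name in GROUPS.items():
        if ip in ipaddress.ip_network(prefix):
            return name
    return "unknown"


def is_allowed(flow):
    """Accept a flow if it, or its return direction, matches the policy."""
    src, dst = group_of(flow["src"]), group_of(flow["dst"])
    forward = (src, dst, flow["proto"], flow["dport"])
    backward = (dst, src, flow["proto"], flow["sport"])
    return forward in ALLOWED or backward in ALLOWED
```

The return direction has to be checked explicitly because flows are unidirectional: the server's answers arrive as a separate flow whose source port is the service port. The size of ALLOWED is a direct measure of the policy complexity mentioned above.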
To go further, you can store MAC address/physical port@equipment pairs in the database. This allows tracking the movement of these addresses on the network (laptops, WiFi, DHCP, etc.), alerting if new addresses appear, and coupled with Netflow historical data, detecting major deviations in behavior. It is likely that information such as MAC address and VLAN ID will be added to Netflow exports in the near future, which will avoid having to combine Netflow with SNMP polls to retrieve this information.
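As an illustration, the tracking logic could be as simple as the following Python sketch. How the observations are collected (SNMP polls of the switches, for instance) is outside its scope, and the data layout is an assumption.

```python
def track_macs(known, observations):
    """Compare new (mac, device, port) observations with the known state.

    known maps a MAC address to its last (device, port) location and is
    updated in place; the returned list contains "new" and "moved" events.
    """
    events = []
    for mac, device, port in observations:
        location = (device, port)
        previous = known.get(mac)
        if previous is None:
            events.append(("new", mac, location))
        elif previous != location:
            events.append(("moved", mac, previous, location))
        known[mac] = location
    return events
```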
Spyware, IM and Auto-update
It can be interesting to monitor the first flows that follow DHCP allocation of network parameters, logging in to the station, or the first outgoing communication: indeed, this is when most “spy tools” and update or reporting features of software start “phoning home”. It is also very common to see flows that return every
And even if you don’t deploy a network flow analysis solution, it is very interesting to sniff your computer’s communications, particularly at startup. You often discover interesting and unexpected things.
FTP, P2P, etc.
Protocols and applications that use dynamic ports or rely on a control connection and a data connection are not simple to handle. Indeed, Netflow does not maintain state or relationships between two flows, and does not perform any protocol analysis that would be necessary to identify data connections managed by a control connection.
With network flows alone, it is therefore only possible to identify P2P applications that use fixed ports. To get around this limitation, some vendors have added a layer that identifies the P2P protocol based on signatures, but this no longer has anything to do with Netflow. A feature like NBAR (Network-Based Application Recognition) or, better yet, equipment dedicated to P2P traffic management, are possible options.
FTP implementations that respect the RFC and some P2P applications are an exception because the data port and control port are +/- 1. So, even if the server listens on a port other than the standard port (21/TCP for FTP and 20/TCP for FTP-DATA), it is possible to identify the control+data pair.
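The +/- 1 relationship can be turned into a simple pairing rule, sketched here in Python for the case where the data connection is opened from the server towards the client. The field names are hypothetical, and this is a coarse heuristic: unrelated flows between the same two hosts whose ports happen to be adjacent would also be paired.

```python
def pair_control_and_data(flows):
    """Return (control, data) pairs of TCP flows between the same two hosts.

    control is a client-to-server flow towards port P; data is a later
    server-to-client flow whose source port is P - 1 or P + 1.
    """
    pairs = []
    for control in flows:
        for data in flows:
            if control is data:
                continue
            opposite = (data["src"] == control["dst"]
                        and data["dst"] == control["src"])
            adjacent = abs(data["sport"] - control["dport"]) == 1
            both_tcp = control["proto"] == 6 and data["proto"] == 6
            later = data["start"] >= control["start"]
            if opposite and adjacent and both_tcp and later:
                pairs.append((control, data))
    return pairs
```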
Historically, groups that publish “cracked” applications generate files of a standard size (like 1.44 MB in the era when floppy disks were still used), which also helps strengthen the quality of detection based on flow size.
Historical Research
We often tend to look only at the top 10 of a ranking (such as the users who abuse the company’s Internet connection the most), but when setting up mechanisms to search historical data, the bottom 10 is as important as the top 10, and this holds for every parameter that Netflow provides. In this way, we can often detect “lost” flows that give interesting indications about a well-concealed attack or a slow port reconnaissance.
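Extracting both ends of a ranking is straightforward once the flows are in a database or in memory. The Python sketch below uses hypothetical field names and ranks sources by volume, but any Netflow field can be used as the grouping key or as the weight.

```python
from collections import Counter


def top_and_bottom(flows, field="src", weight="octets", n=10):
    """Return the n largest and the n smallest totals for a given field."""
    totals = Counter()
    for f in flows:
        totals[f[field]] += f[weight]
    ranked = totals.most_common()       # sorted from largest to smallest
    top = ranked[:n]
    bottom = ranked[-n:][::-1]          # smallest first
    return top, bottom
```

For example, top_and_bottom(flows, field="dport", weight="packets") lists the least used destination ports, which is where a discreet backdoor or a slow scan tends to show up.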
It is also interesting to note that, in certain environments (and depending on local laws and regulations), storing only Netflow information does not constitute an invasion of privacy. This less intrusive aspect often allows “selling” this solution internally. And often, the flow says enough and the transported data is just a “bonus,” a bit like an email subject in plain text and its encrypted content…
There is a risk related to the injection of false Netflow accounting messages. Indeed, they are neither signed nor encrypted, and the only defense mechanism (very weak) is the sequence number. It is relatively simple to forge such packets, given that they are transported over UDP.
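A collector can at least verify the sequence numbers. The Python sketch below assumes Netflow version 5 export datagrams, whose 24-byte header carries the number of records in the datagram and a flow sequence counter, so that the next datagram from the same exporter is expected to start at the previous sequence plus the previous count. The class name and return values are hypothetical, and as noted above this is a very weak check: an attacker who can observe the export traffic can forge consistent numbers.

```python
import struct

# Netflow v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes).
V5_HEADER = struct.Struct("!HHIIIIBBH")


class SequenceChecker:
    def __init__(self):
        self.expected = {}  # exporter address -> next expected flow_sequence

    def check(self, exporter, datagram):
        """Return "ok", "gap-or-forged" or "not-v5" for one export datagram."""
        fields = V5_HEADER.unpack_from(datagram)
        version, count, sequence = fields[0], fields[1], fields[5]
        if version != 5:
            return "not-v5"
        expected = self.expected.get(exporter)
        # The counter is 32 bits wide, so it wraps around.
        self.expected[exporter] = (sequence + count) & 0xFFFFFFFF
        if expected is None or sequence == expected:
            return "ok"
        return "gap-or-forged"
```

A mismatch can also simply mean that export datagrams were lost, since they travel over UDP, so it should be treated as a hint rather than as proof of injection.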
Conclusion
We have focused on describing different applications of network flows. As you have seen, their deployment on an internal enterprise network can provide unprecedented visibility, especially nowadays where networks are increasingly open and flows increasingly complex. Even a test deployment can reveal many unsuspected things…
If your network is highly segmented, with many firewalls, having Netflow sources in these different segments and a collector that centralizes the flows can help you validate your security policy and especially its implementation.
The various examples are far from an exhaustive list; it’s up to you to find new applications and define the detection criteria that correspond to your particular environment. And remember that just as with an intrusion detection tool, the quality of alerts and their number depend on the time spent fine-tuning your configuration.
The next step might be to link these detection mechanisms with automatic protection mechanisms (disabling the port on the switch, for example). The technologies exist; the future will tell whether they will be adopted.
Last updated 24 Sep 2008, 11:21 CEST.