When it comes to finding problems on the network, the key is learning to listen to what the network devices have to say. And just polling for SNMP data isn't enough.
"As networks become more intelligent and the applications running on the network become more collaborative in nature," says Cliff Meltzer, Senior VP of Cisco's Network Management Technology Group, "there are other interesting data collection techniques, analytics, and data classification techniques that we can use to raise the level of intelligence and capabilities that the network brings to the overall computing infrastructure."
A key step is taking a look inside the packets to see what's crossing the network. But of the two common techniques, flow analysis and packet analysis, which is better? The answer is that both are needed to get a complete picture.
Getting Down to Details
The choice of tool depends first on the granularity of information needed to resolve a problem. SNMP gives the highest-level data, flow analyzers take a middle route, and packet analyzers focus on the smallest details.
"These flow analysis collectors are only one part of a comprehensive network and application monitoring framework," says Gerry Carver, a principal consultant in International Network Services' (INS) performance management practice. "It is vital that enterprises and service providers have a wide range of tools that can enable monitoring of the infrastructure from a high level down to the packet level for troubleshooting specific problems."
When one needs to get right down into the details of what is happening with an application, there is no substitute for a packet analyzer. Packet analyzers - also known as probes, protocol analyzers or packet sniffers - typically consist of software or hardware that captures each packet, decodes it and analyzes it. These are often appliances installed at key network locations. Port mirroring is used to direct a copy of each packet to the sniffer for analysis, while letting the traffic pass through unimpeded. Examples of packet analyzers include Network Instruments' Observer, Microsoft Network Monitor, Colasoft Capsa, Network General's Sniffer, the open source Wireshark (formerly called Ethereal), Sun Microsystems' snoop (included in Solaris), Fluke Networks' Protocol Inspector, and tcpdump, an open source command-line sniffer for Unix-like systems.
Packet analyzers have a wide variety of performance and security functions. Looking first at performance:
Since sniffers are examining the content of all the packets, like an organism's immune system, they can recognize the DNA of packets that don't belong in the network. This makes them extremely useful for security applications such as:
Network analyzers don't replace firewalls or other perimeter defenses. Instead, they sit on LAN segments to locate packets that made it past the firewall or originated inside the network, perhaps from an infected laptop. They can look for a known threat signature, but they can also identify and issue alerts on packets that are out of the norm, even if they are not yet a known threat.
They can also identify machines on the network that are acting as zombies in a denial-of-service attack, distributing spam, or running an illegal FTP server. Each of these cases shows up as an unusual amount of traffic from the machine using an unexpected protocol.
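As a rough illustration of that rule of thumb, the Python sketch below flags any host that sends more than a set volume of traffic over a protocol outside its expected set. The input format, the per-host baseline and the threshold are all assumptions made for illustration; commercial analyzers build and maintain their baselines in their own, more sophisticated ways.

```python
from collections import defaultdict


def flag_suspicious_hosts(observations, expected_protocols, volume_threshold):
    """Flag (host, protocol, bytes) totals that exceed volume_threshold for a
    protocol the host is not expected to use.

    observations: iterable of (host, protocol, byte_count) tuples, a
        hypothetical summary of captured packets.
    expected_protocols: dict mapping host -> set of protocols considered
        normal for that host (an assumed, hand-maintained baseline).
    """
    # Total the bytes each host sent per protocol.
    totals = defaultdict(int)
    for host, proto, nbytes in observations:
        totals[(host, proto)] += nbytes

    # Keep only high-volume traffic on protocols outside the host's baseline.
    flagged = []
    for (host, proto), nbytes in sorted(totals.items()):
        unexpected = proto not in expected_protocols.get(host, set())
        if unexpected and nbytes > volume_threshold:
            flagged.append((host, proto, nbytes))
    return flagged
```

For example, with a 500,000-byte threshold, a desktop whose baseline is {"HTTP", "DNS"} but which suddenly sends 900,000 bytes of SMTP is flagged, while the same SMTP volume from the mail server is not.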
Drawbacks of Packet Analyzers
Packet analyzers have their unique strengths, but also their limitations. The first of these is cost. Appliances are expensive, so they are typically used only at key locations or installed temporarily after other monitoring software has detected an anomaly. Keith Nelson is a staff consultant to a Fortune 500 company with 260 Cisco routers and 1,600 HP ProCurve switches on its global MPLS (Multiprotocol Label Switching) network.
"Although we were using probe-based technology to analyze application impact," says Nelson, "the increased network complexity and migration to MPLS made probe analysis difficult and expensive in our environment."
Even with an unlimited budget to buy enough sniffers to cover the entire network, other problems remain. To begin with, for the packet analyzer to see the traffic, it needs a copy of each packet from the switch. One way to do this is to set up a hub on the port and plug the analyzer into the hub to listen to all traffic. Newer switches support port spanning, or port mirroring, in which all traffic on a port is copied to another port that the analyzer plugs into. This, however, produces a significant performance hit.
Another drawback is the volume and complexity of data that packet analyzers produce. So, while they are useful for finding the cause of problems that can't be found by other means, SNMP and NetFlow or sFlow are usually preferable for rapid debugging of network problems.
In Nelson's case, since the Cisco routers use NetFlow and the HP switches use sFlow, he needed to support both flow technologies. He went with the Traffic Sentinel software from InMon Corporation, the firm that developed sFlow. With it he can monitor SNMP, NetFlow and sFlow across his entire network from a single server.
"After using InMon for over a year, it has become the tool of choice for traffic analysis," he says. "Although we still use Sniffer products for packet analysis, the broad view we have of our network using InMon has greatly enhanced our ability to quickly diagnose traffic patterns and address the underlying issues."
The Broader Look
When one needs a broad look at the network, flow monitoring is the quick and simple solution. Flow monitoring is a push technology activated on the switch or router to provide data on packets to a "collector" - a server set up with a database and analysis software to convert that raw flow data into charts, graphs and reports for instantaneous use.
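To make the collector side concrete, here is a minimal Python sketch that decodes NetFlow version 5 export datagrams, one of the older fixed-format NetFlow versions. It is only an illustration under stated assumptions: it handles v5 only (not v9 or IPFIX, which use templates), keeps a handful of fields, and listens on UDP port 2055, a commonly used but configurable port. A real collector would also write the records to its database and build reports from them.

```python
import socket
import struct

# NetFlow v5 wire format, network byte order: a 24-byte header
# (version, count, sys_uptime, unix_secs, unix_nsecs, flow_sequence,
# engine_type, engine_id, sampling_interval) followed by `count`
# fixed 48-byte flow records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_netflow_v5(datagram: bytes):
    """Decode one NetFlow v5 datagram into a list of simplified flow dicts."""
    version, count = HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError(f"not NetFlow v5 (version {version})")
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("truncated datagram")

    flows = []
    for i in range(count):
        (src, dst, _nexthop, _in_if, _out_if, packets, octets, _first, _last,
         sport, dport, _pad1, _tcp_flags, proto, _tos,
         _src_as, _dst_as, _src_mask, _dst_mask, _pad2) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
            "sport": sport, "dport": dport, "proto": proto,
            "packets": packets, "bytes": octets,
        })
    return flows


def serve(port=2055):
    """Print decoded flows from every exporter that sends to this UDP port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        datagram, (exporter_ip, _) = sock.recvfrom(65535)
        for flow in parse_netflow_v5(datagram):
            print(exporter_ip, flow)
```

Because routers push this data unprompted, the collector only has to listen; nothing in the sketch polls the devices.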
Flow technologies come in two varieties: sFlow and NetFlow. sFlow is a hardware-based technology developed by InMon Corporation and consists of a chip installed in a network device. sFlow, therefore, doesn't add any load to the CPU. The more common flow technology is NetFlow. Cisco developed NetFlow in the mid-1990s, and it is now part of the company's Internetwork Operating System (IOS). Recently the IETF (Internet Engineering Task Force) released a standard called IPFIX, based on NetFlow v9, broadening its use far beyond Cisco devices.
"The beauty of this is, because it is a standard, you can look at data from different vendors and still apply the correct level of forensics or traffic analysis to it," says Meltzer.
Both flow technologies operate on the same basic principle. They examine the stream of passing packets for common characteristics (e.g. source IP address, destination IP address, protocol). Packets that match on all of these characteristics are considered part of the same flow. The packets belonging to each flow are counted, and the data is then sent to the collector.
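The grouping step can be sketched in a few lines of Python. This is a simplified model, not how any particular router implements it: it uses only the three example key fields named above (source address, destination address, protocol), whereas real exporters typically key on more fields, such as ports and input interface, and export each flow when it times out.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    proto: str
    length: int  # bytes on the wire


def aggregate_flows(packets):
    """Group packets that share (src, dst, proto) into flows and count
    packets and bytes per flow, as an exporter would before export."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src, p.dst, p.proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.length
    return dict(flows)
```

For example, two TCP packets of 1,500 and 500 bytes between the same pair of hosts become a single flow record of 2 packets and 2,000 bytes; only that summary, not the packets, travels to the collector.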
Flow technologies are a step above the data that is provided by SNMP.
"NetFlow is a mature, well-adopted standard and nearly all large service providers and enterprises use it," says Meltzer. "It is very powerful and necessary for doing forensics on security type events when you need to see much more of the raw data."
For example, SNMP can report on total bandwidth utilization, but can't identify what users, protocols or applications make up that traffic. So, if a user calls in and says the network is slow, clicking on an SNMP screen might show that utilization is pegged at 100% but doesn't tell you why. If one also had NetFlow activated on the router port, one could instantly see who that traffic was coming from and what they were sending. No, it wouldn't let you look at the content of the packets, but it would show that 95% of the traffic was an FTP transfer between servers that could be moved to off-peak hours so it wouldn't impact service levels.
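The difference can be sketched in Python. The first function is roughly what an SNMP poller does: it turns two readings of an interface's 32-bit octet counter (for example IF-MIB ifInOctets) and the interface speed into a utilization percentage. The second works on flow records and answers the question SNMP cannot: which conversations make up the load. The flow-record fields (src, dst, app, bytes) are hypothetical simplifications.

```python
from collections import Counter

COUNTER32_MODULUS = 2 ** 32


def utilization_pct(octets_before, octets_after, interval_s, if_speed_bps):
    """SNMP view: percent of link capacity used during the polling interval.
    The modulo handles a single wrap of a 32-bit counter."""
    delta_octets = (octets_after - octets_before) % COUNTER32_MODULUS
    return 100.0 * delta_octets * 8 / (interval_s * if_speed_bps)


def top_conversations(flows, n=3):
    """Flow view: rank (src, dst, app) conversations by share of total bytes."""
    totals = Counter()
    for f in flows:
        totals[(f["src"], f["dst"], f["app"])] += f["bytes"]
    grand_total = sum(totals.values())
    return [(conv, round(100.0 * b / grand_total, 1))
            for conv, b in totals.most_common(n)]
```

On a T-1 (1.544 Mbit/s) polled every 60 seconds, a counter increase of 11,580,000 octets works out to exactly 100% utilization, which is all the SNMP view can say; top_conversations over the same interval's flows would name the server-to-server transfer behind it. On faster links, pollers generally read the 64-bit ifHCInOctets counter instead, because a 32-bit counter can wrap more than once between polls.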
It is also, as Meltzer mentioned, useful for security since it identifies the devices and protocols on the network. Duncan Bond, data network supervisor for the State of Maine, uses Scrutinizer software from Plixer International, Inc. as a means of accessing NetFlow data to resolve network problems.
"Yesterday users at a site reported extreme slowness and Scrutinizer showed us that a particular conversation between two servers was causing the problem," he says. "Someone was copying files between the two servers and, since it was not critical, we were able to get support staff to cancel that transaction and reschedule it off hours."
He also cites the example where every Tuesday morning bandwidth to a number of sites was saturated. By looking at the NetFlow data, it became clear that this was caused by automated software updates.
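A recurring pattern like the Tuesday-morning saturation stands out once flow data is bucketed by weekday and hour over several weeks. The Python sketch below assumes flow records reduced to hypothetical (timestamp, bytes, application) tuples; it finds the busiest weekly time slots and then breaks one slot down by application.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical simplified flow record: (start time, bytes, application)
Record = Tuple[datetime, int, str]

# Fixed English day names so results do not depend on the system locale.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def busiest_slots(records: Iterable[Record], top: int = 3):
    """Total bytes per (weekday, hour) slot across all weeks, largest first."""
    by_slot = defaultdict(int)
    for ts, nbytes, _app in records:
        by_slot[(DAYS[ts.weekday()], ts.hour)] += nbytes
    return sorted(by_slot.items(), key=lambda kv: kv[1], reverse=True)[:top]


def apps_in_slot(records: Iterable[Record], day: str, hour: int):
    """Bytes per application within one weekly slot, e.g. ("Tue", 8)."""
    by_app = defaultdict(int)
    for ts, nbytes, app in records:
        if DAYS[ts.weekday()] == day and ts.hour == hour:
            by_app[app] += nbytes
    return dict(by_app)
```

Summing per weekly slot rather than per date is what makes the pattern visible: a Tuesday 8:00 peak accumulates every week, and apps_in_slot then shows which application dominates it.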
"Rather than adding bandwidth, they were able to come up with alternate ways to handle the updates without causing problems at the site," says Bond. "It became a traffic engineering exercise, changing business practices rather than paying for new circuits."
Security is another area where NetFlow is useful. For example, when a vendor plugs an infected laptop into the network, that traffic shows up as an anomaly. NetFlow also helps verify that links that should be encrypted actually are.
"With NetFlow it is fairly easy: look at the graphs and see what protocols are present on the WAN links," he says. "On the edges we could see the point to point conversations, but not in the encrypted tunnel, which validated that what we expected to be happening was happening."
There are also times that security problems show up unexpectedly.
"I might be looking at an Internet link to see what is our load of HTTPS and along the way I might spot a protocol I shouldn't see there," says Bond. "Then I would follow up to see the real significance of that traffic."
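Both of Bond's checks amount to comparing the protocols seen on a link with the ones expected there. The Python sketch below assumes, for illustration only, that the encrypted WAN links are IPsec tunnels, so outside the tunnel the link should carry only ESP (IP protocol 50) and IKE (UDP ports 500 and 4500); any other flow is reported. The flow-record fields and the allowlist are hypothetical.

```python
# IP protocol numbers (IANA assignments)
TCP, UDP, ESP = 6, 17, 50

# (protocol, destination port) pairs permitted on the link; None means any port.
# Assumed policy for an encrypted WAN link: IPsec only.
IPSEC_ONLY = {(ESP, None), (UDP, 500), (UDP, 4500)}


def unexpected_flows(flows, allowed):
    """Return flows whose protocol/port combination is not in `allowed`."""
    return [
        f for f in flows
        if (f["proto"], None) not in allowed
        and (f["proto"], f.get("dport")) not in allowed
    ]
```

An empty result is the "what we expected to be happening was happening" case. A different allowlist, for example one describing an Internet link's expected web traffic, turns the same function into a detector for a protocol that shouldn't be there.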
No Conflict
Although NetFlow is cheaper and easier to use than packet analysis, the two technologies are not truly in conflict. Rather, each gives a different way of looking at network traffic. The State of Maine uses SNMP for a high-level look at utilization statistics, NetFlow to spot which applications are running on the network, and finally packet analyzers to debug anomalous applications. The state uses 500 Cisco and Nortel routers to connect 750 business units across the state over a mix of SONET ring, ATM, T-1 and dialup links. It had been using Plixer International, Inc.'s (Sanford, Maine) WebNM network management software for SNMP, and Fluke Networks' Protocol Inspector devices to analyze packets crossing the backbone.
"We have a number of Fluke remote probes, primarily on our major critical links, such as the one to the Internet," says Bond. "That allows us to have a realtime RMON-based solution as well as a packet capture mechanism if we need it."
He says it is simple enough to go out and install a packet analyzer at locations around the state capital area as needed. But just getting out to some of the remote offices entails a four-hour drive each way. In addition, he wanted to be able to look back and see what had caused a problem in the past, not just what was currently on the network. Like other network administrators, he gets calls informing him that the network was slow the day after the problem occurred.
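Looking back in time is possible because a flow collector keeps the records it receives. A minimal Python sketch of that idea stores flows in SQLite, standing in for the collector's database, and asks which conversations were loading a given router's links during yesterday's problem window. The table layout, exporter name and field names are hypothetical.

```python
import sqlite3
from datetime import datetime


def open_store(path=":memory:"):
    """Open (or create) a flow table; ':memory:' keeps it in RAM for testing."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS flows ("
        " ts TEXT, exporter TEXT, src TEXT, dst TEXT, app TEXT, bytes INTEGER)"
    )
    return db


def _iso(ts: datetime) -> str:
    # Same-format ISO 8601 strings sort chronologically, so text comparison works.
    return ts.isoformat(timespec="seconds")


def store_flow(db, ts, exporter, src, dst, app, nbytes):
    """Append one flow record as the collector receives it."""
    db.execute("INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?)",
               (_iso(ts), exporter, src, dst, app, nbytes))


def top_talkers(db, exporter, start, end, n=5):
    """Biggest conversations reported by one exporter in [start, end)."""
    return db.execute(
        "SELECT src, dst, app, SUM(bytes) AS total FROM flows"
        " WHERE exporter = ? AND ts >= ? AND ts < ?"
        " GROUP BY src, dst, app ORDER BY total DESC LIMIT ?",
        (exporter, _iso(start), _iso(end), n),
    ).fetchall()
```

For a call about yesterday afternoon's slowdown, a query such as top_talkers(db, "router-1", datetime(2024, 3, 4, 14), datetime(2024, 3, 4, 15)) lists that hour's largest conversations on that router, without anyone having been on site when it happened.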
After some investigation, he decided that activating NetFlow on the Cisco routers and IPFIX on the Nortel boxes would give him the missing data. He chose Plixer's Scrutinizer product, which collects and analyzes NetFlow, sFlow and IPFIX data. It was a simple process to download the software and follow the prompts to configure it. Directing all 500 routers to send their data took another six to eight hours. Now he can go into WhatsUp Gold and see the status of the network connections. If there is a problem, he first goes into WebNM to check the utilization statistics, and then clicks over to Scrutinizer to see what is slowing down the traffic. One time he tracked the saturated bandwidth to a file transfer between servers and got that process killed and moved to off hours. In another case, the NetFlow data told him that the slowdown every Tuesday morning was caused by automated software updates.
He says that with it he has been able to monitor how much bandwidth applications are using for capacity planning purposes, and can verify that there aren't rogue applications running on the network. But while it is a useful tool, it still doesn't answer every question. For example, one time an application was taking an inordinate amount of time for each transaction.
"With NetFlow, we could see that the conversation was happening but not determine why it was running slow," he says.
That was a situation which called for inspecting the content of the packets, and using the Fluke device, he found what was behind the slowdown.
"With the packet analyzer we were able to look inside and see that there was an invalid directory pointer being called in some NetBIOS packets that were putting in about a three-minute delay on every transaction," he says.
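The step NetFlow could not take, measuring how long each request waits for its reply, is something a packet analyzer does from capture timestamps. The Python sketch below pairs requests with responses and reports those slower than a threshold. It assumes packets have already been decoded into hypothetical records carrying a timestamp, a conversation, a transaction identifier and a request/response flag; decoding real NetBIOS traffic, and seeing why a reply is slow, is the part that needs the analyzer's protocol decoders.

```python
from typing import NamedTuple, Tuple


class Pkt(NamedTuple):
    ts: float               # capture timestamp in seconds
    conn: Tuple[str, str]   # (client, server)
    txn_id: int             # hypothetical transaction ID decoded from the payload
    is_request: bool


def slow_transactions(packets, threshold_s):
    """Pair each request with its response and return ((conn, txn_id), delay)
    for every transaction that took longer than threshold_s seconds."""
    pending = {}   # (conn, txn_id) -> request timestamp
    slow = []
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.conn, p.txn_id)
        if p.is_request:
            pending[key] = p.ts
        elif key in pending:
            delay = p.ts - pending.pop(key)
            if delay > threshold_s:
                slow.append((key, delay))
    return slow
```

With a request captured at 0 s and its reply at 180.5 s, the function reports that transaction with a delay of 180.5 seconds, roughly the three minutes Bond describes. It shows only that the transaction was slow; finding the invalid directory pointer required looking at the packet contents.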
Joe Zwers is a freelance writer specializing in IT.