When a problem occurs on your network, you need a system in place to detect the issue and trace it to its source. Fault management tools continuously scan a network for problems, analyze what they find, and recommend a fix for users to apply. Depending on the issue, the fault management tool may automatically dispatch restorative scripts or programs to fix the problem without waiting for a person to step in. Fault management covers both finding and fixing network problems, which makes it an invaluable resource for network teams.
Many network performance monitors (NPMs) come with fault management capabilities built in. NPMs examine your network’s current performance and alert users to problems that are dragging the network down. Because NPMs already hunt for performance issues, it’s easy for them to implement fault detection and response functions as well. Below, we list the fundamentals of network fault management and how NPMs use it to improve network performance.
The fault management cycle
Fault management runs as a continuous cycle that keeps looking for problems on your network. While each fault management program implements the process differently, the general cycle follows the same basic steps:
- Detection. The fault management tool checks the network and discovers problems that affect performance or data transmission.
- Diagnosis. The tool determines what the problem actually is and where on the network it’s located.
- Alerting. The tool alerts the user to the problem. If a tool creates multiple alerts about the same problem, it automatically correlates them and combines them into one alert before sending it.
- Resolving. The tool automatically executes programs or scripts designed to fix the problem. If the automatic solutions don’t work, the management program recommends manual intervention.
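The alerting and resolving steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular NPM works: the `Fault` record, the `correlate` and `run_cycle` functions, and the idea of keying fixes by fault kind are all hypothetical names chosen for this example. Detection and diagnosis are assumed to have already produced the list of faults.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """Hypothetical record produced by the detection and diagnosis steps."""
    device: str   # where on the network the problem is located
    kind: str     # what the problem is
    detail: str   # raw observation that triggered the fault


def correlate(faults):
    """Combine faults that share a device and kind into a single alert."""
    grouped = defaultdict(list)
    for fault in faults:
        grouped[(fault.device, fault.kind)].append(fault.detail)
    return [
        {"device": device, "kind": kind, "count": len(details), "details": details}
        for (device, kind), details in grouped.items()
    ]


def run_cycle(faults, fixes):
    """Alert once per correlated problem, try an automatic fix, else escalate."""
    outcomes = []
    for alert in correlate(faults):
        fix = fixes.get(alert["kind"])  # assumed mapping: fault kind -> fix function
        fixed = bool(fix and fix(alert["device"]))
        status = "auto-resolved" if fixed else "manual intervention recommended"
        outcomes.append((alert["device"], alert["kind"], status))
    return outcomes
```

Calling `run_cycle` with two `link_down` faults on the same switch and one `high_cpu` fault on a router would yield two outcomes rather than three, because the duplicate reports are merged before anyone is alerted.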
Discovering and monitoring devices
A fault management tool can’t operate efficiently if it doesn’t have a clear picture of the network’s topology. NPMs include network visibility capabilities that allow them to create a map of every device and node connected to the network. This allows the fault management functions to see everything on the network that might go down or cause performance issues.
Fault management programs send inquiries to devices and nodes on a routine basis to determine if the hardware is functioning properly. They collect information like system logs and SNMP trap data and analyze it for any abnormal performance or behavior. Sometimes, nodes that independently detect performance problems will send information to the fault manager without being prompted by the program. The fault manager takes all this information and uses it to find any problems that need to be addressed.
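As a rough sketch of the two collection paths described above, the following Python merges routine polling with events that devices send on their own. Everything here is an assumption made for illustration: the `probe` callable stands in for whatever check a real tool performs (an SNMP query, a ping, a log pull), and the queue stands in for a trap receiver. No real SNMP library is used.

```python
import queue


def poll_devices(devices, probe):
    """Ask each device for its status and return the ones that fail the probe."""
    failures = []
    for device in devices:
        try:
            healthy = probe(device)
        except OSError:
            healthy = False  # an unreachable device counts as a failure
        if not healthy:
            failures.append({"device": device, "source": "poll"})
    return failures


def drain_unsolicited(event_queue):
    """Collect events that devices pushed without being asked (for example, traps)."""
    events = []
    while True:
        try:
            events.append(event_queue.get_nowait())
        except queue.Empty:
            return events


def collect(devices, probe, event_queue):
    """Merge polled failures and pushed events into one list for analysis."""
    return poll_devices(devices, probe) + drain_unsolicited(event_queue)
```

Passing the probe in as a parameter keeps the sketch independent of any one protocol, so the same loop works whether the underlying check is a reachability test or a log query.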
Automatically fix minor problems
Not every problem that affects your network’s performance is huge or requires a lot of attention. Many problems need only a one-step fix that takes little time to apply. Fault management tools can apply these fixes automatically whenever the problems occur, which frees IT teams to focus on the larger problems that take real time and effort to resolve.
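One way to picture automatic one-step fixes is a lookup table that maps each kind of minor fault to a small fix function. The Python below is a minimal sketch under that assumption. The `FIXES` registry, the `one_step_fix` decorator, the `stale_dns_cache` fault kind, and the placeholder fix are all hypothetical, and the event log is included so that every fault is still recorded even when the fix succeeds.

```python
FIXES = {}  # hypothetical registry: fault kind -> one-step fix function


def one_step_fix(kind):
    """Register a function as the automatic fix for one kind of fault."""
    def register(func):
        FIXES[kind] = func
        return func
    return register


@one_step_fix("stale_dns_cache")
def flush_dns_cache(device):
    # Placeholder only: a real fix would run a command against the device.
    return True


def remediate(device, kind, event_log):
    """Apply the registered fix if one exists, and always record the event."""
    fix = FIXES.get(kind)
    if fix is None:
        status = "needs_attention"  # no one-step fix is known for this fault
    else:
        status = "auto_fixed" if fix(device) else "fix_failed"
    event_log.append({"device": device, "kind": kind, "status": status})
    return status
```

Faults without a registered fix fall through as `needs_attention`, which is the set of problems left for the IT team.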
Because fault management tools are constantly searching for performance issues, the program can often fix these issues before you notice them. You’ll still be alerted to any events that happen, even if the software takes care of them by itself. You can also set different monitoring intensities based on how problematic an area has historically been. An area of your network that experiences more issues than others can be monitored more frequently or more rigorously.
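To make the idea of monitoring intensity concrete, here is a small Python sketch that polls a network area more often the more faults it has had recently. The halving rule, the default of 300 seconds, the 30-second floor, and the function names are assumptions picked for the example. Real tools expose their own scheduling settings.

```python
def polling_interval(recent_faults, base_seconds=300, min_seconds=30):
    """Halve the polling interval for each recent fault, down to a floor."""
    halvings = max(0, recent_faults)  # treat negative counts as zero
    return max(min_seconds, base_seconds // (2 ** halvings))


def build_schedule(fault_counts, **interval_options):
    """Map each network area to its polling interval in seconds."""
    return {
        area: polling_interval(count, **interval_options)
        for area, count in fault_counts.items()
    }
```

With these defaults, an area with no recent faults is checked every 300 seconds, while one with three recent faults is checked every 37 seconds. The floor keeps a very troubled area from being polled so often that the monitoring traffic becomes a problem of its own.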