Ping monitoring software. Network monitoring: how we make sure every node works at large companies

Judging by this fiber, which runs through the forest to the collector, you can conclude that the installer did not quite follow the technology. The fastening in the photo also hints that he is probably a sailor: that is a sea knot.

I am from the network physical operations team. Simply put, we are the technical support responsible for making sure the lights on the routers blink the way they should. We have various large companies with infrastructure all over the country "under our wing". We don't get into their business; our job is to keep the network working at the physical level and the traffic passing as it should.

The general idea of the work is constant polling of nodes, collecting telemetry, test runs (for example, checking settings to look for vulnerabilities), keeping things operational, and monitoring applications and traffic. Sometimes inventory and other oddities.

I'll describe how this is organized and tell a couple of stories from site visits.

How it usually goes

Our team sits in an office in Moscow and collects network telemetry. Essentially, this means constantly pinging nodes, plus receiving monitoring data when the hardware is smart enough to provide it. The most common situation: ping fails several times in a row. For a retail network, for example, in 80% of cases this turns out to be a power outage, so when we see this picture we do the following:
  1. First we call the provider about outages.
  2. Then we call the power company about power cuts.
  3. Then we try to reach someone on site (this is not always possible, for example at 2 a.m.).
  4. Finally, if none of the above has helped within 5-10 minutes, we go ourselves or send an "avatar": a contract engineer sitting somewhere in Izhevsk or Vladivostok, if the problem is there.
  5. We stay in constant contact with the "avatar" and "steer" them through the infrastructure: we have the sensors and the service manuals, they have the pliers.
  6. Afterwards the engineer sends us a report with photos of what it turned out to be.
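
The "ping fails several times in a row" trigger can be sketched as a small per-node counter. This is only an illustration, not the team's actual tooling: the class name `FailureTracker`, the node name `store-17` and the threshold of 3 are assumptions (the text only says "several times in a row").

```python
from dataclasses import dataclass, field


@dataclass
class FailureTracker:
    """Counts consecutive failed pings per node (hypothetical helper)."""
    threshold: int = 3  # assumed value, not taken from the article
    streaks: dict = field(default_factory=dict)

    def record(self, node: str, ping_ok: bool) -> bool:
        """Record one ping result; return True when an incident should be opened."""
        if ping_ok:
            self.streaks[node] = 0
            return False
        self.streaks[node] = self.streaks.get(node, 0) + 1
        # Fire exactly once, when the streak first reaches the threshold.
        return self.streaks[node] == self.threshold


tracker = FailureTracker()
results = [tracker.record("store-17", ok) for ok in (True, False, False, False, False)]
```

Firing only when the streak first reaches the threshold avoids opening a new incident on every subsequent failed ping while the node stays down.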

The dialogues sometimes go like this:
- So, the link is dropping between buildings No. 4 and 5. Check the router in the fifth.
- It's fine, powered on. There is no link.
- OK, follow the cable to the fourth building, there is another node there.
- ... Whoa!
- What happened?
- Building 4 has been demolished.
- What??
- I'm attaching a photo to the report. I can't restore the building within the SLA.

But more often we do manage to find the break and restore the channel.

About 60% of site visits are wasted, because either the power line was cut (by a shovel, a foreman, intruders), or the provider did not know about its own failure, or a short-lived problem cleared up before the installer arrived. However, there are situations where we learn about a problem before the users and before the customer's IT services do, and report the fix before they even realize something happened. Most often this happens at night, when activity at customer companies is low.

Who needs this and why

As a rule, any large company has its own IT department that clearly understands its specifics and tasks. In medium and large businesses, the work of helpdesk technicians and network engineers is often outsourced. It is simply cost-effective and convenient. For example, one retailer has very strong IT people of its own, but replacing routers and tracing cables is far from what they do.

What we do

  1. We handle requests: tickets and panic calls.
  2. We do preventive maintenance.
  3. We follow the hardware vendors' recommendations, for example on maintenance schedules.
  4. We connect to the customer's monitoring and pull data from it so that we can respond to incidents.
With monitoring, the story is often that there isn't any. Or it was set up 5 years ago and is no longer very relevant. In the simplest case, if there really is no monitoring, we offer the customer a simple, open-source, Russian-language Zabbix for free: it's good for them and easier for us.

The first approach, simple checks, is just a machine that pings all network nodes and makes sure they answer correctly. Such a deployment requires either no changes at all or only minimal cosmetic changes in the customer's network. As a rule, in the very simplest case we install Zabbix on our side, in one of our data centers (we have two of our own in the CROC office on Volochaevskaya). In a more complex case, for example if the customer uses its own protected network, we install it on one of the machines in the customer's data center.
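
A "machine that pings all nodes" can be approximated in a few lines. The sketch below is not how Zabbix implements simple checks; it is a stand-alone illustration that shells out to the system `ping` with Linux-style flags (`-c 1 -W 1`), which is an assumption about the host OS. The function names `ping_once` and `sweep` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ping_once(host: str) -> bool:
    """Send one echo request with the system ping (Linux-style flags assumed)."""
    try:
        proc = subprocess.run(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # No ping binary, or the call hung: treat the host as unreachable.
        return False
    return proc.returncode == 0


def sweep(hosts, probe=ping_once, workers=16):
    """Probe all hosts in parallel and return {host: reachable}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input hosts.
        return dict(zip(hosts, pool.map(probe, hosts)))
```

Calling `sweep(["192.0.2.1", "192.0.2.2"])` would return a dictionary of reachability flags; the `probe` parameter exists so the polling logic can be exercised with a fake probe, without touching the network.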

Zabbix can also be used in more complex ways: for example, it has agents that are installed on *nix and Windows nodes and show system monitoring, as well as an external check mode (with SNMP support). Still, if a business needs something like that, then either it already has its own monitoring or a more functional, feature-rich solution is chosen. Of course, that is no longer open source and it costs money, but even a plain accurate inventory already recoups about a third of the costs.

We do this as well, but that is my colleagues' story. Here are a couple of Infosim screenshots they sent:

I am an "avatar" operator, so from here on I'll talk about my own work.

What a typical incident looks like

In front of us are screens with an overall status like this:

At this site, Zabbix collects quite a lot of information for us: part number, serial number, CPU load, device description, interface availability, and so on. All the necessary information is available from this interface.

An ordinary incident usually begins with the failure of one of the channels leading to, say, a customer's store (of which the customer has 200-300). Retail today is well prepared, not like seven years ago, so the checkout keeps working: there are two channels.

We grab the phones and make at least three calls: to the provider, to the power company, and to the people on site ("Yes, we were loading rebar here and snagged someone's cable... oh, yours? Well, good thing you found it").

As a rule, without monitoring, hours or days would pass before escalation: those same backup channels are not always checked. We know immediately and head out immediately. If there is additional information beyond pings (for example, the model of the faulty hardware), we equip the field engineer with the necessary parts right away. The rest happens on site.

The second most frequent routine call is the failure of a user's end device, for example a DECT phone or the Wi-Fi router that served the office network. Here we learn about the problem from monitoring and almost immediately get a call with details. Sometimes the call adds nothing new ("I pick up the handset and it doesn't ring"), sometimes it is very useful ("We dropped it off the table"). In the second case it is clearly not a backbone failure.

In Moscow, equipment is taken from our hot-spare warehouses; we have several types of them:

Customers usually have their own stocks of frequently failing components: office handsets, power supplies, fans and so on. If something has to be delivered that is not available on site, and the site is not in Moscow, we usually go ourselves (because of the installation). For example, I once had a night trip to Nizhny Tagil.

If the customer has its own monitoring, they can export data to us. Sometimes we deploy Zabbix in polling mode, simply to provide transparency and SLA control (this is also free for the customer). We do not install additional sensors (that is done by colleagues who ensure the continuity of production processes), but we can connect to them if the protocols are not exotic.

In general, we do not touch the customer's infrastructure; we simply maintain it in the form it is in.

From experience, I can say that the last ten customers switched to external support because we are very predictable in terms of costs. Clear budgeting, good case management, a report on every ticket, SLA, equipment reports, preventive maintenance. Ideally, of course, for the customer's CIO we are something like cleaners: we come and do the job, everything is clean, and nobody gets distracted.

Another thing worth noting: in some large companies inventory becomes a real problem, and we are sometimes brought in purely for that. In addition, we store and manage configurations, which is convenient during various moves and reconnections. But again, in complex cases that is not me either: we have a special team that relocates data centers.

And one more important point: our department does not deal with critical infrastructure. Everything inside data centers and everything related to banking, insurance and telecom operators, plus the core systems of retail, belongs to the X-team. Here are those guys.

More from practice

Many modern devices can provide a lot of service information. For example, on network printers it is very easy to monitor the toner level in the cartridge. You can plan the replacement date in advance and also get a notification at 5-10% (in case the office suddenly starts printing off its usual schedule), and send a technician right away, before accounting starts to panic.
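
The toner logic described above amounts to a threshold check plus a simple linear forecast. A minimal sketch, assuming the monitoring system already delivers toner readings as `(day, percent)` pairs; the function names and the default 10% threshold are illustrative, chosen from the 5-10% range mentioned in the text.

```python
def days_until_empty(samples):
    """Estimate days until toner hits 0% from (day, percent) samples.

    Uses the average consumption between the first and last sample;
    returns None if the level is not decreasing.
    """
    (d0, p0), (d1, p1) = samples[0], samples[-1]
    if d1 <= d0 or p1 >= p0:
        return None
    rate = (p0 - p1) / (d1 - d0)  # percent consumed per day
    return p1 / rate


def needs_alert(percent, threshold=10):
    """True when the level is at or below the alert threshold (assumed 10%)."""
    return percent <= threshold
```

For example, readings of 80% on day 0 and 60% on day 10 give a consumption of 2% per day, so the cartridge would last roughly another 30 days at that pace.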

Very often we collect annual statistics, which are produced by the same monitoring system plus us. In the case of Zabbix, this gives simple cost planning and an understanding of what went where; in the case of Infosim, it also provides material for calculating scaling for the year, admin workload and all sorts of other things. The statistics include energy consumption: over the last year almost everyone started asking for it, apparently to spread internal costs between departments.

Sometimes real heroic rescues happen. Such situations are rare, but from what I remember this year: at about 3 a.m. we saw the temperature on a Cisco switch rise to 55 degrees. The remote server room had "dumb" air conditioners without monitoring, and they had failed. We immediately called a cooling engineer (not ours) and phoned the customer's on-duty admin. He shut down some non-critical services and kept the server room from a thermal shutdown until the guy arrived with a mobile air conditioner, and then until the regular one was repaired.

Polycom units and other expensive video conferencing equipment can be monitored very well for battery charge level before conferences, which is also important.

Everyone needs monitoring and diagnostics. As a rule, implementing it yourself without experience is long and difficult: the systems are either extremely simple and preconfigured, or the size of an aircraft carrier with a pile of standard reports. Filing it down to fit the company, figuring out how to implement the tasks of the internal IT unit and display the information they need most, plus keeping the whole thing up to date, means stepping on every rake if there is no implementation experience. When working with monitoring systems, we choose the golden mean between free and top-tier solutions: as a rule, not the most popular and "fat" vendors, but ones that clearly solve the task.

Once there was a rather unusual request. The customer had to hand over a router to a separate division, strictly according to the inventory record. The router was supposed to contain a module with a specified serial number. When they started preparing the router for shipment, it turned out that the module was not there. And nobody could find it. The problem was slightly aggravated by the fact that the engineer who had worked with this branch the previous year had already retired and moved to another city to be with his grandchildren. They contacted us and asked us to search. Fortunately, the hardware reported serial numbers and Infosim kept the inventory, so within a couple of minutes we found the module in the infrastructure and described the topology. The fugitive was tracked down along the cable: it was in another server room, in a cabinet. The movement history showed that it had ended up there after a similar module failed.


A frame from the feature film about Hottabych that accurately describes the population's attitude to cameras

There are many incidents with cameras. Once three cameras failed at the same time. A cable break on one of the sections. The installer pulled a new cable through the corrugated conduit, and after a series of shamanic rituals two of the three cameras came back up. The third did not. Moreover, it was unclear where it was at all. I pull up the video stream, and the last frames right before it went down show this: 4 a.m., three men with scarves over their faces approach, something bright below, the camera shakes violently and falls.

Once we were setting up a camera that was supposed to focus on "hares" climbing over the fence. On the way there, we wondered how we would mark the spot where an intruder should appear. It was not needed: in the 15 minutes we were there, people kept getting onto the site at exactly the spot we needed. A real-life test pattern.

As in the example I gave above, the story about the demolished building is not a joke. Once the link to some equipment disappeared. On site, the pavilion the copper ran through was simply not there. The pavilion had been demolished and the cable was gone. We saw that the router had died. The installer arrived and started looking, and the distance between the nodes there is a couple of kilometers. His kit includes a Vipnet tester, and the procedure is standard: test from one connector, test from the other, then go looking. Usually the problem is visible right away.


Cable tracing: this is fiber in a corrugated conduit, the continuation of the story from the very top of the post about the sea knot. Here, apart from the absolutely amazing installation, the problem found in the end was that the cable had come off its mounts. Everyone who feels like it climbs there and loosens the metal structures. Roughly the five-thousandth representative of the proletariat broke the fiber.

At one site, all nodes shut down about once a week. And always at the same time of day. We looked for a pattern for quite a long time. The installer found the following:

  • The problem always occurs on the same person's shift.
  • He differs from the others in that he wears a very heavy coat.
  • A circuit breaker is mounted behind the coat hanger.
  • Someone removed the breaker cover a very long time ago, back in prehistoric times.
  • When this comrade arrives at the site, he hangs up his coat, and it switches off the breakers.
  • He immediately switches them back on.

At the same site, for a while the equipment also switched off at night. It turned out that local craftsmen had tapped into our power supply, run an extension cord and plugged in a kettle and an electric stove. When these appliances ran at the same time, the whole pavilion tripped.

In one of the stores of our vast homeland, the entire network went down every time the store closed. The installer saw that all the power had been taken from the lighting line. As soon as the store switched off the overhead lighting in the sales floor (which consumes a lot of energy), all the network equipment switched off too.

There was a case where a janitor cut the cable with a shovel.

We often see bare copper just lying around with its conduit torn. Once, between two workshops, local craftsmen simply ran twisted pair without any protection at all.

Far from civilization, employees often complain that "our" equipment irradiates them. Switches at some remote sites may be in the same room as the person on duty. Accordingly, a couple of times we came across cranky grandmothers who, by hook or by crook, switched them off at the start of their shift.

In another remote city, a mop was hung on the fiber. They tore the conduit off the wall and started using it as a mount for equipment.


In this case, the power supply clearly has problems.

What "big" monitoring can do

I'll also briefly describe the capabilities of more serious systems, using Infosim installations as an example. There are four solutions combined into one platform:
  • Fault management: failure monitoring and event correlation.
  • Performance management.
  • Inventory and automatic topology discovery.
  • Configuration management.
Importantly, Infosim supports a lot of equipment "out of the box": it readily parses the devices' internal data exchange and gets access to all their technical data. Here is the list of vendors: Cisco Systems, Huawei, HP, AVAYA, Redback Networks, F5 Networks, Extreme Networks, Juniper, Alcatel-Lucent, Fujitsu Technology Solutions, ZyXEL, 3Com, Ericsson, ZTE, ADVA Optical Networking, Nortel Networks, Nokia Siemens Networks, Allied Telesis, Radcom, Allot Communications, Enterasys Networks, Telco Systems, etc.

A separate note on inventory. The module doesn't just show a list, it also builds the topology itself (at least in 95% of cases it tries and gets it right). It also lets you keep an up-to-date database of IT equipment in use and idle (network, server equipment and so on) and replace outdated equipment (EOS/EOL) on time. In general, this is convenient for big business; in a small one, much of this is done by hand.

Report examples:

  • Reports broken down by OS type, firmware, model and equipment manufacturer;
  • Report on the number of free ports on each switch in the network, for a selected manufacturer, by model, by subnet, etc.;
  • Report on newly added devices for a given period;
  • Notification of low toner level in printers;
  • Assessment of channel suitability for delay-sensitive and loss-sensitive traffic, using active and passive methods;
  • Tracking the quality and availability of communication channels (SLA): generation of channel quality reports broken down by telecom operator;
  • Fault management and event correlation. The functionality is implemented through the Root-Cause Analysis mechanism (without the administrator having to write rules) and the Alarm States Machine mechanism. Root-Cause Analysis is the analysis of the root cause of a failure based on the following procedures: 1. automatic detection and localization of the failure; 2. reduction of the number of alarm events to a single key event; 3. identification of the consequences of the failure: whom and what it affected.
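
The idea of reducing many alarms to one key event can be illustrated on a tree-shaped topology. This is not Infosim's Root-Cause Analysis algorithm, whose internals the text does not describe; it is a minimal sketch under the assumption that each node has a single upstream parent and the topology has no loops. The names `root_causes`, `affected_by` and the sample topology are hypothetical.

```python
def root_causes(parent, down):
    """Return the down nodes whose upstream parent is still up.

    parent: maps each node to its upstream node (None for the top node).
    down:   set of nodes that stopped answering.
    """
    down = set(down)
    return sorted(n for n in down if parent.get(n) not in down)


def affected_by(parent, root, down):
    """Down nodes that sit (directly or indirectly) behind `root`."""
    hit = []
    for node in sorted(down):
        cur = parent.get(node)
        while cur is not None:  # walk upstream; assumes a loop-free tree
            if cur == root:
                hit.append(node)
                break
            cur = parent.get(cur)
    return hit


# Hypothetical topology: core -> sw1 -> {cam1, cam2}, core -> sw2
topology = {"core": None, "sw1": "core", "sw2": "core", "cam1": "sw1", "cam2": "sw1"}
alarms = {"sw1", "cam1", "cam2"}
```

With this data, `root_causes(topology, alarms)` collapses three alarms into the single key event `sw1`, and `affected_by(topology, "sw1", alarms)` lists the two cameras as consequences of that failure.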
You can also install devices like these in the network; they integrate into monitoring right away:


StableNet Embedded Agent (SNEA): a computer slightly larger than a pack of cigarettes.

They are installed in ATMs or in dedicated network segments where availability checks are required. They are also used for load testing.

Cloud monitoring

Another deployment model is SaaS in the cloud. We did this for one global customer (a company with a continuous production cycle and a distribution geography from Europe to Siberia).

There are dozens of sites, including factories and finished-goods warehouses. When their channels went down, with support being handled from foreign offices, shipment delays began, which rippled onward into losses. All work was done on request, and a lot of time was spent investigating each incident.

We set up monitoring specifically for them, then refined it in a number of areas to match the specifics of their routing and hardware. All of this was done in the CROC cloud. The project was completed and handed over very quickly.

The result:

  • Thanks to the partial handover of network infrastructure management, it was possible to optimize by at least 50%. Equipment unavailability, channel load, exceeding the parameters recommended by the manufacturer: all of this is detected within 5-10 minutes, then diagnosed and resolved within an hour.
  • By receiving the service from the cloud, the customer converts the capital expenditure of deploying its own network monitoring system into operating expenses in the form of a monthly fee for our service, which can be cancelled at any time.

The advantage of the cloud is that in our solution we stand, as it were, above their network and can look at everything that happens more objectively. If we were inside the network, we would see the picture only up to the failed node and would no longer know what is happening beyond it.

A couple of pictures to finish

This is the "morning puzzle":

And here we found a treasure:

This is what was in the chest:

And finally, about the most fun site visit. I once went out to a retail site.

Here is what happened there: first, water started dripping from the roof onto the suspended ceiling. Then a lake formed in the suspended ceiling, which soaked through and pushed out one of the tiles. As a result, all of it poured onto the electrical wiring. I don't know exactly what shorted next, but somewhere in the neighboring room a fire started. First the powder fire extinguishers went off, and then the firefighters arrived and covered everything in foam. I arrived after them for the cleanup. I must say that the Cisco 2960 started up after all this: I was able to pull the config and send the device for repair.

Another time, when a powder system went off, a Cisco 3745 at a bank was filled with powder almost completely. All the interfaces were clogged: 2 x 48 ports. It had to be powered on in place. Remembering the previous case, we decided to try to pull the configs "hot", so we shook it out and cleaned it as best we could. We powered it on: first the device said "pff" and sneezed a big jet of powder at us. Then it rumbled and came up.

Echo request

An echo request (ping) is a diagnostic tool used to find out whether a particular node in an IP network is reachable. The echo request is carried out using ICMP (Internet Control Message Protocol): the protocol is used to send an echo request to the node being checked. ICMP must be enabled on that node for it to answer.
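
To make the protocol concrete, here is a minimal sketch of how an ICMP Echo Request message is laid out: type 8, code 0, a 16-bit checksum, an identifier, a sequence number and an optional payload. The function names and the sample identifier are illustrative. The sketch only builds the bytes; actually sending them requires a raw or ICMP datagram socket and usually elevated privileges, which is left out here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP Echo Request: type 8, code 0, checksum, id, sequence."""
    # The checksum is computed over the message with the checksum field zeroed.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


packet = build_echo_request(ident=0x1234, seq=1)
```

A receiver validates the message by running the same checksum over the whole packet, checksum field included; for an intact packet the result is zero.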

Checking with echo requests

PRTG is an echo request and network monitoring tool for Windows. It is compatible with all major Windows systems, including Windows Server 2012 R2 and Windows 10.

PRTG is a powerful tool for the entire network. For servers, routers, switches, uptime and cloud connections, PRTG tracks all the metrics, so you are relieved of administrative worries. The echo request sensor, along with SNMP, NetFlow and packet sniffing sensors, is used to collect detailed information about network availability and load.

PRTG has a customizable built-in alerting system that quickly notifies you of malfunctions. The echo request sensor is configured as the master sensor for network devices. If this sensor fails, all other sensors on the device are put into paused mode. This means that instead of a flood of alarm messages you receive only one notification.
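
The master-sensor behavior can be pictured with a few lines of code. This is not PRTG's API or implementation, only an illustration of the rule the paragraph describes; the function name `notifications`, the sensor names and the message format are all assumptions.

```python
def notifications(device, sensors, master="ping"):
    """Collapse a device's alerts when its master sensor is down.

    sensors: maps sensor name -> True (OK) or False (failed).
    Returns the list of messages to send.
    """
    if not sensors.get(master, True):
        # Master failed: dependent sensors are treated as paused, one alert only.
        return [f"{device}: {master} down"]
    return [f"{device}: {name} down" for name, ok in sorted(sensors.items()) if not ok]
```

When the ping sensor is down, the other failed sensors on the same device produce no messages of their own; when ping is fine, each failed sensor is reported individually.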

At any time you can bring up a brief overview on the PRTG dashboard. You will see immediately whether everything is in order. The dashboard is configured to suit your specific needs. Away from your desk, for example when working in the server room, you can access PRTG through the smartphone app and never miss an event.

Initial monitoring is set up right at installation. This is made possible by the auto-discovery feature: PRTG sends echo requests to your private IP addresses and automatically creates sensors for the devices that are available. When you open PRTG for the first time, you can check the availability of your network straight away.

PRTG has a transparent licensing model. You can test PRTG for free. The echo request sensor and the alerting function are also included in the free version and have no time limit. If your company or network needs more, upgrading the license is easy.


Your echo request sensors at a glance, even on the go

PRTG installs in a few minutes and is compatible with most mobile devices.

PRTG monitors these and many other vendors and applications for you

Three PRTG sensors for monitoring echo requests

Echo request sensor

Cloud echo request sensor

The cloud echo request sensor uses the PRTG cloud to measure the round-trip time of echo requests to your network from different places in the world. This sensor lets you see the availability of your network from Asia, Europe and America. This indicator is especially important for international companies.

By purchasing PRTG you get comprehensive free support. Our goal is to solve your problems as quickly as possible. To that end we have prepared, among other materials, training videos and a comprehensive manual. We aim to answer all support tickets within 24 hours (on business days). You will find answers to many questions in our knowledge base. For example, the search query "echo request monitoring" returns 700 results. A few examples:

"I need an echo request sensor that will collect information only about the availability of the device, without changing its status. Is it possible?"

"Can I create an inverse echo request sensor?"


"With PRTG, we work much calmer, knowing that there is continuous monitoring of our systems."

Marcus Bunch, Network Administrator, Shyukhtermann Clinic (Germany)

  • Full version of PRTG for 30 days
  • After 30 days - free version
  • For an extended version - Commercial license

Network Monitoring Software - Version 19.2.50.2842 (May 15th, 2019)

Hosting

A cloud version is also available (PRTG in the cloud)

Languages

English, German, Russian, Spanish, French, Portuguese, Dutch, Japanese and Simplified Chinese

Prices

Free for up to 100 sensors

Comprehensive monitoring

Network devices, bandwidth, servers, applications, virtual environments, remote systems, the Internet of Things and much more.

Supported vendors and applications

Network and ping monitoring with PRTG: three practical examples

PRTG is trusted by 200,000 administrators worldwide. These administrators may work in different industries, but they have one thing in common: the desire to guarantee and improve the availability and performance of their networks. Three examples of use:

Zurich Airport

Zurich Airport is Switzerland's largest airport, so it is especially important that all its electronic systems run smoothly. To make this possible, the IT division introduced PRTG Network Monitor from Paessler AG. Using more than 4,500 sensors, the tool ensures that problems are detected immediately and promptly fixed by the IT division's specialists. In the past, the IT division used a set of different monitoring programs. Ultimately, however, management concluded that this software was unsuitable for the specialized monitoring needed by operational and technical staff.

Bauhaus University, Weimar

The IT systems of Bauhaus University in Weimar are used by 5,000 students and 400 employees. In the past, an isolated Nagios-based solution was used to monitor the university's network. The system was technically outdated and unable to meet the needs of the IT infrastructure. Modernizing the infrastructure would have been extremely expensive. Instead, the university asked for proposals for new network monitoring solutions. The heads of the IT division wanted a comprehensive software product that was easy to use, simple to install and cost-effective. That is why they chose PRTG.

Municipal utilities of the city of Frankenthal

A little more than 200 employees of Frankenthal's municipal utilities are responsible for supplying electricity, gas and water to private consumers and organizations. The organization, with all its buildings, also depends on a locally distributed infrastructure of about 80 servers and 200 connected devices. The heads of the utility's IT department were looking for affordable software that met their specific needs. At first, the IT specialists installed the free trial version of PRTG. Currently, the Frankenthal utilities use about 1,500 sensors, which monitor public swimming pools, among other things.

Practical advice. Tell us, Greg, do you have any recommendations on monitoring echo requests (pings)?

"Echo request sensors are probably the most important elements of network monitoring. They must be configured properly, especially with regard to your dependencies. If, for example, you monitor a virtual machine, it is useful to place an echo request sensor on its host as a dependency. Then, if the host fails, you will not get a notification for every virtual machine running on it. In addition, echo request sensors can be good indicators of whether the network path to a host or to the Internet is working properly, especially in high-availability or failover scenarios."

Greg Campion, system administrator, Paessler AG

According to this optics, which runs through the forest to the collector, you can conclude that the installer did not follow the technology. The attachment in the photo also suggests that he is probably a sailor - a node-node.

I am from the Team of Physical Operations of the Network, Simply put - technical support that is responsible for flashing the bulbs on the routers blissing, as it should. We have "under the wing" various large companies with infrastructure throughout the country. I don't climb into their business, our task is to work on the physical level and traffic passed as it should.

The general meaning of the work is a permanent survey of nodes, removing telemetry, test runs (for example, checking for vulnerabilities search), ensuring performance, monitoring applications, traffic. Sometimes inventory and other perversions.

I will tell about how it is organized and a couple of stories from the departure.

As it usually happens

Our team is sitting in the office in Moscow and removes the telemetry of the network. Actually, these are permanent ping nodes, as well as obtaining monitoring data if the glands are smart. The most frequent situation - Ping does not pass several times in a row. In 80% of cases for the retail network, for example, it turns out to be disconnected by the power supply, so we, seeing such a picture, do the following:
  1. First call the provider about accidents
  2. Then - to the power station about the disconnection
  3. Then try to establish a connection with someone on the object (this is not always possible, for example, in 2 nights)
  4. And, finally, if in 5-10 minutes the above described did not help, leave for themselves or send the "Avatar" - the contractor engineer sitting somewhere in Izhevsk or Vladivostok, if the problem is there.
  5. With the "Avatar", they keep a constant connection and "we" lead it by infrastructure - we have sensors and service manuals, in its pliers.
  6. Then the engineer sends us a report with the photo about what it was.

Dialogues are sometimes like:
- So, the connection disappears between the buildings No. 4 and 5. Check the router in the fifth.
- Order, included. There is no connector.
- OK, go through the cable to the fourth case, there is still a node.
- ... OPPA!
- What happened?
- Here the 4th house was demolished.
- What??
- Apply the photo to the report. I can not restore the house in SLA.

But more often it turns out to find a break and restore the channel.

Approximately 60% of the departures - "in milk", because either the power is interrupted (shovel, foreman, intruders), or the provider does not know about his failure, or a short-term problem is eliminated before the assembly arrival. However, there are situations where we learn about the problem earlier users and earlier by the IT services of the Customer, and report on the decision before they will understand what happened. Most often such situations happen at night, when the activity in customer companies is low.

Who needs it and why

As a rule, any large company has its own IT department that clearly understands its specifics and tasks. In medium and large businesses, helpdesk technicians and network engineers are often outsourced. It is simply cost-effective and convenient. For example, one retailer has very strong IT people of its own, but replacing routers and tracing cables is far from what they do.

What we do

  1. We work on requests: tickets and panic calls.
  2. We do preventive maintenance.
  3. We follow hardware vendors' recommendations, for example on maintenance schedules.
  4. We connect to the customer's monitoring and pull data from it so that we can go out on incidents.
With monitoring, the story is often that there isn't any. Or it was set up 5 years ago and is no longer very relevant. In the simplest case, if there really is no monitoring, we offer the customer simple open-source Russian Zabbix for free: it is good for them and easier for us.

The first way is simple checks: just a machine that pings all network nodes and makes sure they respond correctly. Such an implementation requires no changes, or only minimal cosmetic ones, in the customer's network. As a rule, in the very simplest case we put Zabbix in one of our own data centers (we have two of them in the CROC office on Volochevskaya). In more complex cases, for example if the customer uses its own protected network, it goes on one of the machines in the customer's data center.
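To make the idea of a simple check concrete, here is a minimal sketch of one polling pass in Python. It is not how Zabbix implements its checks; the names `ping_once` and `sweep` and the host names are hypothetical, and the probe simply shells out to the system `ping` utility, whose timeout flag is interpreted differently across operating systems.

```python
import platform
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request with the system ping utility; True if a reply came back."""
    # Flag names differ between Windows and Unix-like systems. On Linux (iputils)
    # -W is in seconds; other systems may interpret it differently, so the outer
    # subprocess timeout below acts as a safety net.
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 2,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def sweep(hosts: Iterable[str], probe: Callable[[str], bool] = ping_once) -> Dict[str, bool]:
    """Probe every host once and return a host -> reachable map."""
    return {host: probe(host) for host in hosts}


if __name__ == "__main__":
    # Hypothetical node names; replace with real addresses.
    for host, ok in sweep(["192.0.2.10", "192.0.2.11"]).items():
        print(host, "up" if ok else "DOWN")
```

The probe is passed in as a parameter so that the same loop can later be pointed at an SNMP or agent-based check instead of ICMP.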

Zabbix can also be used in more complex ways: for example, it has agents that are installed on *nix and Windows nodes and provide system monitoring, as well as an external check mode (with SNMP support). Nevertheless, if the business needs something like that, then either it already has its own monitoring or a more functional, feature-rich solution is chosen. Of course, that one is no longer open source and costs money, but even a basic accurate inventory already recoups about a third of the costs.

We do that too, but it is my colleagues' story. Here are a couple of Infosim screenshots they sent:

I am an "avatar" operator, so from here on I will talk about my own work.

What does a typical incident look like

In front of us are screens with an overall status like this:

At this site, Zabbix collects quite a lot of information for us: part number, serial number, CPU load, device description, interface availability, etc. All the necessary information is available from this interface.

An ordinary incident usually begins when one of the channels leading to, say, a customer's store (of which it has 200-300) goes down. Retail is savvy nowadays, not like seven years ago, so the checkout keeps working: there are two channels.

We pick up the phones and make at least three calls: to the provider, to the power company, and to people on site ("Yes, we were loading rebar here and snagged someone's cable... oh, yours? Well, good thing you found us").

As a rule, without monitoring, hours or days would pass before escalation: those same backup channels are not always checked. We know immediately and go out immediately. If there is additional information besides pings (for example, the model of the faulty device), we immediately equip the field engineer with the necessary parts. The rest is handled on site.

The second most frequent routine call is the failure of a user terminal, for example a DECT phone or a Wi-Fi router that serves the office network. Here we learn about the problem from monitoring and almost immediately get a call with details. Sometimes the call adds nothing new ("I pick up the handset and it doesn't ring"), sometimes it is very useful ("We knocked it off the table"). In the second case it is clearly not a backbone outage.

Equipment in Moscow is taken from our hot-reserve warehouses; we have several types of them:

Customers usually have their own stocks of frequently failing components: office handsets, power supplies, fans, and so on. If something that is not on site needs to be delivered outside Moscow, we usually go ourselves (because it has to be installed). For example, I had a night call-out to Nizhny Tagil.

If the customer has its own monitoring, it can export data to us. Sometimes we deploy Zabbix in polling mode, just to provide transparency and SLA control (this is also free for the customer). We do not install additional sensors (that is done by colleagues who ensure the continuity of production processes), but we can connect to them if the protocols are not exotic.

In general, we do not touch the customer's infrastructure; we simply support it as it is.

From experience, I can say that the last ten customers moved to external support because we are very predictable in terms of costs. Clear budgeting, good case management, a report on each ticket, SLA, equipment reports, preventive maintenance. Ideally, of course, for the customer's CIO we are like cleaners: we come and do the job, everything is clean, nobody is distracted.

Another thing worth noting: inventory is becoming a real problem in some large companies, and we are sometimes brought in purely for that. Plus, we store and manage configurations, which is convenient during various moves and reconnections. But, again, in complex cases that is not me either: we have a special team that relocates data centers.

And one more important point: our department does not deal with critical infrastructure. Everything inside data centers and everything banking, insurance and operator related, plus the core systems of retail, belongs to the X-team (these guys).

More from practice

Many modern devices can report a lot of service information. For example, network printers make it very easy to monitor the toner level in the cartridge. You can estimate the replacement date in advance, and also set a notification at 5-10% (in case the office suddenly starts printing outside its usual schedule) and immediately send a technician before accounting starts to panic.
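As an illustration of planning the replacement date, here is a minimal sketch that linearly extrapolates toner readings to the notification threshold. The function name, the `(day, percent)` reading format and the 10% default are assumptions for the example; a real monitoring system would take the readings from the printer (typically over SNMP) and may use a smarter forecast.

```python
from typing import List, Optional, Tuple


def days_until_threshold(
    readings: List[Tuple[float, float]],
    threshold_pct: float = 10.0,
) -> Optional[float]:
    """Estimate days left until toner falls to threshold_pct.

    readings: (day, toner_percent) pairs in chronological order.
    Returns None when no forecast is possible (too few points or level not falling).
    """
    if len(readings) < 2:
        return None
    (day_first, pct_first), (day_last, pct_last) = readings[0], readings[-1]
    if day_last <= day_first:
        return None
    # Average consumption in percent per day between first and last reading.
    rate = (pct_first - pct_last) / (day_last - day_first)
    if rate <= 0:
        return None
    if pct_last <= threshold_pct:
        return 0.0  # already at or below the notification level
    return (pct_last - threshold_pct) / rate


if __name__ == "__main__":
    # 100% on day 0, 80% on day 10: 2% per day, so 35 days until 10%.
    print(days_until_threshold([(0, 100.0), (10, 80.0)]))
```

If the office suddenly starts printing more than usual, the estimate shrinks with each new reading, which is exactly the case where the early notification pays off.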

Very often customers take annual statistics from us, produced by the same monitoring system plus ourselves. In the case of Zabbix, this is simple cost planning and an understanding of what went where; in the case of Infosim, there is also material for calculating scaling for the year ahead, admin workload, and all sorts of other things. The statistics include energy consumption: over the last year almost everyone has started asking for it, apparently to allocate internal costs between departments.

Sometimes there are truly heroic rescues. Such situations are rare, but from what I remember this year: at about 3 a.m. we saw the temperature rise to 55 degrees on a switch. The remote server room had "dumb" air conditioners without monitoring, and they had failed. We immediately called a cooling engineer (not ours) and phoned the administrator on duty. He shut down some non-critical services and kept the server room from thermal shutdown until the guy arrived with a mobile air conditioner and then repaired the regular one.

Polycom units and other expensive video conferencing equipment are monitored very well, including battery charge levels before conferences, which is also important.

Everyone needs monitoring and diagnostics. As a rule, implementing it yourself without experience is long and difficult: systems are either extremely simple and preconfigured, or the size of an aircraft carrier and loaded with standard reports. Filing it down to fit the company, working out how to implement the tasks of the internal IT unit and output the information they need most, plus keeping the whole thing up to date, is a path full of pitfalls if you have no implementation experience. When working with monitoring systems, we choose the golden mean between free and top-end solutions: as a rule, not the most popular or "fat" vendors, but ones that clearly solve the task.

Once there was a rather atypical request. The customer needed to hand a router over to a separate business unit, strictly according to the inventory list. The router was supposed to contain a module with a specified serial number. When the router was being prepared for the road, it turned out that this module was not there. And nobody could find it. The problem was slightly aggravated by the fact that the engineer who had worked with this branch the year before had already retired and moved to another city to be with his grandchildren. They contacted us and asked us to search. Fortunately, the hardware reported serial numbers and Infosim did the inventory, so we found the module in the infrastructure within a couple of minutes and described the topology. The fugitive was tracked down along the cable: it was in another server room, in a cabinet. The movement history showed that it got there after a similar module had failed.


A still from a feature film about Hottabych that precisely captures the public's attitude toward cameras

Many incidents involve cameras. Once three cameras failed at once. A cable break on one of the sections. The installer pulled a new cable through the corrugated conduit, and after some shamanic rituals two of the three cameras came up. The third did not. Moreover, it was unclear where it was at all. I pull up the video stream; the last frames right before it went down: 4 a.m., three men with scarves over their faces approach, something bright below, the camera shakes violently and falls.

Once we were setting up a camera that was supposed to focus on "hares" climbing over the fence. On the way, we wondered how we would mark the point where an intruder should appear. It was not needed: in the 15 minutes we were there, a person broke into the site exactly at the spot we needed. A perfect test pattern.

As I mentioned in the example above, the story about the demolished building is not a joke. Once a link to some equipment disappeared. On site, the pavilion that the copper ran through was gone. The pavilion had been demolished, the cable had vanished. We saw that the router had died. The installer arrived and started looking, and the distance between the nodes there is a couple of kilometers. His kit includes a Vipnetovsky tester, and the routine is standard: test from one connector, test from the other, then go and look. Usually the problem is visible right away.


Cable tracing: this is fiber in corrugated conduit, the continuation of the story from the very top of the post about the sailor's knot. Here, in the end, apart from the absolutely amazing installation, the problem turned out to be that the cable had come off its mounts. Everyone who feels like it climbs over them and loosens the metal structures. Roughly the five-thousandth representative of the proletariat broke the fiber.

At one site, all nodes went down about once a week. And always at the same time of day. We looked for a pattern for quite a long time. The installer found the following:

  • The problem always occurs on the shift of the same person.
  • He differs from the others in that he wears a very heavy coat.
  • The coat hanger is mounted above the circuit breakers.
  • Someone took the cover off the breakers a very long time ago, back in prehistoric times.
  • When this comrade arrives at the site, he hangs up his coat, and it switches off the breakers.
  • He immediately switches them back on.

At the same site, for a while the equipment was being switched off at night. It turned out that local craftsmen had connected to our power supply, run an extension cord, and plugged in a kettle and an electric stove. When these devices ran at the same time, the power tripped for the entire pavilion.

In one of the shops of our immense homeland, the entire network went down every time the store closed. The installer saw that all the power had been wired to the lighting line. As soon as the store switches off the overhead lighting in the hall (which consumes a lot of energy), all the network equipment switches off too.

There was a case where a janitor cut the cable with a shovel.

We often see bare copper lying in torn corrugated conduit. Once, between two workshops, local craftsmen simply ran twisted pair without any protection.

Far from civilization, employees often complain that "our" equipment irradiates them. Switches at some remote sites can be in the same room as the person on duty. Accordingly, a couple of times we came across cantankerous grannies who, by hook or by crook, switched them off at the start of their shift.

In another distant city, a mop was hung on the fiber. They peeled the corrugated conduit off the wall and started using it as a holder for their gear.


In this case, the power supply clearly has problems.

What "big" monitoring can do

I will also briefly describe the capabilities of more serious systems, using Infosim installations as an example. There are 4 solutions combined into one platform:
  • Failure management is the control of failures and correlation of events.
  • Performance management.
  • Inventory and automatic topology detection.
  • Configuration management.
Importantly, Infosim supports a lot of equipment "out of the box", that is, it easily parses all of its internal exchange and gets access to all of its technical data. Here is a list of vendors: Cisco Systems, Huawei, HP, AVAYA, Redback Networks, F5 Networks, Extreme Networks, Juniper, Alcatel-Lucent, Fujitsu Technology Solutions, ZyXEL, 3Com, Ericsson, ZTE, ADVA Optical Networking, Nortel Networks, Nokia Siemens Networks, Allied Telesis, Radcom, Allot Communications, Enterasys Networks, Telco Systems, etc.

Separately about inventory. The module does not just show a list but also builds the topology itself (in at least 95% of cases it tries and gets it right). It also lets you keep an up-to-date database of IT equipment in use and idle (network, server equipment, etc.) and replace outdated equipment (EOS / EOL) on time. In general, this is convenient for big business, whereas in small business much of it is done by hand.

Report examples:

  • Reports broken down by OS type, firmware, model and equipment manufacturer;
  • Report on the number of free ports on each switch on the network / for the selected manufacturer / according to the model / by subnet, etc.;
  • Report on newly added devices for a given period;
  • Notification of the low toner level in printers;
  • Assessment of communication channel suitability for delay-sensitive and loss-sensitive traffic, by active and passive methods;
  • Tracking the quality and availability of communication channels (SLA) - generation of reports on the quality of communication channels with a breakdown by telecom operators;
  • Control of failures and correlation of events. This functionality is implemented by the Root-Cause Analysis mechanism (without the administrator having to write rules) and the Alarm States Machine mechanism. Root-Cause Analysis is an analysis of the root cause of an accident based on the following procedures: 1. Automatic detection and localization of the failure; 2. Reduction of the number of emergency events to one key event; 3. Detection of the consequences of the failure: who and what was affected by it.
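The idea of reducing many alarms to one key event can be illustrated with a small topology-based sketch. This is not Infosim's algorithm, only a common simplification: assuming every node has a single upstream parent on the path from the monitoring station (a tree without cycles), a down node whose parent is still reachable is treated as a root cause, and everything unreachable behind it as a consequence. The function and node names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def root_causes(parent: Dict[str, Optional[str]], down: Set[str]) -> Dict[str, List[str]]:
    """Group unreachable nodes by their most upstream unreachable ancestor.

    parent: node -> upstream node (None for the node next to the monitoring station).
    down:   nodes that currently do not answer.
    Returns root-cause node -> sorted list of nodes that are down only as a consequence.
    """
    def nearest_root(node: str) -> str:
        # Walk upstream for as long as the parent is also down.
        current = node
        while parent.get(current) in down:
            current = parent[current]
        return current

    grouped: Dict[str, List[str]] = {}
    for node in sorted(down):
        root = nearest_root(node)
        affected = grouped.setdefault(root, [])
        if node != root:
            affected.append(node)
    return grouped


if __name__ == "__main__":
    topology = {
        "core": None,
        "sw1": "core", "sw2": "core",
        "cam1": "sw1", "cam2": "sw1",
        "pos1": "sw2",
    }
    # Four alarms collapse into two key events: sw1 (with two cameras behind it) and pos1.
    print(root_causes(topology, {"sw1", "cam1", "cam2", "pos1"}))
```

In this toy form the third step from the list above, finding who was affected, falls out for free: it is the list attached to each root cause.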
You can also place devices like these on the network; they integrate into monitoring right away:


StableNet Embedded Agent (SNEA): a computer slightly larger than a pack of cigarettes.

They are installed in ATMs or in dedicated network segments where availability checks are required. Load testing is also performed with them.

Cloud monitoring

Another installation model is SaaS in the cloud. We did this for one global customer (a company with a continuous production cycle and a distribution geography from Europe to Siberia).

Dozens of sites, including factories and finished-goods warehouses. When their channels went down, with support handled from offices abroad, shipment delays began, which rippled into further losses down the chain. All work was done on request, and a lot of time was spent investigating each incident.

We set up monitoring specifically for them, then refined it at a number of sites to match the specifics of their routing and hardware. All of this was done in the CROC cloud. The project was built and handed over very quickly.

The result is:

  • By partially handing over network infrastructure management, the customer managed to optimize by at least 50%. Equipment unavailability, channel load, parameters exceeding the manufacturer's recommendations: all of this is detected within 5-10 minutes, diagnosed, and resolved within an hour.
  • By receiving the service from the cloud, the customer converts the capital expenditure of deploying its own network monitoring system into operating expenditure in the form of a monthly fee for our service, which can be cancelled at any time.

The advantage of the cloud is that in this setup we stand, as it were, above the network and can look at everything that happens more objectively. If we were inside the network, we would see the picture only up to the failed node, and we would not know what was happening beyond it.

A couple of pictures to finish

This is "Morning Puzzle":

And we found a treasure:

In the chest was that:

And finally, about my most entertaining call-out. I once went out to a retail site.

The following happened there: first, water started dripping from the roof onto the suspended ceiling. Then a lake formed in the suspended ceiling, which soaked through and pushed out one of the tiles. As a result, all of this poured onto the electrics. I do not know exactly what happened next, but somewhere in the neighboring room a fire started. First the powder fire extinguishers went off, and then the firefighters arrived and covered everything in foam. I arrived after them to sort things out. I must say that the Cisco 2960 powered up after all this: I was able to pull the config and send the device for repair.

Another time, when a powder system went off, a Cisco 3745 in one bank was filled with powder almost completely. All interfaces were clogged: 2 x 48 ports. It had to be powered on in place. Remembering the previous case, we decided to try pulling the configs "hot", so we shook it out and cleaned it as best we could. We managed: first the device said "pff" and sneezed a big jet of powder at us. Then it rumbled and came up.

A robust ping monitoring tool for automatically checking connections to network hosts. By making regular pings, it monitors network connections and notifies you about detected ups and downs. EMCO Ping Monitor also provides connection statistics, including uptime, outages, failed pings, etc. You can easily extend its functionality and configure EMCO Ping Monitor to execute custom commands or launch applications when connections are lost or restored.

What is EMCO Ping Monitor?

EMCO Ping Monitor can work in 24/7 mode to track the connection states of one or multiple hosts. The application analyzes ping replies to detect connection outages and report connection statistics. It can automatically detect connection outages and show Windows tray balloons, play sounds and send e-mail notifications. It can also generate reports and send them by e-mail or save them as PDF or HTML files.

The program allows you to get information about the statuses of all hosts, check the detailed statistics of a selected host and compare the performance of different hosts. The program stores the collected ping data in the database, so you can check statistics for a selected time period. The available information includes min / max / avg ping time, ping deviation, a list of connection outages, etc. This information can be represented as grid data and charts.

EMCO Ping Monitor: How Does It Work?

EMCO Ping Monitor can be used to perform ping monitoring of just a few hosts or thousands of hosts. All hosts are monitored in real time by dedicated working threads, so you can get real-time statistics and notifications of connection state changes for every host. The program doesn't have special hardware requirements: you can monitor a few thousand hosts on a typical modern PC.

The program uses pings to detect connection outages. If a few pings fail in a row, it reports an outage and notifies you about the problem. When the connection is re-established and pings start to pass through, the program detects the end of the outage and notifies you about that. You can customize the outage and restore detection conditions, as well as the notifications used by the program.
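The "few failed pings in a row" rule can be sketched as a small state machine. This is not EMCO's implementation; the class name and the default thresholds (3 failures to declare an outage, 1 success to declare recovery) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageDetector:
    """Turns a stream of ping results into 'outage' / 'restored' events."""

    fail_threshold: int = 3      # consecutive failures that open an outage (assumed value)
    restore_threshold: int = 1   # consecutive successes that close it (assumed value)
    is_down: bool = False
    _fails: int = 0
    _oks: int = 0

    def feed(self, ping_ok: bool) -> Optional[str]:
        """Record one ping result; return an event name when the state changes."""
        if ping_ok:
            self._oks += 1
            self._fails = 0
            if self.is_down and self._oks >= self.restore_threshold:
                self.is_down = False
                return "restored"
        else:
            self._fails += 1
            self._oks = 0
            if not self.is_down and self._fails >= self.fail_threshold:
                self.is_down = True
                return "outage"
        return None


if __name__ == "__main__":
    detector = OutageDetector(fail_threshold=3, restore_threshold=2)
    for result in [True, False, False, True, False, False, False, True, True]:
        event = detector.feed(result)
        if event:
            print(event)
```

Requiring several failures in a row filters out single lost packets, at the cost of detecting a real outage a few polling intervals later.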

Compare Features and Select the Edition

The program is available in three editions with different sets of features.

The Free edition allows ping monitoring of up to 5 hosts. It does not allow any host-specific configuration. It runs as a Windows program, so monitoring stops if you close the UI or log off from Windows.

Free for Personal and Commercial Usage

Professional Edition

The Professional edition allows monitoring of up to 250 hosts concurrently. Every host can have a custom configuration, such as e-mail notification recipients or custom actions to be executed on connection lost and restored events. It runs as a Windows service, so monitoring continues even if you close the UI or log off from Windows.

Enterprise Edition

The Enterprise edition does not have limitations on the number of monitored hosts. On a modern PC, it is possible to monitor 2500+ hosts, depending on the hardware configuration.

This edition includes all the available features and works as a client / server. The server works as a Windows service to ensure ping monitoring in 24/7 mode. The client is a Windows program that can connect to a server running on a local PC or to a remote server through a LAN or the Internet. Multiple clients can connect to the same server and work concurrently.

This edition also includes web reports, which allow reviewing host monitoring statistics remotely in a web browser.

The Main Features of EMCO Ping Monitor

Multi-Host Ping Monitoring

The application can monitor multiple hosts concurrently. The Free edition of the application allows monitoring up to five hosts; the Professional edition doesn't have any limitation on the number of monitored hosts. Monitoring of every host works independently from other hosts. You can monitor tens or thousands of hosts from a modern PC.

Connection Outage Detection

The application sends ICMP ping echo requests and analyzes ping echo replies to monitor the connection state in 24/7 mode. If the preset number of pings fail in a row, the application detects a connection outage and notifies you of the problem. The application tracks all outages, so you can see when a host was offline.

Connection Quality Analysis

When the application pings a monitored host, it saves and aggregates data about every ping, so you can get information about the minimum, maximum and average ping response times and the ping response deviation from the average for any reporting period. That allows you to estimate the quality of the network connection.
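As a rough illustration of such aggregation, the sketch below summarizes a list of round-trip times for one reporting period. The function name and the input format (`None` for a lost ping) are hypothetical, and "deviation" is computed here as the mean absolute deviation from the average, which is only one possible reading of the description; the product may define it differently (for example, as a standard deviation).

```python
import statistics
from typing import Dict, List, Optional


def summarize_rtts(rtts_ms: List[Optional[float]]) -> Dict[str, float]:
    """Aggregate ping results; each item is an RTT in milliseconds or None for a lost ping."""
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    summary: Dict[str, float] = {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
    }
    if replies:
        avg = statistics.fmean(replies)
        summary["min"] = min(replies)
        summary["max"] = max(replies)
        summary["avg"] = avg
        # Mean absolute deviation of response times from the average.
        summary["deviation"] = statistics.fmean([abs(rtt - avg) for rtt in replies])
    return summary


if __name__ == "__main__":
    print(summarize_rtts([10.0, 20.0, None, 30.0]))
```

A high deviation with a normal average usually points to jitter on the link, which matters more for voice and video than for bulk transfers.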

Flexible Notifications

If you would like to get notifications about connection lost, connection restored and other events detected by the application, you can configure the application to send e-mail notifications, play sounds and show Windows tray balloons. The application can send a single notification of any type or repeat notifications multiple times.

Charts and Reports

All statistical information collected by the application can be represented visually in charts. You can see the ping and uptime statistics for a single host and compare the performance of multiple hosts on charts. The application can automatically generate reports in different formats on a regular basis to represent the host statistics.

Custom Actions

You can integrate the application with external software by executing external scripts or executable files when connections are lost or restored, or in case of other events. For example, you can configure the application to run an external command-line tool to send SMS notifications about any changes in host statuses.
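The same hook pattern is easy to reproduce in a home-grown monitor. Below is a minimal sketch, not EMCO's mechanism: the function name, the event names and the `{host}` placeholder are assumptions, and the example command just calls the Python interpreter in place of a real SMS tool.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def run_action(event: str, host: str, actions: Dict[str, List[str]]) -> Optional[int]:
    """Run the external command configured for an event; return its exit code.

    actions maps an event name to an argument list in which the literal
    '{host}' is replaced by the affected host. Returns None if no action is set.
    """
    template = actions.get(event)
    if template is None:
        return None
    command = [arg.replace("{host}", host) for arg in template]
    # Passing a list (no shell) avoids quoting problems with host names.
    completed = subprocess.run(command, timeout=30)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a real notification tool: prints a message via the interpreter.
    actions = {
        "lost": [sys.executable, "-c", "import sys; print('link lost:', sys.argv[1])", "{host}"],
    }
    run_action("lost", "store-17", actions)
```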

EMCO Ping Monitor: a free assistant for the admin

If the infrastructure has up to 5 virtualization hosts, you can use the free version.

Ping Monitor: a network connection state monitoring tool (free for 5 hosts)

Info:
A reliable monitoring tool that automatically checks connections to network hosts by running the ping command.

Wiki:
Ping is a utility for checking connections in TCP/IP-based networks, as well as the common name for the request itself.
The utility sends ICMP requests (ICMP Echo-Request) to the specified network node and records the incoming responses (ICMP Echo-Reply). The time between sending the request and receiving the response (RTT, Round Trip Time) lets you determine the two-way delay on the route and the packet loss rate, that is, indirectly gauge the load on data links and intermediate devices.
The ping program is one of the main diagnostic tools in TCP/IP networks and ships with all modern network operating systems.

https://ru.wikipedia.org/wiki/ping.
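For readers curious what an echo request looks like on the wire, here is a minimal sketch that only builds the packet bytes: type 8, code 0, a 16-bit checksum, then identifier and sequence number, followed by an optional payload. The function names are hypothetical. Actually sending the packet requires a raw or ICMP datagram socket and usually elevated privileges, so that part is left out.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo-Request: type 8, code 0, checksum, id, sequence, payload."""
    icmp_echo_request = 8
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", icmp_echo_request, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_echo_request, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
    print(packet.hex())
```

RTT is then simply the difference between a monotonic clock reading taken before sending and one taken when the matching Echo-Reply (same identifier and sequence number) arrives.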

By sending regular ICMP requests, the program monitors network connections and notifies you when channels go down or recover. EMCO Ping Monitor provides connection statistics, including uptime, outages, failed pings, etc.

