Identifying the Right Module – Cisco Nexus

N7K.LON# locator-led ?
  chassis      Blink chassis led
  fan          Blink Fan led
  module       Blink module led
  powersupply  Blink powersupply led
  xbar         Xbar

When you are in a remote location and the data center tech can’t identify the right module, the “locator-led” command on the Nexus 7000 can be used to blink the LED on the right module. On other platforms, the “blink” command is used instead.

F5 Error – No such file or directory

Dec 12 20:57:36 lb-A err statsd[6711]: 011b0203:3:
Error 'No such file or directory' opening file /sys/block/sdb/stat

Dec 12 20:57:36 lb-A err statsd[6711]: 011b0900:3:
TMSTAT error max disk stat: read failed.

The error occurs because the F5 is trying to query stats from a hard drive or USB device that has been removed. Bug ID 441400 is associated with this error. It has been fixed in code versions after 11.6; on older code versions, a reboot clears the error.

F5 – big3d restarting

I ran into an issue where the big3d daemon was restarting continuously on an F5 running LTM only (No GTM). The following article details the steps that were taken to solve the restart issue. The solution was achieved after raising a support case with F5 Networks.

I tried restarting the daemons (big3d & httpd):

(tmos)#restart /sys service big3d

(tmos)#restart /sys service httpd

I also tried stopping and starting them using the following commands, but that did not stop the continuous restarts:

(tmos)#stop /sys service httpd 

(tmos)#stop /sys service big3d

(tmos)#start /sys service httpd

(tmos)#start /sys service big3d

I ran the following command from tmsh:

(tmos)#load sys config

and it resulted in the following error:

Reading configuration from /config/low_profile_base.conf.
Reading configuration from /defaults/config_base.conf.
Reading configuration from /config/bigip_sys.conf.
Reading configuration from /config/bigip_base.conf.
Reading configuration from /usr/share/monitors/base_monitors.conf.
Reading configuration from /config/profile_base.conf.
Reading configuration from /config/daemon.conf.
Reading configuration from /config/bigip.conf.
Reading configuration from /config/bigip_local.conf.
Loading the configuration …
BIGpipe unknown operation error:
01070920:3: Application error for confpp: Syntax OK
The certificate does not match the key.  To change them try 'bigpipe httpd { sslcertfile /etc/httpd/conf/ssl.crt/server.crt sslcertkeyfile /etc/httpd/conf/ssl.key/server.key }'
*************************************************************
Sep  9 22:56:52 localhost confpp[9878]: syntax check command FAILURE for unix_config_httpd returned: '2304'
Restarting syslog-ng:
Shutting down syslog-ng: [  OK  ]
Starting syslog-ng: [  OK  ]
Shutting down ntpd: [  OK  ]
Starting ntpd: [  OK  ]
[FAILED]ing httpd: [FAILED]

VERIFY Device Cert & Key:

As seen in the output of the “load sys config” command, the cert & key did not match. The following commands (run from bash) verify whether the cert & key match:

openssl rsa -in /etc/httpd/conf/ssl.key/server.key -modulus -noout | openssl md5

openssl x509 -in /etc/httpd/conf/ssl.crt/server.crt -modulus -noout | openssl md5

The MD5 hashes were different, indicating that the cert & key did not match. I used the following solution guide to generate a new cert/key pair: SOL9114

  1. Log in to the bash command line.
  2. Generate the new device certificate and key using the following syntax:

    openssl req -x509 -nodes -days <# of days> -newkey rsa:<keysize> -keyout /config/httpd/conf/ssl.key/server.key -out /config/httpd/conf/ssl.crt/server.crt

    Note: Replace <# of days> with the number of days in year increments for which you want the certificate to be valid. I used 3650 days (10 years).

  3. Enter the certificate attributes.
  4. Restart the httpd process by typing the following command:

    bigstart restart httpd

  5. Copy the new self-signed certificate to the trusted device certificate file by typing the following command:

    cat /config/httpd/conf/ssl.crt/server.crt >> /config/big3d/client.crt

    Note: Alternatively, you can add the new certificate to the trusted device certificate file and remove all old certificates by running the following command:

    cat /config/httpd/conf/ssl.crt/server.crt > /config/big3d/client.crt

  6. (BIG-IP GTM and BIG-IP Link Controller) Copy the new self-signed certificate to the trusted server certificate file by typing the following command:

    cat /config/httpd/conf/ssl.crt/server.crt >> /config/gtm/server.crt

    Note: Alternatively, you can add the new certificate to the trusted server certificate file and remove all old certificates by typing the following command:

    cat /config/httpd/conf/ssl.crt/server.crt > /config/gtm/server.crt

After creating a matching cert/key pair, the continuous big3d restarts stopped. With the restarts resolved, the GUI still failed to load the Virtual Server option, and I had to perform a full box reboot before the GUI worked without any issues.
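
The cert/key check from the verification step can also be scripted. This is only a minimal sketch in Python, assuming you have already captured the text printed by the two “openssl ... | openssl md5” commands; the function names are my own and not part of any F5 tooling. Depending on the OpenSSL version, the output may look like “(stdin)= <hash>”, “MD5(stdin)= <hash>” or just the hash, so the sketch only looks for the trailing 32 hex characters.

```python
import re


def extract_digest(openssl_output: str) -> str:
    """Return the 32-character MD5 hex digest at the end of `openssl md5` output."""
    match = re.search(r"([0-9a-fA-F]{32})$", openssl_output.strip())
    if match is None:
        raise ValueError(f"no MD5 digest found in: {openssl_output!r}")
    return match.group(1).lower()


def cert_matches_key(key_output: str, cert_output: str) -> bool:
    """True when the key modulus hash equals the certificate modulus hash."""
    return extract_digest(key_output) == extract_digest(cert_output)
```

If cert_matches_key() returns False, the device cert and key are out of sync, which is the condition that led to regenerating the pair here.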

Reference:

SOL10999

SOL13444

Identifying the Viprion Blade

F5 Viprions have different blade hardware versions. There isn’t a simple command that will help you identify the blade model. I have found this to be useful:


(/Common)(tmos)# show /sys hardware field-fmt | grep -e platform -e marketing
sys hardware platform {
marketing-name BIG-IP VPR-C2400
platform A112


Platform A112 is B2250 blade

Platform A113 is B2150 blade

Platform A109 is B2100 blade
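
If you have to check many chassis, the lookup can be scripted. The following is a rough sketch in Python that takes the text of the “show /sys hardware” output shown above and maps the platform ID to a blade model. The table only contains the three IDs listed in this post, and the function name is my own.

```python
import re
from typing import Optional

# Platform IDs taken from the list above; anything else is unknown to this sketch.
BLADE_BY_PLATFORM = {
    "A112": "B2250",
    "A113": "B2150",
    "A109": "B2100",
}


def blade_model(hardware_output: str) -> Optional[str]:
    """Return the blade model for the 'platform <ID>' line, or None if unknown."""
    match = re.search(r"^\s*platform\s+(\S+)\s*$", hardware_output, re.MULTILINE)
    if match is None:
        return None
    return BLADE_BY_PLATFORM.get(match.group(1))
```

Fed the output shown earlier (platform A112), the lookup returns B2250.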

Reference for platform:

https://support.f5.com/kb/en-us/products/big-ip-afm/releasenotes/product/relnote-afm-11-4-1.print.html

https://devcentral.f5.com/questions/f5-viprion-blade-model

F5 SNMP Problems

An F5 was being polled over SNMP by a server. Some of the OIDs responded while others did not. Restarting snmpd alone didn’t help. Restarting both snmpd & subsnmpd solved the problem, and all the OIDs responded again.

(tmos)# restart sys service snmpd
(tmos)# restart sys service subsnmpd

(tmos)# show sys service snmpd subsnmpd
snmpd run (pid 31980) 3 minutes, 4 starts
subsnmpd run (pid 927) 2 minutes, 1 restart

SOL8035 has information on the BIG-IP daemons.

RST & FIN Out of Order

There was a constant increase in “overrun” and “input errors” on the Cisco ASA interface. Upon examination with “show asp drop”, the “tcp-rstfin-ooo” & “tcp-3whs-failed” counters were also constantly increasing.


The following real-time capture can be used to identify the IP addresses and ports involved:

# capture ASP type asp-drop tcp-rstfin-ooo buffer 2048 real-time

In this case, we isolated port 5666 (Nagios NRPE) as the culprit: RSTs were being sent after FIN, which breaks the normal TCP teardown. This is an environment with hundreds of servers monitored by Nagios. When hundreds of servers send RSTs simultaneously, it can turn into a mini self-inflicted DoS. With older firewalls & code versions, this can cause reboots. When we searched online, we found the following Nagios bugs:

https://bugs.launchpad.net/ubuntu/+source/nagios-nrpe/+bug/989156

http://tracker.nagios.org/view.php?id=305
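
To see which drop counters are actually climbing, two “show asp drop” snapshots taken a few minutes apart can be compared. This is a minimal sketch in Python under one assumption: each counter line ends with the counter name in parentheses followed by the count, for example “(tcp-rstfin-ooo)  2500”. The function names are hypothetical, and the line format should be checked against your own ASA output.

```python
import re
from typing import Dict

# Assumed line shape: "<description> (<counter-name>)   <count>"
_COUNTER_RE = re.compile(r"\(([a-z0-9-]+)\)\s+(\d+)\s*$", re.MULTILINE)


def parse_asp_drop(output: str) -> Dict[str, int]:
    """Map each counter name found in the output to its current value."""
    return {name: int(count) for name, count in _COUNTER_RE.findall(output)}


def increasing_counters(before: str, after: str) -> Dict[str, int]:
    """Return the counters that grew between two snapshots, with their deltas."""
    old = parse_asp_drop(before)
    new = parse_asp_drop(after)
    return {
        name: new[name] - old.get(name, 0)
        for name in new
        if new[name] > old.get(name, 0)
    }
```

Counters that stay flat between snapshots are left out, so the result points straight at the drop reasons worth capturing.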

F5 CLI

F5 provides 3 different CLI navigation options:

TMSH  (tmos)#

BASH   #

bpsh     >

TMSH, or Traffic Management Shell, is the newer shell used to manage the F5 via CLI. BASH is the Linux shell; from it you can also run bigpipe commands, which start with “b” (# b pool show).

With the newer v11 code version, F5 is moving more towards TMSH and has stopped developing bpsh. If you are looking to learn the CLI, it is recommended that you learn TMSH rather than BASH or bpsh.

To move into TMSH, type “tmsh” from BASH or bpsh

To move into BASH from TMSH, type “run util bash”

To move into bpsh from BASH, type “bpsh”

F5 Upgrade from v10 to v11 – Lessons Learned

Pre-Maintenance Checks:

Make sure that the F5 is running in “Volume Partition” mode. “lvscan” within bash should provide output like this:

config # lvscan
ACTIVE '/dev/vg-db-sda/dat.share.1' [30.00 GB] normal
ACTIVE '/dev/vg-db-sda/dat.log.1' [7.00 GB] normal
ACTIVE '/dev/vg-db-sda/dat.swapvol.1' [1.00 GB] normal
ACTIVE '/dev/vg-db-sda/set.1.root' [392.00 MB] normal
ACTIVE '/dev/vg-db-sda/set.1._usr' [2.48 GB] normal
ACTIVE '/dev/vg-db-sda/set.1._config' [3.00 GB] normal
ACTIVE '/dev/vg-db-sda/set.1._var' [3.00 GB] normal
ACTIVE '/dev/vg-db-sda/set.2.root' [256.00 MB] normal
ACTIVE '/dev/vg-db-sda/set.2._usr' [1.34 GB] normal
ACTIVE '/dev/vg-db-sda/set.2._config' [512.00 MB] normal
ACTIVE '/dev/vg-db-sda/set.2._var' [3.00 GB] normal
ACTIVE '/dev/vg-db-sda/set.3.root' [256.00 MB] normal
ACTIVE '/dev/vg-db-sda/set.3._usr' [1.34 GB] normal
ACTIVE '/dev/vg-db-sda/set.3._config' [512.00 MB] normal
ACTIVE '/dev/vg-db-sda/set.3._var' [3.00 GB] normal
ACTIVE '/dev/vg-db-sda/dat.maint.1' [300.00 MB] normal
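
When checking several boxes, the lvscan output can be inspected with a short script. The sketch below is in Python and rests on one assumption of mine: a box that lists ACTIVE logical volumes with “set.N.root” entries is using volumes rather than partitions. The function names are hypothetical.

```python
import re
from typing import List


def logical_volumes(lvscan_output: str) -> List[str]:
    """Return the /dev/... paths of all ACTIVE logical volumes in lvscan output."""
    # Accept straight or curly quotes around the path, or none at all.
    return re.findall(r"ACTIVE\s+['\u2018]?(/dev/[^\s'\u2019]+)", lvscan_output)


def boot_sets(lvscan_output: str) -> List[int]:
    """Return the sorted N values of every set.N.root volume that is present."""
    sets = set()
    for path in logical_volumes(lvscan_output):
        match = re.search(r"/set\.(\d+)\.root$", path)
        if match:
            sets.add(int(match.group(1)))
    return sorted(sets)
```

An empty list from boot_sets() suggests the box is not laid out the way the output above shows and needs a closer look before the upgrade.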

Make sure there are no HTTP Classes in your configuration other than the default “httpclass”, by checking Local Traffic ›› Profiles : Protocol : HTTP Class from the F5 GUI.

Make sure that there are no spaces in profile names (see SOL15144).

As a precaution, go through your configuration and remove any unwanted/unused configuration elements like a “Test Virtual Server” or configuration from the past that is not in use at the moment.

Upload the new code version to the F5 LTM via GUI, SCP, or any other preferred method.

Maintenance:

Before performing any F5 code upgrade, make sure that the “Service Check Date” on the device is on or AFTER the License Check Date for the new code version, as listed in SOL7727.

If not, the maintenance would include a license re-activation step before proceeding with code upgrade. This step would take about 10-20 minutes.
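
The date comparison is simple enough to script as part of a pre-check. This is a minimal sketch in Python, assuming both dates are available as YYYYMMDD strings (that format and the function name are my assumptions; the License Check Date for the target version still has to be looked up in SOL7727 by hand). Treating an equal date as acceptable is also an assumption here.

```python
from datetime import date, datetime


def parse_check_date(value: str) -> date:
    """Parse a YYYYMMDD string into a date (assumed format)."""
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def needs_reactivation(service_check_date: str, license_check_date: str) -> bool:
    """True when the device's Service Check Date is earlier than the
    License Check Date of the target code version."""
    return parse_check_date(service_check_date) < parse_check_date(license_check_date)
```

If needs_reactivation() returns True, plan the extra license re-activation time into the maintenance window.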

Copy the configuration to the new code version location with cpcfg. Example: cpcfg HD1.2

Although “cpcfg HD1.x” has worked most of the time, I would recommend backing up the .UCS file to a remote location and also saving a copy in “/shared/tmp/<UCS File>”. After saving the UCS file in the “/shared/tmp/” location, you can use “load /sys ucs <path/to/UCS> no-license” to load the configuration, as noted in SOL12880.

Reboot. It takes about 20 minutes for the device to load the new configuration and come back up. If you are running an HA pair, upgrade the Standby F5 first. It will take a few minutes for the Standby F5 to become “Active”, so be patient.

A conservative estimate for the maintenance window is about 1 hour. I would recommend giving yourself 90 minutes if you are not familiar with F5 code upgrades. Downtime can be minimized if you have the BIG-IP F5 in a High Availability Active/Standby pair.