Juniper EX switches come with two separate flash partitions: a root (the default boot partition) and a copy of that root in another piece of flash memory. Each partition contains a copy of the boot software and the configuration. We've deployed a number of EX clusters, and from time to time we notice that the secondary (non-active) partition doesn't get upgraded along with the active one. This causes problems if the active partition is corrupted and won't boot. Manually booting the secondary to get the switch up doesn't help if it's in a cluster, because that node is then marked as NotPrsnt (Not Present): it isn't running the same software as the other nodes.

Luckily, Juniper have come to the rescue and brought out the latest 10.4 firmware, which does the following (source: www.juniper.net):


Resilient dual-root partitioning, introduced on Juniper Networks EX Series Ethernet Switches in Junos operating system (Junos OS) Release 10.4R3, provides additional resiliency to switches in the following ways:
  • Allows the switch to boot transparently from the second root partition if the system fails to boot from the primary root partition.
  • Provides separation of the root Junos OS file system from the /var file system. If corruption occurs in the /var file system (a higher probability than in the root file system due to the greater frequency in /var of reads and writes), the root file system is insulated from the corruption.
Great news, this. So let's upgrade our firmware and get this great feature. Here is a copy of today's firmware version:

show_version

We need to get hold of the latest firmware from Juniper's website, so we downloaded that, but there was also an issue with our Jloader: it was too old. We will need to delete the old loader and upgrade that as well as the firmware (Jloader Upgrade Link). The good news is we can do both and then reboot once, which cuts down on reboot time. We've downloaded the jloader and Junos images (10.4R3 was the recommended release at the time of writing).

First let's just check we can ping the FTP server holding the firmware images. We're using FileZilla server for this and we've created a new user called 'junos'. You can get hold of FileZilla server here

ping_check


Looks good. Right, let's go with the upgrade.

Jloader first. The FTP load command uses the syntax 'request system software add ftp://10.10.15.23/jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz'. This says we're using FTP to get the image and that the image is located on the server with IP address 10.10.15.23. Without a username and password in the form ftp://username:password@, the Junos parser will use the default username of 'anonymous' with no password. Some FTP servers come with built-in anonymous support; FileZilla needs you to create a user called 'anonymous' with the password checkbox unchecked. Here is the process:


request_jloader_process
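If you script your upgrades, the two URL forms described above (anonymous, and ftp://username:password@) can be put together with a small helper. This is only a sketch: the function name is made up for illustration, the password 's3cret' is a placeholder, and rejecting special characters in the credentials is our own cautious assumption rather than a documented Junos rule.

```python
def software_add_command(server, package, user=None, password=None):
    """Build a 'request system software add' command for an FTP-hosted package.

    Hypothetical helper for illustration; it only formats a string.
    """
    if user is None:
        # No credentials in the URL: the switch logs in as 'anonymous'.
        authority = server
    else:
        for value in (user, password or ""):
            # Assumption: keep credentials free of URL delimiters so the
            # ftp://username:password@ form stays unambiguous.
            if any(ch in value for ch in "@:/"):
                raise ValueError("credentials must not contain '@', ':' or '/'")
        creds = user if not password else f"{user}:{password}"
        authority = f"{creds}@{server}"
    return f"request system software add ftp://{authority}/{package}"


JLOADER = "jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz"

# Anonymous form, as used in this post.
print(software_add_command("10.10.15.23", JLOADER))
# Authenticated form with the 'junos' user (placeholder password).
print(software_add_command("10.10.15.23", JLOADER, user="junos", password="s3cret"))
```

The first call reproduces the command used in this post; the second shows where the credentials would go if you prefer not to enable anonymous access on the FTP server.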

Each node in the cluster will be upgraded in turn until it finishes and returns you to the prompt:

request_jloader_finish

The FileZilla management console shows you the whole process as the file is pulled back to the cluster master node. Here is a brief screenshot of the download:

request_jloader_ftp


Right, that's the jloader upgrade applied (but not yet active until we reboot). To save time we're now going to upgrade the firmware so that we only do one reboot. Here is the process, and remember, this is a 4 node cluster as shown by the fpc0,1,2,3. If you have more nodes then your output will be different.

request_junos_install

So, just like the man said 'A reboot is required to install the software'. Let us oblige...

request_system_reboot

It took about 5 minutes to come back up. I logged in and checked the firmware and loader versions.

show_chassis_firmware

That's OK. Next I checked the state of the partitions:

show_system_storage

Looks like I have two partitions there, active and backup, all looking pretty sweet. Let's get some more information on the state of those partitions. Detail? Nah, that's what they would expect you to do. Let's look at the snapshot...

show_system_snapshot

We're upgraded and we've got two healthy partitions. We just need to wait for a failure now to see it automatically fix itself...but I won't wish for that. I think one question we're all asking is: how do I find out that there has been a partition failure if it fixes itself?

Well, you've got console logs, syslog and SNMP...take your pick. From the management port you will see:



WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE




You can of course always look at the chassis alarms.




user@switch> show chassis alarms
1 alarms currently active
Alarm time               Class  Description
2011-02-17 05:48:49 PST  Minor  Host 0 Boot from backup root
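If you already collect CLI output from your switches, a check for that alarm is easy to script. The sketch below only scans text that you have captured some other way (how you fetch it is up to you and is not shown); the function name is hypothetical, and matching on the phrase "Boot from backup root" assumes the alarm description reads the same as in the output above.

```python
def backup_root_alarms(output):
    """Return the alarm lines that report a boot from the backup root partition."""
    return [
        line.strip()
        for line in output.splitlines()
        # Case-insensitive match on the alarm description shown in this post.
        if "boot from backup root" in line.lower()
    ]


# Sample text copied from the 'show chassis alarms' output in this post.
SAMPLE = """\
user@switch> show chassis alarms
1 alarms currently active
Alarm time Class Description
2011-02-17 05:48:49 PST Minor Host 0 Boot from backup root
"""

hits = backup_root_alarms(SAMPLE)
if hits:
    print("Booted from backup root:", hits[0])
```

An empty list means no such alarm was found in the text you passed in, which is not the same as proof that the primary partition is healthy, so treat it as a prompt to look closer and not as a verdict.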




Thank you for reading and may all of your upgrades be as sweet.

© 2011 defaultrouteuk.com

Cisco, IOS, CCNA, CCNP, CCIE are trademarks of Cisco Systems Inc.
JunOS, JNCIA, JNCIP, JNCIE are registered trademarks of Juniper Networks Inc.