GRE (Generic Routing Encapsulation) is an industry standard for encapsulating data within an IP packet. Unlike IP-in-IP encapsulation (IP protocol 4), GRE runs over IP protocol 47. It is often used to manipulate routing over non-broadcast networks or for sending multicast over IPsec tunnels. This tech note was set up between a Juniper EX 4200 switch cluster and a Cisco 2621XM router. One of the issues with GRE traffic is its extra header: the GRE header adds at least 4 bytes on top of the new outer IP header, so the payload you can carry is reduced by at least that much.
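To put a number on that overhead, here is a minimal Python sketch. It assumes an IPv4 outer header with no options (20 bytes) and the basic 4-byte GRE header; the optional checksum, key and sequence fields add 4 bytes each. The 1500-byte link MTU is just the usual Ethernet default, not something measured in this lab.

```python
# Rough GRE-over-IPv4 overhead calculator (a sketch, not vendor tooling).
OUTER_IPV4_HEADER = 20   # bytes, assuming no IP options
GRE_BASE_HEADER = 4      # bytes, the minimum GRE header
GRE_OPTIONAL_FIELD = 4   # bytes each for checksum, key and sequence number


def gre_payload_mtu(link_mtu: int, optional_fields: int = 0) -> int:
    """Return the largest inner packet that fits in one outer packet."""
    if not 0 <= optional_fields <= 3:
        raise ValueError("GRE has at most three optional 4-byte fields")
    overhead = (OUTER_IPV4_HEADER + GRE_BASE_HEADER
                + optional_fields * GRE_OPTIONAL_FIELD)
    return link_mtu - overhead


if __name__ == "__main__":
    print(gre_payload_mtu(1500))     # basic GRE
    print(gre_payload_mtu(1500, 1))  # GRE with one optional field, e.g. a key
```

On a 1500-byte link that works out to 1476 bytes of room for the inner packet with basic GRE.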

So here is our basic topology. Remember, we’re just using this to prove out the connectivity, NOT to delve deeply into GRE itself or how we could use it to fix a ‘situation’. We’ll be bringing more of these technical guides as soon as we can write them. Here we’re using another 24-bit subnet, 12.1.12.0. From here we’ll have two loopback interfaces (one on each device) and we’ll set up the routing to divert traffic between these loopback interfaces across the tunnel.

Screen shot 2011-05-28 at 22.19.00

First let’s configure the EX switch ge-0/0/0 interface, which is connected directly to the Cisco 2600. Notice the bit-count (prefix length) mask at the end. Cisco’s recent Nexus platform running NX-OS now also uses the prefix-length pattern for the netmask... interesting ;-)

Screen shot 2011-05-28 at 22.31.29

Let’s configure the loopback interface.

Screen shot 2011-05-28 at 22.31.37

Right, now we’ll configure the GRE interface itself. The order of the next three configuration lines doesn’t matter, but you DO need them all ;-)

The source is the beginning of the tunnel from this router’s point of view. As an analogy, think of yourself in your car. You are driving toward a tunnel going under a river from Coolville to Duddsberg. As far as you are concerned, the start (source) of the tunnel is in Coolville. When you return, however, the start of the tunnel is in Duddsberg. It’s the same thing for traffic going into and out of your tunnel here.

Screen shot 2011-05-28 at 22.31.44

Now the destination. Remember, this is all relative and the other side will look the opposite.

Screen shot 2011-05-28 at 22.32.04
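If the ‘it’s all relative’ bit is hard to picture, here is a tiny Python sketch of the idea: each end describes the tunnel with the same two addresses, just swapped. The two host addresses are hypothetical, picked from 12.1.12.0/24 purely for illustration; they are not taken from the lab configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreTunnelEnd:
    """One router's view of a point-to-point GRE tunnel."""
    source: str       # local address the tunnel starts from
    destination: str  # remote address the tunnel ends at

    def far_side(self) -> "GreTunnelEnd":
        """The peer's view is the same pair of addresses, swapped."""
        return GreTunnelEnd(source=self.destination, destination=self.source)


# Hypothetical addresses, chosen only for illustration.
juniper_view = GreTunnelEnd(source="12.1.12.1", destination="12.1.12.2")
cisco_view = juniper_view.far_side()

print(juniper_view)
print(cisco_view)
```

If the two views are not exact mirrors of each other, the tunnel will not pass traffic in both directions, so this is the first thing worth checking on a tunnel that will not come up.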

OK, now that’s the tunnel built, we need to ‘load it up’ with lovely IPv4 traffic. So, just like a normal interface, we’ll give it an IP address and a mask.

Screen shot 2011-05-28 at 22.42.19

Right, JunOS side done now, let’s nip over to the Cisco box and do the same.

Let’s configure the Fast Ethernet 0/0 interface, which is connected to the Juniper switch.

Screen shot 2011-05-28 at 22.47.14

Now we’ll configure the tunnel interface. For brevity I’ve pumped all of the configuration in here in one go, but it follows EXACTLY the same sort of configuration as JunOS. Source IP of the tunnel, destination IP, IP address of the tunnel... done.

Screen shot 2011-05-28 at 23.03.44

Right, so now that’s up, let’s see if we can ping either side of the tunnel.

Cisco side first...

Screen shot 2011-05-28 at 22.38.30

Cool, now the Juniper side

Screen shot 2011-05-28 at 22.49.52

Awesome. Right, but we’re only pinging the two ends of a point-to-point interface here, which doesn’t exactly prove much, does it? If we’re going to be ‘routing’ traffic through this tunnel, and not just having a secondary route (what would be the point in our topology anyway), we’ll need to give each side routes to the other. We’re going to route traffic for each side’s loopback interface through the tunnel.
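Why the extra routes matter can be sketched with a toy longest-prefix-match table in Python. Everything in it is an assumption made for illustration: the tunnel subnet 10.0.0.0/30, the far loopback 2.2.2.2 and the interface names are invented, not copied from the lab.

```python
import ipaddress


def pick_route(routes, destination):
    """Longest-prefix match: return the exit interface, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    candidates = [(net, exit_if) for net, exit_if in routes if dest in net]
    if not candidates:
        return None
    best_net, best_exit = max(candidates, key=lambda pair: pair[0].prefixlen)
    return best_exit


# All addresses and interface names below are made up for illustration.
juniper_routes = [
    (ipaddress.ip_network("12.1.12.0/24"), "ge-0/0/0.0"),  # directly connected link
    (ipaddress.ip_network("10.0.0.0/30"), "gr-0/0/0.0"),   # the tunnel's own subnet
]

# Before the static route: nothing covers the far loopback.
print(pick_route(juniper_routes, "2.2.2.2"))

# Add a static route for the far loopback pointing into the tunnel.
juniper_routes.append((ipaddress.ip_network("2.2.2.2/32"), "gr-0/0/0.0"))
print(pick_route(juniper_routes, "2.2.2.2"))
```

With only the connected routes the lookup finds nothing for the far loopback; once the static route is added, the lookup hands the traffic to the tunnel interface.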

Juniper side first

Screen shot 2011-05-28 at 22.53.06

...don’t forget the commit in JunOS. You know, it never fails to impress me how Juniper got JunOS so right for administrators. If we screwed this up and the router happened to be 1000 miles away, we’ve got options. Auto rollback (commit confirmed) is the best thing ever.

Now the Cisco side

Screen shot 2011-05-28 at 22.53.41

...if we screwed up IOS we’d be gone ;-) Of course we could always issue that great ‘reload in 10’ shortcut to save our ass.

OK, let’s ping out over the tunnel interface.

Screen shot 2011-05-28 at 23.04.19

Now from the Cisco side... just because we can.

Screen shot 2011-05-28 at 23.05.23

What about some statistics to back that up man?! OK, here we go.

Right Juniper side first again. We got 6 in and out here...

Screen shot 2011-05-28 at 23.07.03

Cisco side...

Screen shot 2011-05-28 at 23.05.59

Job Done.

Thanks for Reading
Working on an EX recently we came across this rather curious issue. One of the things the JunOS loader does as it extracts new firmware is run CRC and validity checks on the package. To our great surprise, this installation balked at us about the time. Apparently we were installing a file which wouldn’t be built until 25963681 seconds (about 300 days) in the future!

Clearly the administrators had been a little lax in their approach to time keeping and could certainly do with a quick look at NTP. Anyway, without further ado let’s have a look at the work we did:

First let’s upgrade the stack. We’ll be using FTP to transfer the file from our FTP server running on IP address 10.15.100.75.
 
network@EX_STACK> request system software add ftp://10.15.100.75/jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz
 
Checking pending install on fpc1
 
Checking pending install on fpc0
Fetching package...
Pushing bundle to fpc1
 
Validating on fpc1
 
Validating on fpc0
Done with validate on all virtual chassis members
 
fpc1:
tar: +CONTENTS: time stamp Mar 26 12:18 2011 is 25963681 s in the future
tar: +COMMENT: time stamp Mar 26 12:18 2011 is 25963680 s in the future
tar: +DESC: time stamp Mar 26 12:18 2011 is 25963679 s in the future
tar: +INSTALL: time stamp Mar 26 12:18 2011 is 25963678 s in the future
tar: jloader-ex-3242-11.3I20110326_0802_hmerge.tgz: time stamp Mar 26 12:06 2011 is 25962920 s in the future
tar: jloader-ex-3242-11.3I20110326_0802_hmerge.tgz.md5: time stamp Mar 26 12:18 2011 is 25963673 s in the future
tar: jloader-ex-3242-11.3I20110326_0802_hmerge.tgz.sha1: time stamp Mar 26 12:18 2011 is 25963672 s in the future
tar: jloader-ex-3242-11.3I20110326_0802_hmerge.tgz.sig: time stamp Mar 26 12:18 2011 is 25963672 s in the future
tar: certs.pem: time stamp Mar 26 08:02 2011 is 25948330 s in the future
verify-sig: cannot validate ./certs.pem
certificate is not yet valid: /C=US/ST=CA/L=Sunnyvale/O=Juniper Networks/OU=Juniper CA/CN=PackageDevelopment_11_3_0/emailAddress=ca@juniper.net
 
{master:0}
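Out of curiosity, we can work backwards from that tar warning to see what the switch thought the time was. This is a small Python sketch using the +CONTENTS line above; tar does not print the seconds of the file timestamp, so they are assumed to be :00 here.

```python
from datetime import datetime, timedelta

# Values copied from the tar warning for +CONTENTS above.
file_timestamp = datetime(2011, 3, 26, 12, 18)  # seconds not shown by tar, assumed :00
seconds_in_future = 25963681

offset = timedelta(seconds=seconds_in_future)
switch_clock = file_timestamp - offset

print(f"clock is behind by {offset.days} days and {offset.seconds // 3600} hours")
print(f"switch thought the time was about {switch_clock:%Y-%m-%d %H:%M}")
```

That puts the switch clock at the end of May 2010, which also explains the certificate complaint: as far as the switch was concerned, the signing certificate had not been issued yet.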

HA, this is perfect. OK, so let’s configure NTP so we don’t ever see this again (of course we could always just use ‘set date’ for a quick fix, but we’re not going to do that, right?).

... OK maybe just a sneaky look at setting the time manually.

At the command prompt enter the following command to set the time and date to 10:10:00 on May 27th 2011:

network@EX_STACK> set date 201105271010.00
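The argument is simply the date and time packed together as YYYYMMDDhhmm.ss. If you ever need to generate it from a script, a minimal Python sketch looks like this (the helper name is our own invention, not a Juniper tool):

```python
from datetime import datetime


def junos_set_date_arg(when: datetime) -> str:
    """Format a datetime as YYYYMMDDhhmm.ss, the form used in the command above."""
    return when.strftime("%Y%m%d%H%M.%S")


# Reproduce the example above: 10:10:00 on May 27th 2011.
print("set date " + junos_set_date_arg(datetime(2011, 5, 27, 10, 10, 0)))
```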

OK, now we’ve had some fun there, let’s set up NTP. This guide assumes you have chosen a suitable NTP server. We’re going to use one of the Manchester University hosts because I’ve been using it since 1996 (you can find a list of public NTP servers here).

Firstly let’s tell JunOS to synchronise its time with the NTP server. In the past JunOS needed to be within 128 seconds of the NTP server time or else it would never sync up. Remember, if you are going to use a DNS hostname in this configuration (and we are) then you should make sure you have DNS lookup configured. In our setup we’ve configured DNS lookup against the Google DNS hosts.

Screen shot 2011-05-27 at 22.26.05

So let’s synchronise our clock manually by initiating an NTP ‘get’.

Screen shot 2011-05-27 at 22.34.26

Hey, that’s pretty good - we were only 1 second out already!

Right, now we’ll set up the NTP client process, so let’s go into configuration mode:

Screen shot 2011-05-27 at 22.35.27

Now add in the NTP configuration command.

Screen shot 2011-05-27 at 22.35.19

Let’s just check it took this in the configuration.

Screen shot 2011-05-27 at 22.35.14

Now commit it to flash and pop back to exec mode with the ‘commit and-quit’ command.

So we’re out of configuration mode; let’s check out the NTP associations and status...

Screen shot 2011-05-27 at 22.35.41

Right, well, we’ve got a stratum (st) of 16 here, which means we’re not synchronised yet: either our router (R1) can’t talk to the remote NTP server or it can’t talk to us. Let’s wait... and wait...

Screen shot 2011-05-27 at 22.36.23

Great. It took about 10 seconds but now we see that we have synchronised with a stratum 2 server called ‘turnip’. I guess the hostname ntp2d is a CNAME.

All good, all in sync. We then performed the upgrade again and it all went perfectly. Well done Juniper.

Thanks for reading.
Recovering from EX disk failure (db> prompt)

Recovering from disk failure
One of the common reasons a switch drops out of the cluster is disk corruption. If the cluster can't see the switch and vice versa, I would look at the status of the disks. Another giveaway is the LOADING JUNOS message on the LCD display on the switch.

The EX4200 switches have 2 internal disks (called internal media)


1. da0s1 (called slice 1)
2. da0s2 (called slice 2)

A USB disk can be plugged in the back of the switch and is called external media

Both the disks retain a copy of the OS and configuration files. The switch can use either of the disks to get the files it needs. When one disk is corrupt the switch will automatically use the other disk.

The following procedure is to be used to fix a corrupt disk. The general outline is as follows:

• Identify the corrupt internal disk.
• Copy all files from the good internal disk to a USB disk.
• Boot from the USB disk.
• Fix the corrupt disk.
• Boot into the fixed disk.
• Reboot the whole stack. 
• Confirm fix.

Before you start make sure you connect a console cable to the back of the malfunctioning switch. All work will be done using this console connection in case you lose your connectivity.


Procedure 

1. Identifying the corrupted disk

Console onto the stack and issue the following commands in user mode:

show system snapshot all-members slice 1 media internal
show system snapshot all-members slice 2 media internal


This will display the contents of disks da0s1 and da0s2 on all stack members.

A good disk will look like this:

fpc0:
--------------------------------------------------------------------------
Information for snapshot on internal (da0s1)
Creation date: Jan 8 08:43:51 2010
JUNOS version on snapshot:
jbase : 10.0S1.1
jcrypto-ex: 10.0S1.1
jdocs-ex: 10.0S1.1
jkernel-ex: 10.0S1.1
jroute-ex: 10.0S1.1
jswitch-ex: 10.0S1.1
jweb-ex: 10.0S1.1
jpfe-ex42x: 10.0S1.1



For reference, fpc0 is the Flexible PIC Concentrator for Switch 0 in the stack. fpc1 is Switch 1 and so on...

A corrupt disk will look like this:

fpc0:
--------------------------------------------------------------------------
error: cannot mount /dev/da0s2a


In this case disk da0s2 is corrupt.

Now that we have identified disk 2 in switch 0 in the stack as corrupt we move on to the next step.
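On a big stack, eyeballing that output for every member gets tedious. Here is a minimal Python sketch that scans captured 'show system snapshot' text for the 'cannot mount' error and reports which member and device it belongs to. The function name is our own, and the fpc1 part of the sample is an assumed layout for a second member, not captured output.

```python
import re


def find_unmountable_slices(snapshot_output: str) -> list[tuple[str, str]]:
    """Return (member, device) pairs for every 'cannot mount' error."""
    problems = []
    member = "unknown"
    for line in snapshot_output.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(fpc\d+):", line)
        if header:
            member = header.group(1)  # remember which switch we are reading
            continue
        error = re.fullmatch(r"error: cannot mount (\S+)", line)
        if error:
            problems.append((member, error.group(1)))
    return problems


# fpc0 is the corrupt example from above; fpc1 is an assumed healthy member.
sample = """\
fpc0:
--------------------------------------------------------------------------
error: cannot mount /dev/da0s2a
fpc1:
--------------------------------------------------------------------------
Information for snapshot on internal (da0s2)
"""

print(find_unmountable_slices(sample))
```

An empty list means no member reported a mount error for the slice you asked about.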

2. Copy all files from the good internal disk to a USB disk

Log on to switch 0 using the following command:

request session member 0

Let's look at disk 2 to confirm you are on the correct switch, using the following command:

show system snapshot local slice 2 media internal

The output should be:

error: cannot mount /dev/da0s2a

Insert a 2GB USB disk into the USB slot at the back of switch 0. Take note of this point. We tried flash bigger than 2GB and USB hard disks and none of them worked. Also it seemed that the format had to be UFS or FAT32.

Copy disk 1 files to the USB disk using the following command:

request system snapshot local partition media external

Expect the following output if the operation is successful

fpc0:
--------------------------------------------------------------------------
Clearing current label...
Partitioning external media (da1) ...
Verifying compatibility of destination media partitions...
Running newfs (720MB) on external media / partition (da1s1a)...
Running newfs (217MB) on external media /config partition (da1s1e)...
Running newfs (480MB) on external media /var partition (da1s1f)...
Copying '/dev/da0s1a' to '/dev/da1s1a' .. (this may take a few minutes)
Copying '/dev/da0s1e' to '/dev/da1s1e' .. (this may take a few minutes)
Copying '/dev/da0s1f' to '/dev/da1s1f' .. (this may take a few minutes)
The following filesystems were archived: / /config /var


3. Boot from the USB disk

Issue the following command:

request system reboot local media external 

You'll get this prompt...type 'yes'

Reboot the system ? [yes,no] (no) yes

4. Fix the corrupt disk

Copy files from the USB disk to disk 2 using the following command:

request system snapshot local partition media internal slice 2 

Expect the following output if the operation is successful

fpc0:
--------------------------------------------------------------------------
Clearing current label...
Partitioning internal media (da0) ...
Verifying compatibility of destination media partitions...
Running newfs (187MB) on internal media / partition (da0s2a)...
Running newfs (56MB) on internal media /config partition (da0s2e)...
Running newfs (124MB) on internal media /var partition (da0s2f)...
Copying '/dev/da1s1a' to '/dev/da0s2a' .. (this may take a few minutes)
Copying '/dev/da1s1e' to '/dev/da0s2e' .. (this may take a few minutes)
Copying '/dev/da1s1f' to '/dev/da0s2f' .. (this may take a few minutes)
The following filesystems were archived: / /config /var


5. Boot into the fixed disk

Issue the following command:

request system reboot local slice 2 media internal 

You'll get this prompt...type 'yes'

Reboot the system ? [yes,no] (no) yes

The switch will boot into loader mode. Issue the following command at the 'loader>' prompt

loader> reboot

When booted login to the switch

Have a look around the switch and try a few show commands to satisfy yourself that everything is working fine. A good command to try is 'show virtual-chassis'

Expect to see all your switches in the stack, like the sample output below. What you want to see is status Prsnt on all the cluster members.

0 (FPC 0) Prsnt B00000000000 ex4200-48p 128 Linecard 2 vcp-0
1 (FPC 1) Prsnt B00000000000 ex4200-48p 254 Master* 0 vcp-0
2 (FPC 2) Prsnt B00000000000 ex4200-48p 128 Linecard 4 vcp-0
3 (FPC 3) Prsnt B00000000000 ex4200-48p 254 Backup 1 vcp-0
4 (FPC 4) Prsnt B00000000000 ex4200-48p 128 Linecard 3 vcp-0
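The same check can be scripted. This is a rough Python sketch that reads member rows shaped like the ones above and lists any member whose status is not Prsnt. The function name is ours, and the NotPrsnt row in the sample is invented to show a failure; real output may carry extra columns or continuation lines, which this simply ignores.

```python
import re

# Matches rows such as "0 (FPC 0) Prsnt ..." and captures member ID and status.
ROW = re.compile(r"^\s*(\d+)\s+\(FPC\s+\d+\)\s+(\S+)")


def members_not_present(vc_output: str) -> dict[int, str]:
    """Map member ID -> status for every member whose status is not 'Prsnt'."""
    bad = {}
    for line in vc_output.splitlines():
        match = ROW.match(line)
        if match and match.group(2) != "Prsnt":
            bad[int(match.group(1))] = match.group(2)
    return bad


# The first two rows come from the sample above; the third is a made-up failure.
sample = """\
0 (FPC 0) Prsnt B00000000000 ex4200-48p 128 Linecard 2 vcp-0
1 (FPC 1) Prsnt B00000000000 ex4200-48p 254 Master* 0 vcp-0
2 (FPC 2) NotPrsnt B00000000000 ex4200-48p
"""

print(members_not_present(sample))
```

An empty result means every member row that was found is showing Prsnt.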


6. Reboot the Whole stack

Issue the following command:

request system reboot 

At the reboot prompt type 'yes'

Reboot the system ? [yes,no] (no) yes

7. Confirm fix


Log in to the stack. Look again at the cluster status. Confirm all member statuses are Prsnt.

0 (FPC 0) Prsnt B00000000000 ex4200-48p 128 Linecard 2 vcp-0
1 (FPC 1) Prsnt B00000000000 ex4200-48p 254 Master* 0 vcp-0
2 (FPC 2) Prsnt B00000000000 ex4200-48p 128 Linecard 4 vcp-0
3 (FPC 3) Prsnt B00000000000 ex4200-48p 254 Backup 1 vcp-0
4 (FPC 4) Prsnt B00000000000 ex4200-48p 128 Linecard 3 vcp-0  
 
Issue the following commands

show system snapshot all-members slice 1 media internal
show system snapshot all-members slice 2 media internal


This will display the contents of disks da0s1 and da0s2 on all stack members


Complete file structures and the absence of error messages confirm success. Confirm all disks 1 and 2 look like this.


fpc0:
--------------------------------------------------------------------------
Information for snapshot on internal (da0s1)
Creation date: Jan 8 08:43:51 2010
JUNOS version on snapshot:
jbase : 10.0S1.1
jcrypto-ex: 10.0S1.1
jdocs-ex: 10.0S1.1
jkernel-ex: 10.0S1.1
jroute-ex: 10.0S1.1
jswitch-ex: 10.0S1.1
jweb-ex: 10.0S1.1
jpfe-ex42x: 10.0S1.1
fpc0:
--------------------------------------------------------------------------
Information for snapshot on internal (da0s2)
Creation date: Jan 11 06:40:09 2010
JUNOS version on snapshot:
jbase : 10.0S1.1
jcrypto-ex: 10.0S1.1
jdocs-ex: 10.0S1.1
jkernel-ex: 10.0S1.1
jroute-ex: 10.0S1.1
jswitch-ex: 10.0S1.1
jweb-ex: 10.0S1.1
jpfe-ex42x: 10.0S1.1


Done. Thanks for reading
Juniper EX switches come with two separate flash partitions: a root (the default boot partition) and a copy of that root in another piece of flash memory. Each partition contains a copy of the boot software and the configuration. Now, we've deployed a number of EX clusters and from time to time we notice that the secondary partition (the non-active one) doesn't get upgraded along with the active one. This obviously causes problems if the active partition is corrupted and won't boot. Manually booting the secondary to get the switch up doesn't help if it's in a cluster, because that node is then marked as NotPrsnt (Not Present) due to the fact it won't be running the same software as the other nodes.

So, luckily Juniper have come to the rescue and brought out the latest 10.4 firmware to do the following (source: www.juniper.net)


Resilient dual-root partitioning, introduced on Juniper Networks EX Series Ethernet Switches in Junos operating system (Junos OS) Release 10.4R3, provides additional resiliency to switches in the following ways:
  • Allows the switch to boot transparently from the second root partition if the system fails to boot from the primary root partition.
  • Provides separation of the root Junos OS file system from the /var file system. If corruption occurs in the /var file system (a higher probability than in the root file system due to the greater frequency in /var of reads and writes), the root file system is insulated from the corruption.
Great news, this. So let's upgrade our firmware and get this great feature. Here is a copy of today's firmware version:

show_version

We need to get hold of the latest firmware from Juniper's website, so we download that... but there was also an issue with our Jloader: it was too old. We will need to delete the old loader and upgrade that as well as the firmware (Jloader Upgrade Link). The good news is we can do both and then reboot once, to cut down on reboot time. We've downloaded the jloader and JunOS image (10.4R3 was recommended at the time we wrote this).

First let's just check we can ping the FTP server holding the firmware images. We're using FileZilla server for this and we've created a new user called 'junos'. You can get hold of FileZilla server here.

ping_check


Looks good. Right, let's go with the upgrade.

Jloader first. The FTP load command uses the syntax 'request system software add ftp://10.10.15.23/jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz'. This is basically saying we're using FTP to get the image and that the image is located on the server with IP address 10.10.15.23. Without a username and password stated in the form ftp://username:password@ the JunOS parser will use the default username of 'anonymous' with no password. Some FTP servers come with built-in anonymous support... FileZilla needs you to create a user called 'anonymous' with the password checkbox unchecked. Here is the process:


request_jloader_process
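If you are scripting upgrades across several stacks, the two URL forms described above (anonymous, or ftp://username:password@) are easy to build in Python. This is only a sketch: the helper name and the password 'example' are made up, and it refuses characters that would be ambiguous in the URL rather than guessing how JunOS would treat them.

```python
def ftp_package_url(host: str, filename: str, user: str = "", password: str = "") -> str:
    """Build an ftp:// URL, with credentials only when a user is given."""
    for value in (user, password):
        if any(ch in value for ch in "@:/ "):
            raise ValueError("character would be ambiguous in an ftp:// URL")
    credentials = ""
    if user:
        credentials = user
        if password:
            credentials += ":" + password
        credentials += "@"
    return f"ftp://{credentials}{host}/{filename}"


package = "jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz"

# Anonymous form, as used in this post.
print("request system software add " + ftp_package_url("10.10.15.23", package))

# Authenticated form; 'example' is a placeholder password.
print("request system software add "
      + ftp_package_url("10.10.15.23", package, "junos", "example"))
```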

Each node in the cluster will be upgraded in turn until it finishes and returns you back to the prompt:

request_jloader_finish

The FileZilla management console shows you the whole process as the file is pulled back to the cluster master node. Here is a brief screen shot of the download:

request_jloader_ftp


Right, that's the jloader upgrade bit applied (but not active until we reboot). To save time we're now going to upgrade the firmware as well, so that we only do one reboot. Here is the process, and remember, this is a 4-node cluster as shown by the fpc0,1,2,3. If you have more nodes then your output will be different.

request_junos_install

So, just like the man said 'A reboot is required to install the software'. Let us oblige...

request_system_reboot

It took about 5 minutes to come around again, I logged in and checked the firmware versions and loader.

show_chassis_firmware

That's OK. Now I checked the state of the partitions.

show_system_storage

Looks like I have two partitions there, active/backup... all looking pretty sweet. Let's get some more information on the state of those partitions... detail? Nah, that's what they would expect you to do; let's look at the snapshot...

show_system_snapshot

We're upgraded, we've got two healthy partitions...just need to wait for a failure now to see it automatically fix itself...but I won't wish for that. I think one question we're all asking is how do I find out if there has been a partition failure if it fixes itself?

Well you've got console logs, syslog and SNMP...take your pick. From the management port you will see



WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE




You can of course always look at the chassis alarms.




user@switch> show chassis alarms
1 alarms currently active
Alarm time               Class  Description
2011-02-17 05:48:49 PST  Minor  Host 0 Boot from backup root




Thank you for reading and may all of your upgrades be as sweet.

We’ve broken up the 5 node cluster and re-racked two of the nodes into a new stack. The following screen shot shows the two nodes as ‘Inactive’ and node members FPC 1 and FPC 4. Both are Linecards because the Master and Backup switches were nodes 0 and 3 in the pre-existing cluster. You can also see that only the vcp-0 virtual chassis cable has been connected (the vcp-1 cable is disconnected for no reason other than we didn’t do it).

Screen shot 2011-04-18 at 21.59.44

First things first, let’s kill off the VCP ports to disable the VC traffic.

Screen shot 2011-04-18 at 22.04.16

Now go into config mode by typing ‘configure’, and then we’ll load the factory default settings.

Screen shot 2011-04-18 at 22.05.20

Before we can commit the blank configuration we need to set the root password...it won’t save without it...give it a go if you want.

Screen shot 2011-04-18 at 22.06.02

OK, so now commit the blank configuration.

Screen shot 2011-04-18 at 22.07.01

Back into EXEC mode we’ll take a look at the new cluster status. Note we can’t see the other node because the VCP ports are disabled.

Screen shot 2011-04-18 at 22.06.34

Let’s run through the same process on the other node and commit. Now, back in EXEC mode, we turn the VCP ports back on. By NOT putting the keyword ‘disable’ on the end they are enabled... I know that’s a bit poor but there you go.

> request virtual-chassis vc-port set interface vcp-0
> request virtual-chassis vc-port set interface vcp-1

So now we check the status of the node... all good. We have a Master and Backup in a two-node cluster with FPC 0 and FPC 1. The mastership priorities are both the default of 128.

Screen shot 2011-04-18 at 22.16.25

© 2011 defaultrouteuk.com

Cisco, IOS, CCNA, CCNP, CCIE are trademarks of Cisco Systems Inc.
JunOS, JNCIA, JNCIP, JNCIE are registered trademarks of Juniper Networks Inc.