LinuxMCE Forums

General => Users => Topic started by: m3freak on July 25, 2011, 01:13:21 am

Title: Diskless MDs not PXE booting
Post by: m3freak on July 25, 2011, 01:13:21 am
I've installed LinuxMCE on a two-NIC server using the snapshot image LinuxMCE-8.10-24098-i386.iso.  The instructions on the 8.10 install page are missing steps, so it was a bit of a challenge, since I'm not used to Ubuntu (I'm a Red Hat guy).  In the end, I got the core updated.  I also ran the script for creating the diskless boot image, and it seemed to complete without any issue.

When I try to PXE boot a Jetway Ecomini, it never gets an IP and eventually the PXE boot fails.  After many failed attempts, I tried another random PC (HP laptop) - PXE boot on it failed, too.  I reviewed the logs on the core, and saw this:

Jul 24 17:51:03 core dhcpd: DHCPDISCOVER from 00:30:18:ae:b5:56 via eth0
Jul 24 17:51:03 core dhcpd: DHCPOFFER on 192.168.80.129 to 00:30:18:ae:b5:56 via eth0

The DHCPREQUEST and DHCPACK are never logged.  Oh, the above log is for the Ecomini.

I don't understand why the device attempting the PXE boot doesn't follow through with the DHCPREQUEST.
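For reference, a healthy lease produces four dhcpd messages per client: DHCPDISCOVER, DHCPOFFER, DHCPREQUEST and DHCPACK. Below is a minimal Python sketch (a hypothetical helper, not part of LinuxMCE) that assumes dhcpd syslog lines in the format shown above and reports, per MAC address, the furthest of those four stages that was logged.

```python
import re

# The four messages of a normal DHCP lease negotiation, in order.
STAGES = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]

# Assumed syslog format, e.g.
#   "Jul 24 17:51:03 core dhcpd: DHCPOFFER on 192.168.80.129 to 00:30:18:ae:b5:56 via eth0"
# Captures the message name and the first MAC address that follows it.
LINE_RE = re.compile(r"dhcpd: (DHCP[A-Z]+)\b.*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
                     re.IGNORECASE)


def furthest_stage_per_mac(lines):
    """Return {mac: furthest DHCP stage logged} for an iterable of syslog lines."""
    progress = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        message, mac = m.group(1).upper(), m.group(2).lower()
        if message not in STAGES:
            continue  # e.g. DHCPNAK or DHCPINFORM, not part of the basic exchange
        current = progress.get(mac)
        if current is None or STAGES.index(message) > STAGES.index(current):
            progress[mac] = message
    return progress
```

Fed the two Ecomini lines above, it reports DHCPOFFER for 00:30:18:ae:b5:56: the core believes it sent an offer, but no request ever came back.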

It's very likely I've missed something.  Would someone please point out what that something is?
Title: Re: Diskless MDs not PXE booting
Post by: tkmedia on July 25, 2011, 05:43:21 am
Did you run /usr/pluto/bin/Diskless_CreateTBZ.sh to create the initial images?

If not, please do so.

HTH

Tim

Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 25, 2011, 02:32:32 pm
Quote from: tkmedia
Did you run /usr/pluto/bin/Diskless_CreateTBZ.sh to create the initial images?

Yes, of course I did.  I even said so in my post!

I have also run the script a few times, but the result is the same - no device will PXE boot.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 25, 2011, 02:51:41 pm
Knackered switch? And check your cables.
Title: Re: Diskless MDs not PXE booting
Post by: RayBe on July 25, 2011, 03:13:54 pm
Hi m3freak,

Below are a few URLs on how to set up your LMCE network.
I would suggest checking your setup against what's described in them;
your PXE-boot devices must be on the internal side of your setup.

http://wiki.linuxmce.org/index.php/Network_Settings
http://wiki.linuxmce.org/index.php/Network_Setup
http://wiki.linuxmce.org/index.php/File:Diagram1.jpg

br,
Raymond
Title: Re: Diskless MDs not PXE booting
Post by: coley on July 25, 2011, 06:05:46 pm
m3freak,
Your logs are showing that your internal network is on eth0.
The external n/w should normally be on eth0 and internal on eth1.
If you swap your n/w cables and click "Swap interfaces" on the Network Admin page this should swap over eth0 and eth1 interfaces.

Try pxe-booting your machines again.

-Coley.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 25, 2011, 09:19:28 pm
Quote from: coley
m3freak,
Your logs are showing that your internal network is on eth0.
The external n/w should normally be on eth0 and internal on eth1.
If you swap your n/w cables and click "Swap interfaces" on the Network Admin page this should swap over eth0 and eth1 interfaces.

Try pxe-booting your machines again.

No way.  I'm going to cry if this really is the case.  Why would LinuxMCE care?

FYI: I put the internal network on eth0 because eth0 is the on-board Gb NIC.  eth1 is an Intel PRO/1000 PCI card.  I figured the on-board NIC would have better performance.

I'll try the swap tonight and report back.
Title: Re: Diskless MDs not PXE booting
Post by: klovell on July 25, 2011, 10:24:00 pm
Quote from: coley
m3freak,
Your logs are showing that your internal network is on eth0.
The external n/w should normally be on eth0 and internal on eth1.
If you swap your n/w cables and click "Swap interfaces" on the Network Admin page this should swap over eth0 and eth1 interfaces.

Try pxe-booting your machines again.

-Coley.

I'm not arguing with these instructions, but if he swaps the cables and clicks the Swap Interfaces button, wouldn't that put him back in the same position, like a double negative?  I thought the Swap Interfaces button was there so you wouldn't have to physically swap the cables; I thought it swapped eth0 and eth1 in the config.

Correct me if I'm wrong.
Title: Re: Diskless MDs not PXE booting
Post by: bongowongo on July 25, 2011, 10:25:15 pm
Correct.

Or swap cables

Or swap interfaces

Or do both, 1.5 times.
Title: Re: Diskless MDs not PXE booting
Post by: klovell on July 25, 2011, 10:33:48 pm
Quote from: m3freak
No way.  I'm going to cry if this really is the case.  Why would LinuxMCE care?

FYI: I put the internal network on eth0 because eth0 is the on-board Gb NIC.  eth1 is an Intel PRO/1000 PCI card.  I figured the on-board NIC would have better performance.

I'll try the swap tonight and report back.

By default LMCE doesn't offer IP addresses on the external interface, so you cannot PXE boot off the external interface.  It doesn't care which interface is internal or external.  I used the same logic when I set up my core: both NICs are Gb NICs, but I just figured the on-board NIC would yield higher performance.
Title: Re: Diskless MDs not PXE booting
Post by: bongowongo on July 25, 2011, 10:37:33 pm
Usually a dedicated NIC is faster and doesn't lean on the processor.
That could have changed, but in the olden days that was the case.
Title: Re: Diskless MDs not PXE booting
Post by: coley on July 25, 2011, 11:23:42 pm
Yep, thanks for the correction.
I don't know why the system is giving out IP addresses on eth0.

-Coley.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 26, 2011, 12:21:59 am
I always put eth0 on external and eth1 on internal because "that is what you are supposed to do", but I've had past installs working with no problems that were the other way round.

If you want to change which card is eth0 and which card is eth1, you have to do it in /etc/udev/rules.d/70-persistent-net.rules
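Each rule in that file pairs a card's MAC address (ATTR{address}=="...") with an assignment such as NAME="eth0", so swapping the two names swaps which physical card gets which label. A minimal Python sketch, assuming that rule format (the helper name is hypothetical):

```python
import re


def swap_interface_names(rules_text, first="eth0", second="eth1"):
    """Swap NAME="first" and NAME="second" in udev persistent-net rules text.

    Only NAME="..." assignments change, so each MAC address keeps its rule and
    simply receives the other interface name. Other names are left untouched.
    """
    def rename(match):
        name = match.group(1)
        if name == first:
            name = second
        elif name == second:
            name = first
        return 'NAME="%s"' % name

    return re.sub(r'\bNAME="([^"]*)"', rename, rules_text)
```

To use it, read /etc/udev/rules.d/70-persistent-net.rules, pass its text through the function, write the result back as root and reboot; then check the internal/external assignment in Web Admin, as discussed in this thread.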

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: ardirtbiker on July 26, 2011, 01:44:26 am
Ditto what Matt (purps) said!

Dennis
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 27, 2011, 02:40:40 pm
So this is what I did:

1. Took the udev approach and made the add-in NIC eth0, and the on-board NIC eth1.  PXE boot still fails.
2. I reinstalled LinuxMCE.  PXE boot still fails.
3. I changed cables.  PXE boot still fails.
4. I changed the switch.  PXE boot still fails.
5. I plugged the Ecomini directly into eth1 on the core. PXE boot still fails.
6. I plugged in a Lexmark printer, a random PC, and an HP laptop.  PXE boot, or just plain IP assignment via DHCP, fails on all of them.
7. I checked iptables - no rules, at all.

I ran tshark on eth1 to see what's happening on the wire (I had a feeling the DHCPOFFER wasn't getting out).  When I PXE boot anything, tshark shows the DHCPDISCOVER.  /var/log/daemon then shows DHCPOFFER, but tshark NEVER sees it.  That is, even though the core shows a DHCPOFFER being sent, tshark reveals the DHCPOFFER never actually reaches eth1. This explains why I wasn't seeing a DHCPREQUEST before: the PXE client isn't getting the DHCPOFFER, so it can't send a DHCPREQUEST!
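Background on what tshark is matching here: a DHCP packet is a UDP payload made of the fixed 236-byte BOOTP header (the client MAC sits in the chaddr field at offset 28), the magic cookie 99.130.83.99, and then type-length-value options; option 53 says whether the packet is a DISCOVER, OFFER, REQUEST, ACK and so on (RFC 2131 and RFC 2132). A minimal Python sketch of that decoding, with hypothetical function names, in case you want to check raw captured payloads yourself:

```python
# DHCP message types carried in option 53 (RFC 2132).
MESSAGE_TYPES = {1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST",
                 4: "DHCPDECLINE", 5: "DHCPACK", 6: "DHCPNAK",
                 7: "DHCPRELEASE", 8: "DHCPINFORM"}
MAGIC_COOKIE = bytes([99, 130, 83, 99])
BOOTP_FIXED_LEN = 236  # op..file fields that precede the magic cookie


def decode_dhcp(payload):
    """Return (client MAC, message type) from a raw DHCP UDP payload (ports 67/68)."""
    if len(payload) < BOOTP_FIXED_LEN + 4 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    hlen = payload[2]  # hardware address length, 6 for Ethernet
    mac = ":".join("%02x" % b for b in payload[28:28 + hlen])  # chaddr field
    i = BOOTP_FIXED_LEN + 4
    while i < len(payload):
        code = payload[i]
        if code == 0:  # Pad option: one byte, no length field
            i += 1
            continue
        if code == 255 or i + 1 >= len(payload):  # End option or truncated data
            break
        length = payload[i + 1]
        if code == 53 and length == 1 and i + 2 < len(payload):
            return mac, MESSAGE_TYPES.get(payload[i + 2], "unknown")
        i += 2 + length
    return mac, None


def build_test_payload(message_type, mac):
    """Build a minimal DHCP payload carrying only option 53, for trying decode_dhcp."""
    fixed = bytearray(BOOTP_FIXED_LEN)
    fixed[0] = 2 if message_type in (2, 5, 6) else 1  # op: 1 = BOOTREQUEST, 2 = BOOTREPLY
    fixed[1], fixed[2] = 1, 6                         # htype Ethernet, hlen 6
    fixed[28:34] = bytes.fromhex(mac.replace(":", ""))
    return bytes(fixed) + MAGIC_COOKIE + bytes([53, 1, message_type, 255])
```

For example, decode_dhcp(build_test_payload(2, "00:30:18:ae:b5:56")) yields the Ecomini's MAC and DHCPOFFER.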

Something is preventing DHCP from working correctly. I don't think it's the hardware: both NICs exhibit the same behaviour.  No matter how many times I install LinuxMCE, I see the same problem.

I don't know LinuxMCE well enough to troubleshoot this.  Does anyone know what's going on?
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 27, 2011, 02:51:18 pm
Hmmm. You said one of the NICs is an Intel - is that the non-onboard one? Is your onboard one a Realtek by any chance?

And you've tried all possible combinations of swapping the labels in udev and swapping the cables (either physically or in web admin)?

Could it be that your Jetway and HP laptop both have unrecognised NICs? I'm sure you're aware that Kubuntu 810 is somewhat long in the tooth now, so you do sometimes need to fart around with more modern hardware to get it working.

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: klovell on July 27, 2011, 03:10:34 pm
Quote from: m3freak
Something is preventing DHCP from working correctly. It's not the hardware - both NICs can get DHCP IPs from my IPCop firewall.  No matter how many times I install LinuxMCE, I see the same problem.

I'm not trying to be an ass, but unless there is a problem with the snapshot you downloaded, this pretty much means there is a hardware problem: either broken, a bad connection, or not compatible.  Try a different snapshot to confirm, boot into an OS other than LinuxMCE and confirm the NICs actually work, then replace (not swap) the cables.  If all these tests come back okay, check the LMCE network diagram and make sure you're connected properly.

In my experience, the PXE boot/DHCP portion of LMCE just works.  I can't recall ever having a problem that swapping cables didn't fix, and there is nothing to configure.  I had an issue once with an MD that had both a wireless and a wired card, but it happened pretty far along in the boot process, which indicated the system did PXE boot.  I think it started booting off the wired NIC but at some point tried to finish on the wireless NIC, which caused the system to reboot in an infinite loop.  The problem went away after I disabled the wireless card in the BIOS.  Are you certain the PXE boot is failing and not the boot up itself?  <--- I couldn't phrase that any better...
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 27, 2011, 05:48:59 pm
Quote from: purps
Hmmm. You said one of the NICs is an Intel - is that the non-onboard one? Is your onboard one a Realtek by any chance?

Yes, the on-board nic is a Realtek:

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
   Subsystem: ASUSTeK Computer Inc. Device 8432
   Flags: bus master, fast devsel, latency 0, IRQ 220
   I/O ports at d800 [size=256]
   Memory at fdfff000 (64-bit, prefetchable) [size=4K]
   Memory at fdff8000 (64-bit, prefetchable) [size=16K]
   Capabilities: <access denied>
   Kernel driver in use: r8169
   Kernel modules: r8169

Quote from: purps
And you've tried all possible combinations of swapping the labels in udev and swapping the cables (either physically or in web admin)?

I believe so.

Quote from: purps
Could it be that your Jetway and HP laptop both have unrecognised NICs? I'm sure you're aware that Kubuntu 810 is somewhat long in the tooth now, so you do sometimes need to fart around with more modern hardware to get it working.

IPCop is even older, yet it happily hands out IPs to these devices.  I know Kubuntu 8.10 is old, but IPCop runs a 2.4 kernel!  LinuxMCE loads the kernel modules for the NICs without a problem - they wouldn't come up otherwise, right?  Also, I've used both NICs as the external interface - they got IPs from my external DHCP server without issue.

The PC I have LinuxMCE installed on is new.  Well, 6 months old now, but still...

I really hope I'm not missing something basic.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 27, 2011, 05:54:05 pm
Quote from: klovell
I'm not trying to be an ass, but unless there is a problem with the snapshot you downloaded, this pretty much means there is a hardware problem: either broken, a bad connection, or not compatible.  Try a different snapshot to confirm, boot into an OS other than LinuxMCE and confirm the NICs actually work, then replace (not swap) the cables.  If all these tests come back okay, check the LMCE network diagram and make sure you're connected properly.

I installed Fedora 15 on the same PC a month ago when I was having problems with getting LinuxMCE to see the IDE drive (I'm using an old PATA drive for the OS).  Fedora 15 saw all the hardware without any problems. Everything works.

Quote from: klovell
Are you certain the PXE boot is failing and not the boot up itself?  <--- I couldn't phrase that any better...

To be exact: the DHCP bit is failing.  That is, before the MD starts to boot off the network, it's supposed to get an IP from the DHCP server (on the core). It's the DHCP part that's failing.  Thus, the MD never starts the PXE boot.

Maybe I should try out 10.10 just to see if my hardware works with it.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 27, 2011, 06:11:36 pm
Quote from: m3freak
Yes, the on-board nic is a Realtek:

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
   Subsystem: ASUSTeK Computer Inc. Device 8432
   Flags: bus master, fast devsel, latency 0, IRQ 220
   I/O ports at d800 [size=256]
   Memory at fdfff000 (64-bit, prefetchable) [size=4K]
   Memory at fdff8000 (64-bit, prefetchable) [size=16K]
   Capabilities: <access denied>
   Kernel driver in use: r8169
   Kernel modules: r8169


I bet that's the problem. There are known issues with the r8169 module in 810. The NIC may APPEAR to be working fine (i.e. you can get on the Internet, see other machines, etc.), but it's secretly being crap.
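The "Kernel driver in use" line in the lspci output above is what tells you which module is actually bound to the card. A small Python sketch (hypothetical helper names, assuming the usual lspci -v layout with indented detail lines under each device) that pulls that out for every device, so you can see at a glance whether r8169 is in play:

```python
def drivers_in_use(lspci_text):
    """Map each PCI slot in `lspci -v` output to its "Kernel driver in use" value."""
    drivers, slot = {}, None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            slot = line.split()[0]  # device header line, e.g. "04:00.0 Ethernet ..."
        elif slot and line.strip().startswith("Kernel driver in use:"):
            drivers[slot] = line.split(":", 1)[1].strip()
    return drivers


def slots_using(lspci_text, driver="r8169"):
    """Return the PCI slots whose bound driver is `driver`."""
    return [s for s, d in drivers_in_use(lspci_text).items() if d == driver]
```

Feed it the captured text of `lspci -v` on the core (or on an MD booted from a LiveCD or USB stick).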

Blacklist the r8169 module (add "blacklist r8169" to /etc/modprobe.d/blacklist) and then run...
Code: [Select]
update-initramfs -k all -u
Then remove the r8169 driver with...
Code: [Select]
rmmod r8169
Then download the r8168 driver from the Realtek website http://www.realtek.com/downloads/downloadsview.aspx?langid=1&pnid=13&pfid=5&level=5&conn=4&downtypeid=3&getdown=false, extract it, and run the script (no need to make and build and install and whatnot; the script does all that).

Then I would label the Realtek NIC as eth1 in "/etc/udev/rules.d/70-persistent-net.rules", because we know that one definitely supports PXE booting, and make the Intel one eth0, of course.  Make sure that eth0 is external and eth1 is internal (look on the Advanced page on any Orbiter, or in Web Admin, and swap interfaces if necessary), and then see how you get on.
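The blacklist line only needs to be added once, and re-running a how-to should not pile up duplicates. A minimal Python sketch of an idempotent version (a hypothetical helper that works on the file's text, so you read /etc/modprobe.d/blacklist, pass the contents in, and write the result back as root). It does not replace the update-initramfs and rmmod steps.

```python
def add_blacklist_entry(config_text, module):
    """Return (new_text, changed) with "blacklist <module>" appended if not already present."""
    wanted = "blacklist " + module
    # Compare whitespace-split tokens so "blacklist  r8169" still counts,
    # while a commented-out "# blacklist r8169" does not.
    if any(line.split() == wanted.split() for line in config_text.splitlines()):
        return config_text, False
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + wanted + "\n", True
```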

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: uplink on July 27, 2011, 06:39:58 pm
First, whoever said "eth0 is supposed to be external", forget that idea and beat whoever put it in your head to begin with :)

Next, I always hate it when the DHCPOFFER reply seems to get lost on the network. I still haven't found a pattern for fixing this when it happens. The last thing I wanted to throw into a wall because of this was a SoundBridge (their forums said it was a hardware issue, so no hope for a fix in a firmware update).

m3freak: I'm pretty sure your core is OK. The problem is in the DHCP negotiation. It could be as "simple" as what purps said, or it could be something else altogether. If I can run a LiveCD on a machine that exhibits DHCPOFFER-ignorance, I do, just to make damn sure it can get DHCP with the out-of-the-box distro. If not, I look in /var/log/syslog on both the Core and the machine in question to match them up. I even go as far as running tcpdump on the machine that won't take DHCPOFFER as an answer to see if the message comes in (at this point I also get all kinds of ideas that there might be a "third party" on the network messing things up).

Not an easy thing debugging this. And it's annoying too.

I only skimmed over the thread, but I think the Realtek you mentioned is in your core. Try to leave the driver swap for last. See what you have in the machine you want to use as an MD.
Title: Re: Diskless MDs not PXE booting
Post by: uplink on July 27, 2011, 06:57:10 pm
One thing that did come to mind is this:

The DHCPOFFER message doesn't contain the parameters for PXE boot, so the machine is ignoring it. Not exactly sure how this could happen though.

But if you look in /etc/dhcp3/dhcpd.conf you should see a section that looks like this:

Code: [Select]
subnet 192.168.80.0 netmask 255.255.255.0 {
        next-server 192.168.80.1;
        filename "/tftpboot/pxelinux.0";
        option pxelinux.reboottime = 30;

        default-lease-time 86400;
        max-lease-time 604800;
        pool {
                 allow unknown-clients;
                 range 192.168.80.129 192.168.80.254;
        }
}

The important bit here is this:

Code: [Select]
        next-server 192.168.80.1;
        filename "/tftpboot/pxelinux.0";

If these two lines are missing, the PXE BIOS doesn't even bother to continue the negotiation and starts from the beginning. If they're not there, it means someone decided to break PXE booting for some unknown reason.
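To check this without eyeballing the file, here is a minimal Python sketch (a hypothetical helper, assuming the subnet block layout shown above) that extracts next-server and filename from the subnet block, including nested pool blocks:

```python
import re


def pxe_settings(conf_text, subnet):
    """Return (next_server, filename) from the given subnet block of a dhcpd.conf.

    Either value is None if the statement is missing. Braces inside comments or
    quoted strings are not handled; this is a quick check, not a full parser.
    """
    header = re.search(r"subnet\s+" + re.escape(subnet) + r"\s+netmask\s+\S+\s*\{", conf_text)
    if not header:
        raise ValueError("no subnet %s block found" % subnet)
    depth, i = 1, header.end()
    while i < len(conf_text) and depth:  # walk to the matching closing brace
        if conf_text[i] == "{":
            depth += 1
        elif conf_text[i] == "}":
            depth -= 1
        i += 1
    if depth:
        raise ValueError("unbalanced braces in subnet block")
    body = conf_text[header.end():i - 1]
    next_server = re.search(r"^\s*next-server\s+([^;\s]+)\s*;", body, re.MULTILINE)
    filename = re.search(r'^\s*filename\s+"([^"]*)"\s*;', body, re.MULTILINE)
    return (next_server.group(1) if next_server else None,
            filename.group(1) if filename else None)
```

Point it at the contents of /etc/dhcp3/dhcpd.conf with subnet "192.168.80.0". It only looks inside the subnet block, so a value set globally elsewhere in the file will show as None here.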
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 27, 2011, 08:21:11 pm
Quote from: uplink
But if you look in /etc/dhcp3/dhcpd.conf you should see a section that looks like this:

Code: [Select]
subnet 192.168.80.0 netmask 255.255.255.0 {
        next-server 192.168.80.1;
        filename "/tftpboot/pxelinux.0";
        option pxelinux.reboottime = 30;

        default-lease-time 86400;
        max-lease-time 604800;
        pool {
                 allow unknown-clients;
                 range 192.168.80.129 192.168.80.254;
        }
}

The important bit here is this:

Code: [Select]
        next-server 192.168.80.1;
        filename "/tftpboot/pxelinux.0";

That's all there.  I've checked the dhcpd.conf file dozens of times.  I was hoping the problem was a simple misconfiguration, but as far as I could tell, it wasn't.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 27, 2011, 09:44:42 pm
What NICs are in the MDs out of interest?

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 27, 2011, 11:28:01 pm
Quote from: purps
What NICs are in the MDs out of interest?

I don't know.  I'll find out tonight.  I'll probably just boot the thing from a USB stick to get the answer quickly.
Title: Re: Diskless MDs not PXE booting
Post by: coley on July 28, 2011, 11:54:24 am
Quote from: uplink
First, whoever said "eth0 is supposed to be external", forget that idea and beat whoever put it in your head to begin with :)

Apologies, I may have implied this. It must be a hangover from early 0810 installations, where eth0 was needed as the Internet interface for the installer script.
-Coley.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 28, 2011, 12:43:09 pm
Quote from: coley
Apologies, I may have implied this. It must be a hangover from early 0810 installations, where eth0 was needed as the Internet interface for the installer script.
-Coley.

Yeah, me too. I remember it used to be in the installation instructions, specifically for the Internet install I think. I believe it was just to ensure the machine had some Internet going into it, and it was assumed that eth0 would be the NIC to provide it. I also seem to remember it saying something along the lines of "alternatively plug cables from your router into both NICs, just to be sure".

But we digress.
Title: Re: Diskless MDs not PXE booting
Post by: pigdog on July 28, 2011, 12:51:17 pm
Hi,

Just curious, but what happens on the MD when you are trying to PXE boot?

Do you get a kernel panic? Do you get "we are announced to the router"? At what point does the MD fail, and what is the message?

Cheers.
Title: Re: Diskless MDs not PXE booting
Post by: uplink on July 28, 2011, 02:28:26 pm
Quote from: pigdog
Hi,

Just curious, but what happens on the MD when you are trying to PXE boot?

Do you get a kernel panic? Do you get "we are announced to the router"? At what point does the MD fail, and what is the message?

Cheers.

He said it's not getting past the DHCPOFFER message in syslog, so I assume the MD says "Getting DHCP address" (or similar) with dots printed after it, possibly displaying a new dot once every 1-3 seconds. This is a PXE BIOS message. No Linux or boot loader has been loaded at this point.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 28, 2011, 02:35:51 pm
Quote from: uplink
...possibly displaying a new dot once every 1-3 seconds. This is a PXE BIOS message. No Linux or boot loader has been loaded at this point.

What type of switch do you have, m3freak? You said you tried another switch - was it the same brand/model?
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 28, 2011, 04:32:32 pm
purps, you are a God amongst men: the driver was the f*&king culprit!!  I installed the r8168 module and boom, the Jetway Ecomini pulled in an IP and started the PXE boot.  Thanks man! I would not have considered the Realtek module in 0810 as being the source of the issue - at least not for a while.

So, back to the question of what's in the MD.  Guess what?  It's a Realtek PCIe Gb NIC.  Sweet!!   :'(

I rebuilt the diskless image after installing the r8168 module, but the MD still kernel panics.  From memory, the error is something about an eth0 file not being present.

What's my next step?  I'm going to search the forums and wiki in the meantime.

Note: my switch is a Dell PowerConnect Gb 8port...probably 6 years old now.
Title: Re: Diskless MDs not PXE booting
Post by: tkmedia on July 28, 2011, 05:38:46 pm
http://wiki.linuxmce.org/index.php/Unrecognized_NIC
http://wiki.linuxmce.org/index.php/R8168

Pay close attention to the MD section of the R8168 doc.

hth


Tim
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 28, 2011, 06:49:49 pm
Quote from: m3freak
I would not have considered the Realtek module in 0810 as being the source of the issue - at least not for a while.

The fact that I said there are known problems with the r8169 driver in 810 should have been your first clue ;)

Quote from: m3freak
So, back to the question of what's in the MD.  Guess what?  It's a Realtek PCIe Gb NIC.  Sweet!!   :'(

I rebuilt the diskless image after installing the r8168 module, but the MD still kernel panics.  From memory, the error is something about an eth0 file not being present.

Don't worry, you should be able to get the MD working. Read my user page, check out the Living Room MD, it will be a similar process for you. The various instructions mentioned here are all based on the wiki pages that Tim mentioned.

To give you a nudge in the right direction, you need to manually place a copy of the r8168 driver that you installed on your core amongst the MD gubbins also; at the moment, it is literally just on the core, and the MD can't use it.

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 29, 2011, 05:02:09 am
So, I looked at the stuff tkmedia linked to and now I have an MD that no longer kernel panics, but instead dies a bit later in the boot. The error message printed is:

"Error: cannot connect to router: rebooting in 5 seconds."

The MD then reboots, and the same thing (as above) happens again.

I searched the forums and came across this thread:

http://forum.linuxmce.org/index.php?topic=8959.0

I tried out what Murcel suggested, and it did indeed get the boot even further along.  The problem now appears to be the MD pausing indefinitely after printing out something about "we've announced ourselves to the router" - I can't remember the exact error.  I let the MD sit like that for 45 minutes and saw no change. So, I rebooted the core for shits and giggles, and powered the MD back up after the core reboot was complete.  Unfortunately, the original error (Error: cannot connect to router: rebooting in 5 seconds.) came back.

I'm assuming I can get past this error if I run the startup-script.sh script again.

Questions:

1. Why do I have to keep running the startup-script.sh script?
2. I don't see any "diskless" script running on the core when the MD gets to the "announced ourselves to the router" message.  How do I fix this?
3. Why is my install so broken? Did one bad NIC driver really introduce this many problems?
Title: Re: Diskless MDs not PXE booting
Post by: posde on July 29, 2011, 07:37:08 am
m3freak,

Please remember rule #1: MD creation can take longer than 45 minutes.

Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 29, 2011, 01:23:22 pm
Quote from: posde
m3freak,

Please remember rule #1: MD creation can take longer than 45 minutes.

OK, fair enough.  But why do I have to run the startup_script.sh script every time I reboot the core? Well, that's been the case for the Jetway Ecomini MD, anyway.

Also, when I do run the startup_script.sh script, the external interface of the core stops responding.
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 29, 2011, 01:42:36 pm
Assuming you did a lot of cocking around before you found the solution to your core's NIC problems, reinstalling might be an idea, just to eliminate the possibility of a messed up setting somewhere.

What steps did you take exactly for the unrecognised NIC on your MD?

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 29, 2011, 02:29:42 pm
Quote from: purps
Assuming you did a lot of cocking around before you found the solution to your core's NIC problems, reinstalling might be an idea, just to eliminate the possibility of a messed up setting somewhere.

Nah, did nothing of the sort.  Actually, the install I'm working with right now is new.

Quote from: purps
What steps did you take exactly for the unrecognised NIC on your MD?

I did what was in those links.  I'd already installed the r8168 driver, so I did the other stuff:

- included the r8168 module in /etc/initramfs-tools-interactor/modules
- recreated the diskless image (the way it's described on the 0810 install page)
- I don't have a /usr/pluto/diskless dir, so I searched around until I found the post about running "startup_script.sh" to get past the MD boot error.

So that's where I am.

BTW, how can I tell if the diskless image is actually being created?  I looked on the core and can't find any running process that would indicate the MD's image is being created.

One other thing: although I can't ping or ssh to my core from the external network, I can definitely ping the external network from the core.  I didn't look at the iptables rules - maybe they're messed up?
Title: Re: Diskless MDs not PXE booting
Post by: purps on July 29, 2011, 02:47:25 pm
Run this on your core...
Code: [Select]
modprobe r8168
depmod -a
/usr/pluto/bin/Diskless_BuildDefaultImage.sh

The diskless image thing is already created, so I don't think that running the script from the 810 install page will do very much. The script above would be more appropriate. The depmod command should be run because you have changed some modules (that's my understanding anyway).

Try another reboot once you've done that; we want a directory to appear in /usr/pluto/diskless, as I am sure you are aware.

Cheers,
Matt.
Title: Re: Diskless MDs not PXE booting
Post by: uplink on July 29, 2011, 05:15:36 pm
Quote from: m3freak
BTW, how can I tell if the diskless image is actually being created?  I looked on the core and can't find any running process that would indicate the MD's image is being created.

The script Diskless_Setup.sh creates the image. If that is running, the Diskless MD filesystems are being created. If you don't see it, it's not happening. It creates the /usr/pluto/diskless directory and MD subdirectory.

If this is not the case for you, here's how it all works:

1. New MD PXE boots default boot image.
2. Default boot image connects to the Core and tells it to create a new MD device.
3. Default boot image displays "announced ourselves to the router" and waits for messages from the Core.
4. Core creates a MD device (check your device tree)
5. Core allocates IP address to new MD, tells new MD about it (you get "Allocated permanent IP" message on MD).
6. Core runs Diskless_Setup.sh, tells new MD about it (you get "Running Diskless_Setup.sh" message on MD).
7. When Diskless_Setup.sh finishes, Core tells MD about it. If it fails, the MD will display "Diskless_Setup.sh failed" message. If it succeeds, you'll get a success message and the Core will also tell the MD to reboot.
8. MD reboots into its new filesystem.

At no point should Diskless_Setup.sh die without the MD getting a message (error or success).

If you don't have the MD device in your tree after the router announcement, you have a different problem. If you do have the device in the tree, and MD says Diskless_Setup is running, but you don't see Diskless_Setup on the core, run /usr/pluto/bin/Diskless_Setup.sh yourself on the Core and see what's happening.
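The sequence above can be expressed as a small lookup of checkpoints. This Python sketch is purely illustrative: the on-screen strings are paraphrased from the list and may not match the MD's exact wording, and the hints only restate the advice in this post.

```python
# Checkpoints from the boot sequence described above, in order. Each pairs a
# substring expected on the MD's screen (paraphrased from the post, so the real
# wording may differ) with a hint on what to check next if the boot stops there.
STEPS = [
    ("announced ourselves to the router",
     "Check the Core's device tree for the new MD device; if it is missing, "
     "the problem lies elsewhere."),
    ("Allocated permanent IP",
     "The Core created the MD device; it should start Diskless_Setup.sh next."),
    ("Running Diskless_Setup.sh",
     "If Diskless_Setup.sh is not running on the Core, run "
     "/usr/pluto/bin/Diskless_Setup.sh there yourself and watch its output."),
]
FAILURE_MARKER = "Diskless_Setup.sh failed"


def diagnose(console_text):
    """Return (last checkpoint seen or None, hint) for text copied off the MD's screen."""
    if FAILURE_MARKER in console_text:
        return FAILURE_MARKER, ("Run /usr/pluto/bin/Diskless_Setup.sh on the Core "
                                "to see the error.")
    last = (None, "No router announcement: the default image never reached the Core.")
    for marker, hint in STEPS:
        if marker in console_text:
            last = (marker, hint)
    return last
```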
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 31, 2011, 03:32:39 pm
Quote from: purps
Run this on your core...
Code: [Select]
modprobe r8168
depmod -a
/usr/pluto/bin/Diskless_BuildDefaultImage.sh

The first two steps were done by the install script for the r8168 module.  I ran the script in the final step and rebooted: no change.  The MD just sat at the same screen, and no diskless image was being created on the core.

I'm going to reinstall LinuxMCE.  Before I run the final install script from the desktop, I'm going to install the r8168 module.  If things still don't work, I'll try a new DVD snapshot.  If shit still fails, I'll dance on the computer.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on July 31, 2011, 04:28:52 pm
FAIL!  Reinstall of LinuxMCE and install of r8168 module right from the get go did not fix anything:

1. MD PXE boot dies after saying it can't contact the router.  The ONLY way to get past this step is to run "startup_script.sh" after EVERY SINGLE CORE REBOOT.
2. The MD's diskless image never runs.  The MD might say it's announced itself to the router, but the core doesn't actually do anything.

There is some seriously broken shit in the LinuxMCE snapshot I'm using.
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on August 01, 2011, 01:21:47 am
Downloaded and installed the latest snapshot.  This is what I did:

1. Ran install
2. Appeared to complete successfully, so rebooted.
3. Logged into Kubuntu desktop
4. Tried to stop network services.  Got an error about an unknown device, even though both NICs were up: eth0 had the 192.168.80.1 IP, and eth1 had an IP from my external DHCP server.
5. I unloaded the eth0 module, r8169.
6. Installed r8168 module.  After install script finished, eth0 was back up with old IP.
7. I added the r8168 module to /etc/initramfs-tools-interactor/modules
8. Created the default diskless image
9. Ran the final install script by double clicking the icon on the desktop
10. Rebooted after the install finished.
11. Last step of install ran and completed after reboot
12. Powered up the Jetway Ecomini MD
13. After it got an IP, it began the PXE boot.
14. After eth0 apparently came up, the Jetway reported it couldn't connect to the router, so it rebooted.
15.  60 minutes later, it's still rebooting because it can't find the router.

WTF.
Title: Re: Diskless MDs not PXE booting
Post by: posde on August 01, 2011, 08:02:00 am
Are you able to have other devices on your internal network (192.168.80.0/24) receive DHCP addresses and connect to the core? If yes, then your core's NIC seems to work okay.

If you can connect to the core, check whether DCERouter is running on the core:

Code: [Select]
ps ax|grep DCERouter.log|grep -v grep

Also, have a look in the /home/coredump/1 directory to see whether there are any core dumps in there.
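The `grep -v grep` at the end drops the grep process itself, which would otherwise match its own search string. A Python sketch of the same check (hypothetical helpers): one filters captured `ps ax` text the way the pipeline does, the other reads /proc directly on Linux. The same check works for Diskless_Setup.sh when you want to know whether an MD image is being built.

```python
import os


def matching_processes(ps_output, pattern):
    """Mimic `ps ax | grep PATTERN | grep -v grep` on captured `ps ax` text."""
    return [line for line in ps_output.splitlines()
            if pattern in line and "grep" not in line]


def running_commands(pattern):
    """Return command lines of live processes containing PATTERN (Linux /proc only).

    The current Python process is skipped; a parent shell whose own command line
    contains PATTERN would still be reported.
    """
    found = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit() or int(pid) == os.getpid():
            continue
        try:
            with open(os.path.join("/proc", pid, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        command = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if pattern in command:
            found.append(command)
    return found
```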
Title: Re: Diskless MDs not PXE booting
Post by: uplink on August 01, 2011, 06:56:54 pm
Quote from: m3freak
13. After it got an IP, it began the PXE boot.
14. After eth0 apparently came up, the Jetway reported it couldn't connect to the router, so it rebooted.
15.  60 minutes later, it's still rebooting because it can't find the router.

WTF.

Began the PXE boot in what way? Does it load the "default" PXE config file, then the kernel, then the initrd.img file, or does it not get this far? If it doesn't get this far, check syslog on the Core and tell us what it says.
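Those checkpoints can be read straight off the Core's syslog. A minimal Python sketch (hypothetical helper) that reports the last checkpoint reached, assuming the dhcpd/atftpd message format and the /tftpboot paths of a new MD booting the default image, as in the sample log in this post; a machine that already has its own MD entry would fetch different files.

```python
import re

# Milestones of a default-image network boot, in order, with a substring that
# marks each one in the Core's syslog (taken from the sample log in this post).
MILESTONES = [
    ("DHCP lease acknowledged", "dhcpd: DHCPACK on"),
    ("PXE loader fetched", "Serving /tftpboot/pxelinux.0 "),
    ("default PXE config fetched", "Serving /tftpboot/pxelinux.cfg/default "),
    ("kernel fetched", "Serving /tftpboot/default/vmlinuz "),
    ("initrd fetched", "Serving /tftpboot/default/initrd "),
]


def last_milestone(syslog_lines, ip=None):
    """Return the last consecutive milestone found, optionally only for client IP `ip`."""
    syslog_lines = list(syslog_lines)
    if ip is not None:
        # Match the whole address, so 192.168.80.12 does not match 192.168.80.129.
        ip_re = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")
        syslog_lines = [line for line in syslog_lines if ip_re.search(line)]
    reached = None
    for description, needle in MILESTONES:
        if any(needle in line for line in syslog_lines):
            reached = description
        else:
            break  # later files are irrelevant if an earlier step never happened
    return reached
```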

A normal default image boot log looks like this:

Code: [Select]
Aug  1 15:39:37 dcerouter dhcpd: DHCPDISCOVER from 08:00:27:51:34:0e via eth1
Aug  1 15:39:38 dcerouter dhcpd: DHCPOFFER on 192.168.80.129 to 08:00:27:51:34:0e via eth1
Aug  1 15:39:39 dcerouter dhcpd: DHCPREQUEST for 192.168.80.129 (192.168.80.1) from 08:00:27:51:34:0e via eth1
Aug  1 15:39:39 dcerouter dhcpd: DHCPACK on 192.168.80.129 to 08:00:27:51:34:0e via eth1
Aug  1 15:39:40 dcerouter in.tftpd[14552]: connect from 192.168.80.129 (192.168.80.129)
Aug  1 15:39:40 dcerouter atftpd[14552]: Advanced Trivial FTP server started (0.7)
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.0 to 192.168.80.129:2001
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/56424f58-0000-0000-0000-08002751340e to 192.168.80.129:49152
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/01-08-00-27-51-34-0e to 192.168.80.129:49153
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0A85081 to 192.168.80.129:49154
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0A8508 to 192.168.80.129:49155
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0A850 to 192.168.80.129:49156
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0A85 to 192.168.80.129:49157
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0A8 to 192.168.80.129:49158
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0A to 192.168.80.129:49159
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C0 to 192.168.80.129:49160
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/C to 192.168.80.129:49161
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/pxelinux.cfg/default to 192.168.80.129:49162
Aug  1 15:39:40 dcerouter atftpd[14552]: Serving /tftpboot/default/vmlinuz to 192.168.80.129:49163
Aug  1 15:39:45 dcerouter atftpd[14552]: Serving /tftpboot/default/initrd to 192.168.80.129:49164
Aug  1 15:39:57 dcerouter dhcpd: DHCPDISCOVER from 08:00:27:51:34:0e via eth1
Aug  1 15:39:57 dcerouter dhcpd: DHCPOFFER on 192.168.80.129 to 08:00:27:51:34:0e via eth1
Aug  1 15:39:57 dcerouter dhcpd: DHCPREQUEST for 192.168.80.129 (192.168.80.1) from 08:00:27:51:34:0e via eth1
Aug  1 15:39:57 dcerouter dhcpd: DHCPACK on 192.168.80.129 to 08:00:27:51:34:0e via eth1

See where in the above sequence your boot process breaks down.
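The long run of pxelinux.cfg requests in that log is PXELINUX looking for a config file, most specific name first: the client UUID (if the firmware reports one), then 01- plus the MAC address (01 is the ARP hardware type for Ethernet), then the IP address in uppercase hex with one digit dropped at a time, and finally default. A small Python sketch (hypothetical helper) that reproduces the names for a given machine:

```python
def pxelinux_config_candidates(mac, ip, uuid=None):
    """File names PXELINUX tries under pxelinux.cfg/, most specific first."""
    names = []
    if uuid:
        names.append(uuid.lower())
    names.append("01-" + mac.lower().replace(":", "-"))  # 01 = ARP type for Ethernet
    hex_ip = "".join("%02X" % int(octet) for octet in ip.split("."))
    # Drop one hex digit at a time: C0A85081, C0A8508, ..., C
    names.extend(hex_ip[:n] for n in range(len(hex_ip), 0, -1))
    names.append("default")
    return names
```

For the sample machine (MAC 08:00:27:51:34:0e, IP 192.168.80.129, plus the UUID shown in the log) this yields the same sequence of names the log shows, ending with default. Since every candidate is requested before default, none of the more specific files appears to have existed for that machine.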
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on August 02, 2011, 09:13:09 pm
NOTE: I've actually managed to get my MD to boot and the diskless image to be created.  However, I want to reply to the questions in this thread before leaving it and starting a new one (I have new problems....whooooopeeee!).

Quote from: posde
Are you able to have other devices on your internal network (192.168.80.0/24) receive DHCP addresses and connect to the core? If yes, then your core's NIC seems to work okay.

Couldn't before installing the r8168 driver.  Now I can.

Quote from: posde
If you can connect to the core, check whether DCERouter is running on the core:

Code: [Select]
ps ax|grep DCERouter.log|grep -v grep

Not running.  I assume I can manually start it - I will research.

Quote from: posde
Also, have a look in the /home/coredump/1 directory to see whether there are any core dumps in there.

I have three core dumps from a few days ago.  Should I inspect them?
Title: Re: Diskless MDs not PXE booting
Post by: m3freak on August 03, 2011, 04:39:47 am
I was able to get my MD booted after I installed the newest snapshot of 0810.  I still had to run startup-script.sh, but the MD booted and kicked off the diskless image build - nice!