Topic: Notes On Diskless Boot times in 810

jwelch1324

  • Newbie
  • Posts: 12
Notes On Diskless Boot times in 810
« on: February 21, 2012, 04:56:39 am »
I was having a maddening time with the diskless boot times on my MDs in 810. (I am not sure whether this issue is unique to me or whether other people are having it too, so please let me know if anyone else has had this problem. It isn't anywhere in the forums that I could find, but it is all over the Ubuntu forums.)

Essentially, upon booting my diskless MDs I would get an NIS bind failure that looks like this:

Code:
Starting NIS Services
   binding to YP ...
...
... [repeat 10 times]

               [failed]
               [ok]

Along with this, starting cups, hal, and a few other services on the MD would hang forever. My MD boot times were on the order of 8-10 minutes (and these aren't small MDs; my main computer is a monster and it had the same problem), so I figured it had to be a networking issue.

Additionally, DNS lookups from the MDs (for example during an apt-get install on an MD) took forever, though oddly nslookup and dig were always responsive (a few ms). And ssh moonXX would take ages to give me a command prompt (~3 minutes from the time I issued the command on the core).

Looking at /etc/nsswitch.conf on the MD and the core suggests that NIS is used for host lookups. (That would also explain the DNS oddity: nslookup and dig query the DNS server directly, while programs like apt-get and ssh resolve names through nsswitch.conf.) Given the DNS issues, it seemed likely that, despite what many forums out there say about this "binding to YP..." failure, ypbind was never actually finding the NIS server on the core.
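One way to confirm which sources are consulted for host lookups is to read the hosts: line of nsswitch.conf. The sketch below is hypothetical (nss_sources and hosts_use_nis are my names, not part of LinuxMCE) and assumes the usual "database: source source ..." layout, skipping bracketed actions such as [NOTFOUND=return]:

```bash
#!/bin/bash
# Hypothetical helpers (not part of LinuxMCE) for inspecting nsswitch.conf.

# Print the lookup sources for one NSS database (e.g. "hosts"), one per line.
# Comments are stripped; bracketed actions like [NOTFOUND=return] are skipped.
nss_sources() {
    local file=$1 db=$2
    sed -e 's/#.*//' "$file" | awk -v db="$db" '
        $1 == (db ":") {
            for (i = 2; i <= NF; i++)
                if ($i !~ /^\[/) print $i
        }'
}

# Exit status 0 if "nis" is one of the sources used for host lookups.
hosts_use_nis() {
    nss_sources "${1:-/etc/nsswitch.conf}" hosts | grep -qx nis
}
```

For example, hosts_use_nis && echo "host lookups go through NIS". Since getent resolves names through NSS just like apt-get and ssh do, time getent hosts moonXX should reproduce the slow lookups that dig and nslookup never showed, if NIS is the culprit.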

Furthermore, running ypbind -d on the core would never actually show a successful response from the core itself.

It turns out the issue is in /etc/ypserv.securenets. For some odd reason, my file had my external NIC's IP as a trusted host, and neither my MDs nor the core's internal IP were listed as trusted hosts. Adding

Code:
host 192.168.80.1
# one line per MD, with that MD's IP address:
host 192.168.80.XX
....

for each configured MD and then running sudo service nis restart on the core allows the core and the MDs (reboot them, or run service nis restart on each MD) to bind to the NIS server, and everything starts working great. ssh moonXX is basically instantaneous, as you would expect from a LAN machine. Even better, the MD boot time is now about 30-45 seconds from BIOS.
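If you have several MDs, the host lines can be added with a small helper instead of by hand. This is a hypothetical sketch (add_securenets_host is my name, not an LMCE script): it appends a host entry only if the same entry isn't already there, so rerunning it is harmless. ypserv.securenets also accepts "netmask network" pairs such as 255.255.255.0 192.168.80.0, which would cover the whole internal subnet in one line; the sketch sticks to per-host entries like the ones above.

```bash
#!/bin/bash
# Hypothetical helper: append "host <IP>" to a ypserv.securenets-style file
# unless the same host entry is already present (safe to rerun).
add_securenets_host() {
    local file=$1 ip=$2
    local ipv4_re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
    if [[ ! $ip =~ $ipv4_re ]]; then
        echo "not an IPv4 address: $ip" >&2
        return 1
    fi
    # Field-wise comparison, so "host   192.168.80.1" also counts as present.
    if [[ -f $file ]] && awk -v ip="$ip" '$1 == "host" && $2 == ip { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    # If the file doesn't end in a newline, add one so entries don't run together.
    if [[ -s $file && -n $(tail -c1 "$file") ]]; then
        echo >> "$file"
    fi
    printf 'host %s\n' "$ip" >> "$file"
}
```

Usage: add_securenets_host /etc/ypserv.securenets 192.168.80.1, then once more per MD IP, followed by sudo service nis restart as above.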

I haven't figured out yet what generates the ypserv.securenets file (I suspect it's somewhere in one of the diskless setup scripts), but if anyone knows off the top of their head, let me know.
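One way to track that down is to search the LinuxMCE scripts for the file name. A minimal sketch, assuming the scripts live under /usr/pluto/bin as in the rest of this thread (find_mentions is a hypothetical name):

```bash
#!/bin/bash
# List every file under a directory that mentions a given string
# (fixed-string match, so the dots in the file name are not regex wildcards).
find_mentions() {
    local dir=${1:-/usr/pluto/bin} needle=${2:-ypserv.securenets}
    grep -rlF -- "$needle" "$dir" 2>/dev/null | sort
}
```

Running find_mentions with no arguments searches /usr/pluto/bin for ypserv.securenets; any script it lists that writes to the file is a candidate.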

A few things to note:

My core was originally designed to be just a Z-Wave controller for lights/thermostat/etc., not a media server, so when I first installed LMCE the machine had a single NIC. Once I decided to add MDs, I installed a second NIC, which caused issues with /etc/exports. I did resolve those, but I believe it's possible that the same interface configuration issues were propagated into the NIS config.

Therefore, if anyone else has had this binding issue and had 2 NICs when they originally installed LMCE, please let me know, so I can figure out whether this was just a stupid mistake on my part (not following the guidelines and having 2 NICs in the first place) or an issue in the diskless setup scripts.

Thanks!
~jw

kyfalcon

  • Guru
  • Posts: 390
Re: Notes On Diskless Boot times in 810
« Reply #1 on: February 21, 2012, 03:51:46 pm »
Could this also be a problem in 10.04?

jwelch1324

Re: Notes On Diskless Boot times in 810
« Reply #2 on: February 21, 2012, 08:05:31 pm »
Potentially. I haven't set up a dedicated 1004 box to test yet, though I plan on doing so in the next week or so when I have some time.

If someone has a 1004 setup with diskless MDs, could you please let us know whether there is an NIS binding problem on your diskless MDs?

SSH into your diskless MD from the core:
Code:
core$ sudo -i
core# ssh moonXX

moonXX#killall ypbind
moonXX#service nis restart

and see if you get the "binding to YP..." failure ("backgrounded" is another possible output, depending on the NIS version and setup scripts).
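On the MD, ypwhich (from the NIS client tools) prints the server the machine is bound to and exits non-zero if it isn't bound, which gives a quick yes/no check after the restart. A minimal sketch; check_nis_binding is a hypothetical name:

```bash
#!/bin/bash
# Sketch: report whether this machine is currently bound to an NIS server.
# Relies on ypwhich, which prints the bound server and fails if unbound.
check_nis_binding() {
    local server
    if server=$(ypwhich 2>/dev/null) && [[ -n $server ]]; then
        echo "bound to NIS server: $server"
    else
        echo "not bound to any NIS server"
        return 1
    fi
}
```

On a healthy MD it should report the core (by name or IP, depending on your setup); a "not bound" result after service nis restart points back at the securenets problem.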

Equivalently, on your core, do the following:
Code:
core$ sudo -i
core# ypbind -d -no-dbus

and you should get output similar to the following
Code:
core#ypbind -d -no-dbus
13792: parsing config file
13792: Trying entry: ypserver 192.168.80.1
13792: parsed ypserver 192.168.80.1
13792: add_server() domain: pluto, host: 192.168.80.1, slot: 0
13792: [Welcome to ypbind-mt, version 1.20.1]

13792: ping interval is 20 seconds

13794: ping host '192.168.80.1', domain 'pluto'
13794: Answer for domain 'pluto' from server '192.168.80.1'

if NIS successfully binds to ypserv on the core.

Otherwise, you will see:

Code:
core#ypbind -d -no-dbus
16411: parsing config file
16411: Trying entry: ypserver 192.168.80.1
16411: parsed ypserver 192.168.80.1
16411: add_server() domain: pluto, host: 192.168.80.1, slot: 0
16411: [Welcome to ypbind-mt, version 1.20.1]

16411: ping interval is 20 seconds

16413: ping host '192.168.80.1', domain 'pluto'
16413: Pinging all active servers.
16413: ping host '192.168.80.1', domain 'pluto'

where the "Pinging all active servers" line indicates that ypbind is broadcasting for any open NIS server because it didn't receive an answer from the core. It will keep looping: trying to bind to the core, then broadcasting for an open NIS server, and so on.
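If you want to script this check, each of the two transcripts above has one telling line. A hedged sketch (classify_ypbind_debug is a hypothetical name) that reads captured ypbind -d output on stdin and reports which case it looks like, keyed on exactly the strings shown in those transcripts:

```bash
#!/bin/bash
# Classify captured "ypbind -d -no-dbus" output (read from stdin), based on
# the successful and failing sample transcripts in this thread.
classify_ypbind_debug() {
    local log
    log=$(cat)
    if grep -q "Answer for domain" <<<"$log"; then
        echo "bound"
    elif grep -q "Pinging all active servers" <<<"$log"; then
        echo "no answer from server"
    else
        echo "inconclusive"
    fi
}
```

For example, save the terminal output to a file and run classify_ypbind_debug < ypbind-output.txt (the file name is arbitrary). Let ypbind run for at least one 20-second ping interval first, or the result may be "inconclusive".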

If you get the latter output, there is an NIS failure. In that case, check that both your core and your MDs are listed as hosts in /etc/ypserv.securenets and try the above again.
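To check a given IP against the file without eyeballing it, here is a sketch that understands the two entry forms securenets uses, "host <addr>" and "<netmask> <network>". The function names are hypothetical, and the sketch simply ignores any line it doesn't recognize (comments, blank lines, or entries with something unexpected in front):

```bash
#!/bin/bash
# Sketch: does a ypserv.securenets-style file admit a given IPv4 address?

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=. a b c d
    read -r a b c d <<<"$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

securenets_allows() {
    local file=$1 ip=$2 f1 f2 rest ipn mask net
    local num_re='^[0-9.]+$'
    ipn=$(ip_to_int "$ip")
    # "|| [[ -n $f1 ]]" also processes a last line without a trailing newline
    while read -r f1 f2 rest || [[ -n $f1 ]]; do
        [[ -z $f1 || $f1 == \#* ]] && continue           # blank line or comment
        if [[ $f1 == host ]]; then
            [[ $f2 == "$ip" ]] && return 0                # exact host entry
        elif [[ $f1 =~ $num_re && $f2 =~ $num_re ]]; then
            mask=$(ip_to_int "$f1")
            net=$(ip_to_int "$f2")
            (( (ipn & mask) == (net & mask) )) && return 0   # inside netmask/network
        fi
    done < "$file"
    return 1
}
```

For example, securenets_allows /etc/ypserv.securenets 192.168.80.1 || echo "core not allowed", and the same for each MD's IP.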

Also note that the core is the NIS master server, so it doesn't need to bind to itself in the LMCE configuration. Binding from the core is still an easy way to test for possible NIS failure points: if the core can't bind to itself, it's not a firewall problem but most likely a ypserv security problem.

~jw

jwelch1324

Re: Notes On Diskless Boot times in 810
« Reply #3 on: February 22, 2012, 04:32:37 am »
I'm trying a diskless boot on 1004 right now so I will let you know the results once it's done loading.

In the interim, here is a script (also attached) that will fix the issue. Just put it into the /usr/pluto/bin directory and run:

Code:
$sudo -i
#cd /usr/pluto/bin
#chmod +x Fix_NIS.sh
#./Fix_NIS.sh

Then run the ypbind test I mentioned above and make sure the core can bind. Then reboot your MDs and see whether the binding error goes away and the boot times improve.

Here is the script for reference:
Code:
#!/bin/bash

# Must run as root: it rewrites /etc/ypserv.securenets and restarts NIS
if [[ $EUID -ne 0 ]]; then
    echo "Please run this script as root (e.g. via sudo -i)" >&2
    exit 1
fi

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/Section_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh

DEVICEDATA_DisklessBoot=9
DEVICEDATA_DisklessImages=258   # not used below

# Regenerate the trusted-host sections of /etc/ypserv.securenets
function setup_ypserv_snets
{
    # Shamelessly stolen from Diskless_Setup.sh:
    # find every top-level MD (device category 8) with DisklessBoot = 1
    local Content=""
    local Q="
        SELECT
            PK_Device,
            IPaddress
        FROM
            Device
            JOIN Device_DeviceData ON PK_Device = FK_Device AND FK_DeviceData = $DEVICEDATA_DisklessBoot
            JOIN DeviceTemplate ON FK_DeviceTemplate = PK_DeviceTemplate
        WHERE
            FK_DeviceCategory = '8'
            AND
            FK_Device_ControlledVia IS NULL
            AND
            IK_DeviceData = '1'
    "
    local R=$(RunSQL "$Q")
    local Row
    for Row in $R ;do
        local IP=$(Field 2 "$Row") # we just need to extract the moon's IP address

        # skip MDs without an IP address
        if [[ -z "$IP" ]] ;then
            continue
        fi

        Content="${Content}  host ${IP}\n"
    done

    # one "host <IP>" line per diskless MD
    PopulateSection "/etc/ypserv.securenets" "DisklessMD" "$Content"

    local IntrouterIP=""
    Q="SELECT IPaddress FROM Device WHERE FK_DeviceTemplate = 7"
    IntrouterIP=$(RunSQL "$Q")
    if [[ -n "$IntrouterIP" ]]; then
        # Add the router (the core) to the secure hosts
        Content="host ${IntrouterIP}"
        PopulateSection "/etc/ypserv.securenets" "Router" "$Content"
    fi
}

echo -e "\033[35m!!Warning!!\033[37m This script will overwrite your current NIS configuration!\ncontinue?[y/n]:"
read -r res1

if [[ "$res1" == "y" ]]; then
    echo "Running Standard NIS Config /usr/pluto/bin/Network_NIS.sh...Please Wait..."
    . /usr/pluto/bin/Network_NIS.sh &>/dev/null

    echo "Setting up new Securenets file..."
    setup_ypserv_snets
    echo "Done"
    echo -e "\033[33mRestarting NIS Services \033[37m"
    service nis restart
    echo "*** Complete ***"
else
    echo "*** Aborted ***"
fi


NOTE: You have to rerun this every time you add a new MD (so that its IP is added to the list of allowed hosts), unless you add a line at the end of Diskless_Setup.sh to run the script automatically.
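Keep in mind that the script stops at its y/n prompt, so calling it as-is from the end of Diskless_Setup.sh would never get its "y": it would either wait for input or, with no terminal attached, read nothing and abort. A hedged sketch of a wrapper that answers the prompt automatically; run_fix_nis_unattended is a hypothetical name, and the default path assumes the script was saved as /usr/pluto/bin/Fix_NIS.sh:

```bash
#!/bin/bash
# Hypothetical wrapper: run the fix script without the interactive prompt
# by feeding it a "y" answer on stdin.
run_fix_nis_unattended() {
    local script=${1:-/usr/pluto/bin/Fix_NIS.sh}
    if [[ ! -x $script ]]; then
        echo "fix script not found or not executable: $script" >&2
        return 1
    fi
    echo y | "$script"
}
```

The line added to the end of Diskless_Setup.sh would then call run_fix_nis_unattended (after defining it there), or simply run echo y | /usr/pluto/bin/Fix_NIS.sh directly.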

EDIT: Updated script
~jw
« Last Edit: February 22, 2012, 04:37:01 am by jwelch1324 »

jwelch1324

Re: Notes On Diskless Boot times in 810
« Reply #4 on: February 22, 2012, 04:46:16 am »
OK, tested in 1004: the issue isn't there. However, I am almost positive the problem arose in the first place because I did a single-NIC install.

The same issue described at http://wiki.linuxmce.org/index.php/Single_to_Double_NIC for the exports file (eth0 being added in front of everything) happens in the ypserv.securenets file as well, even after adding the second NIC and fixing the network config files.

In any event, if you are having the binding issue, the above script will fix it on 810 and 1004.