Thom, Mike, thanks for your input. I had to visit my friend to check out the config.
The system has:
- 2x Intel Gigabit NICs
- 2x 3.5" 500GB SATA disks in a RAID 1 configuration on an IBM ServeRAID M1015 controller (the server is 1U, so disk replacement is difficult)
- 4GB RAM (although LMCE reports 2GB total)
- each media box has 4GB RAM as well
- Everything is connected to a 48-port Cisco SF-200 switch (48x 10/100 + 2x 1000 ports; eth1 of LinuxMCE and the ReadyNAS Ultra 6 are connected to it)
I also tried a ProCurve 1810-24G (24-port managed gigabit switch). Boot times were much better with it (the media boxes have gigabit adapters), but the number of media boxes that can be ACTIVE concurrently stayed the same.
I will try your suggestions once I have some time to play with the system. So far, the only way to keep the installation working is to power off any media box that is not needed, keeping a maximum of 6 powered on at any time.
If you remember, when the 7th or 8th media box (at random) is powered on, the system starts to behave erratically as soon as media/audio playback is attempted: the core reloads continuously, DCERouter dies and the proxy orbiter collapses. Even if DCERouter is forcefully restarted, the proxy orbiter never comes back.
Please let me know if you need any logs or anything else from me. Following your helpful suggestion to monitor with iotop, I recorded the following session during boot. Notice that even though all 10 media directors fired up, only 8 nfsd threads were active.
dcerouter_1032148:/home/linuxmce# iotop
Total DISK READ: 2.85 M/s | Total DISK WRITE: 1459.94 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
387 be/3 root 0.00 B/s 91.01 K/s ?unavailable? [jbd2/sda1-8]
1766 be/4 root 538.47 K/s 0.00 B/s ?unavailable? [nfsd]
1767 be/4 root 307.16 K/s 0.00 B/s ?unavailable? [nfsd]
1768 be/4 root 174.43 K/s 3.79 K/s ?unavailable? [nfsd]
1769 be/4 root 64.46 K/s 60.67 K/s ?unavailable? [nfsd]
1770 be/4 root 492.97 K/s 0.00 B/s ?unavailable? [nfsd]
1771 be/4 root 728.07 K/s 11.38 K/s ?unavailable? [nfsd]
1772 be/4 root 420.92 K/s 30.34 K/s ?unavailable? [nfsd]
1773 be/4 root 193.39 K/s 3.79 K/s ?unavailable? [nfsd]
2048 rt/4 asterisk 0.00 B/s 0.00 B/s ?unavailable? asterisk -p -U asterisk
1 be/4 root 0.00 B/s 0.00 B/s ?unavailable? init
2 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [kthreadd]
3 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [migration/0]
4 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [ksoftirqd/0]
5 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [watchdog/0]
6 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [migration/1]
7 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [ksoftirqd/1]
8 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [watchdog/1]
9 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [migration/2]
10 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [ksoftirqd/2]
11 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [watchdog/2]
12 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [migration/3]
13 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [ksoftirqd/3]
14 rt/4 root 0.00 B/s 0.00 B/s ?unavailable? [watchdog/3]
15 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [events/0]
16 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [events/1]
17 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [events/2]
18 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [events/3]
19 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [cpuset]
20 be/4 root 0.00 B/s 0.00 B/s ?unavailable? [khelper]
CONFIG_TASK_DELAY_ACCT not enabled in kernel, cannot determine SWAPIN and IO %
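As a quick cross-check of the numbers above, here is a minimal Python sketch that counts the nfsd rows in the pasted iotop lines and sums their read rates. The names (`IOTOP_ROWS`, `parse_rows`, `summarize_nfsd`) are just illustrative helpers, not part of LinuxMCE or iotop, and the unit conversion assumes that iotop's K/s and M/s are 1024-based.

```python
import re

# Rows with non-zero I/O, copied from the iotop session above.
IOTOP_ROWS = """\
387 be/3 root 0.00 B/s 91.01 K/s ?unavailable? [jbd2/sda1-8]
1766 be/4 root 538.47 K/s 0.00 B/s ?unavailable? [nfsd]
1767 be/4 root 307.16 K/s 0.00 B/s ?unavailable? [nfsd]
1768 be/4 root 174.43 K/s 3.79 K/s ?unavailable? [nfsd]
1769 be/4 root 64.46 K/s 60.67 K/s ?unavailable? [nfsd]
1770 be/4 root 492.97 K/s 0.00 B/s ?unavailable? [nfsd]
1771 be/4 root 728.07 K/s 11.38 K/s ?unavailable? [nfsd]
1772 be/4 root 420.92 K/s 30.34 K/s ?unavailable? [nfsd]
1773 be/4 root 193.39 K/s 3.79 K/s ?unavailable? [nfsd]
"""

# Assumption: iotop units are 1024-based (K = KiB, M = MiB).
UNIT_TO_KIB = {"B/s": 1 / 1024, "K/s": 1.0, "M/s": 1024.0}

# TID PRIO USER <read> <unit> <write> <unit> ?unavailable? COMMAND
ROW = re.compile(
    r"^\s*(?P<tid>\d+)\s+(?P<prio>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<read>[\d.]+)\s+(?P<read_unit>[BKM]/s)\s+"
    r"(?P<write>[\d.]+)\s+(?P<write_unit>[BKM]/s)\s+"
    r"\?unavailable\?\s+(?P<command>.+)$"
)


def parse_rows(text):
    """Turn iotop batch-style lines into dicts with rates in K/s."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers and anything that is not a process row
        rows.append({
            "tid": int(m["tid"]),
            "command": m["command"].strip(),
            "read_kib": float(m["read"]) * UNIT_TO_KIB[m["read_unit"]],
            "write_kib": float(m["write"]) * UNIT_TO_KIB[m["write_unit"]],
        })
    return rows


def summarize_nfsd(text):
    """Return (number of nfsd rows, their combined read rate in K/s)."""
    nfsd = [r for r in parse_rows(text) if r["command"] == "[nfsd]"]
    return len(nfsd), sum(r["read_kib"] for r in nfsd)


if __name__ == "__main__":
    count, read_kib = summarize_nfsd(IOTOP_ROWS)
    print(f"{count} nfsd threads, {read_kib:.2f} K/s read "
          f"({read_kib / 1024:.2f} M/s)")
```

On this sample the nfsd rows add up to the "Total DISK READ" figure in the header, so in this snapshot the disk reads appear to come entirely from serving the media directors over NFS. The count of 8 is the number of [nfsd] kernel threads in the listing; it may simply be the size of the server's nfsd thread pool rather than one thread per media director, which I have not verified on this box.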
I am also attaching the dmesg output I gathered from the server, in case it helps.
This behavior was never seen in LMCE 8.10. Although this is the first time I have attempted to install such a large system (until now, the largest number of remote-boot MDs I have installed was 8, without any issues on 8.10), I really hope someone has a clue.
All ideas are welcome. I will try to implement Mike's suggestions and report back.
Thanks again for your help,
ted