Developers / Link Aggregation in LMCE
« on: July 29, 2015, 03:15:27 am »
I would like to open a discussion topic on a feature I am interested in developing and testing for LMCE. This project is something I could tinker with and develop at a bench while the bulk of my system remains in storage. Before I dive in, however, I would like to pose some functional questions to those much more experienced than I am, to feel out how useful this would be in an LMCE network, i.e. are performance improvements expected, or is this a neat trial with no true benefit other than saying I did it?
Basic idea: Provide manual configuration options to get network port aggregation (bonding) working on the core: a web-admin page for bonding ports manually after an install or upgrade, plus auto-detection scripts at installation time that would assign bonds between NICs.
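For reference, here is a minimal sketch of what the generated configuration might look like on an Ubuntu 14.04 base, using the ifenslave package and /etc/network/interfaces. The interface names (eth1/eth2 as the internal ports) and the 192.168.80.1 address are my assumptions based on a default core install, not something LMCE produces today; adjust to your hardware:

    # /etc/network/interfaces fragment - hypothetical example
    # Requires: apt-get install ifenslave

    # Physical ports are enslaved to the bond and get no addresses of their own
    auto eth1
    iface eth1 inet manual
        bond-master bond0

    auto eth2
    iface eth2 inet manual
        bond-master bond0

    # bond0 takes over the role of the internal interface
    auto bond0
    iface bond0 inet static
        address 192.168.80.1
        netmask 255.255.255.0
        bond-mode 802.3ad      # LACP; the switch ports must be configured as a LAG
        bond-miimon 100        # link-monitoring interval in ms
        bond-slaves none       # slaves declare themselves via bond-master above

Note that 802.3ad needs a managed switch with LACP support; on an unmanaged switch one could try balance-alb instead, which needs no switch cooperation.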
Limited initial hardware support: dual-port PCI-e NICs and quad-port PCI-e NICs, with possible later support for multiple inexpensive single-port PCI NICs.
Concept: In a PXE-boot environment with all I/O routed to and from the core, a 10/100/1000 network is understood to be a minimum requirement. As I understand it, the core's single gigabit link is shared among all nodes pulling and requesting data at once, limiting each connected node to a fraction of the 1000 Mbit/s: two MDs would get ~500 Mbit/s each, four would get ~250 Mbit/s each, etc. By adding support for two (or more) gigabit NICs working as a bonded eth1 on our cores, network bottlenecks during peak traffic could be relieved, with each node on the far side of the switch getting closer to a full 1000 Mbit/s.
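One caveat I should flag from my reading: with 802.3ad/LACP, traffic is distributed per-flow, so a single MD's stream still tops out at one physical link (1000 Mbit/s); the win is in the aggregate. Roughly, assuming flows hash evenly across the links:

    1 x 1 Gbit/s uplink, 4 MDs streaming:  ~250 Mbit/s each
    2 x 1 Gbit/s bond,   4 MDs streaming:  ~500 Mbit/s each
    4 x 1 Gbit/s bond,   4 MDs streaming:  up to ~1000 Mbit/s each

So no single node gets faster than gigabit, but the core stops being the shared choke point.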
Network boost, minimal expense: In my opinion LAG would be beneficial only between the core and the switch(es), and would require support for the additional hardware on the core plus an extra Ethernet cable to the switch. Bonded lines from the switch out to the MDs would be neither needed nor beneficial.
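For anyone who wants to test along: once a bond is up, the kernel exposes its state, so a quick sanity check (no special tools needed) is:

    cat /proc/net/bonding/bond0

which shows the bonding mode, the MII status of each slave, and (for 802.3ad) the aggregator info, so you can confirm both ports actually joined the LAG before measuring throughput.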
Developer/network architect input: The above works well on paper, and depending on which article you read it is either the greatest networking throughput gain ever or a lab experiment for a rookie Cisco student with no real benefit.
In our networks:
Are there peak transmission times where all MDs want data and we feel our cores can't provide it fast enough?
Are MD boots done sequentially or all at once?
When streaming media from SAS storage on the core, is it I/O-bottlenecked?
Is LAG believed to be a feature worth testing and developing for LMCE 14.04 and later releases?
*Regarding LAG with switch redundancy: Although it adds some complexity, there is also the option of multiple bonded NICs going to multiple switches for redundancy in the network topology. I see less usefulness here in our networks, but I could also experiment with this easily enough; a possible approach is sketched below.
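If anyone cares about that variant: active-backup mode is the usual way to get it without switch cooperation. Only one slave carries traffic, and the bond fails over if its link (or its switch) dies. A hypothetical fragment, with the same caveats as above about interface names and addresses:

    auto bond0
    iface bond0 inet static
        address 192.168.80.1
        netmask 255.255.255.0
        bond-mode active-backup   # failover only, no extra throughput
        bond-miimon 100
        bond-primary eth1         # prefer eth1 (switch A) when its link is up

...with eth1 cabled to switch A and eth2 to switch B. No LACP is needed, so it works with unmanaged switches, but you get redundancy rather than a bandwidth gain.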