Background:
I currently have a small research cluster of 8 servers, colocated in the same data center on a per-unit space rental. All networking currently goes through the data center's 10G switches.
However, this setup is no longer sustainable due to rapidly growing data volumes (~100 TB at the moment, partitioned across servers packed with SSDs in RAID 6, which itself is a bottleneck) and the need for more computational capacity.
Data volume will grow to 250-300 TB within a year and up to 1 PB in two years, so I need a scalable solution.
I decided to go with an all-flash CephFS cluster plus large HDD-based cold backup storage.
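To illustrate the plan, here is a minimal ceph.conf sketch of the public/cluster network split I have in mind (the subnets are placeholders, not my real addressing):

```
# ceph.conf sketch -- subnets are placeholders
[global]
# client-facing traffic (CephFS mounts, MON/MDS)
public_network  = 10.0.0.0/24
# OSD replication / recovery traffic, kept on the 100G fabric
cluster_network = 10.0.1.0/24
```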
Problem:
I have chosen the hardware for Ceph and for the cluster extension; all that is left is a 100G top-of-rack switch, preferably with 32+ ports (so the whole rack can be connected into a single 100G network).
40/100G is absolutely needed so the network does not become a bottleneck.
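To back this up, a back-of-envelope calculation (the drive count and per-SSD throughput below are my assumptions, not measured numbers):

```python
# Why 10G is a bottleneck for an all-flash node: a server with 8 SSDs
# reading at a conservative ~500 MB/s each can source ~4 GB/s, which a
# 10GbE link cannot carry, while a 100G link still has headroom.

ssds_per_node = 8        # assumed drive count per server
ssd_read_mb_s = 500      # conservative per-SSD sequential read, MB/s
node_storage_mb_s = ssds_per_node * ssd_read_mb_s  # 4000 MB/s

link_10g_mb_s = 10_000 / 8    # 10 Gbit/s  ~= 1250 MB/s (before overhead)
link_100g_mb_s = 100_000 / 8  # 100 Gbit/s ~= 12500 MB/s

print(node_storage_mb_s / link_10g_mb_s)   # -> 3.2  (10G oversubscribed)
print(node_storage_mb_s / link_100g_mb_s)  # -> 0.32 (100G has headroom)
```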
I believe the following switches would satisfy my requirements:
- Mellanox SN3700C - 32x QSFP28 (the SN2100 has only 16 QSFP28 ports and is therefore not future-proof)
- Cisco Nexus 3232C - 32x QSFP28
- Juniper QFX5120-32C - 32x QSFP28
Question:
Which of these switches (if any) would make a good top-of-rack choice while also handling routing and supporting ACLs? Or do I need an additional switch for that purpose?
Unfortunately I do not have a networking background, so I would be grateful for any advice or useful materials/links.