Top of Rack 100G switch choice

    Background:
    I currently have a small research cluster of 8 servers, colocated in the same data center under per-unit rack space rental. All networking goes through the data center's 10G switches.
    However, this setup is no longer sustainable because of rapidly growing data volumes (~100 TB at the moment, partitioned across servers packed with SSDs in RAID 6; the RAID 6 arrays are themselves a bottleneck) and the need for more computational capacity.

    Data usage will rise to 250-300 TB within a year and up to 1 PB within two years, so I need a scalable solution.
    I decided to go with all-flash CephFS plus a large HDD-based cold backup storage.
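As a rough sizing aid, the sketch below turns the usable-data figures from the post into raw flash capacity. It assumes a replicated Ceph pool with 3 copies (Ceph's usual default pool size) and a hypothetical 80% fill ceiling to leave room for recovery and rebalancing; neither number comes from the post, so adjust both to the real design.

```python
# Rough raw-capacity estimate for a replicated Ceph pool.
# ASSUMPTIONS (not from the post): 3x replication and a hypothetical
# 80% maximum fill level so OSDs keep headroom for recovery.

REPLICATION = 3   # copies kept of each object
MAX_FILL = 0.80   # hypothetical fill ceiling


def raw_tb_needed(usable_tb: float, replication: int = REPLICATION,
                  max_fill: float = MAX_FILL) -> float:
    """Raw flash capacity (TB) required to store `usable_tb` of data."""
    return usable_tb * replication / max_fill


# Usable data volumes quoted in the post: now, in one year, in two years.
for label, usable in [("now", 100), ("1 year", 300), ("2 years", 1000)]:
    print(f"{label:>8}: {usable:>5} TB usable -> {raw_tb_needed(usable):,.0f} TB raw")
```

Under these assumptions 1 PB of usable data needs several PB of raw flash. An erasure-coded pool (for example k=4, m=2, i.e. 1.5x overhead instead of 3x) would reduce that considerably, typically at some cost in write performance.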

    Problem:
    I have chosen the hardware for Ceph and for the cluster extension; all that is left is a 100G top-of-rack switch, preferably with 32+ ports (so that the whole rack can be connected into a single 100G network).
    40/100G is absolutely needed so that the network does not become the bottleneck.
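To put the 10G limitation in numbers, here is a back-of-the-envelope calculation of how long moving the current ~100 TB would take at full line rate. It is an idealised illustration, not a benchmark: it assumes decimal terabytes and ignores protocol overhead and disk limits, so real transfers will be slower.

```python
# Idealised time to move a data set at full line rate.
# ASSUMPTION: no protocol overhead and no storage bottleneck, so the
# results are lower bounds on real transfer times.


def transfer_hours(data_tb: float, link_gbps: float) -> float:
    bits = data_tb * 1e12 * 8            # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9)   # link speed in bits per second
    return seconds / 3600


for gbps in (10, 40, 100):
    print(f"100 TB over {gbps:>3} Gb/s: {transfer_hours(100, gbps):5.1f} h")
```

At 10 Gb/s the 100 TB takes about 22 hours even in this ideal case, versus about 2.2 hours at 100 Gb/s, which is why the jump to 40/100G matters once data reaches hundreds of TB.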

    I believe that suitable switches that satisfy my purposes are:

    • Mellanox SN3700C - 32x QSFP28 (the SN2100 has only 16 QSFP28 ports and is therefore not future-proof)

    • Cisco 3232C - 32x QSFP28

    • Juniper QFX5120 - 32x QSFP28
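A simple port budget shows why 16 ports can run out quickly. The numbers below are hypothetical placeholders (the post does not give the number of Ceph nodes, links per host or uplinks): 8 existing servers plus 6 Ceph nodes, each with 2 links (for example separate Ceph public and cluster networks), plus 2 uplinks. The sketch also assumes one switch port per link; breakout cables that split a QSFP28 port into lower-speed links would change the count.

```python
# Hypothetical port budget for a top-of-rack switch.
# ASSUMPTION: every host link and uplink consumes one full switch port
# (no breakout cables).


def ports_needed(hosts: int, links_per_host: int, uplinks: int) -> int:
    """Switch ports consumed by host links plus uplinks."""
    return hosts * links_per_host + uplinks


def fits(switch_ports: int, hosts: int, links_per_host: int, uplinks: int) -> bool:
    """True if the planned links fit on a switch with `switch_ports` ports."""
    return ports_needed(hosts, links_per_host, uplinks) <= switch_ports


# HYPOTHETICAL build: 8 compute servers + 6 Ceph nodes, 2 links each, 2 uplinks.
hosts, links, uplinks = 8 + 6, 2, 2
for ports in (16, 32):
    verdict = "fits" if fits(ports, hosts, links, uplinks) else "too small"
    print(f"{ports} ports: {verdict} ({ports_needed(hosts, links, uplinks)} needed)")
```

With these placeholder numbers the build needs 30 ports, so a 16-port switch is already too small and a 32-port switch is nearly full.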

    Question:

    Which of these switches (if any) would make a good top-of-rack switch and also handle routing and ACLs? Or do I need an additional switch for that?
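For readers without a networking background: on most switch platforms an ACL is an ordered list of permit/deny entries checked top to bottom, the first matching entry decides, and traffic that matches nothing is dropped by an implicit deny at the end. The Python model below is only a conceptual sketch of that logic, not any vendor's configuration syntax. It matches on source and destination addresses only (real ACLs can also match protocol and ports), and the subnets are hypothetical.

```python
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Rule(NamedTuple):
    action: str  # "permit" or "deny"
    src: str     # source prefix, e.g. "10.0.1.0/24"
    dst: str     # destination prefix


def evaluate(acl: list, src: str, dst: str) -> str:
    """Return the action of the first matching rule, else the implicit deny."""
    s, d = ip_address(src), ip_address(dst)
    for rule in acl:
        if s in ip_network(rule.src) and d in ip_network(rule.dst):
            return rule.action
    return "deny"  # implicit deny when no rule matches


# HYPOTHETICAL subnets: compute nodes in 10.0.1.0/24, storage in 10.0.2.0/24.
acl = [
    Rule("deny", "10.0.1.99/32", "10.0.2.0/24"),   # one blocked compute host
    Rule("permit", "10.0.1.0/24", "10.0.2.0/24"),  # compute -> storage allowed
]

print(evaluate(acl, "10.0.1.5", "10.0.2.10"))   # permit
print(evaluate(acl, "10.0.1.99", "10.0.2.10"))  # deny (first rule wins)
print(evaluate(acl, "10.0.3.7", "10.0.2.10"))   # deny (no rule matches)
```

Rule order matters: if the two entries above were swapped, the broader permit would match first and the blocked host would get through.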

    Unfortunately I do not have a networking background, so I would be grateful for any advice or useful materials/links.


Where should IP blocking be done?

    We sometimes need to manually block abusive IPs. Previously we've done this at the ingress level (in our proxy blocklist), but in some cases, because the traffic still passes through our networking stack even though we drop it immediately, it has caused downtime.

    I've considered blocking this at the NACL or security group (SG) level in AWS, but I worry about the limits. We usually don't have more than 100 IPs blocked at once.

    Thoughts? Where should IP blocklisting be done?

    Edit: So it seems it should be blocked as close to the source as possible, but a NACL only accepts 40 rules and SGs don't support deny rules. Any other ideas?
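One way to stretch a fixed NACL rule budget is to merge adjacent blocked addresses into larger CIDR blocks before turning them into deny rules. The sketch below does this with Python's standard `ipaddress.collapse_addresses` and builds dicts whose keys are meant to mirror the parameters of boto3's EC2 `create_network_acl_entry` call (verify against the boto3 docs; the sketch makes no AWS calls). Assumptions: IPv4 only, the 40-rule limit is the figure quoted in the post, and the starting rule number 10 is hypothetical. NACL rules are evaluated from the lowest number upward, so deny entries must be numbered below the allow rules. If the blocked IPs are scattered rather than adjacent, collapsing will not reduce their count much.

```python
from ipaddress import collapse_addresses, ip_network


def collapse_blocklist(ips: list) -> list:
    """Merge individual IPv4 addresses/prefixes into the fewest CIDR blocks."""
    nets = [ip_network(ip) for ip in ips]  # "203.0.113.4" -> 203.0.113.4/32
    return [str(n) for n in collapse_addresses(nets)]


def nacl_deny_entries(cidrs: list, first_rule: int = 10, max_rules: int = 40) -> list:
    """Build deny-entry parameter dicts; fail loudly if over the rule budget.

    ASSUMPTION: first_rule is a hypothetical starting number that must sit
    below the NACL's allow rules, because lower numbers are evaluated first.
    """
    if len(cidrs) > max_rules:
        raise ValueError(f"{len(cidrs)} blocks exceed the {max_rules}-rule budget")
    return [
        {
            "RuleNumber": first_rule + i,
            "Protocol": "-1",      # all protocols
            "RuleAction": "deny",
            "Egress": False,       # inbound rule
            "CidrBlock": cidr,
        }
        for i, cidr in enumerate(cidrs)
    ]


# Example with documentation-range (RFC 5737) addresses.
blocked = ["203.0.113.4", "203.0.113.5", "203.0.113.6", "203.0.113.7", "198.51.100.20"]
cidrs = collapse_blocklist(blocked)
print(cidrs)  # ['198.51.100.20/32', '203.0.113.4/30']
for entry in nacl_deny_entries(cidrs):
    print(entry)
```

Here five addresses shrink to two rules because four of them form a contiguous /30; the `ValueError` makes an over-budget list fail before anything is applied.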