
Standard non-AI deployment (cloud storage) – Typical IT load ≈ 339 kW / Max ≈ 482 kW
This is a 24-rack cloud storage network that prioritizes raw capacity while staying operationally sane (redundant, routable, serviceable). It uses current, widely available technology: 30 TB-class HDDs, 400 GbE leaf/spine with breakout to 100 GbE for hosts, and modern x86 servers with NVMe for journals/metadata.
At-a-glance totals
- Racks: 24 total (20 storage, 2 gateway/metadata, 1 spine/aggregation, 1 OOB & services).
- Drive mix: 30 TB enterprise HDDs in 4U, 90-bay storage servers; a small NVMe set per node for journals/metadata.
- Per storage node (4U/90-bay): 90 × 30 TB = 2.7 PB raw.
- Per storage rack (12 nodes): 32.4 PB raw.
- Storage racks: 20 → 648 PB raw datacenter-wide.
- Usable (EC 12+4, ~75% efficiency, minus 10% reserve): ≈ 437 PB usable.
- Fabric: Dual 400 GbE ToR leaves per rack; 4× 64-port 400 GbE spines (L3 ECMP).
- OOB: Dedicated 1/10 GbE management fabric + console.
Topology (concise)
- Data network:
- Leaf: 2× 32-port 400 GbE per rack (QSFP-DD; each port supports 4×100 GbE breakout).
- Downlinks: Each storage server is dual-homed at 2×100 GbE (one 100G to each ToR via 400G→4×100G breakout).
- Uplinks: Each ToR uses 4×400 GbE up (one to each spine) for fat-tree ECMP; +1×400 GbE ISL between leaves for MLAG/fast rack-local traffic.
- Spine: 4× 64-port 400 GbE fixed/chassis switches. With 44 leaves (22 leaf-equipped racks × 2) and 4×400G per leaf, you consume 176 of 256 spine ports (ample headroom); see the port-budget sketch after this list.
- Routing: BGP (leaf↔spine), per-rack routing domains; Jumbo frames enabled.
- OOB: 1/10 GbE management ToR per rack up to a pair of mgmt cores; serial console servers in the services rack.
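As a sanity check, here is the fabric port and bandwidth budget implied by the figures above, as a small Python sketch; the downlink-to-uplink ratio is derived here for illustration and is not a stated design target.

```python
# Port- and bandwidth-budget check for the leaf/spine fabric described above.
STORAGE_RACKS, GATEWAY_RACKS = 20, 2       # Racks 21-22 also carry a leaf pair
LEAVES_PER_RACK, UPLINKS_PER_LEAF = 2, 4   # one 400G uplink to each spine
SPINES, PORTS_PER_SPINE = 4, 64

leaves = (STORAGE_RACKS + GATEWAY_RACKS) * LEAVES_PER_RACK      # 44
spine_ports_used = leaves * UPLINKS_PER_LEAF                    # 176
spine_ports_total = SPINES * PORTS_PER_SPINE                    # 256
print(f"{leaves} leaves -> {spine_ports_used}/{spine_ports_total} spine ports "
      f"({spine_ports_used // SPINES} per 64-port spine)")

# Per storage rack: 12 dual-100G servers behind 2 leaves, each leaf with 4x400G up.
down_gbps = 12 * 2 * 100                                        # 2,400 Gb/s server-facing
up_gbps = LEAVES_PER_RACK * UPLINKS_PER_LEAF * 400              # 3,200 Gb/s of uplink
print(f"rack downlink {down_gbps} Gb/s vs uplink {up_gbps} Gb/s "
      f"-> {down_gbps / up_gbps:.2f}:1 (no oversubscription)")
```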
Node building blocks
- Storage server (per node, repeated 240×):
- 4U top-loader, 90× 3.5″ bays populated with 30 TB HDDs (2.7 PB raw/node).
- Dual modern CPUs (e.g., EPYC class), 256–512 GB RAM.
- NVMe set: 2× 3.84 TB (OS mirror) + 2–4× 7.68–15.36 TB (journals/WAL/metadata as your stack requires).
- NICs: 2× 100 GbE (redundant; one to each ToR).
- Gateway/metadata servers (object/S3, monitors, managers, indexers):
- 1U or 2U compute with 2×200 GbE or 2×100 GbE; heavy NVMe (all-flash) for low-latency services.
Rack elevations (by rack)
Conventions: 52U cabinets shown top (U52) → bottom (U1). Heavy gear sits low; network high. “SS-##” are storage servers. Cables: dual 100G per node (to Leaf-A/B). Power: A/B PDUs per rack.
Racks 01–20 — Storage (Type S12) — each rack identical
- U52: Leaf-A (400 GbE, 32×QSFP-DD)
- U51: Leaf-B (400 GbE, 32×QSFP-DD)
- U50: OOB Mgmt ToR (48×1/10 GbE + 4×SFP28 uplinks)
- U49: Fiber/copper patch panel & slack tray
- U48–U45: SS-12 (4U, 90-bay)
- U44–U41: SS-11 (4U, 90-bay)
- U40–U37: SS-10 (4U, 90-bay)
- U36–U33: SS-09 (4U, 90-bay)
- U32–U29: SS-08 (4U, 90-bay)
- U28–U25: SS-07 (4U, 90-bay)
- U24–U21: SS-06 (4U, 90-bay)
- U20–U17: SS-05 (4U, 90-bay)
- U16–U13: SS-04 (4U, 90-bay)
- U12–U09: SS-03 (4U, 90-bay)
- U08–U05: SS-02 (4U, 90-bay)
- U04–U01: SS-01 (4U, 90-bay)
Per S12 rack wiring (summary):
- Downlinks: 12 servers × (2×100G) = 24×100G (3× 400G breakouts per leaf).
- Leaf uplinks: 4×400G from each leaf up to spines (one to each spine).
- Leaf ISL: 1×400G Leaf-A↔Leaf-B.
- OOB: 12 nodes + 2 leaves + PDU/iLO/BMC as needed (fits in 48×1/10G).
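A minimal sketch of the per-leaf port budget this wiring implies (32-port leaf: 3 breakout ports down, 4 uplinks, 1 ISL):

```python
# Per-leaf port budget on a 32-port 400G leaf in a Type S12 storage rack.
PORTS = 32
breakout_downlinks = 3   # 3x 400G in 4x100G breakout mode = 12x 100G, one per server
spine_uplinks = 4        # one 400G to each spine
isl = 1                  # Leaf-A <-> Leaf-B
used = breakout_downlinks + spine_uplinks + isl
print(f"{used}/{PORTS} ports used per leaf, {PORTS - used} free for expansion")  # 8/32 used
```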
Rack 21 — Gateway & Metadata (Type G/M-A)
- U52: Leaf-A (400 GbE)
- U51: Leaf-B (400 GbE)
- U50: OOB Mgmt ToR
- U49: Patch panel
- U48–U37: 12× 1U Object Gateways (dual-100G, NVMe-heavy)
- U36–U25: 12× 1U Metadata/Monitors/Managers (dual-25/100G; NVMe)
- U24–U13: 12× 1U Index/Search/Log/Telemetry (dual-25/100G; NVMe)
- U12–U01: Spare RU for expansion
Notes: Spread quorum services (e.g., monitors, etcd, Consul) across Racks 21, 22, and 24.
Rack 22 — Gateway & Compute (Type G/M-B)
- U52: Leaf-A (400 GbE)
- U51: Leaf-B (400 GbE)
- U50: OOB Mgmt ToR
- U49: Patch panel
- U48–U25: 24× 1U Object/Frontend/API servers (dual-100G; NVMe cache)
- U24–U01: 12× 2U Query/Analytics nodes (dual-200G or dual-100G; big RAM/NVMe)
Rack 23 — Core/Spine & Aggregation
(place near meet-me room / fiber concentration)
- U52–U45: Spine-1..4 (each 2U, 64×400 GbE; L3 ECMP)
- U44: 400G patch & MPO cassettes
- U43: 400G patch & MPO cassettes
- U42: Leaf/Spine management switch (1/10 GbE)
- U41: Time/Sync (PTP GM + GNSS receiver)
- U40–U38: Border/Edge routers or DC-GW pair (100/400 GbE)
- U37–U35: Firewalls (active/standby)
- U34–U01: Spare RU for future spines/aggregation or DWDM gear
Fabric summary: each leaf uses 4×400G uplinks (one per spine). Leaves and spines run BGP with ECMP; any single leaf or spine failure preserves full reachability at reduced bandwidth.
Rack 24 — OOB Core, Services & Tooling
- U52–U51: Mgmt Core A/B (48×1/10 GbE + 100G uplinks)
- U50: Console server (96-port)
- U49: Infra jump host / KVM-IP
- U48–U47: Bare-metal provisioning pair (PXE/images; dual-25/100G + OOB)
- U46–U45: Monitoring/Telemetry (Prometheus/TSDB/alerting; NVMe)
- U44–U43: Log collectors (hot NVMe)
- U42–U41: Auth/Secrets (AD/LDAP/KMS/HashiCorp Vault)
- U40–U39: Backup catalog / config mgmt
- U38–U01: Spares, tools, and a couple of short-depth 1U shelves (label kits, transceivers)
Capacity math (transparent)
- Per node: 90 × 30 TB = 2.7 PB raw.
- Per storage rack (12 nodes): 32.4 PB raw.
- 20 storage racks: 648 PB raw.
- Usable (EC 12+4 → 75% efficiency; 10% reserved): 648 PB × 0.75 × 0.9 ≈ 437 PB usable.
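The same arithmetic as a runnable sketch, using the EC 12+4 efficiency and 10% reserve stated above:

```python
# Capacity math: 30 TB drives, 90-bay nodes, 12 nodes/rack, 20 storage racks, EC 12+4, 10% reserve.
DRIVE_TB, DRIVES_PER_NODE = 30, 90
NODES_PER_RACK, STORAGE_RACKS = 12, 20
EC_DATA, EC_PARITY, RESERVE = 12, 4, 0.10

node_pb = DRIVE_TB * DRIVES_PER_NODE / 1000            # 2.7 PB raw per node
rack_pb = node_pb * NODES_PER_RACK                     # 32.4 PB raw per rack
raw_pb = rack_pb * STORAGE_RACKS                       # 648 PB raw
usable_pb = raw_pb * EC_DATA / (EC_DATA + EC_PARITY) * (1 - RESERVE)
print(f"raw {raw_pb:.0f} PB -> usable ~{usable_pb:.0f} PB")   # ~437 PB
```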
Data layout & failure domains
- Erasure coding: 12+4 (16-way stripes) across racks (primary failure domain) and nodes (secondary), so data remains recoverable through a single rack failure plus additional disk failures.
- CRUSH/placement (Ceph-like) or MinIO parity sets: enforce no more than one chunk per node, max two per rack.
- Journal/DB on NVMe per node; HDDs carry data only.
- Spare capacity: keep ≥10% cluster-wide free for rebalance after failures.
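A minimal, illustrative check of the placement rules above; this is a sketch of the constraint logic, not the actual CRUSH rule or MinIO parity-set implementation.

```python
# Placement-rule check for a 12+4 stripe: at most one chunk per node, at most two per rack,
# and the worst single-rack failure must still leave a decodable stripe (>= 12 chunks).
from collections import Counter

EC_DATA, EC_PARITY = 12, 4

def placement_ok(chunks: list[tuple[int, int]]) -> bool:
    """chunks: one (rack_id, node_id) pair per EC chunk."""
    if len(chunks) != EC_DATA + EC_PARITY:
        return False
    per_node = Counter(chunks)                       # counts per (rack, node)
    per_rack = Counter(rack for rack, _ in chunks)
    if max(per_node.values()) > 1 or max(per_rack.values()) > 2:
        return False
    return (len(chunks) - max(per_rack.values())) >= EC_DATA

# Example: 16 chunks spread one-per-rack across 16 of the 20 storage racks.
stripe = [(rack, 1) for rack in range(16)]
print(placement_ok(stripe))   # True
```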
IP & VLANs (example)
- VLAN 100 (Storage-Front): host↔gateway S3/Swift/Files (routed at leaf).
- VLAN 200 (Storage-Back): replication/OSD traffic.
- VLAN 9 (OOB): BMC/iLO/console.
- Underlay: numbered point-to-points (/31) leaf↔spine; loopbacks for BGP.
- MTU: 9000 on data paths; 1500 on OOB.
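A sketch of generating the /31 point-to-point underlay and per-device loopbacks; the 100.64.0.0/16 and 10.255.0.0/24 ranges are placeholders for illustration, not allocations from this design.

```python
# Generate /31 leaf<->spine point-to-point subnets and loopbacks for the BGP underlay.
# Address ranges are placeholders; substitute your own IPAM allocations.
import ipaddress

LEAVES, SPINES = 44, 4
p2p_pool = ipaddress.ip_network("100.64.0.0/16").subnets(new_prefix=31)
loopbacks = ipaddress.ip_network("10.255.0.0/24").hosts()

links = {}
for leaf in range(1, LEAVES + 1):
    for spine in range(1, SPINES + 1):
        leaf_ip, spine_ip = next(p2p_pool)   # a /31 yields exactly two usable addresses
        links[(f"leaf{leaf:02d}", f"spine{spine}")] = (str(leaf_ip), str(spine_ip))

lo = {name: str(next(loopbacks))
      for name in [f"spine{s}" for s in range(1, SPINES + 1)]
              + [f"leaf{l:02d}" for l in range(1, LEAVES + 1)]}

print(links[("leaf01", "spine1")], lo["leaf01"], lo["spine1"])
```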
Why this maximizes capacity (yet stays operable)
- 90-bay 4U nodes deliver the highest HDD density per rack unit (22.5 drives/RU) that you can deploy broadly today.
- Dual-100G per node covers aggregate HDD throughput (see the arithmetic after this list); 400 GbE leaves keep cabling simple, with server downlinks on 4×100G breakouts and native 400G uplinks.
- Four 400G spines comfortably serve all 44 leaves at this uplink density, with headroom for growth.
- The layout cleanly separates data, metadata/gateways, aggregation, and OOB, so you can scale storage racks independently.
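Back-of-envelope arithmetic behind the dual-100G claim, assuming ~270 MB/s sequential per 30 TB-class HDD (an assumed figure, not a vendor spec quoted here):

```python
# Aggregate HDD throughput vs. per-node NIC capacity (worst case: every drive streaming sequentially).
HDD_MBPS, DRIVES = 270, 90          # assumed outer-track sequential rate for a 30 TB-class HDD
NIC_GBPS = 2 * 100                  # dual 100 GbE per node

hdd_gbps = DRIVES * HDD_MBPS * 8 / 1000
print(f"all-drives-streaming ~{hdd_gbps:.0f} Gb/s vs {NIC_GBPS} Gb/s NIC capacity")
# ~194 Gb/s in the pathological case; mixed/random HDD workloads land far below this.
```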
Transparent power budget
Totals
- Typical IT load: 338.8 kW
- Max (steady) IT load: 481.6 kW
- (If you size facility power/cooling with PUE 1.20 → ~406.6 kW typical, ~577.9 kW max.)
Note on HDD spin-up: each 4U/90-bay node can briefly spike to ~2.3 kW during simultaneous spin-up. Across 240 nodes that is ~552 kW of storage-node draw (≈287 kW above their typical load) if power-on is not staggered. Use staggered spin-up / power-on sequencing to avoid breaker trips.
Breakdown by component (typical → max)
- Storage nodes (240 × 4U/90-bay, dual-100G): 265.2 kW → 361.2 kW
- Rack networking – data (44× ToR leaves): 30.8 kW → 52.8 kW
- Gateway/metadata/analytics servers (Racks 21–22): 24.6 kW → 38.4 kW
- Core/edge (spines, border, firewalls, PTP): 8.35 kW → 15.23 kW
- 400G optics (all uplinks + ISLs, both ends): 3.96 kW → 4.75 kW
- OOB/Services (mgmt cores, console, PXE, monitoring, logs, auth, backup): 3.71 kW → 5.93 kW
- Rack networking – OOB (per-rack mgmt ToR): 2.20 kW → 3.30 kW
Assumptions (tweakable)
- Storage node (4U/90-bay) steady: ~1105 W typical, ~1505 W max (HDDs active ~7.5 W ea, dual CPUs, RAM, NVMe journals/DB, 2×100G NICs, fans). Spin-up transient ~2300 W/node.
- 400G ToR leaf (32×400G): ~700 W typ, 1200 W max (switch only; optics accounted separately).
- Spine (64×400G): ~1600 W typ, 3000 W max.
- QSFP-DD 400G optics: ~10 W typ, 12 W max per module (counted on both ends of each optical link).
- 1U/2U compute (gateways/metadata/analytics): 250–900 W per server depending on role and NVMe density.
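Re-deriving the totals from these assumptions and the device counts in the rack plan; the ~100/150 W per OOB mgmt ToR is inferred from the 2.20/3.30 kW line in the breakdown rather than stated explicitly.

```python
# Power budget check (typical, max) in watts, using the device counts from the rack plan:
# 240 storage nodes, 44 data leaves, 22 OOB mgmt ToRs, 4 spines, 396 400G optic modules.
components = {
    "storage nodes":    (240 * 1105, 240 * 1505),
    "data leaves":      (44 * 700,   44 * 1200),
    "gateway/metadata": (24600,      38400),       # Racks 21-22, per the breakdown above
    "core/edge":        (8350,       15230),
    "400G optics":      (396 * 10,   396 * 12),    # 176 uplinks + 22 ISLs, both ends counted
    "OOB/services":     (3710,       5930),
    "OOB mgmt ToRs":    (22 * 100,   22 * 150),    # ~100/150 W per switch (inferred)
}
typ_kw = sum(t for t, _ in components.values()) / 1000
max_kw = sum(m for _, m in components.values()) / 1000
print(f"typical {typ_kw:.1f} kW, max {max_kw:.1f} kW")                      # ~338.8 / ~481.6
print(f"facility @ PUE 1.20: {typ_kw * 1.2:.1f} / {max_kw * 1.2:.1f} kW")   # ~406.6 / ~577.9
print(f"unstaggered spin-up: {240 * 2.3:.0f} kW from storage nodes alone")  # ~552 kW
```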