ECE1373


The internet is big [1][2]. Even individual applications (e.g. P2P) often involve interactions with a large number of nodes. Simulating large networks in software is slow, while simulating them on clusters of CPUs is expensive. Current systems can emulate a few hundred nodes in roughly real time [3] (10-15k packets/s), while simulations can be somewhat faster (~1k nodes, >100k packets/s).

Given the independence of each network node and the tolerance for some amount of timing error, large-scale network simulations could be a good fit for parallel hardware acceleration. GPU acceleration may not be the ideal method: there would be a large amount of communication between nodes ("threads"), and nodes would rarely process in lock-step (SIMD/SIMT), but rather only in response to a packet.


Network Simulator/Emulator on Chip

A collection of traffic generators and network links, with some nodes acting as routers. The objective is to model and observe the behavior of large networks faster and/or more cheaply than software simulations can.

Goals

  • Simulate hundreds of nodes at real-time or faster on one chip. 10-100M packets per second?

Prior Art

Architecture

Requirements

  • Requires configurable network topology, latency/throughput, and some way of observing performance metrics.
  • Debug features: clock freeze, register probing, clock-stepping etc.

Traffic Generation

Simple traffic generators in hardware that generate packets according to some distribution. One possibility is to generate a packet of configurable size periodically. Configuration (and statistics?) would be handled by a soft processor, since configuration isn't time-sensitive.
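A minimal software sketch of the periodic option, useful mainly as a reference model for an HDL testbench. The `Packet` fields, the `PeriodicGenerator` class, and the `sent` counter are all hypothetical names chosen for illustration; the actual hardware interface is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src: int      # hypothetical id of the generating node
    dst: int      # hypothetical id of the destination node
    size: int     # packet size in bytes
    created: int  # simulation cycle at which the packet was generated


class PeriodicGenerator:
    """Emits one fixed-size packet every `period` cycles."""

    def __init__(self, src: int, dst: int, size: int, period: int) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.src, self.dst, self.size, self.period = src, dst, size, period
        self.countdown = period  # cycles left until the next packet
        self.sent = 0            # statistic a soft processor could read back

    def tick(self, now: int) -> Optional[Packet]:
        """Advance one cycle; return a packet if one is due, else None."""
        self.countdown -= 1
        if self.countdown > 0:
            return None
        self.countdown = self.period  # reload the down-counter
        self.sent += 1
        return Packet(self.src, self.dst, self.size, now)


gen = PeriodicGenerator(src=0, dst=3, size=64, period=4)
emitted = [p.created for now in range(12) if (p := gen.tick(now)) is not None]
```

The generator is just a reloadable down-counter, so `size` and `period` are the only values that would need to be writable as configuration registers in this sketch.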

Future work could include replacing simple traffic generators with (multithreaded) soft-processors to easily implement complex network protocols.

Routers

Not sure what to do about the routing table.

Network Links

Delay and bandwidth can be modeled by associating a timestamp with each packet. It is not yet clear how to implement this in hardware: multiple queues per node seem difficult to manage. One option is to have dedicated "link" nodes on the chip, one per edge in the network graph, each of which essentially functions as the destination node's input queue.
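The "link node" idea can be sketched in software as follows. This is only an illustrative model under stated assumptions: the `Link` class, its `latency` and `cycles_per_byte` parameters, and the `free_at` register are hypothetical, and the multiplication in `send` is used for clarity even though a hardware version would want to avoid it.

```python
from collections import deque


class Link:
    """One directed edge; doubles as the destination node's input queue."""

    def __init__(self, latency: int, cycles_per_byte: int) -> None:
        self.latency = latency                  # propagation delay, in cycles
        self.cycles_per_byte = cycles_per_byte  # inverse bandwidth (assumed integer)
        self.free_at = 0                        # cycle at which the link is idle again
        self.queue = deque()                    # (arrival_time, packet) pairs

    def send(self, now: int, size: int, packet: object) -> int:
        """Stamp the packet with its arrival time and enqueue it."""
        start = max(now, self.free_at)  # wait for earlier packets to serialise
        self.free_at = start + size * self.cycles_per_byte
        arrival = self.free_at + self.latency
        self.queue.append((arrival, packet))
        return arrival

    def receive(self, now: int):
        """Pop the head packet if its timestamp has been reached, else None."""
        if self.queue and self.queue[0][0] <= now:
            return self.queue.popleft()[1]
        return None


link = Link(latency=10, cycles_per_byte=2)
a = link.send(now=0, size=100, packet="A")   # serialises over cycles 0-200
b = link.send(now=50, size=50, packet="B")   # must wait until the link is free
```

Because `free_at` only increases and the latency is constant, arrival timestamps on one link are non-decreasing, so a plain FIFO is sufficient and no sorting is needed within a link.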

This is hard. The delay queues need to use repeated subtraction instead of division, yet still be able to generate an accurate timestamp several cycles in advance.
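To illustrate only the repeated-subtraction half of the problem, here is a sketch that drains a packet's size from a fixed-point counter, one subtraction per cycle. The Q.8 format (`FRAC_BITS = 8`) and the function name are assumptions for illustration. Note that this version learns the completion time only at the cycle the packet finishes, so it does not address the requirement of knowing the timestamp in advance.

```python
FRAC_BITS = 8  # assumed fixed-point precision: rates in units of 1/256 byte per cycle


def cycles_to_transmit(size_bytes: int, rate_fp: int) -> int:
    """Count cycles needed to drain `size_bytes` at `rate_fp` (Q.8 bytes/cycle).

    Equivalent to ceil(size_bytes / rate) but uses only a shift,
    subtraction, and a sign check.
    """
    if rate_fp <= 0:
        raise ValueError("rate must be positive")
    remaining = size_bytes << FRAC_BITS  # a shift, not a multiply
    cycles = 0
    while remaining > 0:
        remaining -= rate_fp  # one subtraction per simulated cycle
        cycles += 1
    return cycles


exact = cycles_to_transmit(100, 640)    # 640 / 256 = 2.5 bytes per cycle
rounded = cycles_to_transmit(10, 768)   # 768 / 256 = 3.0 bytes per cycle
```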

On-Chip Network


Current Problems

  • On-Chip Network design
  • PacketQueue timestamp algorithm that does not require division (or multiplication), yet can provide a correct timestamp some cycles ahead of the timestamp's time. Alternatively, an interconnect design that does not require correct timestamp values (not likely).
  • Relative timestamps instead of absolute ones
  • Simulation time step should not advance when a packet is forwarded to a queue from a router
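One possible reading of the relative-timestamp item is a delta queue, where each entry stores its delay relative to the entry ahead of it and only the head counts down. The sketch below is an assumption about how that could work, not a settled design; `DeltaQueue` and its fields are hypothetical names.

```python
from collections import deque


class DeltaQueue:
    """FIFO whose entries store their delay relative to the entry ahead."""

    def __init__(self) -> None:
        self.entries = deque()  # [delta_cycles, packet]; only the head counts down
        self.total = 0          # cycles until the tail entry is due

    def push(self, delay: int, packet: object) -> None:
        """Enqueue a packet due `delay` cycles from now, in delivery order."""
        if delay < max(1, self.total):
            raise ValueError("delay must be >= 1 and not earlier than the tail")
        self.entries.append([delay - self.total, packet])
        self.total = delay

    def tick(self) -> list:
        """Advance one cycle and return the packets that became due."""
        if self.entries:
            self.entries[0][0] -= 1  # only the head needs a decrementer
            self.total -= 1
        due = []
        while self.entries and self.entries[0][0] <= 0:
            due.append(self.entries.popleft()[1])
        return due


q = DeltaQueue()
q.push(2, "A")
q.push(5, "B")  # stored as a delta of 3 behind "A"
out = [q.tick() for _ in range(5)]
```

With this representation the stored values stay small and never wrap with absolute simulation time, at the cost of requiring packets to be enqueued in delivery order.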

Other Considerations

  • Can't use discrete event simulation b/c maintaining a central event queue is not favourable for FPGA implementation. A distributed approach is better.

Things that break

Verification

A performance simulator that attempts to model the cycle-accurate performance of the hardware design.

HDL Testbenches.

HDL

Specification for various Hardware Modules.

Schedule

Image:gantt_chart.png

Stuff

svn://stuffedcow.net/ece1373/

Weekly Status

Final presentation slides
Final report

References

  1. BGP Reports: Growth of the BGP Table - 1994 to Present
  2. Difficulties in Simulating the Internet
  3. Crystal: An Emulation Framework for Practical Peer-to-Peer Multimedia Streaming Systems