ECE1373
The internet is big [1][2]. Even individual applications (e.g. P2P) often involve interactions with a large number of nodes. Simulating large networks is slow in software, while simulation on clusters of CPUs is expensive. Current systems can emulate a few hundred nodes in roughly real time [3] (10-15k packets/s), while simulations can be somewhat faster (~1k nodes, >100k packets/s).
Given the independence of each network node and the tolerance of some amount of timing error, large-scale network simulations could be a good fit for parallel hardware acceleration. GPU acceleration may not be the ideal method, as there would be a large amount of communication between nodes ("threads"), and nodes would not often process in lock-step (SIMD/SIMT), but rather only in response to a packet.
Network Simulator/Emulator on Chip
A collection of traffic generators and network links on one chip, with some nodes acting as routers. The objective is to model and observe the behavior of large networks faster and/or cheaper than software simulations.
Goals
- Simulate hundreds of nodes at real-time or faster on one chip. 10-100M packets per second?
Prior Art
Architecture
Requirements
- Requires configurable network topology, latency/throughput, and some way of observing performance metrics.
- Debug features: clock freeze, register probing, clock-stepping etc.
Traffic Generation
Simple traffic generators in hardware that generate packets according to some distribution. One possibility is to generate a packet of configurable size periodically. Configuration (and statistics?) can be handled by a soft processor, since configuration isn't time-sensitive.
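A minimal software sketch of the periodic option, to make the intended behavior concrete. The names (`Packet`, `PeriodicGenerator`, `tick`, `run`) and the down-counter structure are assumptions made for illustration, not part of the design; `period` and `size_bytes` stand in for the registers a soft processor would write, and `sent` for a statistics counter it would read.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: int            # id of the generating node
    size_bytes: int     # configured packet size
    created_cycle: int  # cycle at which the packet was emitted


class PeriodicGenerator:
    """Emits one fixed-size packet every `period` cycles (illustrative model)."""

    def __init__(self, node_id, period, size_bytes):
        if period < 1:
            raise ValueError("period must be at least 1 cycle")
        self.node_id = node_id
        self.period = period          # configuration value
        self.size_bytes = size_bytes  # configuration value
        self.countdown = period       # down-counter, as a hardware timer might use
        self.sent = 0                 # statistics counter

    def tick(self, cycle):
        """Advance one cycle; return a Packet if one is emitted, else None."""
        self.countdown -= 1
        if self.countdown > 0:
            return None
        self.countdown = self.period  # reload the timer
        self.sent += 1
        return Packet(self.node_id, self.size_bytes, cycle)


def run(gen, cycles):
    """Drive the generator for `cycles` cycles and collect emitted packets."""
    packets = []
    for cycle in range(cycles):
        pkt = gen.tick(cycle)
        if pkt is not None:
            packets.append(pkt)
    return packets
```

Other distributions could presumably be obtained by reloading `countdown` from a pseudo-random source instead of a constant.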
Future work could include replacing simple traffic generators with (multithreaded) soft-processors to easily implement complex network protocols.
Routers
Not sure what to do about the routing table.
Network Links
Delay and bandwidth can be modeled by having a timestamp associated with each packet. Not sure how this might be implemented in hardware: multiple queues per node seem difficult to manage. Perhaps by having dedicated "link" nodes on the chip, one per edge in the network graph, each of which essentially functions as the destination node's input queue.
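A software sketch of one such "link" node, assuming a fixed propagation latency and a fixed bandwidth expressed in bytes per cycle. The class and method names are hypothetical. This version uses an integer division to compute the serialization time, which is convenient in software but is exactly the operation a hardware version would want to avoid.

```python
from collections import deque


class LinkNode:
    """One directed edge of the network graph, acting as the destination's input queue."""

    def __init__(self, latency_cycles, bytes_per_cycle):
        if bytes_per_cycle < 1:
            raise ValueError("bytes_per_cycle must be at least 1")
        self.latency = latency_cycles
        self.rate = bytes_per_cycle
        self.free_at = 0      # cycle at which the previous packet finishes serializing
        self.queue = deque()  # (ready_cycle, packet_id), kept in FIFO order

    def send(self, now, packet_id, size_bytes):
        """Enqueue a packet and return the cycle at which it becomes visible."""
        start = max(now, self.free_at)           # wait for the link to be free
        tx_cycles = -(-size_bytes // self.rate)  # ceil(size / rate): bandwidth model
        self.free_at = start + tx_cycles
        ready = self.free_at + self.latency      # add the propagation delay
        self.queue.append((ready, packet_id))
        return ready

    def receive(self, now):
        """Pop every packet whose timestamp has been reached."""
        delivered = []
        while self.queue and self.queue[0][0] <= now:
            delivered.append(self.queue.popleft()[1])
        return delivered
```

Because `free_at` never decreases and the latency is constant, timestamps are non-decreasing in arrival order, so a plain FIFO is enough and no sorting is required within a single link.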
argh. This is hard. Need to implement delay queues that do repeated subtraction instead of division, yet can generate an accurate timestamp several cycles in advance.
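To illustrate the trade-off, here is a sketch of computing the serialization time by repeated subtraction. The function name and the bytes-per-cycle formulation are assumptions made for this example. The result equals ceil(size / rate), but the loop needs as many steps as the answer itself, so the value is only known once the packet has finished serializing, not several cycles in advance.

```python
def tx_cycles_by_subtraction(size_bytes, bytes_per_cycle):
    """Count the cycles needed to send `size_bytes`, using only subtract and compare."""
    if bytes_per_cycle < 1:
        raise ValueError("bytes_per_cycle must be at least 1")
    remaining = size_bytes
    cycles = 0
    while remaining > 0:
        remaining -= bytes_per_cycle  # one cycle's worth of bytes leaves the queue
        cycles += 1
    return cycles
```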
On-Chip Network
Current Problems
- On-Chip Network design
- PacketQueue timestamp algorithm that does not require division (or multiplication), yet can provide a correct timestamp some cycles before that time is reached. Alternatively, an interconnect design that does not require correct timestamp values (not likely).
- Relative timestamp instead of absolute ones
- Simulation time step should not advance when a packet is forwarded to a queue from a router
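One possible reading of the relative-timestamp item above is a delta queue, sketched below. This is an assumption about what relative timestamps could look like here, not a settled design, and `DeltaQueue` is a hypothetical name. Each entry stores its delay relative to the entry in front of it, so only the head needs to count down each cycle and the stored values are bounded by the largest link delay rather than by total simulation time.

```python
from collections import deque


class DeltaQueue:
    """FIFO delay queue in which each entry's delay is relative to its predecessor."""

    def __init__(self):
        self.entries = deque()  # [delta_cycles, packet_id]
        self.tail_total = 0     # cycles from now until the last entry is due

    def push(self, delay_cycles, packet_id):
        """Add a packet due `delay_cycles` from now; must not precede the current tail."""
        if delay_cycles < max(1, self.tail_total):
            raise ValueError("entries must be at least 1 cycle away and due in FIFO order")
        self.entries.append([delay_cycles - self.tail_total, packet_id])
        self.tail_total = delay_cycles

    def tick(self):
        """Advance one cycle; return the packets that become due on this cycle."""
        due = []
        if self.entries:
            self.entries[0][0] -= 1  # only the head entry counts down
            self.tail_total -= 1
        while self.entries and self.entries[0][0] <= 0:
            due.append(self.entries.popleft()[1])
        return due
```

For example, pushing `"a"` with delay 3 and `"b"` with delay 5 stores deltas 3 and 2; `"a"` is released on the third tick and `"b"` two ticks later.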
Other Considerations
- Can't use discrete event simulation b/c maintaining a central event queue is not favourable for FPGA implementation. A distributed approach is better.
Things that break
Verification
A performance simulator that attempts to model the cycle-accurate performance of the hardware design.
HDL Testbenches.
HDL
Specification for various Hardware Modules.
Schedule
Stuff
Weekly Status
Final presentation slides
Final report

