ECE1373 HDL
Performance
| Revision | Quartus Ver. | Device | Fmax (MHz) | LUTs/ALUTs | FFs | Memory Usage | Comments |
|---|---|---|---|---|---|---|---|
| r296 | 8.0 | 2C70 | -- | 71,149 | 49,013 | 368 M4Ks | Synthesis only (250 M4Ks available) |
| r296 | 8.0 | 3SL150 | 47.35 | 55,921 | 52,909 | 304 M9Ks (1844 Memory ALUTs) | Critical path: TG qout -> TG dequeue |
| r296 | 8.1 | 3SL150 | 45.76 | 55,725 | 49,083 | 355 M9Ks | Disabled packing to LUTRAM |
| r303 | 8.1 | 3SL150 | 47.95 | 61,004 | 53,417 | 355 M9Ks | Removed encoder in Bus1SourcePart and simpler ready signals for bus 1 |
| r305 | 8.1 | 3SL150 | 55.74 | 61,571 | 53,428 | 355 M9Ks | Decoded MUX, and removed encoder in Bus1DestPart |
| r327 | 8.1 | 3SL150 | 53.68 | 66,123 | 55,979 | 364 M9Ks | Fixed wrreq bug in Bus1DestPart (r326), wiring bug in top.v and dequeue bug in Bus0SourcePart |
| r362 | 8.1 | 3SL340 | 54.40 | 73,821 | 62,871 | 396 M9Ks | PQ drop tail, max2 fix, addressed Control unit |
| r296 | 9.0 | 3SL340 | 42.14 | 66,789 | 105,097 | 16 M9Ks (41,268 Memory ALUTs -- why??) | Critical path: same |
Note: Fmax numbers are for the slow corner (1.1 V, 85 °C).
Interface to PC
We use the DE3 ports package to implement the interface between the control PC and the simulator on the FPGA.
List of Ports
- control [39:0] (I): Control signals to the simulator (8-bit address, 32-bit data)
- state [63:0] (O): State bits from the simulator used for debugging purposes
- config [31:0] (I): Configuration words, with handshaking.
- data [31:0] (O): Data words from the simulator, with handshaking
Control Port Format
| 0x00 | [26] 31..6 | 5 | 4 | 3 | 2 | 1 | 0 |
| Legacy | Unused | error-enable | timer-enable | loopback | sim_enable | data_request | reset |
| 0x01 | [32] 31..0 | ||||||
| Timer | Timer initial value | ||||||
| 0x02 - 0xFF unused | |||||||
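As a rough host-side illustration of the control port, the sketch below packs an 8-bit address and a 32-bit data word into one 40-bit control value. The flag bit positions come from the table above; the placement of the address in bits [39:32] and the data in bits [31:0] is an assumption (the page only gives the widths), and the names `REG_FLAGS`, `REG_TIMER` and `control_word` are hypothetical.

```python
# Flag bits of control register 0x00, as listed in the Control Port Format table.
CTRL_RESET = 1 << 0
CTRL_DATA_REQUEST = 1 << 1
CTRL_SIM_ENABLE = 1 << 2
CTRL_LOOPBACK = 1 << 3
CTRL_TIMER_ENABLE = 1 << 4
CTRL_ERROR_ENABLE = 1 << 5

REG_FLAGS = 0x00  # hypothetical name for register 0x00
REG_TIMER = 0x01  # hypothetical name for the timer initial value register


def control_word(address, data):
    """Pack an 8-bit address and 32-bit data into a 40-bit control value.

    ASSUMPTION: address occupies bits [39:32] and data bits [31:0].
    """
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    if not 0 <= data <= 0xFFFFFFFF:
        raise ValueError("data must fit in 32 bits")
    return (address << 32) | data
```

For example, `control_word(REG_FLAGS, CTRL_SIM_ENABLE | CTRL_TIMER_ENABLE)` would enable the simulator and the timer in one write, if the layout assumption holds.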
State Port Format
| 7 | 6 | 5 | 4 | 3 | [3] 2..0 | ||
| reset | deserializer reset | loopback | enable | error | control-unit state | ||
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 |
| timer_enable | timer_stop | error_enable | error_stop | config_enable | config_req | data_req | data_ready |
| [16] 31..16 | |||||||
| Simulation time | |||||||
| [32] 63..32 | |||||||
| Current timer value | |||||||
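For debugging on the PC side, the 64-bit state word can be split into named fields. The sketch below follows the bit positions in the table above; the function name `decode_state` and the dictionary keys are illustrative choices, not part of the design.

```python
# Single-bit flags of the state port: bit position -> field name (from the table).
STATE_FLAGS = {
    3: "error",
    4: "enable",
    5: "loopback",
    6: "deserializer_reset",
    7: "reset",
    8: "data_ready",
    9: "data_req",
    10: "config_req",
    11: "config_enable",
    12: "error_stop",
    13: "error_enable",
    14: "timer_stop",
    15: "timer_enable",
}


def decode_state(word):
    """Split a 64-bit state word into a dict of named fields."""
    fields = {name: (word >> bit) & 1 for bit, name in STATE_FLAGS.items()}
    fields["control_unit_state"] = word & 0x7          # bits 2..0
    fields["sim_time"] = (word >> 16) & 0xFFFF         # bits 31..16
    fields["timer_value"] = (word >> 32) & 0xFFFFFFFF  # bits 63..32
    return fields
```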
Packet Format
| [16] 63..48 | [8] 47..40 | [12] 39..28 | [8] 27..20 | [16] 19..4 | [4] 3..0 |
| Timestamp | Destination | Size | Source | Injection time | Unused |
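A small pack/unpack helper can make the packet layout easier to check in testbenches. The sketch below assumes the 4-bit unused field sits in bits 3..0, which is the only placement for which the listed widths (16 + 8 + 12 + 8 + 16 + 4) fill the 64-bit word without overlap. The helper names are hypothetical.

```python
# (field name, least significant bit, width in bits)
PACKET_FIELDS = [
    ("timestamp", 48, 16),
    ("destination", 40, 8),
    ("size", 28, 12),
    ("source", 20, 8),
    ("injection_time", 4, 16),
    ("unused", 0, 4),  # ASSUMPTION: bits 3..0
]


def pack_packet(**values):
    """Build a 64-bit packet word; fields that are not given default to 0."""
    known = {name for name, _, _ in PACKET_FIELDS}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, lsb, width in PACKET_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack_packet(word):
    """Split a 64-bit packet word into a dict of fields."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, lsb, width in PACKET_FIELDS}
```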
Address Format
| [1] 7 | [2] 6..5 | [5] 4..0 |
| sBus ID | Dest Partition ID | Node ID |
Note: Address 0x00 is treated as invalid. As a result, nodes on bus 0, partition 0 start at ID 1, so this partition can hold at most 31 nodes instead of 32. Since we have fewer Router nodes than TG/PQ nodes, and Router nodes are on sBus 0 and dBus 1, using the sBus ID as one of the address bits allows us to have the full 32 nodes for TG/PQ dest partition 0.
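The address layout and the "0x00 is invalid" rule can be captured in a few lines. This is only a sketch of the field arithmetic from the table above; the function names are made up for illustration.

```python
def make_addr(sbus, part, node):
    """Compose an 8-bit node address: sBus ID [7], dest partition ID [6:5], node ID [4:0]."""
    if sbus not in (0, 1):
        raise ValueError("sBus ID is 1 bit")
    if not 0 <= part <= 3:
        raise ValueError("dest partition ID is 2 bits")
    if not 0 <= node <= 31:
        raise ValueError("node ID is 5 bits")
    return (sbus << 7) | (part << 5) | node


def split_addr(addr):
    """Return (sbus, part, node) for an 8-bit address."""
    return (addr >> 7) & 0x1, (addr >> 5) & 0x3, addr & 0x1F


def is_valid_addr(addr):
    """Address 0x00 is reserved as invalid."""
    return 0 < addr <= 0xFF
```

With this layout, node 0 of partition 0 is only unusable on sBus 0: `make_addr(1, 0, 0)` gives 0x80, which is a valid address.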
Global Signals
- sim_time_tick: Pulse for 1 clock cycle when sim_time changes
Statistics
Counters are shifted out of the design 8 bits at a time, least significant byte first (little endian).
| Signal | Description |
|---|---|
| stats_in[7:0] | Input of shift chain |
| stats_out[7:0] | Output of shift chain |
| stats_shift | Shift by 8 bits at next clock when asserted. |
| Unit | Counter | Description |
|---|---|---|
| PartitionPQ 0-3 | ||
| TrafficGenDiv 0-7 | ||
| TrafficGenDiv | stats_received[32] | Number of packets received by TG |
| TrafficGenDiv | stats_injected[32] | Number of packets injected by TG |
| TG->PacketQueueDivCore qin | packets_dropped[32] | Number of packets dropped by PQ |
| TG->PacketQueueDivCore qout | packets_dropped[32] | Number of packets dropped by PQ |
| PacketQueueDivCore 8-15 | packets_dropped[32] | Each packet queue has one counter |
| TrafficGenDiv 16-23 | | Same as TrafficGenDiv 0-7 above |
| PacketQueueDivCore 24-31 | | Same as PacketQueueDivCore 8-15 above |
A table is a poor way to express the recursive chaining of the stats[7:0] shift chain. When shifting out, the least significant 8 bits of each counter appear on stats_out first. The counters appear in this order:
- Within each TG: qout's packets_dropped, then qin's packets_dropped, then stats_injected, then stats_received.
- Within each PPQ: PQ 31-24, then TG 23-16, then PQ 15-8, then TG 7-0.
- Across partitions: PartitionPQ3 appears before PartitionPQ0.
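The shift-out order described above may be easier to follow as code. The sketch below lists the counters in the order they would appear on stats_out and decodes a captured byte stream. It assumes that "31-24" means node 31 first down to node 24 (and likewise for the other ranges), that partitions come out in the order 3, 2, 1, 0, and that every counter is 32 bits wide. The function names and the tuple keys are hypothetical.

```python
TG_COUNTERS = (
    "qout.packets_dropped",
    "qin.packets_dropped",
    "stats_injected",
    "stats_received",
)


def stats_order(num_ppq=4):
    """Return (ppq, node, counter) tuples in the assumed shift-out order."""
    order = []
    for ppq in reversed(range(num_ppq)):      # PartitionPQ3 first
        for node in reversed(range(32)):      # ASSUMPTION: node 31 down to node 0
            if (node // 8) in (1, 3):         # nodes 8-15 and 24-31 are packet queues
                order.append((ppq, node, "packets_dropped"))
            else:                             # nodes 0-7 and 16-23 are traffic generators
                for name in TG_COUNTERS:
                    order.append((ppq, node, name))
    return order


def decode_stats(stream, num_ppq=4):
    """Decode bytes captured from stats_out (first byte first) into counter values."""
    order = stats_order(num_ppq)
    if len(stream) != 4 * len(order):
        raise ValueError("unexpected stream length")
    return {key: int.from_bytes(stream[4 * i:4 * i + 4], "little")
            for i, key in enumerate(order)}
```

Under these assumptions each PartitionPQ contributes 16 + 16 x 4 = 80 counters, so a full dump of four partitions is 1280 bytes.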
Simulation Node
TrafficGen, Router, and PacketQueue nodes have a standard interface.
Input
- clock, reset
- enable
- config_in_valid
- config_in [7:0]: Configuration input channel. Valid when config_in_valid is 1
- sim_time [15:0]: Global simulation time counter
- sim_time_tick: A pulse indicates that sim_time has incremented
- dequeue: dequeue the head packet
- nexthop_in [7:0]: Broadcast on the destination bus to specify which node should accept the incoming packet
- packet_in [63:0]: Valid for this node when nexthop_in = addr2
- ready_in: Valid data on nexthop_in and packet_in.
Output
- config_out_valid
- config_out [7:0]: Configuration shifter out
- error
- ready [1:0]: Indicate if a packet is ready for timestep T (bit 0) or T+1 (bit 1)
- packet_out [63:0]: Valid when either ready[0] or ready[1] is 1
- nexthop_out [7:0]: Next hop for this packet
- timestamp_out [1:0]: Bits [2:1] of packet_out timestamp - (T-6)
- packet_ack: The incoming packet is accepted
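The timestamp_out definition above is terse, so here is one possible reading as a reference model. It assumes the subtraction is done first, on 16-bit values with wraparound (matching the width of sim_time), and that bits [2:1] of the difference are then taken. The other reading (subtracting from bits [2:1] of the timestamp) is not modeled. `timestamp_out` as a Python function name is illustrative only.

```python
def timestamp_out(packet_timestamp, sim_time):
    """Model of the 2-bit timestamp_out signal.

    ASSUMPTION: result = bits [2:1] of (packet_timestamp - (sim_time - 6)),
    computed modulo 2**16.
    """
    delta = (packet_timestamp - (sim_time - 6)) & 0xFFFF
    return (delta >> 1) & 0x3
```

Under this reading, a packet stamped with the current time T yields a difference of 6 (binary 110), so timestamp_out is 3, and a packet stamped T-6 yields 0.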
By Function
| Function | Signals |
|---|---|
| Global | in: clock, reset, enable, sim_time, sim_time_tick |
| Configuration | in: config_in_valid, config_in; out: config_out_valid, config_out |
| Status | out: error |
| Destination Bus | in: nexthop_in, packet_in, ready_in; out: packet_ack |
| Interconnect SourcePart | in: dequeue; out: ready (or packet_en), packet_out, nexthop_out, timestamp_out |
Configurable Fields
Configuration fields are set by shifting in 8-bit words one at a time through the config shifter. Each node's config_out_valid and config_out outputs should be registered so that when components are cascaded in a daisy chain we don't end up with a long unregistered path.
TrafficGen
- addr2 [7:0]
- size [15:0]: Size of the packets that this TG will generate. Only the lower 12 bits are used.
- threshold [31:0]: If the RNG output is lower than threshold, a new packet is generated
  - If the user wants packet interval = N timesteps, then threshold = 0xFFFFFFFF / N
- send_to [7:0]: Destination node that this TG will send to
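The threshold rule for the TrafficGen can be written as a small host-side helper. This sketch assumes a 32-bit RNG output that is uniformly distributed and compared against the threshold once per timestep, and it uses integer division; the function names are hypothetical.

```python
def tg_threshold(interval):
    """Threshold for an average packet interval of `interval` timesteps.

    Implements threshold = 0xFFFFFFFF / N with integer division.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1 timestep")
    return 0xFFFFFFFF // interval


def will_inject(rng_output, threshold):
    """A new packet is generated when the RNG output is lower than the threshold."""
    return rng_output < threshold
```

For example, `tg_threshold(4)` gives 0x3FFFFFFF, so roughly one RNG draw in four falls below the threshold.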
Router
- addr2 [7:0]
- Routing table entries (512 per 2 routers)
PacketQueue
- addr2 [7:0]
- nexthop [7:0]
- bandwidth [11:0]: Bandwidth of this link (max 4 KB per ms, i.e. 32 Mbps)
- latency [11:0]: Latency of this link (max 2 s)
To Do
- Add stats counters in the nodes
- Add data path to shift out the stats counters
- Make router drop unroutable packets
Source Partition Select
Selects the packet with the earliest timestamp among the source nodes and generates the select signal for the source partition bus.
Input
- clock, reset
- timestamp_in [1:0] x param_num_nodes: 2-bit timestamp for comparison.
Output
- packet_sel [N:0]: One-hot encoding of the node with the earliest packet
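As a behavioural reference for this block, the sketch below picks the node with the smallest 2-bit timestamp and returns a one-hot select value. It assumes a plain unsigned comparison of the 2-bit values, that every node presents a valid packet, and that ties go to the lowest node index; none of these details are specified on this page. The function name is hypothetical.

```python
def select_earliest(timestamps):
    """Return a one-hot integer selecting the node with the earliest timestamp.

    timestamps: list of 2-bit values, one per node (index = node number).
    ASSUMPTION: ties are broken in favour of the lowest node index.
    """
    if not timestamps:
        raise ValueError("need at least one node")
    if any(not 0 <= t <= 3 for t in timestamps):
        raise ValueError("timestamps are 2 bits wide")
    winner = min(range(len(timestamps)), key=lambda i: (timestamps[i], i))
    return 1 << winner
```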
Source Partition Bus
Input
- clock, reset
- packet_in [63:0] x param_num_nodes: Incoming packets to the bus MUX
- packet_sel [N:0]: One-hot select signal indicating which node is granted access to the bus
Output
- packet_out [63:0]: Packet selected for this partition
Parameter
- param_num_nodes: Number of nodes attached to this partition bus
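A one-hot select lets the bus MUX be built as an AND-OR structure with no encoder or decoder in the path. The model below is a sketch of that idea, not the actual RTL; it assumes the output is all zeros when no select bit is set, and the function name is made up.

```python
def one_hot_mux(packets, packet_sel):
    """AND-OR multiplexer: OR together every packet whose select bit is set.

    packets: list of 64-bit packet words, one per node.
    packet_sel: one-hot integer, bit i grants node i access to the bus.
    ASSUMPTION: the output is 0 when packet_sel is 0.
    """
    out = 0
    for i, packet in enumerate(packets):
        if (packet_sel >> i) & 1:
            out |= packet
    return out & 0xFFFFFFFFFFFFFFFF
```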
Destination Partition Bus
Input
- clock, reset
- packet_in [63:0] x num_source_partitions: Incoming packets to the crossbar output
Output
To dest_in_queues...
Parameter
- param_num_nodes: Number of nodes attached to this partition bus
