1
An Integrated Design Flow
for Low-Power DSPs
2
Motivation for the CATS-Platform
3
Why SIMD?
4
Why Synchronous Transfer Architecture (STA)?
  • Minimize Overhead
    • No Dynamic Scheduling
    • Instruction Memory: Custom Compression of Control Signals
  • Switchable data-flow network
  • Bridges the gap between processors and reconfigurable hardware
  • Additional requirement: separation of state and behavior enables effective compiler support
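A toy illustration of the points above, offered as a sketch only and not as the CATS or M5 implementation: each instruction word statically names which unit's output register feeds each input port of a functional unit, so the data-flow network is switched by the instruction itself and nothing is scheduled at run time. Behavior (the operation) is kept apart from state (the output register). All names here (`FunctionalUnit`, `step`, the instruction layout) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class FunctionalUnit:
    """Hypothetical unit: behavior (op) is separate from state (out)."""
    op: Callable[[int, int], int]
    out: int = 0


# One instruction word: for each unit that fires in this cycle, the names of
# the two units whose output registers drive its input ports (mux selects).
Instruction = Dict[str, Tuple[str, str]]


def step(units: Dict[str, FunctionalUnit], instr: Instruction) -> None:
    """Run one cycle: sample all outputs first, then update synchronously."""
    snapshot = {name: fu.out for name, fu in units.items()}
    for name, (src_a, src_b) in instr.items():
        units[name].out = units[name].op(snapshot[src_a], snapshot[src_b])


units = {
    "const3": FunctionalUnit(op=lambda a, b: 3, out=3),
    "const4": FunctionalUnit(op=lambda a, b: 4, out=4),
    "add": FunctionalUnit(op=lambda a, b: a + b),
    "mul": FunctionalUnit(op=lambda a, b: a * b),
}

# The schedule is fixed at compile time: one instruction word per cycle.
program = [
    {"add": ("const3", "const4")},  # cycle 1: add.out = 3 + 4
    {"mul": ("add", "add")},        # cycle 2: mul.out = add.out * add.out
]

for instr in program:
    step(units, instr)

print(units["mul"].out)
```

Because the compiler decides every transfer, the hardware needs no issue logic or hazard detection; the cost is that all latencies must be known statically.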
5
UML-Diagram of STA
6
Integrated Design Flow
7
Processor Description
8
Simulator and Debugger
  • LISA instruction set simulator
  • cycle accurate
  • model generated from processor description
9
VHDL-Simulation and Synthesis
10
FPGA - Emulation
11
Development Effort with Different Design Methodologies
12
High-Level Optimization of SIMD Architecture
13
Low-Level Optimization for Power-Saving Communication Architecture
14
Design Example: M5
  • Synthesis, place and route, and power simulation were performed by Michael Hosemann
  • Results based on VHDL model of M5-DSP
  • All data based on UMC 0.13 µm, 9-layer-metal process @ 1.2 V; 1 gate = 5.1 µm²
  • Synthesis to netlist using Synopsys Design Compiler
  • Place and route to obtain more accurate timing and area estimates using Cadence Encounter
  • UMC memories with higher area and power consumption compared to more advanced technologies
  • Power estimates created by Synopsys Power Compiler with generic wire-load models, better characterization under way
  • Uncompressed 2-port instruction memory
  • 2-port data memory
15
M5: Area and Timing
  • Memory:
    • Instruction (180 kbit, 2-port): 1.3 mm²
    • Data (1 Mbit, 2-port): 7.4 mm²
  • Logic:
    • Control: 0.054 mm²
    • 1 PE: 0.092 mm²
    • 16 PE: 1.472 mm²
    • total: 1.526 mm²


    • area utilization ≈ 88%
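The area figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below recomputes the logic total from the per-block numbers and converts it to gate equivalents using the 5.1 µm² gate size quoted for the process. The constant and function names are illustrative, and the gate-equivalent figure is a rough estimate that ignores the 88% utilization.

```python
# Figures copied from the slides; names are illustrative only.
GATE_AREA_UM2 = 5.1    # 1 gate = 5.1 um^2 (UMC 0.13 um process)
CONTROL_MM2 = 0.054    # control logic
PE_MM2 = 0.092         # one processing element
NUM_PES = 16
INSTR_MEM_MM2 = 1.3    # 180 kbit, 2-port
DATA_MEM_MM2 = 7.4     # 1 Mbit, 2-port


def logic_area_mm2(num_pes: int = NUM_PES) -> float:
    """Control logic plus num_pes identical processing elements."""
    return CONTROL_MM2 + num_pes * PE_MM2


def gate_equivalents(area_mm2: float) -> float:
    """Convert an area in mm^2 to gate equivalents (1 mm^2 = 1e6 um^2)."""
    return area_mm2 * 1e6 / GATE_AREA_UM2


logic = logic_area_mm2()
memory = INSTR_MEM_MM2 + DATA_MEM_MM2
total = logic + memory

print(f"logic : {logic:.3f} mm^2 (~{gate_equivalents(logic):,.0f} gate equivalents)")
print(f"memory: {memory:.3f} mm^2")
print(f"memory share of total area: {memory / total:.1%}")
```

On these numbers the two memories occupy 8.7 mm², several times the logic area, which is consistent with the note that the UMC memories are large.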
16
M5: Power Consumption
  • Memory:
    • Program: 28 mW
    • Data: 80 mW
  • Logic (worst case):
    • Control: 6 mW
    • 1 PE: 2.5 mW
    • 16 PE: 40 mW
    • total: 46 mW


    • No clock-gating yet
    • No low-power memories yet
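The same cross-check works for the power budget. This sketch sums the per-block figures from the slide and reports how the total splits between memory and logic; the names are illustrative, and the logic numbers are the worst-case values quoted above, without clock gating.

```python
# Power figures copied from the slide, in mW; names are illustrative only.
PROGRAM_MEM_MW = 28.0
DATA_MEM_MW = 80.0
CONTROL_MW = 6.0
PE_MW = 2.5
NUM_PES = 16


def logic_power_mw(num_pes: int = NUM_PES) -> float:
    """Worst-case logic power: control plus num_pes processing elements."""
    return CONTROL_MW + num_pes * PE_MW


memory_mw = PROGRAM_MEM_MW + DATA_MEM_MW
logic_mw = logic_power_mw()
total_mw = memory_mw + logic_mw

print(f"memory: {memory_mw:.0f} mW")
print(f"logic : {logic_mw:.0f} mW")
print(f"total : {total_mw:.0f} mW, memory share {memory_mw / total_mw:.0%}")
```

With 108 mW of the 154 mW total going to the memories, the budget supports the later remark that further power optimization should pay special attention to them.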
17
Cooperation Uni-Dortmund and TU-Dresden
18
Conclusions and Further Work
  • Presented integrated design flow is more effective than the traditional design flow
  • New compiler-efficient, low-power architecture
  • Powerful core generator
    • Generates VHDL, SystemC, and LISA™
    • Allows quick integration of functional units into a SIMD-STA
    • Web-interface for evaluation purposes

  • Matlab Compiler
    • Compiler backend (register allocator)
    • Generated assembler and debugger for pipelined FUs
    • Float/fixed point mapping
  • Energy Model
  • Additional Power Optimization of M5
    • Special attention: memories
19
"Thank you for your attention"
  • Please ask questions or visit us online: