07 Firesim Intro

FireSim is an open-source platform for scalable, FPGA-accelerated hardware simulation in the cloud. It generates FPGA-hosted simulators that separate the target machine under simulation from the host machine executing the simulation, enabling fast, cycle-exact hardware simulation with the flexibility of software-based simulators. FireSim simulations can be deployed automatically to cloud FPGAs on Amazon EC2 for inexpensive, elastic simulation capacity.


FireSim: An Open-Source Platform for Scalable FPGA-Accelerated Hardware Simulation in the Cloud

https://fires.im
@firesimproject

Sagar Karandikar, David Biancolin, Howard Mao, Alon Amid, Nathan Pemberton, Albert Magyar, Albert Ou, Qijing Huang, Randy Katz, Borivoje Nikolić, Jonathan Bachrach, Krste Asanović
The architect/chip-developer's design flow
1. High-level simulation
2. Write RTL + software, plug into your favorite ecosystem (e.g. Chipyard)
3. Co-design in software RTL simulation (e.g. Verilator, VCS)
• Run microbenchmarks
4. Co-design in FPGA-accelerated simulation
• Boot an OS and run the complete software stack; obtain realistic performance measurements
5. Tapeout → chip
• Boot an OS and run applications, but no further opportunity for co-design

What about FPGA prototyping?

[Figure: a taped-out SoC next to an FPGA prototype of the same SoC. Both are Rocket-based server blades (four Rocket cores with L1I/L1D, shared L2, a NIC, and other peripherals) attached to physical DRAM with ~100 ns access latency. The taped-out SoC runs at 1 GHz, so it sees a 100-cycle DRAM latency; the FPGA prototype runs the same RTL at ~100 MHz against the same physical DRAM, so it sees only a 10-cycle DRAM latency.]
The Difficulty with FPGA Prototypes
• Every FPGA clock executes one cycle of the simulated machine
• This exposes the latencies of the FPGA host's resources to the simulated world. Three implications:
1) FPGA resources may not accurately model the target (e.g., the DRAM latency on the previous slide)
2) Simulations are non-deterministic
3) Different host FPGAs produce different simulation results
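To make implication (2) concrete, here is a toy Python sketch (illustrative only, not FireSim code; the function `run` and its parameters are invented for this example). In "prototype" mode, whatever latency the host memory happens to have leaks into the measured cycle count; in a decoupled simulator, a modeled latency is charged instead, so the result is independent of the host.

```python
def run(num_loads, host_latency, decoupled, modeled_latency=100):
    """Count target cycles for a stream of dependent loads.

    In prototype mode the target observes the host memory's actual
    latency; in decoupled mode the simulator stalls the target and
    charges a fixed modeled latency instead.
    """
    cycles = 0
    for _ in range(num_loads):
        cycles += modeled_latency if decoupled else host_latency
        cycles += 1  # the load instruction itself
    return cycles

# Two hosts with different memory latencies (e.g. two different FPGA boards):
proto_a = run(10, host_latency=8, decoupled=False)
proto_b = run(10, host_latency=12, decoupled=False)
deco_a = run(10, host_latency=8, decoupled=True)
deco_b = run(10, host_latency=12, decoupled=True)

print(proto_a != proto_b)  # True: host latency leaks into the result
print(deco_a == deco_b)    # True: decoupled results match across hosts
```

The decoupled runs agree cycle-for-cycle no matter which host executes them, which is exactly the determinism property FireSim's host decoupling provides.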
Want HW simulators that:
• Are as fast as silicon
• Are as detailed as silicon
• Have all the benefits of SW-based simulators
• Are low-cost

Our thesis:
• FPGAs are the only viable base technology
→ Build FPGA-accelerated simulators with SW-like flexibility using an open-source tool
How? Useful Trends Throughout the Stack
• Open ISA (RISC-V)
• Open, silicon-proven SoC implementations
• High-productivity hardware design language & IR (Chisel/FIRRTL)
• FPGAs in the cloud
FireSim at 35,000 feet
• Open-source, fast, automatic, deterministic FPGA-accelerated hardware simulation for pre-silicon verification and performance validation
• Ingests:
• Your RTL design (FIRRTL, either via Chisel, or Verilog via Yosys)
• HW and/or SW I/O models (e.g. UART, Ethernet, DRAM)
• Workload descriptions
• Produces:
• Fast, cycle-exact simulation of your design plus the models around it
• Automatically deployed to cloud FPGAs (AWS EC2 F1)
Three Distinguishing Features of FireSim
1) Not FPGA prototypes, but FPGA-accelerated simulators
• Automatic transformation of designs into FPGA-accelerated simulators
• Enables new debugging, resource-optimization, and profiling capabilities
2) Uses cloud FPGAs
• Inexpensive, elastic supply of large FPGAs
• Easy to collaborate with other researchers
• Heavy automation to hide FPGA complexity
3) Open-source (https://fires.im)
Separating Target and Host
• Target: the machine under simulation
• Host: the machine executing (hosting) the simulation

[Figure: the target is a multiprocessor built from the taped-out RTL, running at 1 GHz and seeing a 100 ns DRAM latency; it forms a closed simulation world. The host is the physical platform: an FPGA fabric with a memory channel to physical DRAM, which has its own ~100 ns latency.]
FireSim Generates FPGA-Hosted Simulators

[Figure: the closed target world (the multiprocessor RTL as taped-out, 1 GHz, 100 ns DRAM latency) is hosted on the physical platform: an FPGA fabric attached to physical DRAM with ~100 ns latency.]
Host Decoupling in FireSim: Transforming the Target
1) Convert the RTL into a latency-insensitive [1] model using a FIRRTL transform
2) Generate an FPGA-hosted model for DRAM [2] (think DRAMSim on an FPGA): the FASED DDR3 timing model stands in for the target's 4 GB DRAM
3) Generate queues (token channels) to connect the target models

[1] Theory of Latency-Insensitive Design, Carloni et al.; see also: RAMP
[2] FASED: FPGA-Accelerated Simulation and Evaluation of DRAM, Biancolin et al.
Host Decoupling in FireSim: Mapping to the FPGA
4) Allocate host resources to the models

[Figure: the RTL design and the FASED DRAM timing model sit together on the FPGA fabric, connected by request and response token queues. The timing model services requests from physical DRAM over the memory channel (~100 ns host latency) while charging the target a modeled 100-cycle latency.]

The SoC sees a realistic DRAM latency.
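The token-channel decoupling above can be sketched in a few lines of Python (an illustrative toy, not the actual FIRRTL transform or FASED RTL; `simulate` and `MODELED_LATENCY` are names invented for this sketch). A core model and a DRAM timing model each advance one target cycle only by exchanging tokens through queues, so target time is fully divorced from how fast the host delivers data.

```python
from collections import deque

MODELED_LATENCY = 4  # target cycles the timing model charges per request

def simulate(num_cycles, request_cycles):
    """Advance a core model and a DRAM timing model in token lockstep.

    Each target cycle, the core consumes one response token and emits
    one request token (real or idle), and the timing model does the
    reverse; neither side can advance without the other's token.
    """
    req_q, resp_q = deque(), deque()
    inflight = []    # (ready_cycle, request_id) inside the timing model
    responses = []   # (target_cycle_received, request_id) at the core
    resp_q.append(None)  # prime the response channel so cycle 0 can run
    for cycle in range(num_cycles):
        # Core: consume this cycle's response token, emit a request token.
        token = resp_q.popleft()
        if token is not None:
            responses.append((cycle, token))
        req_q.append(cycle if cycle in request_cycles else None)
        # Timing model: consume the request token, emit a response token.
        req = req_q.popleft()
        if req is not None:
            inflight.append((cycle + MODELED_LATENCY, req))
        if inflight and inflight[0][0] <= cycle + 1:
            resp_q.append(inflight.pop(0)[1])
        else:
            resp_q.append(None)
    return responses

# A request issued at target cycle 1 completes at cycle 5: exactly the
# modeled 4-cycle latency, regardless of host speed.
print(simulate(10, request_cycles={1}))
```

However slowly (or unevenly) the host fills these queues, the target always observes the same modeled latency, which is why the FPGA-hosted simulator stays cycle-exact and deterministic.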
Benefits of Host Decoupling on FPGAs
Simulations:
• Execute deterministically
• Produce identical results on different hosts (FPGAs & CPUs)

This enables support for:
1. SW co-simulation (e.g. block device, network models)
2. Simulating large targets over distributed hosts (ISCA '18, Top Picks '18)
3. Non-invasive debugging and instrumentation (FPL '18, ASPLOS '20)
4. Multi-cycle resource optimizations (ICCAD '19)
What Can You Do With FireSim?
Target Designs Available in FireSim
• Chipyard (https://github.com/ucb-bar/chipyard)
• Combines a large collection of open-source IP to enable constructing complex RISC-V SoCs
• RISC-V cores: Rocket (in-order), Berkeley Out-of-Order Machine (BOOM), Ariane (Verilog in-order core)
• Uncore components: L1 and L2 caches, TileLink crossbar interconnect, TileLink ring interconnect
• Accelerators: Hwacha vector co-processor, Gemmini systolic array (ML)
• Peripherals: disk, UART, Ethernet NIC
• NVIDIA Deep Learning Accelerator (NVDLA)
• https://github.com/nvdla/firesim-nvdla
• Vivado HLS-generated accelerators
• PicoRV32 (via Yosys)
• Bring your own:
• Chisel (really, FIRRTL) designs get all features
• Verilog can be ingested through Yosys or black-box FAME-1 clock gating
Example Use Cases: Evaluating SoC Designs
• Performance measurement
• Run SPECint 2017 with reference inputs on Rocket Chip in parallel on ~10 FPGAs within a day (e.g., D. Biancolin et al., FASED, FPGA '19)
• Rapid full-system design-space exploration
• Data-parallel accelerators (Hwacha) and multi-core processors
• Complex software stacks (Linux, OpenMP, GraphMat, Caffe)
Example Use Cases: Evaluating SoC Designs
• Security
• BOOM Spectre replication (A. Gonzalez et al., Replicating and Mitigating Spectre Attacks on an Open Source RISC-V Microarchitecture, CARRV '19)
• Keystone Enclave performance evaluation (D. Lee et al., Keystone, EuroSys '20)
• Accelerator evaluation
• Chisel-based accelerators:
• ML (H. Genc et al., Gemmini, arXiv)
• Garbage collection (M. Maas et al., A Hardware Accelerator for Tracing Garbage Collection, ISCA '18)
• NVDLA (F. Farshchi et al., Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim, EMC2 '19)
• HLS-based rapid prototyping (Q. Huang et al., Centrifuge, ICCAD '19)
• Novel scale-out systems
• nanoPU NIC-CPU co-design (S. Ibanez et al., nanoPU, OSDI '21)
Example Use Cases: Debugging and Profiling SoC Designs
• Debugging a Chisel design at FPGA speeds
• e.g. FireSim debugging docs
• e.g. fixing BOOM bugs (D. Kim et al., DESSERT, FPL '18)
• Profiling a custom RISC-V SoC at FPGA speeds
• e.g. HW/SW co-design of a networked RISC-V system with FirePerf (S. Karandikar et al., FirePerf, ASPLOS 2020)
How to Build a Datacenter-Scale FireSim Simulation

[1] S. Karandikar et al., "FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud," ISCA 2018
[2] S. Karandikar et al., "FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud," IEEE Micro Top Picks 2018
The New Datacenter Hardware Environment
• The end of Moore's Law
• Custom silicon in the cloud
• Faster networks (e.g. silicon photonics)
• Deeper memory/storage hierarchies (e.g. 3D XPoint, HBM)
• New datacenter architectures (e.g. disaggregation) [1]
Disaggregated Datacenters

[Diagram from Gao et al., OSDI '16]
Mapping a Datacenter Simulation
• DC simulation requires:
• Modeling hardware at scale, cycle-accurately
• Running real software
• Co-simulating RTL with abstract SW models
• Server simulations
• A good fit for the FPGA
• We have tapeout-proven RTL: FAME-1 transform with Golden Gate
• Network simulation
• Little parallelism in switch models (e.g. a thread per port)
• Need to coordinate all the distributed server simulations
• So use CPUs + the host network

[Figure: an f1.16xlarge host. The instance CPU runs the switch model; 8 FPGAs, attached over PCIe, each host server simulations; hosts connect to each other over host Ethernet (the EC2 network).]
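The coordination between the distributed server simulations and the CPU-hosted switch model can be sketched in plain Python (an illustrative toy, not FireSim's actual switch model; `SwitchModel` and `LINK_LATENCY` are names invented here). The switch exchanges one batch of link tokens with every port per round, so a packet injected on one link emerges on another exactly `LINK_LATENCY` token-rounds later, no matter how fast each host runs.

```python
LINK_LATENCY = 3  # modeled link latency, in token rounds

class SwitchModel:
    """Toy ToR switch: forwards each packet token to its destination
    port after a fixed modeled link latency."""
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.pending = []  # (deliver_round, dst_port, payload)

    def exchange(self, round_num, tokens_in):
        # Consume one token per input port for this round.
        for pkt in tokens_in:
            if pkt is not None:
                dst, payload = pkt
                self.pending.append((round_num + LINK_LATENCY, dst, payload))
        # Emit one token per output port for this round.
        out = [None] * self.num_ports
        still_pending = []
        for deliver, dst, payload in self.pending:
            if deliver == round_num and out[dst] is None:
                out[dst] = payload
            else:
                still_pending.append((deliver, dst, payload))
        self.pending = still_pending
        return out

switch = SwitchModel(num_ports=4)
deliveries = {}
for r in range(8):
    # Server 0 sends a packet to server 2 in round 1; others stay idle.
    tokens = [None] * 4
    if r == 1:
        tokens[0] = (2, "ping")
    out = switch.exchange(r, tokens)
    for port, payload in enumerate(out):
        if payload is not None:
            deliveries[port] = (r, payload)

print(deliveries)
```

Because every simulation must hand the switch a token for each round before anyone proceeds, the whole distributed system stays globally synchronized, which is what makes the network simulation cycle-accurate.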
Step 1: Server SoC in RTL

[Figure: one server-blade SoC: four Rocket cores, each with L1I/L1D, a shared L2, a NIC, and other peripherals.]

Modeled system:
- 4x RISC-V Rocket cores @ 3.2 GHz
- 16K I/D L1$
- 256K shared L2$
- 200 Gb/s Ethernet NIC
Resource utilization:
- < ¼ of an FPGA
Sim rate:
- N/A
Step 2: FPGA Simulation of One Server Blade

[Figure: the server-blade SoC mapped onto the FPGA fabric, with a DRAM model, a NIC simulation endpoint, other-peripheral endpoints, and PCIe to the host.]

Modeled system:
- 4x RISC-V Rocket cores @ 3.2 GHz
- 16K I/D L1$
- 256K shared L2$
- 200 Gb/s Ethernet NIC
- 16 GB DDR3
Resource utilization:
- < ¼ of an FPGA
- ¼ of the memory channels
Sim rate:
- ~150 MHz
- ~40 MHz (networked)
Step 3: FPGA Simulation of 4 Server Blades

[Figure: four server-blade simulations on one FPGA, each with its own DRAM channel.]

Modeled system:
- 4 server blades
- 16 cores
- 64 GB DDR3
Resource utilization:
- < 1 FPGA
- 4/4 memory channels
Sim rate:
- ~14.3 MHz (networked)
Cost:
- $0.49 per hour (spot)
- $1.65 per hour (on-demand)
Step 4: Simulating a 32-Node Rack

[Figure: 8 FPGAs on one f1.16xlarge, each hosting 4 server-blade simulations; the host instance CPU runs the ToR switch model.]

Modeled system:
- 32 server blades
- 128 cores
- 512 GB DDR3
- 32-port ToR switch
- 200 Gb/s, 2 us links
Resource utilization:
- 8 FPGAs = 1x f1.16xlarge
Sim rate:
- ~10.7 MHz (networked)
Cost:
- $2.60 per hour (spot)
- $13.20 per hour (on-demand)
Step 5: Simulating a 256-Node "Aggregation Pod"

[Figure: 8 racks of 32 nodes each, joined by an aggregation switch; each rack is one f1.16xlarge whose host CPU runs a ToR switch model.]

Modeled system:
- 256 server blades
- 1024 cores
- 4 TB DDR3
- 8 ToRs, 1 aggregation switch
- 200 Gb/s, 2 us links
Resource utilization:
- 64 FPGAs = 8x f1.16xlarge
- 1x m4.16xlarge
Sim rate:
- ~9 MHz (networked)
Step 6: Simulating a 1024-Node Datacenter

[Figure: four aggregation pods joined by a root switch; each pod is 8 racks behind an aggregation switch.]

Modeled system:
- 1024 servers
- 4096 cores
- 16 TB DDR3
- 32 ToRs, 4 aggregation switches, 1 root switch
- 200 Gb/s, 2 us links
Resource utilization:
- 256 FPGAs = 32x f1.16xlarge
- 5x m4.16xlarge
Sim rate:
- ~6.6 MHz (networked)

This harnesses millions of dollars of FPGAs to simulate 1024 nodes cycle-exactly, with a cycle-accurate network simulation and global synchronization, at a cost to the user of only hundreds of dollars per hour.
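As a back-of-the-envelope check on these simulation rates (simple arithmetic using the numbers from the slides above, not FireSim output): simulating 3.2 GHz target cores at the ~6.6 MHz networked rate of the 1024-node configuration is roughly a 485x slowdown.

```python
TARGET_CLOCK_HZ = 3.2e9  # modeled Rocket core clock
SIM_RATE_HZ = 6.6e6      # networked simulation rate at 1024 nodes

slowdown = TARGET_CLOCK_HZ / SIM_RATE_HZ
wallclock_per_target_second = slowdown  # wall-clock seconds per target second

print(f"slowdown: ~{slowdown:.0f}x")
print(f"1 target second ≈ {wallclock_per_target_second / 60:.1f} wall-clock minutes")
```

In other words, one second of simulated datacenter time costs on the order of eight wall-clock minutes, which is what makes running real software stacks at this scale practical.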
Join the FireSim Community!
• Companies publicly announced using FireSim
• Esperanto Maxion ET
• Intensivate IntenCore
• SiFive validation paper @ VLSI '20
• Chipyard integration
• Projects with public FireSim support
• Rocket Chip, BOOM
• Hwacha vector accelerator
• Keystone secure enclave
• NVIDIA Deep Learning Accelerator (NVDLA)
• https://github.com/nvdla/firesim-nvdla
• https://devblogs.nvidia.com/nvdla/
• BOOM Spectre replication/mitigation
• More in progress! PR yours!
• Many academic users
• ISCA '18: Maas et al., HW-GC accelerator (Berkeley)
• MICRO '18: Zhang et al., "Composable Building Blocks to Open up Processor Design" (MIT)
• RTAS '20: Farshchi et al., BRU (Kansas)
• EuroSys '20: Lee et al., Keystone (Berkeley)
• OSDI '21: Ibanez et al., nanoPU (Stanford)
• See the FireSim website for more!
• Education
• Berkeley CS152/252
• CCC/RISC-V Summit tutorials
• MICRO 2019 full-day tutorial
• More than 100 mailing list members
• More than 250 unique cloners per week

FireSim's ISCA '18 paper was selected as an IEEE Micro Top Pick of 2018 architecture conferences and as the CACM Research Highlights nominee from ISCA '18.
Wrapping Up: Productive Open-Source FPGA Simulation
• github.com/firesim/firesim, BSD licensed
• An "easy" button for fast, FPGA-accelerated full-system simulation
• Plug in your own RTL designs, your own HW/SW models
• One click: parallel FPGA builds, simulation run/result collection, building target software
• Scales to a variety of use cases:
• Networked (performance depends on scale)
• Non-networked (150+ MHz), limited by your budget
• firesim command-line program
• Like docker or vagrant, but for FPGA sims
• The user doesn't need to care about the distributed magic happening behind the scenes

[Figure: the FireSim developer environment]
Wrapping Up: Productive Open-Source FPGA Simulation
• Scripts can call firesim to fully automate distributed FPGA simulation
• Reproducibility: included scripts reproduce the ISCA 2018 results
• e.g. scripts to automatically run SPECint 2017 with reference inputs in ≈1 day
• Many others

    $ cd fsim/deploy/workloads
    $ ./run-all.sh

• 130+ pages of documentation: https://docs.fires.im
• AWS provides grants for researchers: https://aws.amazon.com/grants/
Learn More:
Web: https://fires.im
Docs: https://docs.fires.im
GitHub: https://github.com/firesim/firesim
Mailing List:
https://groups.google.com/forum/#!forum/firesim
@firesimproject
Questions? Email: [email protected]

The information, data, or work presented herein was funded in part by the Advanced
Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award
Number DE-AR0000849, and by DARPA, Award Number HR0011-12-2-0016. Research
was also partially funded by ADEPT Lab industrial sponsors and affiliates Intel, Apple,
Futurewei, Google, and Seagate, and RISE Lab sponsor Amazon Web Services. The
views and opinions of authors expressed herein do not necessarily state or reflect
those of the United States Government or any agency thereof.
