- This my first to attend COSCUP and this is my note about the talks from SiFive.
- Speaker
- Hong-Rong Hsu
- SiFive technical Head
- AI runtime/system software
- Hong-Rong Hsu
Intro of SiFive #
- From Risc-V inventors
- From 2015, july23
- 食譜的發明者兼廚師
- 全球1000個員工
- 市值2.5Billion
Before we start #
- Risc-V is the trend
- 預估到 2025有62.4 billion的risc-v需求
- AI/ML is the trend
- Every domain needs CPU
Our Risc-V CPU can satisfy your AI requirement w/o attaching other GPU/DLA
- DLA: Deep Learning Accerlators
SiFIve Risc-V Product Families #
- Sifive Intellengence: https://www.sifive.com/cores/intelligence
Sifive Intellengence X280 Processor #
- Vector Processing for AI/ML workloads
- seems like arm's SIMD but more flexible
- 512-bit vector length extension with RISC-V Vectors(RVV) 1.0
- Sifive Intellengence Extensions, custom instructions that accerlerate AI/ML performance
Sifive Intellengence Extensions accelerate AI Computation #
Sifive Intellengence: Designed for evolving AI needs #
若過了幾年有新的算法,已經設計好的DLA是無法支援的
Support for softmax, TopK, etc
提供的粒度是比較細的,提供linear/non-linear的function可以使用
Programmability is the key, we provide it
- RVV intrinsic
How about Software? #
TensorflowLite+ NN library #
TFLite is a simple and quick deployment runtime
Dispatch op to Sifive NN library
- optimized by RVV intinsic and extended instructions
Performance
- MobileBERT-int8(NLP) achieve ? 3FPS @1G x280-dual cores
- MobileNetV3-unint8(Classification) achieve > 70FPS @1G
The limits of TFLite #
Pros
- Easy to deploy. Time to market
Cons
- Bad parallelism
- Graph optimization limitation. We want the best perfromance on SiFIve Platform
- 一個單純的interperter,無法做到op fusion
- Only support TF/TFlite
What is the something Great? #
- Interpreter is not enough, we need:
- Compiler: More optimization & code-gen automatically 不用再手刻
- Runtime: Execute Efficiently
Recommendation Software #
MLIR
Multi-Level Intermediate Representation
Mainly solve 2 problems
- Significantly reduce the cost of buliding domain specific compilers
- Software segmentation: 解決太多廠商重複造輪子(AI compiler)
如何讓更多廠商進來協作?
- MLIR Dialects(方言)
- 可以讓多種IR放進來
- 提供run trip
- MLIR Dialects(方言)
是透過LLVM去做code gen
IREE
- https://github.com/iree-org/iree
- 接下來想推在Risc-V上的
- from Google
- MLIR的其中一個User
- 目前Sifive跟Google cowork去打開risc-V上的使用
- AI compiler and runtime
IREE RVV code-gen
- End to End flow
- TFlite-> TOSA MLIR -> Linalg -> Vector +scf -> LLVM IR ->RVV
- End to End flow
Perforamce of IREE
- IREE vs TFLite
- mobileBERT-f32: IREE +70% than TFLite
- IREE vs TFLite
Summary #
HW: SiFive intellengence Provide not only computation but also programmablity
SW
- TFlite
- IREE
- 提供 QEMU 來模擬
🙏🙏🙏
Since you've made it this far, sharing this article on your favorite social media network would be highly appreciated 💖! For feedback, please ping me on Twitter.
Published