Contents
1. NeuroPilot Introduction
1.1 NeuroPilot Software Ecosystem
1.1.1 Neural Network Framework Support
1.1.2 NeuroPilot Software Tools
1.1.2.1 Neuron SDK
1.1.2.2 Android Run-Time Libraries
1.2 MediaTek Device Capabilities
1.2.1 Hardware Support
1.2.1.1 Device Parametric Table
1.2.2 Devices
1.2.2.1 CPU
1.2.2.2 GPU
1.2.2.3 MVPU
1.2.2.4 MDLA
2. Hardware Support Specification
2.1 Hardware Specifications
2.1.1 Dimensity 9000
2.1.2 APU
2.1.2.1 MVPU 2.0
2.1.2.2 MDLA 3.0
2.2 Supported Operations
2.2.1 TFLite Operations
2.2.1.1 Supported Data Types
2.2.1.2 Supported NeuroPilot Operations
2.2.1.3 Supported Hardware Operations
2.2.2 MDLA 3.0 Guidelines
2.2.2.1 General Restrictions
2.2.2.2 Supported OPs Specification
2.2.2.3 Limitations of Broadcasting
2.2.3 MVPU 2.0 Guidelines
2.2.3.1 General Restrictions
2.2.3.2 TensorFlow and TFLite Operations
2.3 NeuroPilot SDK
1. NeuroPilot Introduction

1.1 NeuroPilot Software Ecosystem
NeuroPilot is a collection of software tools and APIs which are at the center of MediaTek’s AI ecosystem. These tools are designed to fulfill the goal of “Edge AI”, which means that AI processing is performed locally on the device rather than remotely on a server. With NeuroPilot, users can develop and deploy AI applications on edge devices with extremely high efficiency. This makes a wide variety of AI applications run faster, while also keeping data private.
MediaTek’s hardware platforms, such as mobile System-on-Chip (SoC) and ultra-low power embedded devices, span different levels of compute density. MediaTek is deeply invested in creating an AI ecosystem with efficient yet powerful AI-processors in its devices, ranging from smartphones to smart homes, wearables, Internet-of-Things (IoT), and connected cars.
Open frameworks such as TensorFlow offer out-of-the-box usability, but typically lack optimized support for advanced hardware. NeuroPilot allows users to use all the available hardware resources of a MediaTek AI platform, beyond that offered by an open framework. NeuroPilot provides programming support for specialized device capabilities, which allows for better performance, power, memory consumption, and end-user experience.
1.1.1 Neural Network Framework Support
NeuroPilot’s software tools support common AI frameworks such as TensorFlow, PyTorch, and TensorFlow Lite (TFLite). NeuroPilot provides support for inspecting, loading, and converting models, either to MediaTek-optimized model formats, or open framework standard model formats.
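As a point of reference, the open-framework baseline for this conversion is TensorFlow's standard TFLite converter. The minimal sketch below converts a stand-in Keras model to a .tflite flatbuffer; this is the kind of step that NeuroPilot's Converter tool automates from the command line.

```python
# Minimal sketch: convert a trained Keras model to a .tflite flatbuffer
# with TensorFlow's standard converter. The model here is a trivial
# stand-in; in practice this is your trained network.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```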
For Android devices, NeuroPilot provides extensions for the Android Neural Network API (NNAPI). This enables developers and device makers to bring their code closer to the hardware, for better performance and power-efficiency on MediaTek devices. NeuroPilot also allows developers to use a ‘write once, apply everywhere’ flow for existing and future MediaTek devices, including smartphones, automotive, smart home, IoT, and more. This streamlines the creation process, saving cost and time to market.
1.1.2 NeuroPilot Software Tools
| Tool | Type | Description |
|---|---|---|
| AISimulator | Web tool | Simulates a neural network workload on MediaTek's AI Processing Unit (APU). |
| Android Run-Time Libraries | Library | Libraries which provide NNAPI delegates for special-purpose hardware cores (GPU, VPU, MDLA), and support for dynamic scheduling. |
| Converter | Command line tool | Converts a pre-trained and optimized PyTorch or TensorFlow model into a TensorFlow Lite model, and performs post-training quantization. |
| Neuron SDK | Command line tools, API, Library | A TFLite model compiler which produces ready-to-run compiled binary models (.dla). |
| Quantization | Command line tool | Optimizes a model for efficient inference on MediaTek devices using quantization-aware training. |
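The post-training quantization mentioned in the Converter row follows the standard TFLite recipe. The sketch below shows full-integer quantization under assumed inputs: the saved-model path and the random representative dataset are placeholders to adapt to your own model and calibration data.

```python
# Sketch of full-integer post-training quantization with the standard
# TFLite converter. The representative dataset below yields random
# placeholders; real calibration data should come from your own inputs.
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # ~100 calibration samples shaped like the model's real input.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
# Force integer-only kernels, which is what INT8 accelerator targets expect.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

quantized_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(quantized_model)
```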
1.1.2.1 Neuron SDK
Neuron SDK allows users to convert their custom models to MediaTek-proprietary binaries for deployment on MediaTek platforms. The resulting models are highly efficient, with reduced latency and a smaller memory footprint. Users can also create a runtime environment, parse compiled model files, and perform inference on the edge. Neuron SDK is aimed at users who are performing bare metal C/C++ programming for AI applications, and offers an alternative to the Android Neural Networks API (NNAPI) for deploying Neural Network models on MediaTek-enabled Android devices.
Neuron SDK consists of the following components:
- Neuron Compiler (ncc-tflite): An offline neural network model compiler which produces statically compiled deep learning archive (.dla) files.
- Neuron Runtime (neuronrt): A command line tool which executes a specified .dla file and reports the results.
- Neuron Runtime API: A user-invoked API which supports loading and running compiled .dla files within a user's C++ application.
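Putting these together, a typical flow compiles a .tflite model once offline and then executes the resulting .dla. The sketch below drives both command line tools from Python; the flag spellings are assumptions for illustration and should be verified against each tool's --help output.

```python
# Hypothetical sketch: compile a TFLite model to a .dla archive with
# ncc-tflite, then execute it with neuronrt. Flag names are assumptions
# for illustration; consult the Neuron SDK documentation for the
# authoritative options.
import subprocess

MODEL = "model.tflite"
DLA = "model.dla"

# Offline compilation: TFLite flatbuffer -> statically compiled .dla.
# "--arch mdla3.0" (assumed flag) selects the MDLA 3.0 target.
subprocess.run(["ncc-tflite", "--arch", "mdla3.0", "-o", DLA, MODEL],
               check=True)

# On-device execution: feed a raw input buffer, collect the output.
# "-m hw" (assumed flag) requests real hardware rather than simulation.
subprocess.run(
    ["neuronrt", "-m", "hw", "-a", DLA, "-i", "input.bin", "-o", "output.bin"],
    check=True,
)
```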
1.1.2.2 Android Run-Time Libraries
MediaTek provides several run-time libraries for Android devices. These libraries allow for greater control and utilization of MediaTek special-purpose cores. The main library which implements most of this capability is an optimized Android Neural Network API (NNAPI) library, which is part of the Android NDK. The NNAPI library provides NNAPI hardware delegates, which enable the use of the GPU, VPU, and MDLA cores when running neural networks. This means that any NNAPI application can use MediaTek acceleration cores without any special changes to the application code. This accelerator support also covers .tflite models running in the Android TFLite run-time layer.
| Note: |
| MediaTek provides ready-to-run Android libraries for all Android-compatible devices. The developer does not need to interact with these libraries directly, and no special settings are required to use them. |
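Because acceleration happens underneath NNAPI, application code keeps the standard TFLite shape. The sketch below is a plain host-side Python run for illustration; on a MediaTek Android device, the same .tflite model is routed through the NNAPI delegates with no code changes.

```python
# Standard TFLite inference; nothing here is MediaTek-specific. On a
# MediaTek Android device the NNAPI layer routes supported ops to the
# GPU/VPU/MDLA transparently.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Random input shaped to the model's expectation, for illustration.
x = np.random.rand(*inp["shape"]).astype(inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```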
1.2 MediaTek Device Capabilities
MediaTek devices deliver outstanding performance for AI applications while consuming very little power.
1.2.1 Hardware Support
NeuroPilot tools can use the following target compute devices to run neural network models.
- CPU
- GPU
- VPU (Vision Processing Unit)
- MDLA (MediaTek Deep Learning Accelerator)
Successful use of these cores depends on the following factors, which interact with a user’s model.
- Neural network framework format of the trained model.
- Hardware platform (e.g. part number and device capability).
- Required model accuracy. Models with high accuracy requirements might limit the type and significance of the optimizations that can be applied to the model. This might also limit the target devices that are able to run the model with the required performance and accuracy.
- Neural network model structure. Certain operation (OP) types are not supported on certain target devices. For details, refer to the Supported Operations section of the platform's documentation.
1.2.1.1 Device Parametric Table
| Device | Operator Flexibility | Performance | Power Consumption | Data Types |
|---|---|---|---|---|
| CPU | Very High | Low | High | FP32, FP16, INT16, INT8 |
| GPU | Medium | Medium | Medium | FP32, FP16 |
| VPU | Medium | High | Low | FP32, FP16, INT16, INT8 |
| MDLA | Low | Very High | Low | FP16, INT16, INT8 |
As a general rule, target the most power-efficient device that your neural network and development constraints allow. On MediaTek platforms, the lowest-power devices (VPU and MDLA) are also the highest-performing, as the table above shows.
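As a concrete reading of the table, the hypothetical helper below picks the most power-efficient device whose supported data types cover a model's needs. It simply encodes the table's values; it is a reading aid, not a NeuroPilot API.

```python
# Hypothetical reading aid for the parametric table above: choose the
# most power-efficient device whose supported data types cover the
# model. Not a NeuroPilot API; values are transcribed from the table.
DEVICES = [
    # (name, power rank: lower is more efficient, supported dtypes)
    # MDLA is listed before VPU so the performance edge breaks the tie.
    ("MDLA", 0, {"FP16", "INT16", "INT8"}),
    ("VPU",  0, {"FP32", "FP16", "INT16", "INT8"}),
    ("GPU",  1, {"FP32", "FP16"}),
    ("CPU",  2, {"FP32", "FP16", "INT16", "INT8"}),
]

def pick_device(model_dtypes):
    candidates = [d for d in DEVICES if set(model_dtypes) <= d[2]]
    return min(candidates, key=lambda d: d[1])[0] if candidates else None

print(pick_device({"INT8"}))          # MDLA
print(pick_device({"FP32", "FP16"}))  # VPU (MDLA lacks FP32)
```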
1.2.2 Devices
1.2.2.1 CPU
The CPU is capable of running any neural network, and is guaranteed to support all existing and future NN operations. On Android devices, support is provided by Google's TFLite subsystem. For native development, developers can use the TFLite C++ API. The CPU is the most flexible target device, but it is also the least optimized for power and performance.
1.2.2.2 GPU
The GPU provides neural network acceleration for floating point models.
- ARM-based MediaTek platforms support GPU neural network acceleration via Arm NN and the Arm Compute Library.
- Non-ARM MediaTek platforms support GPU neural network acceleration via Google's TensorFlow Lite GPU delegate. This GPU delegate is able to accelerate a wide selection of TFLite operations, as sketched below.
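For illustration, on a desktop host the GPU delegate can be loaded explicitly via TFLite's load_delegate API. The shared-library name below is an assumption that depends on how the delegate was built; Android apps get the GPU delegate through the TFLite Java/C++ APIs instead.

```python
# Sketch: run a .tflite model through the TensorFlow Lite GPU delegate.
# tf.lite.experimental.load_delegate is the standard API; the .so name
# is an assumption that depends on your build of the GPU delegate.
import tensorflow as tf

gpu_delegate = tf.lite.experimental.load_delegate(
    "libtensorflowlite_gpu_delegate.so")  # assumed library name

interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[gpu_delegate])
interpreter.allocate_tensors()
# Supported float ops now execute on the GPU; the rest fall back to CPU.
```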
1.2.2.3 MVPU
The MediaTek Vision Processing Unit (MVPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The MVPU also offers outstanding performance while running AI models.
1.2.2.4 MDLA
The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.
The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.
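The scheduling pattern can be illustrated with a toy sketch. This is not MDLA code: it just shows each tile flowing through two fused stages while held in a small local buffer, so the full intermediate tensor is never written out to memory.

```python
# Illustrative sketch (not MDLA code): tile-based layer fusion on a
# toy two-stage pipeline. Instead of materializing the full stage-1
# intermediate before running stage 2, each tile is pushed through
# both stages while "on-chip" (here: a small slice in local scope),
# which is the bandwidth-saving idea described above.
import numpy as np

def stage1(x):          # stand-in for a conv/activation layer
    return np.maximum(x, 0.0)

def stage2(x):          # stand-in for an element-wise layer
    return x * 0.5

def fused_tiled(x, tile=64):
    out = np.empty_like(x)
    for start in range(0, x.size, tile):
        local = x[start:start + tile]                    # bring one tile "on-chip"
        out[start:start + tile] = stage2(stage1(local))  # fuse both stages
    return out

x = np.random.randn(1024).astype(np.float32)
assert np.allclose(fused_tiled(x), stage2(stage1(x)))
```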
2. Hardware Support Specification
The following MediaTek platforms support NeuroPilot 5:
2.1 Hardware Specifications
2.1.1 Dimensity 9000
| Feature | D9000 |
|---|---|
| Process | TSMC 4nm (N4) |
| CPU | 1x Arm Cortex-X2 at 3.05GHz with 1MB L2, 3x Arm Cortex-A710 at 2.85GHz, 4x Arm Cortex-A510 at 1.8GHz |
| GPU | Arm Mali-G710 MC10 |
| Memory | 4x LPDDR5X at 7500Mbps |
| Camera | 4K30 3-exp Video HDR x 3CAM |
| AI | MediaTek APU 590 |
| Video Decoder | 8K 30fps |
| Video Encoder | 8K 24fps |
| Display | 2480x2200 120Hz |
| Connectivity | Wi-Fi 6E 2x2, 160MHz bandwidth |
| Modem | 5G NR 3CC 300MHz with ET 60MHz |
2.1.2 APU
The MediaTek AI Processing Unit (APU) is a high-performance hardware engine for deep learning, optimized for bandwidth and power efficiency. The APU architecture consists of big, small, and tiny cores. This highly heterogeneous design is suited for a wide variety of modern smartphone tasks, such as AI-camera, AI-assistant, and OS or in-app enhancements.
2.1.2.1 MVPU 2.0
The MediaTek Vision Processing Unit (MVPU) offers general-purpose Digital Signal Processing (DSP) capabilities, with special hardware for accelerating complex imaging and computer vision algorithms. The MVPU also offers outstanding performance while running AI models.
2.1.2.2 MDLA 3.0
The MediaTek Deep Learning Accelerator (MDLA) is a powerful and efficient Convolutional Neural Network (CNN) accelerator. The MDLA is capable of achieving high AI benchmark results with high Multiply-Accumulate (MAC) utilization rates. The design integrates MAC units with dedicated function blocks, which handle activation functions, element-wise operations, and pooling layers.
The MDLA uses a technique called tile-based layer fusion to help achieve high compute efficiency and bandwidth reduction. Tile-based layer fusion identifies and then fuses dependent inter-layer operations, in order to reduce the amount of data the MDLA brings on-chip.
