Jetson Nano Deployment Notes
Table of Contents
- 1. Converting the yolov5s model to TensorRT
  - 1. Preliminary setup
  - 2. Running yolov5
  - 3. Accelerated inference with TensorRT
  - 4. Deploying directly with a camera
- 2. Detection with DeepStream
  - 1. Downloading the SDK
  - 2. Installing dependencies
  - 3. Configuring the environment
  - 4. Testing the result
- 3. Deploying with DeepStream
- 4. Training and detection with a newer yolov5 version
  - 1. Preliminary setup
  - 2. Starting training
  - 3. Running detection
This post wasn't really a success; think of it as a record of a failed attempt. I'm writing it down anyway before I forget, since my graduation project deadline is coming up. I don't feel like writing the thesis, but the time has to go in somewhere, so once this post is done I'm taking a break!
Background this post assumes:
YOLOv5 object detection notes
Jetson Nano getting-started notes
The idea is the same as before: set up a workflow that makes training and detection convenient, with training done on the PC and deployment done on the Jetson Nano, squeezing as much frame rate out of the Nano as possible.
1. Converting the yolov5s model to TensorRT
1. Preliminary setup
First, torch and torchvision need to be installed. It's best to download the files and install them locally; I've put everything relevant in my cloud drive, so grab what you need. The share link is:
「nano工具分享」https://www.aliyundrive.com/s/hEEe9UfsXv1
The shared files are listed below.

torch installs directly from the wheel, while torchvision has to be built from source. After downloading the archive to the board, unzip it:
unzip xxx
Then go into the folder and install it:
pip3 install numpy torch-1.8.0-cp36-cp36m-linux_aarch64.whl
Wait a while for the install to finish. If it errors out, something is usually just missing; install whatever it complains about.
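To confirm that torch installed correctly and can see the GPU, a quick sanity check like this works:
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"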
Next, install torchvision:
cd torchvision
export BUILD_VERSION=0.9.0
python3 setup.py install --user
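Once the build finishes, you can check that torchvision imports with the expected version:
python3 -c "import torchvision; print(torchvision.__version__)"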
2. Running yolov5
First, clone the code from GitHub:
git clone https://github.com/ultralytics/yolov5.git
Most of the required libraries are already installed on my board, so the detection script can be run directly.

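For reference, the basic detection run is launched like this (the flags assume a stock yolov5 checkout; adjust the source path if needed):
python3 detect.py --weights yolov5s.pt --source data/images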
On the first run it automatically downloads the model weights it needs and then detects the two sample images. Each image takes roughly 0.1 s, which is still not great.

The detection results are shown below:

The weights we need are downloaded during this run.
3. Accelerated inference with TensorRT
Two projects need to be cloned here; they are the basis for everything that follows. First the TensorRT one:
git clone https://github.com/wang-xinyu/tensorrtx.git
Then this one. If the download fails, retry a few times; if it still fails, download it on the PC and copy it over:
git clone -b v5.0 https://github.com/ultralytics/yolov5.git
First go into the tensorrtx folder, copy its pt-to-wts conversion script into the yolov5 folder, copy the downloaded yolov5 weights in as well, and then generate the wts file:
cd tensorrtx
cp yolov5/gen_wts.py ~/yolov5
cd ~/yolov5
python3 gen_wts.py -m yolov5s.pt  # the .pt model file must already be in this folder
The next step is to build the project, then generate the model file and run inference:
cd ~/tensorrtx/yolov5
mkdir build && cd build
mv ~/yolov5/yolov5s.wts ./
cmake ..
make
If you see the following output, the build succeeded:

Then use the wts file to generate the engine file, and run the bundled samples as a quick test:
./yolov5 -s yolov5s.wts yolov5s.engine s  # generating the engine file takes a while, please be patient
./yolov5 -d yolov5s.engine ../samples     # inference results are saved under the samples folder
The detection output. Honestly there's no obvious improvement here either; let's see how it does on video.

The actual result:

One gotcha: the official weights used earlier must be from a recent release, where "recent" means current at the time of writing, i.e. yolov5 v6.1 (v6.0 probably works too). Weights from v5.0 definitely do not work and fail with the error below.

This can be understood as a version-matching problem: the versions don't line up, and both the tensorrtx side and the yolov5 side need to be on the latest (or at least mutually matching) releases. Also, any model we train ourselves starts from one of the official .pt files, so its layers are identical to the official model; if that source model can't be converted, nothing derived from it can be either.
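If you want to sidestep the mismatch, one option (assuming the tensorrtx repository still tags a release for each yolov5 version) is to check out matching tags on both sides, for example:
git clone -b v5.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v5.0 https://github.com/wang-xinyu/tensorrtx.git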
4. Deploying directly with a camera
The following is based on this author's approach:
https://blog.csdn.net/hahasl555/article/details/116500763
Replace the contents of yolov5.cpp in the yolov5 folder with the code below:
vim yolov5.cpp
Since the file opens in vim, switch to insert mode and paste in the following:
#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

char* my_classes[] = { "person", "dog", "bicycle" };

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    }
    else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}
ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");
    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());
    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");
    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());
    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    // yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20)); // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}
ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
    auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
    auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
    auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
    auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");

    /* ------ yolov5 head ------ */
    auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
    auto upsample13 = network->addResize(*conv12->getOutput(0));
    assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
    auto cat14 = network->addConcatenation(inputTensors14, 2);
    auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");
    auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
    auto upsample17 = network->addResize(*conv16->getOutput(0));
    assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
    auto cat18 = network->addConcatenation(inputTensors18, 2);
    auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");
    auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
    auto upsample21 = network->addResize(*conv20->getOutput(0));
    assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors21, 2);
    auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
    ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
    auto cat25 = network->addConcatenation(inputTensors25, 2);
    auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");
    auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
    ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
    auto cat28 = network->addConcatenation(inputTensors28, 2);
    auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");
    auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
    ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
    auto cat31 = network->addConcatenation(inputTensors31, 2);
    auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");

    /* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20)); // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}
void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, float* input, float* output, int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char** argv, std::string& engine) {
    if (argc < 3) return false;
    if (std::string(argv[1]) == "-v" && argc == 3) {
        engine = std::string(argv[2]);
    }
    else {
        return false;
    }
    return true;
}

int main(int argc, char** argv) {
    cudaSetDevice(DEVICE);

    //std::string wts_name = "";
    std::string engine_name = "";
    //float gd = 0.0f, gw = 0.0f;
    //std::string img_dir;

    if (!parse_args(argc, argv, engine_name)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
        return -1;
    }

    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << " read " << engine_name << " error! " << std::endl;
        return -1;
    }
    char* trtModelStream{ nullptr };
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();

    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void* buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    cv::VideoCapture capture(1);
    //cv::VideoCapture capture("../overpass.mp4");
    //int fourcc = cv::VideoWriter::fourcc('M','J','P','G');
    //capture.set(cv::CAP_PROP_FOURCC, fourcc);
    if (!capture.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }

    int key;
    int fcount = 0;
    while (1)
    {
        cv::Mat frame;
        capture >> frame;
        if (frame.empty())
        {
            std::cout << "Fail to read image from camera!" << std::endl;
            break;
        }
        fcount++;
        //if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
        for (int b = 0; b < fcount; b++) {
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            cv::Mat img = frame;
            if (img.empty()) continue;
            cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar* uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
        }

        // Run inference
        auto start = std::chrono::system_clock::now();
        doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
        auto end = std::chrono::system_clock::now();
        //std::cout << std::chrono::duration_cast(end - start).count() << "ms" << std::endl;
        int fps = 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
        }
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            //std::cout << res.size() << std::endl;
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                std::string label = my_classes[(int)res[j].class_id];
                cv::putText(frame, label, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                std::string jetson_fps = "Jetson Nano FPS: " + std::to_string(fps);
                cv::putText(frame, jetson_fps, cv::Point(11, 80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
            }
            //cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
        }
        cv::imshow("yolov5", frame);
        key = cv::waitKey(1);
        if (key == 'q') {
            break;
        }
        fcount = 0;
    }

    capture.release();
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}
Here you need to substitute your own values: put your own class names in the my_classes array, and make sure the index passed to cv::VideoCapture matches your camera.

Compile and test again:
cd build
make
sudo ./yolov5 -v yolov5s.engine
You should then see the camera feed with detections drawn on it.
2. Detection with DeepStream
1. Downloading the SDK
For an introduction to DeepStream, see this article:
https://blog.csdn.net/Tosonw/article/details/104154090?msclkid=12bd0cf4cf6911ecb107816bf39a19e5
Next is the download; the official link is:
https://developer.nvidia.com/deepstream-sdk?msclkid=12bd94d8cf6911ecb9a72916005a0257
However, that page targets JetPack 4.6.1, which is not the version I'm running, so this download doesn't work for me.

I later found the following page through a web search; downloading from there works:
https://developer.nvidia.com/embedded/deepstream-on-jetson-downloads-archived
As you can see, you need to choose version 5.0.1 (5.0 is fine as well).

The package is more than 500 MB, so I decided to download it on the PC first.

Then transfer it to the Jetson Nano board over SSH.

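I copied it over with scp; the username and address below are placeholders for your own board:
scp deepstream_sdk_v5.0.1_jetson.tbz2 <user>@<nano-ip>:~/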
2. Installing dependencies
Install the required dependencies with the following commands:
sudo apt-get install libssl1.0.0 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4
sudo apt-get install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav gstreamer1.0-doc gstreamer1.0-tools gstreamer1.0-x gstreamer1.0-alsa gstreamer1.0-gl gstreamer1.0-gtk3 gstreamer1.0-qt5 gstreamer1.0-pulseaudio libgstrtspserver-1.0-dev gstreamer1.0-rtsp
sudo apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
sudo apt-get install libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgstrtspserver-1.0-dev libx11-dev libgstrtspserver-1.0-dev gstreamer1.0-rtsp ffmpeg
sudo apt-get install libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgstrtspserver-1.0-dev libx11-dev libjson-glib-dev
Next, extract the package:
sudo tar -jxvf deepstream_sdk_v5.0.1_jetson.tbz2 -C /
It is extracted to this location:

Go to that directory and get ready to install:
cd /opt/nvidia/deepstream/deepstream-5.0/
Install:
sudo ./install.sh
sudo ldconfig
3. Configuring the environment
Next, configure the library path by editing the following file:
sudo vim /etc/ld.so.conf
Append this line at the end (the file otherwise contains only one line):
/opt/nvidia/deepstream/deepstream-5.0/lib/
To make the change take effect, run:
sudo ldconfig
Then configure the other file, the same one where cuda and cudnn were set up earlier:
sudo vim ~/.bashrc
Add this at the end:
unset DISPLAY
Likewise, make the change take effect:
source ~/.bashrc
Verify that the installation succeeded:
deepstream-app --version-all
Seeing the following output means everything is fine:

4. Testing the result
Now run the official demo:
cd /opt/nvidia/deepstream/deepstream
Run:
deepstream-app -c samples/configs/deepstream-app/source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt
The following output appears, a multi-stream video detection:

When the run finishes, the test is successful!

3. Deploying with DeepStream
Not attempted yet; I'll write it up once I've tried it.
4. Training and detection with a newer yolov5 version
1. Preliminary setup
As mentioned above, deployment runs into a new-version vs. old-version problem, so here I'll simply train with the newer version directly; that way the trained model can be dropped straight into deployment. First modify the model config file; only the number of classes needs to change (copy the yolov5s config, since yolov5s is used for training).

Next modify the VOC data file, since a VOC dataset is used: make a copy and edit it, filling in the training set directory and the class labels.

The modification looks like this:

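For reference, after the changes the data file ends up looking roughly like the sketch below (written as a shell snippet here; the file name and paths are placeholders for your own dataset layout, and the class list is the standard 20-class VOC set):
# rough sketch of the modified data config; adjust the paths, nc and names to your dataset
cat > data/voc_custom.yaml << 'EOF'
train: ../VOCdevkit/images/train
val: ../VOCdevkit/images/val
nc: 20
names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
        'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
        'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
EOF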
Prepare the VOC2007 dataset as shown below:

2. Starting training
In train.py, find the configuration arguments and modify them to point at the files prepared above. There is also a batch size setting; don't make it too large, or training tends to fail with errors about insufficient memory.

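As a point of reference, the same configuration can also be passed on the command line instead of editing train.py (the flags are standard yolov5 train.py options; the two config paths are placeholders for the copies prepared above):
python3 train.py --data data/voc_custom.yaml --cfg models/yolov5s_custom.yaml --weights yolov5s.pt --batch-size 16 --epochs 100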
Finally, training starts. The training process and output are shown below:

3. Running detection
For detection, just modify detect.py to point at the weights we trained, as shown below:

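Equivalently, the trained weights can be passed on the command line without editing the script (the weights path below is the default location yolov5 saves to; adjust the run folder name to match yours):
python3 detect.py --weights runs/train/exp/weights/best.pt --source data/images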
The detection results are as follows:

