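On the Khadas VIM4 SBC, built around the Amlogic A311D2, I have mostly been running Ubuntu 22.04. Overall performance is decent, but features such as 3D graphics acceleration and hardware video decoding are missing. I recently noticed in the Wiki that hardware video encoding is supported on Linux, which got me a little excited, since this is a feature I have rarely seen supported. So I decided to give it a try.
First, we need a video in the NV12 pixel format, which is what cameras commonly output. I downloaded a 45-second 1080p H.264 sample video from Linaro and converted it with ffmpeg (an open-source tool for recording, converting, and streaming audio and video in many formats):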
    ffmpeg -i big_buck_bunny_1080p_H264_AAC_25fps_7200K.MP4 -pix_fmt nv12 big_buck_bunny_1080p_H264_AAC_25fps_7200K-nv12.yuv
By the way, I did this directly on my laptop. As raw video the file is rather large: the 45-second clip takes up 3.3GB of storage, as shown below:
    ls -lh
    total 3.3G
    -rw-rw-r-- 1 jaufranc jaufranc  40M Aug  5  2011 big_buck_bunny_1080p_H264_AAC_25fps_7200K.MP4
    -rw-rw-r-- 1 jaufranc jaufranc 3.3G May 21 15:03 big_buck_bunny_1080p_H264_AAC_25fps_7200K-nv12.yuv
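The 3.3G figure can be sanity-checked from the video parameters alone. The sketch below assumes the usual NV12 layout of 12 bits (1.5 bytes) per pixel with no row padding. The helper name `nv12_bytes` is made up for illustration.

```python
def nv12_bytes(width: int, height: int, frames: int = 1) -> int:
    """Size in bytes of raw NV12 video, assuming 1.5 bytes per pixel and no padding."""
    return width * height * 3 // 2 * frames


# 45 seconds at 25 fps is 1125 frames, the same frame count passed to the encoder later.
size = nv12_bytes(1920, 1080, frames=45 * 25)
print(f"{size} bytes = {size / 2**30:.2f} GiB")  # 3499200000 bytes = 3.26 GiB
```

`ls -lh` reports sizes in powers of 1024 and rounds up, which is why 3.26 GiB is shown as 3.3G.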
Now I try to encode the video to H.264 on the Khadas VIM4 board using the aml_enc_test hardware video encoding sample:
    khadas@Khadas:~$ time aml_enc_test 1080p.nv12 dump.h264 1920 1080 30 25 6000000 1125 1 0 2 4
    src_url is : 1080p.nv12 ;
    out_url is : dump.h264 ;
    width is : 1920 ;
    height is : 1080 ;
    gop is : 30 ;
    frmrate is : 25 ;
    bitrate is : 6000000 ;
    frm_num is : 1125 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 4 ;
    codec is H264
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 1920x1080
    Encode End!width:1920

    real 0m26.074s
    user 0m1.832s
    sys 0m4.883s
The output above also explains the parameters used. There are a few error messages, but the video plays with ffplay on my computer without any problem.
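Since the tool takes twelve positional arguments, it is easy to mix them up. Here is a minimal sketch of a wrapper that names each argument after the labels the tool echoes back (src_url, out_url, width, and so on). The class name `EncodeJob` and the default values are my own choices based on the runs in this article, and the comment on `fmt` is an assumption, since the tool's documentation is not quoted here.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class EncodeJob:
    """Positional arguments of aml_enc_test, in the order the tool echoes them."""
    src_url: str
    out_url: str
    width: int
    height: int
    gop: int
    frmrate: int
    bitrate: int
    frm_num: int
    fmt: int = 1         # value used for the NV12 input in this article (assumed to mean NV12)
    buf_type: int = 0
    num_planes: int = 2
    codec: int = 4       # the tool prints "codec is H264" for 4 and "codec is H265" for 5

    def argv(self) -> List[str]:
        """Return the command as an argument list suitable for subprocess.run()."""
        return ["aml_enc_test", *map(str, astuple(self))]


job = EncodeJob("1080p.nv12", "dump.h264", 1920, 1080,
                gop=30, frmrate=25, bitrate=6000000, frm_num=1125)
print(" ".join(job.argv()))
```

The printed line matches the command used above, and switching to H.265 only requires `codec=5` and a different output name.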

Encoding took about 26 seconds, which is faster than real time since the video is 45 seconds long.
Next, let's try H.265 encoding:
    time aml_enc_test 1080p.nv12 dump.h265 1920 1080 30 25 6000000 1125 1 0 2 5
    src_url is : 1080p.nv12 ;
    out_url is : dump.h265 ;
    width is : 1920 ;
    height is : 1080 ;
    gop is : 30 ;
    frmrate is : 25 ;
    bitrate is : 6000000 ;
    frm_num is : 1125 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 5 ;
    codec is H265
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 1920x1080
    Encode End!width:1920

    real 0m9.561s
    user 0m1.348s
    sys 0m2.576s
That was somewhat surprising: H.265 encoding appears to be much faster than H.264 encoding. Let me run the H.264 encode once more to check:
    $ time aml_enc_test 1080p.nv12 dump2.h264 1920 1080 30 25 6000000 1125 1 0 2 4
    src_url is : 1080p.nv12 ;
    out_url is : dump2.h264 ;
    width is : 1920 ;
    height is : 1080 ;
    gop is : 30 ;
    frmrate is : 25 ;
    bitrate is : 6000000 ;
    frm_num is : 1125 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 4 ;
    codec is H264
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 1920x1080
    Encode End!width:1920

    real 0m8.780s
    user 0m1.416s
    sys 0m2.274s
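Wow! It now takes less than 9 seconds. The first run was slow because the data had to be read from the eMMC flash. Since the 3.3GB file fits in the cache, the later runs had no storage bottleneck.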
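One way to tell a storage bottleneck apart from an encoder limit is to time a plain sequential read of the input file, once cold and once again right after. The following is a rough sketch of such a check and not something used in the original test. The function name `read_throughput` and the 4 MiB chunk size are arbitrary choices.

```python
import time


def read_throughput(path, chunk_size=4 * 1024 * 1024):
    """Read a file sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = (total / 1e6 / elapsed) if elapsed > 0 else float("inf")
    return total, rate


# Example (the file name is the one used in the runs above):
#   print(read_throughput("1080p.nv12"))   # first call: limited by the eMMC
#   print(read_throughput("1080p.nv12"))   # second call: served from the page cache if it fits
```

If the second call reports a much higher rate than the first, the file is being served from memory, which is consistent with the drop from 26 seconds to under 9 seconds seen above.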

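The dump.h265 file also plays fine on my computer, so the conversion was successful.
The Amlogic A311D2 specifications also state that it supports "H.265 and H.264 at 4Kp50" video encoding. So I created a 45-second 4Kp50 video and converted it to the NV12 YUV format. The raw video was 27GB, which does not fit in the free space of the board's 32GB eMMC flash, so I had to shorten the video to 30 seconds (about 18GB).
Now we can encode the video to H.264: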
    khadas@Khadas:~$ time aml_enc_test 4k.nv12 dump4k.h264 3840 2160 30 50 10000000 1501 1 0 2 4
    src_url is : 4k.nv12 ;
    out_url is : dump4k.h264 ;
    width is : 3840 ;
    height is : 2160 ;
    gop is : 30 ;
    frmrate is : 50 ;
    bitrate is : 10000000 ;
    frm_num is : 1501 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 4 ;
    codec is H264
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 3840x2160
    Encode End!width:3840

    real 2m10.611s
    user 0m5.819s
    sys 0m26.130s
More than two minutes to encode a 30-second video! That is clearly not good enough, so I ran the sample again:
    $ time aml_enc_test 4k.nv12 dump4k.h264 3840 2160 30 50 10000000 1501 1 0 2 4
    src_url is : 4k.nv12 ;
    out_url is : dump4k.h264 ;
    width is : 3840 ;
    height is : 2160 ;
    gop is : 30 ;
    frmrate is : 50 ;
    bitrate is : 10000000 ;
    frm_num is : 1501 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 4 ;
    codec is H264
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 3840x2160
    Encode End!width:3840

    real 2m22.420s
    user 0m6.543s
    sys 0m28.102s
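It was even slower this time, so I think we are hitting a storage bottleneck: for real-time encoding, this file would have to be read at over 600 MB/s. A system like this would normally encode video from a camera stream rather than from eMMC flash. I had in fact run iozone (a filesystem benchmark tool) earlier: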
    $ iozone -e -I -a -s 1000M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
        Iozone: Performance Test of File I/O
                Version $Revision: 3.489 $
            Compiled for 64 bit mode.
            Build: linux

        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                          random    random     bkwd    record    stride
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
         1024000       4    42448    49401    33738    35273    30351    33959
         1024000      16    95388    84746    83386    87949    78675    72818
         1024000     512   109351    90438   166659   166804   144584    70463
         1024000    1024    68088    98663   175108   174902   164769    58980
         1024000   16384    71086   109715   178448   178144   182913    87181

    iozone test complete.
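The sequential read speed is about 178MB/s. I have a MINIX USB hub with a 480GB SSD that I had previously measured at 400MB/s. That is still not quite what we need, but it should be an improvement.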

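Unfortunately, the drive was not mounted, and it was not detected even with tools such as fdisk and GParted. After checking the Khadas VIM4 specifications more carefully, I realized the USB Type-C port is a USB 2.0 OTG interface. It should still be able to detect the drive, but it only supports 480 Mbps, so the experiment would not have worked either way. The only way to exceed 600MB/s would be a USB 3.0 NVMe SSD, which I don't have.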
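The bandwidth argument can be checked with a little arithmetic. The sketch below assumes unpadded NV12 frames at 1.5 bytes per pixel and uses the read speeds quoted in the text. The 60 MB/s figure is simply the 480 Mbps USB 2.0 signalling rate divided by 8, so real-world throughput would be lower still.

```python
WIDTH, HEIGHT, FPS = 3840, 2160, 50

frame_bytes = WIDTH * HEIGHT * 3 // 2       # one raw NV12 frame
required_mb_s = frame_bytes * FPS / 1e6     # read rate needed to feed the encoder in real time


def max_fps(read_mb_s: float, frame_size: int) -> float:
    """Upper bound on frames per second if storage is the only limit."""
    return read_mb_s * 1e6 / frame_size


sources_mb_s = {
    "eMMC sequential read (iozone)": 178,
    "USB SSD (earlier measurement)": 400,
    "USB 2.0 link ceiling (480 Mbps / 8)": 480 / 8,
}

print(f"required for 4Kp50: {required_mb_s:.2f} MB/s")   # required for 4Kp50: 622.08 MB/s
for name, rate in sources_mb_s.items():
    print(f"{name}: at most {max_fps(rate, frame_bytes):.1f} fps")
```

With the eMMC capped at roughly 14 fps for 4K frames, the 30-second clip (1501 frames) cannot be read in much less than 105 seconds, which is in line with the two-minute runs above.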
So I had to make a 5-second 4Kp50 video, about 2.9GB in size.
The first run used H.265:
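How the shorter clip was produced is not stated. Because raw NV12 has a fixed frame size, one possible approach is to copy the first N frames of an existing raw file. The sketch below does that under the assumption of unpadded NV12 frames. The function name `copy_first_frames` is hypothetical, and the result would be the first 5 seconds of the longer clip, which may differ from the clip actually used.

```python
def copy_first_frames(src, dst, width, height, frames, chunk=1 << 20):
    """Copy the first `frames` raw NV12 frames from src to dst; return bytes written."""
    remaining = width * height * 3 // 2 * frames
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while remaining > 0:
            data = fin.read(min(chunk, remaining))
            if not data:          # source is shorter than requested
                break
            fout.write(data)
            written += len(data)
            remaining -= len(data)
    return written


# Example with the file names from this article:
#   copy_first_frames("4k.nv12", "4Kp50-5s.nv12", 3840, 2160, frames=249)
# 249 frames * 12,441,600 bytes = 3,097,958,400 bytes, about 2.9 GiB.
```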
    $ time aml_enc_test 4Kp50-5s.nv12 dump4k-5s.h265 3840 2160 30 50 10000000 249 1 0 2 5
    src_url is : 4Kp50-5s.nv12 ;
    out_url is : dump4k-5s.h265 ;
    width is : 3840 ;
    height is : 2160 ;
    gop is : 30 ;
    frmrate is : 50 ;
    bitrate is : 10000000 ;
    frm_num is : 249 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 5 ;
    codec is H265
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 3840x2160
    Encode End!width:3840

    real 0m6.905s
    user 0m0.661s
    sys 0m1.885s
The second run:
    $ time aml_enc_test 4Kp50-5s.nv12 dump4k-5s-2.h265 3840 2160 30 50 10000000 249 1 0 2 5
    src_url is : 4Kp50-5s.nv12 ;
    out_url is : dump4k-5s-2.h265 ;
    width is : 3840 ;
    height is : 2160 ;
    gop is : 30 ;
    frmrate is : 50 ;
    bitrate is : 10000000 ;
    frm_num is : 249 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 5 ;
    codec is H265
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 3840x2160
    Encode End!width:3840

    real 0m6.828s
    user 0m0.663s
    sys 0m1.822s
A final attempt, this time with H.264:
    $ time aml_enc_test 4Kp50-5s.nv12 dump4k-5s.h264 3840 2160 30 50 5000000 249 1 0 2 4
    src_url is : 4Kp50-5s.nv12 ;
    out_url is : dump4k-5s.h264 ;
    width is : 3840 ;
    height is : 2160 ;
    gop is : 30 ;
    frmrate is : 50 ;
    bitrate is : 5000000 ;
    frm_num is : 249 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 4 ;
    codec is H264
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 3840x2160
    Encode End!width:3840

    real 0m6.422s
    user 0m0.644s
    sys 0m1.879s
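It is not real time, but the results are getting close, which suggests 4Kp30 should be feasible. Here is the result of encoding a 5-second 4Kp30 NV12 video to H.264: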
    $ time aml_enc_test 4Kp30-5s.nv12 dump4kp30-5s.h264 3840 2160 30 30 5000000 150 1 0 2 4
    src_url is : 4Kp30-5s.nv12 ;
    out_url is : dump4kp30-5s.h264 ;
    width is : 3840 ;
    height is : 2160 ;
    gop is : 30 ;
    frmrate is : 30 ;
    bitrate is : 5000000 ;
    frm_num is : 150 ;
    fmt is : 1 ;
    buf_type is : 0 ;
    num_planes is : 2 ;
    codec is : 4 ;
    codec is H264
    Set log level to 4
    [initEncParams:177] enc_feature_opts is 0x0 , GopPresetis 0x0
    [SetupEncoderOpenParam:513] GopPreset GOP format (2) period 30 LongTermRef 0
    [vdi_sys_sync_inst_param:618] [VDI] fail to deliver sync instance param inst_idx=0
    [AML_MultiEncInitialize:1378] VPU instance param sync with open param failed
    [SetSequenceInfo:979] Required buffer fb_num=3, src_num=1, actual src=3 3840x2160
    Encode End!width:3840

    real 0m3.931s
    user 0m0.378s
    sys 0m1.161s
This time it took under 4 seconds, so real-time 4Kp30 H.264 hardware video encoding does work on the Amlogic A311D2 processor.
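To compare the runs on one scale, the wall-clock times can be converted to frames per second. The sketch below uses the frame counts and `real` times copied from the logs above. Since `real` also covers process start-up and file reads, these figures are lower bounds on what the encoder itself can do.

```python
def achieved_fps(frames: int, seconds: float) -> float:
    """Average encode rate over the whole run, including I/O and start-up."""
    return frames / seconds


# (label, frm_num, target fps, `real` time in seconds), all taken from the runs above
runs = [
    ("1080p25 H.264, first run",   1125, 25, 26.074),
    ("1080p25 H.265",              1125, 25, 9.561),
    ("1080p25 H.264, second run",  1125, 25, 8.780),
    ("4Kp50 H.264, 30 s clip",     1501, 50, 130.611),
    ("4Kp50 H.265, 5 s clip",       249, 50, 6.905),
    ("4Kp50 H.264, 5 s clip",       249, 50, 6.422),
    ("4Kp30 H.264, 5 s clip",       150, 30, 3.931),
]

for label, frames, target, seconds in runs:
    fps = achieved_fps(frames, seconds)
    verdict = "real time" if fps >= target else "below real time"
    print(f"{label:28s} {fps:6.1f} fps (target {target}): {verdict}")
```

The cached 4K runs all land in the 36 to 39 fps range, which explains why 4Kp50 falls short while 4Kp30 clears the bar.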

The result also plays fine on my computer. It is also possible to encode an NV12 YUV image to JPEG, but this did not work when run as the khadas user:
    $ jpeg_enc_test screenshot-1920x1080.nv12 dump.jpg 1920 1080 100 3 0 16 16 0
    screenshot-1920x1080.nv12
    dump.jpg
    src url: screenshot-1920x1080.nv12
    out url: dump.jpg
    width : 1920
    height : 1080
    quality: 100
    iformat: 3
    oformat: 0
    width alignment: 16
    height alignment: 16
    memory type: VMALLOC
    align: 1920->1920
    align: 1080->1088
    hw_encode open device fail, 13:Permission denied
    jpegenc_init failed
With sudo, however, it works:
    khadas@Khadas:~$ time sudo jpeg_enc_test screenshot-1920x1080.nv12 dump.jpg 1920 1080 100 3 0 16 16 0
    screenshot-1920x1080.nv12
    dump.jpg
    src url: screenshot-1920x1080.nv12
    out url: dump.jpg
    width : 1920
    height : 1080
    quality: 100
    iformat: 3
    oformat: 0
    width alignment: 16
    height alignment: 16
    memory type: VMALLOC
    align: 1920->1920
    align: 1080->1088
    mapped address is 0xffffb27f0000
    hw_info->mmap_buff.size, 0x2300000, hw_info->input_buf.addr:0x0xffffb27f0000
    hw_info->assit_buf.addr, 0x0xffffb466c000, hw_info->output_buf.addr:0x0xffffb46f0000
    frame_size=3110400
    rd_size=3110400, frame_size=3110400
    offset=2088960
    luma_stride=1920, h_stride=1088, hw_info->bpp=12

    real 0m0.044s
    user 0m0.004s
    sys 0m0.009s
So this is probably just a simple permission issue. The job completed in 44 ms, and I could open dump.jpg (a screenshot) normally.
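The numbers in the jpeg_enc_test log follow from the NV12 layout and the two alignment arguments (16 and 16). The sketch below reproduces them. Reading `offset` as the start of the interleaved chroma plane inside the aligned buffer is my interpretation of the log rather than something the tool documents, and the function names are made up for illustration.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_layout(width: int, height: int, w_align: int = 16, h_align: int = 16) -> dict:
    """Sizes for one NV12 frame: a full-resolution Y plane followed by interleaved UV."""
    luma_stride = align_up(width, w_align)
    h_stride = align_up(height, h_align)
    return {
        "luma_stride": luma_stride,
        "h_stride": h_stride,
        "frame_size": width * height * 3 // 2,    # bytes read from the input file
        "chroma_offset": luma_stride * h_stride,  # assumed meaning of "offset" in the log
    }


print(nv12_layout(1920, 1080))
# {'luma_stride': 1920, 'h_stride': 1088, 'frame_size': 3110400, 'chroma_offset': 2088960}
```

The values line up with `align: 1080->1088`, `frame_size=3110400`, and `offset=2088960` in the output above.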

I then converted the NV12 file to JPEG with ffmpeg, presumably using software encoding, and that took just under 200 ms:
    khadas@Khadas:~$ time ffmpeg -pix_fmt nv12 -s 1920x1080 -i screenshot-1920x1080-nv12.yuv dump-ffmpeg.jpg
    ffmpeg version 4.4.1-3ubuntu5 Copyright (c) 2000-2021 the FFmpeg developers
      built with gcc 11 (Ubuntu 11.2.0-18ubuntu1)
      configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
      libavutil      56. 70.100 / 56. 70.100
      libavcodec     58.134.100 / 58.134.100
      libavformat    58. 76.100 / 58. 76.100
      libavdevice    58. 13.100 / 58. 13.100
      libavfilter     7.110.100 /  7.110.100
      libswscale      5.  9.100 /  5.  9.100
      libswresample   3.  9.100 /  3.  9.100
      libpostproc    55.  9.100 / 55.  9.100
    [rawvideo @ 0xaaaad7c07100] Estimating duration from bitrate, this may be inaccurate
    Input #0, rawvideo, from 'screenshot-1920x1080-nv12.yuv':
      Duration: 00:00:00.04, start: 0.000000, bitrate: 622080 kb/s
      Stream #0:0: Video: rawvideo (NV12 / 0x3231564E), nv12, 1920x1080, 622080 kb/s, 25 tbr, 25 tbn, 25 tbc
    Stream mapping:
      Stream #0:0 -> #0:0 (rawvideo (native) -> mjpeg (native))
    Press [q] to stop, [?] for help
    [swscaler @ 0xaaaad7c1d2a0] deprecated pixel format used, make sure you did set range correctly
    Output #0, image2, to 'dump-ffmpeg.jpg':
      Metadata:
        encoder         : Lavf58.76.100
      Stream #0:0: Video: mjpeg, yuvj420p(pc, progressive), 1920x1080, q=2-31, 200 kb/s, 25 fps, 25 tbn
        Metadata:
          encoder         : Lavc58.134.100 mjpeg
        Side data:
          cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
    frame=    1 fps=0.0 q=10.8 size=N/A time=00:00:00.04 bitrate=N/A speed=4e+04x
    frame=    1 fps=0.0 q=10.8 Lsize=N/A time=00:00:00.04 bitrate=N/A speed=0.514x
    video:123kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

    real 0m0.190s
    user 0m0.150s
    sys 0m0.037s
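aml_enc_test and jpeg_enc_test are handy little tools for testing hardware video and image encoding on Linux with the Amlogic A311D2. Having the source code would make it easier to integrate this functionality into applications, but the code does not appear to be open source at the moment, so I assume it is part of the Amlogic SDK. I will ask Khadas for the source code or for a way to obtain it.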

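Translated by Jacob, an embedded systems test engineer and senior engineer at RAK with many years of experience in the IoT industry. Jacob is familiar with every stage of embedded development and testing and brings professional analysis and evaluation to a range of products.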