AV1 Video Encoding Guide

AV1 is an open-source, royalty-free video codec developed by the Alliance for Open Media (AOMedia), a non-profit industry consortium. Depending on the use case, AV1 can achieve about 30% higher compression efficiency than VP9, and about 50% higher efficiency than H.264.

There are currently three AV1 encoders supported by FFmpeg: libaom (invoked with libaom-av1 in FFmpeg), SVT-AV1 (libsvtav1), and rav1e (librav1e). This guide currently focuses on libaom and SVT-AV1.

libaom

libaom (libaom-av1) is the reference encoder for the AV1 format. It was also used for research during the development of AV1. libaom is based on libvpx and thus shares many of its characteristics in terms of features, performance, and usage.

To install FFmpeg with support for libaom-av1, look at the Compilation Guides and compile FFmpeg with the --enable-libaom option.

libaom offers the following rate-control modes which determine the quality and file size obtained:

  • Constant quality
  • Constrained quality
  • 2-pass average bitrate
  • 1-pass average bitrate

For a list of options, run ffmpeg -h encoder=libaom-av1 or check FFmpeg's online documentation. For options that can be passed via -aom-params, checking the --help output of the aomenc application is recommended, as there is currently no official online reference for them.

Note: Users of libaom older than version 2.0.0 will need to add -strict experimental (or the alias -strict -2).

Constant Quality

libaom-av1 has a constant quality (CQ) mode (like CRF in x264 and x265) which will ensure that every frame gets the number of bits it deserves to achieve a certain (perceptual) quality level, rather than encoding each frame to meet a bit rate target. This results in better overall quality. If you do not need to achieve a fixed target file size, this should be your method of choice.

To trigger this mode, simply use the -crf switch along with the desired numerical value.

ffmpeg -i input.mp4 -c:v libaom-av1 -crf 30 av1_test.mkv

The CRF value can range from 0 to 63. Lower values mean better quality and greater file size. 0 means lossless. A CRF value of 23 yields a quality level corresponding to CRF 19 for x264 (source), which would be considered visually lossless.

Note that in FFmpeg versions prior to 4.3, triggering the CRF mode also requires setting the bitrate to 0 with -b:v 0. If this is not done, the -crf switch triggers the constrained quality mode with a default bitrate of 256 kbps.

Constrained Quality

libaom-av1 also has a constrained quality mode that will ensure that a constant (perceptual) quality is reached while keeping the bitrate below a specified upper bound or within a specified range. This method is useful for bulk encoding videos in a generally consistent fashion.

ffmpeg -i input.mp4 -c:v libaom-av1 -crf 30 -b:v 2000k output.mkv

The quality is determined by -crf and the bitrate limit by -b:v. The bitrate must be non-zero.

You can also specify a minimum and maximum bitrate instead of a quality target:

ffmpeg -i input.mp4 -c:v libaom-av1 -minrate 500k -b:v 2000k -maxrate 2500k output.mp4

Note: When muxing into MP4, you may want to add -movflags +faststart to the output parameters if the intended use for the resulting file is streaming.

Two-Pass

In order to create more efficient encodes when a particular target bitrate should be reached, you should choose two-pass encoding. Two-pass encoding is also beneficial for encoding efficiency when constant quality is used without a target bitrate. For two-pass, you need to run ffmpeg twice, with almost the same settings, except for:

  • In pass 1 and 2, use the -pass 1 and -pass 2 options, respectively.
  • In pass 1, output to a null file descriptor, not an actual file. (This will generate a logfile that FFmpeg needs for the second pass.)
  • In pass 1, you can leave audio out by specifying -an.

ffmpeg -i input.mp4 -c:v libaom-av1 -b:v 2M -pass 1 -an -f null /dev/null && \
ffmpeg -i input.mp4 -c:v libaom-av1 -b:v 2M -pass 2 -c:a libopus output.mkv

Note: Windows users should use NUL instead of /dev/null and ^ instead of \.
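
Two-pass encoding can also be combined with constant quality, as mentioned above. The following is a minimal sketch of that combination, not an official recipe: the function name two_pass_cq and the FFMPEG override variable are hypothetical helpers introduced here for illustration. -b:v 0 is included so that the same command also selects constant quality on FFmpeg versions prior to 4.3.

```shell
# two_pass_cq INPUT OUTPUT CRF
# Runs both passes in constant quality mode (hypothetical helper).
# Set FFMPEG to another command (for example "echo") to preview the
# two invocations without encoding anything.
two_pass_cq() {
    input=$1 output=$2 crf=$3
    # Pass 1: analysis only, no audio, output discarded.
    "${FFMPEG:-ffmpeg}" -i "$input" -c:v libaom-av1 -crf "$crf" -b:v 0 \
        -pass 1 -an -f null /dev/null &&
    # Pass 2: the real encode, reusing the log file written by pass 1.
    "${FFMPEG:-ffmpeg}" -i "$input" -c:v libaom-av1 -crf "$crf" -b:v 0 \
        -pass 2 -c:a libopus "$output"
}
```

It would be called as two_pass_cq input.mp4 output.mkv 30.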

Average Bitrate (ABR)

libaom-av1 also offers a simple "Average Bitrate" or "Target Bitrate" mode. In this mode, it will simply try to reach the specified bitrate on average, e.g. 2 Mbit/s.

ffmpeg -i input.mp4 -c:v libaom-av1 -b:v 2M output.mkv

Use this option only if file size and encoding time are more important factors than quality alone. Otherwise, use one of the other rate control methods described above.

Controlling Speed / Quality

-cpu-used sets how efficient the compression will be. Default is 1. Lower values mean slower encoding with better quality, and vice-versa. Valid values are from 0 to 8 inclusive.

-row-mt 1 enables row-based multi-threading which maximizes CPU usage. To enable fast decoding performance, also add tiles (e.g. -tiles 4x1 or -tiles 2x2 for 4 tiles). Enabling row-mt is only faster when the CPU has more threads than the number of encoded tiles.
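
The rule of thumb about threads and tiles can be checked before choosing options. This is a rough sketch under stated assumptions: tile_count and row_mt_worthwhile are hypothetical helper names, and the thread count is read with getconf _NPROCESSORS_ONLN, which is available on most Linux and macOS systems but not guaranteed everywhere (the script falls back to 1).

```shell
# tile_count COLSxROWS: prints the number of tiles in a -tiles layout.
tile_count() {
    cols=${1%x*}    # text before the "x"
    rows=${1#*x}    # text after the "x"
    echo $(( cols * rows ))
}

# row_mt_worthwhile LAYOUT THREADS: succeeds when there are more
# CPU threads than tiles, the case where -row-mt 1 is expected to help.
row_mt_worthwhile() {
    [ "$2" -gt "$(tile_count "$1")" ]
}

threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
if row_mt_worthwhile 2x2 "$threads"; then
    echo "-tiles 2x2 -row-mt 1"
else
    echo "-tiles 2x2"
fi
```

The printed options can then be added to a libaom-av1 command line.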

-usage realtime activates the realtime mode, meant for live encoding use cases (livestreaming, videoconferencing, etc.). -cpu-used values from 7 to 10 are only available in the realtime mode (though due to a bug in FFmpeg, presets higher than 8 cannot be used via FFmpeg).

Keyframe placement

By default, libaom's maximum keyframe interval is 9999 frames. This can lead to slow seeking, especially with content that has few or infrequent scene changes.

The -g option can be used to set the maximum keyframe interval. Anything up to 10 seconds is considered reasonable for most content, so for 30 frames per second content one would use -g 300, for 60 fps content -g 600, etc.
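
The value for -g is simply the frame rate multiplied by the desired interval in seconds. A small sketch of that arithmetic follows; gop_size is a hypothetical helper name, and it takes the frame rate as a numerator and denominator so that rates such as 30000/1001 can be handled with integer math.

```shell
# gop_size FPS_NUM FPS_DEN SECONDS
# Prints the keyframe interval in frames, rounded to the nearest integer.
gop_size() {
    num=$1 den=$2 seconds=$3
    echo $(( (num * seconds + den / 2) / den ))
}

gop_size 30 1 10         # 30 fps, 10 seconds: prints 300
gop_size 30000 1001 10   # 29.97 fps, 10 seconds: prints 300
```

The result can be substituted directly, for example -g "$(gop_size 30 1 10)".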

To set a fixed keyframe interval, set both -g and -keyint_min to the same value. Note that currently -keyint_min is ignored unless it's the same as -g, so the minimum keyframe interval can't be set on its own.

For intra-only output, use -g 0.

HDR and high bit depth

When encoding in HDR it's necessary to pass through color information: -colorspace, -color_trc and -color_primaries. For example, YouTube HDR uses

-colorspace bt2020nc -color_trc smpte2084 -color_primaries bt2020

AV1 includes 10-bit support in its Main profile. Thus content can be encoded in 10-bit without having to worry about incompatible hardware decoders.

To utilize 10-bit in the Main profile, use -pix_fmt yuv420p10le. For 10-bit with 4:4:4 chroma subsampling (requires the High profile), use -pix_fmt yuv444p10le. 12-bit is also supported, but requires the Professional profile. See ffmpeg -help encoder=libaom-av1 for the supported pixel formats.
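
Putting the pixel format and the color options together might look like the sketch below. hdr10_args is a hypothetical helper name. The options only describe the output, so this assumes the source is already 10-bit HDR material with BT.2020 primaries and the SMPTE 2084 transfer function.

```shell
# hdr10_args: prints the options for 10-bit Main profile output
# tagged with the color information listed above.
hdr10_args() {
    echo "-pix_fmt yuv420p10le -colorspace bt2020nc -color_trc smpte2084 -color_primaries bt2020"
}

# Example use (file names are placeholders; the substitution is left
# unquoted on purpose so that it splits into separate options):
# ffmpeg -i input.mkv -c:v libaom-av1 -crf 30 $(hdr10_args) output.mkv
```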

Lossless encoding

Use -crf 0 for lossless encoding. Because of a bug present in FFmpeg versions prior to 4.4, the first frame will not be losslessly preserved (the issue was fixed on March 21, 2021). As a workaround on pre-4.4 versions one may use -aom-params lossless=1 for lossless output.

SVT-AV1

SVT-AV1 (libsvtav1) is an encoder originally developed by Intel in collaboration with Netflix. In 2020, SVT-AV1 was adopted by AOMedia as the basis for the future development of AV1 as well as future codec efforts. The encoder supports a wide range of speed-efficiency tradeoffs and scales fairly well across many CPU cores.

To enable support, FFmpeg needs to be built with --enable-libsvtav1. For options available in your specific build of FFmpeg, see ffmpeg -help encoder=libsvtav1. See also FFmpeg documentation, the upstream encoder user guide and list of all parameters.

Many options are passed to the encoder with -svtav1-params. This was introduced in SVT-AV1 0.9.1 and has been supported since FFmpeg 5.1.

CRF is the default rate control method, but VBR and CBR are also available.

CRF

Much like CRF in x264 and x265, this rate control method tries to ensure that every frame gets the number of bits it deserves to achieve a certain (perceptual) quality level.

For example:

ffmpeg -i input.mp4 -c:v libsvtav1 -crf 35 svtav1_test.mp4

Note that the -crf option is only supported in FFmpeg git builds since 2022-02-24. In versions prior to this, the CRF value is set with -qp.
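
A script that has to work with both older and newer builds could pick the option by inspecting the encoder help. This is only a sketch: svt_quality_flag is a hypothetical helper, and it assumes that the help output lists each option on its own line as optional leading spaces followed by the option name, which may not hold for every build.

```shell
# svt_quality_flag: reads "ffmpeg -h encoder=libsvtav1" output on stdin
# and prints -crf if that option is listed, otherwise -qp.
svt_quality_flag() {
    if grep -q -e '^ *-crf '; then
        echo "-crf"
    else
        echo "-qp"
    fi
}

# Example use (file names are placeholders):
# flag=$(ffmpeg -hide_banner -h encoder=libsvtav1 | svt_quality_flag)
# ffmpeg -i input.mp4 -c:v libsvtav1 "$flag" 35 svtav1_test.mp4
```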

The valid CRF value range is 0-63, with the default being 50. Lower values correspond to higher quality and greater file size. Lossless encoding is currently not supported.

Presets and tunes

The trade-off between encoding speed and compression efficiency is managed with the -preset option. Since SVT-AV1 0.9.0, supported presets range from 0 to 13, with higher numbers providing a higher encoding speed.

Note that preset 13 is only meant for debugging and running fast convex-hull encoding. In versions prior to 0.9.0, valid presets are 0 to 8.

As an example, this command encodes a video using preset 8 and a CRF of 35 while copying the audio:

ffmpeg -i input.mp4 -c:a copy -c:v libsvtav1 -preset 8 -crf 35 svtav1_test.mp4

Since SVT-AV1 0.9.1, the encoder also supports tuning for visual quality (sharpness). This is invoked with -svtav1-params tune=0. The default value is 1, which tunes the encoder for PSNR.

Also supported since 0.9.1 is tuning the encoder to produce bitstreams that are faster (less CPU intensive) to decode, similar to the fastdecode tune in x264 and x265. Since SVT-AV1 1.0.0, this feature is invoked with -svtav1-params fast-decode=1.

In 0.9.1, the option accepts an integer from 1 to 3, with higher numbers resulting in easier-to-decode video. In 0.9.1, decoder tuning is only supported for presets from 5 to 10, and the level of decoder tuning varies between presets.
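
Several encoder options can be given in one -svtav1-params argument as key=value pairs separated by colons. The sketch below joins such pairs; svt_params is a hypothetical helper name, and the example values assume SVT-AV1 1.0.0 or later, where fast-decode=1 is accepted.

```shell
# svt_params KEY=VALUE...: joins its arguments with ":" for -svtav1-params.
svt_params() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local.
    ( IFS=:; printf '%s\n' "$*" )
}

svt_params tune=0 fast-decode=1   # prints tune=0:fast-decode=1

# Example use (file names are placeholders):
# ffmpeg -i input.mp4 -c:a copy -c:v libsvtav1 -preset 8 -crf 35 \
#     -svtav1-params "$(svt_params tune=0 fast-decode=1)" svtav1_test.mp4
```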

Keyframe placement

By default, SVT-AV1's keyframe interval is 2-3 seconds, which is quite short for most use cases. Consider changing this up to 5 seconds (or higher) with the -g option (or keyint in svtav1-params); -g 120 for 24 fps content, -g 150 for 30 fps, etc.

Note that as of version 1.2.1, SVT-AV1 does not support inserting keyframes at scene changes. Instead, keyframes are placed at set intervals. In SVT-AV1 0.9.1 and prior, the functionality was present but considered to be in a suboptimal state and was disabled by default.

Film grain synthesis

SVT-AV1 supports film grain synthesis, an AV1 feature for preserving the look of grainy video while spending very little bitrate to do so. The grain is removed from the image with denoising, its look is approximated and synthesized, and then added on top of the video at decode-time as a filter.

The film grain synthesis feature is invoked with -svtav1-params film-grain=X, where X is an integer from 1 to 50. Higher numbers correspond to higher levels of denoising for the grain synthesis process and thus a higher amount of grain.

The grain denoising process can remove detail as well, especially at the high values that are required to preserve the look of very grainy films. This can be mitigated with the film-grain-denoise=0 option, passed via svtav1-params. By default, the denoised frames are passed on to be encoded as the final pictures (film-grain-denoise=1); turning this off causes the original frames to be used instead.
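
The two film grain options can be combined in a single -svtav1-params value. The following sketch builds that value and rejects levels outside the documented 1 to 50 range. film_grain_params is a hypothetical helper name, and the level of 8 in the usage comment is an arbitrary illustration, not a recommendation.

```shell
# film_grain_params LEVEL [DENOISE]
# Prints a -svtav1-params value. DENOISE defaults to 1 (the encoder default);
# pass 0 to encode the original frames instead of the denoised ones.
film_grain_params() {
    level=$1 denoise=${2:-1}
    if [ "$level" -lt 1 ] || [ "$level" -gt 50 ]; then
        echo "film-grain level must be between 1 and 50" >&2
        return 1
    fi
    echo "film-grain=${level}:film-grain-denoise=${denoise}"
}

# Example use (file names are placeholders):
# ffmpeg -i input.mp4 -c:v libsvtav1 -crf 35 \
#     -svtav1-params "$(film_grain_params 8 0)" svtav1_test.mp4
```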

rav1e

rav1e (librav1e) is the Xiph encoder for AV1. To enable support, compile FFmpeg with --enable-librav1e. See the FFmpeg documentation and the upstream CLI options.

rav1e claims to be the fastest software AV1 encoder, but that really depends on the settings used.

Additional Resources

Last modified on May 12, 2023, 3:06:36 PM