r2179
Minor asm changes
r2178
Add row-reencoding support to VBV for improved accuracy
Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs).
Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs).
Still inaccurate with sliced threads, but better than before.
r2177
Abstract bitstream backup/restore functions
Required for row re-encoding.
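A minimal sketch of the backup/restore idea, assuming a toy bit writer. The `bitwriter_t` type and the `bw_*` functions are hypothetical and are not x264's actual bitstream structures; the point is only that a row can be written, judged too large, and rolled back before being encoded again.

```c
#include <stdint.h>

/* Hypothetical minimal bit writer, for illustration only. */
typedef struct
{
    uint8_t *p;    /* next byte to be written */
    uint32_t cur;  /* bits accumulated but not yet flushed */
    int      left; /* bits still free in the current byte */
} bitwriter_t;

void bw_init( bitwriter_t *bw, uint8_t *buf )
{
    bw->p = buf;
    bw->cur = 0;
    bw->left = 8;
}

/* Append the low `count` bits of `bits`, most significant bit first. */
void bw_write( bitwriter_t *bw, int count, uint32_t bits )
{
    for( int i = count - 1; i >= 0; i-- )
    {
        bw->cur = (bw->cur << 1) | ((bits >> i) & 1);
        if( --bw->left == 0 )
        {
            *bw->p++ = (uint8_t)bw->cur;
            bw->cur = 0;
            bw->left = 8;
        }
    }
}

/* The writer state is small, so backup and restore are plain struct copies. */
void bw_backup( bitwriter_t *dst, const bitwriter_t *src ) { *dst = *src; }
void bw_restore( bitwriter_t *dst, const bitwriter_t *src ) { *dst = *src; }
```

Restoring only rewinds the writer state; bytes written after the backup stay in the buffer until they are overwritten by the re-encoded row.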
r2176
Add a small per-MB cost penalty for lowres
Helps avoid VBV predictors going nuts with very low-cost MBs.
One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but before this patch no cost penalty was factored in for this, because anything times zero is zero.
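The zero-cost case can be illustrated with a short sketch. The function names, the 8.8 fixed-point AQ scale and the penalty value are all hypothetical, chosen only to show the arithmetic; this is not x264's lowres cost code.

```c
/* Illustrative constant, not the value used by x264. */
#define MB_PENALTY 8

/* Scale a lowres MB cost by a hypothetical AQ factor (256 == 1.0).
 * A cost of 0 stays 0 no matter how strongly AQ lowers the QP. */
int scaled_cost_no_penalty( int mb_cost, int aq_scale )
{
    return (mb_cost * aq_scale + 128) >> 8;
}

/* Adding a small constant before scaling lets the AQ factor
 * show up in the result even for zero-cost MBs. */
int scaled_cost_with_penalty( int mb_cost, int aq_scale )
{
    return ((mb_cost + MB_PENALTY) * aq_scale + 128) >> 8;
}
```

With these made-up numbers, a zero-cost MB scaled by 2.0 (`aq_scale` of 512) yields 0 without the penalty and 16 with it.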
r2175
Remove explicit run calculation from coeff_level_run
Not necessary with the CAVLC lookup table for zero run codes.
r2174
Export PSNR/SSIM in x264 API
r2173
x86inc: support yasm -f win64
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.
r2172
Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
Also add a checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
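A rough C model of the problem, assuming a 64-bit register is represented by a `uint64_t` whose high half holds leftover bits. The function names and the garbage pattern are hypothetical; the real fix lives in the asm, not in C.

```c
#include <stdint.h>

/* Model a 64-bit register whose low 32 bits hold an "int" argument
 * and whose high 32 bits hold leftover garbage. */
uint64_t reg_with_garbage( int32_t value )
{
    return 0xDEADBEEF00000000ULL | (uint32_t)value;
}

/* Buggy assumption: treat the whole register as the offset. */
int64_t offset_assuming_clean( uint64_t reg )
{
    return (int64_t)reg;
}

/* Fix: keep only the low 32 bits and sign-extend them,
 * which is the effect of movsxd on the register. */
int64_t offset_sign_extended( uint64_t reg )
{
    return (int64_t)(int32_t)(uint32_t)reg;
}
```

The sign-extending version returns the intended value for both positive and negative inputs, while the naive version is wrong whenever the high half is non-zero.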
r2171
Fix possible alignment crash when linking from MSVC
x264_cavlc_init needs to be stack-aligned now.
r2170
Fix rare overflow in 10-bit intra_satd_x3_16x16 asm
r2169
ICL: fix out of tree building and resource file usage on Windows
r2168
Add error handling for out-of-tree build
r2167
Fix RGB colorspace input
BGR/BGRA input was correct.
r2166
Fix interlaced + extremal slice-max-size
Broke if the first macroblock in the slice exceeded the set slice-max-size.
r2165
Fix regression in r2141
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv.
Did not cause any problems.
r2164
TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
TBM and BMI1 are supported by Trinity/Piledriver.
The others (and BMI1) will probably appear in Intel's upcoming Haswell.
Also update x86inc with AVX2 stuff.
r2163
x86inc: add TAIL_CALL macro to abstract a common asm idiom
r2162
Minor asm optimizations/cleanup
r2161
Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Also remove unused AVX cruft.
r2160
XOP frame_init_lowres
Covers both 8-bit and 16-bit, ~5-10% faster on Bulldozer.
r2159
XOP 8x8 zigzags
Field: 35(mmx)->16(xop) cycles
Frame: 32(ssse3)->20(xop) cycles
r2158
AVX 32-bit hpel_filter_h
Faster on Sandy Bridge.
Also add details on unsuccessful optimizations in these functions.
r2157
x86inc: add high halfword register support
Might be useful in a few cases.
r2156
Change %ifdef directives to %if directives in *.asm files
This allows combining multiple conditionals in a single statement.
r2155
Use TV range algorithm for bit-depth conversions
Such sources are more common, so better to be correct for the common case.
This also produces less error for the case of full range than the previous algorithm produced for the case of TV range.
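As a sketch of the difference between the two conventions, assuming an 8-bit source converted to a deeper format. The function names are hypothetical and this is not the code from x264's depth conversion; it only shows why TV range can use a plain shift while full range needs a rescale.

```c
#include <stdint.h>

/* TV (limited) range: the nominal levels scale by exact powers of two,
 * e.g. 16..235 in 8-bit corresponds to 64..940 in 10-bit, so a shift is exact. */
uint16_t depth_up_tv( uint8_t v, int out_depth )
{
    return (uint16_t)(v << (out_depth - 8));
}

/* Full range: 0..255 must map onto 0..(2^out_depth - 1),
 * which needs a rounded rescale instead of a shift. */
uint16_t depth_up_full( uint8_t v, int out_depth )
{
    uint32_t max_out = (1u << out_depth) - 1;
    return (uint16_t)((v * max_out + 127) / 255);
}
```

For example, 8-bit white at 235 becomes 940 with the TV-range shift, and full-range 255 fed through the same shift becomes 1020 rather than 1023.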
r2154
Bump dates to 2012
r2153
Add Windows resource file
Displays version info in Windows Explorer.
r2152
Fix win32 pthread_cond_signal
Isn't used by x264 currently, so didn't cause a problem.
Fix backported from libav.
r2151
ARM: align asm functions to 4 bytes.
Some linkers apparently fail to correctly align ARM functions when mixing with Thumb code.
r2150
Fix normalization of colorspace when input is packed YUV 4:2:2
r2149
Force keyint-min 1 with Blu-ray
Fixes an issue with referencing across I-frames that's prohibited in Blu-ray for some godforsaken reason.
r2148
Fix crash in --demuxer y4m with unsupported colorspace
r2147
Fix overread/possible crash with intra refresh + VBV
r2146
Fix trellis 2 + subme >= 8
Trellis didn't return a boolean value as it was supposed to.
Regression in r2143-5.
r2145
CABAC trellis opts part 4: x86_64 asm
Another 20% faster.
18k->12k codesize.
This patch series may have a large impact on encoding speed.
For example, 24% faster at --preset slower --crf 23 with 720p parkjoy.
Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much more with --trellis 2).
r2144
CABAC trellis opts part 3: make some arrays non-static
r2143
CABAC trellis opts part 2: C optimizations
Hoist the branch on coef value out of the loop over node contexts.
Special cases for each possible coef value (0,1,n).
Special case for dc-only blocks.
Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live.
Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct.
CABAC offsets are now compile-time constants.
Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test.
Remove dct_weight2_zigzag[], since trellis has to look up zigzag[] anyway.
60% faster on x86_64.
25k->18k codesize.
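The "anything negative" sentinel from the list above can be sketched as follows, assuming scores are kept as unsigned 64-bit values (an assumption; the log does not say how they are stored). `SCORE_DEAD`, `node_is_live` and `best_score` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical sentinel: any value with the top bit set marks a dead node. */
#define SCORE_DEAD ((uint64_t)-1)

/* Liveness check: a single sign test instead of a compare
 * against one specific 64-bit constant. */
int node_is_live( uint64_t score )
{
    return (int64_t)score >= 0;
}

/* Keep the cheaper of two candidates. As an unsigned value the sentinel
 * is the largest possible score, so it never wins against a live node. */
uint64_t best_score( uint64_t a, uint64_t b )
{
    return a < b ? a : b;
}
```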
r2142
CABAC trellis opts part 1: minor change in output
Due to different tie-break order.
r2141
x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
r2140
High bit depth SSE2/AVX add8x8_idct8 and add16x16_idct8
From Google Code-In.
r2139
MMX/SSE2/AVX predict_8x16_p, high bit depth fdct8
From Google Code-In.
r2138
XOP 8-bit fDCT
Use integer MAC for one of the SUMSUB passes. About a dozen cycles faster for 16x16.
r2137
High bit depth intra_sad_x3_4x4
From Google Code-In.
r2136
Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing.
Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).
r2135
High bit depth intra_sad_x3_8x8, intra_satd_x3_4x4/8x8c/16x16
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.
r2134
MMX 10-bit predict_8x8c_h and predict_8x16c_h
From Google Code-In.
r2133
Some MBAFF x86 assembly functions.
deblock_chroma_420_mbaff, plus 422/422_intra_mbaff implemented using existing functions.
From Google Code-In.
r2132
More ARM NEON assembly functions
predict_8x8_v, predict_4x4_dc_top, predict_8x8_ddl, predict_8x8_ddr, predict_8x8_vl, predict_8x8_vr, predict_8x8_hd, predict_8x8_hu.
From Google Code-In.
r2131
More 4:2:2 asm functions
High bit depth version of deblock_h_chroma_422.
Regular and high bit depth versions of deblock_h_chroma_intra_422.
High bit depth pixel_vsad.
SSE2 high bit depth and MMX 8-bit predict_8x8_vl.
Our first GCI patch this year!
r2130
SSE2 and SSSE3 versions of sub8x16_dct_dc
Also slightly faster sub8x8_dct_dc
r2129
Resize filter updates
Use AVPixFmtDescriptors to pick the most compatible x264 csp for any pixel format.
Fix deprecated use of av_set_int.
Now requires libavutil >= 51.19.0
r2128
Add out-of-tree build support
r2127
Limit SSIM to 100dB
Avoids floating point error for infinite SSIM (lossless).
r2126
Fix wrong conditional inclusion of inttypes.h
inttypes.h is required by encoder/ratecontrol.c for SCNxxx macros, and HAVE_STDINT_H does not imply having inttypes.h.
stdint.h is a subset of inttypes.h, but this isn't enough for x264.
This change fixes building x264 with Android's toolchain.
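For reference, a small example of the kind of SCNxxx usage that needs inttypes.h. The `parse_i64` helper is hypothetical and is not taken from encoder/ratecontrol.c.

```c
#include <inttypes.h> /* SCNd64; stdint.h alone only provides the integer types */
#include <stdio.h>

/* Parse a decimal 64-bit value portably.
 * Returns 1 on success, 0 if no number could be read. */
int parse_i64( const char *s, int64_t *out )
{
    return sscanf( s, "%" SCNd64, out ) == 1;
}
```

Including only stdint.h here would leave `SCNd64` undefined, which is the kind of build failure described above.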
r2125
Fix crash with sliced threads and input height <= 112
r2124
Fix loading custom 8x8 chroma quant matrices in 4:4:4
r2123
Fix PCM cost overflow
r2122
Fix overflow in 8-bit x86 vsad asm function
r2121
Fix crash in --fullhelp when compiled against recent ffmpeg
Don't assume all pixel formats have a description.