H.264 decoding on the graphics card can be done with VDPAU or VAAPI. The former is Nvidia-specific, and libavcodec can use it for H.264. The latter is vendor-independent (it can use VDPAU as a backend on Nvidia cards), but H.264 decoding with VAAPI is not supported by ffmpeg yet.
In principle I prefer vendor-independent solutions, but since I need H.264 support and ATI cards suck on Linux anyway, I tried VDPAU first.
The implementation in my libavcodec video frontend was straightforward after studying the MPlayer source. The VDPAU codecs are completely separate from the other codecs. They can simply be selected with e.g. avcodec_find_decoder_by_name("h264_vdpau"). Then one must supply callback functions such as draw_horiz_band, because the rendering targets are no longer frames in memory but handles to data structures on the GPU. See here and here for the details.
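To make the "handles instead of frames" idea concrete, here is a minimal sketch of a surface pool of the kind such callbacks could use. It does not call libavcodec or VDPAU at all: surface_handle, render_state, surface_pool, mock_picture and all functions are hypothetical stand-ins, and the assumption is that the allocation callback stores a pointer to a small state struct in the picture's first data slot instead of a pointer to pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4

/* Hypothetical stand-in for a GPU surface handle (an opaque integer). */
typedef uint32_t surface_handle;

/* Hypothetical per-surface state that travels with a decoded picture. */
typedef struct {
    surface_handle surface; /* handle of the GPU-side surface */
    int in_use;             /* nonzero while the decoder holds the surface */
} render_state;

typedef struct {
    render_state states[POOL_SIZE];
} surface_pool;

/* Hypothetical stand-in for the decoder's picture structure:
 * data[0] carries a pointer to a render_state, not pixel data. */
typedef struct {
    void *data[4];
} mock_picture;

/* Assign consecutive fake handles; a real front end would create surfaces here. */
void pool_init(surface_pool *pool, surface_handle first_handle)
{
    assert(pool != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        pool->states[i].surface = first_handle + (surface_handle)i;
        pool->states[i].in_use = 0;
    }
}

/* Allocation-style callback: hand out a free surface.
 * Returns 0 on success, -1 if all surfaces are in use. */
int pool_get_buffer(surface_pool *pool, mock_picture *pic)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool->states[i].in_use) {
            pool->states[i].in_use = 1;
            pic->data[0] = &pool->states[i];
            return 0;
        }
    }
    return -1;
}

/* Release-style callback: mark the surface as free again. */
void pool_release_buffer(mock_picture *pic)
{
    render_state *rs = pic->data[0];
    if (rs != NULL) {
        rs->in_use = 0;
        pic->data[0] = NULL;
    }
}

/* What a draw-style callback would do first: recover the GPU handle. */
surface_handle picture_surface(const mock_picture *pic)
{
    const render_state *rs = pic->data[0];
    assert(rs != NULL);
    return rs->surface;
}
```

In a real front end the draw callback would pass the recovered handle to the GPU decoder instead of just returning it, and the pool size would depend on how many reference frames the stream needs.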
After decoding, the image data is copied to main memory by calling VdpVideoSurfaceGetBitsYCbCr. This of course causes a severe slowdown. A much better way would be to keep the frames in graphics memory as long as possible. But this needs to be done in a much more generic way: images can be VDPAU or VAAPI video surfaces, OpenGL textures or whatever. Implementing generic support for video frames that are not in regular RAM will be another project.
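One possible shape for such a generic frame is a tagged union that records where the pixels live and only downloads them when somebody really needs them in RAM. This is just a sketch of the idea, not an existing API: video_frame, frame_storage, download_func, frame_to_ram and fake_download are all hypothetical names, and the download hook merely stands in for a call like VdpVideoSurfaceGetBitsYCbCr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Where the pixels of a frame currently live (hypothetical enum). */
typedef enum {
    FRAME_RAM,
    FRAME_VDPAU_SURFACE,
    FRAME_VAAPI_SURFACE,
    FRAME_GL_TEXTURE
} frame_storage;

typedef struct video_frame video_frame;

/* Hypothetical hook that copies a GPU-side frame into dst.
 * Returns 0 on success. */
typedef int (*download_func)(const video_frame *frame, uint8_t *dst, size_t size);

struct video_frame {
    frame_storage storage;
    size_t size;            /* size in bytes of the frame once it is in RAM */
    union {
        uint8_t *ram;       /* valid if storage == FRAME_RAM */
        uint32_t handle;    /* surface or texture handle otherwise */
    } u;
    download_func download; /* backend-specific, unused for RAM frames */
};

/* Bring a frame into regular RAM only when needed.
 * Returns 0 on success, -1 on failure; RAM frames are left untouched. */
int frame_to_ram(video_frame *f)
{
    assert(f != NULL);
    if (f->storage == FRAME_RAM)
        return 0;
    if (f->download == NULL)
        return -1;
    uint8_t *buf = malloc(f->size);
    if (buf == NULL)
        return -1;
    if (f->download(f, buf, f->size) != 0) {
        free(buf);
        return -1;
    }
    f->u.ram = buf;
    f->storage = FRAME_RAM;
    f->download = NULL;
    return 0;
}

/* Demo stand-in for a real download: fills the buffer with mid-gray. */
int fake_download(const video_frame *frame, uint8_t *dst, size_t size)
{
    (void)frame;
    memset(dst, 0x80, size);
    return 0;
}
```

With something like this, filters and outputs that can work on GPU surfaces directly never trigger the copy, and only consumers that need plain memory call frame_to_ram. A real design would also have to deal with pixel formats, strides and ownership of the GPU handle, which this sketch ignores.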