This is the best explanation of Deep Fusion (DF) I've seen to date, from Digital Trends:
Much like Apple’s Smart HDR, Deep Fusion relies on object and scene recognition, as well as a series of eight images captured before you click the shutter button.
Of the eight images, four are taken with standard exposure and four with short exposure. A ninth picture is then taken with a long exposure when the shutter button is triggered. The short-exposure shots are meant to freeze time and bolster high-frequency details like grass blades or stubble on a person's face. The sharpest image of this series is then chosen to move on to the next step.
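(My aside: Apple hasn't published how the "sharpest" frame is picked. A common proxy for high-frequency detail is the variance of a Laplacian filter, so a minimal sketch of that selection step, assuming grayscale frames as NumPy arrays, might look like this:)

```python
import numpy as np

def sharpness(frame: np.ndarray) -> float:
    # Variance of a 4-neighbour Laplacian: higher means more high-frequency detail.
    f = frame.astype(np.float64)
    lap = (4 * f[1:-1, 1:-1]
           - f[:-2, 1:-1] - f[2:, 1:-1]
           - f[1:-1, :-2] - f[1:-1, 2:])
    return float(lap.var())

def pick_sharpest(short_exposure_frames):
    # Keep the short-exposure frame whose fine detail (stubble, grass blades)
    # survived best, to carry forward to the fusion step.
    return max(short_exposure_frames, key=sharpness)
```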
Three of the standard-exposure shots that display the best color, tones, and other low-frequency data are then fused with the long-exposure frame to compose a single image. This image and the sharpest short-exposure frame are then sent through neural networks, which choose between the two, pixel by pixel, for the best pixel to ultimately represent the photo (my edit: 12 MP + 12 MP = 24 million pixels analyzed). This pixel-by-pixel analysis enhances your images by minimizing noise, sharpening details, and accurately coloring your photos, doing so on a very granular and intelligent level.
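(My aside: the actual fusion and per-pixel choice are done by Apple's neural networks on the Neural Engine, which aren't public. As a crude stand-in to make the idea concrete, here's a sketch that averages the reference frames for low-frequency color/tone and then leans on the sharp short-exposure frame wherever it carries more local detail:)

```python
import numpy as np

def fuse_reference(best_standard_frames, long_frame):
    # Stand-in for the real fusion: average the three best standard-exposure
    # frames with the long-exposure frame for clean colour, tone, and low noise.
    stack = np.stack(list(best_standard_frames) + [long_frame]).astype(np.float64)
    return stack.mean(axis=0)

def per_pixel_merge(reference, sharp_short):
    # Stand-in for the neural networks' per-pixel choice: weight toward the
    # short-exposure frame where it has strong local gradients (fine detail),
    # and toward the fused reference in smooth areas (less noise).
    gy, gx = np.gradient(sharp_short.astype(np.float64))
    detail = np.abs(gx) + np.abs(gy)
    w = detail / (detail.max() + 1e-8)   # 0 = smooth region, 1 = high detail
    return w * sharp_short + (1.0 - w) * reference
```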
All of the post-shutter processing is done behind the scenes, so it won't impact your photo capture time. In other words, you can still snap back-to-back photos just as quickly as you ever could on the iPhone, and if they're all using Deep Fusion, they'll simply be queued up in the camera roll to be processed in order.
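(My aside: that "queued up to be processed in order" behavior is just a background work queue decoupled from the shutter. A toy sketch, where run_deep_fusion and save_to_camera_roll are hypothetical placeholders for the heavy processing and the library write:)

```python
import queue
import threading

def run_deep_fusion(burst):
    # Placeholder for the multi-frame processing described above.
    ...

def save_to_camera_roll(photo):
    # Placeholder for writing the finished photo to the library.
    ...

work = queue.Queue()

def on_shutter(burst):
    # Enqueue and return immediately, so back-to-back shots aren't blocked.
    work.put(burst)

def worker():
    # Process queued bursts one at a time, in the order they were captured.
    while True:
        burst = work.get()
        save_to_camera_roll(run_deep_fusion(burst))
        work.task_done()

threading.Thread(target=worker, daemon=True).start()
```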