JPEG artifact removal with Convolutional Neural Network (AI)

Uncategorized

At high compression ratios, JPEG images often have blocky artefacts which look particularly unpleasant, especially around the sharp edges of objects.

There are already ways to reduce these artefacts; however, the methods are often not very sophisticated, using only the information from adjacent pixels to smooth out the transition between blocks. An example can be found in Kitbi and Dawood's work (https://ieeexplore.ieee.org/document/4743789/), which gave me the original inspiration for this project.

An alternative is to use a convolutional neural network (CNN) to estimate an optimal block given the original block and the eight surrounding blocks, and then to tile over the image, estimating the optimal mapping for each block.
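
As a concrete illustration of the tiling, here is a minimal numpy sketch (my own, not the code behind this post) which walks over an image in JPEG's standard 8×8 blocks and extracts each block together with its eight neighbours as a 24×24 patch:

```python
import numpy as np

def block_patches(img, block=8):
    """Yield (row, col, patch) where patch is an 8x8 block plus its
    eight neighbours, i.e. a 3*block square centred on the block.
    Assumes img is an (H, W, 3) array."""
    # Pad by one block on each side so edge blocks also have neighbours.
    padded = np.pad(img, ((block, block), (block, block), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    for r in range(0, h, block):
        for c in range(0, w, block):
            # Offsets are shifted by `block` because of the padding.
            yield r, c, padded[r:r + 3 * block, c:c + 3 * block]
```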

The network was a five-layer fully convolutional design using only small filters. Several different architectures were tried, all giving largely similar results. A good compromise between effectiveness and speed (which scales inversely with size) was a small network with only 57,987 parameters. Training the network was surprisingly fast, taking only a few hours without a GPU.
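
The post doesn't specify the architecture beyond these details, so here is a minimal sketch of what a small five-layer fully convolutional network of this kind might look like. PyTorch and the channel widths are my assumptions; this will not reproduce the 57,987-parameter count exactly.

```python
import torch.nn as nn

class DeblockNet(nn.Module):
    """Five conv layers with small (3x3) filters, RGB in, RGB out.
    Input shape: (N, 3, H, W). Widths here are illustrative only."""
    def __init__(self, width=32):
        super().__init__()
        layers = []
        chans = [3, width, width, width, width]
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, kernel_size=3, padding=1), nn.ReLU()]
        # Fifth conv maps back to RGB, with no activation.
        layers += [nn.Conv2d(width, 3, kernel_size=3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

Being fully convolutional, a network like this can be applied to a whole image at once, which implements the block-by-block tiling implicitly through its receptive field.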

The network takes in full colour information and outputs full colour too, so that all of the available information is used. The colour channels are highly correlated with one another; it would be possible to train the network on monochrome images, but that would discard the correlation which naturally exists between the channels.

So, does it work?

In my opinion, yes, it does. I think my method performs best where there are complicated edges, such as around glasses, or on hairs which are resolved as partial blocks. It works least well, compared with the method Photoshop currently uses, where there are large smooth areas.

Text was not present in the training data set, so the network's poor performance on text is not a significant point of comparison. The above images are quite small; a larger example is below, along with a zoomed-in video comparing the original, the JPEG, Photoshop's artefact removal, and my method.

At one point I did calculate root mean squared error (RMSE) values, comparing both the network output and the JPEG image against the original. In some cases the network reliably outperformed the JPEG, which is impressive but not too surprising, as this is what it was trained to do. I don't think those values are too important here, though: the aim isn't to restore the original image, but to reduce the visual annoyance of the artefacts.
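
For reference, RMSE here is just the square root of the mean squared pixel difference; a minimal numpy version:

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two images of the same shape."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return np.sqrt(np.mean((a - b) ** 2))

# The comparison described above: is the network output closer to the
# original than the JPEG is?
# rmse(original, network_output) < rmse(original, jpeg)
```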

If you really want to reduce the annoyance of the artefacts, you could use JPEG 2000, or have a read of this paper: https://arxiv.org/pdf/1611.01704.pdf

Linify: photographs to CNC etch-a-sketch paths

Projects

Linify: def. to turn something into a line.

The other day, on the Facebook page Aesthetic Function Graphposting, I saw someone using a variable-density Hilbert curve to transform an image so that it could be projected using a laser and galvo mirror setup.

That kinda stayed in the back of my mind until I saw This Old Tony's YouTube video about making a CNC (computer numerically controlled) etch-a-sketch. The hardware looked great, but he was just using lame line sketches of outlines: the kind of limited images you'd expect from an etch-a-sketch.

So, I put together a lazy, easy way of turning a photo into a single line. Here is an example of my face; if you move quite far back from the image, it looks pretty good.

It uses variable-amplitude sine-wave patches, each containing a whole number of periods. Sine waves were chosen for their simplicity and periodicity; the same effect could be achieved with other periodic waves. In fact, square waves may be more efficient for CNC applications, as fewer commands would need to be issued to the orthogonal control axes.

The image is first downsampled to a resolution of around 64 px on an edge. Then, for each pixel in the downsampled image, an approximate sine wave is found, and the blocks of sine waves are added to a chain which runs over the whole image. Currently it raster scans, but it would be pretty easy to go R->L on even rows and L->R on odd ones. To improve the contrast, the downscaled image is contrast stretched to the 0-1 interval, and each intensity value is then raised to a power of (by default) 1.5. This gives a little more contrast in the lighter regions and compensates somewhat for the low background level and the non-linear darkening with greater amplitude. This could be tuned better.
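
To make the pipeline concrete, here is a rough, self-contained reconstruction of the process (my own sketch, not the original code). The cell size, period count, and amplitude scaling are guesses, and this version implements the serpentine scan mentioned above rather than the raster scan:

```python
import numpy as np
from PIL import Image

def linify(path, cells=64, periods=3, points=40, gamma=1.5):
    """Turn a photo into one long (x, y) path of variable-amplitude
    sine cells: one cell per downsampled pixel, with darker pixels
    getting larger amplitudes. Rows are scanned R->L on even rows and
    L->R on odd ones, so the line never has to jump."""
    img = Image.open(path).convert("L")
    w = cells
    h = round(img.height * cells / img.width)
    small = np.asarray(img.resize((w, h))) / 255.0
    # Contrast stretch to the 0-1 interval, then the gamma-like factor.
    small = (small - small.min()) / (small.max() - small.min())
    small = small ** gamma
    dark = 1.0 - small                         # 0 = white, 1 = black

    t = np.linspace(0.0, 1.0, points)
    wave = np.sin(2 * np.pi * periods * t)     # whole number of periods,
                                               # so each cell starts and ends at 0
    xs, ys = [], []
    for row in range(h):
        even = row % 2 == 0
        cols = range(w - 1, -1, -1) if even else range(w)
        for col in cols:
            amp = 0.45 * dark[row, col]        # max amplitude: just under half a cell
            cx = t[::-1] if even else t        # traverse the cell in scan direction
            xs.extend(col + cx)
            ys.extend(row + amp * wave)
    return np.array(xs), np.array(ys)

# import matplotlib.pyplot as plt
# x, y = linify("portrait.jpg")
# plt.plot(x, y, "k", linewidth=0.5); plt.gca().invert_yaxis(); plt.show()
```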

There are several factors which alter the contrast: the cell size, the frequency, the number of rendered points, the linewidth, and the gamma-like factor applied to the image before linification. Images were then optimised by adjusting the gamma-like factor; optimal values were found between ~0.75 and ~1.65, with higher values better for darker images.

Successive parabolic interpolation (SPI) was used for the optimisation. Four parameter values were selected at random, and their error with respect to the original image was found. These values were used to fit a parabola, and the minimum of the parabola was taken as a new point. This process was iterated, with the four best values being used for each fit, as can be seen in the figures below. In the first figure, four points (the blue stars) are sampled, the first parabola is fitted, and the red point marks the predicted best parameter value. In the second figure, the blue star shows the true value of the metric we are trying to minimise, which is slightly different from the prediction, and a new estimated best parameter value (green) is found. And so on. To ensure that the parabola is only used near the minimum of the function, the points farthest from the minimum are discarded. Typically only three points are used; my implementation uses four for robustness, which is helpful as the curve is non-analytic and non-smooth.

A few starting locations were checked to ensure that the minimum found was global. This kind of optimisation, SPI, is very simple and commonly used. It converges faster than a basic line search (about 33% faster) but does not always converge to the local extremum. Parabolas are used because the region around any minimum or maximum of a smooth function is well approximated by one, as can be seen from the Taylor expansion about an extremum: at an extremum x* the first derivative vanishes, leaving f(x) ≈ f(x*) + ½f″(x*)(x − x*)².
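
Here is a minimal sketch of the SPI loop as described, keeping four points per fit; `render_error` stands in for the error between the linified image and the original and is hypothetical:

```python
import numpy as np

def spi_minimise(f, xs, iters=10):
    """Successive parabolic interpolation: fit a parabola to the current
    points, take its vertex as the next sample, keep the best points,
    repeat. Four points are kept for robustness (three is the textbook
    minimum)."""
    pts = [(x, f(x)) for x in xs]
    for _ in range(iters):
        pts.sort(key=lambda p: p[1])          # discard the worst points,
        pts = pts[:4]                         # keeping the four best
        x, y = zip(*pts)
        a, b, _ = np.polyfit(x, y, 2)         # least-squares parabola
        if a <= 0:                            # opens downward: no minimum to use
            break
        x_new = -b / (2 * a)                  # vertex of the parabola
        pts.append((x_new, f(x_new)))
    return min(pts, key=lambda p: p[1])[0]

# Hypothetical usage, tuning the gamma-like factor:
# best_gamma = spi_minimise(render_error, [0.75, 1.0, 1.3, 1.65])
```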

Whilst there is a non-linear intensity response and some artefacts from the process, it is much easier to get this method to represent real images well than the Skittleiser process below, as there is a wide range of possible intensity levels.

Of course, one of the issues with using this method on something like an etch-a-sketch is the extremely long path drawn without any kind of reference position. Modifications could be made internally to an etch-a-sketch to add homing buttons which would click at certain x or y positions, giving the control system a way of resetting itself at least after every line. A much more difficult, but potentially interesting, closed-loop control system would use information from a video feed pointed at the etch-a-sketch; taking the difference of successive frames would likely be a good indication of the tip location.

Finally, here is a landscape photograph of a body of water at sunrise. Just imagine it in dusky pink. 

Skittleiser

Projects

In this short project, which I finished in December of 2017, I took colour photos and represented them using images of skittles. To start off, I found photographs of skittles on the internet and segmented out the skittles manually. I was going to use my own images, but there were some flavours of skittles which I couldn't find at the time, and I wanted to get started right away. Since then, I have produced a library of my own skittle photos.

I found eight skittle colours in total: blue, green, orange, pink, purple, white, yellow, and red. The purple skittles are very dark and look almost black in contrast to the other colours. In a short Python programme, I extracted small chips of an input photo and compared each to the images of the skittles. The programme also rotated each skittle image through 90, 180, and 270 degrees to see which orientation best fit the photo chip. The best-matching skittle images were then saved.
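
A minimal sketch of that matching step, assuming the chips and skittle images have been resized to the same square shape (sum of squared differences is my assumption; the post doesn't name the comparison metric):

```python
import numpy as np

def best_skittle(chip, skittles):
    """Compare an image chip against each skittle image at 0, 90, 180
    and 270 degrees, returning the (index, rotation) with the lowest
    sum of squared differences. Inputs are float arrays of equal shape."""
    best, best_err = None, np.inf
    for i, sk in enumerate(skittles):
        for k in range(4):                    # number of 90-degree turns
            rot = np.rot90(sk, k)
            err = np.sum((chip - rot) ** 2)
            if err < best_err:
                best, best_err = (i, k * 90), err
    return best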

I quickly swapped to using my own skittle images where possible. On a sheet of white paper, and with a flash, I took some photos of the contents of a small packet of skittles. For some reason, after each shot, the number of remaining skittles decreased. In another short Python script, I used the open computer vision library (OpenCV) to segment out the skittles and save them. This took a few steps. I collapsed each image to greyscale, blurred it, and thresholded it to produce a black and white image. This produced regions corresponding to the locations of the skittles, but also spurious regions of noise in the background. I then used two morphological operations: an erosion followed by a dilation. Erosion shrinks regions by a set amount, and dilation grows them again. This is very useful for removing small regions, like noise, in image processing tasks: noise regions are often small enough to be removed completely when eroded, while the larger skittles are only reduced in size and then restored. I then found the edges of the remaining regions and used them to cut out the skittles.
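
For illustration, here is roughly what that pipeline looks like in OpenCV 4 (a reconstruction under assumptions: the Otsu thresholding, kernel sizes, and minimum area are my guesses, not the original values):

```python
import cv2

def cut_out_skittles(path, min_area=500):
    """Segment skittles photographed on white paper: greyscale, blur,
    threshold, a morphological open (erode then dilate) to remove
    noise, then crop the bounding box of each remaining region."""
    img = cv2.imread(path)
    grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(grey, (9, 9), 0)
    # Skittles are darker than the white paper, so invert the threshold.
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # erosion then dilation
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    chips = []
    for c in contours:
        if cv2.contourArea(c) < min_area:     # skip any surviving noise
            continue
        x, y, w, h = cv2.boundingRect(c)
        chips.append(img[y:y + h, x:x + w])
    return chips
```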

So, how delicious would portraits made out of skittles be? Well, I don't know, as I never made any out of actual skittles. To get a decent, recognisable image you need about 10 kS (that's kiloSkittles), or around 100 skittles on the short edge. Since skittles are about 1 cm wide, it would be quite a large structure. On top of that, skittles weigh about a gram each, so you'd need ~10 kg of skittles, and at a penny a sweet you'd spend £100. That assumes you can bulk buy the correct colours, which isn't the case: most of the images I produced are mostly purple or yellow, with very little green.

Bulk buying individual colours would make it feasible, but still not trivial. And then there's the placing of >10 kS, although I can see that being quite therapeutic. I assumed that setting them in resin would be sufficient to preserve them; their high sugar content and a lack of oxygen should probably stop them from changing much over months, but I don't know what might happen over a period of years.

If I ever revisit the idea I'll add a hexagonal stacking mode, as well as the option to include dithering for greater colour accuracy. I have already added functionality for rectangular images of skittles on their sides, to increase the resolution of the image in one direction. This worked, but the results weren't as pleasing, and I imagine that, with the skittles on their sides, it would be that much harder to assemble.

Dithering is an interesting concept, because this medium renders images in an unusual way. If we forget about photons, the intensity of light on a given area is a continuous property: it could be twice as much as another area, or half as much, or any fraction. Digital cameras quantise this level when the analogue signal (a voltage) is converted to a digital value in a fixed range. On dSLRs this range is typically 12-14 bits, or between 4,096 and 16,384 levels. This is often down-converted to 8 bits (256 levels) if the image is saved as a JPEG, though the full range is sometimes kept with other file types.

256 intensity levels for each colour channel is sufficient for viewing or printing images; the ability of the eye to differentiate between intensity levels depends on a large number of factors, including the absolute intensity of the source as well as the colour. Colour photos can therefore represent a large number of colours and shades, typically 16.8 million (256³). These colours can be represented in Hue, Saturation, Value (HSV) space. HSV breaks a pixel down into its 'value', the brightness of the pixel; its 'hue', the pure colour of the pixel; and its 'saturation', which is how vivid the colour is, or how much of the pure colour is mixed with white. Now, our skittles sample the different hues found in photos well, but they do not sample the other dimensions of HSV space well. Purple skittles are the only dark skittles, and the rest are all highly saturated bright colours, save for white, which is still bright. Because of this, many images cannot be well represented by skittles (I know, who'd have thought that?).
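
You can see this by pushing rough skittle colours through an RGB-to-HSV conversion; the RGB values below are my guesses, just to show the point:

```python
import colorsys

# Approximate skittle RGBs (0-1 range): purple is the only dark one,
# and nothing is desaturated.
skittles = {"red": (0.85, 0.10, 0.15), "yellow": (0.95, 0.85, 0.10),
            "green": (0.15, 0.65, 0.20), "purple": (0.25, 0.10, 0.30)}
for name, rgb in skittles.items():
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    print(f"{name:7s} hue={h:.2f} sat={s:.2f} val={v:.2f}")
```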

This is where dithering comes in. Dithering, in this case, trades off spatial resolution for better colour representation. A small image patch is considered, and the average colour of that patch is matched by several skittles together. A light pink area, without dithering, would be represented as (say) four white skittles; with dithering, one or two of them could be swapped to red or pink, which, viewed from a distance, would look much like a lighter pink. This allows areas of tone close to skittle colours to show some detail, and also allows a more accurate representation of image regions whose colours are not well matched by any single skittle.
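
A minimal greedy sketch of the idea (one simple way to do it, not necessarily the scheme I'd implement): pick, one at a time, whichever palette colour brings the patch's running average closest to the target. The palette and target values are placeholders.

```python
import numpy as np

def dither_patch(target, palette, n=4):
    """Greedily pick n palette colours whose average approximates the
    target patch colour: at each step add whichever colour brings the
    running average closest to the target."""
    chosen = []
    for _ in range(n):
        best_i, best_err = None, np.inf
        for i, colour in enumerate(palette):
            trial = np.array([palette[j] for j in chosen] + [colour])
            err = np.sum((trial.mean(axis=0) - target) ** 2)
            if err < best_err:
                best_i, best_err = i, err
        chosen.append(best_i)
    return chosen   # indices into the palette

# With a palette of white (1, 1, 1) and red (0.85, 0.1, 0.15), a light
# pink target like (1.0, 0.8, 0.85) comes out as three whites and a red.
```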

Dithering does have some disadvantages. You lose spatial resolution, as boundaries between colours are blurred to intermediate values. It also doesn't make 'sense' when viewed close up: a flat light pink region of an image may have a bright red skittle in the middle of it, and it might not be obvious from the original image why it is there.

I quite like them, anyway.

A landscape photograph overlooking a lake.
A portrait of myself.
A board of the segmented skittles.