


Title:
A METHOD FOR DETECTION OF IMPERFECTIONS IN PRODUCTS
Document Type and Number:
WIPO Patent Application WO/2021/137745
Kind Code:
A1
Abstract:
A neural network model and method are disclosed. The method uses an autoencoder-type neural network that outputs probability density functions in order to improve reconstruction, increase anomaly detection robustness, and simplify training. The method comprises training the network on images (1) of imperfection-free products, wherein a probability density function (2) is generated for pixels in training mode, predicting the likelihood of specific pixel values based on the set of training images. In inference mode, the network generates a probability density function (5) for pixels in decoder representations of production product images, estimating the likelihood of pixel values in the decoder representation and rejecting as an imperfection a pixel or region of pixels that displays an unpredicted probability.

Inventors:
FLORDAL OSKAR (SE)
BÄCKSTRÖM NILS (SE)
ÄRLEMALM FILIP (SE)
Application Number:
PCT/SE2020/051260
Publication Date:
July 08, 2021
Filing Date:
December 23, 2020
Assignee:
UNIBAP AB (SE)
International Classes:
G06T7/00; G06V10/70; G06V10/772; G06V30/194
Domestic Patent References:
WO2019109524A12019-06-13
Foreign References:
US20180025257A12018-01-25
US20030194124A12003-10-16
Other References:
LIU KUN ET AL: "Steel Surface Defect Detection Using GAN and One-Class Classifier", 2019 25TH INTERNATIONAL CONFERENCE ON AUTOMATION AND COMPUTING (ICAC), CHINESE AUTOMATION AND COMPUTING SOCIETY IN THE UK - CACSUK, 5 September 2019 (2019-09-05), pages 1 - 6, XP033649500, DOI: 10.23919/ICONAC.2019.8895110
AARON VAN DEN OORD ET AL.: "Conditional Image Generation with PixelCNN Decoders", 30TH CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NIPS), 2016
Attorney, Agent or Firm:
PATENTFIRMAN HENRIK FRANSSON AB (SE)
Claims:
CLAIMS

1. A method for detection of imperfections in products using image analysis on digital representations of the products, wherein images of the products are captured through digital camera means and imported to data processing means running a computer-executable artificial neural network comprising an encoder-decoder network part, the method comprising:

a) training the network on images of imperfection-free products, wherein training comprises predicting the probabilities for specific values of a property pertaining to a pixel or pixels in the image, based on a set of images of imperfection-free products,

b) feeding production product images through the encoder-decoder part of the network,

c) estimating the probabilities for sampled values of said property pertaining to said pixel or pixels in decoder representations of the production product images,

d) setting a threshold in the probability estimates based on previous runs of production product images through the encoder-decoder part of the network, wherein the threshold separates probabilities of anomalous pixel values from probabilities of non-anomalous pixel values, and determining as imperfection a pixel or pixels for which the estimated probability falls on the anomaly side of the threshold.

2. The method of claim 1, further comprising:

- applying a normal distribution of actual pixel values sampled from training images,

- forming a Gaussian curve around actual pixel values, towards which the network is being trained,

- feeding training images through the encoder-decoder part of the network, and

- calculating mean squared error loss between the Gaussian curves and pixel values sampled in decoder representations of the training images.

3. The method of claim 1 or 2, comprising the step of setting a threshold between zero and one, i.e., between unlikely (0) and most likely (1), in the probability estimate, and rejecting as imperfection a pixel or pixels the estimated probability of which falls below the threshold.

4. The method of claim 3, comprising the step of isolating the product from the environment through masking by setting the value of pixels outside the product to zero (0) in the image.

5. The method of any of claims 1-4, wherein training the network comprises:

- preparing a set of gray-scale or single-color images without anomalies,

- setting up a neural network comprising an encoder-decoder network part that decodes an image to a probability density function by adding output filters to the decoder of the network,

- for each training iteration, generating a target probability density function by forming a Gaussian curve around the actual pixel value for each pixel in the image,

- training the network by running a forward pass of images through the encoder-decoder part of the network and calculating mean squared error loss between the reconstructed decoder representations and the target probability density function,

- updating the network and repeating the previous steps.

6. The method of claim 5, comprising:

- running the trained network on an image that may contain anomalies,

- estimating the probabilities for pixel values sampled from pixels in the reconstructed decoder representation of that image,

- setting a threshold value between zero (0) and one (1) on a probability scale to determine if an estimated probability indicates an anomaly: a probability below the threshold is marked as an anomalous pixel,

- making an erosion on anomalous pixels,

- applying a dilation function to eliminate the smallest clusters of anomalous pixels and outliers,

- evaluating each cluster of connected anomalous pixels: if the number of pixels in any such cluster is larger than a preset threshold, that cluster is marked as an imperfection.

7. The method of claim 1, further comprising:

- applying a degenerate distribution to actual pixel values sampled from training images, and

- training the network to reconstruct that distribution by calculating KL-divergence loss against the reconstructed decoder representations.

8. The method of claim 1, further comprising:

- calculating a probability mass function of a region of actual pixel values around each target pixel, and

- training the network to reconstruct that distribution by calculating KL-divergence loss against the reconstructed decoder representations.

9. The method of claim 7 or 8, wherein training the network comprises:

- preparing a set of gray-scale or single-color images without anomalies,

- setting up a neural network comprising an encoder-decoder network part that decodes an image to a probability density function by adding output filters to the decoder of the network,

- for each training iteration, generating a target probability density function by applying a degenerate distribution to actual pixel values for each pixel in the image, or by calculating a probability mass function of a region of actual pixel values around each target pixel,

- training the network by running a forward pass of images through the encoder-decoder part of the network and calculating KL-divergence loss between the reconstructed decoder representations and the target probability density function,

- updating the network and repeating the previous steps.

10. The method of claim 8, comprising:

- running the trained network on an image that may contain anomalies,

- estimating the probabilities for pixel values sampled from pixels in the reconstructed decoder representation of that image,

- setting a threshold value on a probability scale to determine if an estimated probability indicates an anomaly,

- making an erosion on anomalous pixels,

- applying a dilation function to eliminate the smallest clusters of anomalous pixels and outliers,

- evaluating each cluster of connected anomalous pixels: if the number of pixels in any such cluster is larger than a preset threshold, that cluster is marked as an imperfection.

11. The method of any previous claim, comprising the step of filtering pixel values sampled in decoder representations of production product images through a static filter which denotes images having clusters of connected anomalous pixels, and feeding these images or pixel clusters through a neural network trained to determine whether or not these clusters are imperfections.

12. The method of claim 11, comprising the step of determining the least number of connected anomalous pixels in a cluster of pixels that is required to be determined as an imperfection.

13. The method of any previous claim, comprising the step of feeding the reconstructed decoder representation to a post-operated neural network which is trained to determine the significance of an imperfection by comparison with a map of known imperfections.

14. The method of any previous claim, comprising the step of feeding the reconstructed decoder representation to a post-operated, object-detecting neural network which is trained to select regions of interest in the production product images.

15. The method of any previous claim, comprising the step of feeding anomalous images in the reconstructed decoder representation to a post-operated classifier neural network which is trained per product type and material to label and separate imperfections into different classes.

16. The method of claim 15, wherein training the classifier neural network includes feedback and labelling of previously unknown imperfections detected during running of the network on production product images.

17. The method of any previous claim, wherein the property pertaining to pixels in the image is pixel intensity.

18. The method of any previous claim, comprising repeating the method steps a), b), c), d) for each colour red, green and blue in a multicolour image.

19. The method of any previous claim, comprising repeating the steps a), b), c) and d) for each type of texture included in a set of textures previously recognized by a pre-operated neural network.

20. The method of claims 18 and 19, comprising the combination of method steps in claims 18 and 19.

21. A computer program product storable on a computer usable medium containing instructions for a data processing means to execute the method of any of claims 1-20.

22. The computer program product of claim 21 provided at least in part over a data transfer network, such as Ethernet or Internet.

23. The computer program product of claim 21 or 22, installed to run on a computer operated in a physical inspection cell for a production line.

24. A computer readable medium, characterized in that it contains a computer program product according to any of claims 21-23.

Description:
A method for detection of imperfections in products

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a method for detection of imperfections in products using image analysis on digital representations of the products, wherein images of the products are captured through digital camera means and imported to data processing means running a computer executable artificial neural network.

BACKGROUND AND PRIOR ART

Surface defect detection is the process of finding manufacturing defects that affect the quality of a produced item either visually or functionally. At a typical factory this is done at the end, or sometimes in the middle, of production to remove or rework defective goods before they reach the next production step or the end customer. Considering that defects can come in many shapes and forms, this is a domain which has traditionally involved large amounts of human labor. Recent developments in machine learning have improved the ability of computers to solve these problems. However, this has primarily been done by trained methods that learn to detect certain types of defects using manually collected training data, or in some cases by using more primitive versions of the algorithm described below.

A typical running system comprises:

• cameras and carefully placed lighting are used to collect data from surfaces that require inspection

• images are analyzed by the algorithm and are classified as either defect or not, usually with meta data such as type of defect and position

• based on the feedback from the algorithm, the items are either manually or automatically removed from the production line.

An alternative method is to let an operator collect images of defective items, label the whole image or regions of the image as defective, and train an object detector or a classifier neural network to detect the errors. The downsides of this process are that

• it is time-consuming in that a lot of defects need to be found for training

• it does not always produce the required results

• it does not necessarily generalize to new types of defects that were not seen before, including catastrophic errors such as a portion of the object missing.

A recent approach is to rely on an autoencoder trained either through a GAN (Generative Adversarial Network) or through other measures to reconstruct the input image. An autoencoder is a network that typically takes a representation of, for example, an image, learns to encode this image into a compact representation (for example 128 numbers), and then learns to decode these numbers into either the same or a similar form as the input, but without noise etc. Autoencoders are thereby forced to find a more efficient representation of the image. They are typically easy to train since in the basic case, where we simply try to make the output similar to the input, the data trained on does not need to be labeled.

The autoencoder is only trained on objects without defects, with the assumption that when the autoencoder is fed a defect image it will only be able to reconstruct a non-defect version of the input object. In other words, the autoencoder will find a way to describe an ideal object as precisely as possible, which does not leave any room in its internal representation to describe defects. By running the autoencoder on an image and comparing the reconstructed image with the input image, we reveal what parts were defective, for example by comparing the distance between individual pixels or by using other similarity metrics such as SSIM (Structural Similarity Index Metric) or PSNR (Peak Signal-to-Noise Ratio). While flexible and only requiring good samples, the downside of this method is that it is less robust in areas of the image which are inherently chaotic, such as cut areas of metal objects, and these methods can also be complex to train to a sufficient degree.
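As a hedged illustration of this reconstruction-and-compare approach, the sketch below flags pixels whose per-pixel distance between input and reconstruction exceeds a threshold. The images, threshold value, and function name are assumptions for illustration only, not details from this disclosure:

```python
import numpy as np

def defect_map(input_img: np.ndarray, reconstructed: np.ndarray,
               threshold: float = 30.0) -> np.ndarray:
    """Boolean mask of pixels whose absolute intensity difference
    between input and reconstruction exceeds the threshold."""
    diff = np.abs(input_img.astype(float) - reconstructed.astype(float))
    return diff > threshold

# A clean region reconstructs closely; a defect does not.
original = np.full((4, 4), 135.0)
recon = original.copy()
original[2, 2] = 240.0          # simulated bright defect pixel
mask = defect_map(original, recon)
print(mask.sum())               # -> 1: only the defect pixel is flagged
```

In practice a perceptual metric such as SSIM would replace the plain pixel distance, at the cost of the robustness issues in chaotic regions noted above.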

There is a wealth of other handcrafted methods that try to find defects using classic vision, where particular features are compared between images. Detecting defects from images is central to the art of machine vision. However, most of these methods tend to be specific to a particular material and camera setup, and will not handle more complex and flexible defect setups.

In the prior art, generative neural networks of various shapes are used to generate and manipulate images of different classes. One such method is conditional image generation with PixelCNN decoders where, pixel by pixel, a convolutional neural network model is trained to predict, depending on the closest neighbors above and to the left, what pixel intensities are likely to come next (see appended Fig. 1). By sampling from the distribution of the likely pixels one at a time, new plausible images can be generated. The purpose of these methods is to generate images rather than to detect defects, so these techniques stop at describing how to generate an image iteratively from the top to the bottom of the image.

The present invention is based on the observation that a conventional reconstructing neural network or classical autoencoder does not handle chaotic areas well and will always contain a bit of uncertainty. The problem is that an image of a manufactured item typically has a few different sources of what is essentially chaotic noise:

• camera-introduced noise: structural noise, or random noise like photon shot noise, will, depending on the camera, make each intensity on the actual object be represented within a random range in the captured image

• texture on an object such as grinding patterns, micro facets, or natural random variations in the material.

In addition to this, when we have a statistical model, there is bound to be some uncertainty in the model for practical reasons. These sources are essentially impossible to describe in a compact form since they behave as noise.

Together with these essentially random factors there are factors concerning the object itself and how the image is taken such as:

• variations in how the object is placed in front of the camera, large variation if the object is dangling from a conveyor, smaller differences if it is held by a robot or in a fixture

• tolerances for size of the object where different objects will have small size variations

• with respect to noise there can be allowed variations in color due to the painting process of an object, etc.

A classical autoencoder will be better at handling the second set of differences by, e.g., learning a representation for the object pose or the acceptable variations as well as color shifts of the object. A classical autoencoder has, however, no way to encode random noise, and it also has difficulties in describing the pixel-perfect position of edges and specular highlights in the material, which may depend on allowed micro-variations, since the model ultimately becomes very complex.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a neural network model and method for automatic detection of imperfections in products which avoids the shortcomings of previous models or methods.

Another object of the present invention is to provide a neural network model and method for automatic detection of imperfections in products which can handle chaotic areas in the image with a high degree of certainty.

Still another object of the present invention is to provide a neural network model and method for automatic detection of imperfections in products which is designed for simplified unsupervised learning.

One or several of these objects are met in a neural network model and method for detection of imperfections in products using image analysis on digital representations of the products, wherein images of the products are captured through digital camera means and imported to data processing means running a computer executable artificial neural network of encoder-decoder type.

The method comprises

- training the network on images of imperfection-free products, wherein training comprises predicting the probabilities for specific values of a property pertaining to a pixel or pixels in the image, based on a set of images of imperfection-free products,

- feeding production product images through the encoder-decoder part of the network,

- estimating the probabilities for sampled values of said property pertaining to said pixel or pixels in decoder representations of the production product images,

- setting a threshold in the probability estimates based on previous runs of production product images through the encoder-decoder part of the network, wherein the threshold separates probabilities of anomalous pixel values from probabilities of non-anomalous pixel values, and determining as imperfection a pixel or pixels for which the estimated probability falls on the anomaly side of the threshold.
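The final thresholding step above can be sketched numerically as follows; the per-pixel probability map and the threshold value are illustrative assumptions only:

```python
import numpy as np

# Per-pixel probability estimates from the decoder representation,
# values in [0, 1] (illustrative assumption, not real model output).
prob = np.array([[0.90, 0.80, 0.85],
                 [0.70, 0.05, 0.90],   # 0.05: an unlikely pixel value
                 [0.80, 0.90, 0.75]])

# Threshold tuned on previous runs so clean pixels pass.
threshold = 0.4

# Pixels on the anomaly side of the threshold are imperfections.
imperfections = prob < threshold
print(np.argwhere(imperfections))      # -> [[1 1]]
```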

An advantage and technical effect provided by the method is that learning does not require a large variety of possible defects in products or in manipulated training images. For the same reason, the conventionally required manual involvement in the training process is significantly reduced.

It shall be emphasized that the presented network model is not limited to training and inference focusing on individual pixels and pixel intensities only. On the contrary, in the training process we can model several types of properties beside intensity, such as edginess, intensity variance, features such as pattern or texture, e.g., within a region of the image.

One embodiment of the method foresees that training the network comprises approximation of actual pixel intensities in training images into a normal distribution curve formed around a target pixel intensity towards which the network is being trained.

In this embodiment, the method comprises

- applying a normal distribution of actual pixel values sampled from training images,

- forming a Gaussian curve around actual pixel values, towards which the network is being trained,

- feeding training images through the encoder-decoder part of the network, and

- calculating mean squared error loss between the Gaussian curves and pixel values sampled in decoder representations of the training images.
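The Gaussian-target objective of this embodiment can be sketched as follows; the discretization into 256 intensity bins, the curve width `sigma`, and the helper names are illustrative assumptions, not details from this disclosure:

```python
import numpy as np

def gaussian_target(pixel_value: float, sigma: float = 5.0,
                    bins: int = 256) -> np.ndarray:
    """Gaussian curve over intensity bins, peaking at the actual
    pixel value: the per-pixel training target."""
    x = np.arange(bins)
    return np.exp(-0.5 * ((x - pixel_value) / sigma) ** 2)

def mse_loss(predicted: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between decoder output and target curve."""
    return float(np.mean((predicted - target) ** 2))

target = gaussian_target(135.0)       # curve around actual value 135
perfect = target.copy()               # decoder output matching target
shifted = gaussian_target(150.0)      # decoder output peaking elsewhere
assert mse_loss(perfect, target) == 0.0
assert mse_loss(shifted, target) > 0.0
```

The training loop would minimize this loss over all pixels of all training images.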

Another embodiment of the method comprises the step of setting a threshold between zero (improbable) and one (highly probable) in the probability density function and rejecting as imperfection a pixel intensity that falls below the probability threshold.

In this embodiment, the method comprises the steps of

- setting a threshold between zero and one, i.e., between unlikely (0) and most likely (1), in the probability estimate, and

- rejecting as imperfection a pixel or pixels the estimated probability of which falls below the threshold.

In one embodiment, the method comprises the step of isolating the product from the environment through masking by setting the value of pixels outside the product to zero (0) in the image.
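The masking step can be illustrated with a minimal sketch; the image, mask geometry, and variable names are assumptions for illustration only:

```python
import numpy as np

image = np.full((5, 5), 135.0)          # captured product image
product_mask = np.zeros((5, 5), bool)
product_mask[1:4, 1:4] = True           # product occupies the centre

# Pixels outside the product are set to zero so that background
# variation cannot be flagged as an imperfection.
masked = np.where(product_mask, image, 0.0)
print(masked[0, 0], masked[2, 2])       # -> 0.0 135.0
```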

In one embodiment, a training run of the network model and method can be implemented as follows

- preparing a set of gray-scale or single-color images without anomalies,

- setting up a neural network comprising an encoder-decoder network part that decodes an image to a probability density function by adding output filters to the decoder of the network,

- for each training iteration, generating a target probability density function by forming a Gaussian curve around the actual pixel value for each pixel in the image,

- training the network by running a forward pass of images through the encoder-decoder part of the network and calculating mean squared error loss between the reconstructed decoder representations and the target probability density function,

- updating the network and repeating the previous steps.

An inference run can be implemented as follows

- running the trained network on an image that may contain anomalies,

- estimating the probabilities for pixel values sampled from pixels in the reconstructed decoder representation of that image,

- applying a threshold value between zero (0) and one (1) on a probability scale to determine if an estimated probability indicates an anomaly: a probability below the threshold is marked as an anomalous pixel,

- making an erosion on anomalous pixels,

- applying a dilation function to eliminate the smallest clusters of anomalous pixels and outliers,

- evaluating each cluster of connected anomalous pixels: if the number of pixels in any such cluster is larger than a preset threshold, that cluster is marked as an imperfection.
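A hedged sketch of this inference post-processing follows, implementing a 3x3 erosion and dilation (a morphological opening) and a flood-fill cluster count. The structuring element, connectivity, and size threshold are illustrative assumptions:

```python
import numpy as np
from collections import deque

def erode(mask: np.ndarray) -> np.ndarray:
    """3x3 erosion: a pixel survives only if its whole 3x3
    neighbourhood is anomalous."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.ones_like(mask, bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

def dilate(mask: np.ndarray) -> np.ndarray:
    """3x3 dilation: a pixel is set if any 3x3 neighbour is set."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.zeros_like(mask, bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

def cluster_sizes(mask: np.ndarray) -> list:
    """Sizes of 4-connected clusters of anomalous pixels."""
    seen = np.zeros_like(mask, bool)
    sizes = []
    for y, x in np.argwhere(mask):
        if seen[y, x]:
            continue
        q, size = deque([(y, x)]), 0
        seen[y, x] = True
        while q:
            cy, cx = q.popleft()
            size += 1
            for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    q.append((ny, nx))
        sizes.append(size)
    return sizes

anomalous = np.zeros((8, 8), bool)
anomalous[0, 0] = True              # single-pixel outlier
anomalous[3:6, 3:6] = True          # 3x3 real imperfection
opened = dilate(erode(anomalous))   # outlier removed, block survives
imperfections = [s for s in cluster_sizes(opened) if s > 4]
print(len(imperfections))           # -> 1
```

Production code would likely use a library implementation of these morphological operations instead of the explicit loops shown here.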

In this connection it can be emphasized that erosion and dilation are known concepts of morphological image processing which are familiar to persons skilled in the art, and thus they need no detailed explanation in this disclosure.

Another embodiment of the method foresees that training the network comprises applying a degenerate distribution of actual pixel intensities in training images, or any other distribution function as appropriate, around a target pixel intensity towards which the network is being trained.

Still another embodiment of the method foresees that training the network comprises calculation of a probability mass function on a region of actual pixel intensities in training images around a target pixel intensity towards which the network is being trained.

In one embodiment, the method thus comprises

- applying a probability distribution to actual pixel values sampled from training images, such as a degenerate distribution or a probability mass function, and

- training the network to reconstruct that distribution by calculating KL-divergence loss against pixel values sampled in decoder representations of the training images.
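This KL-divergence variant can be sketched as follows, with a degenerate target distribution placing all probability mass on the actual pixel value. The bin count, the example distributions, and the smoothing epsilon are illustrative assumptions:

```python
import numpy as np

def kl_divergence(target: np.ndarray, predicted: np.ndarray,
                  eps: float = 1e-9) -> float:
    """KL(target || predicted) over discretized intensity bins."""
    t = target / target.sum()
    p = predicted / predicted.sum()
    return float(np.sum(t * np.log((t + eps) / (p + eps))))

bins = 256
target = np.zeros(bins)
target[135] = 1.0                    # degenerate: all mass at value 135

good = np.full(bins, 1e-6)
good[130:141] = 0.1                  # output concentrated near 135
bad = np.full(bins, 1e-6)
bad[200:211] = 0.1                   # output concentrated far away

# The loss is lower when the decoder output agrees with the target.
assert kl_divergence(target, good) < kl_divergence(target, bad)
```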

One embodiment of the method comprises the step of filtering the decoder representation, determining if the actual pixel intensity falls inside or outside a pixel intensity range defined by the probability density algorithm.

In one embodiment, the method thus comprises

- filtering pixel values sampled in decoder representations of production product images through a static filter which denotes images having clusters of connected anomalous pixels, and

- feeding these images or pixel clusters through a neural network trained to determine whether or not these clusters are imperfections.

Yet another embodiment of the method comprises the step of determining the least number of clustered pixels required outside the predicted pixel intensity range, or below the probability threshold, to qualify as an imperfection.

The neural network model and method as briefly explained hereinabove can be realized and implemented in various ways and embodiments, as exemplified below:

- the reconstructed decoder representation can be fed to a post-operated neural network which is trained to determine the significance of an imperfection by comparison with a map of known imperfections;

- the reconstructed decoder representation can be fed to a post-operated, object-detecting neural network which is trained to select regions of interest in the production product images;

- anomalous images in the reconstructed decoder representation can be fed to a post-operated classifier neural network which is trained per product type and material to label and separate imperfections into different classes;

- the classifier neural network can comprise feedback and labelling of previously unknown imperfections detected during running of the network on production product images;

- advantageously, the examined property pertaining to pixels in the image is pixel intensity;

- the method steps a), b), c), d) can be repeated for each colour red, green and blue in a multicolour image;

- the steps a), b), c), d) can be repeated for each type of texture included in a set of textures previously recognized by a pre-operated neural network;

as well as any appropriate combination of the above-listed embodiments and implementations.

The present invention can be implemented in the form of a computer program product that can be stored on a computer-usable medium containing instructions for a data processing means to execute the inventive method.

The computer program product may be provided at least in part over a data transfer network, such as Ethernet or Internet.

The computer program product may be installed to run on a computer operated in a physical inspection cell for a production line.

The present invention can further be implemented in the form of a computer readable medium which contains the computer program product.

Advantages and technical effects provided by these and other embodiments are further explained in the accompanying detailed description of preferred embodiments.

SHORT DESCRIPTION OF THE DRAWINGS

The invention will be more closely described below with reference made to the accompanying drawings, of which

Fig. 1 (Prior art) is a reproduced illustration taken from the PixelCNN paper (Aaron van den Oord et al., Conditional Image Generation with PixelCNN Decoders, 30th Conference on Neural Information Processing Systems (NIPS), 2016),

Fig. 2 is an image showing image regions of exceptional pixel intensities indicating an anomaly,

Fig. 3 is a graph illustrating probability density functions in the form of Gauss curves generated around actual pixel intensities,

Fig. 4 is a graph illustrating probability density functions generated on pixels close to an edge of a sample,

Fig. 5 is a diagram showing schematically the architecture of an embodiment of the neural network model of the present invention,

Fig. 6 is a flowchart illustrating a training process for the neural network model,

Fig. 7 is a flowchart illustrating masking, and

Fig. 8 is a flowchart illustrating iterative training of a classifier operated in conjunction with the method and neural network model of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

With reference to Fig. 1, briefly speaking, PixelCNN is a prior art method for generating an image, pixel by pixel, through predictions based on the properties of previous pixels above and to the left of the next generated pixel. Fig. 1 illustrates how the next pixel (black) is generated based on the actual intensities of previous pixels (see the graph) within a rectangular window that is moved in rows over the image from upper left to lower right.

Fig. 2 is an anomaly intensity image showing typical behavior of the network model/detection method of the present invention. The intensity of white indicates how unlikely a particular value is, strong white indicates that it is less likely. In this image the white dot in the middle is an anomaly. The noisy pattern indicates normal noise variations that are enhanced due to the outliers being slightly unlikely: even though there might be minor pixel shifts they are clearly within the acceptable region.

Fig. 3 is a graph illustrating the assessment basis in inference mode of the neural network model for detecting imperfections or anomalies in products according to the present invention. In the graph, the vertical scale indicates likelihood values from 0-1, and the horizontal scale indicates pixel intensity values from 0-255. The continuous-line curve (right) illustrates a probability density function comprising estimation of the probability for a specific pixel intensity based on input pixel intensities in imperfection-free training images. The algorithm generates a Gaussian curve approximating the distribution of intensity values in training images, the shape of the curve predicting the likelihood for a specific pixel intensity. Since in this case the training images show a narrow range of pixel intensity values between about 120-150 on the horizontal scale, the prediction for an actual pixel intensity of exactly 135 in analyzed images is quite high, about 0.8 on the likelihood scale. From the graph it can be concluded that a pixel intensity value of 140, e.g., shows high conformity with the predicted intensity range and lands in the region of about 0.6 on the likelihood scale. A pixel intensity value of 125, on the other hand, shows low conformity with the predicted intensity range and lands below 0.2 on the likelihood scale.
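The reading of Fig. 3 can be mimicked with a small numeric sketch; the Gaussian mean, width, and peak height below are assumptions chosen to roughly reproduce the figure, not values disclosed in the application:

```python
import math

def likelihood(value: float, mean: float = 135.0,
               sigma: float = 6.0, peak: float = 0.8) -> float:
    """Gaussian likelihood curve around the predicted intensity
    (peak height and width chosen to mimic Fig. 3)."""
    return peak * math.exp(-0.5 * ((value - mean) / sigma) ** 2)

threshold = 0.4                        # discriminator (dash-dot line)
print(likelihood(140) > threshold)     # -> True: within predicted range
print(likelihood(125) > threshold)     # -> False: low conformity, anomaly
```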

To separate anomalous samples from imperfection-free products in operation of the network model, the inference process can be designed to insert a threshold value on the likelihood scale as a discriminator, in Fig. 3 illustrated by a dash-dot line.

The value of the threshold is usually obtained by running the algorithm on images of imperfection-free products and tuning it so that no pixel or region of pixels in the imperfection-free samples is determined as an imperfection. In other words, a threshold in the probability estimate can be set based on previous runs of images through the encoder-decoder part of the network, wherein the threshold separates probabilities of anomalous pixel values from probabilities of non-anomalous pixel values. A probability for a pixel value that falls on the anomaly side of the threshold will be rejected as anomalous, indicating an imperfection. Although in Fig. 3 the threshold is set to about 0.4, it will be realized that the value of the threshold can in practice be tuned to almost any value on the likelihood scale, depending on the specific implementation of the neural network model and method. In Fig. 3, a broken-line curve is generated by the algorithm in a similar way based on a single pixel intensity in a production image, hence the high likelihood for that intensity. Obviously, there is no overlap or conformity between the two curves, and the single intensity which has produced the broken-line curve is clearly outside the predicted intensity range, indicating an anomaly in the production.
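This threshold tuning can be sketched as follows; the probability maps and the safety margin are illustrative assumptions only:

```python
import numpy as np

# Per-pixel probability estimates collected from runs on
# imperfection-free images (illustrative values).
clean_run_probs = [np.array([0.90, 0.70, 0.55]),
                   np.array([0.85, 0.60, 0.65])]

# Place the threshold just below the lowest probability seen on
# clean samples, so no clean pixel is rejected.
margin = 0.05
threshold = min(p.min() for p in clean_run_probs) - margin
print(round(threshold, 2))        # -> 0.5

# In production, probabilities under the threshold indicate anomalies.
production = np.array([0.80, 0.12, 0.90])
print((production < threshold).nonzero()[0])   # -> [1]
```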

Fig. 4 is a graph similar to Fig. 3. In this case, the continuous-line curve describes the distribution of pixel intensities in the vicinity of a border or edge in training images. The probability density function is here much wider (cf. Fig. 3), since there is a greater spread of intensity values in this region, e.g. due to how the light refracts at the surface of the product and where exactly the edge falls. The broken-line curve, based on a single intensity value in the analyzed image, however represents an intensity that is clearly inside the predicted intensity range, and this image will pass as accepted without anomalies.

According to the invention, instead of calculating a specific value we are interested in calculating a range of possible values at a certain position. This way, normally occurring variance can be handled more easily without having to utilize a network with too much expressivity (which would also be able to reconstruct defects). Drawing inspiration from density functions as generated by non-GAN-based generation networks like PixelCNN, a basic autoencoder algorithm can instead be modified and expanded as explained below with reference to Figs. 5 and 6.

Initially, a set of images 1 that preferably have no defects is gathered for training.

Images that do have defects can be sorted out by highlighting the most deviant images in the training set for human inspection. This can be done, for example, by running the algorithm with low thresholds for error. In a factory this is typically done by installing the cameras in normal production and capturing images without making any decisions. This can also be done off site in a lab, but the correct settings need to be replicated regarding light etc., or the light at the factory may otherwise be determined as anomalous.

In one embodiment, the network is trained by applying a normal distribution to actual pixel values sampled from input training images 1, forming Gaussian curves around the actual pixel intensities, see reference number 2 in Fig. 5 (probability density function or ground truth cube 2). The network is then trained to ideally output that Gaussian curve by calculating an MSE (Mean Squared Error) loss against the generated Gaussian curve.
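The construction of the ground-truth Gaussian target and the MSE loss against it might be sketched as follows. This is a minimal pure-Python illustration; the bin count of 256 matches an 8-bit image, while the spread `sigma` is an assumed parameter:

```python
import math

def gaussian_target(actual_intensity, num_bins=256, sigma=3.0):
    """Ground-truth probability vector for one pixel: a Gaussian centred on
    the actual intensity, with one bin per possible 8-bit intensity value,
    normalised to a probability mass."""
    target = [math.exp(-((b - actual_intensity) ** 2) / (2 * sigma ** 2))
              for b in range(num_bins)]
    total = sum(target)
    return [t / total for t in target]

def mse_loss(predicted, target):
    """Mean squared error between the network output and the Gaussian target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
```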

Training images 1 and production product images 100 are fed through the network comprising encoder 3 and decoder 4 network parts. The network is set up using convolutional layers (i.e., small kernels that move across the image) as well as fully connected layers in the middle that compress the information into one layer which may contain, for example, 128 or 256 entries. The network can be said to be a tailored version of an artificial neural network of encoder-decoder type.

The encoder 3 is forced to compress its understanding of the input image, and the decoder 4 expands that understanding into a mathematically described 3-dimensional shape 5 (cube) which contains one density function for each pixel (or element) in the image. This cube can contain one bin for each intensity value of the input image.

To be more specific, for an 8-bit mono-colored 256x256 pixel image, a 256x256x256 probability density function 5 is generated as the output from the decoder 4. Each value of this probability density function indicates the likelihood of a specific pixel intensity for that particular pixel. I.e., for a specific pixel, it could describe that the likelihood of said pixel having an intensity of, e.g., 0, 1, 2 or 3 (i.e., dark) is relatively high, whereas the likelihood of it lying between 32 and 100 is very low.
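Reading out per-pixel likelihoods from such an output cube reduces to an index lookup per pixel, as in this sketch (function name hypothetical; a tiny cube with two intensity bins is used for illustration in place of the full 256x256x256 cube):

```python
def likelihood_map(pdf_cube, image):
    """For each pixel of the input image, read out the decoder's estimated
    likelihood of that pixel's actual intensity from the output cube.
    pdf_cube is indexed [y][x][intensity], image is indexed [y][x]."""
    h, w = len(image), len(image[0])
    return [[pdf_cube[y][x][image[y][x]] for x in range(w)] for y in range(h)]
```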

The probability density function output per pixel can be implemented in many ways.

In one embodiment it comprises a series of 1x1 convolutions connected to a set of filters that would normally come before the final step of the decoder 4. In other words, we have a set of, for example, 64 filters in the output from the decoder, which is then analyzed through an additional set of filters 6 for each pixel when estimating the likelihoods of different pixel intensities in reconstructed decoder representations 7 of input images.
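A 1x1 convolution of this kind is simply a per-pixel linear map from the decoder's feature channels to the output bins, which can be sketched as follows (names and shapes illustrative; a real implementation would use a deep-learning framework):

```python
def one_by_one_conv(features, weights, biases):
    """Per-pixel 1x1 convolution: maps the F feature channels at each pixel
    to B output bins (here, intensity likelihoods).
    features: H x W x F list, weights: B x F list, biases: length-B list."""
    out = []
    for row in features:
        out_row = []
        for feat in row:
            out_row.append([b + sum(w * f for w, f in zip(wrow, feat))
                            for wrow, b in zip(weights, biases)])
        out.append(out_row)
    return out
```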

In Fig. 5, reference number 8 denotes the final step of evaluation wherein probabilities 5 estimated in inference are compared to probabilities 2 predicted in training.

By using a Gaussian curve, the generated density function does not have to match the output precisely, which makes the network easier to train. I.e., we help the network understand that values near the actual value at a certain position are acceptable. Experimentation has shown this to be an efficient way to generate arbitrary density functions from the learned network.

Since only one value in each density function will prove correct for each pixel, an alternative way to train is to train on only a few failing samples per image together with all the correct samples. This way the network needs to learn the distribution itself, without assuming it is Gaussian. This has proven to work nearly as well as training using the Gaussian curve.

Another efficient way of training the network is by applying a degenerate distribution on the actual pixel intensity, so that the probability equals 1 for this pixel intensity and 0 for all other intensity values, for each pixel, and training the network to reconstruct that distribution by calculating a KL (Kullback-Leibler) divergence loss against the generated reconstructed decoder representation. Experimentation has shown this to be an efficient way to generate arbitrary density functions from the learned network.
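The degenerate-distribution target and the KL-divergence loss described above can be sketched as follows (a minimal pure-Python illustration; the function names and the small epsilon guard against log of zero are our own):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def degenerate_target(actual_intensity, num_bins=256):
    """Ground truth: probability 1 on the actual intensity, 0 elsewhere."""
    return [1.0 if b == actual_intensity else 0.0 for b in range(num_bins)]
```

With a degenerate target, the KL loss collapses to the negative log-probability the network assigns to the true intensity, so minimising it pushes probability mass onto the observed value.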

Still another efficient training method is to calculate a probability mass function over a region of actual pixel intensities around each target pixel and to train the network to reconstruct it by calculating a KL-divergence loss against the generated reconstructed decoder representation.
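Such a region-based probability mass function is simply a normalised histogram of intensities around the target pixel, as in this sketch (function name hypothetical; radius 4 gives the 9x9 region used as an example later in the text, and the region is clamped at the image border):

```python
def region_pmf(image, cx, cy, radius=4, num_bins=256):
    """Probability mass function of intensities in a (2*radius+1)^2 region
    (e.g. 9x9) around the target pixel (cx, cy), clamped at the border."""
    h, w = len(image), len(image[0])
    counts = [0] * num_bins
    n = 0
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            counts[image[y][x]] += 1
            n += 1
    return [c / n for c in counts]
```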

Given this output, which we can call the output probability density function, we can sample each value in the input image on the individual probability density function for each pixel and, based on the probabilities for each value, determine whether a region of pixels is unexpected (an anomaly) or not. I.e., if the input image has a value of 23 for a given pixel, we check in the output probability density function for that pixel how likely 23 is as a value. The probability density function can contain normalized or non-normalized values, which gives an estimate of whether this value is likely or very unlikely. Typically, the probability density functions contain some noise, so a threshold is preferably set at some low level.

In a case where the network is trained to reconstruct a probability mass function of intensities in a region around a pixel, KL-divergence can be utilized as a method to measure the similarity between two given density functions. By calculating a probability mass function over a region of actual pixel intensities around each pixel in the input image, such as a 9x9 pixel region, and comparing this density function to that of the output from the decoder, it is possible to find anomalous distributions and thus anomalous pixels. In contrast to the probability threshold, the KL-divergence can be any value between 0 and positive infinity, where a high value indicates low similarity. A threshold based on KL-divergence is therefore the maximum value for a given pixel to be considered non-anomalous, and similarly any pixel in the output from the decoder with a distribution that results in a KL-divergence larger than the threshold is considered anomalous. In other words, if the probability mass function of a pixel in the input image, given actual pixel intensities in a 9x9 pixel region, is compared to the density function for the corresponding pixel in the decoder representation by applying KL-divergence, the KL-divergence score could be 8. If the threshold was previously set to 7, the pixel would in this case be classified as anomalous.
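The comparison described above, region distribution versus decoder output with a KL threshold, might be sketched as follows (names hypothetical; the threshold of 7 matches the example in the text, and the epsilon guard against log of zero is our own):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def classify_pixel(region_pmf, decoder_pdf, threshold=7.0):
    """A pixel is anomalous when the KL divergence between the observed
    region distribution and the decoder's predicted density exceeds the
    previously set threshold."""
    return kl_divergence(region_pmf, decoder_pdf) > threshold
```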

The output filtering can be done in a number of different ways. The most straightforward is to determine that all pixels with a probability below a threshold are deemed anomalies. A post filter can then be applied, wherein sufficiently large sets of connected anomalous pixels are deemed an actual anomaly (since individual outliers can still occur and be insignificant). The required number of pixels can be calibrated to reduce the number of false positives while still allowing small defects to be detected. In a properly trained network, the number of falsely flagged pixels is expected to be small due to the flexibility of the density function.
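The post filter described above amounts to connected-component analysis on the anomaly mask, which could be sketched as follows (function name and 4-connectivity are illustrative choices; the minimum cluster size is the calibrated parameter mentioned in the text):

```python
from collections import deque

def filter_anomalies(mask, min_size=5):
    """Keep only connected clusters of anomalous pixels of at least
    `min_size` pixels (4-connectivity); isolated outliers are discarded."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    clusters = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                queue, comp = deque([(sy, sx)]), []
                seen[sy][sx] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    # Visit the four direct neighbours inside the image
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:
                    clusters.append(comp)
    return clusters
```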

The value for the threshold is usually obtained by running the algorithm on images of imperfection-free products and tuning it so that no pixels or regions of pixels in the imperfection-free samples are determined as imperfections. The value for the number of connected anomalous pixels required to consider a cluster of pixels anomalous can be set explicitly from customer requirements, if for example only clusters of anomalous pixels larger than a certain size are to be considered anomalous.

If, during production or a production site test, the obtained threshold causes undesired behavior, imperfect and imperfection-free data can be used to find the combination of probability threshold and number of connected anomalous pixels that creates the desired separation between imperfect and imperfection-free samples.

An individual anomaly may be insignificant either because the defect is very small, or because the model is not expressive enough to cover very rare but still acceptable occurrences, such as dust.

In a more advanced implementation, a neural network can be trained on the anomaly map output from the process to determine whether an anomaly is significant or not. The advantage of such a network is that the probability density function will behave the same for different types of images of objects being tested. One way to do this is to check each region where the total anomaly within the region exceeds a low threshold, or to simply run the post filter on top of all regions of the image.

An alternative way is to train an object detector to suggest interesting regions in the image. The advantage of training a detector on the output is that the probability density function domain behaves similarly for different types of images. In other words, a defect on a metal part can look very similar in the probability density function to a defect on a plastic part. This is not necessarily true when looking at the raw image from the camera. This way, the effort of training a network can be reused across multiple different object and material types.

In typical implementations we still want to classify defects. This can now be done by running a classifier on top of the regions determined to be anomalous. In some cases, this stage can also be used to suppress features that, while anomalous, are acceptable in production. In other words, one way to do this is to train a post-operated classifier CNN (Convolutional Neural Network) to recognize false positives from the anomaly network, and pass all found anomalies through this network. This optional network needs to be trained per object type and material to meet whatever classifications are required for this process to get the correct labeling.

Based on the probability density function of an area, these anomalies can be clustered based on their likeness, so that we can define separate classes of defects that the user of the system can give different names. This is a type of auto-labeling. The grouping can be done by a clustering algorithm such as K-Means. A neural network can be used to reduce the dimensionality of the faults; this network can be trained on a smaller set of defects and reused for new datasets. In other words, apply a network that takes, for example, a 32x32 pixel area around an error and train it to separate different labeled defect types into different bins by describing them with a 128-entry vector. The same network can then be used on a completely new dataset. This is a technique which can be used when trying to separate, for example, different surfaces, where not all surfaces will have been seen during training. Instead, the network is taught to learn an efficient representation where different surfaces tend to get different 128-entry vectors as output, whereas similar or identical surfaces get similar output vectors.
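The K-Means grouping of defect embedding vectors mentioned above could be sketched as follows. This is a minimal pure-Python implementation for illustration only (the iteration count, seed and Euclidean distance are assumed choices; the embedding vectors would come from the dimensionality-reduction network):

```python
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster defect embedding vectors (e.g. 128-entry descriptors) into k
    groups, so each group can be given its own defect-class label.
    Returns the per-vector cluster assignment and the final centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean)
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
        # Move each centroid to the mean of its assigned vectors
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```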

In a typical implementation this may have to be done on multiple color channels, since Red, Green and Blue have separate output density functions. This can be calculated by extending the density functions for each pixel while keeping a similar network structure, but it could also be done through three separate networks, one for each color.

While the probability density function is a powerful concept, the human eye and brain will sometimes pick up on a defect not because of color variations but due to the structure, or lack thereof, in a single area. In other words, in the chaotic texture left by a cutting tool on metal, it is easy for a human to pick up a longer scratch even if the scratch has a color within the same color band as the chaotic texture. To cover this aspect, the same concept can be applied, but instead of using three different color channels, a set of channels that describes the intensity of different variations of textures is used.

Consider, accordingly, one channel that describes a vertically striped texture; this effect can be either strong or weak in a region of the image. Many convolutional neural networks will describe textures a few layers into the network, since textures are central for determining what object an image depicts. Given this information, the algorithm thus becomes: take the output from a detection network (or a reconstructing autoencoder) a few layers into the network. This will typically generate a vector of lower resolution than the input, and it will typically contain more channels (filters). In the example with the 256x256 image, we could for example have a 64x64x64 vector that contains texture intensities for 64 textures from a 64x64 grid of 5x5 patches in the image, using typical strides and convolution sizes which give some overlap between the patches (the stride is the length between each invocation of the convolution kernel). This sampling can be done at several layers in order to convey probabilities over more advanced textures.
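Extracting one such texture channel amounts to a strided convolution with a texture-selective kernel, as in this sketch (the kernel, stride and function name are illustrative; in practice the responses come from the network's own learned filters):

```python
def texture_intensity(image, kernel, stride=4):
    """Slide a small texture kernel over the image with the given stride;
    each output value is the response of one patch, forming one texture
    channel of lower resolution than the input."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(0, h - kh + 1, stride):
        row = []
        for x in range(0, w - kw + 1, stride):
            row.append(sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# An assumed 3x3 kernel that responds strongly to vertical stripes
VERTICAL = [[-1, 2, -1],
            [-1, 2, -1],
            [-1, 2, -1]]
```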

For each texture intensity we normalize the intensities, and can then treat it the same way as a color intensity for a 64x64 image according to the algorithm above. Similarly to the way we combine three color channels in the image, we also calculate them all in the same network and expand to, for example, 64 different density functions for each region of the image.

Similar to the color density functions this network can also be used to find texture outliers. In other words, it would be unexpected to find a striped pattern in the middle of a dotted pattern, which probably indicates a scratch.

This can also be implemented by transforming the image with a set of classic filters: e.g. by applying a Laplacian filter to find which areas of the image contain many edges, by checking transform values from a discrete Fourier transform, by measuring the variance and mean of an area, or by calculating directional edges, etc. The output of the filter is then used as the input channel to the anomaly detection network.

One or several layers of texture probabilities can be combined with the color probabilities to make a decision on the anomalous nature of a region. Preferably this is combined into a neural network unless training is sufficiently powerful to accurately determine likely or unlikely regions.

Fig. 6 is a flowchart illustrating a process for ensuring the reliability of the network model. In step 9, images are processed through the network structure 3, 4, 6 which generates the probability density cube 5 built on probability density functions applied to pixels in the output decoder representations 7. In post analysis step 10, probabilities of the input images are matched with the Gaussian probability curves generated in step 11 for pixels in training images imported from step 12. In step 10, MSE loss is calculated with respect to the Gaussian probability curves. Step 13 is the evaluation step wherein the calculated MSE loss is compared with a setpoint or threshold value. If MSE loss is acceptable, the model is saved in step 14 as a reliable model for operative use. However, if MSE loss is unacceptable the process returns to step 9.

A masking process may be applied as illustrated in Fig. 7 in order to ensure that only the relevant object is seen in the image. Masking is effective for removing variability in the surrounding environment. A masking neural network can be trained to find out which pixels belong to the object, and set all other pixels in the image to 0. From step 15, an image is imported to a positioning algorithm run in step 16, and trained to isolate the relevant pixels or regions of pixels in the image. In step 17 the irrelevant pixels or regions of pixels are masked by the positioning algorithm. In step 18 the masked image is imported for training of the network.
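Step 17, zeroing out every pixel that the masking network did not attribute to the object, could be sketched as follows (function name hypothetical; the mask would in practice be produced by the trained masking network of step 16):

```python
def apply_mask(image, mask):
    """Set all pixels outside the object to 0, removing variability from the
    surrounding environment. `mask` holds True for pixels that the masking
    network attributes to the object."""
    return [[px if keep else 0 for px, keep in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]
```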

Fig. 8 is a flow chart and overview showing a network model as represented by the present disclosure. Notably in Fig. 8, anomalous images in the decoder representation 19 are fed to a post-operated classifier neural network 20 which is trained per product type and material to label and separate imperfections or clusters of imperfections into different classes. A feedback loop 21 from decision step 22 to a labeling process 23 is implemented for iterative training of the classifier 25 on images containing defects which are previously unknown to the classifier.

The invention as disclosed prescribes the use of a probability density function in order to relax matching for reconstructing neural networks. The invention further suggests reconstruction on features for recreating texture. Training of the network includes features like doing MSE on a Gaussian curve applied as an estimator. Other advantageous embodiments comprise calculating KL-divergence loss against probability mass functions, or against degenerate distributions, applied to actual pixel values in training images.

Another inventive feature is the application of a classifier on top of the anomaly detector and the feedback loop made possible by that. The network model and method of the present invention can be run on an edge computer or server in a production line, whereas training can alternatively be done offline and tuned online as the customer adds customer-specific features to the setup of the network model and method. The present invention results in high precision when finding imperfections in products, it can detect imperfections not seen before, and it is robust to false positives in multiple ways.

A computer program product or a computer program implementing the method or a part thereof comprises software or a computer program run on a general purpose or specially adapted computer, processor or microprocessor. The software includes computer program code elements or software code portions that make the computer perform the method. The program may be stored, in whole or in part, on or in one or more suitable computer readable media or data storage means such as a magnetic disk, CD-ROM or DVD disk, hard disk, magneto-optical memory storage means, in RAM or volatile memory, in ROM or flash memory, as firmware, on a data server, or on a cloud server. Such a computer program product or computer program can also be supplied via a network, such as the Internet.

It is to be understood that the embodiments described above and illustrated in the drawings are to be regarded only as non-limiting examples of the present invention and may be modified within the scope of the appended claims.