Title:
METHODS AND SYSTEMS FOR AUTOMATED ANALYSIS OF MEDICAL IMAGES
Document Type and Number:
WIPO Patent Application WO/2024/036374
Kind Code:
A1
Abstract:
A computer implemented method for detecting visual findings in anatomical images, comprising: providing anatomical images of the subject; inputting the anatomical images into a convolutional neural network (CNN) component of a neural network to output a feature vector; computing an indication of visual findings being present in the anatomical images by a dense layer of the neural network that takes as input the feature vector and outputs an indication of whether each of the visual findings is present in the anatomical images; communicating the visual findings to a user system configured to receive feedback data associated with the visual findings; and transmitting the feedback data to the neural network; wherein the neural network is trained on a training dataset including anatomical images, and labels associated with the anatomical images and each of the respective visual findings, the labels comprising labels obtained using the transmitted feedback data.

Inventors:
AUSTIN BEN (AU)
NORTHROP MARC (AU)
TRAN AENGUS (AU)
Application Number:
PCT/AU2023/050778
Publication Date:
February 22, 2024
Filing Date:
August 17, 2023
Assignee:
ANNALISE AI PTY LTD (AU)
International Classes:
G06T7/00; G06N3/0464; G06V10/778; G06V10/82; G16H30/20; G16H30/40; G16H50/20; G16H50/70
Domestic Patent References:
WO2018222755A1 2018-12-06
WO2019215605A1 2019-11-14
Foreign References:
US20200226746A1 2020-07-16
US20150086091A1 2015-03-26
US20190236782A1 2019-08-01
Attorney, Agent or Firm:
FB RICE PTY LTD (AU)
Claims:
CLAIMS

1. A computer implemented method for detecting a plurality of visual findings in one or more anatomical images of a subject, comprising: providing one or more anatomical images of the subject; inputting the one or more anatomical images into a convolutional neural network (CNN) component of a neural network to output a feature vector; computing an indication of a plurality of visual findings being present in at least one of the one or more anatomical images by a dense layer of the neural network that takes as input the feature vector and outputs an indication of whether each of the plurality of visual findings is present in at least one of the one or more anatomical images; communicating the plurality of visual findings to a user system configured to receive feedback data associated with the plurality of visual findings; and transmitting the feedback data to the neural network; wherein the neural network is trained on a training dataset including, for each of a plurality of subjects, one or more anatomical images, and a plurality of labels associated with the one or more anatomical images and each of the respective visual findings, the plurality of labels comprising labels obtained using the transmitted feedback data.

2. The method of claim 1, wherein the visual findings are radiological findings in anatomical images comprising one or more chest x-ray (CXR) images or computed tomography (CT) images.

3. The method of claim 1 or claim 2, further comprising computing a segmentation mask indicating a localisation for at least one of the plurality of visual findings.

4. The method of any one of the preceding claims, further comprising displaying at least one of the one or more anatomical images of the subject and receiving a user selection of one or more areas of the anatomical images and/or a user-provided indication of a first visual finding.

5. The method of claim 4, further comprising recording the user-selected one or more areas of the anatomical image(s) and/or the user-provided indication of the first visual finding in a memory, associated with the one or more anatomical images.

6. The method of claim 4 or claim 5, further comprising using the user-selected one or more areas of the anatomical images and/or the user-provided indication of the first visual finding to train a deep learning model to detect the presence of at least the first visual finding in anatomical images and/or to train a deep learning model to detect areas showing at least the first visual finding in anatomical images.

7. The method of claim 6, wherein using the user-selected one or more areas of the anatomical image(s) and/or the user-provided indication of the first visual finding to train a deep learning model to detect the presence of at least the first visual finding in anatomical images comprises at least partially re-training the deep learning model that was used to produce the first value.

8. The method of claim 6, wherein using the user-selected one or more areas of the anatomical image(s) and/or the user-provided indication of the first visual finding to train a deep learning model to detect the areas showing at least the first visual finding in anatomical images may comprise at least partially retraining the deep learning model that was used to produce a segmentation map indicating the areas of the anatomical image(s) where the first visual finding has been detected.

9. The method of any one of the preceding claims, further comprising displaying a list of visual findings on a user interface, wherein the list of visual findings comprises a first sublist comprising one or more visual findings not present in the one or more anatomical images of a subject, the user interface configured to allow a user selecting a visual finding from the first sublist for a displayed one or more anatomical images, wherein the feedback data comprises the selected visual finding and associated one or more anatomical images.

10. The method of any one of the preceding claims, wherein transmitting the feedback data is carried out by a de-identification module configured to remove identification information from the feedback data thereby providing de-identified feedback data.

11. The method of any one of the preceding claims, wherein the plurality of visual findings are provided by a server module to an integration layer module, wherein the integration layer module is configured to transmit the feedback data to the server module.

12. The method according to any one of the preceding claims, wherein the integration layer module comprises a database for storing the feedback data temporarily and the temporary period of time is user configurable.

13. The method according to any one of the preceding claims, wherein the plurality of labels associated with at least a subset of the one or more anatomical images and each of the respective visual findings in the training dataset are derived from the results of review of the one or more anatomical images combined with the feedback data, by at least one expert.

14. The method according to any one of the preceding claims, wherein the plurality of labels associated with a subset of the one or more anatomical images in the training dataset represent a probability of each of the respective visual findings being present in the one or more anatomical images of the subject.

15. The method according to any one of the preceding claims, further comprising generating a report of visual findings and displaying the report of visual findings on a user interface, wherein the report of visual findings comprises text of the one or more visual findings present in the one or more anatomical images of a subject, the user interface configured to allow a user to edit the text, wherein the feedback data comprises a visual finding indicative of the edited text and associated one or more anatomical images.

16. A system for receiving, processing and transmitting feedback data for a plurality of visual findings in one or more anatomical images of a subject, wherein the plurality of visual findings are generated using a convolutional neural network (CNN) component of a neural network, the system comprising: at least one processor; and at least one computer readable storage medium, accessible by the processor, comprising instructions that, when executed by the processor, cause the processor to execute a method of any one of the preceding claims.

17. A non-transitory computer readable storage media comprising instructions that, when executed by at least one processor, cause the processor to execute a method of any one of claims 1 to 15.

Description:
METHODS AND SYSTEMS FOR AUTOMATED ANALYSIS OF MEDICAL IMAGES

Cross-Reference To Related Applications

The present application claims priority from Australian Provisional Patent Application No 2022902333 filed on 17 August 2022, the contents of which are incorporated herein by reference in their entirety.

Field of the Invention

The present invention generally relates to computer-implemented methods for analysing medical images, as well as computing systems, services, and devices implementing the methods. Embodiments of the invention improve the analysis of medical images by allowing user feedback to be processed during automated analysis of medical images employing machine learning techniques, in particular deep learning networks, such as convolutional neural networks, trained using substratification training, to enable error correction and thereby increase accuracy of the machine learning techniques. Methods, systems, services, and devices embodying the invention find applications, amongst others, in the clinical assessment of chest conditions such as pneumothorax and other radiological findings pertaining to the chest or head.

Background to the Invention

Generally, the manual interpretation of medical images performed by trained experts (such as e.g. radiologists) is a challenging task, due to the large number of possible findings that may be found. For example, the chest x-ray (CXR) is a very commonly performed radiological examination for screening and diagnosis of many cardiac and pulmonary diseases. CXRs are used for acute triage as well as longitudinal surveillance. In other words, a CXR is typically examined for any detectable abnormality in addition to the clinical indication for which it was ordered. This means that radiologists must be alert to identify many different conditions, with a concordant risk that some findings may be missed. CXRs are particularly difficult to interpret. Additionally, the increasing demand for specialists that are qualified to interpret medical images (i.e. medical imaging specialists or expert radiologists) far outweighs the availability of these specialists. Furthermore, the training of new specialists requires a significant amount of time. As a result, technical operators, such as radiographic technicians/radiographers, are increasingly called upon to provide preliminary interpretations to decrease the waiting time and/or to provide a triage assessment. However, the accuracy and confidence in the work of such technicians is generally inferior to that of highly trained and highly experienced specialists.

Empirical training has been used to assess medical imagery, in which mathematical models are generated by learning a dataset. Deep learning is a particularly data-hungry subset of empirical training that is itself a subset of artificial intelligence (AI). Recently, the use of deep learning approaches to generate deep neural networks (DNNs), also known as deep learning models, that automate the assessment of CXR images has been suggested. PCT/AU2021/050580 entitled "SYSTEMS AND METHODS FOR AUTOMATED ANALYSIS OF MEDICAL IMAGES" by the same applicant, Annalise-AI Pty Ltd, the contents of which are incorporated herein in their entirety, describes improved technology in analysing anatomical images using deep learning models.

Prior methods have indicated the presence (and confidence of such prediction) of a radiological finding among a list of findings without any indication of priority or clinical significance of the prediction generated by an AI model. There is an ongoing need for improved methods to monitor and improve on AI model performance in a manner that accurately produces clinically useful outputs for clinical decision support and exhibits performance that is non-inferior to conventional unassisted prior methods.

In various embodiments the present invention seeks to address, individually and/or in combination, one or more of the foregoing needs and limitations of the prior art.

Summary of the Invention

According to a first aspect, there is provided a computer implemented method for detecting a plurality of visual findings in one or more anatomical images of a subject, comprising: providing one or more anatomical images of the subject; inputting the one or more anatomical images into a convolutional neural network (CNN) component of a neural network to output a feature vector; computing an indication of a plurality of visual findings being present in at least one of the one or more anatomical images by a dense layer of the neural network that takes as input the feature vector and outputs an indication of whether each of the plurality of visual findings is present in at least one of the one or more anatomical images; communicating the plurality of visual findings to a user system configured to receive feedback data associated with the plurality of visual findings; and transmitting the feedback data to the neural network; wherein the neural network is trained on a training dataset including, for each of a plurality of subjects, one or more anatomical images, and a plurality of labels associated with the one or more anatomical images and each of the respective visual findings, the plurality of labels comprising labels obtained using the transmitted feedback data.

In embodiments of the invention, the visual findings may be radiological findings in anatomical images comprising one or more chest x-ray (CXR) images or computed tomography (CT) images. For example, embodiments of the invention may employ a deep learning model trained to detect/classify pneumothoraces from a CXR image. For example, an AI model service (AIMS) 718 for AI processing and generating predictions may identify and return the predicted radiological findings generated by the deep learning models executed by a machine learning prediction service.

Deep learning models embodying the invention can be trained to detect/classify a very high number of visual findings and then re-trained to improve detection/classification based on user feedback. Such models may have been trained and re-trained using CXR images (pixel data) where, in one example, labels were provided for each of the findings (including labels corresponding to visual findings and feedback data), enabling the deep learning models to be trained to detect combinations of findings, while preventing the models from learning incorrect correlations and improving the model predictions following feedback data.

Advantageously, methods according to embodiments of the invention enable monitoring the performance of the AI model over a long period of time to allow for retraining of the AI model 708 as needed, in situations where the generated visual findings are not accurate and may be improved upon. In particular, the method allows for user (e.g. radiologist) feedback to be received, processed and transmitted to a server module for correcting errors with the AI model 708 when the prediction generated was incorrect and the user indicated so, so that the next version of the AI model, trained with new data which includes the ground-truthed feedback data, is more accurate.

Feedback data may indicate, for example, a radiological finding being missed from the study, a radiological finding added by a user, or rejection of a predicted radiological finding (i.e. indicating the finding is incorrect), amongst others. In embodiments, the method may further comprise computing a segmentation mask indicating a localisation for at least one of the plurality of visual findings, wherein the feedback data comprises an indication of incorrect localisation.

The method additionally enables detecting very poor predictions, such as tens of lung nodules being identified instead of labelling a visual finding as cystic fibrosis, or visual findings being wrongly identified which are in fact due to errors in the imaging hardware, such as a scratched x-ray plate. This allows for a more accurate, reliable, and secure system.

An automated analysis of anatomical images using deep learning models is improved by enabling the user to review the results of such automated analysis and provide feedback/corrective information in relation to a radiological finding that may have been missed or incorrectly predicted by the automated analysis process, and using this information to train one or more improved deep learning model(s).

Accordingly, the method may further comprise displaying at least one of the one or more anatomical images of the subject and receiving a user selection of one or more areas of the anatomical image(s) and/or a user-provided indication of a first visual finding.

A user-provided indication of a first visual finding may be received by the user selecting a first visual finding from a displayed list of visual findings, or by the user typing or otherwise entering a first visual finding. Preferably, the method comprises receiving both a user selection of one or more areas of the anatomical image(s) and a user-provided indication of a first visual finding associated with the user-selected one or more areas.

Preferably, the method further comprises recording the user selected one or more areas of the anatomical image(s) and/or the user provided indication of the first visual finding in a memory, associated with the one or more anatomical image(s).

The method may further comprise using the user-selected one or more areas of the anatomical image(s) and/or the user-provided indication of the first visual finding to train a deep learning model to detect the presence of at least the first visual finding in anatomical images and/or to train a deep learning model to detect areas showing at least the first visual finding in anatomical images. The deep learning model trained to detect areas showing at least the first visual finding in anatomical images may be different from the deep learning model that is trained to detect the presence of at least the first visual finding in anatomical images.

Using the user-selected one or more areas of the anatomical image(s) and/or the user-provided indication of the first visual finding to train a deep learning model to detect the presence of at least the first visual finding in anatomical images may comprise at least partially re-training the deep learning model that was used to produce the first value.

Using the user-selected one or more areas of the anatomical image(s) and/or the user-provided indication of the first visual finding to train a deep learning model to detect the areas showing at least the first visual finding in anatomical images may comprise at least partially retraining the deep learning model that was used to produce a segmentation map indicating the areas of the anatomical image(s) where the first visual finding has been detected.

The method may further comprise displaying a list of visual findings on a user interface, wherein the list of visual findings comprises a first sublist comprising one or more visual findings not present in the one or more anatomical images of a subject, the user interface configured to allow a user selecting a visual finding from the first sublist for a displayed one or more anatomical images, wherein the user feedback comprises the selected visual finding and associated one or more anatomical images. This allows the user to add a missed finding to the feedback data. In alternative embodiments, the user may manually input the missing visual finding.

In a dependent aspect, transmitting feedback data is carried out by a de-identification module configured to remove identification information from the feedback data thereby providing de-identified feedback data.

In a dependent aspect the plurality of visual findings are provided by a server module to an integration layer module, wherein the integration layer module is configured to transmit the feedback data to the server module. This enables providing feedback/corrective information to the AI model before the predictive findings are transmitted to a user, so that the model is already re-trained to improve accuracy of the findings transmitted to the user. In some embodiments, the integration layer module does not play a role in the feedback feature described in at least one embodiment of the present invention.

In a dependent aspect the integration layer module comprises a database for storing the feedback data temporarily and the temporary period of time is user configurable. This improves on data security aspects of the system.

The plurality of labels associated with at least a subset of the one or more CXR or CT images and each of the respective visual findings in the training dataset may be derived from the results of review of the one or more anatomical images by at least one expert together with feedback data. The plurality of labels for the subset of the images in the training dataset are advantageously derived from the results of review of the one or more images input as feedback data by at least two experts, preferably at least three experts.

The plurality of labels for the subset of the images in the training dataset may be obtained by combining the results of review and feedback data input in respect of the one or more anatomical images by a plurality of experts.

In embodiments, the plurality of labels associated with the one or more CXR images in the training dataset represent a probability of each of the respective visual findings being present in the at least one of the one or more CXR images of a subject.

Labelling using a plurality of labels organised as a hierarchical ontology tree may be obtained through expert review and feedback as explained above. For example, a plurality of labels associated with at least a subset of the one or more chest x-ray images and each of the respective visual findings in the training dataset may be derived from the results of review and feedback data input in respect of the one or more anatomical images by at least one expert using a labelling tool that allows the expert to select labels presented in a hierarchical object (such as e.g. a hierarchical menu).

In embodiments, the indication of whether each of the plurality of visual findings is present in at least one of the one or more CXR images represents a probability of the respective visual finding being present in at least one of the one or more CXR images.

In embodiments, the plurality of labels associated with at least a further subset of the one or more CXR images and each of the respective visual findings in the training dataset are derived from an indication of the plurality of visual findings being present in at least one of the one or more CXR images obtained using a previously trained neural network and user feedback. In embodiments, the neural network is trained by evaluating the performance of a plurality of neural networks (the plurality of neural networks being trained from a labelled dataset generated via consensus of radiologists) in detecting the plurality of visual findings and in detecting the localisation of any of the plurality of visual findings that are predicted to be present.

In a further aspect, there is provided a system for receiving, processing and transmitting feedback data for a plurality of visual findings in one or more anatomical images of a subject, wherein the plurality of visual findings are generated using a convolutional neural network (CNN) component of a neural network, the system comprising: at least one processor; and at least one computer readable storage medium, accessible by the processor, comprising instructions that, when executed by the processor, cause the processor to execute a method as described above.

In a further aspect there is provided a non-transitory computer readable storage media comprising instructions that, when executed by at least one processor, cause the processor to execute a method as described above.

Preferred features of each one of the independent claims are provided in the dependent claims.

Brief Description of the Figures

Aspects of the present invention will now be described, by way of example only, with reference to the accompanying figures, in which:

Figure 1 is a block diagram of an exemplary architecture of a medical image analysis system embodying the invention;

Figure 2 is another block diagram of an exemplary architecture of a medical image analysis system embodying the invention;

Figure 3A is a signal flow diagram illustrating an exemplary method for processing of imaging study results within the embodiments of Figure 1 or Figure 2;

Figure 3B is another signal flow diagram illustrating an exemplary method for processing of imaging study results within the embodiments of Figure 1 or Figure 2;

Figures 4A to 4G show exemplary interactive user interface screens of a viewer component embodying the invention;

Figures 5A and 5B show further exemplary interactive user interface screens of a viewer component embodying the invention;

Figure 6 is an exemplary report generated on an interactive user interface embodying the invention.

Figure 7 illustrates a computer implemented method for detecting a plurality of visual findings in one or more anatomical images of a subject.

Detailed Description

Exemplary systems and methods for analysing radiological findings generated by an AI model on given radiology images (e.g. CXR) are described. Referring to Figures 1 and 2, a system 10 comprises modular system components in communication with each other, including a server system 70 configured to send predicted radiological findings, and receive feedback data/corrective information associated with the radiological findings, via an integration layer module 702. The integration layer module 702 includes at least one local database (not shown) and a processor 800. The predicted radiological findings and feedback data are received by the integration layer 702 and stored in the local database before being queued to the processor 800.

Advantageously, the system 10 enables processing of feedback data/corrective information to improve the AI model 708 run by the AI model service (AIMS) 718. This enables the system to communicate more accurate predicted radiological findings, in a more robust and reliable manner.

The modular components make the system highly configurable by users and radiologists, in contrast to prior art systems which are rigid and inflexible and cannot be optimised for changes in disease prevalence and care settings. Another benefit of a modular systems architecture comprising asynchronous microservices is that it enables better re-usability, workload handling, and easier debugging processes (the separate modules are easier to test, implement or design). In this way, the system 10 provides an interface specification that allows external applications (patient worklists) to communicate with the system 10 and receive the predicted radiological findings in a more efficient and safe manner, including providing the functionality to receive and process user feedback/corrective information that enables the retraining of AI models 708.

The system 10 further comprises a radiology image analysis server (RIAS) 110. An exemplary RIAS 110 is based on a microservices architecture, and comprises a number of modular software components developed and configured in accordance with principles of the present invention. The RIAS 110 receives anatomical image data that is transmitted from a source of anatomical image data, for example, where the anatomical image data is captured and initially stored, such as a radiological clinic or its data centre. The transmission may occur in bulk batches of anatomical image data and prior to a user having to provide their decision/clinical report on a study. The transmission may be processed, controlled and managed by an integration layer (comprising integrator services of an integration adapter) installed at the radiological clinic or its data centre, or residing at cloud infrastructure.

In the clinical use scenario, the RIAS 110 provides analysis services in relation to anatomical images captured by and/or accessible by user devices, such as radiology terminals/workstations, or other computing devices (e.g. personal computers, tablet computers, and/or other portable devices - not shown). The anatomical image data is analysed by one or more software components of the RIAS 110, including through the execution of machine learning models. The RIAS 110 then makes the results of the analysis available and accessible to one or more user devices.

The processor 800 may check if the predicted radiological findings are "white-listed" for the RIAS 110. A white-list may be assigned for example using a Digital Imaging and Communications in Medicine (DICOM) tag for the user institution (RIAS) name. In this example, the processor receives and transmits DICOM data, sitting between a DICOM receiver 8001 and a DICOM transmitter 8002. It is possible to select system functionality by enabling or disabling the feedback functionality; this increases system flexibility and configurability. Advantageously, a user (i.e. radiologist) using the RIAS 110 can flag incorrect studies or provide user feedback/corrective information. In alternative embodiments, a ruleset/model running in the integration layer 702 can also provide feedback before the radiological predictions are transmitted to the RIAS 110. Feedback data can indicate one or more of the following non-exhaustive list:

• clinic information (location, or exact clinic)

• time

• machine used

• doctor/technology

• radiological findings

• feedback from the user.

It will be appreciated that a "radiology image" in this context may be any anatomical image including a chest X-ray image (CXR) or a CT image of the brain and/or head. The integration layer 702 may receive worklist priority data from the RIAS 110 representing a worklist for a radiologist (i.e. a user), along with associated data which includes a patient identification (ID) and customer account number for example. The predicted radiological findings and feedback data are transmitted via the integration layer 702 comprising integrator services, the integration layer 702 connecting to an injection layer, with the clinical ranking data being processable by the RIAS 110; this advantageously communicates to the RIAS 110 the predicted AI radiology findings and receives user feedback data in a timely manner. This enables a more reliable and safer method of processing predicted radiological findings than known methods, which is particularly important when the radiological findings relate to diseases or injuries which are time critical and need to be identified and confirmed in a timely manner to prevent or minimise permanent injury or accelerated deterioration of the medical condition(s) of a patient.

The integration layer 702 communicates the predicted radiological findings to the RIAS 110. With reference to Figure 2, the RIAS 110 forwards this data to an interactive viewer component 701, which communicates the predicted AI radiology findings to the user and receives user feedback associated with the AI radiology findings, to communicate the user feedback to the server 70. Accordingly, the system 10 comprises modular components which enable multiple integration and injection pathways to facilitate interoperability and deployment in various existing computing environments, such as Radiology Information System and Picture Archiving and Communication System (RIS-PACS) systems from various vendors, and at different integration points such as via APIs or by superimposing a virtual user interface element on the display device of the radiology terminals/workstations. A PACS server 111 is shown in Figure 1. The virtual user interface element may be the interactive viewer component 701, which has the functionality to allow a user to provide feedback data in respect of the predicted radiological findings.

The system further comprises a de-identification module 900 comprising a de-identification processor 90 for data de-identification. Data from all sources including feedback data is de-identified and DICOM tags are removed. Protected health information is removed or irreversibly anonymised from reports and images through an automated de-identification process. Image data is preserved at the original resolution and bit-depth. Patient IDs and Study IDs are anonymised to de-identify them while retaining the temporal and logical association between studies and patients.
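By way of a purely illustrative sketch (not the actual de-identification implementation), a deterministic keyed hash is one way to pseudonymise Patient IDs and Study IDs so that the association between studies and patients is retained while the original identifiers are removed; the secret name and example identifiers below are assumptions:

import { createHmac } from "crypto";

// Hypothetical sketch: the same input identifier always maps to the same opaque
// token, so two studies belonging to one patient keep a common pseudonymous
// PatientID without revealing the original value.
const DEID_SECRET = process.env.DEID_SECRET ?? "site-specific-secret"; // assumed per-deployment secret

function pseudonymise(identifier: string): string {
  return createHmac("sha256", DEID_SECRET).update(identifier).digest("hex").slice(0, 16);
}

const studyA = { patientId: pseudonymise("PAT-0001"), studyId: pseudonymise("STUDY-0001") };
const studyB = { patientId: pseudonymise("PAT-0001"), studyId: pseudonymise("STUDY-0002") };
// studyA.patientId === studyB.patientId, preserving the patient-study association.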

Feedback storage and API endpoints

A distributed message queueing service (DMQS) 710 stores user feedback metadata. User feedback data associated with the results ID is stored in a separate database, in this example a cloud imaging processing service (CIPS) 7060. The primary functions of the CIPS 7060 are to: handle feedback data; handle image storage; handle image conversion; handle image manipulation; store image references and metadata relating to studies and predicted radiological findings; handle image type conversions (e.g. JPEG2000 to JPEG) and store the different image types; store segmentation image results from the AI model(s) 708; manipulate segmentation PNGs by adding a transparent layer over black pixels; handle and store feedback data; and provide open API endpoints for the viewer component 701 to request segmentation maps and radiological images (in a compatible image format expected by the viewer component 701). In this example, there are two API endpoints: one for user feedback, and one for study images and associated metadata. Providing multiple endpoints supports granularity of configuration, thereby enhancing flexibility and configurability, for example to save and/or update user feedback. Preferably, user feedback identified by a user feedback ID may be provided even in situations where a study has an error associated with it.
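For illustration only, the shapes of these two endpoints might be sketched as TypeScript types along the following lines; the route paths and response fields are assumptions rather than the actual CIPS interface:

// Hypothetical sketch of the two CIPS API endpoint shapes described above.
interface FeedbackEndpoint {
  method: "POST" | "PUT";
  path: "/v1/feedback" | "/v1/feedback/:feedbackId"; // assumed paths
  request: unknown;                                  // a PostFeedbackReq payload (schema reproduced later in this description)
  response: { id: string };                          // the user feedback ID
}

interface StudyImagesEndpoint {
  method: "GET";
  path: "/v1/studies/:studyId/images";               // assumed path
  response: {
    studyInstanceUid: string;
    images: { seriesUid: string; format: "JPEG" | "PNG"; url: string }[];
    segmentationMaps: { finding: string; pngUrl: string }[];
  };
}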

Example code snippets of API settings are provided as follows:

// Client
GET /v1/settings/types/:type?version=:schemaVersion
Res: {
  id: string;
  settings: OrganizationV1Settings | CxrV1Settings | CxrAiV1Settings;
}

POST /v1/settings/types/:type   // Not available for now

// Admin
POST /v1/admin/settings
Req: {
  type: SettingsType;
  schemaVersion: SettingsSchemaVersion;
  organizationId: string;
  realm: string;
  data: OrganizationV1Settings | CxrV1Settings | CxrAiV1Settings;
}
Res: {
  id: string;
  type: SettingsType;
  schemaVersion: SettingsSchemaVersion;
  organizationId: string;
  realm: string;
  data: OrganizationV1Settings | CxrV1Settings | CxrAiV1Settings;
  createdAt: string;
  updatedAt: string;
}

GET /v1/admin/settings/types/:type?schemaVersion=:schemaVersion&organizationId=:organizationId&realm=:realm
Res: {
  id: string;
  type: SettingsType;
  schemaVersion: SettingsSchemaVersion;
  organizationId: string;
  realm: string;
  data: OrganizationV1Settings | CxrV1Settings | CxrAiV1Settings;
  createdAt: string;
  updatedAt: string;
}

Snippets of database configuration examples are provided below:

Column        | Type                     | Nullable | Default
id            | text                     | not null |
organizationId| text                     | not null |
realm         | text                     | not null |
data          | JSONB                    | not null |
type          | text                     | not null | ORGANIZATION
schemaVersion | text                     | not null | 0
createdAt     | timestamp with time zone | not null | now()
updatedAt     | timestamp with time zone | not null | now()

cxr_feedback

Column                 | Type                     | Collation | Nullable | Default            | Storage
id                     | uuid                     |           | not null | uuid_generate_v4() | plain
cxrPredictionId        | uuid                     |           |          |                    | plain
organizationSettingsId | uuid                     |           |          |                    | plain
studyAccessId          | uuid                     |           |          |                    | plain
organizationId         | text                     |           | not null |                    | extended
realm                  | text                     |           | not null |                    | extended
username               | text                     |           | not null |                    | extended
data                   | jsonb                    |           | not null |                    | extended
createdAt              | timestamp with time zone |           | not null | now()              | plain
updatedAt              | timestamp with time zone |           | not null | now()              | plain

Indexes:
    "cxr_feedback_pkey" PRIMARY KEY, btree (id)

Foreign-key constraints:
    "cxr_feedback_cxrPredictionId_fkey" FOREIGN KEY ("cxrPredictionId") REFERENCES cxr_predictions(id)
    "cxr_feedback_studyAccessId_fkey" FOREIGN KEY ("studyAccessId") REFERENCES studies_access(id)

In an example, feedback data is stored against the columns listed below.

• findingsRejected

• findingsMissed

• findingsAdded

• hasAffectedReport

• hasAffectedPatientManagement

• hasAffectedCtRecommendation

That is, feedback data may be used to distinguish between cases where: predicted radiological findings are rejected by a user, predicted radiological findings have been missed by the AI model 708, predicted radiological findings are added by the user, or feedback has affected the report/patient management or recommendations.
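A minimal sketch of a record carrying these columns, in the same TypeScript style as the schemas later in this description (the field types are assumptions):

// Sketch only: one feedback record matching the columns listed above.
interface CxrFeedbackData {
  findingsRejected: string[];            // predicted findings rejected by the user
  findingsMissed: string[];              // findings the AI model 708 did not predict
  findingsAdded: string[];               // findings added by the user
  hasAffectedReport: boolean;            // feedback affected the written report
  hasAffectedPatientManagement: boolean; // feedback affected patient management
  hasAffectedCtRecommendation: boolean;  // feedback affected a CT recommendation
}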

Feedback data transmission/Microservices

With reference to Figure 3A, a microservice is responsible for acquiring data from the server 70 (via the integration layer 702 shown in Figure 1) to send the CXR images to the AI model 708 for generating predicted radiological findings and then sending back the prioritised predicted findings to the server 70. The microservice is also responsible for storing study-related information, CXR images, predicted radiological findings and user feedback data including metadata.

Optionally, a gateway service (not shown) may provide monitoring and security control, to function as the entry point for all interactions with a microservice for communicating with an AIMS 718 within the server system 70.

At step 7800 the server 70 sends a “study predict” request comprising an entire study, and which may include associated metadata, i.e. scan, series and CXR images. Additionally, at step 9000, the server sends user feedback metadata. The request, user feedback and other associated data are received by a distributed message queueing service (DMQS) 710.

At step 7830, the request is stored in the CIPS database 7060 as described above.

The DMQS 710 accepts incoming HTTP requests and listens on queues for messages from the server 70 (optionally via a gateway) and a model handling service (MHS) 716. The DMQS 710 is configured to pass, at step 7840, CXR images to the MHS 716 for the model prediction pipeline. The DMQS 710 may store studies, CXR images, and deep learning predictions, into a database managed by a database management service (not shown). The DMQS 710 also manages each study's model findings state and stores the prioritised predicted radiological findings predicted by the AI models 708, stores errors when they occur in a database, accepts HTTP requests to send study data including model predictions for radiological findings, accepts HTTP requests to send the status of study findings, and forwards CXR images and related metadata to the MHS 716 for processing of the predicted radiological findings. The DMQS 710 also stores and sends user feedback metadata to the CIPS database 7060 as described above.

The MHS 716 is configured to accept DICOM compatible CXR images and metadata from the DMQS 710. The MHS 716 also performs validation, and pre-processing to transform study data into JSON format, which may then be further transformed into a suitable format for efficient communication within the microservice. Then the MHS 716 sends, at step 7860, the study data to the AI model service (AIMS) 718 for AI processing, which identifies and returns the predicted radiological findings generated by the deep learning models executed by a machine learning prediction service. The study data includes organisation thresholds; thresholds not transmitted to AIMS may be rescheduled (see also Table 1).

The MHS 716 then accepts the predicted radiological findings generated by the deep learning models which are returned via the AIMS 718. The MHS 716 segments (at step 7920), validates, and transforms the prioritised predicted radiological findings (e.g. including predictor, threshold and segments data), representing CXR data together with clinical ranking data predicted by the AI model 708, into JSON format and returns these, at step 7940, to the DMQS 710. For example, each JSON file returned corresponds to a single patient study.
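Purely as an illustration of the general shape of such a per-study JSON payload (the field names and values below are assumptions, not the actual format):

// Hypothetical single-study result returned by the MHS 716 to the DMQS 710 at step 7940.
const studyResult = {
  studyInstanceUid: "1.2.3.4.5.6789",        // assumed identifier
  findings: [
    {
      label: "pneumothorax",                 // predicted radiological finding
      predictor: 0.91,                       // model output score
      threshold: 0.5,                        // organisation threshold applied to the score
      segments: ["segmentation-reference"],  // reference(s) to segmentation images stored in CIPS
    },
  ],
};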

The DMQS 710 sends, at step 7960, the CXR data together with the visual findings predicted by the re-trained AI model 708 to the integration layer 702 (e.g. via a dispatch service module, not shown).

Preferably, the system may track:

1. When a user views results and does not provide feedback

2. When a user views results and does provide feedback

3. When feedback is provided:

   1. Capture feedback (scoped as part of the main enhanced feedback feature)

   2. List of findings as per the results payload

   3. Predictor value of each finding

   4. Organisation thresholds at the time of viewing the results

   5. Default thresholds of the model (defined by the model version that was used for the results)

As such, example feedback data to be transmitted to the server 70 may comprise:

• Prediction UID

• User ID

• error flag

• added finding(s)

• rejected finding(s)

• inaccurate segmentation (per finding)

• “good find” (per finding)

• Q1 ... Q4 answers

• Submitted
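Read against the PostFeedbackReq schema reproduced later in this description, one hypothetical payload carrying these fields might look as follows (the values, and the mapping of the bullet points onto field names, are illustrative assumptions):

// Illustrative feedback payload only; all values are invented for the example.
const feedbackPayload = {
  predictionId: "c0ffee00-0000-4000-8000-000000000001", // Prediction UID
  username: "radiologist-01",                            // User ID
  flaggedWith: ["incorrect-study"],                      // error flag
  findingsAdded: ["acute rib fracture"],                 // added finding(s)
  findingsRejected: ["simple effusion"],                 // rejected finding(s)
  findingsInaccurateSegment: ["pneumothorax"],           // inaccurate segmentation (per finding)
  findingsValuable: ["pneumothorax"],                    // "good find" (per finding)
  answers: [{ questionId: "q1", type: "CHECKBOX", answer: true }], // Q1 ... Q4 answers
  isSubmitted: true,                                     // Submitted
};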

In an example, organisation data may comprise:

1. expiryPeriod

2. isAnalyticsEnabled

3. feedback mode: OFF/MODEL/TRIAL

4. image export: on/off

5. "trial feedback questions":

   1. q1

      1. enable/disable

      2. question

   2. ...

   3. q4 ...

Figure 3B shows exemplary steps of feedback data transmission between a server 70 and a client 7000, in a looped manner. Exemplary code snippets of settings are provided below:

enum SettingsType {
  ORGANIZATION = "ORGANIZATION",
  CXR = "CXR",
  CXR_AI = "CXR_AI",
}

const restToSettingsType = (input: string): SettingsType => {
  switch (input) {
    case 'cxr-ai':
      return SettingsType.CXR_AI
  }
}

const SettingsTypeToRest = (input: SettingsType): string => {
  switch (input) {
    case SettingsType.CXR_AI:
      return 'cxr-ai'
  }
}

enum FeedbackMode {
  OFF = "OFF",
  MODEL = "MODEL",
  TRIAL = "TRIAL",
}

enum FeedbackTrialQuestionType {
  CHECKBOX = "CHECKBOX",
  FREE_FORM = "FREE_FORM",
}

interface BaseSettings {
  type: SettingsType;
  version: string; // should this be semver or just a plain version
}

enum Target {
  GLOBAL = "global",
  CXR = "cxr",
  CTB = "ctb",
}

interface OrganizationSettings extends BaseSettings {
  type: SettingsType.ORGANIZATION;
  version: "1";
  expiryPeriod: number;
  isAnalyticsEnabled: boolean;
  feedback: {
    [Target.GLOBAL]: {
      imageExport: boolean;
      mode: FeedbackMode;
      trial: {
        questions: {
          id: string; // generated by whoever set the questions
          type: FeedbackTrialQuestionType;
          question: string;
          isEnabled: boolean;
        }[];
      };
    };
    [Target.CXR]: {
      // ... overrides
    };
  };
}

enum LanguageType {
  en = "en",
}

interface CxrSettings extends BaseSettings {
  type: SettingsType.CXR;
  version: "1";
  assign: {
    priorities: {
      assignPriorityId: number;
      rank: number;
      priority: string;
    }[];
    defaultLanguage: LanguageType;
  };
  findings: {
    groups: { groupId: number; groupName: string; displayOrder: number; assignOrder: number }[];
    labels: {
      label: string;
      groupId: number;
      displayOrder: number;
      assignPriorityId: number;
      features: { assign: boolean; assist: boolean; assure: boolean };
    }[];
  };
}

interface CxrAiSettings extends BaseSettings {
  type: SettingsType.CXR_AI;
  version: "1";
  modelVersion: string;
  labels: {
    label: string;
    predictionThreshold: number;
  }[];
}

Example code snippets of feedback data processing (schemas) are provided below:

export type FeedbackAnswer = FreeFormFeedbackTrialAnswer | CheckboxFeedbackTrialAnswer;

export interface BaseFeedbackTrialAnswer {
  questionId: string;
}

export interface CheckboxFeedbackTrialAnswer extends BaseFeedbackTrialAnswer {
  type: FeedbackTrialQuestionType.CHECKBOX;
  answer: boolean;
}

export interface FreeFormFeedbackTrialAnswer extends BaseFeedbackTrialAnswer {
  type: FeedbackTrialQuestionType.FREE_FORM;
  answer: string;
}

export interface PostFeedbackReq {
  predictionId: string;
  username: string;
  organizationSettingsId: string;
  studyAccessId: string;
  answers: FeedbackAnswer[];
  findingsAdded: string[];
  findingsRejected: string[];
  findingsInaccurateSegment: string[];
  findingsValuable: string[];
  isSubmitted: boolean;
  flaggedWith: string[];
}

interface PostFeedbackRes extends PostFeedbackReq {
  id: string;
}

type PutFeedbackReq = PostFeedbackReq;
type PutFeedbackRes = PostFeedbackRes;

The following implementations are envisaged:

1. Presenting the Client 7000 with the Viewed ID

   1. viewedId may be a table that links together:

      1. When the study data was viewed

      2. Who it was viewed by

      3. What settings were used to filter and format the prediction

         1. cxrAiSettings

         2. cxrSettings

      4. Which prediction was viewed

   2. // GetStudyRes
      {
        findings: { vision: {
          // ...etc
        }},
        error: {
          // ...
        },
        viewId: "xxx-yyy"
      }

2. Returning the settings IDs that were used to format and filter the prediction

   1. Modify the current prediction payload to add a new key:

      // GetStudyRes
      {
        findings: { vision: {
          // ...etc
        }},
        error: {
          // ...
        },
        settings: {
          cxrSettingsId: "xxx-yyy",
          cxrAiSettingsId: "aaa-bbb"
        }
      }

Types of feedback data

Users may reject a radiological finding predicted by the AI model 708. Users may add a radiological finding that was not predicted by the AI model 708. An auto-complete feature will enable a user to partially type a substring of a radiological finding in text, and a suggestion to auto-complete the radiological finding in text is derived from the text information of the ontology tree for the particular imaging modality/body part. The addition of a new radiological finding will increment a counter of the number of radiological findings in a modality user interface component. Users may manually indicate an incorrect localisation/segmentation predicted by the AI model 708. This indication may be a flag icon that may be pressed via a mouse click. Users may also flag if the image or slice is an eligible series. Users may also indicate if a radiological finding is important, either in terms of clinical significance or otherwise.
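A minimal sketch of such an auto-complete, assuming the ontology tree for the relevant imaging modality/body part has been flattened into a list of finding names (the finding names, the three-letter minimum mentioned later for the viewer, and the function itself are illustrative assumptions):

// Illustrative substring auto-complete over ontology-derived finding names.
const cxrFindingNames = ["acute rib fracture", "simple effusion", "pneumothorax", "cystic fibrosis"];

function suggestFindings(partial: string, findings: string[] = cxrFindingNames): string[] {
  const query = partial.trim().toLowerCase();
  if (query.length < 3) return [];              // assume suggestions only after three letters
  return findings.filter((name) => name.toLowerCase().includes(query));
}

// suggestFindings("rib") -> ["acute rib fracture"]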

The feedback mode described may be disabled by the user.

Processing feedback data

Once the feedback data has been collected from the user input, it is stored in a database. User feedback data is able to be extracted from the backend for a customer with the following data: date and time feedback was submitted, accession number, study instance UID, AI prediction UID, AI prediction status, user name, model feedback responses, and whether the submit button was clicked by the user.

The feedback data may then be transmitted from the database to CIPS 7060 for further processing. After processing by CIPS 7060, the processed feedback data is communicated to AIMS 718 in order to retrain a new AI model 708 including new weights and biases derived from the feedback data.

Accordingly, user feedback stored in the system 10 can be extracted and shared back to a customer (i.e. the radiology clinic), for example for product evaluation and monitoring and customisation of outputs. Alternatively, user feedback can be extracted by the system 10 along with the images. This extraction process may involve use of a de-identification tool to de-identify the data (reports and images have any personal information removed or irreversibly anonymised), and storing the data in databases or file storage systems (such as an Amazon S3 bucket). This then creates a common pool of de-identified data aggregated from all customers that can be queried using software to produce analytics data about the system 10 and identify areas for potential performance improvements of the AI model 708. Through a process of data extraction, root cause investigation (performed by people) and analysis, such insights can then be used to improve performance of the AI model 708. This can range from adjustment of thresholds and tuning of pre-processing/post-processing steps to, at the most extreme, triggering targeted labelling operations to generate new labels that are incorporated into new specific training sets for training improved versions of the AI model 708.

Preparation of re-training data and model re-training using feedback data

In embodiments of the invention, a set of possible visual findings may be determined according to an ontology tree. These may be organised into a hierarchical structure or nested structure. The use of a hierarchical structure for the set of visual findings may lead to an improved accuracy of prediction as various levels of granularity of findings can be simultaneously captured, with increasing confidence when going up the hierarchical structure.

For the purpose of training models in embodiments directed to analysis of CXR images (pixel data) for classifying x-ray imagery as frontal, lateral or other x-ray imagery, a dataset of x-ray images may be used. A sub-dataset consisting solely of anatomical CXR images is preferably used for radiological findings. Each set of anatomical images may include two or more images of a body portion of the respective subject depicting a respective different orientation of at least the body portion of subject. Each of the CXR images is an anatomical x-ray image of a human being’s chest associated with a set of labels manually annotated by expert radiologists for example using a chest x-ray software labelling tool. Preferably, each label indicates whether a particular radiological finding was identified by one or more expert reviewers. A label derived from a plurality of expert reviews and feedback data may be obtained via algorithms that quantify the performance and/or uncertainty of independent reviews combined, e.g. using a vote aggregation algorithm, for example, the Dawid-Skene algorithm. These labels can then be used to train and re-train a deep neural network for findings within CXR images.

The estimated radiological finding probability generated via the Dawid-Skene algorithm is an estimated probability of the presence of each finding rather than a binary label. This is a better reflection of the likelihood of each finding and can be used as an additional training signal for the deep learning model. As such, the deep learning model is trained to minimise the difference between the predicted score and the Dawid-Skene algorithm output directly.
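By way of illustration only (the notation is assumed rather than taken from the application), with $p_f$ denoting the Dawid-Skene probability estimate for finding $f$ and $\hat{y}_f$ the model's predicted score, a standard soft-label cross-entropy of the form

$\mathcal{L} = -\sum_{f} \left[ p_f \log \hat{y}_f + (1 - p_f) \log \left( 1 - \hat{y}_f \right) \right]$

is minimised when $\hat{y}_f = p_f$, so training against such a loss pushes the predicted score towards the estimated probability rather than towards a hard 0/1 label.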

After a ground truth is ascertained for a feedback study, the feedback data is incorporated into the training data and the AI model 708 is re-trained, to improve accuracy of radiological predictions and prevent future system errors.

As set out herein, there is a convolutional neural network (CNN) that receives at its input the one or more anatomical images. A CNN is a neural network that comprises convolution operators which apply a spatial filter over pixels of the input image to calculate a single output pixel, shift the spatial filter to calculate another output pixel and so on until all output pixels of an output image are calculated. That output image is also referred to as a feature map as it maps the feature against the input image.

There may be multiple different filters that are applied to the input image and each results in its own respective feature map. The step from one input image to multiple feature maps is referred to as one layer of the CNN. The CNN can have multiple further layers where further convolution operators are applied to the feature maps generated by a previous layer. Other types of layers that may aid the prediction process and that may be included between the convolutional layers are pooling layers and fully connected layers (also referred to as "dense" layers). Since the network comprises multiple layers with at least one hidden layer and potentially including the same type of layer multiple times at different depths (i.e. number of layers from the input), such a neural network is referred to as a "deep" neural network, which is in contrast to a neural network with only a single layer, such as a perceptron. In some examples, the deep neural network has more than 19 layers, more than 150 layers or more than 1,000 layers.

The output values of the last layer of the CNN (e.g. the last convolutional or pooling layer) are together referred to as a feature vector.

In some examples, after the last layer of the CNN there is a dense (e.g. fully connected) layer which takes the feature vector and combines values of the feature vector according to the weights of the dense layer. The dense layer has multiple outputs and each of the outputs is a differently weighted combination of the input values of the dense layer. Each output of the dense layer is associated with a visual finding.

Therefore, the dense layer computes an indication of a plurality of visual findings being present in at least one of the one or more anatomical images. In other words, the dense layer of the neural network takes as input the feature vector and outputs an indication of whether each of the plurality of visual findings is present in at least one of the one or more anatomical images. The neural network may be trained using backpropagation, which involves the calculation of the gradient of a loss function with respect to the weights of the network, for a single input-output example. A gradient descent method can then be used to minimise the loss and thereby find the optimal weights.
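As an illustration (the notation is assumed, not taken from the application): with feature vector $\mathbf{h}$ produced by the CNN, the dense layer output for finding $f$ may be written

$\hat{y}_f = \sigma\left( \mathbf{w}_f^{\top} \mathbf{h} + b_f \right),$

where $\sigma$ is a sigmoid so that each output can be read as a per-finding score, and backpropagation followed by gradient descent updates the weights as $\mathbf{w}_f \leftarrow \mathbf{w}_f - \eta \, \partial \mathcal{L} / \partial \mathbf{w}_f$ for a learning rate $\eta$.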

While specific examples are provided herein, it is to be noted that a wide range of different CNNs can be used. For example, LeNet, GoogLeNet, ResNet, Inception, VGG and AlexNet architectures can be used.

Further examples are provided below with the Tables.

Image Viewer with User Feedback Functionality

As illustrated in Figure 4A, a window or dialog box 1800 is provided from which the user is able to select a "feedback" button 1802 located, in this example, at the bottom of the visual findings list 1804. By clicking on this button, the user selects a feedback mode, meaning that the feedback functionality of the viewer component 701 is enabled and ready to allow a user to input feedback on the AI model 708 performance. This data may then be transmitted to the server 70 to help improve AI model 708 performance. The user feedback is preferably saved automatically when the user selects a new study or exits feedback mode.

By clicking on the feedback button 1802, the user can indicate that there are AI model errors present in the study, and optionally provide more specific feedback. In this example, the following categories of specific feedback may be provided:

1. Indicate if the study is not as expected, for example it has been incorrectly identified as a chest X-ray or incorrectly identified as a brain CT. The user can indicate this by clicking the "incorrect study flag" button 1806, representing a study error flag when showing the AI model 708 prediction; this button is located in this example at the top of the dialog box in the Window Title bar. Once clicked, the study is flagged as incorrect, and the button may appear highlighted by changing colour or brightness. To allow for cases where the user accidentally clicked the button 1806, clicking the button 1806 again will de-activate the flag and the study is no longer flagged as incorrect. For chest X-ray images, the user input may indicate that "This is not a CXR" (Figure 4F). In this case, the system may display an error message. Example error messages include: incorrect "Not a CXR" error or incorrect "No frontal image" error. Alternatively, with reference to Figure 4G, the user input may indicate that "This is a CXR".

For CT images, the user input may indicate that there is "No eligible series available for processing". For example, this indication may specifically mean "this is a contrast CTB", "this is a bone window CTB", "this is not a CT of the brain" or "this is not a CXR". In other words, the image or pixel data displayed does not reflect the user's expectation of what they thought they would see (see also "incorrect study flag" button 1906 in Figures 5A, 5B). In this case, the system may display an error message. Example error messages include "No axial images available" or incorrect "Not a non-contrast CTB" error.

Possible exemplary scenarios of user feedback for incorrect CT findings include:

1. "This is a contrast CTB"

2. “This is a bone window CTB”

3. “This is not a CT of the brain at all”

2. Indicate incorrect visual findings (i.e. incorrect specific predictions made by the AI model 708). To indicate a visual finding as incorrect, the user can click on the “incorrect finding” button 1808 located, in this example, to the right side of the field displaying the finding name (“Simple effusion”, or “In position CVC” as shown in Figure 4B). Once clicked, the visual finding name is displayed as incorrect, for example using a strikethrough line, and the button may display an “undo” sign (see Figure 4C). To allow for cases where the user accidentally clicked the button 1808, clicking the same button 1808, now representing an “undo” button (Figure 4C), will reinstate the visual finding as correct.

3. Indicate incorrect localisation or segmentation (i.e. incorrect localisation/segmentation predictions made by the AI model 708). For visual findings that have associated localisation or segmentation, the user may flag that this is incorrect by clicking the incorrect segmentation/localisation button 1822. In the present example, once the localisation is flagged as incorrect, the button 1822 may change colour or brightness; the button 1822 may be clicked again to return it to its original display state, whereupon the indication of incorrect localisation/segmentation is removed.

4. Add missing visual findings (i.e. visual findings which have not been generated by the AI model 708). The user may select a “missing finding” button 1810, which results in the viewer displaying a field 1810 comprising a list of visual findings 1812 (see Figure 4D). The user may search the list 1812 and select the missing visual finding to be added to the study. Where the missing visual finding the user is searching for is not present in the list 1812, the user may select an “Add New” study finding button 1814 and manually input the visual finding name. In the present example, predictive search functionality requires three letters to be entered by the user before any predicted text is displayed (“Acute rib fracture” as shown in Figure 4E). Once the missing finding has been added to the list, it is displayed in a field under a “user added” sublist 1818 of the list 1812. A manually added visual finding can be removed by clicking a “cancel” button 1820 displayed, in this example, to the right of the added visual finding name.

To exit feedback mode, the user may select the feedback button 1802 again. The feedback button 1802 may change colour or brightness, for example, when switching from feedback mode off (where no feedback is provided) to feedback mode on (e.g. where a study is flagged as containing AI model errors).
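As referred to above, the feedback categories may, purely for illustration, be collected into a structured payload before being transmitted to the server 70. The sketch below assumes a simple JSON-serialisable structure; the FeedbackPayload class and all field names are hypothetical and are not taken from this specification.

from dataclasses import dataclass, field
from typing import List
import json

@dataclass
class FeedbackPayload:
    study_id: str
    study_flagged_incorrect: bool = False                              # "incorrect study flag" button 1806/1906
    rejected_findings: List[str] = field(default_factory=list)         # "incorrect finding" button 1808
    incorrect_localisations: List[str] = field(default_factory=list)   # button 1822
    added_findings: List[str] = field(default_factory=list)            # buttons 1810/1814

    def to_json(self) -> str:
        # Serialise for transmission to the server (e.g. server 70).
        return json.dumps(self.__dict__)

# Example: the user rejects one finding and adds a missing one.
payload = FeedbackPayload(
    study_id="study-001",
    rejected_findings=["Simple effusion"],
    added_findings=["Acute rib fracture"],
)
print(payload.to_json())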

Figures 5A and 5B display views of another example of a window or dialog box 1900, this time for displaying a CT brain study. As described above, a user may input feedback data such as adding or rejecting a visual finding, or indicating incorrect localisation/segmentation.

In preferred embodiments, the feedback mode is enabled by default. It will be appreciated that users (organisations) may configure fields displaying questions or checkboxes. In this example, a particularly important visual finding (e.g. vasogenic oedema) may be flagged when a user selects an “important finding” button 1902 (star shaped in this example) available for each visual finding listed.

Figure 6 is an exemplary report 600 generated on an interactive user interface. In some embodiments, in addition to generating a list of the visual findings present, such as the visual findings list 1804 shown in Figure 4A, an additional output may be generated in the form of a report. In this generated report, the visual findings present are described in sentences, rather than in a discrete list such as the visual findings list 1804. In some examples, the report may be generated using one or more machine learning models, such as natural language processing (NLP) models.

While an example of such a report is shown in Figure 6, the report 600 may be customisable to communicate the plurality of visual findings in a desired manner. The generated report 600 may hyperlink 610 each of the visual findings that are present in at least one of the one or more anatomical images, such that the associated anatomical image is displayed with a segmentation mask indicating a localisation of the visual finding when a user interacts with the hyperlink 610.

Generating such a report 600 enables the detected visual findings to be communicated to the user in a more efficient manner. For example, the user may be a radiologist, and the report may be generated to follow a radiological report format that is readily understandable by the radiologist.

Moreover, a user may edit the report 600 generated on an interactive user interface to provide feedback data. More specifically, the differences between the initially generated report and the edited report may be used as feedback data to train the neural network. For example, the report 600 shown in Figure 6 may be generated on an interactive user interface for a user who is a radiologist. In this example, the bold text 620 is not editable by the user, whereas the text 630 is editable, which enables the user to provide a correction and create feedback data. The plurality of visual findings communicated in the generated report are associated with at least one of the one or more anatomical images, which may also be provided on the interactive user interface for the user to view. In this example, the user may determine that pneumothorax is present in the associated anatomical image, despite the report stating that pneumothorax is not present. As such, the user may edit the text 630 and write “Pneumothorax is present” or “Pneumothorax is detected”, for example. The user may then submit the edited report, and the user system generates the feedback data upon detecting a difference between the initially generated report and the edited report submitted by the user. For example, the user system may compare the text of both reports, determine that the visual finding concerned is “Pneumothorax” and determine that the visual finding is present in the associated anatomical image. The user system may use one or more algorithms and/or one or more machine learning models, such as NLP models, to compare the two reports and generate the feedback data.
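Purely as an illustration of how feedback data might be derived from the difference between the two reports, the sketch below compares sentences and applies simple keyword and negation matching; in practice the NLP models referred to above could perform this step. The function name report_feedback and the FINDING_TERMS list are assumptions.

FINDING_TERMS = ["pneumothorax", "pleural effusion", "acute rib fracture"]

def report_feedback(original: str, edited: str):
    """Return (finding, present) pairs inferred from sentences the user changed."""
    orig_sents = [s.strip() for s in original.split(".") if s.strip()]
    edit_sents = [s.strip() for s in edited.split(".") if s.strip()]
    feedback = []
    for sent in edit_sents:
        if sent not in orig_sents:          # sentence added or altered by the user
            lowered = sent.lower()
            for term in FINDING_TERMS:
                if term in lowered:
                    negated = any(neg in lowered for neg in ("no ", "not ", "absent"))
                    feedback.append((term, not negated))
    return feedback

original = "Pneumothorax is not detected. No pleural effusion."
edited = "Pneumothorax is present. No pleural effusion."
print(report_feedback(original, edited))    # [('pneumothorax', True)]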

Method

Figure 7 illustrates a computer implemented method 700 for detecting a plurality of visual findings in one or more anatomical images of a subject. The method comprises providing 701 one or more anatomical images of the subject and inputting 702 the one or more anatomical images into a convolutional neural network (CNN) component of a neural network to output a feature vector as disclosed herein. Method 700 further comprises computing 703 an indication of a plurality of visual findings being present in at least one of the one or more anatomical images by a dense layer of the neural network that takes as input the feature vector and outputs an indication of whether each of the plurality of visual findings is present in at least one of the one or more anatomical images. Method 700 also comprises communicating 704 the plurality of visual findings to a user system configured to receive feedback data associated with the plurality of visual findings; and transmitting the feedback data to the neural network. As described above, the neural network is trained on a training dataset including, for each of a plurality of subjects, one or more anatomical images, and a plurality of labels associated with the one or more anatomical images and each of the respective visual findings, the plurality of labels comprising labels obtained using the transmitted feedback data. Method 700 may be implemented in a computer system described below.
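The steps of method 700 may be illustrated, at a high level, by the following sketch of an inference-and-feedback loop. All names (detect_findings, user_system, and so on) are hypothetical and the thresholding strategy is an assumption; the sketch merely mirrors the sequence of steps 701 to 704 described above.

import numpy as np

THRESHOLD = 0.5  # assumed per-finding decision threshold

def detect_findings(model, images, finding_names):
    """Steps 701-703: input images and compute per-finding indications."""
    probs = model.predict(np.stack(images))      # dense-layer sigmoid outputs, shape (n_images, n_findings)
    present = probs.max(axis=0) >= THRESHOLD     # finding present in at least one image
    return {name: bool(flag) for name, flag in zip(finding_names, present)}

def communicate_and_collect(findings, user_system):
    """Step 704: send findings to the user system and receive feedback data."""
    user_system.display(findings)
    return user_system.get_feedback()            # e.g. a FeedbackPayload as sketched earlier

def update_training_set(training_set, images, feedback):
    """Transmitted feedback becomes labels associated with the images for retraining."""
    training_set.append((images, feedback))
    return training_set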

Systems

With regard to the preceding overview of the system 10, and other processing systems and devices described in this specification, terms such as ‘processor’, ‘computer’, and so forth, unless otherwise required by the context, should be understood as referring to a range of possible implementations of devices, apparatus and systems comprising a combination of hardware and software. This includes single-processor and multiprocessor devices and apparatus, including portable devices, desktop computers, and various types of server systems, including cooperating hardware and software platforms that may be co-located or distributed. Physical processors may include general purpose CPUs, digital signal processors, GPUs, and/or other hardware devices suitable for efficient execution of required programs and algorithms.

Computing systems may include conventional personal computer architectures, or other general-purpose hardware platforms. Software may include open-source and/or commercially available operating system software in combination with various application and service programs. Alternatively, computing or processing platforms may comprise custom hardware and/or software architectures. As previously noted, computing and processing systems may comprise cloud computing platforms, enabling physical hardware resources, including processing and storage, to be allocated dynamically in response to service demands.

Terms such as ‘processing unit’, ‘component’, and ‘module’ are used in this specification to refer to any suitable combination of hardware and software configured to perform a particular defined task. Such processing units, components, or modules may comprise executable code executing at a single location on a single processing device, or may comprise cooperating executable code modules executing in multiple locations and/or on multiple processing devices. Where exemplary embodiments are described herein with reference to one such architecture (e.g. cooperating service components of the cloud computing architecture described above), it will be appreciated that, where appropriate, equivalent functionality may be implemented in other embodiments using alternative architectures.

The program code embodied in any of the applications/modules described herein is capable of being individually or collectively distributed as a program product in a variety of different forms. In particular, the program code may be distributed using a computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out aspects of the embodiments of the invention.

Computer readable storage media may include volatile and non-volatile, and removable and non-removable, tangible media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer readable storage media may further include random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid state memory technology, portable compact disc read-only memory (CD-ROM) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and which can be read by a computer. Computer readable program instructions may be downloaded from a computer readable storage medium, via transitory signals, to a computer, another type of programmable data processing apparatus, or another device, or to an external computer or external storage device via a network.

Computer readable program instructions stored in a computer readable medium may be used to direct a computer, other types of programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions that implement the functions, acts, and/or operations specified in the flowcharts, sequence diagrams, and/or block diagrams. The computer program instructions may be provided to one or more processors of a general purpose computer, a special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the one or more processors, cause a series of computations to be performed to implement the functions, acts, and/or operations specified in the flowcharts, sequence diagrams, and/or block diagrams.

Interpretation

Software components embodying features of the invention may be developed using any suitable programming language, development environment, or combinations of languages and development environments, as will be familiar to persons skilled in the art of software engineering. For example, suitable software may be developed using the Typescript programming language, the Rust programming language, the Go programming language, the Python programming language, the SQL query language, and/or other languages suitable for implementation of applications, including web-based applications, comprising statistical modelling, machine learning, data analysis, data storage and retrieval, and other algorithms. Implementation of embodiments of the invention may be facilitated by the use of available libraries and frameworks, such as TensorFlow or PyTorch, for the development, training and deployment of machine learning models using the Python programming language.

It will be appreciated by skilled persons that embodiments of the invention involve the preparation of training data, as well as the implementation of software structures and code that are not well-understood, routine, or conventional in the art of anatomical image analysis, and that while pre-existing languages, frameworks, platforms, development environments, and code libraries may assist implementation, they require specific configuration and extensive augmentation (i.e. additional code development) in order to realize various benefits and advantages of the invention and implement the specific structures, processing, computations, and algorithms described herein with reference to the drawings.

The described examples of languages, environments, and code libraries are not intended to be limiting, and it will be appreciated that any convenient languages, libraries, and development systems may be employed, in accordance with system requirements. The descriptions, block diagrams, flowcharts, tables, and so forth, presented in this specification are provided, by way of example, to enable those skilled in the arts of software engineering, statistical modelling, machine learning, and data analysis to understand and appreciate the features, nature, and scope of the invention, and to put one or more embodiments of the invention into effect by implementation of suitable software code using any suitable languages, frameworks, libraries and development systems in accordance with this disclosure without exercise of additional inventive ingenuity. Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present disclosure as it existed before the priority date of each of the appended claims.

Those skilled in the art will be able to make modifications and alternatives in view of the disclosure which are contemplated as falling within the scope of the appended claims. Each feature disclosed or illustrated in the present specification may be incorporated in the invention, whether alone or in any appropriate combination with any other feature disclosed or illustrated herein.

It will be appreciated that the order of performance of the steps in any of the embodiments in the present description is not essential, unless required by context or otherwise specified. Therefore, most steps may be performed in any order. In addition, any of the embodiments may include more or fewer steps than those disclosed.

Additionally, it will be appreciated that the term “comprising” and its grammatical variants must be interpreted inclusively, unless the context requires otherwise. That is, “comprising” should be interpreted as meaning “including but not limited to”.

Tables

Table 1 - Model feedback example

Table 2 - Model example 1

Layer (type) | Output Shape | Param # | Connected to
input (InputLayer) | [(None, None, 1024, 1024, 1)] | 0 |
tf.compat.v1.shape (TFOpLambda) | (5,) | 0 | input[0][0]
tf.__operators__.getitem (SlicingOpLambda) | 0 | 0 | tf.compat.v1.shape[0][0]
tf.__operators__.getitem_1 (SlicingOpLambda) | 0 | 0 | tf.compat.v1.shape[0][0]
tf.math.multiply (TFOpLambda) | 0 | 0 | tf.__operators__.getitem[0][0], tf.__operators__.getitem_1[0][0]
tf.__operators__.getitem_2 (SlicingOpLambda) | 0 | 0 | tf.compat.v1.shape[0][0]
tf.__operators__.getitem_3 (SlicingOpLambda) | 0 | 0 | tf.compat.v1.shape[0][0]
tf.__operators__.getitem_4 (SlicingOpLambda) | 0 | 0 | tf.compat.v1.shape[0][0]
tf.reshape (TFOpLambda) | (None, None, None, None) | 0 | input[0][0], tf.math.multiply[0][0], tf.__operators__.getitem_2[0][0], tf.__operators__.getitem_3[0][0], tf.__operators__.getitem_4[0][0]
model_1 (Functional) | [(None, 512, 512, 16), (None, 32, 32, 1280)] | 6378604 | tf.reshape[0][0]
tf.reshape_2 (TFOpLambda) | (None, None, 32, 32, 1280) | 0 | model_1[0][1], tf.__operators__.getitem[0][0], tf.__operators__.getitem_1[0][0]
top_activations (Lambda) | (None, None, 32, 32, 1280) | 0 | tf.reshape_2[0][0]
tf.math.reduce_max (TFOpLambda) | (None, 1280) | 0 | top_activations[0][0]
tf.math.reduce_mean (TFOpLambda) | (None, 1280) | 0 | top_activations[0][0]
tf.concat (TFOpLambda) | (None, 2560) | 0 | tf.math.reduce_max[0][0], tf.math.reduce_mean[0][0]
dropout (Dropout) | (None, 2560) | 0 | tf.concat[0][0]
tf.reshape_1 (TFOpLambda) | (None, None, 512, 512, 16) | 0 | model_1[0][0], tf.__operators__.getitem[0][0], tf.__operators__.getitem_1[0][0]
logits (Dense) | (None, 263) | 673543 | dropout[0][0]
seg (Lambda) | (None, None, 512, 512, 16) | 0 | tf.reshape_1[0][0]
cls (Activation) | (None, 263) | 0 | logits[0][0]

Total params: 7,052,147
Trainable params: 7,009,107
Non-trainable params: 43,040

Table 3 - Model example 2 (model_1 above)

Layer (type) Output Shape Param # Connected to input_1 (InputLayer) [(None, 1024, 1024, 1)] 0 stem_conv (Conv2D) (None, 512, 512, 32) 288 input_1 [0][0] stem_bn (BatchNormalization) (None, 512, 512, 32) 128 stem_conv[0][0] stem_activation (Activation) (None, 512, 512, 32) 0 stem_bn[0][0] blockl a_dwconv (None, 512, 512, 32) 288 stem_activation[0][0] (DepthwiseConv2D) blockl a_bn (None, 512, 512, 32) 128 blockl a_dwconv[0][0] (BatchNormalization) blockl a_activation (Activation) (None, 512, 512, 32) 0 blockl a_bn[0][0] blockl a_se_squeeze (None, 32) 0 blockl a_activation[0][0]

(GlobalAveragePooling2D) blockl a_se_reshape (None, 1 , 1 , 32) 0 blockl a_se_squeeze[0][0] (Reshape) blockl a_se_reduce (Conv2D) (None, 1 , 1 , 8) 264 blockl a_se_reshape[0][0] blockl a_se_expand (Conv2D) (None, 1 , 1 , 32) 288 blockl a_se_reduce[0][0] blockl a_se_excite (Multiply) (None, 512, 512, 32) 0 blockl a_activation[0][0] blockl a_se_expand[0][0] blockl a_project_conv (None, 512, 512, 16) 512 blockl a_se_excite[0][0]

(Conv2D) blockl a_project_bn (None, 512, 512, 16) 64 blockl a_project_conv[0][0]

(BatchNormalization) block2a_expand_conv (None, 512, 512, 96) 1536 blockl a_project_bn[0][0]

(Conv2D) block2a_expand_bn (None, 512, 512, 96) 384 block2a_expand_conv[0][0]

(BatchNormalization) block2a_expand_activation (None, 512, 512, 96) 0 block2a_expand_bn[0][0] (Activation) block2a_dwconv (None, 256, 256, 96) 864 block2a_expand_activation[0][0]

(DepthwiseConv2D) block2a_bn (None, 256, 256, 96) 384 block2a_dwconv[0][0] (BatchNormalization) block2a_activation (Activation) (None, 256, 256, 96) 0 block2a_bn[0][0] block2a_se_squeeze (None, 96) 0 block2a_activation[0][0]

(GlobalAveragePooling2D) block2a_se_reshape (None, 1 , 1 , 96) 0 block2a_se_squeeze[0][0] (Reshape) block2a_se_reduce (Conv2D) (None, 1 , 1 , 4) 388 block2a_se_reshape[0][0] block2a_se_expand (Conv2D) (None, 1 , 1 , 96) 480 block2a_se_reduce[0][0] block2a_se_excite (Multiply) (None, 256, 256, 96) 0 block2a_activation[0][0] block2a_se_expand[0][0] block2a_project_conv (None, 256, 256, 24) 2304 block2a_se_excite[0][0]

(Conv2D) block2a_project_bn (None, 256, 256, 24) 96 block2a_project_conv[0][0]

(BatchNormalization) block2b_expand_conv (None, 256, 256, 144) 3456 block2a_project_bn[0][0]

(Conv2D) block2b_expand_bn (None, 256, 256, 144) 576 block2b_expand_conv[0][0]

(BatchNormalization) block2b_expand_activation (None, 256, 256, 144) 0 block2b_expand_bn[0][0]

(Activation) block2b_dwconv (None, 256, 256, 144) 1296 block2b_expand_activation[0][0]

(DepthwiseConv2D) block2b_bn (None, 256, 256, 144) 576 block2b_dwconv[0][0]

(BatchNormalization) block2b_activation (Activation) (None, 256, 256, 144) 0 block2b_bn[0][0] block2b_se_squeeze (None, 144) 0 block2b_activation[0][0]

(GlobalAveragePooling2D) block2b_se_reshape (None, 1 , 1 , 144) 0 block2b_se_squeeze[0][0]

(Reshape) block2b_se_reduce (Conv2D) (None, 1 , 1 , 6) 870 block2b_se_reshape[0][0] block2b_se_expand (Conv2D) (None, 1 , 1 , 144) 1008 block2b_se_reduce[0][0] block2b_se_excite (Multiply) (None, 256, 256, 144) 0 block2b_activation[0][0] block2b_se_expand[0][0] block2b_project_conv (None, 256, 256, 24) 3456 block2b_se_excite[0][0]

(Conv2D) block2b_project_bn (None, 256, 256, 24) 96 block2b_project_conv[0][0]

(BatchNormalization) block2b_drop (FixedDropout) (None, 256, 256, 24) 0 block2b_project_bn[0][0] block2b_add (Add) (None, 256, 256, 24) 0 block2b_drop[0][0] block2a_project_bn[0][0] block3a_expand_conv (None, 256, 256, 144) 3456 block2b_add[0][0]

(Conv2D) block3a_expand_bn (None, 256, 256, 144) 576 block3a_expand_conv[0][0]

(BatchNormalization) block3a_expand_activation (None, 256, 256, 144) 0 block3a_expand_bn[0][0]

(Activation) block3a_dwconv (None, 128, 128, 144) 3600 block3a_expand_activation[0][0]

(DepthwiseConv2D) block3a_bn (None, 128, 128, 144) 576 block3a_dwconv[0][0] (BatchNormalization) block3a_activation (Activation) (None, 128, 128, 144) 0 block3a_bn[0][0] block3a_se_squeeze (None, 144) 0 block3a_activation[0][0]

(GlobalAveragePooling2D) block3a_se_reshape (None, 1 , 1 , 144) 0 block3a_se_squeeze[0][0] (Reshape) block3a_se_reduce (Conv2D) (None, 1 , 1 , 6) 870 block3a_se_reshape[0][0] block3a_se_expand (Conv2D) (None, 1 , 1 , 144) 1008 block3a_se_reduce[0][0] block3a_se_excite (Multiply) (None, 128, 128, 144) 0 block3a_activation[0][0] block3a_se_expand[0][0] block3a_project_conv (None, 128, 128, 40) 5760 block3a_se_excite[0][0]

(Conv2D) block3a_project_bn (None, 128, 128, 40) 160 block3a_project_conv[0][0]

(BatchNormalization) block3b_expand_conv (None, 128, 128, 240) 9600 block3a_project_bn[0][0]

(Conv2D) block3b_expand_bn (None, 128, 128, 240) 960 block3b_expand_conv[0][0]

(BatchNormalization) block3b_expand_activation (None, 128, 128, 240) 0 block3b_expand_bn[0][0]

(Activation) block3b_dwconv (None, 128, 128, 240) 6000 block3b_expand_activation[0][0]

(DepthwiseConv2D) block3b_bn (None, 128, 128, 240) 960 block3b_dwconv[0][0] (BatchNormalization) block3b_activation (Activation) (None, 128, 128, 240) 0 block3b_bn[0][0] block3b_se_squeeze (None, 240) 0 block3b_activation[0][0]

(GlobalAveragePooling2D) block3b_se_reshape (None, 1 , 1 , 240) 0 block3b_se_squeeze[0][0] (Reshape) block3b_se_reduce (Conv2D) (None, 1 , 1 , 10) 2410 block3b_se_reshape[0][0] block3b_se_expand (Conv2D) (None, 1 , 1 , 240) 2640 block3b_se_reduce[0][0] block3b_se_excite (Multiply) (None, 128, 128, 240) 0 block3b_activation[0][0] block3b_se_expand[0][0] block3b_project_conv (None, 128, 128, 40) 9600 block3b_se_excite[0][0]

(Conv2D) block3b_project_bn (None, 128, 128, 40) 160 block3b_project_conv[0][0]

(BatchNormalization) block3b_drop (FixedDropout) (None, 128, 128, 40) 0 block3b_project_bn[0][0] block3b_add (Add) (None, 128, 128, 40) 0 block3b_drop[0][0] block3a_project_bn[0][0] block4a_expand_conv (None, 128, 128, 240) 9600 block3b_add[0][0]

(Conv2D) block4a_expand_bn (None, 128, 128, 240) 960 block4a_expand_conv[0][0]

(BatchNormalization) block4a_expand_activation (None, 128, 128, 240) 0 block4a_expand_bn[0][0]

(Activation) block4a_dwconv (None, 64, 64, 240) 2160 block4a_expand_activation[0][0]

(DepthwiseConv2D) block4a_bn (None, 64, 64, 240) 960 block4a_dwconv[0][0] (BatchNormalization) block4a_activation (Activation) (None, 64, 64, 240) 0 block4a_bn[0][0] block4a_se_squeeze (None, 240) 0 block4a_activation[0][0]

(GlobalAveragePooling2D) block4a_se_reshape (None, 1 , 1 , 240) 0 block4a_se_squeeze[0][0]

(Reshape) block4a_se_reduce (Conv2D) (None, 1 , 1 , 10) 2410 block4a_se_reshape[0][0] block4a_se_expand (Conv2D) (None, 1 , 1 , 240) 2640 block4a_se_reduce[0][0] block4a_se_excite (Multiply) (None, 64, 64, 240) 0 block4a_activation[0][0] block4a_se_expand[0][0] block4a_project_conv (None, 64, 64, 80) 19200 block4a_se_excite[0][0]

(Conv2D) block4a_project_bn (None, 64, 64, 80) 320 block4a_project_conv[0][0]

(BatchNormalization) block4b_expand_conv (None, 64, 64, 480) 38400 block4a_project_bn[0][0]

(Conv2D) block4b_expand_bn (None, 64, 64, 480) 1920 block4b_expand_conv[0][0]

(BatchNormalization) block4b_expand_activation (None, 64, 64, 480) 0 block4b_expand_bn[0][0]

(Activation) block4b_dwconv (None, 64, 64, 480) 4320 block4b_expand_activation[0][0]

(DepthwiseConv2D) block4b_bn (None, 64, 64, 480) 1920 block4b_dwconv[0][0]

(BatchNormalization) block4b_activation (Activation) (None, 64, 64, 480) 0 block4b_bn[0][0] block4b_se_squeeze (None, 480) 0 block4b_activation[0][0]

(GlobalAveragePooling2D) block4b_se_reshape (None, 1 , 1 , 480) 0 block4b_se_squeeze[0][0]

(Reshape) block4b_se_reduce (Conv2D) (None, 1 , 1 , 20) 9620 block4b_se_reshape[0][0] block4b_se_expand (Conv2D) (None, 1 , 1 , 480) 10080 block4b_se_reduce[0][0] block4b_se_excite (Multiply) (None, 64, 64, 480) 0 block4b_activation[0][0] block4b_se_expand[0][0] block4b_project_conv (None, 64, 64, 80) 38400 block4b_se_excite[0][0]

(Conv2D) block4b_project_bn (None, 64, 64, 80) 320 block4b_project_conv[0][0]

(BatchNormalization) block4b_drop (FixedDropout) (None, 64, 64, 80) 0 block4b_project_bn[0][0] block4b_add (Add) (None, 64, 64, 80) 0 block4b_drop[0][0] block4a_project_bn[0][0] block4c_expand_conv (None, 64, 64, 480) 38400 block4b_add[0][0]

(Conv2D) block4c_expand_bn (None, 64, 64, 480) 1920 block4c_expand_conv[0][0]

(BatchNormalization) block4c_expand_activation (None, 64, 64, 480) 0 block4c_expand_bn[0][0]

(Activation) block4c_dwconv (None, 64, 64, 480) 4320 block4c_expand_activation[0][0]

(DepthwiseConv2D) block4c_bn (None, 64, 64, 480) 1920 block4c_dwconv[0][0]

(BatchNormalization) block4c_activation (Activation) (None, 64, 64, 480) 0 block4c_bn[0][0] block4c_se_squeeze (None, 480) 0 block4c_activation[0][0]

(GlobalAveragePooling2D) block4c_se_reshape (None, 1 , 1 , 480) 0 block4c_se_squeeze[0][0]

(Reshape) block4c_se_reduce (Conv2D) (None, 1 , 1 , 20) 9620 block4c_se_reshape[0][0] block4c_se_expand (Conv2D) (None, 1 , 1 , 480) 10080 block4c_se_reduce[0][0] block4c_se_excite (Multiply) (None, 64, 64, 480) 0 block4c_activation[0][0] block4c_se_expand[0][0] block4c_project_conv (None, 64, 64, 80) 38400 block4c_se_excite[0][0]

(Conv2D) block4c_project_bn (None, 64, 64, 80) 320 block4c_project_conv[0][0]

(BatchNormalization) block4c_drop (FixedDropout) (None, 64, 64, 80) 0 block4c_project_bn[0][0] block4c_add (Add) (None, 64, 64, 80) 0 block4c_drop[0][0] block4b_add[0][0] block5a_expand_conv (None, 64, 64, 480) 38400 block4c_add[0][0] (Conv2D) block5a_expand_bn (None, 64, 64, 480) 1920 block5a_expand_conv[0][0] (BatchNormalization) block5a_expand_activation (None, 64, 64, 480) 0 block5a_expand_bn[0][0]

(Activation) block5a_dwconv (None, 64, 64, 480) 12000 block5a_expand_activation[0][0] (DepthwiseConv2D) block5a_bn (None, 64, 64, 480) 1920 block5a_dwconv[0][0] (BatchNormalization) block5a_activation (Activation) (None, 64, 64, 480) 0 block5a_bn[0][0] block5a_se_squeeze (None, 480) 0 block5a_activation[0][0] (GlobalAveragePooling2D) block5a_se_reshape (None, 1 , 1 , 480) 0 block5a_se_squeeze[0][0] (Reshape) block5a_se_reduce (Conv2D) (None, 1 , 1 , 20) 9620 block5a_se_reshape[0][0] block5a_se_expand (Conv2D) (None, 1 , 1 , 480) 10080 block5a_se_reduce[0][0] block5a_se_excite (Multiply) (None, 64, 64, 480) 0 block5a_activation[0][0] block5a_se_expand[0][0] block5a_project_conv (None, 64, 64, 112) 53760 block5a_se_excite[0][0] (Conv2D) block5a_project_bn (None, 64, 64, 112) 448 block5a_project_conv[0][0] (BatchNormalization) block5b_expand_conv (None, 64, 64, 672) 75264 block5a_project_bn[0][0] (Conv2D) block5b_expand_bn (None, 64, 64, 672) 2688 block5b_expand_conv[0][0] (BatchNormalization) block5b_expand_activation (None, 64, 64, 672) 0 block5b_expand_bn[0][0]

(Activation) block5b_dwconv (None, 64, 64, 672) 16800 block5b_expand_activation[0][0] (DepthwiseConv2D) block5b_bn (None, 64, 64, 672) 2688 block5b_dwconv[0][0] (BatchNormalization) block5b_activation (Activation) (None, 64, 64, 672) 0 block5b_bn[0][0] block5b_se_squeeze (None, 672) 0 block5b_activation[0][0] (GlobalAveragePooling2D) block5b_se_reshape (None, 1 , 1 , 672) 0 block5b_se_squeeze[0][0] (Reshape) block5b_se_reduce (Conv2D) (None, 1 , 1 , 28) 18844 block5b_se_reshape[0][0] block5b_se_expand (Conv2D) (None, 1 , 1 , 672) 19488 block5b_se_reduce[0][0] block5b_se_excite (Multiply) (None, 64, 64, 672) 0 block5b_activation[0][0] block5b_se_expand[0][0] block5b_project_conv (None, 64, 64, 112) 75264 block5b_se_excite[0][0] (Conv2D) block5b_project_bn (None, 64, 64, 112) 448 block5b_project_conv[0][0] (BatchNormalization) block5b_drop (FixedDropout) (None, 64, 64, 112) 0 block5b_project_bn[0][0] block5b_add (Add) (None, 64, 64, 112) 0 block5b_drop[0][0] block5a_project_bn[0][0] block5c_expand_conv (None, 64, 64, 672) 75264 block5b_add[0][0] (Conv2D) block5c_expand_bn (None, 64, 64, 672) 2688 block5c_expand_conv[0][0] (BatchNormalization) blocks c_expand_a ct ivation (None, 64, 64, 672) 0 block5c_expand_bn[0][0]

(Activation) block5c_dwconv (None, 64, 64, 672) 16800 block5c_expand_activation[0][0] (DepthwiseConv2D) block5c_bn (None, 64, 64, 672) 2688 block5c_dwconv[0][0] (BatchNormalization) block5c_activation (Activation) (None, 64, 64, 672) 0 block5c_bn[0][0] block5c_se_squeeze (None, 672) 0 block5c_activation[0][0] (GlobalAveragePooling2D) block5c_se_reshape (None, 1 , 1 , 672) 0 block5c_se_squeeze[0][0] (Reshape) block5c_se_reduce (Conv2D) (None, 1 , 1 , 28) 18844 block5c_se_reshape[0][0] block5c_se_expand (Conv2D) (None, 1 , 1 , 672) 19488 block5c_se_reduce[0][0] block5c_se_excite (Multiply) (None, 64, 64, 672) 0 block5c_activation[0][0] block5c_se_expand[0][0] block5c_project_conv (None, 64, 64, 112) 75264 block5c_se_excite[0][0] (Conv2D) block5c_project_bn (None, 64, 64, 112) 448 block5c_project_conv[0][0] (BatchNormalization) block5c_drop (FixedDropout) (None, 64, 64, 112) 0 block5c_project_bn[0][0] block5c_add (Add) (None, 64, 64, 112) 0 block5c_drop[0][0] block5b_add[0][0] block6a_expand_conv (None, 64, 64, 672) 75264 block5c_add[0][0] (Conv2D) block6a_expand_bn (None, 64, 64, 672) 2688 block6a_expand_conv[0][0] (BatchNormalization) block6a_expand_activation (None, 64, 64, 672) 0 block6a_expand_bn[0][0]

(Activation) block6a_dwconv (None, 32, 32, 672) 16800 block6a_expand_activation[0][0] (DepthwiseConv2D) block6a_bn (None, 32, 32, 672) 2688 block6a_dwconv[0][0] (BatchNormalization) block6a_activation (Activation) (None, 32, 32, 672) 0 block6a_bn[0][0] block6a_se_squeeze (None, 672) 0 block6a_activation[0][0]

(GlobalAveragePooling2D) block6a_se_reshape (None, 1 , 1 , 672) 0 block6a_se_squeeze[0][0] (Reshape) block6a_se_reduce (Conv2D) (None, 1 , 1 , 28) 18844 block6a_se_reshape[0][0] block6a_se_expand (Conv2D) (None, 1 , 1 , 672) 19488 block6a_se_reduce[0][0] block6a_se_excite (Multiply) (None, 32, 32, 672) 0 block6a_activation[0][0] block6a_se_expand[0][0] block6a_project_conv (None, 32, 32, 192) 129024 block6a_se_excite[0][0] (Conv2D) block6a_project_bn (None, 32, 32, 192) 768 block6a_project_conv[0][0] (BatchNormalization) block6b_expand_conv (None, 32, 32, 1152) 221184 block6a_project_bn[0][0]

(Conv2D) block6b_expand_bn (None, 32, 32, 1152) 4608 block6b_expand_conv[0][0]

(BatchNormalization) block6b_expand_activation (None, 32, 32, 1152) 0 block6b_expand_bn[0][0] (Activation) block6b_dwconv (None, 32, 32, 1152) 28800 block6b_expand_activation[0][0] (DepthwiseConv2D) block6b_bn (None, 32, 32, 1152) 4608 block6b_dwconv[0][0]

(BatchNormalization) block6b_activation (Activation) (None, 32, 32, 1152) 0 block6b_bn[0][0] block6b_se_squeeze (None, 1152) 0 block6b_activation[0][0] (GlobalAveragePooling2D) block6b_se_reshape (None, 1 , 1 , 1152) 0 block6b_se_squeeze[0][0] (Reshape) block6b_se_reduce (Conv2D) (None, 1 , 1 , 48) 55344 block6b_se_reshape[0][0] block6b_se_expand (Conv2D) (None, 1 , 1 , 1152) 56448 block6b_se_reduce[0][0] block6b_se_excite (Multiply) (None, 32, 32, 1152) 0 block6b_activation[0][0] block6b_se_expand[0][0] block6b_project_conv (None, 32, 32, 192) 221184 block6b_se_excite[0][0]

(Conv2D) block6b_project_bn (None, 32, 32, 192) 768 block6b_project_conv[0][0]

(BatchNormalization) block6b_drop (FixedDropout) (None, 32, 32, 192) 0 block6b_project_bn[0][0] block6b_add (Add) (None, 32, 32, 192) 0 block6b_drop[0][0] block6a_project_bn[0][0] block6c_expand_conv (None, 32, 32, 1152) 221184 block6b_add[0][0]

(Conv2D) block6c_expand_bn (None, 32, 32, 1152) 4608 block6c_expand_conv[0][0]

(BatchNormalization) block6c_expand_activation (None, 32, 32, 1152) 0 block6c_expand_bn[0][0]

(Activation) block6c_dwconv (None, 32, 32, 1152) 28800 block6c_expand_activation[0][0]

(DepthwiseConv2D) block6c_bn (None, 32, 32, 1152) 4608 block6c_dwconv[0][0]

(BatchNormalization) block6c_activation (Activation) (None, 32, 32, 1152) 0 block6c_bn[0][0] block6c_se_squeeze (None, 1152) 0 block6c_activation[0][0]

(GlobalAveragePooling2D) block6c_se_reshape (None, 1 , 1 , 1152) 0 block6c_se_squeeze[0][0]

(Reshape) block6c_se_reduce (Conv2D) (None, 1 , 1 , 48) 55344 block6c_se_reshape[0][0] block6c_se_expand (Conv2D) (None, 1 , 1 , 1152) 56448 block6c_se_reduce[0][0] block6c_se_excite (Multiply) (None, 32, 32, 1152) 0 block6c_activation[0][0] block6c_se_expand[0][0] block6c_project_conv (None, 32, 32, 192) 221184 block6c_se_excite[0][0]

(Conv2D) block6c_project_bn (None, 32, 32, 192) 768 block6c_project_conv[0][0]

(BatchNormalization) block6c_drop (FixedDropout) (None, 32, 32, 192) 0 block6c_project_bn[0][0] block6c_add (Add) (None, 32, 32, 192) 0 block6c_drop[0][0] block6b_add[0][0] block6d_expand_conv (None, 32, 32, 1152) 221184 block6c_add[0][0]

(Conv2D) block6d_expand_bn (None, 32, 32, 1152) 4608 block6d_expand_conv[0][0]

(BatchNormalization) block6d_expand_activation (None, 32, 32, 1152) 0 block6d_expand_bn[0][0]

(Activation) block6d_dwconv (None, 32, 32, 1152) 28800 block6d_expand_activation[0][0]

(DepthwiseConv2D) block6d_bn (None, 32, 32, 1152) 4608 block6d_dwconv[0][0]

(BatchNormalization) block6d_activation (Activation) (None, 32, 32, 1152) 0 block6d_bn[0][0] block6d_se_squeeze (None, 1152) 0 block6d_activation[0][0]

(GlobalAveragePooling2D) block6d_se_reshape (None, 1 , 1 , 1152) 0 block6d_se_squeeze[0][0]

(Reshape) block6d_se_reduce (Conv2D) (None, 1 , 1 , 48) 55344 block6d_se_reshape[0][0] block6d_se_expand (Conv2D) (None, 1 , 1 , 1152) 56448 block6d_se_reduce[0][0] block6d_se_excite (Multiply) (None, 32, 32, 1152) 0 block6d_activation[0][0] block6d_se_expand[0][0] block6d_project_conv (None, 32, 32, 192) 221184 block6d_se_excite[0][0]

(Conv2D) block6d_project_bn (None, 32, 32, 192) 768 block6d_project_conv[0][0]

(BatchNormalization) block6d_drop (FixedDropout) (None, 32, 32, 192) 0 block6d_project_bn[0][0] block6d_add (Add) (None, 32, 32, 192) 0 block6d_drop[0][0] block6c_add[0][0] block? a_expand_conv (None, 32, 32, 1152) 221184 block6d_add[0][0]

(Conv2D) block? a_expand_bn (None, 32, 32, 1152) 4608 block7a_expand_conv[0][0] (BatchNormalization) block? a_expand_activation (None, 32, 32, 1152) 0 block? a_expand_bn[0][0] (Activation) block7a_dwconv (None, 32, 32, 1152) 10368 block7a_expand_activation[0][0] (DepthwiseConv2D) block7a_bn (None, 32, 32, 1152) 4608 block7a_dwconv[0][0] (BatchNormalization) block7a_activation (Activation) (None, 32, 32, 1152) 0 block7a_bn[0][0] block? a_se_squeeze (None, 1152) 0 block? a_activation[0][0] (GlobalAveragePooling2D) block? a_se_reshape (None, 1 , 1 , 1152) 0 block7a_se_squeeze[0][0] (Reshape) block7a_se_reduce (Conv2D) (None, 1 , 1 , 48) 55344 block? a_se_reshape[0][0] block7a_se_expand (Conv2D) (None, 1 , 1 , 1152) 56448 block? a_s e_re d u ce [0] [0 ] block7a_se_excite (Multiply) (None, 32, 32, 1152) 0 block? a_se_expand[0][0] block? a_project_conv (None, 32, 32, 320) 368640 block? a_se_excite[0][0] (Conv2D) block? a_project_bn (None, 32, 32, 320) 1280 block7a_project_conv[0][0] (BatchNormalization) top_conv (Conv2D) (None, 32, 32, 1280) 409600 block? a_project_bn[0][0] top_bn (BatchNormalization) (None, 32, 32, 1280) 5120 top_conv[0][0] top_activation (Activation) (None, 32, 32, 1280) 0 top_bn[0][0] decoder_stageOa_transpose (None, 64, 64, 64) 1310720 top_activation[0][0] (Con v2DT ranspose) decoder_stageOa_bn (None, 64, 64, 64) 256 decoder_stageOa_transpose[0][ (BatchNormalization) 0] decoder_stageOa_relu (None, 64, 64, 64) 0 decoder_stage0a_bn[0][0] (Activation) decoder_stageO_concat (None, 64, 64, 736) 0 decoder_stage0a_relu[0][0] (Concatenate) block6a_expand_activation[0][0] decoder_stageOb_conv (None, 64, 64, 64) 423936 decoder_stage0_concat[0][0] (Conv2D) decoder_stageOb_bn (None, 64, 64, 64) 256 decoder_stage0b_conv[0][0] (BatchNormalization) decoder_stageOb_relu (None, 64, 64, 64) 0 decoder_stage0b_bn[0][0] (Activation) decoder_stage1 a_transpose (None, 128, 128, 64) 65536 decoder_stage0b_relu[0][0] (Con v2DT ranspose) decoder_stage1 a_bn (None, 128, 128, 64) 256 decoder_stage1 a_transpose[0][ (BatchNormalization) 0] decoder_stage1 a_relu (None, 128, 128, 64) 0 decoder_stage1 a_bn[0][0] (Activation) decoder_stage1_concat (None, 128, 128, 304) 0 decoder_stage1 a_relu [0][0] (Concatenate) block4a_expand_activation[0][0] decoder_stage1 b_conv (None, 128, 128, 64) 175104 decoder_stage1_concat[0][0] (Conv2D) decoder_stage1 b_bn (None, 128, 128, 64) 256 decoder_stage1 b_conv[0][0] (BatchNormalization) decoder_stage1 b_relu (None, 128, 128, 64) 0 decoder_stage1 b_bn[0] [0] (Activation) decoder_stage2a_transpose (None, 256, 256, 64) 65536 decoder_stage1 b_relu [0][0] (Con v2DT ranspose) decoder_stage2a_bn (None, 256, 256, 64) 256 decoder_stage2a_transpose[0][ (BatchNormalization) 0] decoder_stage2a_relu (None, 256, 256, 64) 0 decoder_stage2a_bn[0][0] (Activation) decoder_stage2_concat (None, 256, 256, 208) 0 decoder_stage2a_relu[0][0] (Concatenate) block3a_expand_activation[0][0] decoder_stage2b_conv (None, 256, 256, 64) 119808 decoder_stage2_concat[0][0] (Conv2D) decoder_stage2b_bn (None, 256, 256, 64) 256 decoder_stage2b_conv[0][0] (BatchNormalization) decoder_stage2b_relu (None, 256, 256, 64) 0 decoder_stage2b_bn[0][0] (Activation) decoder_stage3a_transpose (None, 512, 512, 64) 65536 decoder_stage2b_relu[0][0] (Con v2DT ranspose) decoder_stage3a_bn (None, 512, 512, 64) 256 decoder_stage3a_transpose[0][ (BatchNormalization) 0] decoder_stage3a_relu (None, 512, 512, 64) 0 decoder_stage3a_bn[0][0] (Activation) decoder_stage3_concat (None, 512, 512, 160) 0 decoder_stage3a_relu[0][0] 
(Concatenate) block2a_expand_activation[0][0] decoder_stage3b_conv (None, 512, 512, 64) 92160 decoder_stage3_concat[0][0] (Conv2D) decoder_stage3b_bn (None, 512, 512, 64) 256 decoder_stage3b_conv[0][0] (BatchNormalization) decoder_stage3b_relu (None, 512, 512, 64) 0 decoder_stage3b_bn[0][0] (Activation) final_conv (Conv2D) (None, 512, 512, 16) 9232 decoder_stage3b_relu[0][0] activation (Activation) (None, 512, 512, 16) 0 finai_conv[0][0]

Total params: 6,378,604
Trainable params: 6,335,564
Non-trainable params: 43,040
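The tabulated models describe an encoder producing a coarse 32 x 32 x 1280 feature map, a decoder producing a 512 x 512 x 16 segmentation output, and a 263-way dense classification head. Purely as an illustration of that dual-head pattern, and not a reproduction of the tabulated architecture or weights, a sketch follows; the EfficientNet-style encoder, the simplified decoder and all names are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

NUM_FINDINGS = 263   # classification outputs, as in the "logits" layer of Table 2
NUM_SEG_MAPS = 16    # segmentation channels, as in the "seg" output

def build_dual_head(input_shape=(1024, 1024, 1)):
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=None, input_shape=input_shape)
    features = backbone.output                      # coarse feature map (32 x 32 x 1280)

    # Classification head: max/avg pooled features -> dropout -> dense sigmoid outputs.
    pooled = layers.Concatenate()([
        layers.GlobalMaxPooling2D()(features),
        layers.GlobalAveragePooling2D()(features),
    ])
    pooled = layers.Dropout(0.5)(pooled)
    cls = layers.Dense(NUM_FINDINGS, activation="sigmoid", name="cls")(pooled)

    # Segmentation head: a simplified decoder upsampling back to 512 x 512.
    x = features
    for filters in (64, 64, 64, 64):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    seg = layers.Conv2D(NUM_SEG_MAPS, 3, padding="same",
                        activation="sigmoid", name="seg")(x)

    return tf.keras.Model(backbone.input, [cls, seg])

model = build_dual_head()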