Title:
METHOD AND SYSTEM FOR DETERMINING A TARGET RECIPE OF A COMPOUND
Document Type and Number:
WIPO Patent Application WO/2023/072993
Kind Code:
A1
Abstract:
A method and system for determining a target recipe of a compound with desired attributes are provided in the invention, characterized in that the method comprises: a) performing simulation synthesis for each of a plurality of recipes, and calculating on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; b) performing synthesis for each of a plurality of recipes, and measuring one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe; c) training one or more machine learning models by using the values of the descriptors and the properties obtained by step a) and step b) as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and d) generating a plurality of candidate recipes, performing simulation synthesis for each of the plurality of candidate recipes, and, using the trained machine learning model obtained by step c), to predict one or more properties of each candidate recipe, thus determining one or more target recipes from the plurality of candidate recipes.

Inventors:
GAO HAN (CN)
LIU HAO (CN)
GUO RUIJING (CN)
ZHANG MENGYANG (CN)
THOMPSON-COLÓN JIM A (US)
Application Number:
PCT/EP2022/079888
Publication Date:
May 04, 2023
Filing Date:
October 26, 2022
Assignee:
COVESTRO DEUTSCHLAND AG (DE)
International Classes:
G16C20/10; G06N3/00; G16C20/70
Other References:
STRIETH-KALTHOFF F. ET AL: "Machine learning the ropes: principles, applications and directions in synthetic chemistry", CHEMICAL SOCIETY REVIEWS, vol. 49, no. 17, 16 July 2020 (2020-07-16), UK, pages 6154 - 6168, XP055838253, ISSN: 0306-0012, Retrieved from the Internet DOI: 10.1039/C9CS00786E
COLEY C. W. ET AL: "Machine Learning in Computer-Aided Synthesis Planning", ACCOUNTS OF CHEMICAL RESEARCH, vol. 51, no. 5, 1 May 2018 (2018-05-01), US, pages 1281 - 1289, XP055660945, ISSN: 0001-4842, DOI: 10.1021/acs.accounts.8b00087
MALIK S. A. ET AL: "Predicting the Outcomes of Material Syntheses with Deep Learning", CHEMISTRY OF MATERIALS, vol. 33, no. 2, 5 January 2021 (2021-01-05), US, pages 616 - 624, XP093027008, ISSN: 0897-4756, DOI: 10.1021/acs.chemmater.0c03885
PANTELEEV J. ET AL: "Recent applications of machine learning in medicinal chemistry", BIOORGANIC & MEDICINAL CHEMISTRY LETTERS, ELSEVIER, AMSTERDAM NL, vol. 28, no. 17, 28 June 2018 (2018-06-28), pages 2807 - 2815, XP085447055, ISSN: 0960-894X, DOI: 10.1016/J.BMCL.2018.06.046
JABLONKA K. M. ET AL: "Big-Data Science in Porous Materials: Materials Genomics and Machine Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 8 June 2020 (2020-06-08), XP081681923
HESSLER G. ET AL: "Artificial Intelligence in Drug Design", MOLECULES, vol. 23, no. 10, 2 October 2018 (2018-10-02), pages 2520, XP093026005, DOI: 10.3390/molecules23102520
REYES K. G. ET AL: "The machine learning revolution in materials?", MRS BULLETIN, vol. 44, no. 7, 1 July 2019 (2019-07-01), US, pages 530 - 537, XP093027022, ISSN: 0883-7694, Retrieved from the Internet DOI: 10.1557/mrs.2019.153
Attorney, Agent or Firm:
LEVPAT (DE)
Claims:

1. A method for determining a target recipe of a compound with desired attributes, characterized in that the method comprises: a) performing simulation synthesis for each of a plurality of recipes, and calculating on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; b) performing synthesis for each of a plurality of recipes, and measuring one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe; c) training one or more machine learning models by using the values of the descriptors and the properties obtained by step a) and step b) as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and d) generating a plurality of candidate recipes, performing simulation synthesis for each of the plurality of candidate recipes, and, using the trained machine learning model obtained by step c), to predict one or more properties of each candidate recipe, thus determining one or more target recipes from the plurality of candidate recipes.

2. The method of claim 1, wherein: the simulated synthesis process is Monte Carlo simulation.

3. The method of claim 1 or 2, wherein: in step d), randomly generating a plurality of candidate recipes.

4. The method of any one of claims 1 to 3, wherein: in step d), generating a plurality of candidate recipes using a statistical experimental design, such as full factorial, partial factorial, and latin-hypercube.

5. The method of any one of claims 1 to 4, wherein: in step c), training the plurality of machine learning models by using the values of the descriptors as the independent variable of the machine learning model and the value of the properties as the dependent variable, respectively.

6. The method of any one of claims 1 to 5, wherein: in step d), obtaining the predicted value of the properties using each of the values of the multiple sets of candidate descriptors as the input of the plurality of machine learning models.

7. The method of claim 1 to 6, wherein: in step d), selecting the target recipe from the plurality of candidate recipes according to the predicted value and expected value of the properties.

8. The method of any one of claims 1 to 7, wherein: in step c), training at least one of the plurality of machine learning models based on any of the SVM, random forest, elastic net, logistic regression algorithm, neural networks, generalize additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes.

9. The method of claims 8, wherein: in step c), the machine learning models include the boosted versions of the machine learning models and ensembles of the machine learning models combined into one aggregated model and another.

10. The method of any one of claims 1 to 9, wherein: after step a) and step b) and before step c), selecting a plurality of descriptors according to each of the attributes of the compound that need to be predicted.

11. A method for determining a set of attributes of a candidate compound, characterized in that the method comprises: a) performing simulation synthesis for each of a plurality of recipes, and calculating on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; b) performing synthesis for each of a plurality of recipes, and measuring one or more properties of each recipe, wherein the properties are used to describe the set of attributes of a corresponding recipe; c) training one or more machine learning models by using the values of the descriptors and the properties obtained by step a) and step b) as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and d) determining the set of attributes of the candidate compound according to the plurality of machine learning models.

12. The method of claim 11, wherein: the simulated synthesis process is Monte Carlo simulation.

13. The method of claim 11 or 12, wherein: in step c), training the one or more machine learning models using the values of the set of descriptors as the independent variable of the machine learning model and the value of the one or more properties as the dependent variable, respectively.

14. The method of any one of claims 11 to 13, wherein: in step d), generating a plurality of candidate recipes using a statistical experimental design, such as full factorial, partial factorial, and latin-hypercube.

15. The method of any one of claims 11 to 14, wherein: in step d), obtaining the predicted value of each recipe in the set of recipes using each of the values of the multiple sets of candidate descriptors as the input of the plurality of machine learning models.

16. The method of claim 15, wherein: in step d), selecting the target attribute from the plurality of candidate attributes according to the predicted value and expected value of each recipe of the set of recipes.

17. The method of any one of claims 11 to 16, wherein: in step c), training at least one of the plurality of machine learning models based on any of the SVM, random forest, elastic net and logistic regression algorithm, neural networks, generalize additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes.

18. The method of claim 17, wherein: in step c), the machine learning models include the boosted versions of the machine learning models and ensembles of the machine learning models combined into one aggregated model and another.

19. The method of any one of claims 11 to 19, wherein: after step a) and step b) and before step c), selecting a plurality of descriptors according to each of the attributes of the compound that need to be predicted.

20. A system for determining a target recipe of a compound with desired attributes, characterized in that the system comprises: a simulation unit configured to perform simulation synthesis for each of a plurality of recipes, and calculate on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; a synthesis unit configured to perform synthesis for each of a plurality of recipes, and measure one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe; a training unit configured to train one or more machine learning models by using the values of the descriptors and the properties obtained by the simulation unit and synthesis unit as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and a determining unit configured to generate a plurality of candidate recipes, perform simulation synthesis for each of the plurality of candidate recipes, and calculate on the simulation synthesis one or more descriptors of each candidate recipe, determine one or more target recipes from the plurality of candidate recipes according to the trained machine learning model.

21. The system of claim 20, wherein: the simulated synthesis process is Monte Carlo simulation.

22. The system of claim 20 or 21, wherein: in the determining unit, randomly generating a plurality of candidate recipes.

23. The method of any one of claims 20 to 22, wherein: in the determining unit, generating a plurality of candidate recipes using a statistical experimental design, such as full factorial, partial factorial, and latin-hypercube.

24. The system of any one of claims 20 to 23, wherein: the training unit is configured to train the plurality of machine learning models by using the values of the descriptors as the independent variable of the machine learning model and the value of the properties as the dependent variable, respectively.

25. The system of any one of claims 20 to 24, wherein: the determining unit is configured to obtain the predicted value of the properties using each of the values of the multiple sets of candidate descriptors as the input of the plurality of machine learning models.

26. The system of claim 25, wherein: the determining unit is configured to select the target recipe from the plurality of candidate recipes according to the predicted value and expected value of the properties.

27. The system of any one of claims 20 to 26, wherein: the determining unit is configured to train at least one of the plurality of machine learning models based on any of the SVM, random forest, elastic net and logistic regression algorithm, neural networks, generalize additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes.

28. The system of claim 27, wherein: in determining unit, the machine learning models include the boosted versions of the machine learning models and ensembles of the machine learning models combined into one aggregated model and another.

29. The method of any one of claims 20 to 28, wherein: in the training unit, selecting a plurality of descriptors according to each of attributes of the compound that need to be predicted.

30. A system for determining a target recipe of a compound, characterized in that the system comprises: a computer processor, a computer-readable storage medium, program instructions stored on a computer-readable storage medium, which, when executed by a processor, execute the method according to any one of claims 1 to 19.

31. A computer-readable storage medium having instructions thereon that, when executed, cause a computing device to execute the method according to any one of claims 1 to 19.

Description:
METHOD AND SYSTEM FOR DETERMINING A TARGET RECIPE OF A COMPOUND

Technical field

The present invention relates to the field of machine learning technology, and more specifically, to a method and system for determining a target recipe of a compound.

Background

In the process of compound synthesis experiments, the performance of a recipe or formulation should be verified in the laboratory after synthesis. When the target compound needs to meet multiple attribute requirements, the traditional verification method often requires the design and synthesis of a large number of samples. This work is time-consuming and costly, and the number of compounds that can be synthesized and verified is limited. It is therefore difficult to cover the space of possible compounds, which limits the verification results.

In addition, the traditional verification method relies on the individual empirical knowledge of the researcher. Since the researcher must often judge the performance of a synthesized compound based on empirical knowledge, it is difficult to avoid the interference caused by human factors.

There is a lack of appropriate methods to make full use of the experimental data of synthesized compounds to predict the properties of the synthesized compounds through the method of simulation and machine learning.

Summary

According to one aspect of the present invention, the object of the present invention is to provide a method and system for determining a target recipe of a compound with desired attributes. The method and system could effectively screen the synthesis recipe space for the synthesis of the target compound through an appropriate machine learning model.

Some embodiments of the present inventive concept provide a method for determining a target recipe of a compound with desired attributes, characterized in that the method comprises: step a) performing simulation synthesis for each of a plurality of recipes, and calculating on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; step b) performing synthesis for each of a plurality of recipes, and measuring one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe; step c) training one or more machine learning models by using the values of the descriptors and the properties obtained by step a) and step b) as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and step d) generating a plurality of candidate recipes, performing simulation synthesis for each of the plurality of candidate recipes, and, using the trained machine learning model obtained by step c), to predict one or more properties of each candidate recipe, thus determining one or more target recipes from the plurality of candidate recipes.

In one or more embodiments, the simulated synthesis process is Monte Carlo simulation.

In one or more embodiments, in step d), a plurality of candidate recipes are randomly generated.

In one or more embodiments, in step d), a plurality of candidate recipes are generated by using a statistical experimental design, such as full factorial, partial factorial, and latin-hypercube.
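As an illustration of the statistical experimental designs named above, the following is a minimal sketch in plain Python of a full factorial design and a Latin hypercube design. The factor names (`polyol_mg`, `catalyst_mg`), levels and ranges are hypothetical and are not taken from the disclosure; the functions `full_factorial` and `latin_hypercube` are likewise illustrative helpers, not part of the claimed method.

```python
import itertools
import random


def full_factorial(levels):
    """Return every combination of the listed levels, one dict per recipe."""
    names = list(levels)
    combos = itertools.product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def latin_hypercube(ranges, n_samples, seed=0):
    """Return n_samples recipes; each factor's range is cut into n_samples
    equal strata and exactly one value is drawn from every stratum."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(values)  # decouple the strata order between factors
        columns[name] = values
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


# Hypothetical factors: amounts in mg of two ingredients.
grid = full_factorial({"polyol_mg": [1.0, 5.0, 10.0], "catalyst_mg": [0.5, 1.0]})
lhs = latin_hypercube({"polyol_mg": (1.0, 10.0), "catalyst_mg": (0.5, 1.0)}, 6)
```

The full factorial design grows multiplicatively with the number of levels per factor, whereas the Latin hypercube keeps the number of candidates fixed while still covering each factor's range evenly.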

In one or more embodiments, in step c), the plurality of machine learning models are trained by using the values of the descriptors as the independent variable of the machine learning model and the value of the properties as the dependent variable, respectively.
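The arrangement described above, with descriptors as the independent variable and each property as the dependent variable of its own model, can be sketched as follows. This is only a minimal illustration under stated assumptions: a single descriptor, an ordinary least-squares line standing in for the machine learning model, and made-up numbers and property names (`hardness`, `viscosity`). None of these values come from the disclosure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training samples: one simulated descriptor value per recipe
# (step a) and two measured properties per recipe (step b).
descriptor = [0.10, 0.20, 0.30, 0.40, 0.50]
properties = {
    "hardness": [12.0, 14.0, 16.0, 18.0, 20.0],
    "viscosity": [5.0, 4.5, 4.0, 3.5, 3.0],
}

# Step c: train one model per property, descriptor -> property.
models = {name: fit_line(descriptor, values) for name, values in properties.items()}


def predict(name, descriptor_value):
    """Predict one property from a descriptor value with its trained model."""
    slope, intercept = models[name]
    return slope * descriptor_value + intercept
```

In practice any of the model families listed in the disclosure could replace `fit_line`; the point of the sketch is only the one-model-per-property layout.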

In one or more embodiments, in step d), the predicted value of the properties is obtained by using each of the values of the multiple sets of candidate descriptors as the input of the plurality of machine learning models.

In one or more embodiments, in step d), the target recipe is selected from the plurality of candidate recipes according to the predicted value and expected value of the properties.
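The disclosure does not state how the predicted and expected values are compared. One possible criterion, shown here purely as an assumption, is to rank candidates by the sum of squared relative deviations between predicted and expected property values and take the closest one. The recipe names, property names and numbers below are made up for the example.

```python
# Expected (desired) property values for the target compound (hypothetical).
expected = {"hardness": 18.0, "viscosity": 3.5}

# Properties predicted by the trained models for each candidate (hypothetical).
predicted = {
    "recipe_1": {"hardness": 12.5, "viscosity": 4.9},
    "recipe_2": {"hardness": 17.6, "viscosity": 3.6},
    "recipe_3": {"hardness": 21.0, "viscosity": 2.8},
}


def mismatch(candidate):
    """Sum of squared relative deviations from the expected values."""
    return sum(
        ((candidate[name] - target) / target) ** 2
        for name, target in expected.items()
    )


# Rank candidates from best to worst match; the first one is the target recipe.
ranked = sorted(predicted, key=lambda name: mismatch(predicted[name]))
target_recipe = ranked[0]
```

Relative deviations are used so that properties measured on different scales contribute comparably; a weighted sum or hard acceptance windows per property would be equally plausible choices.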

In one or more embodiments, in step c), at least one of the plurality of machine learning models is trained based on any of the SVM, random forest, elastic net, logistic regression algorithm, neural networks, generalized additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes.

In one or more embodiments, in step c), the machine learning models include the boosted versions of the machine learning models and ensembles of the machine learning models combined into one aggregated model and another.

In one or more embodiments, after step a) and step b) and before step c), a plurality of descriptors are selected according to each of the attributes of the compound that need to be predicted.

Some embodiments of the present inventive concept provide a method for determining a set of attributes of a candidate compound, characterized in that the method comprises: step a) performing simulation synthesis for each of a plurality of recipes, and calculating on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; step b) performing synthesis for each of a plurality of recipes, and measuring one or more properties of each recipe, wherein the properties are used to describe the set of attributes of a corresponding recipe; step c) training one or more machine learning models by using the values of the descriptors and the properties obtained by step a) and step b) as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and step d) determining the set of attributes of the candidate compound according to the plurality of machine learning models.

In one or more embodiments, the simulated synthesis process is Monte Carlo simulation.

In one or more embodiments, in step c), the one or more machine learning models are trained by using the values of the set of descriptors as the independent variable of the machine learning model and the value of the one or more properties as the dependent variable, respectively.

In one or more embodiments, in step d), a plurality of candidate recipes are generated by using a statistical experimental design, such as full factorial, partial factorial, and latin-hypercube.

In one or more embodiments, in step d), the predicted value of each recipe in the set of recipes is obtained by using each of the values of the multiple sets of candidate descriptors as the input of the plurality of machine learning models.

In one or more embodiments, in step d), the target attribute is selected from the plurality of candidate attributes according to the predicted value and expected value of each recipe of the set of recipes.

In one or more embodiments, in step c), at least one of the plurality of machine learning models is trained based on any of the SVM, random forest, elastic net and logistic regression algorithm, neural networks, generalized additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes.

In one or more embodiments, in step c), the machine learning models include the boosted versions of the machine learning models and ensembles of the machine learning models combined into one aggregated model and another.

In one or more embodiments, after step a) and step b) and before step c), a plurality of descriptors are selected according to each of the attributes of the compound that need to be predicted.

Some embodiments of the present inventive concept provide a system for determining a target recipe of a compound with desired attributes, characterized in that the system comprises: a simulation unit configured to perform simulation synthesis for each of a plurality of recipes, and calculate on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe; a synthesis unit configured to perform synthesis for each of a plurality of recipes, and measure one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe; a training unit configured to train one or more machine learning models by using the values of the descriptors and the properties obtained by the simulation unit and synthesis unit as training samples, wherein the machine learning models correlates the descriptors of the recipes with the properties of the recipes; and a determining unit configured to generate a plurality of candidate recipes, perform simulation synthesis for each of the plurality of candidate recipes, and calculate on the simulation synthesis one or more descriptors of each candidate recipe, determine one or more target recipes from the plurality of candidate recipes according to the trained machine learning model.

In one or more embodiments, the simulated synthesis process is Monte Carlo simulation.

In one or more embodiments, in the determining unit, a plurality of candidate recipes are randomly generated.

In one or more embodiments, in the determining unit, a plurality of candidate recipes are generated using a statistical experimental design, such as full factorial, partial factorial, and latin-hypercube.

In one or more embodiments, the training unit is configured to train the plurality of machine learning models by using the values of the descriptors as the independent variable of the machine learning model and the value of the properties as the dependent variable, respectively.

In one or more embodiments, the determining unit is configured to obtain the predicted value of the properties using each of the values of the multiple sets of candidate descriptors as the input of the plurality of machine learning models.

In one or more embodiments, the determining unit is configured to select the target recipe from the plurality of candidate recipes according to the predicted value and expected value of the properties.

In one or more embodiments, the determining unit is configured to train at least one of the plurality of machine learning models based on any of the SVM, random forest, elastic net and logistic regression algorithm, neural networks, generalized additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes.

In one or more embodiments, in the determining unit, the machine learning models include the boosted versions of the machine learning models and ensembles of the machine learning models combined into one aggregated model and another.

In one or more embodiments, in the training unit, a plurality of descriptors are selected according to each of the attributes of the compound that need to be predicted.

Some embodiments of the present inventive concept provide a system for determining a target recipe of a compound, characterized in that the system comprises: a computer processor, a computer-readable storage medium, and program instructions stored on the computer-readable storage medium, which, when executed by the processor, execute the method according to any one of the methods above.

Some embodiments of the present inventive concept provide a computer-readable storage medium having instructions thereon that, when executed, cause a computing device to execute the method according to any one of the methods above.

The present invention uses the existing experimental data of synthesized compounds as training samples to train a machine learning model, uses the machine learning model to predict the respective properties of the target compound to be synthesized, and screens for the optimal parameters so that only the final qualifying candidates require laboratory verification. This method generates a large number of random samples by means of random simulation, which can quickly cover the recipe design space as much as possible and can greatly simplify the development of complex compounds with multiple attribute requirements.

Brief description of the drawings

The foregoing and other objects, features, and advantages would be apparent from the following more particular description of preferred embodiments as illustrated in the accompanying drawings in which:

Fig. 1 schematically illustrates a flowchart of a method for determining a target recipe of a compound according to the present disclosure.

Fig. 2 schematically illustrates a flowchart of another method for determining a target recipe of a compound according to the present disclosure.

Fig. 3 schematically illustrates a flowchart of a method for determining a set of attributes of a candidate compound according to the present disclosure.

Fig. 4 is a block diagram illustrating a system for determining a target recipe of a compound according to the present disclosure.

Fig. 5 is a block diagram illustrating a computer system for implementing the methods from the invention according to the present disclosure.

Detailed description

In the following description, numerous specific details are set forth such as examples of specific components, circuits, and processes to provide a thorough understanding of the disclosure. The term "coupled" as used herein means coupled directly to or coupled through one or more intervening components or circuits. In addition, in the following description and for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the example embodiments. However, it will be apparent to one skilled in the art that these specific details may not be required to practice the example embodiments. In other instances, well-known circuits and devices are shown in block diagram form to avoid obscuring the disclosure. Any of the signals provided over various buses described herein may be time-multiplexed with other signals and provided over one or more common buses. Additionally, the interconnection between circuit elements or software blocks may be shown as buses or as single signal lines. Each of the buses may alternatively be a single signal line, and each of the single signal lines may alternatively be buses, and a single line or bus might represent any one or more of a myriad of physical or logical mechanisms for communication between components.

Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present application, discussions utilizing the terms such as "accessing," "receiving," "sending," "using," "selecting," "determining," "normalizing," "multiplying," "averaging," "aggregating", "monitoring," "comparing," "applying," "updating," "measuring," "deriving" or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof, unless specifically described as being implemented in a specific manner. Any features described as modules or components may also be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a non-transitory computer-readable storage medium comprising instructions that, when executed, perform one or more of the methods described below. The non-transitory computer-readable storage medium may form part of a computer program product, which may include packaging materials.

The non-transitory computer-readable storage medium may include random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, other known storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that may be accessed, read, and/or executed by a computer or other processor.

The various illustrative logical blocks, modules, circuits and instructions described in connection with the implementations disclosed herein may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), application specific instruction set processors (ASIPs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. The term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured as described herein. In addition, the techniques could be fully implemented in one or more circuits or logic elements. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (such as a combination of a DSP and a microprocessor), a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other suitable configuration.

According to an embodiment of the present invention, a method for determining a target recipe of a compound with desired attributes is provided.

Fig. 1 schematically illustrates a flowchart of a method for determining a target recipe of a compound according to the present disclosure. As shown in Fig. 1, the method 100 includes steps S101, S102, S103 and S104.

Step S101: performing simulation synthesis for each of a plurality of recipes, and calculating on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe.

A recipe is a general term for the method of producing a compound. A recipe generally includes multiple ingredients, the amounts of the ingredients, the reaction processes of the compound and the process parameters. For a particular compound, there may be thousands of recipes depending on the attributes of the compound. Therefore, in step S101, thousands of possible recipes could be generated based on process parameters and chemical reaction conditions. The process parameters could be predefined by the user. The predefined process parameters for a recipe could include the number of ingredients, the content range for ingredients, the proportion of ingredients, the change step for ingredients and other parameters that could describe the recipe. For example, for a recipe with 10 possible ingredients, the number of ingredients could be set as 5 and the proportion of ingredients could be set as 1:1.5:3:1.5:2. The content range for ingredients could be set from 1 mg to 10 mg and the change step for each ingredient could be set as 0.5 mg. Based on the predefined parameters and chemical reaction conditions (constraints), the computer could generate a plurality of recipes at random. The chemical reaction conditions are the conditions required in the chemical reaction for the recipe, which include the surface area of the reactants, catalyst, pressure, heat, light and other conditions of the chemical reaction.
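The random generation described in this paragraph can be sketched as follows, using the example figures from the text: 10 possible ingredients, 5 ingredients per recipe, and amounts from 1 mg to 10 mg in steps of 0.5 mg. The ingredient names, the function names and the `is_feasible` constraint hook are hypothetical; the fixed proportion example (1:1.5:3:1.5:2) is not modeled here, and the disclosure does not prescribe this particular procedure.

```python
import random

INGREDIENTS = [f"ingredient_{i}" for i in range(1, 11)]  # hypothetical names
N_PER_RECIPE = 5                                         # ingredients per recipe
LOW_MG, HIGH_MG, STEP_MG = 1.0, 10.0, 0.5                # content range and step


def random_recipe(rng):
    """Pick 5 of the 10 ingredients and give each an amount on the 0.5 mg grid."""
    chosen = rng.sample(INGREDIENTS, N_PER_RECIPE)
    n_steps = round((HIGH_MG - LOW_MG) / STEP_MG)  # 18 steps between 1 and 10 mg
    return {name: LOW_MG + STEP_MG * rng.randint(0, n_steps) for name in chosen}


def generate_recipes(n, seed=0, is_feasible=lambda recipe: True):
    """Draw random recipes until n of them satisfy the constraint check.

    is_feasible stands in for the chemical reaction conditions (constraints);
    the default accepts every recipe.
    """
    rng = random.Random(seed)
    recipes = []
    while len(recipes) < n:
        recipe = random_recipe(rng)
        if is_feasible(recipe):
            recipes.append(recipe)
    return recipes


recipes = generate_recipes(1000)
```

A real constraint check would encode the reaction conditions listed above; passing, for example, `is_feasible=lambda r: sum(r.values()) <= 30.0` would keep only recipes whose total mass stays under an assumed limit.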

Based on the generated recipes, the simulation synthesis for each of the plurality of recipes could be performed by the Monte Carlo method. The simulation is used to simulate the reaction that takes the randomly generated recipes as the reactants. The number of simulations performed could be determined by the computing power of the computer used for the experiment. The Monte Carlo method is a random sampling method that is used to study a problem and obtain an approximate solution from repeated random trials. The Monte Carlo method is usually repeated many times, and the results of all trials are calculated and analyzed to provide an approximate answer. In some embodiments, a sample space for random sampling could be constructed according to multiple initial conditions, such as raw material parameters and reaction processes for the compound to be generated. Monte Carlo simulation generates a large number of uniformly distributed random points (random values) in the sample space. After the sample space is generated, the method could simulate the recipes by the computer to construct thousands of forms of a compound with different attributes. In actual operation, the larger the number of Monte Carlo simulation samples, the smaller the variation in the attributes of the corresponding compound and the higher the accuracy. That is, when a sufficient number of samples are simulated, the simulated compound can reach a very high level of accuracy with respect to the desired attributes.
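The effect of the sample count can be illustrated with a toy Monte Carlo estimate. The sketch below is an assumption-laden illustration, not the simulation of the disclosure: `simulated_attribute` is a hypothetical stand-in for one simulated synthesis, and the two sampled parameters and their ranges are invented. For such an estimate the statistical error typically shrinks in proportion to the inverse square root of the number of samples.

```python
import random


def simulated_attribute(temperature, ratio):
    # Hypothetical stand-in for one simulated synthesis; not a chemical model.
    return temperature * ratio


def monte_carlo_mean(n_samples, seed=0):
    """Average the attribute over uniformly distributed random points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        temperature = rng.uniform(1.0, 10.0)  # assumed parameter range
        ratio = rng.uniform(0.0, 1.0)         # assumed parameter range
        total += simulated_attribute(temperature, ratio)
    return total / n_samples


# For independent uniform parameters the exact mean is the product of means.
EXACT_MEAN = 5.5 * 0.5
estimates = {n: monte_carlo_mean(n) for n in (100, 1000, 20000)}
for n, value in estimates.items():
    print(n, round(value, 4), "error", round(abs(value - EXACT_MEAN), 4))
```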

After a large number of simulated compounds has been obtained, the simulated compounds need to be quantified in order to effectively distinguish their different attributes. In step S101, the descriptor is introduced as an intermediary between the synthesis parameters of the related compound and the respective attributes of the final synthesized compound. In some embodiments, the descriptor has a clear definition based on the physical structure and chemical attributes of the compound, and the descriptors of the recipes could be roughly divided into the following general types:

1) the composition descriptor, which is calculated for a fragment of the compound or for the entire compound from, for example, the number of atoms, the number of functional groups, the number and atomic percentage of atoms of a given species, the number of isocyanate groups or hydroxy groups, the average molecular weight of the compound and so on;

2) the geometric descriptor, which is calculated for a fragment of the compound or for the entire compound from the bond lengths, bond angles, dihedral angles, the distribution of mass, the gravity index and so on;

3) the topological descriptor, which is calculated from the connectivity matrix of a fragment of the compound or of the entire compound and so on; and

4) the quantum chemical descriptor, which is calculated from the quantum mechanical wave function of a fragment of the compound or of the entire compound and so on.

In one embodiment of the invention, the Monte Carlo simulation process yields not only a large number of simulated compounds but also the synthesis parameters of the simulated compounds. The synthesis parameters include raw material parameters.
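As an illustration of the first descriptor type, a few composition descriptors can be computed directly from the ingredient amounts. The sketch below is an assumption: the ingredient names, molar masses and group counts are invented, the function `composition_descriptors` is hypothetical, and it works on the raw recipe rather than on a simulated product.

```python
def composition_descriptors(recipe):
    """Compute simple composition descriptors from a recipe (hypothetical).

    recipe maps an ingredient name to a dict with amount_mg, molar_mass
    (g/mol) and the number of NCO and OH groups per molecule.
    """
    total_mg = sum(item["amount_mg"] for item in recipe.values())
    # mg divided by g/mol gives mmol.
    mmol = {name: item["amount_mg"] / item["molar_mass"]
            for name, item in recipe.items()}
    total_mmol = sum(mmol.values())
    nco_mmol = sum(mmol[name] * recipe[name]["nco_groups"] for name in recipe)
    oh_mmol = sum(mmol[name] * recipe[name]["oh_groups"] for name in recipe)
    return {
        "number_average_molar_mass": total_mg / total_mmol,
        "nco_groups_mmol": nco_mmol,
        "oh_groups_mmol": oh_mmol,
        "nco_to_oh_ratio": nco_mmol / oh_mmol if oh_mmol > 0 else None,
    }


# Invented example values, for illustration only.
example_recipe = {
    "diisocyanate": {"amount_mg": 5.0, "molar_mass": 250.0,
                     "nco_groups": 2, "oh_groups": 0},
    "polyol": {"amount_mg": 10.0, "molar_mass": 500.0,
               "nco_groups": 0, "oh_groups": 2},
}
descriptors = composition_descriptors(example_recipe)
```

Geometric, topological and quantum chemical descriptors require a molecular structure and are normally obtained from dedicated chemistry software, so they are not sketched here.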

Step S102: performing synthesis for each of a plurality of recipes, and measuring one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe.

In step S102, the plurality of recipes generated in step S101 could be carried out in the laboratory. The actual synthesis in the laboratory could be performed for each of the plurality of recipes. For example, synthesis includes producing compounds according to the ingredients, amounts and reaction processes of the recipe. After the synthesis has been performed, one or more properties of each recipe could be measured. The attributes of the compound obtained in the synthesis could be evaluated by a new parameter, the property. The properties could be used to describe the attributes of a corresponding recipe.

In the following steps, the properties of the recipes in the actual synthesis would be correlated to the descriptors of the recipes in the simulation.

Step S103: training one or more machine learning models by using the values of the descriptors and the properties obtained by step S101 and step S102 as training samples, wherein the machine learning models correlate the descriptors of the recipes with the properties of the recipes.

In step S103, according to the set of descriptors selected in step S101 and the set of properties measured in step S102, the descriptors of the recipes could be correlated with the properties of the recipes by the machine learning models. In particular, the plurality of machine learning models is trained by using the values of the descriptors as the independent variables of the machine learning model and the values of the properties as the dependent variables of the machine learning model. That is, the values of the set of simulated descriptors and the properties of the synthesized compound are used as a set of training samples. The result of training is a function between the descriptors and the properties, which could be represented as follows: properties = f(descriptors).
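The relation properties = f(descriptors) can be made concrete with the simplest possible model. The sketch below is only an illustration under stated assumptions: it uses a single descriptor, an ordinary least-squares line as the model (the disclosure names other algorithms), and invented training values.

```python
def fit_linear(descriptor_values, property_values):
    """Fit property = slope * descriptor + intercept by least squares and
    return the fitted function f (hypothetical helper)."""
    n = len(descriptor_values)
    mean_x = sum(descriptor_values) / n
    mean_y = sum(property_values) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptor_values, property_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Invented training samples: descriptor values from simulation (independent
# variable) and measured properties from the laboratory (dependent variable).
simulated_descriptor = [1.0, 2.0, 3.0, 4.0]
measured_property = [3.0, 5.0, 7.0, 9.0]

f = fit_linear(simulated_descriptor, measured_property)
predicted = f(5.0)  # predicted property for a new descriptor value
```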

In some embodiments, the machine learning model may be trained based on any one of SVM, random forest, elastic net, logistic regression, neural networks, generalized additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes. For example, a support vector machine (SVM) algorithm could be used in the method to correlate the descriptors with the properties. The SVM algorithm is widely used in machine learning for both classification and regression problems. In actual operation, multiple sets of training samples constitute the input variable space, and the SVM selects a hyperplane in that space. The hyperplane is a boundary (a line in a two-dimensional space) that best divides the input variable space into different classes. The SVM is trained to find the hyperplane that best completes the class division. The hyperplane is as far as possible from the training samples on both sides, which keeps misjudgment relatively small and improves the prediction accuracy. Finally, a machine learning model trained based on the SVM algorithm can predict the properties of the target compound based on the values of a set of input descriptors.

In another example, the machine learning model may be trained based on the random forest algorithm. The random forest algorithm is used for regression and classification. In actual operation, the training samples are randomly sampled multiple times with replacement to generate multiple new training sets. A decision tree model is then constructed separately on each of the new training sets, and the multiple decision tree models are combined into the overall random forest model. When a new set of descriptor values is used to predict the attributes of the target compound, each decision tree model makes a prediction. The predicted values could then be averaged to obtain the final output, which gives better predicted values of the target compound attributes for subsequent screening.
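The resampling and averaging steps can be sketched in a few lines. This is a deliberately reduced illustration, not a full random forest: each tree is a one-split regression stump on a single descriptor, no random feature selection is performed, and the training values and function names are invented.

```python
import random


def fit_stump(samples):
    """Fit a one-split regression tree on (descriptor, property) pairs."""
    samples = sorted(samples)
    best = None
    for i in range(1, len(samples)):
        if samples[i][0] == samples[i - 1][0]:
            continue  # no threshold fits between equal descriptor values
        left = [y for _, y in samples[:i]]
        right = [y for _, y in samples[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (samples[i - 1][0] + samples[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All descriptor values are identical: predict the mean property.
        mean = sum(y for _, y in samples) / len(samples)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(samples, n_trees=25, seed=0):
    """Bootstrap the samples with replacement, fit one stump per bootstrap
    set, and average the stump predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]
        trees.append(fit_stump(bootstrap))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Invented (descriptor value, measured property) training samples.
training = [(1.0, 10.0), (2.0, 12.0), (3.0, 20.0), (4.0, 22.0)]
forest = fit_forest(training)
```

Calling `forest(2.5)` returns the average of the 25 stump predictions for a descriptor value of 2.5.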

It is contemplated that one skilled in the art could also train boosted versions of the machine learning models, as well as ensembles of the machine learning models combined into one aggregated model. The most suitable training algorithm can be selected by comparing the values predicted by the machine learning models trained with different algorithms against the true values.
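One way to carry out this comparison is to score every trained model on the same samples and keep the one with the smallest error. The sketch below assumes root-mean-square error as the score and uses two invented stand-in models; neither choice is prescribed by the disclosure.

```python
import math


def rmse(model, samples):
    """Root-mean-square error between predicted and true property values."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in samples)
                     / len(samples))


def select_model(models, samples):
    """Return the name of the model with the lowest RMSE and all scores."""
    scores = {name: rmse(model, samples) for name, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Invented stand-ins for models trained with different algorithms.
candidate_models = {
    "constant": lambda x: 6.0,
    "linear": lambda x: 2.0 * x + 1.0,
}
# Invented (descriptor value, true property) pairs used for the comparison.
comparison_samples = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

best_name, scores = select_model(candidate_models, comparison_samples)
```

Scoring on samples that were not used for training gives a fairer comparison than scoring on the training samples themselves.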

Step S104: generating a plurality of candidate recipes, performing simulation synthesis for each of the plurality of candidate recipes, and, using the trained machine learning model obtained by step S103, to predict one or more properties of each candidate recipe, thus determining one or more target recipes from the plurality of candidate recipes.

In step S104, a plurality of candidate recipes for a compound could be generated at random from predefined process parameters and chemical reaction conditions, as discussed above. The process parameters are generated randomly within a predefined limit, and the predefined limit could be set by the user. Further, in addition to the process parameters, the recipe could also include ingredient types, ingredient ratios and other possible factors that define the recipe, determine its behavior and are varied in an experiment. The plurality of candidate recipes includes multiple ingredient types, multiple ingredient ratios and multiple process parameters generated randomly within the predefined limit. Thousands of possible recipes would be generated for the subsequent simulations. The recipes could be randomly generated under restricted conditions, which are affected by multiple conditions such as raw material parameters, process parameters, reaction processes and so on. The simulation synthesis could be performed for each of the plurality of candidate recipes. The simulation result includes the reaction products of the recipes, and the descriptors are used to characterize the simulated product of each of the plurality of candidate recipes. As discussed in step S103, a machine learning model (that is, a function between the descriptors and the properties) is trained with a machine learning algorithm. Therefore, using the machine learning model trained in step S103, each of the multiple sets of candidate descriptor values is used as the input of the machine learning model, and a set of properties is predicted for each of the candidate recipes.

In step S104, the plurality of candidate recipes could also be generated by using a statistical experimental design, such as full factorial, partial factorial or Latin hypercube. According to the function in step S103 (properties = f(descriptors)), the predicted values of the properties could be obtained by using each of the multiple sets of candidate descriptor values as the input of the plurality of machine learning models. The outputs of the machine learning models in step S104 are the predicted properties used to describe the attributes of the recipes used.
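Two of the named designs can be sketched with the standard library alone. The factor names, levels and bounds below are invented for illustration, and the functions `full_factorial` and `latin_hypercube` are hypothetical helpers rather than part of the disclosure.

```python
import itertools
import random


def full_factorial(levels):
    """Every combination of the given levels of every factor."""
    names = list(levels)
    combos = itertools.product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def latin_hypercube(bounds, n_samples, seed=0):
    """One random value per factor in each of n_samples equal-width strata,
    with the strata shuffled independently for every factor."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        points = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in bounds}
            for i in range(n_samples)]


# Invented factors: 2 temperatures x 3 catalyst amounts = 6 recipes.
factorial_design = full_factorial({"temperature_c": [60, 80],
                                   "catalyst_mg": [1.0, 1.5, 2.0]})
# Five recipes that cover each factor range evenly.
lhs_design = latin_hypercube({"temperature_c": (60.0, 80.0),
                              "catalyst_mg": (1.0, 2.0)}, n_samples=5)
```

A full factorial design grows multiplicatively with the number of factors and levels, whereas a Latin hypercube keeps the number of candidate recipes fixed while still covering the range of every factor.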

In the process of determining one or more target recipes from the plurality of candidate recipes, an evaluation value may be calculated from the predicted properties for each of the candidate recipes. Meanwhile, a corresponding expected value may be predefined by the user for each attribute of the target compound based on historical experimental data, and the expected value may be a certain numerical range. The candidate recipes whose evaluation values fall within that numerical range could be selected for further screening. In actual screening of the recipes, the evaluation values could be ranked from top to bottom or sorted in other ways. The predicted value of each attribute of the target compound can be compared with the expected value, and a certain number of recipes that meet the expected value condition could be initially selected from the multiple sets of recipes. If the number of recipes that meet the conditions is insufficient after the preliminary screening, the expected value range could be widened to obtain the specified number of recipes. In the subsequent screening, the multiple sets of recipes could be screened based on the distances between them and the synthesis recipes of the actual compound in a multi-dimensional space. For example, the multidimensional scaling (MDS) method could be used in the statistics of the simulated data to screen the appropriate recipe from the plurality of candidate recipes.
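The preliminary screening with a widening range can be sketched as follows. Several details are assumptions made for illustration: the predicted property itself serves as the evaluation value, the range is widened by 10 % of its current width per round, the selected recipes are ranked by closeness to the centre of the original range, and all values are invented.

```python
def screen(candidates, expected_range, min_count, widen_step=0.1,
           max_rounds=10):
    """Select recipes whose evaluation value lies in the expected range and
    widen the range until min_count recipes are found (hypothetical)."""
    low, high = expected_range
    selected = []
    for round_index in range(max_rounds + 1):
        selected = [recipe_id for recipe_id, value in candidates.items()
                    if low <= value <= high]
        if len(selected) >= min_count or round_index == max_rounds:
            break
        margin = widen_step * (high - low)
        low, high = low - margin, high + margin
    centre = (expected_range[0] + expected_range[1]) / 2
    selected.sort(key=lambda recipe_id: abs(candidates[recipe_id] - centre))
    return selected, (low, high)


# Invented evaluation values, one per candidate recipe.
evaluation_values = {"r1": 48.0, "r2": 55.0, "r3": 61.0, "r4": 75.0}
selected, final_range = screen(evaluation_values,
                               expected_range=(50.0, 60.0), min_count=3)
```

With these values only one recipe lies in the original range, so the range is widened twice before three recipes qualify.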

For example, if multiple related factors are used to evaluate a recipe, the MDS method could be used to reduce the multiple factors (multiple dimensions) to two factors (two dimensions). In the MDS method, a projection could be used to screen points in a multi-dimensional space, wherein each coordinate of a point in the space could correspond to one parameter of a recipe (for example, ingredient, ingredient type, process parameters). One type of parameter corresponds to one dimension of the space. If five parameters (five dimensions) are used to evaluate a recipe, the MDS method could reduce the five-dimensional parameters to two-dimensional parameters, which could be drawn on a 2D plot. The points on the 2D plot then represent the five-dimensional parameters. The appropriate recipe could be determined more intuitively and effectively with the help of the 2D plot. In particular, two attributes are selected from the set of attributes of the target compound to construct a two-dimensional space. The points of the multiple sets of recipes in the multi-dimensional space are projected onto the two-dimensional space, thereby reducing multiple dimensions to two dimensions. In the reduced two-dimensional space, the generated points could be compared in a 2D plot with the historic points in terms of the ingredients, the amounts of the ingredients, the reaction process steps and any other process parameter determined by the user. The points that meet the requirements of the recipes are screened out, and then another two attributes are selected to repeat the above steps, until the synthesis recipes that meet the requirements are finally selected. The MDS method could be implemented with statistical tools commonly used in the art, such as the R programming language, SPSS and so on.
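The reduction from five dimensions to two can be sketched with classical (Torgerson) MDS, which is one common variant; the disclosure does not state which variant is meant. The sketch is written with the standard library only, uses power iteration with deflation to obtain the two leading eigenvectors, and works on invented five-parameter recipes.

```python
import math


def squared_distances(points):
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def double_centre(d2):
    """Turn squared distances into the centred inner-product matrix."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, iterations=500):
    """Leading eigenvalue and eigenvector by power iteration."""
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n))
               for i in range(n)]
    return sum(v * p for v, p in zip(vector, product)), vector


def classical_mds(points, dimensions=2):
    """Embed the points so that pairwise distances are kept as well as
    possible in the requested number of dimensions."""
    n = len(points)
    b = double_centre(squared_distances(points))
    coords = [[] for _ in range(n)]
    for _ in range(dimensions):
        value, vector = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vector[i])
        # Deflate: remove the component that was just extracted.
        b = [[b[i][j] - value * vector[i] * vector[j] for j in range(n)]
             for i in range(n)]
    return coords


# Invented recipes, each described by five parameters (five dimensions).
recipes_5d = [[1.0, 2.0, 60.0, 1.0, 0.5], [1.5, 2.0, 65.0, 1.0, 0.5],
              [5.0, 8.0, 80.0, 2.0, 1.5], [5.5, 7.5, 85.0, 2.0, 1.5]]
points_2d = classical_mds(recipes_5d)  # one (x, y) pair per recipe
```

The resulting pairs can be drawn on a 2D plot together with the historic recipes. In practice the parameters would normally be scaled to comparable ranges first, since otherwise the parameter with the largest numerical spread dominates the distances.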

Step S104 uses a machine learning model to select one or more sets of recipes from the multiple sets of recipes as the synthesis parameters for synthesizing the target compound, and the selected synthesis recipes of the target compound will eventually be determined by laboratory verification.

According to another embodiment of the present invention, another method for determining a target recipe of a compound with desired attributes is provided.

Fig. 2 schematically illustrates a flowchart of another method for determining a target recipe of a compound according to the present disclosure. As shown in FIG. 2, the method 200 includes steps S201, S202, S203, S204 and S205. The steps S201, S202, S204 and S205 are similar to the steps S101, S102, S103 and S104 of method 100. In addition, the method 200 also includes a new step S203.

Step S203: selecting a plurality of descriptors according to each of the attributes of the compound that need to be predicted.

In step S203, a plurality of descriptors is selected according to each of the attributes of the compound that need to be predicted. As mentioned above, the descriptors include the composition descriptor, the geometric descriptor, the topological descriptor, the quantum chemical descriptor and so on. Since several descriptors of the compound may describe a similar attribute, these descriptors may have a high degree of correlation with the attributes. For example, some descriptors change proportionally in the Monte Carlo simulations. In addition, some descriptors may have little effect on the target attributes of the simulated compound. For example, when some descriptors are changed by a large proportion in the Monte Carlo simulation, the attributes of the compound hardly change. Therefore, it is necessary to select, from the multiple descriptors describing the attributes of the simulated compound, a set of descriptors that are relevant and highly related to the attributes of the compound, wherein the descriptors in the set have different values relative to the simulated compound. The selected descriptors can cover and describe the required attributes of the simulated compound. Accordingly, after steps S201 and S202 and before step S204, a plurality of descriptors related to the attributes of the compound that need to be predicted is selected.
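A simple way to perform such a selection is a correlation filter. The sketch below is an assumption, since the disclosure does not prescribe a selection rule: descriptors whose absolute Pearson correlation with the attribute is below a threshold are dropped as irrelevant, and of two descriptors that change almost proportionally only the more relevant one is kept. The descriptor names, thresholds and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(descriptors, attribute, min_relevance=0.5,
                       max_redundancy=0.95):
    """Keep descriptors that are related to the attribute and are not
    near-duplicates of an already selected descriptor (hypothetical)."""
    relevance = {name: abs(pearson(values, attribute))
                 for name, values in descriptors.items()}
    ranked = sorted((name for name in descriptors
                     if relevance[name] >= min_relevance),
                    key=lambda name: -relevance[name])
    selected = []
    for name in ranked:
        if all(abs(pearson(descriptors[name], descriptors[kept]))
               < max_redundancy for kept in selected):
            selected.append(name)
    return selected


# Invented descriptor values over five simulated compounds.
attribute = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "nco_count": [2.0, 4.1, 5.9, 8.0, 10.0],    # relevant
    "nco_fraction": [1.0, 2.2, 2.8, 4.1, 5.0],  # relevant but redundant
    "bond_angle": [3.0, 1.0, 3.0, 1.0, 3.0],    # hardly related
}
selected = select_descriptors(descriptors, attribute)
```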

According to another embodiment of the present invention, a method for determining a set of attributes of a candidate compound is provided.

Fig. 3 schematically illustrates a flowchart of a method for determining a set of attributes of a candidate compound according to the present disclosure. As shown in FIG. 3, the method 300 includes steps S301, S302, S303 and S304. The steps S301, S302 and S303 are similar to the steps S101, S102 and S103 of method 100. In addition, the method 300 also includes a new step S304.

Step S304: determining the set of attributes of the candidate compound according to the plurality of machine learning models.

As discussed above, the machine learning models between the descriptors and the properties could be generated by steps S301, S302 and S303. In step S304, using the machine learning model trained in the previous step, each of the multiple sets of candidate descriptor values is used as the input of the machine learning model, and a set of properties is predicted for each of the recipes. The set of properties of the recipes could be screened to determine the parameters of the target compound. Based on the screened recipes, the attributes of the candidate compound (including viscosity, hardness, density and so on) could be determined.

According to an embodiment of the present invention, a system for determining a target recipe of a compound with desired attributes is provided.

Fig. 4 is a block diagram illustrating a system for determining a target recipe of a compound according to the present disclosure. As shown in FIG. 4, the system 400 includes units 401, 402, 403 and 404.

The unit 401 is a simulation unit, which is configured to perform simulation synthesis for each of a plurality of recipes, and calculate on the simulation synthesis one or more descriptors of each recipe, wherein the descriptors are used to characterize the simulated product of a corresponding recipe.

A recipe is a general term for the method of producing a compound. A recipe generally includes multiple ingredients, the amounts of the ingredients, the reaction processes of the compound and the process parameters. For a particular compound, there may be thousands of recipes, depending on the attributes of the compound. Therefore, in the simulation unit 401, thousands of possible recipes could be generated based on process parameters and chemical reaction conditions. The process parameters could be predefined by the user. The predefined process parameters for a recipe could include the number of ingredients, the content range of the ingredients, the proportion of the ingredients, the change step of the ingredients and other parameters that could describe the recipe. For example, for a recipe with 10 possible ingredients, the number of ingredients could be set to 5 and the proportion of the ingredients could be set to 1:1.5:3:1.5:2. The content of each ingredient could be varied from 1 mg to 10 mg, and the change step for each ingredient could be set to 0.5 mg. Based on the predefined parameters and the chemical reaction conditions (constraints), the computer could generate a plurality of recipes at random. The chemical reaction conditions are the conditions required by the chemical reaction of the recipe, and include the surface area of the reactants, catalyst, pressure, heat, light and other conditions of the chemical reaction.

Based on the generated recipes, the simulation synthesis for each of the plurality of recipes could be performed by the Monte Carlo method. The simulation is used to simulate the reaction that takes the randomly generated recipes as the reactants. The number of simulations performed could be determined by the computing power of the computer used for the experiment.

The Monte Carlo method is a random sampling method that is used to study a problem and obtain an approximate solution from repeated random trials. The Monte Carlo method is usually repeated many times, and the results of all trials are calculated and analyzed to provide an approximate answer. In some embodiments, a sample space for random sampling could be constructed according to multiple initial conditions, such as raw material parameters and reaction processes for the compound to be generated. Monte Carlo simulation generates a large number of uniformly distributed random points (random values) in the sample space. After the sample space is generated, the recipes could be simulated to construct thousands of forms of a compound with different attributes. In actual operation, the larger the number of Monte Carlo simulation samples, the smaller the variation in the attributes of the corresponding compound and the higher the accuracy. That is, when a sufficient number of samples are simulated, the simulated compound can reach a very high level of accuracy with respect to the desired attributes.

After a large number of simulated compounds has been obtained, the simulated compounds need to be quantified in order to effectively distinguish their different attributes. In the simulation unit 401, the descriptor is introduced as an intermediary between the synthesis parameters of the related compound and the respective attributes of the final synthesized compound. In some embodiments, the descriptor has a clear definition based on the physical structure and chemical attributes of the compound, and the descriptors of the recipes could be roughly divided into the following general types:

1) the composition descriptor, which is calculated for a fragment of the compound or for the entire compound from, for example, the number of atoms, the number of functional groups, the number and atomic percentage of atoms of a given species, the number of isocyanate groups or hydroxy groups, the average molecular weight of the compound and so on;

2) the geometric descriptor, which is calculated for a fragment of the compound or for the entire compound from the bond lengths, bond angles, dihedral angles, the distribution of mass, the gravity index and so on;

3) the topological descriptor, which is calculated from the connectivity matrix of a fragment of the compound or of the entire compound and so on; and

4) the quantum chemical descriptor, which is calculated from the quantum mechanical wave function of a fragment of the compound or of the entire compound and so on.

In one embodiment of the invention, the Monte Carlo simulation process yields not only a large number of simulated compounds but also the synthesis parameters of the simulated compounds. The synthesis parameters include raw material parameters.

The unit 402 is a synthesis unit, which is configured to perform synthesis for each of a plurality of recipes, and measure one or more properties of each recipe, wherein the properties are used to describe the attributes of a corresponding recipe.

In the synthesis unit 402, the plurality of recipes generated in the simulation unit 401 could be carried out in the laboratory. The actual synthesis in the laboratory could be performed for each of the plurality of recipes. For example, synthesis includes producing compounds according to the ingredients, amounts and reaction processes of the recipe. After the synthesis has been performed, one or more properties of each recipe could be measured. The attributes of the compound obtained in the synthesis could be evaluated by a new parameter, the property. The properties could be used to describe the attributes of a corresponding recipe.

In the following units, the properties of the recipes in the actual synthesis would be correlated to the descriptors of the recipes in the simulation.

The unit 403 is a training unit, which is configured to train one or more machine learning models by using the values of the descriptors and the properties obtained by the simulation unit 401 and the synthesis unit 402 as training samples, wherein the machine learning models correlate the descriptors of the recipes with the properties of the recipes.

In the training unit 403, according to the set of descriptors selected in the simulation unit 401 and the set of properties measured in the synthesis unit 402, the descriptors of the recipes could be correlated with the properties of the recipes by the machine learning models. In particular, the plurality of machine learning models is trained by using the values of the descriptors as the independent variables of the machine learning model and the values of the properties as the dependent variables of the machine learning model. That is, the values of the set of simulated descriptors and the properties of the synthesized compound are used as a set of training samples. The result of training is a function between the descriptors and the properties, which could be represented as follows: properties = f(descriptors).

In some embodiments, the machine learning model may be trained based on any of SVM, random forest, elastic net, logistic regression, neural networks, generalized additive models and any other machine learning model that can correlate the descriptors of the recipes with the properties of the recipes. For example, a support vector machine (SVM) algorithm could be used to correlate the descriptors with the properties. The SVM algorithm is widely used in machine learning for both classification and regression problems. In actual operation, multiple sets of training samples constitute the input variable space, and the SVM selects a hyperplane in that space. The hyperplane is a boundary (a line in a two-dimensional space) that best divides the input variable space into different classes. The SVM is trained to find the hyperplane that best completes the class division. The hyperplane is as far as possible from the training samples on both sides, which keeps misjudgment relatively small and improves the prediction accuracy. Finally, a machine learning model trained based on the SVM algorithm can predict the properties of the target compound based on the values of a set of input descriptors.

In another example, the machine learning model may be trained based on the random forest algorithm. The random forest algorithm is used for regression and classification. In actual operation, the training samples are randomly sampled multiple times with replacement to generate multiple new training sets. A decision tree model is then constructed separately on each of the new training sets, and the multiple decision tree models are combined into the overall random forest model. When a new set of descriptor values is used to predict the attributes of the target compound, each decision tree model makes a prediction. The predicted values could then be averaged to obtain the final output, which gives better predicted values of the target compound attributes for subsequent screening.

It is contemplated that one skilled in the art could also train boosted versions of the machine learning models, as well as ensembles of the machine learning models combined into one aggregated model. The most suitable training algorithm can be selected by comparing the values predicted by the machine learning models trained with different algorithms against the true values.

The unit 404 is a determining unit, which is configured to generate a plurality of candidate recipes, perform simulation synthesis for each of the plurality of candidate recipes, calculate on the simulation synthesis one or more descriptors of each candidate recipe, and determine one or more target recipes from the plurality of candidate recipes according to the trained machine learning model.

In the determining unit 404, a plurality of candidate recipes for a compound could be generated at random from predefined process parameters and chemical reaction conditions, as discussed above. The process parameters are generated randomly within a predefined limit, and the predefined limit could be set by the user. Further, in addition to the process parameters, the recipe could also include ingredient types, ingredient ratios and other possible factors that define the recipe, determine its behavior and are varied in an experiment. The plurality of candidate recipes includes multiple ingredient types, multiple ingredient ratios and multiple process parameters generated randomly within the predefined limit. Thousands of possible recipes would be generated for the subsequent simulations. The recipes could be randomly generated under restricted conditions, which are affected by multiple conditions such as raw material parameters, process parameters, reaction processes and so on. The simulation synthesis could be performed for each of the plurality of candidate recipes. The simulation result includes the reaction products of the recipes, and the descriptors are used to characterize the simulated product of each of the plurality of candidate recipes. As discussed for the training unit 403, a machine learning model (that is, a function between the descriptors and the properties) is generated by a machine learning algorithm. Therefore, using the machine learning model trained by the training unit 403, each of the multiple sets of candidate descriptor values is used as the input of the machine learning model, and a set of properties is predicted for each of the candidate recipes.

In the determining unit 404, the plurality of candidate recipes could also be generated by using a statistical experimental design, such as full factorial, partial factorial or Latin hypercube. According to the function in unit 403 (properties = f(descriptors)), the predicted values of the properties could be obtained by using each of the multiple sets of candidate descriptor values as the input of the plurality of machine learning models. The outputs of the machine learning models in unit 404 are the predicted properties used to describe the attributes of the recipes used.

In the process of determining one or more target recipes from the plurality of candidate recipes, an evaluation value may be calculated from the predicted properties for each of the candidate recipes. Meanwhile, a corresponding expected value may be predefined by the user for each attribute of the target compound based on historical experimental data, and the expected value may be a certain numerical range. The candidate recipes whose evaluation values fall within that numerical range could be selected for further screening. In actual screening of the recipes, the evaluation values would be ranked from top to bottom or sorted in other ways. The predicted value of each attribute of the target compound can be compared with the expected value, and a certain number of recipes that meet the expected value condition could be initially selected from the multiple sets of recipes. If the number of recipes that meet the conditions is insufficient after the preliminary screening, the expected value range could be widened to obtain the specified number of recipes. In the subsequent screening, the multiple sets of recipes could be screened based on the distances between them and the synthesis recipes of the actual compound in a multi-dimensional space. For example, the multidimensional scaling (MDS) method could be used in the statistics of the simulated data to screen the appropriate recipe from the plurality of candidate recipes.

For example, if multiple related factors are used to evaluate a recipe, the MDS method could be used to reduce the multiple factors (multiple dimensions) to two factors (two dimensions). In the MDS method, a projection could be used to screen points in a multi-dimensional space, wherein each coordinate of a point in the space could correspond to one parameter of a recipe (for example, ingredient, ingredient type, process parameters). One type of parameter corresponds to one dimension of the space. If five parameters (five dimensions) are used to evaluate a recipe, the MDS method could reduce the five-dimensional parameters to two-dimensional parameters, which could be drawn on a 2D plot. The points on the 2D plot then represent the five-dimensional parameters. The appropriate recipe could be determined more intuitively and effectively with the help of the 2D plot. In particular, two attributes are selected from the set of attributes of the target compound to construct a two-dimensional space. The points of the multiple sets of recipes in the multi-dimensional space are projected onto the two-dimensional space, thereby reducing multiple dimensions to two dimensions. In the reduced two-dimensional space, the generated points could be compared in a 2D plot with the historic points in terms of the ingredients, the amounts of the ingredients, the reaction process steps and any other process parameter determined by the user. The points that meet the requirements of the recipes are screened out, and then another two attributes are selected to repeat the above steps, until the synthesis recipes that meet the requirements are finally selected. The MDS method could be implemented with statistical tools commonly used in the art, such as the R programming language, SPSS and so on.

The determining unit 404 uses a machine learning model to select one or more sets of recipes from the multiple sets of recipes as the synthesis parameters for synthesizing the target compound, and the selected synthesis recipes of the target compound are finally confirmed by laboratory verification.

In one embodiment, in the determining unit 404, a plurality of descriptors are selected according to each of the attributes of the compound that need to be predicted. As mentioned above, the descriptors include the composition descriptor, the geometric descriptor, the topological descriptor, the quantum chemical descriptor and so on. Since a plurality of descriptors of the compound may describe a similar attribute, these descriptors may be highly correlated. For example, some descriptors change proportionally in the Monte Carlo simulations. In addition, some descriptors may have little effect on the target attributes of the simulated compound. For example, when some descriptors are changed by a large proportion in the Monte Carlo simulation, the attributes of the compound hardly change. Therefore, it is necessary to select, from the multiple descriptors describing the attributes of the simulated compound, a set of descriptors that are highly related to the attributes of the compound, wherein the descriptors in the set have different values relative to the simulated compound. The selected descriptors can cover and describe the required attributes of the simulated compound. Therefore, after step S201 and step S202 and before step S204, a plurality of descriptors related to the attributes of the compound that need to be predicted are selected.
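One possible way to carry out such a descriptor selection is sketched below in Python. It assumes that relevance and redundancy are both measured with the Pearson correlation coefficient, which the disclosure does not prescribe; the two thresholds, the function names, the descriptor names and the sample values are likewise hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(descriptors: Dict[str, List[float]],
                       attribute: List[float],
                       min_relevance: float = 0.3,
                       max_redundancy: float = 0.95) -> List[str]:
    """Keep descriptors related to the attribute and not near-duplicates."""
    relevance = {name: abs(pearson(values, attribute))
                 for name, values in descriptors.items()}
    # Drop descriptors with little effect on the attribute, most relevant first.
    ranked = sorted((n for n in relevance if relevance[n] >= min_relevance),
                    key=lambda n: relevance[n], reverse=True)
    selected: List[str] = []
    for name in ranked:
        # Skip a descriptor that changes almost proportionally with a kept one.
        if all(abs(pearson(descriptors[name], descriptors[kept])) < max_redundancy
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical descriptor values from five simulated recipes.
descriptors = {
    "crosslink_density": [1.0, 2.0, 3.0, 4.0, 5.0],
    "crosslink_density_alt": [2.0, 4.0, 6.0, 8.0, 11.0],  # near-duplicate
    "hard_segment_fraction": [5.0, 3.0, 4.0, 2.0, 1.0],
    "noise_descriptor": [3.0, 1.0, 3.0, 1.0, 3.0],  # barely affects the attribute
}
measured_attribute = [1.1, 2.0, 2.9, 4.2, 5.0]
chosen = select_descriptors(descriptors, measured_attribute)
```

In this example the near-duplicate descriptor and the descriptor with almost no relation to the measured attribute are discarded, and the two remaining descriptors are kept.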

According to another embodiment of the present invention, a system 500 for determining a target recipe of a compound with desired attributes is provided.

Fig. 5 is a block diagram illustrating a computer system for implementing the methods of the invention according to the present disclosure. As shown in Fig. 5, the system 500 may include one or more computer processors 502, each processor having one or more processor cores. The computer processor 502 may include any type of single-core or multi-core processor. Each processor may include a central processing unit (CPU) and one or more levels of cache. The computer processor 502 may be implemented as an integrated circuit.

The system 500 may comprise a computer-readable storage medium 504. The computer-readable storage medium 504 may be any type of temporary and/or permanent storage, including but not limited to volatile and non-volatile memory, and optical, magnetic and/or solid-state memory. Volatile memory may include but is not limited to static and/or dynamic random access memory. Non-volatile memory may include but is not limited to erasable programmable read-only memory, phase change memory, resistance memory, and the like. In some embodiments, the computer-readable storage medium 504 includes a magnetic hard disk, a solid-state hard disk drive, a semiconductor storage device, a read-only memory (ROM), a flash memory or any other computer-readable storage medium capable of storing program instructions or digital information. The computer-readable storage medium 504 may store a collection of programming instructions that implement the steps of the method of determining the synthesis parameters of the compound. The data information and program information in the computer-readable storage medium 504 could be distributed as a computer program product in the computer-readable storage medium. The various components of the system 500 may be coupled to each other via a system bus 506.

The above description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. One skilled in the art will be able to design various modifications without departing from the spirit of the present invention and the appended claims.