Actelion Property Explorer
This page describes the relevance of the individual properties and the prediction algorithms applied in the OSIRIS Property Explorer.
Toxicity Risk Assessment
The toxicity risk predictor locates fragments within a molecule which indicate a potential toxicity risk. Toxicity risk alerts are an indication that the drawn structure may be harmful concerning the risk category specified. For assessing the toxicity prediction's reliability we ran a set of toxic compounds and a set of presumably non-toxic compounds through the prediction. The diagram below shows the results obtained by predicting all available structures of four subsets of the RTECS database.
For example, all structures known to be mutagenic were run through the mutagenicity assessment. 81 % of these structures were found to bear a high or medium risk of being mutagenic. A collection of traded drugs served as the control set; here the mutagenicity risk assessment flagged only 12 % of the compounds as potentially harmful.
The logP value of a compound, which is the logarithm of its partition coefficient between n-octanol and water, is a well-established measure of the compound's hydrophilicity. Low hydrophilicities and therefore high logP values may cause poor absorption or permeation.
It has been shown that, for compounds to have a reasonable probability of being well absorbed, their logP value must not be greater than 5.0. The distribution of calculated logP values of more than 3000 drugs on the market underlines this fact (see diagram).
Actelion's in-house logP calculation method is implemented as an increment system that adds the contributions of every atom based on its atom type. Altogether the cLogP predicting engine distinguishes 368 atom types, which are composed of various properties of the atom itself (atomic number and ring membership) as well as of its direct neighbours (bond type, aromaticity state and encoded atomic number). More than 5000 compounds with experimentally determined logP values were used as a training set to optimize the 369 contribution values associated with the atom types.
The correlation plot (see diagram) shows calculated versus experimentally determined logP values for an independent test set of more than 5000 compounds, none of which were part of the training set.
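The increment idea described above can be sketched in a few lines of Python. This is only an illustration: the atom-type labels and the contribution values below are hypothetical and are not taken from the actual cLogP engine, and the assignment of atom types to a structure is assumed to have happened already.

```python
from collections import Counter

# Hypothetical atom-type labels and increments, for illustration only.
# The real engine distinguishes 368 atom types with fitted contributions.
EXAMPLE_INCREMENTS = {
    "C.aliphatic": 0.30,
    "C.aromatic": 0.25,
    "O.hydroxyl": -0.45,
}


def estimate_clogp(atom_types, increments):
    """Add one increment per atom; an unknown atom type raises KeyError."""
    counts = Counter(atom_types)
    return sum(increments[atom_type] * n for atom_type, n in counts.items())


# Atom types of a made-up three-atom molecule.
atoms = ["C.aliphatic", "C.aliphatic", "O.hydroxyl"]
print(round(estimate_clogp(atoms, EXAMPLE_INCREMENTS), 2))
```

Fitting the contribution values to a training set, as described above, would be a separate regression step that this sketch does not cover.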
Aqueous Solubility Prediction
The aqueous solubility of a compound significantly affects its absorption and distribution characteristics. Typically, low solubility goes along with poor absorption, and therefore the general aim is to avoid poorly soluble compounds. Our estimated logS value is the unit-stripped logarithm (base 10) of a compound's solubility measured in mol/liter.
The diagram below shows that more than 80% of the drugs on the market have an (estimated) logS value greater than -4.
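Since logS is a base-10 logarithm of the solubility in mol/liter, converting an estimate back to a concentration is a single exponentiation. The short Python sketch below shows this together with a check against the -4 value mentioned above; the function names are illustrative and not part of the Property Explorer.

```python
def solubility_mol_per_liter(log_s):
    """Convert a logS value (base 10) back to a solubility in mol/liter."""
    return 10.0 ** log_s


def exceeds_log_s(log_s, threshold=-4.0):
    """True if logS is greater than the threshold (default -4, as in the text)."""
    return log_s > threshold


for log_s in (-2.0, -4.0, -6.5):
    print(log_s, solubility_mol_per_liter(log_s), exceeds_log_s(log_s))
```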
As with our in-house logP calculation, we assess the solubility via an increment system that adds atom contributions depending on their atom types. The atom types employed here differ slightly from the ones used for the cLogP estimation in that ring membership is not considered. Still, there are 271 distinguishable atom types describing the atom and its near surroundings. More than 2000 compounds with experimentally determined solubility values (25 degrees, pH=7.5) were used as a training set to optimize the contribution values associated with the atom types.
Optimizing compounds for high activity on a biological target often goes along with increased molecular weights. However, compounds with higher weights are less likely to be absorbed and therefore less likely ever to reach the place of action. Thus, keeping molecular weights as low as possible should be the aim of every drug designer.
The diagram below shows that more than 80 % of all traded drugs have a molecular weight below 450.
Fragment-based Drug-likeness Prediction
There are many approaches that assess a compound's drug-likeness, based partially on topological descriptors, fingerprints of MDL structure keys or other properties such as cLogP and molecular weight. Our approach is based on a list of about 5300 distinct substructure fragments with associated drug-likeness scores. The druglikeness is calculated with the following equation summing up score values of those fragments that are present in the molecule under investigation:
The fragment list was created by shredding 3300 traded drugs as well as 15000 commercially available chemicals (Fluka), yielding a complete list of all available fragments. As a restriction, the shredder considered only rotatable bonds. In addition, the substitution modes of all fragment atoms were retained, i.e. fragment atoms that hadn't been further substituted in the original compounds were marked as such, and atoms that were part of a bond that was cut were marked as carrying further substituents. This way fragment substitution patterns are included in the fragments. The occurrence frequency of every fragment was determined within the collection of traded drugs and within the supposedly non-drug-like collection of Fluka compounds. All fragments with an overall frequency above a certain threshold were inverse clustered in order to remove highly redundant fragments. For the remaining fragments the drug-likeness score was determined as the logarithm of the quotient of frequencies in traded drugs versus Fluka chemicals.
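The scoring step can be illustrated with a small Python sketch. Several points here are assumptions rather than facts from the text: the base of the logarithm is not stated (the natural logarithm is used), the fragment names and frequencies are invented, fragments with a zero frequency are not handled, and any normalisation applied in the original equation is not modelled; the sketch simply adds the scores of the fragments found in a molecule.

```python
import math


def fragment_score(freq_in_drugs, freq_in_chemicals):
    """Log of the frequency quotient (natural log is an assumption)."""
    return math.log(freq_in_drugs / freq_in_chemicals)


def summed_druglikeness(present_fragments, scores):
    """Add the scores of all listed fragments found in the molecule.

    Fragments without a score are ignored; no normalisation is applied.
    """
    return sum(scores[f] for f in present_fragments if f in scores)


# Invented fragments with (frequency in drugs, frequency in chemicals).
frequencies = {
    "frag_a": (0.020, 0.005),  # enriched in drugs, positive score
    "frag_b": (0.001, 0.010),  # enriched in chemicals, negative score
}
scores = {name: fragment_score(d, c) for name, (d, c) in frequencies.items()}

print(scores)
print(summed_druglikeness(["frag_a", "frag_b", "frag_unknown"], scores))
```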
A positive druglikeness value states that a molecule contains predominantly fragments which are frequently present in commercial drugs. What it doesn't necessarily mean, though, is that these fragments are well balanced concerning other properties. For instance, a molecule may be composed of drug-like, but lipophilic fragments only. This molecule will have a high druglikeness score although it wouldn't really qualify for being a drug because of its high lipophilicity.
The drug score combines druglikeness, cLogP, logS, molecular weight and toxicity risks in one handy value that may be used to judge the compound's overall potential to qualify as a drug. This value is calculated by multiplying contributions of the individual properties with the first equation:
ds is the drug score. s_i are the contributions calculated directly from cLogP, logS, molweight and druglikeness (p_i) via the second equation, which describes a spline curve. Parameters a and b are (1, -5), (1, 5), (0.012, -6) and (1, 0) for cLogP, logS, molweight and druglikeness, respectively. t_i are the contributions taken from the 4 toxicity risk types. The t_i values are 1.0, 0.8 and 0.6 for no risk, medium risk and high risk, respectively.
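The following Python sketch shows one way the pieces described above could fit together. Only the (a, b) parameters and the toxicity factors come from the text; everything else is an assumption made for illustration: the curve is taken to be the sigmoid 1/(1 + exp(a*p + b)), each property is assumed to enter the product as 0.5 + 0.5*s, and for logS and druglikeness the complementary (rising) branch is used so that higher values score better.

```python
import math

# (a, b) parameter pairs as listed in the text.
PARAMS = {
    "clogp": (1.0, -5.0),
    "logs": (1.0, 5.0),
    "molweight": (0.012, -6.0),
    "druglikeness": (1.0, 0.0),
}
# Assumption: higher logS and druglikeness are desirable, so their
# contribution uses the rising branch 1 - 1/(1 + exp(a*p + b)).
HIGHER_IS_BETTER = {"logs", "druglikeness"}

# Toxicity factors from the text.
TOX_FACTOR = {"none": 1.0, "medium": 0.8, "high": 0.6}


def contribution(name, value):
    """Map a property value to a contribution between 0 and 1."""
    a, b = PARAMS[name]
    falling = 1.0 / (1.0 + math.exp(a * value + b))  # assumed sigmoid form
    return 1.0 - falling if name in HIGHER_IS_BETTER else falling


def drug_score(properties, tox_risks):
    """Multiply property terms (assumed 0.5 + 0.5*s) and toxicity factors."""
    score = 1.0
    for name, value in properties.items():
        score *= 0.5 + 0.5 * contribution(name, value)
    for risk in tox_risks:
        score *= TOX_FACTOR[risk]
    return score


# Made-up property values and the four toxicity risk levels.
props = {"clogp": 2.0, "logs": -3.0, "molweight": 350.0, "druglikeness": 1.0}
print(drug_score(props, ["none", "none", "none", "none"]))
print(drug_score(props, ["high", "none", "none", "none"]))
```

Under these assumptions a single high toxicity risk scales the score by 0.6, and a cLogP of exactly 5 yields a contribution of 0.5.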