java -jar mib.jar input_option [processing_options]
where input_option is either -f file_name (a file to process) or -smi (a single SMILES, see below). The file_name is a file containing a set of SMILES, or an SDfile. An SDfile is recognized automatically by the extension ".sdf", ".sd" or ".mol".
All other files are assumed to contain SMILES. In this case the SMILES must be the first item in each line.
The line may also contain other items (molecule name, other data), tab separated.
Compressed files with the extension .gz, gzip, or zip may be processed without unpacking them.
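The input layout described above (SMILES first, further items tab separated) can be checked before a file is passed to mib. The following is a minimal Python sketch and not part of mib; the function name read_smiles_records is hypothetical, and only plain and .gz files are handled.

```python
import gzip
from pathlib import Path


def read_smiles_records(path):
    """Yield (smiles, extra_fields) for each non-empty line of a SMILES file."""
    path = Path(path)
    # Assumption: only the .gz case is covered; .zip archives would need zipfile.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip empty lines
            # The SMILES is the first item; the remaining items are tab separated.
            smiles, *extra = line.split("\t")
            yield smiles, extra
```

Each record is a tuple of the SMILES string and a list of the remaining items in that line, which is empty when the line holds only a SMILES.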
Data from the SDfile may be retrieved by using the
-keep parameter. For example:
java -jar mib.jar -f mdpi.smi -keep "MolName,Amount"
also retrieves the parameters named MolName and Amount from the SDfile.
If you want to retrieve all parameters use the parameter
A single SMILES may be processed by using the
-smi input parameter (in this case the SMILES string must be in quotes).
java -jar mib.jar -smi 'molecule_smiles'
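When mib is called from a script, the quoting of the SMILES string can be avoided by passing the arguments as a list. The following is a minimal Python sketch, assuming that Java is installed and that mib.jar is in the working directory; the helper names build_mib_command and run_mib are hypothetical and not part of mib.

```python
import subprocess


def build_mib_command(smiles, options=(), jar="mib.jar"):
    """Return the argument list for processing a single SMILES with mib."""
    # A list of arguments (no shell) means the SMILES needs no extra quoting.
    return ["java", "-jar", jar, "-smi", smiles, *options]


def run_mib(smiles, options=(), jar="mib.jar"):
    """Run mib and return its standard output as text.

    Assumption: Java is on the PATH and the jar file exists at the given location.
    """
    result = subprocess.run(
        build_mib_command(smiles, options, jar),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, build_mib_command('c1ncccc1', ['-standardize']) returns the argument list for the command java -jar mib.jar -smi 'c1ncccc1' -standardize.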
When no output options are provided, a list of canonical SMILES is generated and sent to the standard output.
To generate an SDfile, the parameter
-out sdf is required.
-nostereo stereo information in molecules will not be considered
-normalizeCharges atomic charges will be normalized when possible
-singlePart only the main (largest) part of a multipart molecule will be processed
-standardize is a shortcut for all three previous options together
-normalizeIsotopes removes all isotope labels from atoms
-isostandardize is a shortcut for -standardize plus -normalizeIsotopes
The mib package performs strict valence checking and discards molecules that violate organic valence rules (an error message is issued for each skipped molecule). The keyword
-kmwve (mnemonic for "keep molecules with valence errors") also allows processing of molecules with non-standard valences
The following properties are available on the output (in this order); items are tab separated:
logP octanol-water partition coefficient
PSA polar surface area
number of nonhydrogen atoms
number of hydrogen-bond acceptors (O and N atoms)
number of hydrogen-bond donors (OH and NH groups)
number of Rule of 5 violations
number of rotatable bonds
When using the keyword
-header in the SMILES output mode, the first line of output is a header with property names.
The mib property calculation engine is used in numerous instances by our industry customers and also powers our free online property calculation tool.
-formula the molecular formula is part of the output. This keyword may be used together with the
mib allows fragmentation of molecules into various types of fragments. Examples of the fragmentation options are given below. The parent molecule used for the fragmentation is the structure shown above.
-r1 - substituents (R-groups); all "breakable" non-ring single bonds are broken to generate substituents
-r2 - spacers (groups with 2 attachment points)
-ringSystems - ring systems; a ring system is a collection of fused or spiro rings
-simpleRings - simple rings contained in the molecule. A simple ring does not need to be a valid molecule (in the example below, only 2 atoms of the sulfur ring are aromatic), therefore the results are provided as fragment SMILES (note that aromatic bonds in fragments are displayed as dashed lines in the images below)
C1Sc:nN1 c:1:n:n:c:n:1 c:1:c:c:c:c:c:1 o:1:c:c:c:c:1
-scaffold - the ring part of a molecule (ring systems and their connections) without aliphatic substituents
-hose - generates so-called HOSE fragments (atoms with their environment). A HOSE fragment consists of a central atom (the first atom in the HOSE SMILES) and several levels of surrounding atoms.
HOSE fragments may be used as structural descriptors in QSAR studies or in fragment-based property prediction applications.
By default, the size of the r1 and r2 fragments is limited to 15 atoms. This may be changed by a parameter
-maxsize n (this parameter does not affect other types of fragments).
Generated fragments are written in the output line after input SMILES (in a canonized form) and any other parameters from the input. All items are tab separated.
-list provides fragmentation statistics for large collections of molecules. The output is a list of fragments, together with the number of molecules containing each fragment.
The following command provides a list of the substituents with up to 8 atoms that are most common in GPCR ligands.
java -jar mib.jar -f gpcr.smi -standardize -r1 -max 8 -list > gpcr.r1
The first lines of the output file are
[R]C 539
[R]c1ccccc1 213
[R]O 204
[R]OC 193
[R]N 191
[R]Cl 172
[R]F 133
[R]CC 83
[R]CCC 71
[R]C(O)=O 67
. . .
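A fragment list such as gpcr.r1 can be loaded for further analysis with a few lines of code. The following is a minimal Python sketch, assuming that each output line holds one fragment SMILES and its count, separated by a tab; the function name parse_fragment_list is hypothetical, and the sample uses the first three entries shown above.

```python
def parse_fragment_list(text):
    """Map each fragment SMILES to its count (assumes 'fragment<TAB>count' lines)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and anything that is not a fragment/count pair.
        if len(fields) != 2 or not fields[1].strip().isdigit():
            continue
        counts[fields[0]] = int(fields[1])
    return counts


# First three entries of the GPCR substituent list shown above.
sample = "[R]C\t539\n[R]c1ccccc1\t213\n[R]O\t204\n"
counts = parse_fragment_list(sample)
most_common = max(counts, key=counts.get)
print(most_common, counts[most_common])
```

A dictionary keeps the lookup of a particular substituent simple; a list of pairs would be the better choice if the order of the output file has to be preserved exactly.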
When using the parameter
-count, the number of fragments of each type is also provided, in the form fragment1 count1 fragment2 count2 (tab separated)
For example, the command
java -jar mib.jar -smi 'c1ncccc1' -hose -maxSize 1
produces the output
c1ccncc1 [n] [cH]
while with the additional parameter
-count the output also includes the number of the respective fragments in the molecule
c1ccncc1 [n] 1 [cH] 5
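An output line produced with -count can be split into the molecule SMILES and a table of fragment counts. The following is a minimal Python sketch, assuming tab-separated items in the order SMILES, other input data, then fragment/count pairs; the function name parse_count_line and its n_extra argument (the number of additional input items in the line) are hypothetical.

```python
def parse_count_line(line, n_extra=0):
    """Split a -count output line into (smiles, extra_items, fragment_counts)."""
    fields = line.rstrip("\r\n").split("\t")
    smiles = fields[0]
    # Items carried over from the input line follow the SMILES.
    extra = fields[1:1 + n_extra]
    rest = fields[1 + n_extra:]
    if len(rest) % 2 != 0:
        raise ValueError("expected fragment/count pairs after the input items")
    # Pair every fragment (even positions) with its count (odd positions).
    counts = {frag: int(num) for frag, num in zip(rest[0::2], rest[1::2])}
    return smiles, extra, counts


smiles, extra, counts = parse_count_line("c1ccncc1\t[n]\t1\t[cH]\t5")
print(smiles, counts)
```

For the pyridine example the counts add up to 6, the number of non-hydrogen atoms in the molecule, because with the setting used there each HOSE fragment is just a central atom.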
When using the parameter
-count together with the parameter
-list, the number on output is the total number of fragments of this type in the molecule set (and not just the number of molecules with these fragments as by
-maxsize sets the maximum size of generated fragments. It is ignored when generating rings and scaffolds, and applies only to -r1 and -r2 fragments and HOSE fragments (for HOSE fragments, -maxsize is the number of levels: 1 - just the central atom, 2 - a single level of neighbors, 3 - 2 levels of neighbors, etc.)
Fragmentation is applied only to the main part of multipart molecules (as if the keyword
-singlepart were used). [This may be modified in a later release of the toolkit].
-tautomers generates tautomers of the input molecule. The output line contains the canonical SMILES of the original molecule, followed by any data contained in the input line, then the number of generated tautomers and the SMILES of these tautomers. All items in the line are tab separated. For example,
java -jar mib.jar -smi 'n1c(O)cccc1' -tautomers
provides the following output
Oc1ccccn1 2 O=c1cccc[nH]1 Oc1ccccn1
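The tautomer output line can be unpacked in the same way as the other tab-separated outputs. The following is a minimal Python sketch, assuming the item order described above (canonical SMILES, data from the input line, number of tautomers, tautomer SMILES); the function name parse_tautomer_line and its n_extra argument are hypothetical.

```python
def parse_tautomer_line(line, n_extra=0):
    """Split a tautomer output line into (smiles, extra_items, tautomers)."""
    fields = line.rstrip("\r\n").split("\t")
    smiles = fields[0]
    # Data carried over from the input line, if any.
    extra = fields[1:1 + n_extra]
    n_tautomers = int(fields[1 + n_extra])
    tautomers = fields[2 + n_extra:]
    # The declared number must match the number of SMILES that follow it.
    if len(tautomers) != n_tautomers:
        raise ValueError("tautomer count does not match the number of SMILES")
    return smiles, extra, tautomers


smiles, extra, tautomers = parse_tautomer_line(
    "Oc1ccccn1\t2\tO=c1cccc[nH]1\tOc1ccccn1"
)
print(smiles, tautomers)
```

Checking the declared count against the number of SMILES is a cheap way to detect a wrong n_extra value or a truncated line.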
To get the tautomers of a single molecule with one tautomer SMILES per line (this output may be displayed, for example, by the Molinspiration molecule viewer), use the
java -jar mib.jar -smi 'Nc2nc1nc[nH]c1c(=S)[nH]2' -tautomers -list > tautomers.smi
saves the following output to the tautomers.smi file (the source SMILES in this example is thioguanine)
Nc2nc(=S)c1nc[nH]c1[nH]2 1
Sc1nc(=N)[nH]c2nc[nH]c12 1
Sc1nc(=N)[nH]c2[nH]cnc12 1
N=c2[nH]c(=S)c1nc[nH]c1[nH]2 1
Sc1[nH]c(=N)nc2nc[nH]c12 1
Nc2nc(S)c1nc[nH]c1n2 1
Nc2nc1ncnc1c(S)[nH]2 1
Sc1[nH]c(=N)nc2[nH]cnc12 1
Nc2nc(S)c1ncnc1[nH]2 1
Nc2nc(=S)c1[nH]cnc1[nH]2 1
Nc2nc(S)c1[nH]cnc1n2 1
Sc1[nH]c(=N)[nH]c2ncnc12 1
Nc2nc1[nH]cnc1c(=S)[nH]2 1
N=c2[nH]c(=S)c1[nH]cnc1[nH]2 1
Nc2nc1nc[nH]c1c(=S)[nH]2 1
The 15 tautomers generated are shown below
Some molecules may have a very large number of tautomers (several hundred). To keep the computation time reasonable, the number of generated tautomers is limited to 50 by default. This limit may be increased by the parameter
-maxtautomers n.
Tautomers are listed on the output in alphabetic order.
EZ stereochemistry on tautomeric bonds is not preserved during tautomer enumeration.
Please contact us at info[at]molinspiration.com to arrange an evaluation license for mib.