The miscreen engine is written in Java, and may therefore be used on
any platform where Java (version 1.4 or higher) is installed. Java is
currently supported on practically all platforms (Windows, Mac, Linux, Unix). The
latest version of Java may be downloaded for free from www.java.sun.com. (You
can find out which version of Java is installed on your machine with the command java -version.)
No other software is required to run miscreen.
The functions of the miscreen engine are available from the Windows command prompt or a UNIX command line.
1. Generation of fragments
To generate a bioactivity model, a training set of molecules which are known to be active on the given target needs to be available. These molecules may be hits from in-house screening, structures from commercial bioactivity databases, or molecules collected from the literature, patents or public sources such as PubChem or ChEMBL. To get the best results, the training set should be as large and diverse as possible. The method nevertheless provides reliable results with quite limited training sets; in the extreme case, even a single active molecule can be used as a starting point.
A set of inactive molecules is also required as a reference. In most cases this is simply the set of inactives from the HTS campaign. When only active molecules are available and no information about inactive molecules can be obtained (for example, when using active molecules from the literature or data from a competitor's patent), one can use as a reference a "background" set of average drug-like molecules (see the next step). In this case only fragments for the active molecules need to be generated.
Fragments are generated by the following commands:
java -jar miscreen.jar -fragment active_molecules > active.frag
java -jar miscreen.jar -fragment inactive_molecules > inactive.frag
Here active_molecules and inactive_molecules are files listing the active and inactive molecules used for training (you can, of course, use whatever file names you want). Each file is either a SMILES file, with one molecule per line and the SMILES separated by a tab from any additional data (which are ignored), or an MDL SDfile.
The generated fragments are stored in the files active.frag and inactive.frag (or whatever file names you choose).
Progress of processing will be shown on the screen. Fragmentation of a database with 10,000 molecules will require about 2 minutes.
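The SMILES input layout described above can be checked before fragmenting. The following is a minimal sketch, not part of miscreen: the function name check_smiles_file, the sample molecules and their identifiers are all hypothetical, and the check only verifies that every line has a non-empty first tab-separated field. It does not validate the SMILES strings themselves.

```shell
# Hypothetical helper (not part of miscreen): report lines whose first
# tab-separated field, where the SMILES is expected, is empty.
check_smiles_file() {
    awk -F '\t' '
        $1 == "" { printf "line %d: no SMILES in first column\n", NR; bad = 1 }
        END      { exit bad + 0 }
    ' "$1"
}

# Made-up training file: SMILES, a tab, then extra data that miscreen ignores.
printf 'CCO\tmol-001\tethanol\n'      >  active_molecules
printf 'c1ccccc1\tmol-002\tbenzene\n' >> active_molecules

check_smiles_file active_molecules && echo "active_molecules looks well formed"
```

The function returns a non-zero exit status when it finds a problem, so it can be placed in front of the -fragment command in a script.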
2. Development of bioactivity model
The model is developed by analysing the fragment files generated for the active and inactive molecules in the previous step and comparing the distribution of fragments in the two sets.
Use the command:
java -jar miscreen.jar -createmodel -af active.frag -if inactive.frag > project.model
When no information about inactive molecules is available, one can use instead a set of fragments from a large representative collection of "average drug-like molecules", which may be obtained from Molinspiration. Sometimes, especially when the training set of inactive molecules is limited, better results are obtained by using these average drug-like fragments as a reference.
Ready-to-use models for identifying GPCR ligands, ion channel modulators, kinase inhibitors, nuclear receptor ligands, protease inhibitors and inhibitors of other enzymes may be obtained from Molinspiration.
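Steps 1 and 2 can be chained in a small shell function. This is only a sketch: the function name build_model and the MISCREEN variable are hypothetical conveniences, while the options -fragment, -createmodel, -af and -if are the ones documented above.

```shell
# Hypothetical wrapper, not part of miscreen. MISCREEN holds the launcher
# command so that it can be replaced, for example for a dry run.
MISCREEN=${MISCREEN:-"java -jar miscreen.jar"}

build_model() {
    # $1 = file with active molecules, $2 = file with inactive molecules,
    # $3 = name of the model file to write
    $MISCREEN -fragment "$1" > active.frag &&
    $MISCREEN -fragment "$2" > inactive.frag &&
    $MISCREEN -createmodel -af active.frag -if inactive.frag > "$3"
}

# Example call (requires miscreen.jar in the working directory):
# build_model active_molecules inactive_molecules project.model
```

Because the three commands are joined with &&, the model is only built when both fragmentation runs finish without an error.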
3. Actual virtual screening
Once a bioactivity model has been generated, the actual virtual screening may be performed with the command:
java -jar miscreen.jar -model model_file -screen file_to_screen [-minscore x] > results
where
model_file is the file with the model generated in step 2
file_to_screen is a file with the molecules to be screened, either in SMILES format (SMILES as the first item, tab separated from the rest of the data, such as a molecule identifier) or as an SDfile.
When the option -minscore is used (for example -minscore 0.3), only molecules with an activity score greater than the specified value are sent to the output.
Results are saved into the file results. If file_to_screen was in SMILES format, each line has the form SMILES, activity_score, additional data, tab separated.
If the screened file was submitted in SDfile format, the calculated bioactivity scores are added to the data section of the individual molecules.
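For SMILES output, the hits can be ranked with standard UNIX tools. The sketch below assumes the layout described above, with the activity score in the second tab-separated column. The function name top_hits and the sample lines, including the scores, are made up for illustration.

```shell
# Hypothetical helper (not part of miscreen): print the N best-scoring lines
# of a tab-separated results file whose second column is the activity score.
top_hits() {
    # $1 = results file, $2 = number of hits to print
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr "$1" | head -n "$2"
}

# Made-up results, only to show the assumed layout: SMILES, score, identifier.
printf 'CCO\t-0.85\tmol-001\n'            >  results
printf 'c1ccccc1O\t0.12\tmol-002\n'       >> results
printf 'CC(=O)Nc1ccccc1\t0.47\tmol-003\n' >> results

top_hits results 2
```

LC_ALL=C is set so that the decimal point in the scores is interpreted the same way regardless of the local language settings.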
-kmwnv
This is a mnemonic for 'keep molecules with nonstandard valences'. Molinspiration software is quite "picky" about correct valences, so "exotic" molecules with nonstandard valences are rejected. To process such molecules as well, you may use the -kmwnv option. Do not use this option blindly, however; be aware of what you are doing.
-multiscreen
It is possible to calculate activity scores for all six ready-to-use target classes in one run. Six numbers are written to the output: GPCR score, IC score, KI score, NR score, PR score and EZ score (in this order).
Use the command:
java -jar miscreen.jar -multiscreensmi 'SMILES'
or
java -jar miscreen.jar -multiscreen smilesFile
All model files (gpcr.model, ki.model ...) must be in the working directory.
When building a model for very large data sets (hundreds of thousands of molecules), Java may run out of memory and report an OutOfMemoryError. In this case start Java with more memory by using the -mx option on the command line (the standard form of this option on current Java versions is -Xmx), for example
java -jar -mx1000m miscreen.jar parameters
(details depend on your computer system, consult your local Java expert).
Do not edit data files generated by miscreen by hand; the program relies on the specific format of the data.
You may wish to test an interactive calculation of activity scores for several important drug classes available here (choose option [Predict Bioactivity]).
Do not hesitate to contact Molinspiration if you have additional questions or comments or in case you wish to test the miscreen package.
We wish you a lot of active hits identified by miscreen!