miscreen - Molinspiration Fragment-based Virtual Screening Engine v2016.02

The Molinspiration miscreen engine allows fast prediction of biological activity: virtual screening of large collections of molecules and selection of the molecules with the highest probability of showing biological activity. The screening is based on identification of fragments or substructure features typical of active molecules. No information about the 3D structure of the receptor is necessary; a set of active molecules (encoded as SMILES or an SDfile) is sufficient for training. The procedure may therefore also be applied in the early project stage, when detailed information about the binding mode is not yet available.

Molinspiration virtual screening is fast (100,000 molecules may be screened in about 30 minutes) and therefore allows processing of very large molecular libraries. Validation tests performed by our company, as well as results of our customers on various target classes (including kinase inhibitors, various GPCR targets, different enzymes, pesticides and others), show a 10- to 20-fold increase in hit rate in comparison with random selection of molecules for screening. Another advantage of the Molinspiration fragment-based screening procedure is its ability to identify novel active scaffolds that are not present in the training set (so-called "scaffold hopping"). See here for more details about the Molinspiration virtual screening protocol.

The miscreen engine is written in Java and may therefore be used on any platform where Java (version 1.4 or higher) is installed. Java is currently supported on practically all platforms (Windows, Mac, Linux, Unix). The latest version of Java may be downloaded for free from www.java.sun.com. You can find out which version of Java is installed on your machine with the command java -version. No other software is required to run miscreen.
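The version check can also be scripted. The following is a minimal Python sketch, assuming the usual java -version banner (a line such as java version "1.8.0_201" or openjdk version "11.0.2"). The function names are illustrative and are not part of miscreen.

```python
import re

def parse_java_version(banner):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None

def is_java_14_or_higher(version):
    """Return True if the version string denotes Java 1.4 or newer.

    Older releases are numbered 1.x (for example 1.4, 1.8); newer ones
    use the major number directly (9, 11, 17, ...).
    """
    parts = re.split(r"[._\-+]", version)
    major = int(parts[0])
    if major > 1:
        return True
    minor = int(parts[1]) if len(parts) > 1 else 0
    return minor >= 4

# Usage sketch (assumes `java` is on the PATH; the banner goes to stderr):
#   import subprocess
#   banner = subprocess.run(["java", "-version"],
#                           capture_output=True, text=True).stderr
#   print(is_java_14_or_higher(parse_java_version(banner)))
```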

Functions of the miscreen engine are available from the Windows command prompt or the UNIX command line.

Using miscreen

To run a virtual screening, three steps need to be performed:
  1. generation of fragments from a training set containing active and inactive molecules
  2. development of a bioactivity model (calculation of fragment activity contributions) from fragment files generated in step 1
  3. actual virtual screening by using the model generated in step 2
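
The three steps, each detailed in the sections below, can be chained from a small driver script. The following Python sketch is only an illustration and is not the screen.py script distributed by Molinspiration. It assumes that miscreen.jar is in the working directory, and the intermediate file names (active.frag, inactive.frag, project.model, results) are the ones used as examples in this document. The function names are hypothetical.

```python
import subprocess

JAR = "miscreen.jar"  # assumption: the jar sits in the working directory

def build_pipeline(actives, inactives, library, min_score=None):
    """Return (argv, output_file) pairs for the miscreen steps, in order."""
    screen = ["java", "-jar", JAR, "-model", "project.model", "-screen", library]
    if min_score is not None:
        screen += ["-minscore", str(min_score)]
    return [
        # Step 1: fragments for the active and inactive training sets
        (["java", "-jar", JAR, "-fragment", actives], "active.frag"),
        (["java", "-jar", JAR, "-fragment", inactives], "inactive.frag"),
        # Step 2: bioactivity model from the two fragment files
        (["java", "-jar", JAR, "-createmodel", "-af", "active.frag",
          "-if", "inactive.frag"], "project.model"),
        # Step 3: the actual screening
        (screen, "results"),
    ]

def run_pipeline(steps):
    """Run each step in order, redirecting its standard output to a file."""
    for argv, output_file in steps:
        with open(output_file, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Building the command lists separately from running them makes it easy to print and inspect the commands before anything is executed; with check=True the run stops at the first step that fails.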

1. Generation of fragments

To generate a bioactivity model, a training set of molecules which are known to be active on the given target needs to be available. These molecules may be hits from in-house screening, structures from commercial bioactivity databases, or molecules collected from literature, patents or public sources such as PubChem or ChEMBL. To get the best results, the training set should, of course, be as large and diverse as possible. The method does, however, also provide reliable results with quite limited training sets; in the extreme case one can use even a single active molecule as a starting point.

A set of inactive molecules is also required as a reference. In most cases this is simply the set of inactives from the HTS campaign. When only active molecules are available and no information about inactive molecules can be obtained (for example when using information about active molecules from literature, or data from a competitor patent), one can use a "background" set of average drug-like molecules as a reference (see the next step). In this case only fragments for the active molecules need to be generated.

Fragments are generated by the following commands:

java -jar miscreen.jar -fragment active_molecules > active.frag

java -jar miscreen.jar -fragment inactive_molecules > inactive.frag

active_molecules and inactive_molecules are files listing the active and inactive molecules used for the training (of course, you can use whatever file names you want). The molecules are encoded either as SMILES or as an MDL SDfile. In a SMILES file there is one molecule per line, with the SMILES tab separated from the rest of the data; these additional data are ignored.
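
A SMILES input file of this kind can be prepared with a few lines of Python. This is a minimal sketch of the layout described above (SMILES first, any further data tab separated); the helper names are illustrative and not part of miscreen.

```python
def format_smiles_line(smiles, *extra):
    """Build one input line: SMILES first, then optional tab-separated data."""
    if not smiles or any(ch.isspace() for ch in smiles):
        raise ValueError("SMILES must be non-empty and contain no whitespace")
    return "\t".join([smiles] + [str(item) for item in extra])

def smiles_column(lines):
    """Return the SMILES (first tab-separated field) of each non-empty line."""
    return [line.rstrip("\n").split("\t")[0] for line in lines if line.strip()]

# Usage sketch: write a two-molecule file (names are arbitrary identifiers).
#   with open("active_molecules", "w") as out:
#       out.write(format_smiles_line("CCO", "ethanol") + "\n")
#       out.write(format_smiles_line("c1ccccc1", "benzene") + "\n")
```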

The generated fragments are stored in the files active.frag and inactive.frag (or whatever file names you choose).

Progress of processing will be shown on the screen. Fragmentation of a database with 10,000 molecules will require about 2 minutes.

2. Development of bioactivity model

The model is developed by analysing the fragment files generated for active and inactive molecules in the previous step and comparing the distribution of fragments in these two sets.
Use the command:

java -jar miscreen.jar -createmodel -af active.frag -if inactive.frag > project.model

When no information about inactive molecules is available, one can instead use a set of fragments from a large representative collection of "average drug-like molecules", which may be obtained from Molinspiration. Sometimes, especially when the training set of inactive molecules is limited, better results are obtained by using these average drug-like fragments as a reference.
Ready-to-use models for identifying GPCR ligands, ion channel modulators, kinase inhibitors, nuclear receptor ligands, protease inhibitors and inhibitors of other enzymes may also be obtained from Molinspiration.

3. Actual virtual screening

Once a bioactivity model has been generated, the actual virtual screening may be performed with the command:

java -jar miscreen.jar -model model_file -screen file_to_screen [-minscore x] > results


model_file is a file with the model generated in step 2

file_to_screen is a file with the molecules to be screened, either in SMILES format (the SMILES as the first item, tab separated from the rest of the data, such as a molecule identifier) or as an SDfile.

When the option -minscore is used (for example -minscore 0.3), only molecules with an activity score greater than the specified value will be sent to the output.

Results will be saved in the file results. If file_to_screen was in SMILES format, each molecule is written in the form SMILES, activity_score, additional data (tab separated). If the screened file was submitted in SDfile format, the calculated bioactivity scores are added to the data section of the individual molecules.
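
For SMILES output, the results file can be post-processed with a short script, for instance to rank molecules by score. The following Python sketch assumes only the tab-separated layout described above (SMILES, activity score, then any additional data); the function names are illustrative and not part of miscreen.

```python
def parse_results(lines):
    """Parse result lines into (smiles, score, extra_fields) tuples."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        try:
            score = float(fields[1])
        except ValueError:
            continue  # skip lines whose second field is not a number
        hits.append((fields[0], score, fields[2:]))
    return hits

def top_hits(hits, n):
    """Return the n highest-scoring molecules, best first."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]

# Usage sketch:
#   with open("results") as handle:
#       for smiles, score, extra in top_hits(parse_results(handle), 100):
#           print(smiles, score, *extra, sep="\t")
```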

Model generation options

An option introduced in miscreen 2007.04 is:
-kmwnv is a mnemonic for 'keep molecules with nonstandard valences'. Molinspiration software is quite "picky" about correct valences, so "exotic" molecules with nonstandard valences are rejected. To process such molecules as well, you may use the -kmwnv option. Do not use this option blindly, however; be aware of what you are doing.

Additional hints

At Molinspiration we have developed a set of standard models for calculating activity scores against the most important drug targets: GPCR ligands, ion channel modulators, kinase inhibitors, nuclear receptor ligands, protease inhibitors and inhibitors of other enzymes (the same models as used in our interactive bioactivity prediction service). These models are based on the Molinspiration in-house library of several thousand active ligands. Molinspiration customers may obtain these model score files at no cost. With the option -multiscreen it is possible to calculate activity scores for all these targets in one run. Six numbers are provided on the output: GPCR score, IC score, KI score, NR score, PR score and EZ score (in this order).
When using the -multiscreen command

java -jar miscreen.jar -multiscreensmi 'SMILES'
java -jar miscreen.jar -multiscreen smilesFile

all model files (gpcr.model, ki.model ...) must be in the working directory.
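
Because the six scores are identified only by their position, it can be convenient to attach the target names in a script. The Python sketch below assumes that the six numbers have already been extracted from an output line; the exact layout of the -multiscreen output is not described here, so no parsing of it is attempted. The names used are illustrative.

```python
# Order of the scores as documented: GPCR, IC, KI, NR, PR, EZ
SCORE_NAMES = ("GPCR", "IC", "KI", "NR", "PR", "EZ")

def label_scores(values):
    """Map six score values (numbers or numeric strings) to their target names."""
    if len(values) != len(SCORE_NAMES):
        raise ValueError("expected %d scores, got %d"
                         % (len(SCORE_NAMES), len(values)))
    return dict(zip(SCORE_NAMES, (float(v) for v in values)))

def best_target(scores):
    """Return the name of the target class with the highest score."""
    return max(scores, key=scores.get)
```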

In some cases, when building a model for very large data sets (hundreds of thousands of molecules), an OutOfMemoryError is issued. In this case start Java with more memory by using the -mx option on the command line, for example

java -jar -mx1000m miscreen.jar parameters

(details depend on your computer system, consult your local Java expert).

Do not edit data files generated by miscreen by hand; the program relies on the specific format of the data.

You can also download a simple Python script, screen.py, which automates the screening process. You only have to supply a set of active and inactive molecules (or possibly a set of reference inactive fragments) and the whole screening is run automatically.

You may wish to test an interactive calculation of activity scores for several important drug classes available here (choose option [Predict Bioactivity]).

Do not hesitate to contact Molinspiration if you have additional questions or comments or in case you wish to test the miscreen package.

We wish you a lot of active hits identified by miscreen!