====== Chemistry Informatics ======

===== Software to draw chemical structures =====

  * [[https://sourceforge.net/apps/mediawiki/cdk/index.php?title=JChemPaint|JChemPaint]]
  * [[http://www.chemaxon.com/products/marvin/marvinview/|MarvinView]] ([[http://www.chemaxon.com/marvin/sketch/index.php|live demo]])
  * [[http://www.acdlabs.com/resources/freeware/chemsketch/|ACD ChemSketch]]

Plugins:
  * [[https://sourceforge.net/apps/mediawiki/oochemistry/index.php?title=OpenOffice.org_Chemistry|OOChemistry]] -- JChemPaint plugin for OpenOffice that allows chemical structures to be embedded in ODF documents.
  * [[http://chem4word.codeplex.com/|Chem4Word]] -- chemistry add-in for Microsoft Word that allows tagging chemical entities and switching their representation (2D structure, common name, formula, ...).

===== Cheminformatics toolkits =====

  * [[http://www.eyesopen.com/oechem-tk|OEChem]]:
    * Facile management of molecules, atoms, bonds, and conformers
    * Conformational and frame-of-reference coordinate transformations
    * Maximum common substructure and exact substructure searching
    * Extremely fast 2D similarity using LINGOs
    * Perception of aromaticity with multiple models
    * Chemical reaction parsing and processing
    * Tetrahedral and E/Z stereochemistry recognition
    * Ring perception and Kekulization
    * Molecular normalization and canonicalization
    * Multiconformer molecule handling
    * Support for residues and bases
    * Ability to store and recall generic primitives or user-defined objects on molecules, atoms, bonds, or conformers
    * Multiple file format handling, with robust reading and specification-compliant writing of [[wp>SMILES]], SLN, [[wp>Structure data file#SDF|SDF]], [[wp>MDL Molfile|MOL]], MOL2, [[wp>PDB]], [[wp>FASTA]], [[wp>MOPAC]], MacroModel, XYZ, CCP4, XPLOR, and OEBinary
    * Available in C++, Python & Java
  * [[http://www.ra.cs.uni-tuebingen.de/software/joelib/|JOELib]]:
    * Graph-based structure to modify molecular structures
    * Classes for getting the aromatic flags for atoms and bonds
    * Classes for getting the hybridisation of atoms
    * Descriptor calculation classes (I-State, E-State, Burden, ...)
    * **SM**iles **AR**bitrary **T**arget **S**pecification ([[wp>Smiles arbitrary target specification|SMARTS]]) substructure search
    * Base classes for reading and writing molecular file formats
    * Support for [[wp>SMILES]], Chemical Markup Language ([[wp>Chemical Markup Language|CML]]), [[http://www.xemistry.com/products.htm|CACTVS]]'s clear text format (CTX), POVRay export (including aromatic rings)
    * Atom and bond property classes (including import and export filters)
    * Processes / external processes and process decision filters
    * Regression module using neural networks ([[http://www.ra.cs.uni-tuebingen.de/software/JavaNNS/welcome.html|JavaNNS]])
    * Regression module using support vector machines
    * JOELib-Matlab connection, e.g. for feature extraction
    * External processing modules for 3D structure generation with [[http://www2.chemie.uni-erlangen.de/software/corina/|Corina]] and descriptor calculation with [[http://www2.ccc.uni-erlangen.de/software/petra/intro.phtml|Petra]] (especially atom and bond property descriptors)
    * Database module checking for duplicate molecules
    * Available in Java
  * [[http://cdk.sourceforge.net/|CDK]]:
    * 2D rendering (see also the [[http://sourceforge.net/apps/mediawiki/cdk/index.php?title=Renderer_Tutorial|Renderer Tutorial]], the [[http://chem-bla-ics.blogspot.com/search?q=%22cdk-jchempaint%22|Chemblaics Blog]] by Egon Willighagen, and [[http://sourceforge.net/mailarchive/message.php?msg_id=24861290|this]] and [[http://sourceforge.net/mailarchive/message.php?msg_id=24397529|this]] mailing list post for more examples)
      * JChemPaint 2D diagram editor
      * Structure Diagram Layout
    * 3D rendering
      * integration with [[http://www.jmol.org/|Jmol]]
    * Input/Output
      * [[wp>Chemical Markup Language|CML]], [[wp>SMILES]] parsing/generation, [[wp>MDL Molfile]] support (limited), [[wp>InChI]] (via JNI bridge), readers for XYZ, ShelX, HIN, GhemicalMM, Mol2
      * interface to [[http://openbabel.sf.net/|OpenBabel]] (via command line)
      * rule-based [[wp>IUPAC]] name parser
    * Virtual screening
      * molecular, atomic and bond descriptors
        * LogP, TPSA, Rule-of-Five, and many more
        * Gasteiger-Marsili charges (sigma and pi)
      * interface to R and Weka for modelling
      * path-based fingerprinter
    * Modelling
      * 3D model builder
      * atom typing
        * MM2, [[wp>MMFF94]], CDK-internal
      * [[wp>MMFF94]] force field
      * Kabsch alignment
    * Chemical graphs
      * isomorphism detection
      * maximal common substructure search
      * substructure searching ([[wp>Smiles arbitrary target specification|SMARTS]]-like)
      * ring searches (Smallest Set of Smallest Rings (SSSR), all rings)
    * Properties
      * [[wp>Nuclear magnetic resonance|NMR]] prediction
    * Structure generation
      * deterministic generator
      * stochastic generators (based on genetic algorithms and simulated annealing)
    * [[http://biojava.org/|BioJava]] interface
    * Protein structures
      * [[wp>Protein Data Bank (file format)|PDB]] reading
      * active site detection
      * sequence to connectivity table
    * [[http://cdk-taverna-2.ts-concepts.de/wiki/index.php?title=Main_Page|CDK-Taverna Project]]
  * [[http://chem-bla-ics.blogspot.com/2010/12/blog-post.html|Commercial and proprietary cheminformatics tools]]
  * A task-oriented comparison of multiple cheminformatics toolkits: [[http://www.dalkescientific.com/writings/diary/archive/2004/01/03/available_toolkits.html|Chemical Informatics Toolkits]] by [[dalke@dalkescientific.com|Andrew Dalke]]
    * [[http://www.dalkescientific.com/writings/Python-EuroQSAR2008.pdf|Python for Computational Chemistry]]
    * [[http://www.dalkescientific.com/writings/EuroKNIME.pdf|Dataflow vs. Scripting Languages]]
    * [[http://www.dalkescientific.com/writings/I590-OEChem.pdf|Python and Chemical Informatics]]
  * [[https://ambit.uni-plovdiv.bg:8443/ambit2/depict?search=InChI=1S/CH3/h1H3|Visualization of methyl by various libraries]]

===== Bio Events =====

  * [[http://www.iscb.org/iscb-conferences|International Conference on Intelligent Systems for Molecular Biology (ISMB)]]
  * [[http://www.bio-itworldexpo.com/|BioIT World Conference & Expo]]
  * [[http://convention.bio.org/events.aspx|Bio International Convention]]
  * [[http://www.biotnet.org/courses-events|Courses & Events on Bioinformatics Training Network]]
  * [[http://bio.org/events/|Conferences & Events on bio.org]]
  * [[http://www.chemistry-conferences.com/calendar.htm|Chemistry Conferences WorldWide Calendar]]

===== Learning material =====

  * [[http://cshl.edu/public/educat.html|Cold Spring Harbor Laboratory]]
  * [[http://citeseer.ist.psu.edu/|Scientific Literature Digital Library]]
  * [[http://ees.elsevier.com/jocs/|Journal of Computational Science]]
  * [[http://www.ncsb.nl/e-courses|Basic Introduction to Systems Biology – Online Course]]
  * [[http://www3.open.ac.uk/study/undergraduate/science/chemistry/index.htm|The Open University – Chemistry]]

===== Blogs and forums =====

  * [[http://cactus.nci.nih.gov/blog/|CADD Group Chemoinformatics Tools and User Services blog]]
  * [[http://opensource.cheme.info/|Open Source Chemical Engineering Software Forum]]
  * [[http://biostar.stackexchange.com/|A question and answer site for bioinformatics]]
  * [[http://blueobelisk.shapado.com/questions/tags/cheminformatics|Blue Obelisk Exchange]] -- the place to ask about the use and development of Open Data, Open Source, and Open Standards
  * [[http://habrahabr.ru/hub/biotech/|Biotechnology on Habrahabr]] (in Russian)
  * [[http://habrahabr.ru/hub/bioinformatics/|Bioinformatics on Habrahabr]] (in Russian)
  * [[http://www.jcheminf.com/|Journal of Cheminformatics]]

===== Search engines and databases =====

  * [[http://www.ncbi.nlm.nih.gov/pmc/|PubMed Central]] and [[http://www.gopubmed.org/web/gopubmed/|GoPubMed]]
  * [[http://scholar.google.com/|Google Scholar]], [[http://www.google.com/patents|Google Patents]](([[http://googlepublicpolicy.blogspot.com/2010/06/free-download-10-terabytes-of-patents.html|USPTO and Google made 10 TB of patent information available]]))
  * [[http://biobar.mozdev.org/index.html|biobar]] -- Firefox add-on that allows biologists to browse and retrieve data from [[http://biobar.mozdev.org/Databases.html|many databases]]
  * [[http://reflect.ws/|Reflect]] -- highlights protein and small-molecule names; a similar tool:
    * [[http://enhancer.nanopublication.org/linker/|Concept Web Knowledge Enhancer]] -- highlights concepts for search ([[http://wikiprofessional.org/wikifier/index.php|another link]])
  * [[http://bioportal.bioontology.org/|NCBO Bioportal]] -- ontologies used in biomedical communities
  * [[http://www.biocatalogue.org/|The BioCatalogue]] -- a curated catalogue of Life Science Web Services
  * [[http://www.google.com/Top/Science/Chemistry/Chemical_Databases/|Google DB Directory]]
  * [[http://www.slideshare.net/AntonyWilliams/online-public-compound-databases|Online Public Compound Databases]]

==== Chemical compound search ====

[[wp>Category:Chemical databases|Chemical databases]]:
  * [[http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap_help.html|PubChem]]
    * [[http://www.pharmaceuticalonline.com/article.mvc/IBM-Contributes-Data-To-The-National-0001|IBM Contributes Data To The National Institutes Of Health To Speed Drug Discovery And Cancer Research Innovation]] (see also [[youtube>0-C1ZEBK4ig|IBM BAO's strategic IP insight platform (SIIP)]])
  * [[http://www.chemspider.com/Search.asmx|ChemSpider]]
  * [[http://chembank.broadinstitute.org/webServices.htm|ChemBank]]
    * [[http://chembank.broadinstitute.org/chemistry/search/input/similarity.htm|online]]
  * [[http://chembl.blogspot.co.uk/2013/12/surechembl-chemical-structure.html|SureChEMBL]] -- chemical structure information in patents
  * [[http://chem.sis.nlm.nih.gov/chemidplus/|ChemIDplus]]
  * [[http://drugbank.ca/search/chemquery?type=structure|DrugBank]]
  * [[http://www.emolecules.com/|eMolecules]]
  * [[wp>Beilstein database]] [[https://www.reaxys.com/reaxys/WebHelp/Reaxys_Help.htm#All_Files/Query_Page.htm|query]]
  * [[http://cactus.nci.nih.gov/chemical/structure/documentation|Chemical Identifier Resolver]]
    * {{chemical_identifier_resolver_-_indexing_and_analysis_of_available_chemistry_space.ppt|Chemical Identifier Resolver: Indexing and Analysis of Available Chemistry Space (Markus Sitzmann, Wolf-Dietrich Ihlenfeldt, Marc C. Nicklaus) [2005]}}
    * [[http://chem-bla-ics.blogspot.com/2012/04/lordags-goodies-1-chemical-identifier.html|Chemical Identifier Resolver plugin for Bioclipse]]
  * [[http://logic.pdmi.ras.ru/csclub/node/1080|Substructure search of chemical compounds in databases (Mikhail Rybalkin)]] (in Russian)

===== Text extraction and analysis =====

  * [[http://osra.sourceforge.net|OSRA]] -- a utility designed to convert graphical representations of chemical structures into machine-readable formats such as SMILES or SDF
  * [[http://ggasoftware.com/opensource/imago|Imago OCR]] -- a toolkit for 2D chemical structure image recognition
    * [[habrahabr>172651|Building an optical recognition system for structural information, using Imago OCR as an example]] (in Russian)
  * [[https://digitalresearchtools.pbworks.com/w/page/17801708/Text%20Analysis%20Tools|Text Analysis Tools]]
    * [[http://incubator.apache.org/uima/|UIMA]]
    * [[http://minorthird.sourceforge.net/|MinorThird]]
  * [[http://www.nextmovesoftware.com/products/CaffeineFix.html|CaffeineFix]]
    * {{improved_chemical_text_mining_of_patents_using_infinite_dictionaries_translation_and_automatic_spelling_correction.pdf|Improved chemical text mining of patents using infinite dictionaries, translation and automatic spelling correction (Roger A Sayle, Plamen Petrov, Jon Winter and Sorel Muresan)}} ([[http://www.jcheminf.com/content/3/S1/O16|online]])
    * {{preserving_nuance_in_chemical_nomenclature_translation.pdf|Preserving Nuance in Chemical Nomenclature Translation (Roger Sayle)}}
    * [[http://nextmovesoftware.com/blog/2015/04/27/chemistry-enabling-chinese-japanese-and-korean-patents/|Chemistry enabling Chinese, Japanese and Korean patents]]
  * [[http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3|Oscar3]]
    * {{semantic_analysis_of_chemical_patents.pdf|Semantic Analysis of Chemical Patents (David Jessop, Peter Murray-Rust, Lezan Hawizy) [2010]}} ([[http://egonw.github.com/acsrdf2010/pdfs/davidJessopAcs.pdf|online]])
    * [[dropbox>ybs8s1v2omm0ead/high_throughput_identification_of_chemistry_in_life_science_texts.pdf|High-Throughput Identification of Chemistry in Life Science Texts (Peter Corbett and Peter Murray-Rust) [2006]]] ([[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.2599|online]])
    * [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.145.7875|Annotation of chemical named entities (Peter Corbett, Colin Batchelor, Simone Teufel) [2007]]]
    * [[http://chem-bla-ics.blogspot.com/2011/09/almost-year-ago-i-started-position-with.html|Text mining in Bioclipse with Oscar4]]
  * [[http://www.jcheminf.com/content/pdf/1758-2946-7-S1-S1.pdf|CHEMDNER: The drugs and chemical names extraction challenge (Martin Krallinger, Florian Leitner, Obdulia Rabal, Miguel Vazquez, Julen Oyarzabal, Alfonso Valen) [2015]]]
    * [[http://www.biocreative.org/tasks/biocreative-iv/chemdner-task-2-training-set/|CHEMDNER (Task 2) Training Set]]
  * [[http://www.bionlp-st.org/|The BioNLP Shared Task series]] represents a community-wide trend in text mining for biology toward fine-grained information extraction (IE).
  * {{identification_of_chemical_entities_in_patent_documents.pdf|Identification of Chemical Entities in Patent Documents (Tiago Grego, Piotr Pęzik, Francisco M. Couto, and Dietrich Rebholz-Schuhmann) [2009]}} ([[http://www.springerlink.com/content/u62434887j60653w/|online]])
  * {{towards_in-house_searching_of_markush_structures_from_patents.pdf|Towards in-house searching of Markush structures from patents (John Barnard, Matthew Wright) [2009]}} ([[http://www.sciencedirect.com/science/article/pii/S0172219008001385|online]])
  * [[http://www.crowdanalytix.com/contests/data-mining-accelerating-drug-discovery-by-text-mining-of-patents/|Data mining: Accelerating Drug Discovery by Text Mining of Patents]]
  * [[https://www.surechembl.org/search/|Example of chemical extraction]] from [[http://www.surechem.org/|SureChem]]
  * [[http://www.chemicalize.org/?url=http%3A%2F%2Fv3.espacenet.com%2FpublicationDetails%2Fdescription%3Bjsessionid%3D127C4978594925DCF1D135A0DE56CB43.espacenet_levelx_prod_4%3FCC%3DEP%26NR%3D0930075A1%26KC%3DA1%26FT%3DD%26date%3D19990721%26DB%3D%26locale%3D#cID311225|Chemicalize project in action]] by [[http://www.chemaxon.com/|ChemAxon]]
    * [[http://www.chemicalize.org/structure/#!mol=Penicillin+V&source=fp|Example for Penicillin V (phenoxymethylpenicillin)]]

===== Other =====

  * [[livejournal>simulacrumtv/128261|The path to creating any new drug and the role of chemical compound search engines]] (in Russian)
  * [[http://www.jstatsoft.org/v18/i05|Chemical Informatics Functionality in R (Rajarshi Guha) [2007]]] -- describes the rcdk package, which gives R users access to the CDK
  * [[http://pubs.acs.org/doi/abs/10.1021/ci8002123|Polymer Markup Language (PML). Chemical Markup, XML and the World-Wide Web. [2008]]] by Nico Adams, Jerry Winter, Peter Murray-Rust and Henry S. Rzepa (DOI: 10.1021/ci8002123)
  * [[SourceForgeMailThread>4CE25DB0.7050405%40innoq.com&forum_name=cdk-jchempaint|Drawing Polymers in JChemPaint]]
  * [[SourceForgeMailThread>4B82FEB8.9030008%40gmx.de&forum_name=cdk-jchempaint|Edit reaction with JChemPaintPanel]]
  * [[http://onlinelibrary.wiley.com/doi/10.1002/wcms.36/full|Representation of chemical structures]] (Wendy A. Warr) [2011]
  * [[http://cisrg.shef.ac.uk/shef2010/talks/52.pdf|Canonical Line Notations: InChI vs SMILES]] (Krisztina Boda) [2010]
  * [[http://wwmm.ch.cam.ac.uk/inchifaq/#What%20Can%20InChI%20Currently%20Not%20Represent?|Unofficial InChI FAQ: What Can InChI Currently Not Represent?]]
  * [[http://www.iupac.org/inchi/release102final.html|What is Std InChI?]]
  * [[http://cactus.nci.nih.gov/blog/?p=571|Partial Standard InChIKey Lookup]]
  * [[https://www.chemaxon.com/library/scientific-presentations/markush-search/representation-of-markush-structures/|Representation of Markush structures]] (Szabolcs Csepregi) [2010]
  * [[http://cactus.nci.nih.gov/blog/?tag=name-to-structure-conversion|Chemical name resolving]]
  * [[livejournal>progenes/111963|What is a gene?]] (in Russian)
  * [[http://www.nytimes.com/2013/06/14/business/after-dna-patent-ruling-availability-of-genetic-tests-could-broaden.html|Which Genes Can Be Patented?]] (the ruling held that isolated DNA cannot be patented, but that synthetic DNA created in the laboratory (complementary DNA, or cDNA) can be protected under patent law)
  * [[http://nebc.nerc.ac.uk/tools/bio-linux|Bio-Linux]] for bioinformatics workstations
  * [[http://www.bioclipse.net|Bioclipse]] -- a Rich Client for the Life Sciences
    * Features:
      * Management, analysis, and visualization of chemical structures and related information
      * Management and analysis of biological sequences (DNA, RNA, and proteins)
      * Pharmacological research and drug discovery
      * Data analysis engine based on the statistical language R
    * [[http://www.bioclipse.net/dl/eclipseCon2010poster.pdf|Poster for EclipseCon]]
    * [[http://chem-bla-ics.blogspot.com/2010/01/semantic-web-features-in-bioclipse-22.html|Semantic Web features in Bioclipse 2.2]]
  * [[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001126|Ten Simple Rules for Providing a Scientific Web Resource]]

{{tag>chemistry bioinformatics eclipse}}