Awesome-GNN-based-drug-discovery

Graph Machine Learning Resources for Drug Discovery

A curated collection of databases, benchmarks, and tools for graph machine learning in drug discovery. The entries below cover molecular and bioactivity datasets, biomedical knowledge graphs, protein structure and binding data, and gene expression resources that can be used to build and evaluate GNN models.

Open Graph Benchmark (OGB)

Link: OGB
Description: A collection of benchmark datasets, data loaders, and evaluators for graph machine learning. It includes specific datasets for drug discovery tasks like molecular property prediction and protein-protein interaction networks. The OGB datasets are particularly designed to standardize and facilitate graph research, providing diverse, large-scale, and challenging datasets that are useful for developing and benchmarking GNN models in drug discovery.
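
For molecular property prediction, each molecule is treated as a graph with atoms as nodes and bonds as edges. The following is a minimal, library-free sketch of that idea, not OGB's own loader or API: the ethanol example, the use of atomic numbers as node features, and the function names `build_adjacency` and `message_passing_step` are all illustrative assumptions.

```python
# Toy molecular graph for ethanol (CH3-CH2-OH), heavy atoms only.
# Node features are atomic numbers; bonds are undirected.
atoms = [6, 6, 8]          # C, C, O
bonds = [(0, 1), (1, 2)]   # C-C and C-O single bonds


def build_adjacency(num_nodes, bonds):
    """Return an adjacency list with each bond stored in both directions."""
    adjacency = {i: [] for i in range(num_nodes)}
    for u, v in bonds:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


def message_passing_step(features, adjacency):
    """One round of sum aggregation: own feature plus the sum of neighbour features."""
    return [
        features[i] + sum(features[j] for j in adjacency[i])
        for i in range(len(features))
    ]


adjacency = build_adjacency(len(atoms), bonds)
hidden = message_passing_step([float(z) for z in atoms], adjacency)

# Sum-pooling readout: one number summarising the whole molecule.
graph_embedding = sum(hidden)
```

A real GNN would replace the plain sum with learned transformations and stack several such rounds, but the data flow (neighbour aggregation followed by a graph-level readout) is the same.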

Hetionet

Link: Hetionet
Description: A hetnet (heterogeneous network) of relationships between different types of entities, including diseases, genes, compounds, and pathways. It’s a valuable resource for studying drug repurposing and drug-disease associations using GNNs.
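
A common way to use a heterogeneous network for drug repurposing is to count typed paths between a compound and a disease, for example Compound-binds-Gene followed by Disease-associates-Gene. The sketch below illustrates this on a handful of made-up triples; the node names, the triple format, and the function names are assumptions for illustration and are not taken from the Hetionet data files.

```python
from collections import defaultdict

# Hypothetical toy triples (source, relation, target); not real Hetionet data.
edges = [
    ("Compound::A", "binds", "Gene::G1"),
    ("Compound::A", "binds", "Gene::G2"),
    ("Compound::B", "binds", "Gene::G3"),
    ("Disease::D1", "associates", "Gene::G1"),
    ("Disease::D2", "associates", "Gene::G2"),
    ("Disease::D2", "associates", "Gene::G3"),
]


def index_by_relation(edges):
    """Group edges as relation -> source -> set of targets."""
    index = defaultdict(lambda: defaultdict(set))
    for source, relation, target in edges:
        index[relation][source].add(target)
    return index


def compound_disease_paths(edges):
    """Count Compound-binds-Gene-associates-Disease paths per (compound, disease) pair."""
    index = index_by_relation(edges)

    # Invert the disease -> gene edges so genes can be looked up directly.
    gene_to_diseases = defaultdict(set)
    for disease, genes in index["associates"].items():
        for gene in genes:
            gene_to_diseases[gene].add(disease)

    counts = defaultdict(int)
    for compound, genes in index["binds"].items():
        for gene in genes:
            for disease in gene_to_diseases[gene]:
                counts[(compound, disease)] += 1
    return dict(counts)


path_counts = compound_disease_paths(edges)
```

Path counts like these can serve as simple hand-crafted features or as a baseline against which a GNN link predictor on the same graph is compared.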

Graph4Med

Link: Graph4Med, GitHub
Description: Graph4Med is a web application that relies on a Neo4j graph database obtained by transforming a traditional relational database. It provides straightforward visualization and analysis of a selected patient cohort.

CROssBAR: Comprehensive Resource of Biomedical Relations with Deep Learning Applications and Knowledge Graph Representations

Link: CROssBAR
Description: CROssBAR is a comprehensive system that integrates large-scale biomedical data from various resources and stores it in a NoSQL database. It enriches this data with deep-learning-based predictions of relations between biomedical entities, analyses the enriched data to obtain biologically meaningful modules, and displays them as interactive, heterogeneous knowledge graphs through an open-access online web service.

ZINC Database

Link: ZINC Database
Description: A free database of commercially available compounds for virtual screening. It contains over 35 million compounds and is widely used for drug discovery projects, particularly in tasks like molecular generation and optimization.

PubChem

Link: PubChem
Description: A public database of chemical molecules and their activities against biological assays. PubChem is used extensively for structure-activity relationship studies and molecular property prediction.

ChEMBL

Link: ChEMBL
Description: A manually curated database of bioactive molecules with drug-like properties. It provides information on compound bioactivity, targets, and pharmacological data, useful for GNN models that predict drug-target interactions and drug efficacy.

PDBbind

Link: PDBbind
Description: A comprehensive collection of the binding affinities of proteins and ligands. It’s a crucial resource for developing models that predict protein-ligand interactions.

DrugBank

Link: DrugBank
Description: A unique bioinformatics and cheminformatics resource combining detailed drug data with comprehensive drug target information. It’s used for drug-target interaction studies.

MoleculeNet

Link: MoleculeNet
Description: A benchmark suite for molecular machine learning that gathers multiple datasets under common evaluation protocols. It covers a range of molecular properties, including quantum mechanics, physical chemistry, biophysics, and physiology.

BindingDB

Link: BindingDB
Description: A public, web-accessible database of measured binding affinities, focusing on the interactions of proteins considered to be drug targets with ligands.
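
Measured affinities span several orders of magnitude, so regression models are usually trained on a negative log scale (pKd, pKi, or pIC50) rather than on raw concentrations. The sketch below assumes the affinity is given in nanomolar; the function name `affinity_nm_to_p` and the example values are hypothetical and not taken from BindingDB.

```python
import math


def affinity_nm_to_p(value_nm):
    """Convert an affinity in nanomolar (Kd, Ki, or IC50) to -log10 of its molar value."""
    if value_nm <= 0:
        raise ValueError("affinity must be a positive concentration")
    # value in mol/L = value_nm * 1e-9, so -log10(value) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


# Hypothetical measurements in nM, for illustration only.
measurements_nm = {"ligand_1": 1.0, "ligand_2": 1000.0, "ligand_3": 50.0}
p_values = {name: affinity_nm_to_p(v) for name, v in measurements_nm.items()}
```

On this scale a higher value means tighter binding, and a difference of 1 corresponds to a tenfold change in affinity.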

Tox21

Link: Tox21
Description: A dataset from the Toxicology in the 21st Century initiative, containing data on the toxicity of compounds, which is essential for ADMET studies.
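
Toxicity data of this kind is typically multi-task: each compound is tested in only some assays, so missing labels have to be masked out during evaluation. The following is a minimal sketch of a per-task ROC-AUC averaged over tasks, assuming missing labels are encoded as `None`; the toy labels and scores are invented, and the function names are illustrative.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def masked_mean_auc(label_rows, score_rows):
    """Average per-task ROC-AUC, skipping entries whose label is None."""
    num_tasks = len(label_rows[0])
    aucs = []
    for task in range(num_tasks):
        measured = [
            (labels[task], scores[task])
            for labels, scores in zip(label_rows, score_rows)
            if labels[task] is not None
        ]
        auc = roc_auc([y for y, _ in measured], [s for _, s in measured])
        if auc is not None:
            aucs.append(auc)
    return sum(aucs) / len(aucs) if aucs else None


# Toy data: 4 compounds x 2 assays; None marks an assay that was not run.
labels = [[1, 0], [0, None], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.4, 0.7], [0.1, 0.8]]
mean_auc = masked_mean_auc(labels, scores)
```

The pairwise loop is quadratic and only meant to keep the example dependency-free; for real datasets a library implementation of ROC-AUC is the practical choice.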

Human Metabolome Database (HMDB)

Link: HMDB
Description: Provides comprehensive information on human metabolites and their roles in metabolism. Useful for understanding drug metabolism and pharmacokinetics in GNN models.

Genomics of Drug Sensitivity in Cancer (GDSC)

Link: GDSC
Description: Offers a rich dataset of drug responses in cancer cell lines, including genetic, molecular, and pharmacological data. Ideal for GNNs focusing on cancer therapeutics and personalized medicine.

Therapeutic Target Database (TTD)

Link: TTD
Description: Provides information about known and explored therapeutic protein and nucleic acid targets, the targeted disease, pathway information, and the corresponding drugs directed at each of these targets.

SureChEMBL

Link: SureChEMBL
Description: Contains chemical information extracted from the patent literature, useful for novel chemical entity discovery and GNN models focusing on intellectual property in drug design.

Protein Data Bank (PDB)

Link: PDB
Description: A global repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. Essential for protein structure prediction and ligand docking studies.

Connectivity Map (CMap)

Link: CMap
Description: Provides gene expression profiles in response to various chemical and genetic perturbations, aiding in understanding drug mechanisms of action and potential off-target effects.

ChemSpider

Link: ChemSpider
Description: A free chemical structure database providing fast access to over 100 million structures, properties, and associated information. Useful for sourcing chemical compounds and property data.

STITCH

Link: STITCH
Description: Integrates information about interactions between chemicals and proteins, including metabolic and signaling pathways, chemicals, and drugs. It’s valuable for studying drug-protein interaction networks.

LINCS L1000

Link: LINCS L1000
Description: Provides gene expression signatures in response to a variety of drug and genetic perturbations, facilitating the understanding of compound mechanisms of action.
