Graph Machine Learning Resources for Drug Discovery
A curated collection of databases, benchmarks, and tools for graph machine learning in drug discovery. The entries cover benchmark suites, biomedical knowledge graphs, chemical and bioactivity databases, protein structure and binding affinity data, and gene expression resources.
Open Graph Benchmark (OGB)
- Link: OGB
- Description: A collection of benchmark datasets, data loaders, and evaluators for graph machine learning. It includes specific datasets for drug discovery tasks like molecular property prediction and protein-protein interaction networks. The OGB datasets are particularly designed to standardize and facilitate graph research, providing diverse, large-scale, and challenging datasets that are useful for developing and benchmarking GNN models in drug discovery.
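To make the molecular property prediction setting concrete, here is a minimal sketch of how a molecule can be represented as a graph before it is passed to a GNN. It does not use the OGB package. The toy molecule, the choice of atomic number as the only node feature, and the helper names `to_edge_index` and `degrees` are illustrative assumptions. The two-row edge list with both directions of every bond mirrors a layout commonly used by GNN libraries.

```python
# Toy molecule: ethanol (CH3-CH2-OH), heavy atoms only.
# Assumption: the only node feature is the atomic number; real benchmarks
# typically attach richer atom and bond features.
atoms = [6, 6, 8]            # C, C, O
bonds = [(0, 1), (1, 2)]     # undirected bonds as (atom index, atom index)


def to_edge_index(bonds):
    """Return [sources, targets] containing both directions of every bond."""
    src, dst = [], []
    for i, j in bonds:
        src += [i, j]
        dst += [j, i]
    return [src, dst]


def degrees(num_atoms, edge_index):
    """Count outgoing edges per node, i.e. the number of bonded neighbors."""
    deg = [0] * num_atoms
    for s in edge_index[0]:
        deg[s] += 1
    return deg


edge_index = to_edge_index(bonds)
node_degrees = degrees(len(atoms), edge_index)
print(edge_index)
print(node_degrees)
```

Storing each bond in both directions lets message passing treat the molecule as an undirected graph while still using a simple directed edge list.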
Hetionet
- Link: Hetionet
- Description: A hetnet (heterogeneous network) of relationships between different types of entities, including diseases, genes, compounds, and pathways. It’s a valuable resource for studying drug repurposing and drug-disease associations using GNNs.
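A heterogeneous network of this kind can be sketched as a list of typed edges. The example below is a toy illustration, not Hetionet data and not its API: every node name is a made-up placeholder, and the relation names and the helper functions `index_by_relation` and `candidates_via_shared_gene` are assumptions chosen for the sketch. It shows the kind of path-based query (compound binds gene, gene is associated with disease) that drug repurposing studies often start from.

```python
from collections import defaultdict

# Typed edges as (source, relation, target). All names are hypothetical.
edges = [
    ("Compound::A", "binds", "Gene::G1"),
    ("Compound::B", "binds", "Gene::G2"),
    ("Disease::D1", "associates", "Gene::G1"),
    ("Disease::D2", "associates", "Gene::G2"),
    ("Compound::A", "treats", "Disease::D2"),
]


def index_by_relation(edges):
    """Build relation -> source -> set of targets."""
    index = defaultdict(lambda: defaultdict(set))
    for src, rel, dst in edges:
        index[rel][src].add(dst)
    return index


def candidates_via_shared_gene(edges, disease):
    """Compounds that bind a gene associated with `disease`,
    excluding compounds already linked to it by a 'treats' edge."""
    index = index_by_relation(edges)
    genes = index["associates"].get(disease, set())
    known = {c for c, ds in index["treats"].items() if disease in ds}
    return sorted(
        c for c, bound in index["binds"].items()
        if bound & genes and c not in known
    )


print(candidates_via_shared_gene(edges, "Disease::D1"))
```

A GNN for this task would learn from the same typed edges rather than follow a fixed path pattern, but the data layout is the same.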
Graph4Med
- Link: Graph4Med, GitHub
- Description: Graph4Med is a web application that relies on a Neo4j graph database obtained by transforming a traditional relational database. It provides straightforward visualization and analysis of a selected patient cohort.
CROssBAR: Comprehensive Resource of Biomedical Relations with Deep Learning Applications and Knowledge Graph Representations
- Link: CROssBAR
- Description: CROssBAR is a comprehensive system that integrates large-scale biomedical data from various resources and stores it in a NoSQL database. It enriches this data with deep-learning-based predictions of relations between biomedical entities, analyses the enriched data to obtain biologically meaningful modules, and displays them as interactive, heterogeneous knowledge graphs through an open-access online web service.
ZINC Database
- Link: ZINC Database
- Description: A free database of commercially available compounds for virtual screening. Earlier releases listed over 35 million compounds, and later releases are substantially larger. It is widely used in drug discovery projects, particularly for tasks like molecular generation and optimization.
PubChem
- Link: PubChem
- Description: A public database of chemical molecules and their activities against biological assays. PubChem is used extensively for structure-activity relationship studies and molecular property prediction.
ChEMBL
- Link: ChEMBL
- Description: A manually curated database of bioactive molecules with drug-like properties. It provides information on compound bioactivity, targets, and pharmacological data, useful for GNN models that predict drug-target interactions and drug efficacy.
PDBbind
- Link: PDBbind
- Description: A comprehensive collection of the binding affinities of proteins and ligands. It’s a crucial resource for developing models that predict protein-ligand interactions.
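Binding affinities are usually reported as concentrations (for example Kd or Ki in nM), and a common preprocessing step before training a regression model is to convert them to a negative log scale, p = -log10(value in mol/L). The sketch below shows that conversion. The function name `to_p_affinity` and the unit table are assumptions for illustration, not part of any PDBbind tooling.

```python
import math

# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_affinity(value, unit):
    """Convert an affinity such as Kd or Ki to -log10(value in mol/L)."""
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"unknown unit: {unit!r}")
    if value <= 0:
        raise ValueError("affinity must be a positive concentration")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


# Tighter binders (smaller Kd) get larger values on the p scale.
print(round(to_p_affinity(1.0, "nM"), 3))
print(round(to_p_affinity(10.0, "uM"), 3))
```

Working on the log scale keeps targets in a narrow numeric range, which tends to make regression losses better behaved than on raw concentrations spanning many orders of magnitude.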
DrugBank
- Link: DrugBank
- Description: A unique bioinformatics and cheminformatics resource combining detailed drug data with comprehensive drug target information. It’s used for drug-target interaction studies.
MoleculeNet
- Link: MoleculeNet
- Description: A benchmark suite for molecular machine learning. Its datasets cover several categories of molecular properties: quantum mechanics, physical chemistry, biophysics, and physiology.
BindingDB
- Link: BindingDB
- Description: A public, web-accessible database of measured binding affinities, focusing on the interactions of proteins considered to be drug targets with ligands.
Tox21
- Link: Tox21
- Description: A dataset from the Toxicology in the 21st Century initiative containing compound toxicity measurements. It is useful for the toxicity component of ADMET (absorption, distribution, metabolism, excretion, and toxicity) studies.
Human Metabolome Database (HMDB)
- Link: HMDB
- Description: Provides comprehensive information on human metabolites and their roles in metabolism. Useful for GNN models of drug metabolism and pharmacokinetics.
Genomics of Drug Sensitivity in Cancer (GDSC)
- Link: GDSC
- Description: Offers a rich dataset of drug responses in cancer cell lines, including genetic, molecular, and pharmacological data. Ideal for GNNs focusing on cancer therapeutics and personalized medicine.
Therapeutic Target Database (TTD)
- Link: TTD
- Description: Provides information about known and explored therapeutic protein and nucleic acid targets, the targeted disease, pathway information, and the corresponding drugs directed at each of these targets.
SureChEMBL
- Link: SureChEMBL
- Description: Contains chemical information extracted from the patent literature, useful for novel chemical entity discovery and GNN models focusing on intellectual property in drug design.
Protein Data Bank (PDB)
- Link: PDB
- Description: A global repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. Essential for protein structure prediction and ligand docking studies.
Connectivity Map (CMap)
- Link: CMap
- Description: Provides gene expression profiles in response to various chemical and genetic perturbations, aiding in understanding drug mechanisms of action and potential off-target effects.
ChemSpider
- Link: ChemSpider
- Description: A free chemical structure database providing fast access to over 100 million structures, properties, and associated information. Useful for sourcing chemical compounds and property data.
STITCH
- Link: STITCH
- Description: Integrates information about interactions between chemicals (including drugs) and proteins, covering metabolic and signaling pathways. It’s valuable for studying drug-protein interaction networks.
LINCS L1000
- Link: LINCS L1000
- Description: Provides gene expression signatures in response to a variety of drug and genetic perturbations, facilitating the understanding of compound mechanisms of action.