Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Ying Ding, Qi Yu

Research output: Contribution to journal › Article

Abstract

Background: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems.

Results: In this paper, we propose the edge2vec model, which represents graphs by taking edge semantics into account. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks.

Conclusions: We propose this method for its added value relative to existing graph analytical methodology and for its applicability in the real-world context of biomedical knowledge discovery.
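To make the role of the edge-type transition matrix more concrete, the following is a minimal Python sketch of a random walk whose next step is weighted by the type of the previously traversed edge. It is an illustration under stated assumptions, not the authors' implementation: the toy graph, the hand-set matrix `M`, and the function name `edge_type_walk` are all hypothetical, and the Expectation-Maximization training of the matrix described in the abstract is not shown.

```python
import random

# Toy heterogeneous graph (hypothetical data): node -> list of (neighbor, edge_type).
# Edge types: 0 = gene-gene, 1 = drug-gene, 2 = links to a disease.
GRAPH = {
    "geneA": [("geneB", 0), ("drugX", 1)],
    "geneB": [("geneA", 0), ("diseaseY", 2)],
    "drugX": [("geneA", 1), ("diseaseY", 2)],
    "diseaseY": [("geneB", 2), ("drugX", 2)],
}

# Assumed edge-type transition matrix: M[i][j] weights a step along an edge
# of type j taken right after an edge of type i. The abstract says this
# matrix is trained by EM; here it is simply fixed by hand for illustration.
M = [
    [0.6, 0.2, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]


def edge_type_walk(graph, matrix, start, length, rng):
    """Return a walk of up to `length` nodes biased by edge-type transitions."""
    walk = [start]
    prev_type = None  # no edge has been traversed yet
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        if prev_type is None:
            # First step: no previous edge type, so choose uniformly.
            weights = [1.0] * len(neighbors)
        else:
            # Later steps: weight each candidate edge by the transition
            # weight from the previous edge type to the candidate's type.
            weights = [matrix[prev_type][etype] for _, etype in neighbors]
        next_node, prev_type = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(next_node)
    return walk


rng = random.Random(0)  # seeded for reproducibility
walks = [edge_type_walk(GRAPH, M, node, 6, rng) for node in GRAPH for _ in range(3)]
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Walks generated this way could then serve as the "sentences" for an embedding model optimized with stochastic gradient descent, for example a skip-gram model; the choice of skip-gram is an assumption here, since the abstract only states that a stochastic gradient descent model is used.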

Original language: English (US)
Article number: 306
Journal: BMC Bioinformatics
Volume: 20
Issue number: 1
DOI: https://doi.org/10.1186/s12859-019-2914-2
State: Published - Jun 10 2019

Fingerprint

Knowledge Discovery
Semantics
Data mining
Genes
Learning
Graph in graph theory
Bioactivity
Information retrieval
Information Storage and Retrieval
Chemical activation
Transition Matrix
Gene
Proteins
Vertex of a graph
Phenotype
Gene Expression
Pharmaceutical Preparations
Stochastic Gradient
Expectation Maximization
Methodology

Keywords

  • Applied machine learning
  • Biomedical knowledge discovery
  • Data science
  • Edge semantics
  • Graph embedding
  • Heterogeneous network
  • Knowledge graph
  • Linked data
  • Network science
  • Node embedding
  • Representation learning
  • Semantic web
  • Systems biology

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. / Gao, Zheng; Fu, Gang; Ouyang, Chunping; Tsutsui, Satoshi; Liu, Xiaozhong; Yang, Jeremy; Gessner, Christopher; Foote, Brian; Wild, David; Ding, Ying; Yu, Qi.

In: BMC Bioinformatics, Vol. 20, No. 1, 306, 10.06.2019.

Gao, Z, Fu, G, Ouyang, C, Tsutsui, S, Liu, X, Yang, J, Gessner, C, Foote, B, Wild, D, Ding, Y & Yu, Q 2019, 'Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery', BMC Bioinformatics, vol. 20, no. 1, 306. https://doi.org/10.1186/s12859-019-2914-2
Gao, Zheng; Fu, Gang; Ouyang, Chunping; Tsutsui, Satoshi; Liu, Xiaozhong; Yang, Jeremy; Gessner, Christopher; Foote, Brian; Wild, David; Ding, Ying; Yu, Qi. / Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. In: BMC Bioinformatics. 2019; Vol. 20, No. 1.
@article{dc80306c8bca45cca1fbf2ff3b6714af,
title = "Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery",
abstract = "Background: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. Results: In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. Conclusions: We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.",
keywords = "Applied machine learning, Biomedical knowledge discovery, Data science, Edge semantics, Graph embedding, Heterogeneous network, Knowledge graph, Linked data, Network science, Node embedding, Representation learning, Semantic web, Systems biology",
author = "Zheng Gao and Gang Fu and Chunping Ouyang and Satoshi Tsutsui and Xiaozhong Liu and Jeremy Yang and Christopher Gessner and Brian Foote and David Wild and Ying Ding and Qi Yu",
year = "2019",
month = "6",
day = "10",
doi = "10.1186/s12859-019-2914-2",
language = "English (US)",
volume = "20",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
}

TY - JOUR

T1 - Edge2vec

T2 - Representation learning using edge semantics for biomedical knowledge discovery

AU - Gao, Zheng

AU - Fu, Gang

AU - Ouyang, Chunping

AU - Tsutsui, Satoshi

AU - Liu, Xiaozhong

AU - Yang, Jeremy

AU - Gessner, Christopher

AU - Foote, Brian

AU - Wild, David

AU - Ding, Ying

AU - Yu, Qi

PY - 2019/6/10

Y1 - 2019/6/10

AB - Background: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. Results: In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. Conclusions: We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.

KW - Applied machine learning

KW - Biomedical knowledge discovery

KW - Data science

KW - Edge semantics

KW - Graph embedding

KW - Heterogeneous network

KW - Knowledge graph

KW - Linked data

KW - Network science

KW - Node embedding

KW - Representation learning

KW - Semantic web

KW - Systems biology

UR - http://www.scopus.com/inward/record.url?scp=85068187689&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068187689&partnerID=8YFLogxK

U2 - 10.1186/s12859-019-2914-2

DO - 10.1186/s12859-019-2914-2

M3 - Article

C2 - 31238875

AN - SCOPUS:85068187689

VL - 20

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 306

ER -