[P] 700x faster Node2Vec embeddings by CSR graph representation

Blog post here

Code here

I recently rewrote node2vec, which was taking an extremely long time to generate random walks on a graph. The rewrite represents the graph as a CSR sparse matrix and operates directly on the sparse matrix's data arrays.
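To make the idea concrete, here is a minimal sketch of a random walk that reads only the CSR arrays. It is not the code from the linked repo: it uses plain Python lists, and it does simple uniform first-order walks rather than node2vec's biased walks. The names `build_csr` and `random_walk` are made up for this example.

```python
import random


def build_csr(n_nodes, edges):
    """Build CSR arrays (indptr, indices) for an undirected, unweighted graph."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # indices[indptr[i]:indptr[i + 1]] holds the neighbors of node i
    indptr = [0] * (n_nodes + 1)
    for i in range(n_nodes):
        indptr[i + 1] = indptr[i] + degree[i]
    indices = [0] * indptr[-1]
    cursor = indptr[:-1]  # copy: next free slot for each node
    for u, v in edges:
        indices[cursor[u]] = v
        cursor[u] += 1
        indices[cursor[v]] = u
        cursor[v] += 1
    return indptr, indices


def random_walk(indptr, indices, start, length, rng):
    """Uniform first-order walk (an assumption of this sketch, not node2vec's
    biased walk). Stops early at a node with no neighbors."""
    walk = [start]
    node = start
    for _ in range(length - 1):
        lo, hi = indptr[node], indptr[node + 1]
        if lo == hi:
            break
        # Picking a neighbor is a single index into one flat array
        node = indices[rng.randrange(lo, hi)]
        walk.append(node)
    return walk


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
indptr, indices = build_csr(4, edges)
walk = random_walk(indptr, indices, start=0, length=5, rng=random.Random(0))
```

Each step is two lookups in `indptr` and one in `indices`, with no per-node objects involved, which is the kind of loop that is easy to compile or vectorize.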

The result is a speedup from 30 hours to 3 minutes on a small graph (nodes and edges in the hundreds of thousands).

This raises bigger questions about graph representation for graph analytics. Representing graphs as sparse matrices effectively rules out node insertion, but makes operations much more efficient (though admittedly harder to write). More importantly, we can hold fairly huge graphs in RAM because the memory footprint is so lean.
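As a rough illustration of the memory point, the sketch below stores the same toy graph as a dict of Python lists and as two flat typed arrays, then compares sizes. The ring graph and the variable names are invented for the example, and the dict figure is a lower bound because it ignores the integer objects themselves.

```python
import sys
from array import array

n = 1000
# Ring graph: node i is linked to its two neighbors (illustrative data only)
adjacency = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

# Same graph as CSR: two flat arrays of C ints (4 bytes on common platforms)
indptr = array("i", [0])
indices = array("i")
for i in range(n):
    indices.extend(adjacency[i])
    indptr.append(len(indices))

# Raw payload of the CSR arrays
csr_bytes = indptr.itemsize * len(indptr) + indices.itemsize * len(indices)

# Container overhead only for the dict version; int objects are not counted
dict_bytes = sys.getsizeof(adjacency) + sum(
    sys.getsizeof(neighbors) for neighbors in adjacency.values()
)

print(f"CSR arrays: {csr_bytes} bytes, dict of lists: at least {dict_bytes} bytes")
```

The flip side is the insertion cost: adding an edge to a node in the middle means shifting everything after it in `indices` and updating every later entry in `indptr`.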

If we're analyzing graphs, we don't care so much about adding nodes, so I think the future of graph analytics is in CSR representation.

Author

VodkaHaze

ML Engineer

Subreddit

r/MachineLearning

Posted: 5 years ago