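Implementing k-core with GraphX is fairly simple. The Taobao technology department has already published a code snippet for reference, and with a few small changes it can be adapted to your own needs.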
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.graphx.lib._

// load the graph
val friendsGraph = GraphLoader.edgeListFile(sc, "data/friends.txt.", false, 512, StorageLevel.MEMORY_ONLY, StorageLevel.DISK_ONLY)

// replace each vertex attribute with its degree (0 for vertices without edges)
var degreeGraph = friendsGraph.outerJoinVertices(friendsGraph.degrees) {
  (vid, vd, degree) => degree.getOrElse(0)
}.cache()

val kNum = 200
var lastVerticeNum: Long = degreeGraph.numVertices
var thisVerticeNum: Long = -1
var isConverged = false
val maxIter = 10
var i = 1
while (!isConverged && i <= maxIter) {
  // keep only the vertices whose current degree is at least kNum
  val subGraph = degreeGraph.subgraph(
    vpred = (vid, degree) => degree >= kNum
  ).cache()

  // degrees shrink once neighbours are removed, so recompute them on the subgraph
  degreeGraph = subGraph.outerJoinVertices(subGraph.degrees) {
    (vid, vd, degree) => degree.getOrElse(0)
  }.cache()

  thisVerticeNum = degreeGraph.numVertices
  if (lastVerticeNum == thisVerticeNum) {
    // no vertex was dropped in this round: every remaining vertex has degree >= kNum
    isConverged = true
    println("vertice num is " + thisVerticeNum + ", iteration is " + i)
  } else {
    println("lastVerticeNum is " + lastVerticeNum + ", thisVerticeNum is " + thisVerticeNum + ", iteration is " + i + ", not converge")
    lastVerticeNum = thisVerticeNum
  }
  i += 1
}
// if the loop stopped because of maxIter rather than convergence,
// degreeGraph may still contain vertices with degree below kNum

// do something to degreeGraph
Performance comes down mainly to how fast each subgraph can be computed.
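To make the pruning loop easier to follow, here is a minimal Spark-free sketch of the same idea on a tiny in-memory graph. It is only an illustration, not the GraphX code: the object name `KCoreSketch`, the `kCore` helper and the sample edge list are all made up for this example, and it assumes an undirected graph in which each edge is listed once.

```scala
object KCoreSketch {
  // Undirected edges as (src, dst) pairs; Long ids mirror GraphX vertex ids.
  // Hypothetical helper types and names, not part of GraphX.
  type Edge = (Long, Long)

  /** Repeatedly drop vertices whose degree is below k until nothing changes. */
  def kCore(edges: Seq[Edge], k: Int): Set[Long] = {
    var remaining: Seq[Edge] = edges
    var vertices: Set[Long] = edges.flatMap { case (a, b) => Seq(a, b) }.toSet
    var converged = false
    while (!converged) {
      // Degree of each vertex, counted over the surviving edges only.
      val degree: Map[Long, Int] = remaining
        .flatMap { case (a, b) => Seq(a, b) }
        .groupBy(identity)
        .map { case (v, occurrences) => v -> occurrences.size }

      // Same predicate as the subgraph step: keep vertices with degree >= k.
      val kept = vertices.filter(v => degree.getOrElse(v, 0) >= k)

      // Same stopping rule as the loop: the vertex count did not change.
      converged = kept.size == vertices.size
      vertices = kept

      // Dropping a vertex also drops every edge that touches it.
      remaining = remaining.filter { case (a, b) => kept(a) && kept(b) }
    }
    vertices
  }

  def main(args: Array[String]): Unit = {
    // A triangle 1-2-3 with a tail 3-4-5 hanging off it.
    val edges = Seq((1L, 2L), (2L, 3L), (1L, 3L), (3L, 4L), (4L, 5L))
    println(kCore(edges, 2))
  }
}
```

With k = 2, the first round removes vertex 5, the second removes vertex 4 (its degree fell to 1 once 5 was gone), and the third round changes nothing, so vertices 1, 2 and 3 remain. This is why one filtering pass is not enough and the degrees have to be recomputed after every round.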
Reposted from: http://aymja.baihongyu.com/