How to calculate Nearest Neighbour using Networkx and data from CSV file?

Question

I have a CSV file (node.csv) with the data as follows -

    1           2           3           4           5           6           7           8           9           10
1   0           0.257905291 0.775104118 0.239086843 0.002313744 0.416936603 0.194817214 0.163350301 0.252043807 0.251272559
2   0.346100279 0           0.438892758 0.598885794 0.002263231 0.406685237 0.523850975 0.257660167 0.206302228 0.161385794
3   0.753358102 0.222349243 0           0.407830809 0.001714776 0.507573592 0.169905687 0.139611318 0.187910832 0.326950557
4   0.185342928 0.571302688 0.51784403  0           0.003231018 0.295197533 0.216184462 0.153032751 0.216331326 0.317961522
5   0           0           0           0           0           0           0           0           0           0
6   0.478164621 0.418192795 0.646810223 0.410746629 0.002414973 0           0.609176897 0.203461461 0.157576977 0.636747837
7   0.24894327  0.522914349 0.33948832  0.316240267 0.002335929 0.639377086 0           0.410011123 0.540266963 0.587764182
8   0.234017887 0.320967208 0.285193773 0.258198079 0.003146737 0.224412057 0.411725737 0           0.487081815 0.469526333
9   0.302955306 0.080506624 0.261610132 0.22856311  0.001746979 0.014994905 0.63386228  0.486096957 0           0.664434415
10  0.232675407 0.121596312 0.457715027 0.310618067 0.001872929 0.57556548  0.473562887 0.32185564  0.482351246 0

I want to use Networkx Python library to calculate the nearest neighbours in a given network (maximum, minimum numbers included), for example - The program is written in such a way that for a number of iterations, it should be able to produce outputs showing "Node1 neighbours are 2,3" , "Node2 neighbours are 1,3" and so on using an algorithm or inbuilt function from Networkx.

The positions of the nodes are (pos.txt) -

Firstly, is it possible to create a network/graph using floating values less than 1? (The values indicate the connectivity rate from node to node, it also indicates the probability of a successful connection and the probability of a message being passed between nodes)
Can anyone help me with this regard?

Thanks in advance for the help :)

Regarding your first question, yes it is possible, am I assuming that the numbers in node.csv are reprensenting the weight of the edge between each node? And do you compute the closest neighbors based on that weight, or based on their position? (based on node.csv I'd say that 5 and 4 are the closest neighbors of 1) — Adonis
@Adonis Thank you for your response. Actually they are not weights of an edge, they are just probabilities of a connection between each node. Using this assumption and their positions, I want to calculate the nearest neighbour (in the experiment, it is assumed that all nodes are connected to each other, but depending on the probability, a message is passed/connection is made). I am not sure if my clarification has helped you in some way (also, how did you guess that 5 and 4 are the closest neighbours of 1 based on the CSV file?) — user136819
Because those are the lowest number in the row of the node with id 1. So basically you compute the "nearest distance" with the euclidean distance between node multiplied by the probability. Do I get it correctly? — Adonis

Adonis Adonis · Accepted Answer · 2017-08-07T13:46:02

Regarding your first question, and assuming we use the numbers from node.csv as the weight for the edges, a simple program allow to compute this graph using networkx:

import matplotlib.pyplot as plt
import networkx as nx
import csv

g = nx.Graph()

i_dict = {}
with open("g.csv","r") as input:
    csv_dict = csv.DictReader(input, skipinitialspace=True, delimiter=",")
    ini = 1
    for row in csv_dict:
        for i in row:
            #print(row[i])
            if type(row[i]) is str:
                g.add_edge(ini, int(i), weight=(float(row[i])))
        ini += 1

pos=nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g,pos)
nx.draw_networkx_labels(g,pos)
plt.axis('off')
plt.show()

This yields:

Regarding finding the nearest neighbor of let's say node1, still based on the value from node.csv:

min_weight_neighbors = sorted(g[1].items(), key=lambda e: e[1]["weight"] if e[1]["weight"] != 0  else 1000000000)[:2] #remove edges with weight 0 from the computation

Which in turns yields the 2 nodes with the lowest weight:

[(5, {'weight': 0.002313744}), (4, {'weight': 0.185342928})]

Or if you want 2 nodes with the biggest weight:

sorted(g[1].items(), key=lambda e: e[1]["weight"], reverse=True)[:2] #two nodes with the biggest weight

which yields:

[(3, {'weight': 0.753358102}), (4, {'weight': 0.5342928})]

NB: I modified a little bit node.csv:

1,2,3,4,5,6,7,8,9,10
0,0.257905291,0.775104118,0.239086843,0.002313744,0.416936603,0.194817214,0.163350301,0.252043807,0.251272559
0.346100279,0,0.438892758,0.598885794,0.002263231,0.406685237,0.523850975,0.257660167,0.206302228,0.161385794
0.753358102,0.222349243,0,0.407830809,0.001714776,0.507573592,0.169905687,0.139611318,0.187910832,0.326950557
0.5342928,0.571302688,0.51784403,0,0.003231018,0.295197533,0.216184462,0.153032751,0.216331326,0.317961522
0,0,0,0,0,0,0,0,0,0
0.478164621,0.418192795,0.646810223,0.410746629,0.002414973,0,0.609176897,0.203461461,0.157576977,0.636747837
0.24894327,0.522914349,0.33948832,0.316240267,0.002335929,0.639377086,0,0.410011123,0.540266963,0.587764182
0.234017887,0.320967208,0.285193773,0.258198079,0.003146737,0.224412057,0.411725737,0,0.487081815,0.469526333
0.302955306,0.080506624,0.261610132,0.22856311,0.001746979,0.014994905,0.63386228,0.486096957,0,0.664434415
0.232675407,0.121596312,0.457715027,0.310618067,0.001872929,0.57556548,0.473562887,0.32185564,0.482351246,0

How to calculate Nearest Neighbour using Networkx and data from CSV file?

1 Answers