GPU network structure

GPU currently uses TGnutella as its network protocol. Because TGnutella is buggy, about 5% of all messages are lost before they reach their destination. A new way of connecting the nodes is therefore needed, which in turn requires changing the way jobs are distributed and answered. This will be done by adding channels.

Channels

Channels let a client specify which data it wants to receive and which it does not. Some examples of channels could be:

  • The chat
  • The whiteboard
  • The search engine

With channels, every plugin can have its own channel or join existing ones. Each channel will have its own network, made up of the nodes that have that channel enabled. This has the following advantages:

  • Uses less bandwidth (an idle connection uses little or no bandwidth).
  • Uses less CPU (messages do not have to be filtered).
  • Is faster (a channel contains only the nodes that need the information, so fewer nodes have to forward each message).
  • Is more resistant to DDoS attacks. If one channel goes down, the other channels stay up. An attacker would have to DDoS every node in the DHT to block everything.

DHT

Each channel has its own Distributed Hash Table (DHT) to keep track of its servers and nodes. For more information on DHTs, see http://en.wikipedia.org/wiki/Distributed_hash_table. For GPU, the Kademlia DHT protocol would be the most suitable.
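
Kademlia decides which nodes are "close" to a key by taking the XOR of their identifiers. The following is a minimal sketch of that metric only, not of a full DHT. The helper names (node_id, xor_distance, closest_nodes) and the use of SHA-1 to derive identifiers from strings are assumptions made for illustration; they do not come from GPU.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a string (hypothetical scheme)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers: their bitwise XOR."""
    return a ^ b


def closest_nodes(key: int, known_ids: list[int], k: int = 3) -> list[int]:
    """Return the k known identifiers closest to key under the XOR metric."""
    return sorted(known_ids, key=lambda n: xor_distance(n, key))[:k]
```

A channel name such as "chat" could be hashed with node_id to obtain the key under which that channel's servers and nodes are stored, although how GPU would map channels to keys is left open here.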

Client & Server

Once started, the client contacts the DHT and retrieves the servers and clients of the specified channel. It then establishes connections to 2-5 servers and determines which server is the best, using a formula that considers the following:

  • The server with the fewest connected nodes should be the most favorable to connect to.
  • A server with a high ping should be less favorable to connect to.
  • Possibly other factors, such as the number of active nodes on a server (so that active nodes group together, which reduces the average travel time of messages).

The idle connections to other servers are occasionally dropped, and new ones are made when something changes on the network.
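
The selection criteria above could be combined into a single score. The sketch below is one possible way to do that, not the formula GPU uses: the weights, the ServerInfo fields and the function names are all hypothetical, and a real implementation would have to tune them against measurements.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    name: str
    connected_nodes: int   # how many nodes are connected to this server
    ping_ms: float         # round-trip time measured by the client
    active_nodes: int = 0  # optional: nodes that are currently active


def score(server: ServerInfo,
          load_weight: float = 1.0,
          ping_weight: float = 0.01,
          active_weight: float = 0.5) -> float:
    """Lower is better. All weights are illustrative assumptions."""
    return (load_weight * server.connected_nodes
            + ping_weight * server.ping_ms
            - active_weight * server.active_nodes)


def pick_best(candidates: list[ServerInfo]) -> ServerInfo:
    """Choose the candidate with the lowest score as the active server."""
    return min(candidates, key=score)
```

With these weights, a lightly loaded server with a low ping wins over a lightly loaded server with a high ping, and both win over a heavily loaded one.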

Before a client can become a server, it has to fulfill a few requirements:

  • The number of servers on the channel must be less than ceil(sqrt(nodes)).
  • The right port must be forwarded.
  • The node must have a real IP address.
  • Its ping time must be reasonably low (as seen from the other servers).
  • The other servers must choose it as a new server (unless there is no server yet, in which case the first node becomes the server).

If a client meets all of these requirements, it can become a server.
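
As a rough illustration, the requirements could be checked as follows. The function and parameter names are hypothetical, and the 200 ms ping limit is an assumed placeholder for "reasonably low". The sketch also assumes that the very first node skips the ping and approval checks, since there are no other servers to measure or vote.

```python
import math
from typing import Optional


def max_servers(node_count: int) -> int:
    """Upper bound on servers for a channel: ceil(sqrt(nodes))."""
    return math.ceil(math.sqrt(node_count))


def may_become_server(node_count: int,
                      server_count: int,
                      port_forwarded: bool,
                      has_real_ip: bool,
                      ping_ms: Optional[float],
                      approved_by_servers: bool,
                      max_ping_ms: float = 200.0) -> bool:
    """Return True if a client meets every requirement listed above."""
    # The channel must still have room for another server.
    if server_count >= max_servers(node_count):
        return False
    # The node must be reachable from outside.
    if not (port_forwarded and has_real_ip):
        return False
    # Assumption: with no servers yet, the first node is promoted directly.
    if server_count == 0:
        return True
    return ping_ms is not None and ping_ms <= max_ping_ms and approved_by_servers
```

For a channel of 16 nodes the limit is 4 servers, so a fourth server can still be added but a fifth cannot.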

Once promoted by the other servers, the node connects to all the other servers and starts accepting incoming connections. Because new nodes connect to 2-5 random servers and existing nodes randomly change their active server, the new server will soon receive some active connections.

This is what a channel network graph would look like with 16 nodes (4 servers), where each client has one active and one idle connection:

Image:channel_network.png

As the graph shows, a Time To Live of 3 is enough for each message, because by then every node has been reached. The problem of duplicate messages also does not exist in this setup, unlike in the current protocol, which filters duplicates using a unique ID.
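
The hop count can be checked with a small simulation. This is a sketch under several assumptions that the text does not spell out: the 16 nodes are taken to be 4 servers plus 12 clients, the servers are fully connected to each other, idle connections carry no traffic, and a server that receives a message from another server forwards it only to its own active clients. All names are hypothetical.

```python
from collections import Counter


def build_channel(n_servers: int = 4, n_clients: int = 12):
    """Return the server list and a map of client -> active server."""
    servers = [f"S{i}" for i in range(n_servers)]
    active = {f"C{i}": servers[i % n_servers] for i in range(n_clients)}
    return servers, active


def broadcast(origin: str, servers: list, active: dict):
    """Simulate one broadcast; return (hops per node, deliveries per node)."""
    hops = {}
    deliveries = Counter()

    def deliver(node: str, hop: int) -> None:
        deliveries[node] += 1
        hops.setdefault(node, hop)

    if origin in active:           # a client sends to its active server
        entry, entry_hop = active[origin], 1
        deliver(entry, entry_hop)
    else:                          # a server originates the message itself
        entry, entry_hop = origin, 0

    # The entry server forwards to its own clients and to all other servers.
    for client, server in active.items():
        if server == entry and client != origin:
            deliver(client, entry_hop + 1)
    for server in servers:
        if server == entry:
            continue
        deliver(server, entry_hop + 1)
        # Assumed rule: a relaying server forwards only to its own clients.
        for client, home in active.items():
            if home == server:
                deliver(client, entry_hop + 2)
    return hops, deliveries
```

Under these assumptions, a broadcast from any client reaches the other 15 nodes in at most 3 hops, and each node receives the message exactly once.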

Broadcasting and Private messages

To send messages to the outside world, a client or server has two options.

Broadcasting

A message can be sent to the whole network. Servers do not check where the message should go; they simply forward it to every node they are connected to.

Private messages

A message can also be sent with a destination. The server that receives it checks whether it is connected to the destination. If not, it looks up which server is, and forwards the message to that server. This way the message travels as privately as possible. Answers, for example from a stack execution, are always sent back privately and never broadcast. Plugins should be able to send both kinds of message on any channel they are registered to. To do so, they need to know which nodes participate in that channel.
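
The private routing step can be sketched as follows. The Server class and its methods are hypothetical, and the lookup is simplified to asking the directly connected servers; in the design above, this information could also come from the channel's DHT.

```python
class Server:
    """Minimal model of a channel server for private message routing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clients = set()   # names of nodes connected to this server
        self.peers = {}        # server name -> Server

    def locate(self, destination: str):
        """Find the peer server that is connected to destination, if any."""
        for peer in self.peers.values():
            if destination in peer.clients:
                return peer
        return None

    def route_private(self, destination: str, path=None) -> list:
        """Return the list of hops a private message takes to destination."""
        path = (path or []) + [self.name]
        if destination in self.clients:
            return path + [destination]      # deliver directly
        peer = self.locate(destination)
        if peer is None:
            raise LookupError(f"no server knows node {destination!r}")
        return peer.route_private(destination, path)
```

A message for a node on the same server takes one hop from that server, and a message for a node on another server takes two, so only the servers on the path ever see it.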