
HTTP server fixes #15

Closed · 18 commits

Conversation

@FotiosBistas FotiosBistas commented Jan 13, 2023

Motivation

An automated way to insert already-published CIDs into the hoarder. Currently, the only way is to create a JSON file with the required metadata.

Tasks

  1. Test the HTTP server way of retrieving CIDs.
  2. Verify that the tool is able to concurrently ping the newly retrieved CIDs and add them to the database.

@cortze cortze changed the base branch from dev to master January 16, 2023 09:25
@cortze cortze left a comment


Nice work @FotiosBistas! The PR looks nice overall; I just left some comments/questions where I would love to hear your insights.

I'm also wondering whether the README has been updated to reflect the addition of the API to the hoarder.


} else if r.Method == http.MethodGet {
cortze (Owner)

What is the reasoning behind having a Get method in the API? I thought we just needed to notify with a POST of the CID that we want to track (and the related info)

Am I missing anything?

FotiosBistas (Contributor, Author)

Hey @cortze, the HTTP server works like this: POST requests are received from the peer that is publishing, and the encapsulated JSON arrays are then added to the queue.

// HttpCidSource is an HTTP CID source: a small server that keeps a
// mutex-protected FIFO queue of the provider records it receives.
type HttpCidSource struct {
	port            int
	hostname        string
	lock            sync.Mutex
	server          *http.Server
	providerRecords []ProviderRecords
	isStarted       bool
}

// Dequeue pops the oldest batch of provider records off the queue,
// returning an empty ProviderRecords when the queue is empty.
func (httpCidSource *HttpCidSource) Dequeue() ProviderRecords {
	httpCidSource.lock.Lock()
	defer httpCidSource.lock.Unlock()
	if len(httpCidSource.providerRecords) == 0 {
		log.Debug("Queue is empty")
		return ProviderRecords{}
	}
	elem := httpCidSource.providerRecords[0]
	httpCidSource.providerRecords = httpCidSource.providerRecords[1:]
	log.Debugf("Removed element from queue, length is now: %d", len(httpCidSource.providerRecords))
	return elem
}

// Enqueue appends a new batch of provider records to the queue.
func (httpCidSource *HttpCidSource) Enqueue(providerRecords ProviderRecords) {
	httpCidSource.lock.Lock()
	defer httpCidSource.lock.Unlock()
	httpCidSource.providerRecords = append(httpCidSource.providerRecords, providerRecords)
	log.Debugf("Added new element to queue, length is now: %d", len(httpCidSource.providerRecords))
}

// A container for the encapsulated structs.
//
// The file contains a JSON array of provider records:
// [{ProviderRecord1},{ProviderRecord2},{ProviderRecord3}]
type ProviderRecords struct {
	EncapsulatedJSONProviderRecords []EncapsulatedJSONProviderRecord `json:"ProviderRecords"`
}

// This struct is used to create, read, and store the encapsulated data
// necessary for reading the provider records.
type EncapsulatedJSONProviderRecord struct {
	ID              string   `json:"PeerID"`
	CID             string   `json:"ContentID"`
	Creator         string   `json:"Creator"`
	PublicationTime string   `json:"PublicationTime"`
	ProvideTime     string   `json:"ProvideTime"`
	UserAgent       string   `json:"UserAgent"`
	Addresses       []string `json:"PeerMultiaddresses"`
}

This means that we wait for all the providers to be added to the array.
The GET method is currently called on a 5-second interval, and we receive the first element of the queue via the dequeue method. I found it easier this way to determine whether the queue is merely empty (meaning the publisher is publishing more slowly than the queue is being drained) or the publisher will not publish any more elements.
If the queue is merely empty, the server responds with NoContent; if the publisher has finished publishing, it must send a nil encapsulated JSON:

http.Error(w, "No record available currently", http.StatusNoContent)

Inside the interval function:

trackableCids, err := src.GetNewHttpCid(tracker.CidSource)
if trackableCids == nil {
	// A nil batch is the publisher's signal that publishing has finished.
	log.Debug("Received empty provider records")
	trackableCidArrayC <- nil
	close(trackableCidArrayC)
	// gracefully shut down the server
	go tracker.httpSource.Shutdown(tracker.ctx)
	break
}

if err != nil {
	log.Errorf("error while getting new cid: %s", err)
	// check if ticker for next iteration was raised
	<-minTimeT.C
	continue
}
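
For reference, here is a minimal sketch of what the GET branch of the handler could look like under this design; handleGet is a hypothetical name (the exact handler in this PR may differ), and it assumes the HttpCidSource and ProviderRecords types shown above plus "encoding/json" and "net/http":

// Hypothetical sketch of the GET branch, not the exact handler from this PR.
func (httpCidSource *HttpCidSource) handleGet(w http.ResponseWriter, r *http.Request) {
	records := httpCidSource.Dequeue()
	if len(records.EncapsulatedJSONProviderRecords) == 0 {
		// Queue is empty: the publisher may just be slower than the poller.
		http.Error(w, "No record available currently", http.StatusNoContent)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(records); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}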

@cortze cortze Jan 16, 2023

Ohhhh, I see; I thought the API was organized differently.

My understanding was that we would initialize the API server in the Cid-Hoarder, and that the publisher would POST the CidInfo together with the related PRHolders in a single POST query, with that POST happening after the publication has been completed.

FotiosBistas (Contributor, Author)

I tried doing it like this, but it was impossible to tell whether the publishing process had finished. It would also add a lot of extra logic: checking which CID each PR corresponds to, plus some kind of waiting mechanism to collect all the PRs.

@cortze cortze Jan 16, 2023

But are you taking into account having a single JSON sent over the POST method, including the CidInfo and PRHolders? Something like this:

{
     "CidInfo":{
            "cid": 
             ....
             "pr_holders":{
                    "peer_id": 
                    ....
             }
     }
}

This would simplify having to match PRHolders with CIDs.

@FotiosBistas FotiosBistas Jan 17, 2023

That is exactly how I'm sending it:

  "ProviderRecords":[
  {
    "PeerID":"12D3KooWG1MhbRPTGZiiLf1iECH19YMFjurkGjGsmKstfEMvNF42",
    "ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
    "Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
    "PublicationTime":"2023-01-16T14:04:42+02:00",
    "ProvideTime":"5.4405671s",
    "UserAgent":"hydra-booster/0.7.4",
    "PeerMultiaddresses": ["/ip4/18.188.54.97/udp/30014/quic","/ip4/18.188.54.97/tcp/30014","/ip4/172.31.10.39/udp/30014/quic","/ip4/127.0.0.1/udp/30014/quic","/ip4/172.31.10.39/tcp/30014","/ip4/127.0.0.1/tcp/30014"]
  },
  {
    "PeerID":"12D3KooWA4m41sRq68mdhk5cSTBptwxSXSsrjsyRKf2WJbNE7h9x",
    "ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
    "Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
    "PublicationTime":"2023-01-16T14:04:42+02:00",
    "ProvideTime":"5.4405671s",
    "UserAgent":"go-ipfs/0.7.0/",
    "PeerMultiaddresses":["/ip4/127.0.0.1/tcp/4001","/ip6/2605:7380:1000:1310:c08b:37ff:fe37:5ec3/tcp/4001","/ip4/127.0.0.1/udp/4001/quic","/ip6/::1/tcp/4001","/ip4/209.50.56.24/udp/4001/quic","/ip6/2605:7380:1000:1310:c08b:37ff:fe37:5ec3/udp/4001/quic","/ip6/::1/udp/4001/quic","/ip4/209.50.56.24/tcp/4001"]
  },
  {
    "PeerID":"12D3KooWSFppQG3QwTXCVKxtfQEjwoY6sLVgbLCegntd87Dc8L8Q",
    "ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
    "Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
    "PublicationTime":"2023-01-16T14:04:42+02:00",
    "ProvideTime":"5.4405671s",
    "UserAgent":"go-ipfs/0.7.0/",
    "PeerMultiaddresses":["/ip4/127.0.0.1/udp/4001/quic","/ip4/119.94.38.201/tcp/4001","/ip6/2001:4451:1114:8600:10f7:c70b:bec3:b939/udp/4001/quic","/ip6/::1/tcp/4001","/ip6/2001:4451:1114:8600:10f7:c70b:bec3:b939/tcp/4001","/ip4/127.0.0.1/tcp/4001","/ip6/2001:4451:1114:8600:a4a2:76bb:267:6cee/tcp/4001","/ip6/2001:4451:1114:8600:a4a2:76bb:267:6cee/udp/4001/quic","/ip6/::1/udp/4001/quic","/ip4/119.94.38.201/udp/4001/quic"]
  },
]

Notice that the ContentID, Creator, ProvideTime, and PublicationTime stay the same. After this batch is sent, a new batch of provider records for the next CID will be sent to the server.
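
As a rough illustration, a publisher could send one such batch with a plain POST; sendBatch and the endpoint URL below are assumptions for the sketch, not names used in this PR (requires "bytes", "encoding/json", "fmt", and "net/http"):

// Hypothetical publisher-side sketch; the endpoint URL is an assumption.
func sendBatch(records ProviderRecords) error {
	body, err := json.Marshal(records)
	if err != nil {
		return err
	}
	resp, err := http.Post("http://localhost:8080/providers", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}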

FotiosBistas (Contributor, Author)

What is the reasoning behind having a Get method in the API? I thought we just needed to notify with a POST of the CID that we want to track (and the related info)

Am I missing anything?

@cortze I removed the GET method. Essentially, the internal queue of the server would fill up and, at some point after a lot of CIDs had been inserted, would never empty itself. I replaced it, just as you mentioned in the comment above, with a single POST method that sends the received CIDs through a channel (sketched below) and then inserts them into the database. It does manage to insert them all, unlike the previous method, which would get stuck, but it is still slow. E.g., the database inserts 1500/1900 CIDs immediately (numbers are illustrative), and then the remaining 400 CIDs all get inserted a couple of hours later. So I think some approaches would be:

  1. Make the process synchronous: wait for all of the CIDs to be published and then insert them into the hoarder (this can be done very easily, but it is essentially the same as the JSON file approach).
  2. Implement some type of batching system, e.g. inserting 1000 CIDs at a time into the database (I think this is better done on the database side).
  3. Leave it as it is, and we keep some of the benefits of the asynchronous pings.

Would love to hear your thoughts!
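
A minimal sketch of the channel-based handler described above, assuming the discoverer consumes from the channel; postHandler and recordsC are hypothetical names, not the exact ones in this PR:

// Hypothetical sketch: the POST handler forwards each decoded batch over a
// channel instead of buffering it in an internal queue.
func postHandler(recordsC chan<- ProviderRecords) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var records ProviderRecords
		if err := json.NewDecoder(r.Body).Decode(&records); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		recordsC <- records // consumed by the discoverer
		w.WriteHeader(http.StatusOK)
	}
}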

cortze (Owner)

Hey @FotiosBistas, thanks for taking the suggestion above into consideration. Tbh, it makes everything cleaner and easier to follow.
About your different approaches:

  1. If you wait for all the CIDs to be published, you cannot follow the timings of each ping accurately; we would synchronize all the pings and get spikes of heavy ping sessions. Having the publications spaced over time serves as load balancing, so I'd say it is fine to keep it this way.
  2. Adding batching over the queries makes complete sense. I'll work on it in the coming month (see the sketch after this list).
  3. It should stay like this until the batching is implemented. I need to find some time to take action on this.
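
For reference, the batching in point 2 could look roughly like this using database/sql with a Postgres-style driver; insertCidBatch, the table, and the columns are made-up names, not the hoarder's actual schema:

// Hypothetical batching sketch; table and column names are illustrative.
func insertCidBatch(db *sql.DB, records []EncapsulatedJSONProviderRecord) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare("INSERT INTO provider_records (peer_id, cid) VALUES ($1, $2)")
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()
	for _, record := range records {
		if _, err := stmt.Exec(record.ID, record.CID); err != nil {
			tx.Rollback()
			return err
		}
	}
	// A single commit amortizes the round-trip cost over the whole batch.
	return tx.Commit()
}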

pkg/hoarder/discoverer.go (resolved)
FotiosBistas (Contributor, Author)

I'm also wondering if the Readme file is updated to the latest changes in the addition of the API to the hoarder.

No, I thought it would be best not to add it. Shall I add it here?

break was not working inside the select statement; added a return instead.
This exposed that addProviderWg.Add(1) was not being called.
The HTTP server now returns nil, nil when content has finished providing.
The queue would sometimes fill up to 1500 CIDs, due to the min-timer iteration scheduling.
Implemented emptying the whole queue.
Added a channel that listens for POST requests and sends them over to the discoverer, avoiding the HTTP server queue getting stuck.

cortze commented May 26, 2023

As Optimistic Provide has been part of libp2p/go-libp2p-kad-dht since #783, there is no longer a need for an exposed API. I intend to add OP as part of the publisher; thus, I'm closing this PR, as #18 removes the API.

Feel free to reopen it, though, if at some point it is needed :)

@cortze cortze closed this May 26, 2023