
HTTP server fixes #15

Closed · 18 commits

Conversation

@FotiosBistas FotiosBistas commented Jan 13, 2023

Motivation

An automated way to insert already-published CIDs into the hoarder. Currently, the only way is to create a JSON file with the required metadata.

Tasks

  1. Test the HTTP server way of retrieving CIDs.
  2. Verify that the tool is able to concurrently ping the newly retrieved CIDs and add them to the database.

@cortze cortze changed the base branch from dev to master January 16, 2023 09:25
@cortze cortze left a comment


Nice work @FotiosBistas! The PR looks nice overall; I just left some comments/questions where I would love to hear your insights.

I'm also wondering whether the README has been updated to reflect the addition of the API to the hoarder.


} else if r.Method == http.MethodGet {
cortze (Owner)

What is the reasoning behind having a Get method in the API? I thought we just needed to notify with a POST of the CID that we want to track (and the related info)

Am I missing anything?

FotiosBistas (Contributor, Author)

Hey @cortze, the HTTP server works like this: POST requests are received from the peer that is publishing, and the encapsulated JSON arrays are then added to the queue.

// HttpCidSource is an HTTP CID source: a small server that keeps a
// mutex-protected FIFO queue of the provider records it receives.
type HttpCidSource struct {
	port            int
	hostname        string
	lock            sync.Mutex
	server          *http.Server
	providerRecords []ProviderRecords
	isStarted       bool
}

// Dequeue pops the oldest batch of provider records off the queue,
// returning an empty ProviderRecords when the queue is empty.
func (httpCidSource *HttpCidSource) Dequeue() ProviderRecords {
	httpCidSource.lock.Lock()
	defer httpCidSource.lock.Unlock()
	if len(httpCidSource.providerRecords) == 0 {
		log.Debug("Queue is empty")
		return ProviderRecords{}
	}
	elem := httpCidSource.providerRecords[0]
	httpCidSource.providerRecords = httpCidSource.providerRecords[1:]
	log.Debugf("Removed element from queue, length is now: %d", len(httpCidSource.providerRecords))
	return elem
}

// Enqueue appends a new batch of provider records to the queue.
func (httpCidSource *HttpCidSource) Enqueue(providerRecords ProviderRecords) {
	httpCidSource.lock.Lock()
	defer httpCidSource.lock.Unlock()
	httpCidSource.providerRecords = append(httpCidSource.providerRecords, providerRecords)
	log.Debugf("Added new element to queue, length is now: %d", len(httpCidSource.providerRecords))
}

// A container for the encapsulated structs.
//
// The file contains a JSON array of provider records:
// [{ProviderRecord1},{ProviderRecord2},{ProviderRecord3}]
type ProviderRecords struct {
	EncapsulatedJSONProviderRecords []EncapsulatedJSONProviderRecord `json:"ProviderRecords"`
}

// This struct is used to create, read, and store the encapsulated data
// necessary for reading the provider records.
type EncapsulatedJSONProviderRecord struct {
	ID              string   `json:"PeerID"`
	CID             string   `json:"ContentID"`
	Creator         string   `json:"Creator"`
	PublicationTime string   `json:"PublicationTime"`
	ProvideTime     string   `json:"ProvideTime"`
	UserAgent       string   `json:"UserAgent"`
	Addresses       []string `json:"PeerMultiaddresses"`
}

This means that we wait for all the providers to be added to the array.
The GET method is currently called on a 5-second interval, and we receive the first element of the queue via the dequeue method. I found it easier this way to determine whether the queue is merely empty (meaning the publisher is publishing more slowly than the queue is being drained) or the publisher will not publish any more elements.
If the queue is merely empty, the server responds with NoContent; if the publisher has finished publishing, it must send a nil encapsulated JSON:

http.Error(w, "No record available currently", http.StatusNoContent)

Inside the interval function:

trackableCids, err := src.GetNewHttpCid(tracker.CidSource)
if trackableCids == nil {
	// A nil batch is the publisher's signal that publishing has finished.
	log.Debug("Received empty provider records")
	trackableCidArrayC <- nil
	close(trackableCidArrayC)
	// gracefully shut down the server
	go tracker.httpSource.Shutdown(tracker.ctx)
	break
}

if err != nil {
	log.Errorf("error while getting new cid: %s", err)
	// check if ticker for next iteration was raised
	<-minTimeT.C
	continue
}
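
For reference, here is a minimal sketch of what the GET branch of the handler could look like under this design; handleGet is a hypothetical name (the exact handler in this PR may differ), and it assumes the HttpCidSource and ProviderRecords types shown above plus "encoding/json" and "net/http":

// Hypothetical sketch of the GET branch, not the exact handler from this PR.
func (httpCidSource *HttpCidSource) handleGet(w http.ResponseWriter, r *http.Request) {
	records := httpCidSource.Dequeue()
	if len(records.EncapsulatedJSONProviderRecords) == 0 {
		// Queue is empty: the publisher may just be slower than the poller.
		http.Error(w, "No record available currently", http.StatusNoContent)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(records); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}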

@cortze cortze Jan 16, 2023

Ohhhh, I see; I thought the API was organized differently.

My understanding was that we would initialize the API server in the Cid-Hoarder, and that the publisher would POST the CidInfo together with the related PRHolders in a single POST query, with that POST happening after the publication has been completed.

FotiosBistas (Contributor, Author)

I tried doing it like this, but it was impossible to tell whether the publishing process had finished. It would also add a lot of extra logic: checking which CID each PR corresponds to, plus some kind of waiting mechanism to collect all the PRs.

@cortze cortze Jan 16, 2023

But are you taking into account having a single JSON sent over the POST method, including the CidInfo and PRHolders? Something like this:

{
     "CidInfo":{
            "cid": 
             ....
             "pr_holders":{
                    "peer_id": 
                    ....
             }
     }
}

This would simplify having to match PRHolders with CIDs.

@FotiosBistas FotiosBistas Jan 17, 2023

That is exactly how I'm sending it:

  "ProviderRecords":[
  {
    "PeerID":"12D3KooWG1MhbRPTGZiiLf1iECH19YMFjurkGjGsmKstfEMvNF42",
    "ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
    "Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
    "PublicationTime":"2023-01-16T14:04:42+02:00",
    "ProvideTime":"5.4405671s",
    "UserAgent":"hydra-booster/0.7.4",
    "PeerMultiaddresses": ["/ip4/18.188.54.97/udp/30014/quic","/ip4/18.188.54.97/tcp/30014","/ip4/172.31.10.39/udp/30014/quic","/ip4/127.0.0.1/udp/30014/quic","/ip4/172.31.10.39/tcp/30014","/ip4/127.0.0.1/tcp/30014"]
  },
  {
    "PeerID":"12D3KooWA4m41sRq68mdhk5cSTBptwxSXSsrjsyRKf2WJbNE7h9x",
    "ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
    "Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
    "PublicationTime":"2023-01-16T14:04:42+02:00",
    "ProvideTime":"5.4405671s",
    "UserAgent":"go-ipfs/0.7.0/",
    "PeerMultiaddresses":["/ip4/127.0.0.1/tcp/4001","/ip6/2605:7380:1000:1310:c08b:37ff:fe37:5ec3/tcp/4001","/ip4/127.0.0.1/udp/4001/quic","/ip6/::1/tcp/4001","/ip4/209.50.56.24/udp/4001/quic","/ip6/2605:7380:1000:1310:c08b:37ff:fe37:5ec3/udp/4001/quic","/ip6/::1/udp/4001/quic","/ip4/209.50.56.24/tcp/4001"]
  },
  {
    "PeerID":"12D3KooWSFppQG3QwTXCVKxtfQEjwoY6sLVgbLCegntd87Dc8L8Q",
    "ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
    "Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
    "PublicationTime":"2023-01-16T14:04:42+02:00",
    "ProvideTime":"5.4405671s",
    "UserAgent":"go-ipfs/0.7.0/",
    "PeerMultiaddresses":["/ip4/127.0.0.1/udp/4001/quic","/ip4/119.94.38.201/tcp/4001","/ip6/2001:4451:1114:8600:10f7:c70b:bec3:b939/udp/4001/quic","/ip6/::1/tcp/4001","/ip6/2001:4451:1114:8600:10f7:c70b:bec3:b939/tcp/4001","/ip4/127.0.0.1/tcp/4001","/ip6/2001:4451:1114:8600:a4a2:76bb:267:6cee/tcp/4001","/ip6/2001:4451:1114:8600:a4a2:76bb:267:6cee/udp/4001/quic","/ip6/::1/udp/4001/quic","/ip4/119.94.38.201/udp/4001/quic"]
  },
]

Notice that the ContentID, Creator, ProvideTime, and PublicationTime stay the same. After this batch is sent, a new batch of provider records for the next CID will be sent to the server.
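
As a rough illustration, a publisher could send one such batch with a plain POST; sendBatch and the endpoint URL below are assumptions for the sketch, not names used in this PR (requires "bytes", "encoding/json", "fmt", and "net/http"):

// Hypothetical publisher-side sketch; the endpoint URL is an assumption.
func sendBatch(records ProviderRecords) error {
	body, err := json.Marshal(records)
	if err != nil {
		return err
	}
	resp, err := http.Post("http://localhost:8080/providers", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}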

FotiosBistas (Contributor, Author)

What is the reasoning behind having a Get method in the API? I thought we just needed to notify with a POST of the CID that we want to track (and the related info)

Am I missing anything?

@cortze I removed the GET method. Essentially, the internal queue of the server would fill up and, at some point after a lot of CIDs had been inserted, would never empty itself. I replaced it, just as you mentioned in the comment above, with a single POST method that sends the received CIDs through a channel (sketched below) and then inserts them into the database. It does manage to insert them all, unlike the previous method, which would get stuck, but it is still slow. E.g., the database inserts 1500/1900 CIDs immediately (numbers are illustrative), and then the remaining 400 CIDs all get inserted a couple of hours later. So I think some approaches would be:

  1. Make the process synchronous: wait for all of the CIDs to be published and then insert them into the hoarder (this can be done very easily, but it is essentially the same as the JSON file approach).
  2. Implement some type of batching system, e.g. inserting 1000 CIDs at a time into the database (I think this is better done on the database side).
  3. Leave it as it is, and we keep some of the benefits of the asynchronous pings.

Would love to hear your thoughts!
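
A minimal sketch of the channel-based handler described above, assuming the discoverer consumes from the channel; postHandler and recordsC are hypothetical names, not the exact ones in this PR:

// Hypothetical sketch: the POST handler forwards each decoded batch over a
// channel instead of buffering it in an internal queue.
func postHandler(recordsC chan<- ProviderRecords) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var records ProviderRecords
		if err := json.NewDecoder(r.Body).Decode(&records); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		recordsC <- records // consumed by the discoverer
		w.WriteHeader(http.StatusOK)
	}
}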

cortze (Owner)

Hey @FotiosBistas, thanks for taking the suggestion above into consideration. Tbh, it makes everything cleaner and easier to follow.
About your different approaches:

  1. If you wait for all the CIDs to be published, you cannot follow the timings of each ping accurately; we would synchronize all the pings and get spikes of heavy ping sessions. Having the publications spaced over time serves as load balancing, so I'd say it is fine to keep it this way.
  2. Adding batching over the queries makes complete sense. I'll work on it in the coming month (see the sketch after this list).
  3. It should stay like this until the batching is implemented. I need to find some time to take action on this.
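
For reference, the batching in point 2 could look roughly like this using database/sql with a Postgres-style driver; insertCidBatch, the table, and the columns are made-up names, not the hoarder's actual schema:

// Hypothetical batching sketch; table and column names are illustrative.
func insertCidBatch(db *sql.DB, records []EncapsulatedJSONProviderRecord) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare("INSERT INTO provider_records (peer_id, cid) VALUES ($1, $2)")
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()
	for _, record := range records {
		if _, err := stmt.Exec(record.ID, record.CID); err != nil {
			tx.Rollback()
			return err
		}
	}
	// A single commit amortizes the round-trip cost over the whole batch.
	return tx.Commit()
}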

pkg/hoarder/discoverer.go (resolved)
FotiosBistas (Contributor, Author)

I'm also wondering if the Readme file is updated to the latest changes in the addition of the API to the hoarder.

No, I thought it would be best not to add it. Shall I add it here?

break was not working inside the select statement; added a return instead.
This exposed that addProviderWg.Add(1) was not being called.
The HTTP server now returns nil, nil when content has finished providing.
The queue would sometimes fill up to 1500 CIDs, due to the min-timer iteration scheduling.
Implemented emptying the whole queue.
Added a channel that listens for POST requests and sends them over to the discoverer, avoiding the HTTP server queue getting stuck.

cortze commented May 26, 2023

As Optimistic Provide has been part of libp2p/go-libp2p-kad-dht since #783, there is no longer a need for an exposed API. I intend to add OP as part of the publisher; thus, I'm closing this PR, as #18 removes the API.

Feel free to reopen it, though, if at some point it is needed :)

@cortze cortze closed this May 26, 2023