Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use List of HashSet to cache hashes instead of OrderedDictionary in FIFOSet #1382

Closed
eryeer opened this issue Dec 24, 2019 · 2 comments · Fixed by #1389
Closed

Use List of HashSet to cache hashes instead of OrderedDictionary in FIFOSet #1382

eryeer opened this issue Dec 24, 2019 · 2 comments · Fixed by #1389
Labels
discussion Initial issue state - proposed but not yet accepted

Comments

@eryeer
Copy link
Contributor

eryeer commented Dec 24, 2019

Summary or problem description

When there are many elements cached in the FIFOSet, its Add and ExceptWith efficiency will become worse. It is very obvious during performance testing, in the TaskManager's RestartTask,

private void OnRestartTasks(InvPayload payload)
{
knownHashes.ExceptWith(payload.Hashes);

the ExceptWith method is used to exclude 16,000 of the 62,800 knownHashes. Each delete performance is 0.015s and totally spends 240s. The process will get stuck. That means under high overload, if the nodes are out of sync due to lacking of a large number of transactions, obtaining transactions will become extremely slow.
The reason is mainly the computational efficiency issue of the OrderedDictionary. The unit test below will show the performance for large number of elements.

        [TestMethod]
        public void TestFIFOSet()
        {
            Stopwatch stopwatch = new Stopwatch();
            var bucket = new FIFOSet<int>(150_000);
            stopwatch.Start();
            for (int i = 1; i <= 550_000; i++)
            {
                bucket.Add(i);
            }
            stopwatch.Stop();
            Console.WriteLine($"Add timespan: {stopwatch.Elapsed.TotalSeconds}s");
            stopwatch.Reset();
            var items = new int[10000];
            var value = 550_000;
            for (int i = 0; i <= 9999; i++)
            {
                items[i] = value;
                value -= 50;
            }
            stopwatch.Start();
            bucket.ExceptWith(items);
            stopwatch.Stop();
            Console.WriteLine($"except with timespan: {stopwatch.Elapsed.TotalSeconds}s");
            stopwatch.Reset();

            stopwatch.Start();
            var ret = bucket.Contains(140_000);
            stopwatch.Stop();
            Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
            stopwatch.Reset();

            stopwatch.Start();
            ret = bucket.Contains(545_001);
            stopwatch.Stop();
            Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
            stopwatch.Reset();
        }

Console prints

Add timespan: 21.3015352s
except with timespan: 10.8901768s
contains with timespan: 0.0001552s result: False
contains with timespan: 5E-07s result: True

Do you have any solution you want to propose?

Currently, we only use FIFOSet to cache UInt256 hashes. The elimination order is based on the sequence of the hashes. In this case the elimination order is not critical. Therefore, we can use the List of HashSet instead, create a List containing multiple HashSets, and each element is stored in the HashSet in chronological order. When the previous HashSet is full, it starts to load the next HashSet. If all HashSets are full, delete the first HashSet in the List and add a new empty HashSet at the end of the List to hold the newly added elements.
The design of this class is as follows

    public class HashSetCache<T> : IEnumerable<T> where T : IEquatable<T>
    {
        private readonly int hashSetCapacity; //maxmium of elements in each hashset
        private readonly List<HashSet<T>> sets = new List<HashSet<T>>();

        public int Size
        {
            get
            {
                int size = 0;
                foreach (var set in sets)
                {
                    size += set.Count;
                }
                return size;
            }
        }

        public HashSetCache(int hashSetCapacity, int hashSetCount = 10)
        {
            if (hashSetCapacity <= 0) throw new ArgumentOutOfRangeException(nameof(hashSetCapacity));
            if (hashSetCount <= 0 || hashSetCount > 20) throw new ArgumentOutOfRangeException($"{nameof(hashSetCount)} should between 1 and 20");

            this.hashSetCapacity = hashSetCapacity;
            for (int i = 0; i < hashSetCount; i++)
            {
                sets.Add(new HashSet<T>());
            }
        }

        public bool Add(T item)
        {
            if (Contains(item)) return false;
            foreach (var set in sets)
            {
                if (set.Count < hashSetCapacity)
                {
                    return set.Add(item);
                }
            }
            sets.RemoveAt(0);
            var newSet = new HashSet<T>
            {
                item
            };
            sets.Add(newSet);
            return true;
        }

        public bool Contains(T item)
        {
            foreach (var set in sets)
            {
                if (set.Contains(item)) return true;
            }
            return false;
        }

        public void ExceptWith(IEnumerable<T> items)
        {
            foreach (var item in items)
            {
                foreach (var set in sets)
                {
                    if (set.Remove(item)) break;
                }
            }
        }

        public IEnumerator<T> GetEnumerator()
        {
            foreach (var set in sets)
            {
                foreach (var item in set)
                {
                    yield return item;
                }
            }
        }

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

The unit test shows this design is much faster than OrderedDictionary under the same input condition.

        [TestMethod]
        public void TestHashSetCache()
        {
            Stopwatch stopwatch = new Stopwatch();
            var bucket = new HashSetCache<int>(15_000);
            stopwatch.Start();
            for (int i = 1; i <= 550_000; i++)
            {
                bucket.Add(i);
            }
            stopwatch.Stop();
            Console.WriteLine($"Add timespan: {stopwatch.Elapsed.TotalSeconds}s");
            stopwatch.Reset();
            var items = new int[10000];
            var value = 550_000;
            for (int i = 0; i <= 9999; i++)
            {
                items[i] = value;
                value -= 50;
            }

            stopwatch.Start();
            bucket.ExceptWith(items);
            stopwatch.Stop();
            Console.WriteLine($"except with timespan: {stopwatch.Elapsed.TotalSeconds}s");
            stopwatch.Reset();
            stopwatch.Start();
            var ret = bucket.Contains(140_000);
            stopwatch.Stop();
            Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
            stopwatch.Reset();

            stopwatch.Start();
            ret = bucket.Contains(545_001);
            stopwatch.Stop();
            Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
            stopwatch.Reset();
        }

Here is the Console print

Add timespan: 0.2357612s
except with timespan: 0.0035483s
contains with timespan: 2.1E-06s result: False
contains with timespan: 4.6E-06s result: True

Neo Version

  • Neo 3

Where in the software does this update applies to?

  • Consensus
  • Ledger
@eryeer eryeer added the discussion Initial issue state - proposed but not yet accepted label Dec 24, 2019
@shargon
Copy link
Member

shargon commented Dec 24, 2019

Did you tested other alternatives to OrderedDictionary ?

@eryeer
Copy link
Contributor Author

eryeer commented Dec 25, 2019

No, do you have any other alternative ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Initial issue state - proposed but not yet accepted
Projects
None yet
2 participants