Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Nov:2023:1515-1550.
doi: 10.1109/focs57990.2023.00093. Epub 2023 Dec 22.

Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming

Affiliations

Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming

Praneeth Kacham et al. Proc Annu Symp Found Comput Sci. 2023 Nov.

Abstract

We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, for example: Andoni's F p estimation algorithm for constant p > 2 (ICASSP, 2017) assumes a random oracle, but achieves optimal space and constant update time. Using HashPRG's time-space trade-off we eliminate the random oracle assumption while preserving the other properties. Previously no time-optimal derandomization was known. Using similar techniques, we give an algorithm for a relaxed version of p sampling in a turnstile stream. Both of our algorithms use O ˜ d 1 - 2 / p bits of space and have O 1 update time.For 0 < p < 2 , the 1 ± ε approximate F p estimation algorithm of Kane et al., (STOC, 2011) uses an optimal O ε - 2 log d bits of space but has an update time of O log 2 1 / ε log log 1 / ε . Using HashPRG, we show that if 1 / d ε 1 / d c for an arbitrarily small constant c > 0 , then we can obtain a 1 ± ε approximate F p estimation algorithm that uses the optimal O ε - 2 log d bits of space and has an update time of O log d in the Word RAM model, which is more than a quadratic improvement in the update time. We obtain similar improvements for entropy estimation.CountSketch, with the fine-grained error analysis of Minton and Price (SODA, 2014). For derandomization, they suggested a direct application of Nisan's generator, yielding a logarithmic multiplicative space overhead. With HashPRG we obtain an efficient derandomization yielding the same asymptotic space as when assuming a random oracle. Our ability to obtain a time-efficient derandomization makes crucial use of HashPRG's symmetry. We also give the first derandomization of a recent private version of CountSketch. For a d -dimensional vector x being updated in a turnstile stream, we show that x can be estimated up to an additive error of ε x 2 using O ε - 2 log 1 / ε log d bits of space. Additionally, the update time of this algorithm is O log 1 / ε in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors x with x = Θ x 2 , we show that the lower bound can be broken by giving an algorithm that uses O ε - 2 log d bits of space which approximates x up to an additive error of ε x 2 . We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade off of HashPRG, we show that the update time of this algorithm is also O log 1 / ε in the Word RAM model.

PubMed Disclaimer

Figures

Figure 1:
Figure 1:
Overview of CountSketch guarantees with different kinds of random hash functions. For simplicity we focus on the case of r=O(logd) repetitions and d-dimensional input vectors that contain O(logd)-bit integers such that the CountSketch itself (without hash functions) uses space D=O(tlogd) words. With pairwise independence we can only tightly bound the probability of exceeding error Δ=tailt(x)2/t, while the other hash functions allow us to bound the probability of smaller errors. Time bounds are for implementation on a Word RAM with word size w=O(logd). Parameters with a particularly bad impact on space or time are highlighted in red color.

Similar articles

References

    1. Ahn Kook Jin, Guha Sudipto, and McGregor Andrew. Graph sketches: Sparsification, Spanners, and Subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems, pages 5–14, 2012.
    1. Andoni Alexandr, Krauthgamer Robert, and Onak Krzysztof. Streaming algorithms via precision sampling. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 363–372. IEEE, 2011.
    1. Andoni Alexandr. High frequency moments via max-stability. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6364–6368. IEEE, 2017.
    1. Andoni Alexandr, Nguyen Huy L,Polyanskiy Yury, and Wu Yihong. Tight lower bound for linear sketches of moments. In International Colloquium on Automata, Languages, and Programming, pages 25–32. Springer, 2013.
    1. Braverman Vladimir, Chestnut Stephen R., Ivkin Nikita, and Woodruff David P.. Beating CountSketch for heavy hitters in insertion streams. In STOC’16-Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 740–753. ACM, New York, 2016.

LinkOut - more resources