ANALYSIS AND FORMALIZATION OF HAYSTACK ARCHITECTURE IN FACEBOOK

Thị Thúy Lê , Quốc Việt Bùi

Main Article Content

Abstract

Haystack is a storage system architecture optimized for Facebook’s photo application. Current haystack has four main advantages compared to previous versions, including high throughput and low latency, fault tolerance, cost effectiveness, and simplicity. Given the widespread use of the Haystack architecture in Facebook, its validity and other key attributes abstracted from this architecture need to be analyzed and verified with a more precise approach. This paperfocuses on the internal design of serving and uploading a photo of Haystack architecture and apply Communicating Sequential Processes (CSP) to formalize them in detail. By feeding the models into the model checker Process Analysis Toolkit (PAT), we have verified crucial properties, including basic property and supplementary properties. Basic property contains Deadlock Freedom. Supplementary properties include synchronous and asynchronous concurrent access, and synchronous concurrent access with the same client, synchronous concurrent and synchronous concurrent upload with the same client. Finally, according to the verification results, we believe that from the CSP’s perspective, the properties of Haystack architecture are valid, which means that it meets the requirements of the documents of Facebook.

Article Details

References

Brookes, S. D., Hoare, C. A. R., & Roscoe, A. W. (1984). A theory of communicating sequential processes. J. ACM, 31(3), 560-599.
Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, & Peter Vajgel (2010). Finding a needle in haystack: Facebook’s photo storage. In 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010, October 4-6, 2010, Vancouver, BC, Canada, Proceedings, 47-60.
Hoare, C. A. R. (1978). Communicating sequential processes. Communications of the ACM, 21(8), 666-677.
Jan A. Bergstra, & Jan Willem Klop (1985). Algebra of communicating processes with abstraction. Theor. Comput. Sci., 37, 77-121.
Lowe, G., & Roscoe A. W. (1997). Using CSP to detect errors in the TMN protocol. IEEE Trans. Software Eng., 23(10), 659-669.
National University of Singapore. PAT (2008). Process analysis toolkit url=https://pat.comp.nus.edu.sg/.
Ngo, Q. V. (2011). Multi-coronas zernike moments on curvelet-like transform and application to pattern recognition. Ho Chi Minh City University of Education of Journal of Science, 30(64).
Robin Milner (1980) A Calculus of Communicating Systems, volume 92 of Lecture Notes in Computer Science. Springer.
Roscoe, A. W. (2010). Understanding Concurrent Systems. Texts in Computer Science. Springer.