Abstract: Deduplication has been employed mainly in distributed storage systems to improve space efficiency. Traditional deduplication research, however, ignores the design requirements of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes that threaten the system’s data reliability, recovery, and consistency in the event of system failures. In this talk, I will present my work on building a robust, fault-tolerant, and scalable cluster-wide inline deduplication design that eliminates duplicate copies across the cluster while maintaining consistency and an effective garbage collection mechanism, without violating the design properties of shared-nothing storage systems. We decouple the deduplication metadata from the read I/O path and replace it with an RMO object to further speed up read performance. Finally, we show experimentally that our approach achieves high storage space efficiency without jeopardizing performance when compared against state-of-the-art content-addressable deduplication.
Please use this link to attend the virtual seminar:
Meeting ID: 978322106 / Participant passcode: 6132