Date
2025-05-15
Publisher
Philipps-Universität Marburg
Abstract
The current era of exponentially increasing data volumes has led to unprecedented demand for data storage capacity. Estimates show that by the year 2025, around 175 zettabytes of data will be created globally. Despite continuous improvements in current data storage technologies, they are already outpaced, failing to cope with the ever-growing demand. Thus, alternative storage technologies are explored and studied to confront this predicament. Deoxyribonucleic acid (DNA) has emerged as one of the promising alternatives among these endeavors due to its remarkable properties. DNA consists of four basic building units called nucleotides and is a naturally occurring biomaterial found in all known living organisms. It provides a theoretical storage density of 455 exabytes/g, around six orders of magnitude higher than current storage devices. DNA is environmentally friendly and does not require energy once data is preserved. Crucially, DNA can endure up to several millennia, providing dense and long-term storage at a low cost.
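As a back-of-the-envelope check on the two figures above, the following sketch computes how much DNA would hold 175 zettabytes at the theoretical density of 455 exabytes per gram. It assumes decimal SI prefixes (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes) and ignores redundancy, addressing, and encoding overhead, so the result is a theoretical lower bound and not a practical estimate. The function name is illustrative.

```python
# Assumption: decimal SI prefixes.
EXABYTE = 10**18
ZETTABYTE = 10**21

# Figures quoted in the abstract.
DENSITY_BYTES_PER_GRAM = 455 * EXABYTE   # theoretical density of DNA
GLOBAL_DATA_BYTES = 175 * ZETTABYTE      # estimated data created by 2025


def grams_of_dna(num_bytes, density=DENSITY_BYTES_PER_GRAM):
    """Return the mass of DNA (in grams) needed at the given density."""
    return num_bytes / density


if __name__ == "__main__":
    print(f"{grams_of_dna(GLOBAL_DATA_BYTES):.1f} g")  # prints "384.6 g"
```

Under these idealized assumptions, the entire estimated data volume would fit in less than 400 grams of DNA.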
However, using DNA for storage today entails several unsolved challenges. Writing and reading DNA is still more expensive than with current storage technologies. Furthermore, the methods for reading and writing DNA require specific considerations. For example, not every sequence of nucleotides is valid, as some arrangements are error-prone in the writing and reading processes. Crucially, reading specific regions on DNA (random access) is poorly supported. The currently available address space on DNA is very small, at only a few hundred addresses. Yet efficient and scalable random access is essential given DNA's large storage capacity. Current solutions for accessing specific data on DNA introduce several biochemical complications that significantly hinder subsequent access. In addition, they rely on a traditional storage device to hold the metadata needed to support random access on DNA. These drawbacks make implementing DNA storage today unfavorable despite DNA's remarkable advantages.
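To make the notion of an invalid sequence concrete, the sketch below checks two constraints that are commonly cited in the DNA storage literature: a limit on homopolymer runs (repeats of the same nucleotide) and a balanced GC content. The abstract does not name the constraints used in the thesis, so both the choice of constraints and the thresholds (`max_run=3`, GC content between 40% and 60%) are assumptions made for illustration only.

```python
def gc_content(seq):
    """Fraction of nucleotides in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of identical consecutive nucleotides."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def is_valid(seq, max_run=3, gc_min=0.4, gc_max=0.6):
    """Check a sequence against the assumed, illustrative constraints."""
    if not seq or set(seq) - set("ACGT"):
        return False  # empty input or characters outside the DNA alphabet
    return (longest_homopolymer(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)


if __name__ == "__main__":
    for candidate in ("ACGTACGT", "AAAAACGT", "ATATATAT"):
        print(candidate, is_valid(candidate))
```

Here `ACGTACGT` passes both checks, `AAAAACGT` fails because of a run of five identical nucleotides, and `ATATATAT` fails because it contains no G or C.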
This thesis studies the abovementioned drawbacks and aims to provide practical and efficient solutions for using DNA as storage. In particular, it focuses on novel methods for randomly accessing data on DNA. The proposed techniques enlarge the address space from a few hundred to several billion addresses, enabling large-scale random access on DNA. Several optimization steps are included so that the resulting DNA codes fulfill the necessary biochemical constraints. Moreover, this thesis introduces the first method to efficiently support content-based queries on DNA: data objects can be accessed based on their content, a considerable improvement over previous methods. The proposed techniques are further extended to support more generic queries, such as range queries, providing rich query support on DNA. Supporting a wide range of queries addresses the pressing need for random access capabilities on DNA and significantly reduces the associated costs. Targeted information can be extracted from DNA at fine granularity, reducing the overhead of the costly DNA reading process. Crucially, such complex queries are supported without relying on additional storage on a traditional storage device.
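For a sense of scale, the sketch below maps integer addresses to fixed-length nucleotide strings using a plain base-4 encoding. This is not the coding scheme of the thesis, which additionally enforces biochemical constraints; it only illustrates that a 16-nucleotide address (a length chosen here as an assumption) offers 4^16, or roughly 4.3 billion, raw combinations as an unconstrained upper bound. All names are illustrative.

```python
BASES = "ACGT"  # digit 0 -> A, 1 -> C, 2 -> G, 3 -> T (arbitrary assignment)


def address_to_dna(address, length):
    """Encode a non-negative integer as a base-4 string of nucleotides."""
    if not 0 <= address < 4 ** length:
        raise ValueError("address does not fit in the given length")
    digits = []
    for _ in range(length):
        address, digit = divmod(address, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))  # most significant digit first


def dna_to_address(seq):
    """Decode a nucleotide string produced by address_to_dna."""
    address = 0
    for base in seq:
        address = address * 4 + BASES.index(base)
    return address


if __name__ == "__main__":
    print(address_to_dna(27, 4))  # prints "ACGT"
    print(4 ** 16)                # prints 4294967296
```

Many of these raw sequences would violate biochemical constraints, so the usable address space of any constrained code is smaller than this bound for the same length.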
Furthermore, this thesis introduces DNAContainer, a uniform interface to DNA storage that provides simple put and get operations, as known from traditional storage devices. DNAContainer hides the complexity specific to DNA storage, making it more accessible, even for non-experts. Internally, DNAContainer implements the introduced enhancements and techniques and thus offers an efficient and scalable approach to random access. It further provides a linear virtual address space, simplifying data management on DNA. Hence, it facilitates the seamless integration of DNA storage into the storage hierarchy of a database management system. Moreover, it implements data structures on DNA, such as arrays and lists, further enhancing data management.
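The idea of put and get operations over a linear virtual address space can be sketched as follows. This is a hypothetical in-memory toy, not DNAContainer's actual API: the class name `ToyContainer`, the method signatures, and the naive encoding of two bits per nucleotide are all assumptions, and the "strands" are plain Python strings held in a dictionary.

```python
BASES = "ACGT"  # two bits per nucleotide: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data):
    """Naively encode each byte as four nucleotides (no constraints applied)."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq):
    """Inverse of bytes_to_dna."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


class ToyContainer:
    """Hypothetical stand-in for a put/get interface over virtual addresses."""

    def __init__(self):
        self._strands = {}  # virtual address -> simulated DNA strand

    def put(self, address, data):
        if address < 0:
            raise ValueError("addresses are non-negative integers")
        self._strands[address] = bytes_to_dna(data)

    def get(self, address):
        # Raises KeyError if nothing was stored at this address.
        return dna_to_bytes(self._strands[address])


if __name__ == "__main__":
    store = ToyContainer()
    store.put(0, b"hello")
    print(store.get(0))  # prints b'hello'
```

The caller only sees integer addresses and byte strings; everything about nucleotides stays behind the interface, which is the kind of separation the paragraph above describes.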
In summary, this thesis provides novel solutions to implement and enhance random access capabilities on DNA. The results demonstrate the tremendous scalability of the introduced methods, which allow up to several billion objects on DNA to be addressed and accessed. The extended query support further improves random access capabilities on DNA and reduces the associated cost. Finally, DNAContainer provides a standard interface with a virtual address space, lowering the hurdle for using DNA as storage.
Metadata
Dates
Created: 2024
Issued: 2025-05-15
Updated: 2025-05-15
Faculty
Fachbereich Mathematik und Informatik
Language
eng
Data types
DoctoralThesis
DFG-subjects
DNA storage
Storage system
Random access
DNA
DDC-Numbers
004
El-Shaikh, Alex (M.Sc.) (0000-0001-6276-4020): Implementing random access on DNA data storage systems. Philipps-Universität Marburg, 2025-05-15. DOI: https://doi.org/10.17192/z2024.0250.