In theory, an A4 sheet of paper should be able to hold about 4.18 MB of binary data if completely filled with a B&W pattern of 1/600 inch dots. If the CMYK colors are permitted, that bumps us up to 5 symbols per dot instead of two (the four inks plus white), which is a little over 2 bits per dot, giving us at least 8.4 MB per sheet. A stack of 500 sheets would then be able to hold about 4.2 GB of data, with no error correction. Why this exercise? I wanted to see if paper could actually work as an archival medium for digital files.
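A quick way to sanity-check these figures, as a rough Python sketch. It assumes the whole A4 sheet is printable with no margins, one symbol per 1/600 inch dot, and log2(symbols) bits per dot. The function names are made up for illustration.

```python
import math

A4_INCHES = (210 / 25.4, 297 / 25.4)  # A4 is 210 mm x 297 mm
DPI = 600                              # one dot per 1/600 inch


def dots_per_sheet(size_inches=A4_INCHES, dpi=DPI):
    """Number of whole dots that fit on one side of a sheet (no margins)."""
    width, height = size_inches
    return int(width * dpi) * int(height * dpi)


def sheet_capacity_bytes(symbols_per_dot, size_inches=A4_INCHES, dpi=DPI):
    """Raw capacity in bytes for one side, with no error correction."""
    bits_per_dot = math.log2(symbols_per_dot)
    return dots_per_sheet(size_inches, dpi) * bits_per_dot / 8


for symbols in (2, 4, 5):
    per_sheet = sheet_capacity_bytes(symbols)
    print(f"{symbols} symbols/dot: {per_sheet / 2**20:.2f} MiB per sheet, "
          f"{500 * per_sheet / 2**30:.2f} GiB per 500 sheets")
```

Under these assumptions, binary dots come out a little over 4 MiB per sheet. Four symbols (2 bits per dot) exactly double that, and five symbols give log2(5) ≈ 2.32 bits per dot, so slightly more than double.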
@aaronscientiae You'd have to make a few assumptions. My example assumes that high-resolution digital scanners are available, together with knowledge of symbolic encoding of binary numbers, and of course, the ability to decode the data itself after it has been recovered. I'm not so much assuming an apocalyptic scenario as assuming that people will continue to be familiar with the science of past centuries, much like we are.
@aaronscientiae I'm not using any sort of theory that wasn't already discovered and documented on paper in the 20th century: basic information theory, and symbol encoding that's been around since the days of Morse code. File formats are a trickier business, but I reckon ample sources on things like HTML and UTF-8 (and definitely ASCII) will be available, since that stuff is so widely used, and plenty of code tables will exist in printed form for that.
@aaronscientiae Copies of the JFIF format specification will probably also be easy to find, so heavily compressed images are probably fine. Again, if we aren't thinking post-apocalyptic here, the main job is to keep the medium physically intact. Historians and archivists will be able to build machines for decoding old physical formats so long as they're documented.
@thor You'd also need some way of recovering the page order from a pile of randomly retrieved pages, and of knowing which are missing. Cool thought experiment to distract me from Xmas dip!
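One possible way to handle page order, sketched in Python: give every page a small header with its index, the total page count, and a checksum of its payload. The header fields and function names here are assumptions for illustration, not part of any existing format.

```python
import random
import zlib


def make_pages(data: bytes, page_size: int):
    """Split data into pages, each carrying its index, the total, and a CRC-32."""
    chunks = [data[i:i + page_size] for i in range(0, len(data), page_size)]
    return [
        {"index": i, "total": len(chunks), "crc32": zlib.crc32(c), "payload": c}
        for i, c in enumerate(chunks)
    ]


def reassemble(pages):
    """Return ({index: payload} for intact pages, sorted list of missing indices)."""
    if not pages:
        return {}, []
    total = pages[0]["total"]
    good = {
        p["index"]: p["payload"]
        for p in pages
        if zlib.crc32(p["payload"]) == p["crc32"]  # drop pages that fail the check
    }
    missing = [i for i in range(total) if i not in good]
    return good, missing


# Demo: shuffle the stack, lose one page, then sort it out again.
pages = make_pages(bytes(range(200)), page_size=64)
stack = [p for p in pages if p["index"] != 2]
random.shuffle(stack)
found, missing = reassemble(stack)
print("pages recovered:", sorted(found), "missing:", missing)
```

Repeating the total on every page means any single surviving page tells you how many pages to look for.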
@thor A large stack of paper equating to a DVD is not bad! Though you'd probably lose 20-30% to error correction, depending on how robust you want it to be. Does this include front and back?
CMYK could get you many more than 5 symbols if you wanted it to, and I feel that 1/600 inch dots might not be easy/cheap to print (or scan) reliably, but it's still a cool idea.
@icefox I'm assuming that 1/600 inch is the smallest practical dot size. Thus, all dots must be uniform. CMYK is 4 colors. White is the 5th symbol. Mixing colors by overlapping them is possible, but you risk making it harder to decode, especially if the paper or toner degrades. Keeping the color symbols few and far apart on the symbol cube reduces the chance of errors. I'm assuming one side of a sheet. With both sides, you could double that figure.
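A minimal sketch of how bytes could be mapped onto five dot symbols, in Python. The letters W, C, M, Y, K and the 3-bytes-per-11-dots block size are choices made for this example only. Since 5**11 is at least 2**24, eleven five-valued dots are enough to hold three bytes.

```python
SYMBOLS = "WCMYK"   # white (bare paper), cyan, magenta, yellow, black
BLOCK_BYTES = 3     # 24 bits per block
BLOCK_DOTS = 11     # 5**11 = 48_828_125 >= 2**24 = 16_777_216


def encode(data: bytes) -> str:
    """Turn bytes into a string of dot symbols, 11 dots per 3 bytes."""
    padded = data + bytes(-len(data) % BLOCK_BYTES)  # zero-pad the last block
    dots = []
    for i in range(0, len(padded), BLOCK_BYTES):
        value = int.from_bytes(padded[i:i + BLOCK_BYTES], "big")
        block = []
        for _ in range(BLOCK_DOTS):          # write the value in base 5
            value, digit = divmod(value, 5)
            block.append(SYMBOLS[digit])
        dots.append("".join(reversed(block)))
    return "".join(dots)


def decode(dots: str, length: int) -> bytes:
    """Inverse of encode; `length` is the original byte count (strips padding)."""
    out = bytearray()
    for i in range(0, len(dots), BLOCK_DOTS):
        value = 0
        for symbol in dots[i:i + BLOCK_DOTS]:
            value = value * 5 + SYMBOLS.index(symbol)
        out += value.to_bytes(BLOCK_BYTES, "big")
    return bytes(out[:length])


message = b"paper archive"
dots = encode(message)
print(dots)
print(decode(dots, len(message)) == message)
```

This packs 24 bits into 11 dots, about 2.18 bits per dot, a bit short of the log2(5) ≈ 2.32 limit. Larger blocks get closer to the limit at the cost of a damaged dot affecting more bytes.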
@ninjawedding @thor nope but that's awesome.
@icefox @ninjawedding His scanner looks kind of blurry, but yeah, that's similar. There are some sophisticated algorithms out there that can not only detect errors, but correct them by reconstructing the data that fits the parity bits or checksums. There are also tricks you can use, like spreading your data in a quasi-random pattern to spread the impact of a damaged area across many single bits instead of a large chunk of data.
@ninjawedding @icefox I can imagine that if you use such spreading and pair it with error correction, you could tear off a corner of the paper and still successfully recover all the data on it, because the discontinuity is distributed as random missing bits across the data, and that's much easier to deal with than a large missing chunk.
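A toy Python sketch of spreading plus error correction. Several things here are stand-ins chosen for brevity: a Hamming(7,4) code instead of a stronger code, a simple block interleaver instead of a quasi-random pattern, and damage modelled as flipped bits rather than missing ones. All function names are hypothetical.

```python
def hamming_encode(nibble):
    """Hamming(7,4): 4 data bits -> 7 bits, laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = nibble
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def hamming_decode(word):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 1-based index of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def interleave(codewords):
    """Emit bit 0 of every codeword, then bit 1 of every codeword, and so on."""
    return [cw[i] for i in range(7) for cw in codewords]


def deinterleave(bits, count):
    """Undo interleave() for `count` codewords."""
    return [[bits[i * count + k] for i in range(7)] for k in range(count)]


def to_nibbles(data):
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return [bits[i:i + 4] for i in range(0, len(bits), 4)]


def from_nibbles(nibbles):
    bits = [b for nibble in nibbles for b in nibble]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


message = b"paper archive"
page = interleave([hamming_encode(n) for n in to_nibbles(message)])
count = len(page) // 7

damaged = page[:]
for i in range(40, 40 + count):   # one contiguous damaged region
    damaged[i] ^= 1

recovered = from_nibbles([hamming_decode(w) for w in deinterleave(damaged, count)])
print(recovered == message)
```

The contiguous burst lands as exactly one flipped bit in each codeword after deinterleaving, which is the most this small code can correct. The same burst applied to codewords stored back to back would wipe out several whole codewords.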
@ninjawedding @icefox Reed-Solomon coding
@icefox To get variable dot sizes at 600 raster dots per inch, the printer must have more than 600 pixels per inch of resolution. I'm thinking that measuring dot sizes is tricky, especially if you are clustering and partially overlapping them, like a typical color laser printer will do with images. You need to bypass the dot raster system and make a uniform grid of non-overlapping single-color squares.
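A small Python sketch of what a uniform grid of non-overlapping single-color squares could look like as a raster, with no halftoning or dot clustering. The RGB values are uncalibrated guesses, and `render_grid`, `write_ppm`, and the `cell` parameter (device pixels per symbol edge) are names invented for this example.

```python
# Assumed on-screen colors for the five symbols; real toner will differ.
PALETTE = {
    "W": (255, 255, 255),
    "C": (0, 255, 255),
    "M": (255, 0, 255),
    "Y": (255, 255, 0),
    "K": (0, 0, 0),
}


def render_grid(symbols: str, columns: int, cell: int = 1):
    """Return rows of RGB pixels, one solid cell x cell square per symbol."""
    rows = []
    for start in range(0, len(symbols), columns):
        line = symbols[start:start + columns].ljust(columns, "W")  # pad with white
        pixel_row = [PALETTE[s] for s in line for _ in range(cell)]
        rows.extend(list(pixel_row) for _ in range(cell))
    return rows


def write_ppm(path: str, rows) -> None:
    """Save the raster as a binary PPM (P6) image."""
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (len(rows[0]), len(rows)))
        for row in rows:
            f.write(bytes(value for pixel in row for value in pixel))


grid = render_grid("CMYKWKYMC", columns=3, cell=4)
print(len(grid), "x", len(grid[0]), "pixels")
```

Whether a given printer driver will reproduce such a raster dot for dot, without applying its own screening, is a separate question that this sketch does not address.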
@thor I wasn't thinking about variable dot sizes, just colors. I was just wondering whether 600 dpi might be too small to reliably read/write with cheap equipment.
@thor ...do you know what the data density of microfilm is like? Assuming just text, I suppose.
@thor Interesting. I've encoded files in Base64 and printed them before (works well for really small files if you use either an unambiguous font & manual typing or a scanner with good OCR), but this would definitely allow for much greater data density.
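For comparison, a minimal Python sketch of the Base64 route using only the standard library. The line numbers and per-line CRC-32 are an addition assumed here to help catch typing or OCR mistakes; they are not part of Base64 itself, and the function names are hypothetical.

```python
import base64
import zlib


def to_printable(data: bytes, width: int = 64) -> str:
    """Base64-encode data as numbered lines, each followed by a CRC-32 in hex."""
    text = base64.b64encode(data).decode("ascii")
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return "\n".join(
        f"{number:04d} {chunk} {zlib.crc32(chunk.encode('ascii')):08x}"
        for number, chunk in enumerate(chunks)
    )


def from_printable(page: str) -> bytes:
    """Check every line against its CRC-32, then decode the joined Base64 text."""
    chunks = []
    for row in page.splitlines():
        number, chunk, check = row.split()
        if int(check, 16) != zlib.crc32(chunk.encode("ascii")):
            raise ValueError(f"line {number} fails its check; retype or rescan it")
        chunks.append(chunk)
    return base64.b64decode("".join(chunks))


page = to_printable(b"A small file that fits on one printed page.")
print(page)
print(from_printable(page))
```

Each Base64 character carries 6 bits, so the density is limited by how many characters fit on a page rather than by dot resolution.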
@ND3JR I was thinking of another system that uses a dot matrix, namely QR codes. If you filled an entire sheet with a very dense QR-like code, and perhaps permitted extra permutations with colored CMYK dots, yeah, I imagine that would work, especially once you drop the requirement that the code be readable with a camera, and use a high-resolution scanner to read the medium instead.
You'll have to check archival quality on both the paper and the ink, or fading over time could affect data integrity.
@thor I'm trying to think of a book format that would enable maximum readability in the future and data protection.