I just skimmed it, but it’s starting with a totally nonsensical basis for calculation. For example,
“In fact, the entropy of English is only ∼ 1 bit per character.”
Um, so each character is just 0 or 1 meaning there are only two characters in the English language? You can’t reduce it like that.
I mean just the headline is nonsensical. 10 bits per second? I mean a second is a really long time. So even if we accept their hypothesis that a single character is a bit, we can only consider 10 unique characters in a second? I can read a whole sentence with more than ten words, much less characters, in a second while also retaining what music I was listening to, what color the page was, how hot it was in the room, how itchy my clothes were, and how thirsty I was during that second if I pay attention to all of those things.
This is all nonsense.
I think the fundamental issue is that you’re assuming entropy in information theory means the size of the uncompressed data, when it actually refers to the amount of data left after ideal/perfect compression.
“Um, so each character is just 0 or 1 meaning there are only two characters in the English language? You can’t reduce it like that.”
There are only 26 letters in the English alphabet, so fitting in a meaningful character space takes no more than 5 bits (2^5 = 32). Morse code, for example, encodes each letter in at most 4 dots or dashes, so roughly 4 bits at most (the most common letters use fewer, and the longest use 4). A typical sentence will reduce down to an average of 2-3 dots or dashes per letter, plus the pause between letters.
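As a rough check on those Morse numbers, here is a minimal Python sketch. It counts each dot or dash as one symbol and ignores the pauses between letters, so it is only an approximation of the real cost. The sample sentence and the helper name `avg_morse_symbols` are made up for illustration.

```python
import math

# International Morse code for the 26 letters (dot = ".", dash = "-").
MORSE = {
    "a": ".-",   "b": "-...", "c": "-.-.", "d": "-..",  "e": ".",
    "f": "..-.", "g": "--.",  "h": "....", "i": "..",   "j": ".---",
    "k": "-.-",  "l": ".-..", "m": "--",   "n": "-.",   "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.",  "s": "...",  "t": "-",
    "u": "..-",  "v": "...-", "w": ".--",  "x": "-..-", "y": "-.--",
    "z": "--..",
}

# Bits needed by a fixed-length code that can tell 26 letters apart.
FIXED_BITS = math.ceil(math.log2(len(MORSE)))


def avg_morse_symbols(text: str) -> float:
    """Average dots/dashes per letter; non-letters and pauses are ignored."""
    letters = [ch for ch in text.lower() if ch in MORSE]
    if not letters:
        raise ValueError("text contains no letters a-z")
    return sum(len(MORSE[ch]) for ch in letters) / len(letters)


if __name__ == "__main__":
    # Hypothetical sample sentence, chosen only for illustration.
    sample = "the entropy of english is only about one bit per character"
    print(f"fixed-length code: {FIXED_BITS} bits per letter")
    print(f"Morse average: {avg_morse_symbols(sample):.2f} symbols per letter")
```

The average lands between 2 and 3 for this sentence because common letters like E and T get the shortest codes, which is the same trick any variable-length encoding relies on.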
And because the distribution of letters in any given English text is nonuniform, each letter carries less information than the bits it takes to strictly encode things letter by letter with a fixed-length code. You can assign values to whole words and get really efficient that way, especially using variable-length encoding for the more common ideas or combinations.
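To put a number on the nonuniform-distribution point, here is a small Python sketch that computes Shannon entropy from single-letter frequencies. The function name `letter_entropy` and the sample text are assumptions for illustration. Note that this only looks at one letter at a time, so it overestimates the real figure: the roughly 1 bit per character estimate depends on context across letters and words, which this sketch does not model.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy in bits per letter, using single-letter frequencies only."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        raise ValueError("text contains no letters a-z")
    counts = Counter(letters)
    total = len(letters)
    # H = -sum(p * log2(p)) over the letters that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical sample; a larger corpus gives a more stable estimate.
    sample = (
        "You can assign values to whole words and get really efficient "
        "that way, especially for the more common ideas or combinations."
    )
    print(f"uniform over 26 letters: {math.log2(26):.2f} bits per letter")
    print(f"sample letter entropy: {letter_entropy(sample):.2f} bits per letter")
```

Any text whose letters are not all equally frequent comes out below log2(26) ≈ 4.7 bits, and accounting for longer context pushes the estimate lower still.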
If you scour the world of English text, the 15-character string of “Abraham Lincoln” will be far more common than even the 3-letter string of “xqj,” so lots of those multi-character expressions convey far fewer bits of entropy than their length suggests. So it might be that it takes someone longer to memorize a truly random 10-character string, including case sensitivity and symbols and numbers, than it would to memorize a 100-character sentence that actually carries meaning.
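A back-of-the-envelope comparison of those two cases, as a Python sketch. Both inputs are assumptions: the random string is drawn uniformly from the 94 printable non-space ASCII characters, and the sentence is costed at the roughly 1 bit per character figure quoted earlier in the thread. The function names are hypothetical.

```python
import math


def random_string_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a string of independent, uniformly drawn characters."""
    return length * math.log2(alphabet_size)


def english_text_bits(length: int, bits_per_char: float = 1.0) -> float:
    """Estimated entropy in bits of English text at an assumed per-character rate."""
    return length * bits_per_char


if __name__ == "__main__":
    # Assumption: 26 lowercase + 26 uppercase + 10 digits + 32 symbols = 94.
    random_bits = random_string_bits(10, 94)
    sentence_bits = english_text_bits(100)
    print(f"10 random characters: {random_bits:.1f} bits")
    print(f"100-character sentence: {sentence_bits:.1f} bits")
```

Under these assumptions the two land in the same ballpark, about 66 bits versus about 100 bits, even though one string is ten times longer than the other. A sentence built from already-familiar ideas could plausibly cost the reader less than that.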
Finally, once you actually get to reading and understanding, you’re not meticulously remembering literally every character. Your brain is preprocessing some stuff and discarding details without actually consciously incorporating them into the reading. Sometimes we glide past typos. Or we make assumptions (whether correct or not). Sometimes when tasked with counting basketball passes we totally miss that there was a gorilla in the video. The actual conscious thinking discards quite a bit of the information as it is received.
You can tell the difference when you’re reading something that is within your own existing knowledge: it’s much faster to read than something that is entirely new, on a totally novel subject that you have no background in. Your sense of recall is going to be less accurate with that stuff, or you’re going to significantly slow down how you read it.
“I can read a whole sentence with more than ten words, much less characters, in a second while also retaining what music I was listening to, what color the page was, how hot it was in the room, how itchy my clothes were, and how thirsty I was during that second if I pay attention to all of those things.”
If you’re preparing to be tested on the recall of each and every one of those things, you’re going to find yourself reading a lot slower. You can read the entire reading passage but be totally unprepared for questions like “how many times did the word ‘the’ appear in the passage?” And that’s because the way you actually read and understand is going to involve discarding many, many bits of information that don’t make it past the filter your brain puts up for that task.
For some people, memorizing the sentence “Linus Torvalds wrote the first version of the Linux kernel in 1991 while he was a student at the University of Helsinki” is trivial and can be done in a second or two. Many others, who might not have the background to know what the sentence means, might struggle to parrot back that idea without studying it for at least 10-15 seconds. And the results might be flipped for different people on another sentence, like “Brooks Nader repurposes engagement ring from ex, buys 9-carat ‘divorce ring’ amid Gleb Savchenko romance.”
The fact is, most of what we read is already familiar in some way. That means we’re actually processing less information than we’re taking in, and discarding a huge chunk of what we perceive before it reaches what we actually think. And when we encounter things that we didn’t necessarily expect, we slow down or we misremember things.
So I can see how the 10 bits per second number comes into play. It cited various studies showing that image/object recognition tends to operate in the high 30s of bits per second, and that many memorization or video game playing tasks involve processing in the range of 5-10 bits per second. Our brains are just highly optimized for image processing and language processing, so I’d expect those tasks to be higher performance than other domains.