Pickups - physics or cooking?
-
Well, my point is that if you claim to have a compressor that works on random data, you can't just pick the sequences that compress well and reject the rest. Maybe you think you can make some kind of scheme that picks out the compressible bits and sends the rest as is. (This is more or less how FLAC works.) But you can show from the statistics of random noise that any such scheme wouldn't make back its encoding overhead; it would actually make the average lump of random data bigger. Proof:
To achieve compression, a shorter sequence must be used to stand for a longer one, but if this sequence itself occurs in the data, it must be "escaped" by adding an escape character, which makes it longer. Since this second sequence is shorter, it must be more common, by the statistics of random noise. (If the symbol set were the alphabet, making a sequence one character shorter would make it appear 26 times more often.) All appearances of the escape character must also be escaped. So the overhead of escaping wipes out the gain from compressing the long sequence.
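A standard way to make the same point is the counting argument: there are 2^n distinct bit strings of length n, but only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 strings of length less than n. A lossless encoder has to be invertible, so it cannot map all 2^n inputs into that smaller set; at least one input must come out the same size or bigger, and on uniformly random input the average length cannot shrink.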
Any sequence capable of being stored in a computer must be the output of some algorithm. Maybe compression could be achieved by calculating, say, which LFSR it is the output of. However, I think this is the same kind of problem as cracking RSA.
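If you want to see the overhead rather than take it on faith, here is a quick sketch in Python using the standard-library zlib module (any general-purpose compressor behaves much the same): "compressing" random bytes actually makes them slightly bigger.
Code:
import os
import zlib

data = os.urandom(1_000_000)        # one megabyte of OS-supplied random bytes
packed = zlib.compress(data, 9)     # DEFLATE at its highest effort setting

print(len(data))                    # 1000000
print(len(packed))                  # slightly larger than the input
DEFLATE falls back to "stored" blocks when it finds nothing to match, so the penalty is small, but on incompressible input it is never a gain.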
-
Originally posted by Steve Conner View Post
Well, my point is that if you claim to have a compressor that works on random data, you can't just pick the sequences that compress well and reject the rest. [...]
Compression takes advantage of patterns in the data. A particular sample of a random process such as white noise is unlikely to be sufficiently patterned to compress significantly.
-
Originally posted by Steve Conner View Post
To achieve compression, a shorter sequence must be used to stand for a longer one
What you said is true in all cases: if the map, whatever the map is, takes more space than the data, the compression is inefficient and may even increase the size. When compressing random data with a given algorithm, you may get no compression or a lot of compression; it depends on the distribution of your random bits and on the algorithm you're employing. There are tons of algorithms, one for each purpose, and there are people in labs testing new compression algorithms for different purposes as we speak.
Originally posted by Steve Conner View Post
Any sequence capable of being stored in a computer must be the output of some algorithm.
Originally posted by Steve Conner View Post
Maybe compression could be achieved by calculating, say, which LFSR it is the output of. However, I think this is the same kind of problem as cracking RSA.
If the key were generated by your computer without true random data, the state of your computer could be duplicated by an algorithm and the key generated again elsewhere, so there would be no need to crack RSA at all; you could just use the key. That is the role of random data in any asymmetric-key system (and in every key-generation procedure of every encryption scheme that I know of).
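A sketch of that point in Python, using the deliberately non-cryptographic random module; the seed value and the derive_key helper are made up for illustration. Anyone who can duplicate the generator's state gets the same "secret" key with no cryptanalysis at all.
Code:
import random

def derive_key(seed, nbytes=16):
    # Hypothetical key generator driven only by a deterministic PRNG state
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(nbytes))

alice_key = derive_key(1234)   # "secret" key generated on one machine
eve_key = derive_key(1234)     # same state reproduced somewhere else
print(alice_key == eve_key)    # True: nothing needed cracking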
Random data is, in fact, unbreakable encryption. If you use random data as your key, and nobody has ever used that same data as a key before, then simply shifting the characters of another message by that key produces a message that can only be read by obtaining your key. That type of encryption is called a "one-time pad" and is unbreakable by any algorithm; you would need to torture the secret holder to death to get it. It was used on the "red phone" between the Kremlin and the White House, and the key tapes were transported by the military, so to break that encryption you'd need one hell of an army. Random data can be a powerful tool indeed.
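A minimal one-time-pad sketch in Python (XOR instead of character shifting, but the principle is identical). As long as the pad is truly random, at least as long as the message, and never reused, the ciphertext gives an attacker nothing to work with.
Code:
import os

def otp(data: bytes, pad: bytes) -> bytes:
    # XOR each message byte with the matching pad byte;
    # running the same function again with the same pad decrypts.
    assert len(pad) >= len(data), "the pad must be at least as long as the message"
    return bytes(d ^ k for d, k in zip(data, pad))

message = b"MOSCOW CALLING"
pad = os.urandom(len(message))    # one-time key, never to be reused
cipher = otp(message, pad)
print(otp(cipher, pad))           # b'MOSCOW CALLING'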
-
Originally posted by jmaf View Post
Cracking RSA is a mathematical problem that involves the difficulty of factoring large numbers.
The actual difficulty is better sung to "Supercalifragilisticexpialidocious"
How to Give a Math Lecture at a Party.
> Superpolynomial subexponential runtimes.
> Even though in practice it would take you several lifetimes,
> If you ran it long enough you'd always find those two primes.
> Superpolynomial subexponential runtimes
>
> E to the root-log root-log-log [4x]
...and that's just the session key, i.e., the introduction.
The actual data stream uses a different encryption method.
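For the curious, the "E to the root-log root-log-log" line matches the heuristic running time usually quoted for sieve-style factoring methods such as the quadratic sieve, roughly

exp( (1 + o(1)) * sqrt(ln n * ln ln n) )

where n is the modulus being factored: superpolynomial in the number of digits, yet subexponential. The general number field sieve improves the exponent to about (ln n)^(1/3) * (ln ln n)^(2/3).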
-
You still don't get it. There are lots of different compression algorithms, but they all work only in so far as the data is patterned, as Mike put it. Each one is tuned to take advantage of the kind of pattern in its data. Trivial example: ASCII text only uses about 70 of the 256 available codes.
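As a rough Python illustration of that point: the zeroth-order entropy of the byte histogram of English-ish text comes out well under the 8-bit ceiling, and that slack is exactly what a compressor cashes in. The sample string below is made up; any plain text behaves similarly.
Code:
import math
import os
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    # Empirical (zeroth-order) Shannon entropy of the byte histogram
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

sample = b"To achieve compression, a shorter sequence must stand for a longer one. " * 50
print(bits_per_byte(sample))               # roughly 4 bits per byte for text like this
print(bits_per_byte(os.urandom(100_000)))  # very close to the 8-bit maximum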
Lossy compression also exploits the limitations of the human senses, so the algorithms are interesting in so far as they have to mimic our own perception mechanisms. Maybe the guts of the MP3 codec have something to tell us about music theory.
But random data can't be compressed; this is as fundamental to information theory as the laws of thermodynamics are to regular engineering. The lack of patterns means that any scheme you can dream up will fail in general. Patterns can appear by chance, but the odds of any given pattern are so slim that the coding overhead outweighs the occasional gain.
The meaning of "random" is a deeper puzzle still. Shannon's information theory was concerned with distinguishing signal from noise, which is a very similar idea. The encrypted data on the Kremlin hotline would have been indistinguishable from random noise by a wiretapper, but if you have the matching key tape, it's not random at all.
-
Originally posted by Steve Conner View Post
You still don't get it. [...] But random data can't be compressed; this is as fundamental to information theory as the laws of thermodynamics are to regular engineering. [...]
The Turing quote was taken out of context and refers to generating putative random sequences by arithmetic methods. His contention was simple: purely deterministic algorithms do not generate random sequences; they make long deterministic sequences. Example: the linear congruential rand() function in Microsoft's VC++ Express release.
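To make the "long deterministic sequence" point concrete, here is a linear congruential generator sketched in Python, using the multiplier and increment constants commonly reported for Microsoft's C runtime rand() (treat the exact constants as an assumption, not a specification). Same seed in, same "random" numbers out, every time.
Code:
def lcg_rand(seed, count):
    # Linear congruential generator: state <- state*214013 + 2531011 (mod 2**32),
    # returning the top 15 bits of the state on each call.
    state = seed
    values = []
    for _ in range(count):
        state = (state * 214013 + 2531011) & 0xFFFFFFFF
        values.append((state >> 16) & 0x7FFF)
    return values

print(lcg_rand(42, 5))
print(lcg_rand(42, 5))   # identical output: deterministic, not random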
In practice, most random sequences are less random than they seem. That is, perfect randomness does not exist except as a mathematical abstraction, and it should not be assumed to be absolute except for purposes of discussion.
Data resistant to RLE, Huffman, or LZW compression will often squeeze down a little with bzip2 or a fractal algorithm. It depends on the timescale and how much computational smarts you want/need to throw at the problem.
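Here is a rough Python comparison of two of those families on the same made-up input, DEFLATE-style zlib versus block-sorting bz2; which one does better depends entirely on what kind of structure the data has, which is the whole point.
Code:
import bz2
import zlib

# Mildly structured test data: interleaved short cycles of different periods.
data = b"".join(bytes([i % 7, i % 11, i % 13]) for i in range(200_000))

print("zlib:", len(zlib.compress(data, 9)))
print("bz2: ", len(bz2.compress(data, 9)))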
Does it matter? Not at all in a pickup makers' forum.
-
OK, I agree with that, and with the doubtful sanity of starting this thread in the first place. It was useful for me, though: I bought a copy of the Harold McGee book and am loving it.
Would you accept this milder version of my original claim:
Perfectly random data would be incompressible if it existed, except it doesn't.
And the corollaries:
The closer data is to being perfectly random, the less it can be compressed by any conceivable algorithm.
The better a compression algorithm is, the harder its output is to distinguish from random data. (Unless you have the decompressor.)
In information theory terms, you could say that lossless compression must preserve the total entropy of the data, so if you want to reduce the number of bits, the entropy per bit must increase. However, perfectly random data already has the maximum possible entropy per bit.
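In symbols: a source emitting N distinct symbols with probabilities p_i has entropy

H = -sum_i p_i * log2(p_i) <= log2(N),

with equality only when every symbol is equally likely, and Shannon's source-coding theorem says no lossless code can average fewer than H bits per symbol. Uniformly random bytes sit exactly at 8 bits per byte, so there is nothing left for a compressor to shave off on average.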
I am open to suggestions to move this whole mess to the Parking Lot.
-
Originally posted by Mike Sulzer View Post
I am reminded of Freud. Starting from scratch and working with a tiny subset of highly neurotic people in a particular European subculture, he set out to explain the workings of the human mind.
OK, so I exaggerate a little bit.
The ancients knew this in the form that the lengths of strings yielding musical tones were the ratios of small integers. What they didn't know was why.
-
Originally posted by fieldwrangler View Post
What was Helmholtz's answer to this?
The long form: it gets pretty complex, especially the explanation of why some chords sound bad. But Helmholtz's theory of this seems to be standing up even in this day of computer music.
I'd suggest reading (or at least skimming) the book, but Helmholtz was a German Professor, and I'll grant the book can be a bit of a slog.
I found his experimental methods quite interesting and inventive. Remember, this was decades before the invention of the vacuum tube - there was no such thing as electronics in Helmholtz's day.
-
A message written in an alphabet of infinitely many symbols is what I think you want... If that is the case, it should be simple to prove that a message of infinite size made from an alphabet of infinitely many symbols is probably incompressible. I don't know of a source with more of that theory to point you to, though. It's not the case of white noise, images, music or text, all of which have finite alphabets.
Every message must be made from a finite set of symbols: the alphabet. Using the English alphabet, the main Twitter feed is a source of perfectly random data.
If every person on Twitter played musical notes at the same intensity, rather than writing English, it would probably sound like white noise, but I just made that up and have no proof to offer.
-
Originally posted by jmaf View Post
It's not the case of white noise, images, music or text, all of which have finite alphabets.
There are some subtleties relating to types of patterns and lossy or lossless compression, but enough of this.
-
Originally posted by Steve Conner View Post
On a related note: The more complicated jazz is, the more the hipsters like it. So why is white or pink noise not praised as the ultimate form of jazz? Being totally random, it has the maximum possible algorithmic complexity.
By contrast, music, natural languages, genome sequences, and the like, all follow 1/f statistics, also known as pink noise.
Jazz may be a paler shade of pink, but it's still pink.
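For anyone curious what that means in practice, here is a rough Voss-McCartney-style sketch in Python: summing white-noise rows that refresh at octave-spaced rates gives an approximately 1/f ("pink") spectrum, whereas a single random value per sample would be white noise. The function name and row count are arbitrary choices.
Code:
import random

def pink_noise(n_samples, n_rows=12):
    # Voss-McCartney approximation: row r is refreshed only every 2**r samples,
    # so fast-updating rows supply the treble and slow-updating rows the bass.
    rows = [random.uniform(-1.0, 1.0) for _ in range(n_rows)]
    samples = []
    for i in range(n_samples):
        for r in range(n_rows):
            if i % (2 ** r) == 0:
                rows[r] = random.uniform(-1.0, 1.0)
        samples.append(sum(rows) / n_rows)
    return samples

noise = pink_noise(44_100)   # one second's worth at 44.1 kHz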
-
Originally posted by Mike Sulzer View Post
White noise, the analog signal, has infinite bandwidth, infinite power (variance), and infinite peak power with vanishing probability.
Originally posted by Mike Sulzer View Post
but enough of this.