Many fairly random and chaotic looking things are often nothing surprising at all.

https://x.com/keenanisalive/status/1866251675440460234


Take any sequence and count its unique values; the probability distribution (as counts) tells you how many unique tokens are needed to store it losslessly. A log function fits for estimation, but so does the tail of a Poisson or normal distribution.
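A minimal sketch of that counting idea, in Python (the sequence here is only a made-up placeholder): count the unique values, turn the counts into an empirical distribution, and sum -p*log2(p) over the symbols to estimate the bits an ideal coder would need to store the whole sequence losslessly.

from collections import Counter
from math import log2

def lossless_size_estimate(seq):
    # Count occurrences of each unique value in the sequence
    counts = Counter(seq)
    n = len(seq)
    # Empirical probability of each unique value (count / total)
    probs = [c / n for c in counts.values()]
    # Ideal code length in bits per symbol, then total bits for the sequence
    bits_per_symbol = -sum(p * log2(p) for p in probs)
    return len(counts), n * bits_per_symbol

# Example with an invented sequence
unique, total_bits = lossless_size_estimate("AABABBBCAACAB")
print(unique, "unique values, about", round(total_bits), "bits with an ideal coder")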

I just keep all the data now and only look at statistics when I do not have the lossless original data. Much of STEMCFQ (science technology engineering mathematics computing finance quantitative_things) on the Internet uses analytic expressions from a past where it was hard to memorize whole tables of data.
 
To me it is simply a “compression algorithm” because of the finite and incomplete memory of humans (and other species). The more unique things there are to manage, the larger the memory required. If there are many that are all alike, but arriving more or less randomly, there is a cost to manage one’s “almost all alike”, and if they are normally distributed on another dimension, a mean and standard deviation and count is often useful.
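As a sketch of that “summary instead of the raw table” idea (the measurements below are invented): for values that really are roughly normal, keeping the count, mean, and standard deviation is a drastic compression, at the cost of losing the individual values.

import statistics

# Invented measurements that are roughly alike and roughly normal
values = [9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2, 10.0, 9.9, 10.1]

# The compressed summary: three numbers instead of the whole list
summary = (len(values), statistics.mean(values), statistics.stdev(values))
print("count=%d mean=%.3f stdev=%.3f" % summary)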
 
The arguments over entropy come from the lack of precision and care in how it is used. It sounds really magical, so it gets trotted out in many places on the Internet. As though anyone actually uses it.
 
The only place it has a specific meaning is in thermodynamics, where any ratio of energy to temperature (Joules/Kelvin) is an “entropy”. Where I see it used, it is simply to avoid doing the detailed statistics, working with temperatures as proxies for energy contained or used or absorbed. Other than as a unit, which does have meaning if people do a good job of tracking from the raw data to what gets put on the Internet, I would generally avoid using it at all.
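One hedged illustration of that strict thermodynamic sense (the numbers are invented, and this assumes heat transferred reversibly at a constant temperature): the entropy change is just the energy divided by the absolute temperature, in Joules per Kelvin.

# Assumed: 500 Joules of heat transferred reversibly at a constant 300 Kelvin
Q_joules = 500.0   # heat transferred (J)
T_kelvin = 300.0   # absolute temperature (K)

delta_S = Q_joules / T_kelvin   # entropy change in J/K
print("Entropy change: %.3f J/K" % delta_S)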

As for surprise, that has to do with how much you have to memorize to not be surprised. The same training data (for machine learning) can be compressed significantly, and if you can model relations between unique entities in the data, fairly random and chaotic looking things take on meaning, become controllable, and are nothing surprising at all.
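A sketch of “surprise” in that sense (the tiny text and the unigram model are only stand-ins): a model fitted to the data assigns each token a probability, and -log2(p) is the surprise in bits per token; the better the model captures the relations in the data, the smaller the surprise and the smaller the compressed size.

from collections import Counter
from math import log2

data = "the cat sat on the mat the cat sat".split()

# "Memorize" the data with a simple unigram model (token frequencies)
counts = Counter(data)
n = len(data)

# Surprise of each token under the fitted model, in bits
for token in ["the", "cat", "mat"]:
    p = counts[token] / n
    print("%-4s p=%.2f surprise=%.2f bits" % (token, p, -log2(p)))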

Temperature is the problem because it does not directly relate to power or energy except where people are very careful and precise, and that care and precision is rare on the Internet. If anyone saying “entropy” wants to be precise, they need to show how they measure and define energy and power, and the spatial and temporal data, in their systems. None of this is really hard; it is just tedious and takes great care.

(“entropy”) on Google has 120 million results. That is NOT a concise and consistent global resource term. It is not hard to fix, just voluminous and tedious, and no one gives a hoot. And you want to talk it out in a “limited to mostly text” chat not capable of sharing mathematical models and real data? Better to work on “poverty”, “knowledge for all”, or “solar system exploration and development”.
 
 
Richard Collins, The Internet Foundation

