Do not flatter groups who are trying to sell unverifiable systems. Do not promote or trust them

Richard K Collins, September 15, 2024

Thomas Wolf @Thom_Wolf still quite crazy how Anthropic leap-frogged everyone when they released Sonnet 3.5 a few months ago

a model completely ahead of its time. we still have zero idea what special tricks they used to train it

source: ARC-AGI evaluation of the new openai o1 https://arcprize.org/blog/openai-o1-results-arc-prize https://pic.x.com/y0xpnp6dv0
Replying to @Thom_Wolf


Thomas, a group that makes something “better” is absolutely useless if its method cannot be understood, improved, or verified. For global human society, hidden methods are not just dangerous and unreliable; they are also usually unfair. If they simply learned how to game your benchmarks, that is mostly useless, since your benchmarks do not include any of the world’s real systems and needs.

For the most part, none of these AIs can be trusted with human lives. None of these AIs can trace their raw training data. None of these AIs have training data that is open and accessible. The data being used is not representative of what the human species knows, values, and uses.

I have spent 26 years, every day, looking at what groups use the Internet for, what humans face, and what groups actually do, so perhaps you ought to listen. I started working on artificial intelligence around 1966 and have been working on the core issues for many decades. What it is used for and how it is used at global scale matters. How it is implemented on the Internet matters.

Make sure that all the AI groups are encoding their raw data in a lossless format. Making statistical projections from data is useful, but verification requires lossless and globally accessible information. To be fair to 8.2 billion humans (5.4 billion with some access to the Internet and 2.8 billion without), all human languages have to be covered. If you cannot explain things and share them with everyone, you are likely all going in the wrong direction.

By “global open tokens” I mean codes for all concepts and common things, in a dataset that gives the equivalent in every human language. Encode the entire Internet. Use the encoding so Google does not have a monopoly on search. Use the encoding so every site is pre-encoded to go into AI raw datasets, and so every reply or action of the AIs can be traced and verified back to the specific raw data, evidence, people, situation, and background that created the raw data in the first place.
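
To make the idea of a global open token a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a specification: the names Source, GlobalToken, TokenRegistry, and verify, the code format "GOT:water", and the example URL are all hypothetical. It shows one shared code per concept, labels for that concept in several languages, and pointers back to losslessly stored raw records, identified here by a SHA-256 hash (one possible way to check a copy, assumed for illustration), so that any use of a token can be traced and checked.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """Hypothetical pointer back to one raw record a token was derived from."""
    url: str     # where the lossless raw record is stored (placeholder)
    sha256: str  # hash of the raw bytes, so anyone can check the copy is unchanged


@dataclass
class GlobalToken:
    """Hypothetical global open token: one code for one concept in every language."""
    code: str
    labels: dict[str, str] = field(default_factory=dict)  # language tag -> word
    sources: list[Source] = field(default_factory=list)


class TokenRegistry:
    """Maps words in any language to a shared code, and codes back to raw sources."""

    def __init__(self) -> None:
        self._by_code: dict[str, GlobalToken] = {}
        self._by_label: dict[tuple[str, str], str] = {}

    def add(self, token: GlobalToken) -> None:
        self._by_code[token.code] = token
        for lang, word in token.labels.items():
            # casefold() gives case-insensitive lookup, including non-Latin scripts
            self._by_label[(lang, word.casefold())] = token.code

    def encode(self, lang: str, word: str) -> str | None:
        """Return the shared code for a word, or None if it is not registered."""
        return self._by_label.get((lang, word.casefold()))

    def trace(self, code: str) -> list[Source]:
        """Return the raw-data sources recorded behind a code."""
        return list(self._by_code[code].sources)


def verify(raw: bytes, source: Source) -> bool:
    """Check that a retrieved raw record matches the hash recorded for it."""
    return hashlib.sha256(raw).hexdigest() == source.sha256


if __name__ == "__main__":
    raw = b"raw record describing water"  # placeholder raw data
    src = Source("https://example.org/raw/0001", hashlib.sha256(raw).hexdigest())
    registry = TokenRegistry()
    registry.add(GlobalToken("GOT:water", {"en": "water", "es": "agua", "fr": "eau"}, [src]))

    print(registry.encode("es", "Agua"))                 # GOT:water
    print(registry.encode("en", "fire"))                 # None
    print(verify(raw, registry.trace("GOT:water")[0]))   # True
```

The point of the design is that "water", "agua", and "eau" all resolve to the same code, and that code carries its evidence with it. A real system would need far more (units, dates, places, people, versions), but the pattern of shared code plus traceable, checkable source is the same.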

Most of the “data” on the Internet is secondary or derived. Much is badly copied. Much has been recast into “yet another human attempt to say things”, and much is “yet another group trying to sell something”. The data now going into all of the AIs is highly biased, very, very narrow in focus, and generally not traceable. I had to look at all formats, all groups, all countries, all occupations, all human situations, all domain-specific languages, and all ways of looking at things just to see where the Internet is failing. HuggingFace promoting “garbage in, garbage out, untraceable chatty stuff” will not help countries, industries, global issues, global systemic problems, or global opportunities.

There are many instances where arbitrary tokens for units and dimensions, place names, dates, numbers (particularly scientific notation), datasets, equations, computer algorithms, translations, and things generated by “AIs” are creating responses that are always wrong.

A nuclear power plant is not a toy; its design cannot be trusted to the toy AIs coming out now. The same holds for a rocket or plane, a factory or refinery, a global just-in-time warehousing system, a global education system, a global set of country management systems, a global set of city management systems, and a global set of environmental datasets, models, and plans.

Your group knows what to do, in many cases, but the HuggingFace community as a whole is not working consciously, deliberately, and seriously enough. People are dying in the world, and they should not be. People are living miserable lives, or wasting entire lives, because they cannot get to reliable information. 8.2 billion people is a huge responsibility. If HuggingFace is just playing games, it is abdicating its future responsibilities.

If a group like Anthropic refuses to work in a way that is open, verifiable, auditable, reliable, and trustworthy, so that its work cannot be independently tested, then simply ignore it and figure things out yourself. Definitely do not keep saying “Anthropic is wonderful, a leader, really a leap forward” if its systems cannot be verified or trusted with real-world problems. What they are doing (or not doing) is dangerous for the world. Do not flatter them, and do not follow them, no matter how attractive or heavily promoted they are.

Many countries are at war now. Hundreds of millions of people are just barely getting by. The whole notion of “true AIs” is supposed to be that intelligent algorithms running on computers and global systems will save human brain labor and improve the world’s memory of how things work, what is happening, where to find things, who is doing what, what to avoid, where to put effort, and where to invest your time, money, and attention.

Do it consciously and deliberately. Do not just “see what happens if we do this or that”.

Most of the information in the world, much of it created “for the good of all”, is locked behind copyright, patents, or trade secrets, or is simply hidden. Those mechanisms restrict the input that AIs can use, since AI groups are often legally bound not to include copyrighted materials. Governments and nonprofits pay for research, and it is supposed to be shared with all humans. But it gets locked up by groups and used for the benefit of a few.

As a practical matter, that means AI datasets are not supposed to contain anything in books, papers, blogs, videos, or other shared knowledge that is on the Internet but is not supposed to be used for commercial purposes or sold as someone else’s product, even though that material is critical for the smooth and safe operation of global systems.

You likely have people in the HuggingFace and AI community who know some of that. But they are not working together globally to craft AIs that understand it all, and that can be verified, checked, corrected, and adapted as the world changes.

The AI community can work openly to encourage groups to share. HuggingFace and the global AI groups working toward “true intelligent systems” can work to ensure they have open data from all Internet users that can be used for the good of all. That also means ostracizing those who try to hoard their methods or push unverifiable systems on the human species. There is too much at stake, too many lives and futures, to settle for “oh, they are just learning, let them do what they want”.

You can set higher standards for behavior. You can encourage all the AI groups to work with the sources of the information to track it faithfully and make it verifiable and useful to all. Help the creators of knowledge to do it right and do it efficiently. You become stewards of the world’s knowledge if all AIs can hold that knowledge and help all humans.

You are composing your HuggingFace and related websites on large, high-resolution monitors, but many of the future workers and AI industries are going to be in countries and places that are considered poor now. They have the greatest incentives to change the fastest and most efficiently. Make your knowledge and community accessible to all humans, in all human languages and from all backgrounds, in a compact and accessible form (so everyone has a good way to use it that is not onerous).

Stop thinking “the next 12 days” or “the next 12 months”, and think “the next 12 decades” and “the next 12 millennia”.

Richard Collins, The Internet Foundation


