Global Open Tokens (GOTs): Global and Universal Tokens

Cameron R. Wolfe, Ph.D. @cwolferesearch New language models get released every day (Gemini-1.5, Gemma, Claude 3, potentially GPT-5 etc. etc.), but one component of LLMs has remained constant over the last few years—the decoder-only transformer architecture. This architecture has five components…

Why should we care?…
Replying to @cwolferesearch

Hello Cameron, With the Internet Foundation, I have worked every day for the last 25 years to trace out all knowledge and activities on the Internet. The AI groups need to make global open tokens so that references to common things use real tokens.
 
“Global open tokens” (GOTs) are essentially concepts that cross and link all human and synthetic languages. The automated translation of human languages has a core where the same concepts are expressed in many different scripts and encodings, yet mean the same ideas to many humans. With a few hundred human languages all saying the same thing, you only need translation between the local and the global; then everyone can use the global form for sharing. “man”, “woman”, “sky”, “walk”, “speed of light and gravity”. With only a few million GOTs, humans can learn and use the set and not be locked into hidden codes made up by competing financial interests. Date formats, dates, events. Names, units, and dimensions with named values (SI prefixes and widely used constants like the speed of light). Place names and event names. If you use longer pre-identified strings like “Statue of Liberty (the place)”, that is something humans can use for remembering. Human names, surnames, group names. We have these in all languages, and there are groups who work on these things globally; they just do not have the tools and conscientious AIs to help them. “Electrical engineers”, “electrical engineering”. The human names ought to at least get you in the ballpark.
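As a minimal sketch in Python, a GOT registry could map local-language aliases to one global token. The token IDs, aliases, and field names below are hypothetical illustrations, not an existing standard.

# A hypothetical registry of Global Open Tokens (GOTs).
# Token IDs, aliases, and fields are illustrative, not an existing standard.
GOT_REGISTRY = {
    "GOT:woman": {
        "aliases": {"en": ["woman"], "es": ["mujer"], "fr": ["femme"], "de": ["Frau"]},
        "kind": "concept",
    },
    "GOT:speed_of_light": {
        "aliases": {"en": ["speed of light"], "es": ["velocidad de la luz"]},
        "kind": "physical_constant",
        "value": 2.99792458e8,  # meters/second, exact by SI definition
    },
    "GOT:statue_of_liberty": {
        "aliases": {"en": ["Statue of Liberty (the place)"]},
        "kind": "place",
    },
}

def lookup(surface_form: str, lang: str) -> str | None:
    """Map a local-language string to its global token, if registered."""
    needle = surface_form.casefold()
    for token_id, entry in GOT_REGISTRY.items():
        for alias in entry["aliases"].get(lang, []):
            if alias.casefold() == needle:
                return token_id
    return None

print(lookup("mujer", "es"))  # -> GOT:woman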
 
Rather than maintain an infinite number of local “tokens”, “standards”, or “words” (corpus by corpus, collection by collection), global agreement on a few, such as “earth”, “moon”, and “sun”, gets you to the physical location and volume where most of that activity happens, and there you can create objects with all the linked parameters, data, measurements, algorithms, tools, methods, “stuff”, databases, and people. Just count the interactions and translations required. If you try to maintain millions of standards, everyone doing their own thing, those become single points of failure and manipulation (literally, humans fiddling things for their own gain).
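Counting the translations makes this concrete: N languages translated pairwise need N(N-1)/2 mappings, while N languages each mapped to one shared global form need only N. A small sketch of that arithmetic:

# Translation mappings needed: pairwise versus through one global form.
def pairwise(n: int) -> int:
    return n * (n - 1) // 2  # every language paired with every other

def via_global_tokens(n: int) -> int:
    return n  # each language maps only to and from the global form

for n in (10, 300, 7000):  # ~300 widely used languages, ~7000 living ones
    print(f"{n:>5} languages: {pairwise(n):>12,} pairwise vs {via_global_tokens(n):>5,} via global tokens")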
 
AIs with permanent memory and identities can be tracked. If a few billion AI entities (with a mix of human and beyond-human skills and experiences) are added, and they are tracked by their individually created memories, that is manageable. You give them names and identities and simply treat them as a new species. But while there will be people and AIs trying to keep track of all the flows, humans will need to use words for a while longer.
 
“Speed of light and gravity” is the simplest example I know. You can find “speed of light” and “2.99792458E8 meters/second” in many locations in those HuggingFace corpora that are getting too big. And in the real world.
 
(“speed of light”), with about 91 million entry points today, is a unique query, even if Google search will not share those results. And HuggingFace is not set up to share with all humans. But parsing that query result against whatever is stored now gives you a list of humans, organization names, dates, place names, and concepts that ought to be in human global standard form (human and machine readable) and have associated AI things attached (machine readable and human readable). If speed-of-light weather reports start, where attosecond variations are reported at each place, but some global and Mars and Moon standards are still used, keeping it all in standard tokens lets all the people and AIs involved be known, kept in community, and informed if changes are made.
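A hedged sketch of that normalization step: scan text for a few surface forms of the constant and replace them with one standard token. The token name and the regex coverage here are illustrative assumptions, not a deployed standard.

import re

SPEED_OF_LIGHT = 2.99792458e8  # meters/second, exact in SI

# Surface forms of the same constant as found in scraped text.
PATTERNS = [
    re.compile(r"speed of light", re.IGNORECASE),
    re.compile(r"2\.99792458\s*(?:[eE]\s*8|[×x]\s*10\^?8)\s*m(?:eters)?\s*/\s*s(?:econd)?"),
    re.compile(r"299[,\s]?792[,\s]?458\s*m/s"),
]

def normalize(text: str) -> str:
    """Replace recognized surface forms with a single global token."""
    for pattern in PATTERNS:
        text = pattern.sub("GOT:speed_of_light", text)
    return text

print(normalize("The speed of light is 299,792,458 m/s."))
# -> The GOT:speed_of_light is GOT:speed_of_light.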
 
This is already emerging. But when schools and organizations gather stuff and organize it, their current habit is to use it for dominance and self-enrichment. Check your own motives (make money, become famous, influence people, have power and control of things, family, nature, art, beauty, whatever).
 
I do not know you. You certainly have never heard about me. I keep repeating that 8 billion human lives and lifetime experiences could be perfectly recorded (AIs can do that now, or likely some will). At one human per second, 8E9 humans / (31,557,600 seconds per year) is 253.504702512 MeanSolarYears. And “MeanSolarYears” can have many aliases in every human language and in different Science Technology Engineering Mathematics Computing Finance Government TopicGroups IssueGroups OtherCollections.
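The arithmetic, reproduced in a few lines (31,557,600 seconds is the 365.25-day Julian year, called a MeanSolarYear above):

# Reviewing one human lifetime record per second, how long for 8 billion?
humans = 8e9
seconds_per_year = 31_557_600  # 365.25 days * 86,400 s ("MeanSolarYear" above)
years = humans / seconds_per_year
print(f"{years:.9f} MeanSolarYears")  # 253.504702512 MeanSolarYears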
 
With the Internet Foundation, I have worked every day for the last 25 years to trace out all knowledge and activities on the Internet. There is massive waste and inefficiency in all the STEMC-FGTIO areas (Science Technology Engineering Mathematics Computing Finance Government TopicGroups IssueGroups OtherCollections). It takes humans decades to memorize and master tiny bits of text material, but massive amounts of personal sensory data. Things could be done in hours that now take tens of thousands of humans decades.
 
Global Open Data for Cameras
Global Open Data for STEMC
Global Open Place Names
Global Open Human Names
Global Open Data about Armed Conflicts
Global Open Data about Emerging Famines
Global Open Names of Unique Humans
Global Open Music
Global Open Videos
Global Open Equations
 
I am not able, today, to use what will be the “words and concepts of the future” for “all heliospheric projects and activities”.
 
Take all the PDFs (and other formats) on the Internet now, and they can be globally and universally (solar system and beyond) tokenized into real tokens, not arbitrary bits and pieces of text. There can be “SI core data and methods”, where “SI” means “Standard Internet”, not “Système International”.
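One possible sketch of that tokenization, assuming a hypothetical alias table: greedy longest-match over registered phrases, so known concepts become single global tokens while unregistered words pass through.

# Retokenize plain text into global tokens by greedy longest-phrase match.
# The alias table and token IDs are hypothetical illustrations.
ALIASES = {
    "statue of liberty": "GOT:statue_of_liberty",
    "speed of light": "GOT:speed_of_light",
    "moon": "GOT:moon",
}
MAX_WORDS = max(len(a.split()) for a in ALIASES)

def tokenize(text: str) -> list[str]:
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest registered phrase first, then shorter ones.
        for span in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + span])
            if phrase in ALIASES:
                out.append(ALIASES[phrase])
                i += span
                break
        else:
            out.append(words[i])  # unregistered words pass through as-is
            i += 1
    return out

print(tokenize("the statue of liberty under the moon"))
# -> ['the', 'GOT:statue_of_liberty', 'under', 'the', 'GOT:moon']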
 
It will not all fit in a small box, but the common elements of what humans have distilled and recorded (the mostly copyrighted material) can be globally and heliospherically shared as a means of communicating in the large.
 
Richard Collins, The Internet Foundation