All LLMs and sites should use global open tokens for all human knowledge

jietang (@jietang): We have released GLM-4-520 and have the open-sourced version GLM-4-9B with superior performance beyond Llama-3-8B.
https://github.com/THUDM/GLM-4/blob/main/README_en.md
Replying to @jietang


I hope you will spend more time improving LongBench and making it specific to all STEMC-FGOT communities on the Internet (Science, Technology, Engineering, Mathematics, Computing, Finance, Governance, Organizations, Topic Groups). And please face, seriously, a fairness issue: all humans should share in global knowledge, not just the few who benefit financially, or the easy ones.
 
Some concepts and phenomena have no “words” in many human languages, so animated GIFs and deep navigation might be needed to make all knowledge accessible to all humans (and their AIs) in global open forms.

Create a global open resource that holds all domain-specific knowledge in a form usable in all human languages and by all humans. When anyone adds something new, they should not just post it on their own site, but register it first, so it has a place in global open discussions and can be made accessible to all humans.

When all Internet sites tokenize their content using global open tokens, that content is immediately accessible to all AIs, without going through search engines. And when sites use global open tokens, their content can automatically be translated into all human and domain-specific languages. The tokens themselves are indexed, so all users of any token are indexed in many global groups. Efforts to make sense of a domain (“mathematics”, for example) on the Internet can then tackle the whole of that domain on the Internet. But those efforts cannot be allowed to become monopolies or to benefit anyone personally.
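To make the indexing idea concrete, here is a minimal sketch, assuming a hypothetical global registry that maps registered phrases to open token IDs. The IDs ("GOT-0001", "GOT-0002"), the registry contents, and the example.org/example.net pages are all invented for illustration; no such registry exists yet. Each page is reduced to the set of global token IDs it uses, and an index records every page that uses each token.

```python
from collections import defaultdict

# HYPOTHETICAL registry: registered phrase -> global open token ID.
# The IDs are made up for illustration only.
REGISTRY: dict[str, str] = {
    "the sun": "GOT-0001",
    "speed of light": "GOT-0002",
}


def tokenize_page(text: str, registry: dict[str, str]) -> set[str]:
    """Return the global token IDs of registered phrases found in the text.

    Naive case-insensitive substring matching, longest phrase first, so a
    longer phrase is consumed before any shorter phrase inside it. A real
    system would need proper tokenization and disambiguation.
    """
    lowered = text.lower()
    ids: set[str] = set()
    for phrase in sorted(registry, key=len, reverse=True):
        if phrase in lowered:
            ids.add(registry[phrase])
            lowered = lowered.replace(phrase, " ")  # avoid re-matching the same span
    return ids


def build_index(pages: dict[str, str], registry: dict[str, str]) -> dict[str, set[str]]:
    """Map each global token ID to the set of page URLs that use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for token_id in tokenize_page(text, registry):
            index[token_id].add(url)
    return index


pages = {
    "https://example.org/physics": "The speed of light in vacuum is exact.",
    "https://example.net/astronomy": "Light from the Sun takes about 8 minutes to reach Earth.",
}
index = build_index(pages, REGISTRY)
print(sorted(index["GOT-0002"]))  # ['https://example.org/physics']
```

Because the index is keyed by the token ID rather than by any one language's wording, a query for "GOT-0002" would find every registered use of the concept, whatever language or script the page was written in, provided the page was tokenized against the same registry.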

There are roughly 7,000 human languages and many tens of thousands of domain-specific languages. But “the sun” would have only one global open token identifier, which then links to the term's best (and context-specific) translation in each of them.
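One way to picture "one identifier, many translations" is the sketch below. It assumes a hypothetical label table keyed by (language code, domain), where a domain of None means general usage; the token ID "GOT-0001" and the table layout are assumptions for illustration. The lookup prefers the domain-specific term, falls back to the general term in that language, and otherwise returns the stable token ID itself, so nothing is lost when a translation has not been registered yet.

```python
from typing import Optional

# HYPOTHETICAL label table for one global open token ID.
# Keys are (language code, domain); domain None means general usage.
LABELS: dict[str, dict[tuple[str, Optional[str]], str]] = {
    "GOT-0001": {
        ("en", None): "the Sun",
        ("fr", None): "le Soleil",
        ("de", None): "die Sonne",
        ("es", None): "el Sol",
        ("en", "astronomy"): "☉",  # the astronomical symbol for the Sun
    },
}


def best_label(token_id: str, lang: str, domain: Optional[str] = None) -> str:
    """Return the most specific registered label for a token.

    Order of preference: (lang, domain), then (lang, general usage),
    then the token ID itself when no translation is registered.
    """
    entry = LABELS.get(token_id, {})
    for key in ((lang, domain), (lang, None)):
        if key in entry:
            return entry[key]
    return token_id  # stable identifier as a lossless fallback


print(best_label("GOT-0001", "de"))               # die Sonne
print(best_label("GOT-0001", "en", "astronomy"))  # ☉
print(best_label("GOT-0001", "ja"))               # GOT-0001 (not yet registered)
```

The design choice here is that the identifier, not any one language's word, is the primary key; every language and domain label hangs off it, so adding a new language means adding labels, not re-encoding content.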

“Speed of light” would map to every place on the Internet where it is used or discussed, and in any place it would map to the whole concept, rather than being a bunch of unconnected text, symbols, or images, or content stored in many arbitrary fonts and character sets, or in proprietary or poorly supported formats.

 
Filed as (All LLMs and sites should use global open tokens for all human knowledge)
 
23 Jun 2024 is the 26th Anniversary of the Internet Foundation
 
Richard Collins, The Internet Foundation

About: Richard K Collins

The Internet Foundation: Internet policies, global issues, global open lossless data, global open collaboration

