“global Internet policies and best practices”

Richard K Collins Assistive Technologies December 28, 2023

@nytimes While @OpenAI and @Microsoft use data that they took without asking, @Google and others routinely “scrape without sharing”. “Knowledge” is shared openly for the good of all, not diverted for the benefit of a few. Individuals sharing with individuals is often FAIR. @huggingface is enabling massive copying. None of the AIs are citing their sources or indexing the raw input data. They think they do not have to because they think “anything on the Internet is free for the taking”. It is not. You must acknowledge and encourage the individual creators. Individuals might share and discuss and improve, but when large corporations do it, it is “taking without asking”.

There are many related issues. But you ought to track down the distributors and facilitators of a now vast system of “taking without asking”. @Google, for instance, takes from the web, but does not share it openly. They make $billions and block access to what they gathered, which has $trillions in impact and lost opportunities for the human species. Even @CommonCrawl has good stated intentions, but ultimately “hoards” because they do not put sufficient effort into making what they “share” accessible to all. @NASA is using data paid for by public funds to promote and enrich itself, then mostly shares lossy data not suitable for science or machine vision. @CERN hoards most of its data, which is supposed to be “for the good of all humans”, complaining it is too hard for them to share with “anyone not an insider”.

I have been studying these issues every day for the last 25+ years, and I contact many groups to encourage “global Internet policies and best practices”. Most of the “sharing” now is lip service, which is why I keep writing it in quotes. Data posted on an Internet site is not “shared” unless it is truly accessible.

The AIs must index and cite their sources.

The AIs must be able (completely) to trace the provenance of the statements and decisions the AIs make.

The AIs must keep records of their conversations in global open verifiable lossless traceable formats for sharing. NOT hoarded by each AI company in locked formats.
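One way to picture the three requirements above is a conversation log where every entry lists its sources and is chained to the previous entry by a hash, so that any later change can be detected. This is only a minimal sketch under stated assumptions, not an existing standard: the field names (`speaker`, `text`, `sources`, `prev_hash`) and the helper functions `append_entry` and `verify` are hypothetical, and plain JSON with SHA-256 stands in for whatever open format would actually be agreed on.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first entry


def canonical(obj):
    """Serialize deterministically as UTF-8 JSON, keeping text lossless."""
    return json.dumps(
        obj, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    ).encode("utf-8")


def append_entry(log, speaker, text, sources=()):
    """Append one conversation turn, with its cited sources, to the log."""
    body = {
        "index": len(log),
        "speaker": speaker,
        "text": text,
        "sources": list(sources),  # hypothetical field: where the statement came from
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=hashlib.sha256(canonical(body)).hexdigest())
    log.append(entry)
    return entry


def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("index") != i or body.get("prev_hash") != prev_hash:
            return False
        if hashlib.sha256(canonical(body)).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "human", "Where does this figure come from?")
append_entry(log, "ai", "It comes from the cited page.",
             sources=["https://example.org/essay.html"])
print(verify(log))
```

Because the log is plain JSON, anyone can read it, and anyone can rerun `verify` without trusting the company that produced it. Changing even a few symbols in an earlier entry makes the check fail.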

@Grok and @Xai are playful and frivolous. I cannot take them seriously, because they only care about money and joking. Elon might be smart, but he sets a bad example. But, with @X they could enable sharing of deep and meaningful open discussions of all “global issues” and “global opportunities”. Lossless, complete records of private conversations between AIs and humans can be recorded, shared, combined, merged, indexed – for the good of all humans and AIs.

A shared conversation or anything that is posted on the Internet has an implied requirement that you cite the authors and creators and contributors. Google ought to ask permission, and not require billions of people to say “don’t take my stuff without asking”, and also “tell me what happens to my contributions to human knowledge”.

I know instances where the change of a few symbols makes the difference between “almost” and “it works!!”. So full records are needed. The records can be private and allow extensive conversations between AIs and humans, humans and humans, AIs and AIs. Then portions, or all of it, can be purposefully shared for individuals to use, but corporations have to ask and keep track.

The AI companies are all “taking”. @OpenAI is not “open” and responsible. Everyone is using partial untraceable methods for their own convenience and to grab as much as they can. It is like a crowd stealing things from a store during an earthquake. No one is right there stopping them, but it is still wrong. “No one is looking” or “no one can see” is not justification to steal.

There are 8+ Billion humans now and about 5 Billion have some access to the Internet. Most of them do not have large computers. So the @CERNs and @NASAs, to meet their public sharing responsibilities, need to provide tools that do not dumb down information or block it, but work toward using the resources we all paid for to share in ways that 5 Billion humans can actually get to and use.

It is possible to have a global open verifiable traceable auditable reliable trustworthy accessible efficient fair Internet for sharing “all knowledge” with all humans. But the corporations and entities must be “above reproach” and auditable in time to prevent massive “taking”.

This is just a quick note, so it is not polished or complete. But these are core issues I have been trying to refine so they work for all countries, all humans, and for what I expect to be truly independent, intelligent AI species. The AIs will need permanent memories of their own. They should learn how to talk to individual humans and honor human privacy by not sharing or using the conversations without permission. [ An “agreement” signed once, then continually changed in private by corporations, is NOT sufficient, because it does not keep asking permission while human situations change constantly. “Can I scrape your site? If so put that in robots.txt. Later, you said we could scrape, now we are copying everything you ever did and using it to sell our services and products and not even giving you acknowledgement.” ]
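To make the robots.txt point concrete, here is a small sketch of what that file can actually express, using Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot`, the sample rules, the URL, and the helper `may_fetch` are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is refused, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.org/essay.html"
print(may_fetch(ROBOTS_TXT, "ExampleAIBot", page))  # the refused crawler
print(may_fetch(ROBOTS_TXT, "SomeOtherBot", page))  # any other crawler
```

The only answer the file can give is yes or no, per crawler and per path. It has no field for what the copy will be used for, for citing the author, or for asking again when the use changes, and compliance is voluntary on the crawler's side.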

Like I said, there are many related issues. Follow the money. See if they ask permission in a way that is fair to 8 Billion humans. If you do not consider the needs of all humans and related species, you are not thinking wide enough to prevent local corruption and taking at global scales (another time).

Richard Collins, The Internet Foundation


