Policies and standards for collaboration on the Internet – DNA Genealogy, sharing and hoarding

https://x.com/RichardKCollin2/status/1848845422498111868

Someone just asked me to help them with a brick wall. They are not related to me or to any of my DNA matches, or cases. They do not clearly identify themselves, give no age or location, and ALL their trees are locked. Writing in a private tree is like writing a dissertation and then burning it. NONE of the DNA matches benefit. If it is a hard problem and I solve it, it is stored in some locked vault like Scrooge McDuck. Do you understand how disgusting that is to me? So my rule is simple now: I won’t help anyone who locks their tree. I work to help the most people, not to gather and hoard for myself. I cannot take it with me, and I won’t last many more years. I have been helping people desperate to find their families, yet 10% of trees are locked, sharing nothing, giving nothing, helping almost no one.

I would go so far as to say that Ancestry and the Internet ought to have “Shared Public Trees” that anyone can link to, with those links considered part of the tree of any DNA match. I don’t copy five generations back on some ancestor; I link to a person in a public shared tree, and when the DNA robots do their thing, that tree (already compiled and verified because it is public and shared) is selected and the person is added as a “public resource link”.

The software only needs to use a unique treeId and personId and to check if the tree has agreed to be a public resource.
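As a rough sketch of that check, assuming a simple in-memory model: the names `PersonRef`, `Tree`, `is_public_resource` and `resolve_link` are made up for illustration and are not part of any Ancestry API. A link is just a (treeId, personId) pair, and it resolves only when the target tree has agreed to be a public resource.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PersonRef:
    """A link to one person in one tree (hypothetical structure)."""
    tree_id: str
    person_id: str


@dataclass
class Tree:
    """A tree with an opt-in flag for public sharing (hypothetical structure)."""
    tree_id: str
    is_public_resource: bool = False
    people: Dict[str, str] = field(default_factory=dict)  # person_id -> display name


def resolve_link(ref: PersonRef, trees: Dict[str, Tree]) -> Optional[str]:
    """Return the linked person's name, or None if the link cannot be used.

    The link is refused when the tree is unknown, when the tree has not
    agreed to be a public resource, or when the person is not in the tree.
    """
    tree = trees.get(ref.tree_id)
    if tree is None or not tree.is_public_resource:
        return None
    return tree.people.get(ref.person_id)
```

Nothing is copied here: a private tree would store only the `PersonRef`, and the shared tree stays the single place where that ancestor is maintained.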

If public trees link to each other, and your ancestry goes back to Adam and Eve somehow, those deeper trees are collaborative work areas, and the people joining the work on those collections can link up or down to any public tree.

A person who puts in a lot of effort could say “I allow the dead people in this tree to be public shared”, and that means their cousins and descendants and DNA groups working back in the 1700s or earlier can link down to the present so all the descendants are connected.

It makes descendant groups possible, so that your DNA matches can be linked without duplicating the material. When a tree wants to publish parts as “public share”, Ancestry and others can have AI-assisted, API-based tools that verify and complete a tree: fill in the gaps and standardize ALL the dates, place names, and names, without throwing anything away.
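One way to read “without throwing anything away” is to store a standardized value next to the original text instead of overwriting it. The sketch below is only an illustration under that assumption: `NormalizedDate`, `normalize_date` and the short list of formats are hypothetical, the month names assume an English locale, and real genealogy dates (“abt 1785”, “bef 1700”, dual dating) need far more handling.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# A few common ways a full date is written (an assumed, incomplete list).
FORMATS = ("%d %b %Y", "%d %B %Y", "%b %d, %Y", "%B %d, %Y", "%Y-%m-%d")


@dataclass(frozen=True)
class NormalizedDate:
    original: str       # exactly what the user typed, always kept
    iso: Optional[str]  # YYYY-MM-DD if the text could be parsed, else None


def normalize_date(text: str) -> NormalizedDate:
    """Try each known format; keep the original text either way."""
    cleaned = " ".join(text.split())  # collapse stray whitespace
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        return NormalizedDate(original=text, iso=parsed.date().isoformat())
    # Unrecognized: nothing is discarded, the entry is just left unstandardized.
    return NormalizedDate(original=text, iso=None)
```

With this shape, “12 Mar 1785” and “March 12, 1785” compare as equal through their `iso` field, while an entry such as “abt 1785” is left untouched for a person or a better tool to resolve later.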

I came up with this general strategy in the last 26 years of the Internet Foundation, where “global open resources” can be examined deeply and caringly to share with all 5.4 Billion humans using the Internet. Ancestry is a closed group, but it has millions of DNA test takers and tree builders who should not have to waste decades for each person, just copying things that are probably right into individual collections that are nearly impossible for anyone to maintain now. FamilySearch went too far and allows new people to change things. Wikipedia is bloated and makes things more obscure, and lets newbies completely trash old things. But curation does work sometimes.

I have only been at this every day for 26 years. So it is rough. But locked trees are bad for anyone wanting to find family. Innocent people can have their entire family locked by one person. I helped the Pilgrim Edward Doty Society put their 92,000 person core first five generations on Ancestry so there are NOT 10,000 copies or more of all those families. It does NOT have to be duplicated if the records that are found and the reasoning and possibilities are compiled by variations of algorithms that seem to work.

Ancestry does not even standardize dates. They allow everything and their comparison algorithms can never be powerful and deep enough to standardize on the fly every time there is a transaction or query or process. Same with place names.

The same problem affects all the “AI” and “LLM” (large language model) groups on the Internet. They scrape bad data and do not treat things like dates, places, equations, software, and processes in a uniform way. So their methods are discombobulations of partial and usually incomplete translations.

What do you think?

Richard Collins, The Internet Foundation

[ I have never used discombobulation in a sentence, I usually just say “a mess” or “house of cards” or “too many single points of failure”. ]

Richard K Collins

About: Richard K Collins

The Internet Foundation Internet policies, global issues, global open lossless data, global open collaboration

