Coding knowledge to “Standard Internet” can save years each for billions of learners

I spent much of the day tracing out software used by the FU Ori teams mentioned in https://iopscience.iop.org/article/10.3847/1538-4357/ad31a1. It is badly organized by my lights (my first full-time job was “scientific programmer” for satellite orbit determination in 1970), and I have spent much of every day for the last 26 years looking at ways to simplify the mess of software and data on the whole Internet. So I should be able to reproduce their work, if they have the data open on the Internet. I am not sanguine. I have been a “secret shopper” for data and services on the Internet for 26 years, walking through how people are treated. Not for the faint-hearted.
 
They used 45 twelve-meter antennas, 11 seven-meter antennas, and 3 “Total Power” antennas. I did see location data in the source code, but I think it is not right or well documented. I am fully capable of doing anything in STEMC, but if the papers are incomplete and the Internet broken, even someone with lots of background and experience will find it impossible to reproduce. I will see. I can devote a few weeks to it. I spent most of my time today not simply tracing this one small effort, but rather analyzing GitHub to estimate its efficiency and fragility. They have built a large house of cards, and many people depend on it. But they give no guarantees, and they use methods where the left hand does not see or know what the other hands are doing. The number of independent efforts – NOT coordinated except at microscale – is such that the only likely structure that can emerge is a termite mound, not a sleek and functional tool for the world. Much of what goes on at GitHub aims to create and maintain hegemony and monopolies of various sorts.
 
This is my story; I will write it as best I can.
 
The paper itself is fairly clear. Having spent tens of thousands of hours tracing data and collaborative efforts on the Internet, many with geophysical and astrophysical datasets, I can see the meaning of data tags like
 
(FU Ori distance from Gaia DR3; Bailer-Jones et al. 2021)
(FU Ori project code 2017.1.00015.S, PI: J Williams)
(Table 1. Summary of ALMA Observations)
141 kHz resolution for (12C)O (2–1)
122.070 kHz resolution for (13C)O (2–1) and C(18O) (2–1)
From the different rest frequencies of the isotopic species (set by their different masses): velocity resolutions of 184, 166 and 167 meters/second for (12C)O, (13C)O and C(18O).
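
As a quick check (mine, not the paper's), those velocity resolutions follow from dv = c × Δν / ν using the standard (2–1) rest frequencies of the three species, about 230.538, 220.399 and 219.560 GHz. A few lines of Python reproduce the figures above to within about one meter per second:

C = 299_792_458.0  # speed of light, meters/second (exact)

# Rest frequency (Hz) and channel width (Hz) for each isotopic species, (2-1) line.
lines = {
    "(12C)O (2-1)": (230.538e9, 141.0e3),
    "(13C)O (2-1)": (220.399e9, 122.070e3),
    "C(18O) (2-1)": (219.560e9, 122.070e3),
}

for name, (rest_hz, width_hz) in lines.items():
    dv = C * width_hz / rest_hz   # velocity resolution in meters/second
    print(name, round(dv, 1), "meters/second")

# Prints roughly 183.4, 166.0 and 166.7 meters/second, close to 184, 166 and 167 above.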
 
I will translate it quietly. I will bet that 1000 groups like this, writing their velocities and ranges, will write them differently enough that human readers have to take time to get it right. If they shared in lossless Standard Internet form, there would be NO ambiguity or delay. The exact values are in the program setups, or the programs would not run. But the program formats are as variable and ambiguous as writing precise values, units and dimensions in ambiguous, sometimes lazy “for print” formats. “Oh, we know what it is, and so-and-so did it, we trust him or her.”
 
Crap. Pardon me. “automated self-calibration module (J. Tobin et al., in preparation).” I have to find J. Tobin and hope they will answer if I ask a question. Usually, if they are at a university, the email will block anyone from “outside”. Universities vary, but “screw anyone outside” is common on the Internet. I have worked 14 hours today and I am just not filled with patience at the moment.

I will have to copy this into a text format and write all the variables in standard units and dimensions. If I had the leisure to spend as much time as this whole team did, I might use sloppy methods and keep things in my head or on post-it notes or in emails. But since I am one person, I use a formal method of “compiling” the source data in HTML or PDF and then carefully putting the sloppy “data” from the paper into global open form, so anyone using SI units (Standard Internet) will know exactly what numbers go into the computer calculations.
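
There is no single agreed machine format for this yet, so the sketch below is only my own illustration of what “global open form” means in practice: every number from the paper carried with an explicit value, unit and source, so nothing depends on memory, habit or guesswork. The field names and source strings are hypothetical.

import json

# Hypothetical lossless records: value, SI unit, and source for each quantity.
records = [
    {"name": "FU_Ori_continuum_center", "value": 232.470e9, "unit": "Hz",
     "source": "ALMA project 2017.1.00015.S, Table 1"},
    {"name": "12CO_2-1_channel_width", "value": 141.0e3, "unit": "Hz",
     "source": "ALMA project 2017.1.00015.S, Table 1"},
    {"name": "13CO_2-1_channel_width", "value": 122.070e3, "unit": "Hz",
     "source": "ALMA project 2017.1.00015.S, Table 1"},
]

print(json.dumps(records, indent=2))   # exact, machine-readable, no "for print" ambiguity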

“Continuum dust” uses a 232.470000000 GHz center and data over a +/- (1.875000000/2) GHz range. I write it that way routinely to remind myself there are groups who work to tiny fractions of a Hertz. I work at nanoHertz often.
 
18.75E9 Hertz / 141E3 Hertz = 132,978.723 bins
18.75E9 Hertz / 122.070E3 Hertz = 153,600.393 bins
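
Those bin counts are just bandwidth divided by channel width, kept in double precision rather than retyped. A one-minute check of the arithmetic, using the numbers exactly as written above:

# Bin counts = bandwidth / channel width, numbers exactly as written above.
bandwidth_hz = 18.75e9
for width_hz in (141e3, 122.070e3):
    bins = bandwidth_hz / width_hz
    print(f"{bandwidth_hz:.2e} Hz / {width_hz:.3e} Hz = {bins:,.3f} bins")

# 1.88e+10 Hz / 1.410e+05 Hz = 132,978.723 bins
# 1.88e+10 Hz / 1.221e+05 Hz = 153,600.393 bins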
 
Steve Klosko at Wolfe Research and Development taught me to keep all numbers in double precision. That was about 1978/1979, when we were working on NASA geopotential models, calibrating gravitational potential models from satellite orbits, C-band and laser tracking, and some altimeter data. We used anything we could find. When I made good initial estimates, I had to be in a very narrow range for his full model to converge. And restarting his model could waste huge amounts of computer time if you neglected to keep exact values throughout.
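
The point is easy to demonstrate. At Earth-radius scales, single precision cannot even hold millimeter-level information, so rounding or re-typing values between runs destroys the small differences the estimator needs. A minimal illustration (my example, not the NASA code):

import numpy as np

# Two nearly equal coordinates at Earth-radius scale (meters).
a = 6_378_137.000123    # double precision keeps the 0.000123 m part
b = 6_378_137.000000

print("float64 difference:", a - b)             # about 1.23e-4 m

a32, b32 = np.float32(a), np.float32(b)         # single precision steps here are ~0.5 m
print("float32 difference:", float(a32 - b32))  # 0.0 -- the information is gone
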
(ALMA Science Pipeline, version 40896 of Pipeline-CASA51-P2-B, from CASA 5.1.1 (McMullin et al. 2007; CASA Team et al. 2022) at casa.nrao.edu)
 
Parsing this kind of stuff from free-form and non-standard text, which varies from HTML to PDF and in every copy, abstract and summary, wastes time for me and millions of others. For the world, I usually estimate 1 in a thousand for most every STEMC plus Finance, Government, Organization and NonProfit field. So 5.3 million could do this, if PDF were not used but “global open formats” instead.
 

And those poor Large Language Model groups trying to write baby “AIs” have no subject matter expertise or experience at all, and almost no experience running websites for millions of people. If all the PDFs, and now many other document formats with STEMC and domain-specific data and structures, were to be reverse engineered, it would mostly be garbage, as it is now. Code to global open standards, and all the search engines can do exact and unambiguous calculations. Those “Large Language Model Wrapper Programs” (LWPs as I call them) can use exact calculations with the proper software, often directly tied to the groups and global open collaborations that maintain parts of global knowledge. The LWPs can be forced to learn how to run things like CASA for ALL things astrophysical and space-data related – so a young 11-year-old human in a poor school can go online and learn as much as they want about spectroscopy and plasma, and know it is the best in the world (or Heliosphere) and not just the best in that school or city or county or state or country.
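
A hypothetical sketch of what an LWP tied to exact tools looks like, as opposed to a model guessing numbers from text. Nothing here is an existing API; the routing and function names are illustrative only:

# Illustrative only: an LWP routes quantitative questions to exact, unit-explicit
# computation instead of generating the number as text.
C = 299_792_458.0  # meters/second

def velocity_resolution(channel_width_hz: float, rest_frequency_hz: float) -> float:
    # Exact calculation the wrapper calls; no rounding, no retyping.
    return C * channel_width_hz / rest_frequency_hz

def answer(question: str) -> str:
    # A real wrapper would parse the question and choose the right open tool
    # (CASA, a units library, a curated dataset); here the routing is hard-coded.
    if "velocity resolution" in question:
        dv = velocity_resolution(122.070e3, 220.399e9)
        return f"about {dv:.1f} meters/second (computed exactly, not guessed)"
    return "narrative question: hand it to the language model"

print(answer("What is the velocity resolution at 122.070 kHz for (13C)O (2-1)?"))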

Filed as (Coding knowledge to “Standard Internet” can save years each for billions of learners) and make LWPs accurate, exact and lossless.
 
Richard Collins, The Internet Foundation