Big Data does not have to mean bullying and bragging

Big Data Biology Lab @BigDataBiology
New preprint: A catalogue of small proteins from the global microbiome!

As part of our ongoing efforts to understand small proteins in prokaryotes, we catalogued almost 1 billion sequences

https://doi.org/10.1101/2023.12.27.573469
Replying to @BigDataBiology


Thank you for this effort. When you put the same text in every row of a file, that can be factored into a simple header. TSV supports headers for provenance, citations, and other important information. But fixed format is not much harder to support, and is more compact, if you include a header. TSV can also support multiple tables, sub-tables and summary statistical reports.
 
You do not have to go to the most primitive TSVs.
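Here is a minimal sketch of what I mean by factoring repeated text into a header. It is only an illustration, not how the catalogue files are actually built. The function name `factor_constant_columns`, the `# name: value` header style and the sample column names are my own assumptions, and it assumes a rectangular TSV with one header row.

```python
def factor_constant_columns(tsv_text):
    """Move columns that hold the same value in every row into '# name: value' lines.

    Assumes a rectangular TSV whose first line is the column header.
    The '# name: value' convention is illustrative, not a standard.
    """
    lines = [line for line in tsv_text.splitlines() if line]
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]

    # A column is "constant" only if there are at least two rows and all agree.
    constant = [
        i for i in range(len(header))
        if len(rows) > 1 and all(row[i] == rows[0][i] for row in rows)
    ]
    keep = [i for i in range(len(header)) if i not in constant]

    out = [f"# {header[i]}: {rows[0][i]}" for i in constant]
    out.append("\t".join(header[i] for i in keep))
    for row in rows:
        out.append("\t".join(row[i] for i in keep))
    return "\n".join(out) + "\n"


# Hypothetical sample: the 'source' column repeats the same text in every row.
sample = "id\tsource\tlength\nA1\tdemo\t30\nA2\tdemo\t45\n"
print(factor_constant_columns(sample))
```

The repeated value is stored once at the top, and every data row gets shorter by that column. A reader that skips lines starting with `#` still sees an ordinary TSV.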
 
Most people using the Internet do not have supercomputers, nor access to them. So reducing a file by half or 70% might make a big difference. JavaScript can handle fairly large text data structures. Many text files zipped or gz’d are as small as their binary counterparts and globally accessible, while diffusion of binary format data on the Internet usually stops after one or two steps.
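A small sketch of how anyone can check the size claim on their own text files, using only Python's standard `gzip` module. The function name `gzip_report` and the sample text are hypothetical, and the ratio you get depends entirely on the data.

```python
import gzip


def gzip_report(text):
    """Return raw and gzip-compressed sizes, in bytes, for a piece of text."""
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    return {
        "raw_bytes": len(raw),
        "gzip_bytes": len(packed),
        # Fraction of the original size that remains; None for empty input.
        "ratio": len(packed) / len(raw) if raw else None,
    }


# Hypothetical, highly repetitive sample, like a column repeated in every row.
sample = "ACGT\tdemo\n" * 1000
report = gzip_report(sample)
print(report["raw_bytes"], report["gzip_bytes"])
```

Publishing these two numbers next to every download link tells a small user what they are about to fetch before they commit the time and bandwidth.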
 
I tried your web interface. You need to learn how to use statistical profiles to introduce your viewers and users to the whole dataset. Most people on the Internet (about 5 billion) have not memorized or seen your codes, identifiers and methods, but might well completely understand life forms, biochemistry, variations, kinetics and evolution. There are many smart people among the 8 billion humans today. Please do not dump huge files and a few screens of stuff when introducing them to the variety of life and chemistry.
 

Statistics, Statistics, Statistics – Small datasets first. Sizes for everything. Think before you dump log files suitable for supercomputers on groups worldwide who do not have the liberal resources you take for granted. Empathy and care matter.
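As a rough sketch of a statistical profile, the following summarizes a TSV before anyone has to download or read all of it: total size, row count, and for each column the number of distinct values, the most common values and the range of value lengths. The function name `profile_tsv`, the choice of statistics and the sample columns are assumptions for illustration only.

```python
from collections import Counter


def profile_tsv(tsv_text, top=3):
    """Summarize a TSV: total bytes, row count, and simple per-column statistics.

    Lines starting with '#' are treated as header comments and skipped.
    Assumes a rectangular table whose first remaining line names the columns.
    """
    lines = [l for l in tsv_text.splitlines() if l and not l.startswith("#")]
    header = lines[0].split("\t")
    rows = [l.split("\t") for l in lines[1:]]

    profile = {
        "bytes": len(tsv_text.encode("utf-8")),
        "rows": len(rows),
        "columns": {},
    }
    for i, name in enumerate(header):
        values = [row[i] for row in rows]
        counts = Counter(values)
        lengths = [len(v) for v in values]
        profile["columns"][name] = {
            "distinct": len(counts),
            "most_common": counts.most_common(top),
            "min_len": min(lengths) if lengths else 0,
            "max_len": max(lengths) if lengths else 0,
        }
    return profile


# Hypothetical three-row sample table.
sample = "id\tkingdom\nA1\tBacteria\nA2\tBacteria\nA3\tArchaea\n"
for name, stats in profile_tsv(sample)["columns"].items():
    print(name, stats)
```

A profile like this fits on one screen, costs the publisher one pass over the data, and lets a newcomer see the shape of the whole dataset before touching the large files.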

Big Data does not have to mean, “We have big computers and you do not. We will dump it in a form that takes us two minutes of effort, but costs all small users days or weeks.” Big should mean compassionate, not bullying or bragging.

Richard K Collins

About: Richard K Collins

The Internet Foundation Internet policies, global issues, global open lossless data, global open collaboration

