{"id":14310,"date":"2024-03-23T16:44:45","date_gmt":"2024-03-23T16:44:45","guid":{"rendered":"\/?p=14310"},"modified":"2024-03-23T16:52:35","modified_gmt":"2024-03-23T16:52:35","slug":"retrieving-or-finding-from-the-internet-is-seldom-fair-in-a-statistical-sense","status":"publish","type":"post","link":"\/?p=14310","title":{"rendered":"Retrieving or finding from the Internet is seldom &#8220;fair&#8221; in a statistical sense."},"content":{"rendered":"<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"9gdrc\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"9gdrc-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"9gdrc-0-0\"><span data-offset-key=\"9gdrc-0-0\">@alpha_alimamy<\/span><span data-offset-key=\"9gdrc-1-0\"> Alpha Almany Kamara. Is this your paper? You need a website.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"8fh6a\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"8fh6a-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"8fh6a-0-0\"><span data-offset-key=\"8fh6a-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"el9ih\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"el9ih-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"el9ih-0-0\"><span data-offset-key=\"el9ih-0-0\">https:\/\/wwjmrd.com\/archive\/2022\/5\/1804\/heart-disease-prediction-support-system-using-machine-learning-approaches<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"35fpe\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"35fpe-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"35fpe-0-0\"><span 
data-offset-key=\"35fpe-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"aqm6d\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"aqm6d-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"aqm6d-0-0\">\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"aqm6d\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"aqm6d-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"aqm6d-0-0\"><span data-offset-key=\"aqm6d-0-0\">Naive LLMs cannot distill medical wisdom from stuff posted on the free internet, no matter how efficient the algorithms. If they only have partial or wrong information, they cannot make good decisions. When hundreds of millions or billions of humans are affected and they try to share their voices, if the sampling is not fair, and the algorithm has only partial information, the results can be distorted and corrupted.<\/span><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"4qit4\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"4qit4-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"4qit4-0-0\"><span data-offset-key=\"4qit4-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"al5tu\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"al5tu-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"al5tu-0-0\"><span data-offset-key=\"al5tu-0-0\">In your datasets, the data was gathered and organized to be an efficient predictor of heart disease. Retrieving or finding from the Internet is seldom &#8220;fair&#8221; in a statistical sense. 
Google biases its results and will not simply give verifiably random samples. HuggingFace and Common Crawl do not seem to concern themselves with fair sampling, and their collections are a bit chaotic.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"fdtsi\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"fdtsi-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"fdtsi-0-0\"><span data-offset-key=\"fdtsi-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"0\" data-rbd-draggable-id=\"3smjg\">\n<div class=\"\" data-block=\"true\" data-editor=\"6gq76\" data-offset-key=\"3smjg-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"3smjg-0-0\"><span data-offset-key=\"3smjg-0-0\">Richard Collins, The Internet Foundation<\/span><\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>@alpha_alimamy Alpha Almany Kamara. Is this your paper? You need a website. \u00a0 https:\/\/wwjmrd.com\/archive\/2022\/5\/1804\/heart-disease-prediction-support-system-using-machine-learning-approaches \u00a0 Naive LLMs cannot distill medical wisdom from stuff posted on the free internet, no matter how efficient the algorithms. If they only have partial or wrong information, they cannot make good decisions. 
When hundreds of millions or billions of humans <br \/><a class=\"read-more-button\" href=\"\/?p=14310\">Read More &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43],"tags":[],"class_list":["post-14310","post","type-post","status-publish","format-standard","hentry","category-assistive-technologies"],"_links":{"self":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14310","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14310"}],"version-history":[{"count":5,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14310\/revisions"}],"predecessor-version":[{"id":14315,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14310\/revisions\/14315"}],"wp:attachment":[{"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14310"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14310"},{"taxonomy":"post_tag","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14310"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}