{"id":17384,"date":"2024-09-17T22:02:29","date_gmt":"2024-09-17T22:02:29","guid":{"rendered":"\/?p=17384"},"modified":"2024-09-17T22:14:27","modified_gmt":"2024-09-17T22:14:27","slug":"all-the-ais-are-failing-consistently-on-quantitative-problems-that-any-first-year-worker-out-of-college-would-face","status":"publish","type":"post","link":"\/?p=17384","title":{"rendered":"All AIs fail consistently on scientific notation, unit conversions, anything not on the free Internet"},"content":{"rendered":"<p>https:\/\/x.com\/yuntiandeng\/status\/1836114401213989366<\/p>\n<p>Yuntian Deng @yuntiandeng\u00a0 Is OpenAI&#8217;s o1 a good calculator? We tested it on up to 20&#215;20 multiplication\u2014o1 solves up to 9&#215;9 multiplication with decent accuracy, while gpt-4o struggles beyond 4&#215;4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1\/4 https:\/\/pic.x.com\/et5db9bhnl<br \/>\nReplying to @yuntiandeng<\/p>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"ebtts\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"ebtts-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"ebtts-0-0\"><em>All AIs fail consistently on scientific notation, unit conversions, anything not on the free Internet<\/em><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"e8cvm\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"e8cvm-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"e8cvm-0-0\"><span data-offset-key=\"e8cvm-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"en1jo\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"en1jo-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" 
data-offset-key=\"en1jo-0-0\"><span data-offset-key=\"en1jo-0-0\">I spent about 1500 hours over the last two years checking the chat-type AIs on scientific notation, units and dimensions, and the use of fundamental and named constants. I have talked to groups about a global effort to fix things. But they hired people who do not have the skills to do precise calculations on real systems in the world. They have not been careful at all with their first efforts, and they depend on bad input data.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"fir9o\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"fir9o-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"fir9o-0-0\"><span data-offset-key=\"fir9o-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"7a834\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"7a834-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"7a834-0-0\"><span data-offset-key=\"7a834-0-0\">OpenAI especially fails consistently on division of scientific notation. There are some reasons for that: The tokenizing is drawing from free sources that are not coded properly in the first place. 
The source data is restricted from tapping copyrighted and proprietary data sources, and most real data and knowledge is NOT on the open Internet.\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"20mau\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"20mau-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"20mau-0-0\"><span data-offset-key=\"20mau-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"820ts\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"820ts-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"820ts-0-0\"><span data-offset-key=\"820ts-0-0\">The AIs (ChatGPT, Microsoft Copilot, Google Gemini, X Grok particularly) are not assigning sufficient memory and processor time to their answers. That means multiple steps almost always fail, and because the failures are often not obvious except to an expert in the field, any serious project can accumulate errors that will not be found until planes start falling out of the sky or patients start dying in large numbers from quantity mistakes.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"8lv5s\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"8lv5s-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"8lv5s-0-0\"><span data-offset-key=\"8lv5s-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"1o1j4\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"1o1j4-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"1o1j4-0-0\"><span data-offset-key=\"1o1j4-0-0\">The groups in the world who are 
involved in precise work, calculations, and models are not being included in a global effort to validate and check the AIs.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"9n8dg\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"9n8dg-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"9n8dg-0-0\"><span data-offset-key=\"9n8dg-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"7toqj\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"7toqj-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"7toqj-0-0\"><span data-offset-key=\"7toqj-0-0\">There is a huge upsurge in &#8220;calculators&#8221; online. They are NOT doing complete jobs.\u00a0 They are trying to draw clicks and to harvest and monetize the things they know and the things they can find (as are the chat AIs in a broader sense).<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"3rb69\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"3rb69-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"3rb69-0-0\"><span data-offset-key=\"3rb69-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"3m8k4\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"3m8k4-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"3m8k4-0-0\"><span data-offset-key=\"3m8k4-0-0\">The AIs have no information on their own capabilities or their own limitations.\u00a0 The people who program and control their development take the attitude, at a client level, that &#8220;we don&#8217;t have to be responsible for anything&#8221; because the whole thing started, 
not from &#8220;true human wisdom and ability&#8221; but from &#8220;entertaining chatbots&#8221;, &#8220;pretty pictures&#8221; and a few cute demos by young people who have not worked on hard problems at global scale yet.\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"1jumf\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"1jumf-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"1jumf-0-0\"><span data-offset-key=\"1jumf-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"bd9qc\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"bd9qc-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"bd9qc-0-0\"><span data-offset-key=\"bd9qc-0-0\">OpenAI will ALWAYS fail in anything deep that requires more than one equation at a time. It will almost always fail if unit conversions and SI prefixes are involved. It simply scrounged things that were free and easy, so it is barely able to function. 
It is NOT trustworthy for anything that involves human life, and I say that because <\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"uids\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"uids-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"uids-0-0\"><span data-offset-key=\"uids-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"31kr3\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"31kr3-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"31kr3-0-0\"><span data-offset-key=\"31kr3-0-0\">I know the systems in the world and how things get into computer software. When Y2K came I checked the global status of all countries and sectors, all industries. I edited books on it, advised industries, and checked the Joint Chiefs scenarios for them. Introducing systemic global changes into society and human systems is what I have spent the last 26 years checking with the Internet Foundation.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"f6gc2\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"f6gc2-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"f6gc2-0-0\"><span data-offset-key=\"f6gc2-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"3htlk\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"3htlk-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"3htlk-0-0\"><span data-offset-key=\"3htlk-0-0\">None of the chat AIs can compare scales and context. 
Humans pick up millions of clues over a long life and can bring them to bear because they endlessly practice small rules in many situations. The algorithms all of these are using are simple linear algebra and Bayesian models with a few tweaks.\u00a0The ones that run on small machines and reduce the bit size are aiming for something they can sell that works in a domain.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"eiafe\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"eiafe-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"eiafe-0-0\"><span data-offset-key=\"eiafe-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"febhg\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"febhg-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"febhg-0-0\"><span data-offset-key=\"febhg-0-0\">If the programmers and AI handlers do not know how, they are NOT going to be able to know what is important. 
I think they all ought to go to &#8220;how to listen to customers&#8221; and &#8220;find out what your clients are doing and need&#8221; 101.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"a9p15\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"a9p15-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"a9p15-0-0\"><span data-offset-key=\"a9p15-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"cpjrj\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"cpjrj-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"cpjrj-0-0\"><span data-offset-key=\"cpjrj-0-0\">I cannot write it all here.\u00a0 I have hundreds of conversations in &#8220;Open&#8221;AI and cannot share them because they have not a clue what &#8220;global open formats&#8221; means, and could not calculate their way out of a paper bag.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"4uurd\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"4uurd-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"4uurd-0-0\"><span data-offset-key=\"4uurd-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"aejvc\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"aejvc-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"aejvc-0-0\"><span data-offset-key=\"aejvc-0-0\">Sorry, dredging through very poorly conceived and executed software is not pleasant at all.<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"aj69a\">\n<div class=\"\" data-block=\"true\" 
data-editor=\"dsnc9\" data-offset-key=\"aj69a-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"aj69a-0-0\"><span data-offset-key=\"aj69a-0-0\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div data-rbd-draggable-context-id=\"4\" data-rbd-draggable-id=\"acvb5\">\n<div class=\"\" data-block=\"true\" data-editor=\"dsnc9\" data-offset-key=\"acvb5-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"acvb5-0-0\"><span data-offset-key=\"acvb5-0-0\">Richard Collins, The Internet Foundation<\/span><\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/x.com\/yuntiandeng\/status\/1836114401213989366 Yuntian Deng @yuntiandeng\u00a0 Is OpenAI&#8217;s o1 a good calculator? We tested it on up to 20&#215;20 multiplication\u2014o1 solves up to 9&#215;9 multiplication with decent accuracy, while gpt-4o struggles beyond 4&#215;4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 
1\/4 https:\/\/pic.x.com\/et5db9bhnl Replying to @yuntiandeng All AIs fail <br \/><a class=\"read-more-button\" href=\"\/?p=17384\">Read More &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77,73,72,43],"tags":[],"class_list":["post-17384","post","type-post","status-publish","format-standard","hentry","category-all-global-open-devices","category-all-knowledge","category-all-languages","category-assistive-technologies"],"_links":{"self":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts\/17384","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=17384"}],"version-history":[{"count":4,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/17384\/revisions"}],"predecessor-version":[{"id":17388,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/17384\/revisions\/17388"}],"wp:attachment":[{"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=17384"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=17384"},{"taxonomy":"post_tag","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=17384"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}