{"id":14233,"date":"2024-03-17T23:06:21","date_gmt":"2024-03-17T23:06:21","guid":{"rendered":"\/?p=14233"},"modified":"2024-03-17T23:12:22","modified_gmt":"2024-03-17T23:12:22","slug":"mixture-of-experts","status":"publish","type":"post","link":"\/?p=14233","title":{"rendered":"Mixture of Experts"},"content":{"rendered":"<p>Omar Sanseviero @osanseviero Grok weights are out. Download them <em>quickly<\/em> at https:\/\/huggingface.co\/xai-org\/grok-1<\/p>\n<p>huggingface-cli download xai-org\/grok-1 --repo-type model --include ckpt\/tensor* --local-dir checkpoints\/ckpt-0 --local-dir-use-symlinks False<\/p>\n<p>Learn about mixture of experts at https:\/\/hf.co\/blog\/moe<br \/>\nReplying to @osanseviero<\/p>\n<p>It seems there is a conflict between saying &#8220;Grok-1 <strong>open<\/strong>-weights model&#8221; and &#8220;Due to the large size of the model (314B parameters), a multi-GPU machine is required&#8221; on the same page. Your people need to learn that &#8220;open data&#8221; means it has to be accessible to all: a few weights at a time, statistical summaries, examples, an open description, links and background, and some effort at an open community. &#8220;Here, we dumped this on the web. See, we shared!!&#8221; &#8220;You have no way to verify it, because you do not have the big computers that we have!!&#8221;<\/p>\n<p>Your &#8220;Mixture of Experts Explained&#8221; is very revealing. Yes, I looked at your &#8220;Open Source MoEs&#8221; project links.<\/p>\n<p>Are those few groups with big computers supposed to be the &#8220;Mixture of Experts&#8221;? It is not that hard if you actually share and use global open tokens. When I see the flopping door on the suborbital test, I know why you do things the way you do. Tell him to stop saying &#8220;open&#8221;.<\/p>\n<p>Do you know that &#8220;Moe&#8221; in Japanese means &#8220;feelings of strong affection toward characters in anime, manga, video games, and other media &#8211; directed at the otaku markets&#8221;? So I recommend you always write out &#8220;Mixture of Experts approach&#8221;.<\/p>\n<p>Richard Collins, The Internet Foundation<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Omar Sanseviero @osanseviero Grok weights are out. 
Download them quickly at https:\/\/huggingface.co\/xai-org\/grok-1 huggingface-cli download xai-org\/grok-1 --repo-type model --include ckpt\/tensor* --local-dir checkpoints\/ckpt-0 --local-dir-use-symlinks False Learn about mixture of experts at https:\/\/hf.co\/blog\/moe Replying to @osanseviero It seems there is a conflict between saying &#8220;Grok-1 open-weights model&#8221; and &#8220;Due to the large size of the model (314B parameters), <br \/><a class=\"read-more-button\" href=\"\/?p=14233\">Read More &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43],"tags":[],"class_list":["post-14233","post","type-post","status-publish","format-standard","hentry","category-assistive-technologies"],"_links":{"self":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14233"}],"version-history":[{"count":5,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14233\/revisions"}],"predecessor-version":[{"id":14238,"href":"\/index.php?rest_route=\/wp\/v2\/posts\/14233\/revisions\/14238"}],"wp:attachment":[{"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14233"},{"taxonomy":"post_tag","embeddable":true,"href":"\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}