{"id":60334,"date":"2024-06-25T18:27:51","date_gmt":"2024-06-25T22:27:51","guid":{"rendered":"http:\/\/pappp.net\/?guid=fa11044b9a6170a5719083c5ce12d36c"},"modified":"2024-06-25T18:27:51","modified_gmt":"2024-06-25T22:27:51","slug":"researchers-upend-ai-status-quo-by-eliminating-matrix-multiplication-in-llms","status":"publish","type":"post","link":"https:\/\/pappp.net\/?p=60334","title":{"rendered":"Researchers upend AI status quo by eliminating matrix multiplication in LLMs"},"content":{"rendered":"<p class=\"syndicated-attribution\">Source: <a href=\"https:\/\/arstechnica.com\/?p=2033314\">Ars Technica<\/a><\/p>\n<div style=\"background-color : #fff7d5;\n\t\t\tborder-width : 1px; padding : 5px; border-style : dashed; border-color : #e7d796;margin-bottom : 1em; color : #9a8c59;\">Article note: It isn't really matrix-math-free; it's just matrices of ternaries.  \nThat said, I'm a fan of small-range systems for sloppy approximators. (-1,0,1) ternaries map well to LLMs, and not using huge, expensive, power-hungry monstrosities for dumb bullshit is a win for everyone.<\/div><div>\n<figure>\n  <img src=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2024\/06\/AI_lightbulb-800x450.jpg\" alt=\"Illustration of a brain inside of a light bulb.\" referrerpolicy=\"no-referrer\" loading=\"lazy\"\/>\n      <p><a href=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2024\/06\/AI_lightbulb.jpg\" rel=\"noopener noreferrer\">Enlarge<\/a> (credit: <a rel=\"noopener noreferrer\" href=\"https:\/\/www.gettyimages.com\/detail\/photo\/artificial-intelligence-domination-light-bulb-brain-royalty-free-image\/1985871636\">Getty Images<\/a>)<\/p>  <\/figure>\n\n\n\n\n\n\n<div><a name=\"page-1\"><\/a><\/div>\n<p>Researchers claim to have developed a new way to run AI language models more efficiently by eliminating matrix multiplication from the process. This fundamentally redesigns neural network operations that are currently accelerated by GPU chips. 
The findings, detailed in a <a href=\"https:\/\/arxiv.org\/abs\/2406.02528\" rel=\"noopener noreferrer\">recent preprint paper<\/a> from researchers at the University of California Santa Cruz, UC Davis, LuxiTech, and Soochow University, could have deep implications for the <a href=\"https:\/\/arstechnica.com\/ai\/2024\/06\/is-generative-ai-really-going-to-wreak-havoc-on-the-power-grid\/\" rel=\"noopener noreferrer\">environmental impact<\/a> and operational costs of AI systems.<\/p>\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Matrix_multiplication\" rel=\"noopener noreferrer\">Matrix multiplication<\/a> (often abbreviated to \"MatMul\") is at the <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/10\/deepmind-breaks-50-year-math-record-using-ai-new-record-falls-a-week-later\/\" rel=\"noopener noreferrer\">center<\/a> of most neural network computational tasks today, and GPUs are particularly good at executing the math quickly because they can perform large numbers of multiplication operations in parallel. 
That ability momentarily made Nvidia the <a href=\"https:\/\/www.wsj.com\/tech\/ai\/nvidias-ascent-to-most-valuable-company-has-echoes-of-dot-com-boom-dd836c90\" rel=\"noopener noreferrer\">most valuable company<\/a> in the world last week; the company currently holds an estimated <a href=\"https:\/\/www.hpcwire.com\/2024\/06\/10\/nvidia-shipped-3-76-million-data-center-gpus-in-2023-according-to-study\/\" rel=\"noopener noreferrer\">98 percent market share<\/a> for data center GPUs, which are commonly used to power AI systems like <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/11\/chatgpt-was-the-spark-that-lit-the-fire-under-generative-ai-one-year-ago-today\/\" rel=\"noopener noreferrer\">ChatGPT<\/a> and <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/12\/google-launches-gemini-a-powerful-ai-model-it-says-can-surpass-gpt-4\/\" rel=\"noopener noreferrer\">Google Gemini<\/a>.<\/p>\n<p>In the new paper, titled \"Scalable MatMul-free Language Modeling,\" the researchers describe creating a custom 2.7 billion parameter model without using MatMul that features similar performance to conventional large language models (LLMs). They also demonstrate running a 1.3 billion parameter model at 23.8 tokens per second on a GPU that was accelerated by a custom-programmed <a href=\"https:\/\/en.wikipedia.org\/wiki\/Field-programmable_gate_array\" rel=\"noopener noreferrer\">FPGA<\/a> chip that uses about 13 watts of power (not counting the GPU's power draw). 
The implication is that a more efficient FPGA \"paves the way for the development of more efficient and hardware-friendly architectures,\" they write.<\/p><\/div><p><a href=\"https:\/\/arstechnica.com\/?p=2033314#p3\" rel=\"noopener noreferrer\">Read 13 remaining paragraphs<\/a> | <a href=\"https:\/\/arstechnica.com\/?p=2033314&amp;comments=1\" rel=\"noopener noreferrer\">Comments<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Enlarge (credit: Getty Images)  <\/p>\n<p>Researchers claim to have developed a new way to &#8230;<\/p>\n<p> <a href=\"https:\/\/pappp.net\/?p=60334\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[226],"tags":[],"class_list":["post-60334","post","type-post","status-publish","format-standard","hentry","category-news-2"],"_links":{"self":[{"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/posts\/60334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=60334"}],"version-history":[{"count":0,"href":"https:\/\/pappp.net\/index.php?rest_route=\/wp\/v2\/posts\/60334\/revisions"}],"wp:attachment":[{"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=60334"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=60334"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pappp.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=60334"}],"curies":[{"name":"wp","href":"https:
\/\/api.w.org\/{rel}","templated":true}]}}