{"id":87641,"date":"2025-04-18T15:46:06","date_gmt":"2025-04-18T19:46:06","guid":{"rendered":"http:\/\/pappp.net\/?guid=69f9470589d580a52d44aa788601f708"},"modified":"2025-04-18T15:46:06","modified_gmt":"2025-04-18T19:46:06","slug":"microsofts-1%e2%80%91bit-ai-model-runs-on-a-cpu-only-while-matching-larger-systems","status":"publish","type":"post","link":"https:\/\/pappp.net\/?p=87641","title":{"rendered":"Microsoft\u2019s \u201c1\u2011bit\u201d AI model runs on a CPU only, while matching larger systems"},"content":{"rendered":"<p class=\"syndicated-attribution\">Source: <a href=\"https:\/\/arstechnica.com\/ai\/2025\/04\/microsoft-researchers-create-super%E2%80%91efficient-ai-that-uses-up-to-96-less-energy\/\">Ars Technica<\/a><\/p>\n<div style=\"background-color : #fff7d5;\n\t\t\tborder-width : 1px; padding : 5px; border-style : dashed; border-color : #e7d796;margin-bottom : 1em; color : #9a8c59;\">Article note: This is one of the only lines of AI research I'm excited about, and I've been excited since that 2023 paper.  Most of the even vaguely neuromorphic stuff should work approximately as well as with floats on essentially 1-2 bits per signal (basically just positive,negative, and maybe 0), and that should be _markedly_ cheaper compute-wise, making it likely to actually be worthwhile without the hype and burn-barrels full of investor money.\nI'm also my graduate advisor's academic offspring and find the idea of variable bit-width\/bitserial\/packed architectures generally intriguing, and this continuing to work out would favor that design family.<\/div><p>When it comes to actually storing the numerical weights that <a href=\"https:\/\/arstechnica.com\/science\/2023\/07\/a-jargon-free-explanation-of-how-ai-large-language-models-work\/\" rel=\"noopener noreferrer\">power a large language model's underlying neural network<\/a>, most modern AI models rely on the precision of 16- or 32-bit <a href=\"https:\/\/blog.demofox.org\/2017\/11\/21\/floating-point-precision\/\" rel=\"noopener noreferrer\">floating point numbers<\/a>. But that level of precision can come at the cost of large memory footprints (in the hundreds of gigabytes for the largest models) and significant processing resources needed for <a href=\"https:\/\/arstechnica.com\/information-technology\/2024\/03\/matrix-multiplication-breakthrough-could-lead-to-faster-more-efficient-ai-models\/\" rel=\"noopener noreferrer\">the complex matrix multiplication<\/a> used when responding to prompts.<\/p>\n<p>Now, researchers at Microsoft's <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/general-artificial-intelligence\/\" rel=\"noopener noreferrer\">General Artificial Intelligence group<\/a> have <a href=\"https:\/\/huggingface.co\/microsoft\/bitnet-b1.58-2B-4T\" rel=\"noopener noreferrer\">released a new neural network model<\/a>&nbsp;that works with just three distinct weight values: -1, 0, or 1. Building on top of previous work Microsoft Research&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2310.11453\" rel=\"noopener noreferrer\">published in 2023<\/a>, the new model's \"ternary\" architecture reduces overall complexity and \"substantial advantages in computational efficiency,\" the researchers write, allowing it to <a href=\"https:\/\/github.com\/microsoft\/BitNet\" rel=\"noopener noreferrer\">run effectively on a simple desktop CPU<\/a>. 
## Watching your weights

The idea of simplifying model weights isn't a completely new one in AI research. For years, researchers have been experimenting with [quantization techniques](https://huggingface.co/docs/optimum/en/concept_guides/quantization) that squeeze neural network weights into smaller memory envelopes. In recent years, the most extreme quantization efforts have [focused on so-called "BitNets"](https://arxiv.org/abs/2310.11453) that represent each weight with a single bit (+1 or -1).
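The single-bit variant can be sketched the same way. In the spirit of the [2023 BitNet paper](https://arxiv.org/abs/2310.11453), which centers weights and keeps a per-tensor scale (the exact details are in the paper; this simplified version is mine), binarization keeps only the sign of each weight:

```python
import numpy as np

def sign_binarize(w: np.ndarray, eps: float = 1e-8):
    """One-bit weight quantization: keep only the sign of each
    mean-centered weight, plus one per-tensor scale factor so the
    overall magnitude stays roughly calibrated."""
    centered = w - w.mean()
    alpha = np.abs(centered).mean() + eps          # per-tensor scale
    w_b = np.where(centered >= 0, 1, -1).astype(np.int8)
    return w_b, alpha                              # w is approximated by alpha * w_b

w = np.random.default_rng(1).normal(size=(4, 8)).astype(np.float32)
w_b, alpha = sign_binarize(w)
packed = np.packbits(w_b > 0)                      # 8 weights per byte in storage
print(w_b)                                         # every entry is +1 or -1
print(f"{w_b.size} weights packed into {packed.size} bytes")
```

With only +1/-1 values, weights pack eight to a byte and multiply-accumulate collapses into add/subtract; the ternary scheme above trades a little of that density (1.58 bits vs. 1) for the ability to zero out connections entirely.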