What is Windows AI Studio?
Windows AI Studio is a software development environment that allows developers to build generative AI apps and deploy them locally on devices running Windows 11.
Microsoft officially announced Windows AI Studio in November 2023 during its annual Microsoft Ignite conference.
The release of Windows AI Studio marks a significant step in Microsoft’s efforts to democratize generative AI and make it accessible to a broader range of developers and users.
It also highlights the growing need for generative AI to function on edge devices and in regions with limited, unreliable, or no internet connectivity.
How Does Windows AI Studio Work?
Windows AI Studio has a simple workflow that allows developers to build, train, and deploy small language models (SLMs) without having to use cloud infrastructure.
Developers access Windows AI Studio as an extension for Visual Studio Code, a lightweight open-source code editor developed by Microsoft.
Microsoft’s strategy is to allow developers to take advantage of their familiarity with Visual Studio Code’s editing environment while benefiting from the generative AI capabilities of Azure AI Studio. This approach aims to make generative AI more accessible and less intimidating for developers with varying skill levels.
The first thing the developer is asked to do is select a pre-trained small language model from a curated list provided by Hugging Face and Azure. This is an important step, and new developers should plan to spend time on it.
Many of the pre-trained SLMs provided by Hugging Face and Azure are designed for discrete tasks, such as text classification, sentiment analysis, machine translation, or question answering.
Whichever pre-trained model the developer chooses should be designed for the task(s) their Windows app will carry out, and it should run well on battery-powered devices that have limited power, storage, and processing resources.
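To make this step concrete, here is a minimal sketch of loading and smoke-testing a candidate SLM from the Hugging Face Hub with the open-source transformers library; the model ID microsoft/phi-2 and the prompt are illustrative choices, not Windows AI Studio defaults:

```python
# A minimal sketch: load a small pre-trained model from the Hugging Face Hub
# and run a quick smoke test. "microsoft/phi-2" is an illustrative choice;
# pick whichever model on the curated list matches your task and hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Classify the sentiment of this review: 'Great battery life.'"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```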
The next thing the developer is asked to do is fine-tune the selected pre-trained model with their own data.
This process involves adjusting the model’s parameters to optimize it for the application the developer is building. Although this step may sound time-consuming, it doesn’t have to be.
The Windows AI Studio user interface (UI) makes it easy to adjust parameters by providing the developer with sliders, buttons, and other low code/no code (LCNC) dashboard elements.
Behind the scenes, the developer is actually using a model optimization tool called Olive and a fine-tuning technique called QLoRA (Quantized Low-Rank Adapters) to adjust parameters.
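For developers curious about what those sliders drive under the hood, here is a hedged sketch of the QLoRA idea using the open-source transformers, bitsandbytes, and peft libraries: the base model’s weights are quantized to 4 bits, and only small low-rank adapter matrices are trained. The model ID, hyperparameters, and target modules are illustrative assumptions, not Olive’s internal defaults:

```python
# A hedged QLoRA sketch: quantize the base model to 4 bits, then train
# small low-rank adapters instead of the full weight matrices.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16,                                   # rank of the adapter matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the adapters are trainable
```

A fine-tuning loop (for example, with Hugging Face’s Trainer) would then update only those adapter weights on the developer’s own data.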
Once the model has been fine-tuned and optimized, it is converted from its native format into the Open Neural Network Exchange (ONNX) format so the model and its dependencies can be used by the developer’s Windows application. The application then uses ONNX Runtime (ORT) to execute the model during inference.
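Here is a minimal sketch of that hand-off, assuming the open-source Optimum library for the conversion and placeholder paths for the fine-tuned checkpoint:

```python
# A minimal export-and-run sketch. The paths are placeholders; export=True
# asks Optimum to convert the PyTorch checkpoint to ONNX, and generation
# is then executed by ONNX Runtime (ORT).
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

ort_model = ORTModelForCausalLM.from_pretrained("./fine_tuned_model", export=True)
ort_model.save_pretrained("./onnx_model")  # writes the model and its dependencies

tokenizer = AutoTokenizer.from_pretrained("./fine_tuned_model")
inputs = tokenizer("Summarize: the battery lasts all day.", return_tensors="pt")
outputs = ort_model.generate(**inputs, max_new_tokens=30)  # runs on ORT
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```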
Advantages of Running AI Locally
Now that generative AI has become cheaper and easier to use, people are discovering new uses for the technology.
Researchers have been exploring techniques like model compression, quantization, and knowledge distillation to reduce the computational footprint of large language models (LLMs), but most LLMs are still not suitable for direct deployment on personal computers, tablets, mobile phones, kiosks, and other types of Internet of Things (IoT) edge devices.
Until now, on-device AI has been a challenge. Microsoft’s investment in Windows AI Studio is expected to be a key driver for running small language models locally and creating new use cases for on-device generative AI.
If Microsoft is successful, developers will be able to:
- Build generative AI chatbots that can continuously function even when internet connectivity is unavailable.
- Build text summary apps that don’t have to send sensitive data out to the cloud for processing and analysis.
- Fine-tune generative AI models for specific hardware configurations.
- Avoid potential disruptions or downtime caused by cloud outages or unexpected service changes.
- Meet regulatory compliance requirements in certain industries, such as healthcare or finance, where data privacy and security are paramount.
- Maintain complete control over when and how their AI models are updated.
- Explore new uses for small language models like Microsoft’s Phi, Meta’s Llama 2, and Mistral.
- Build hybrid apps that can switch between local SLMs and cloud-based LLMs using a feature called Prompt Flow (see the routing sketch after this list).
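The hybrid pattern in the last bullet can be illustrated with a plain-Python routing sketch. This is not the actual Prompt Flow API; query_local_slm and query_cloud_llm are hypothetical placeholders for an on-device model call (like the ORT example above) and a cloud endpoint call:

```python
# A hypothetical local-first routing sketch; not the Prompt Flow API.
def query_local_slm(prompt: str) -> str:
    # Placeholder: run the on-device ONNX model via ONNX Runtime.
    raise NotImplementedError

def query_cloud_llm(prompt: str) -> str:
    # Placeholder: call a cloud-hosted LLM endpoint.
    raise NotImplementedError

def answer(prompt: str) -> str:
    """Prefer on-device inference; fall back to the cloud if it fails."""
    try:
        return query_local_slm(prompt)
    except Exception:
        return query_cloud_llm(prompt)
```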