The generative AI tech stack saw massive progress in 2023, with breakthroughs in systems like ChatGPT, DALL-E 3, and Google’s Gemini. However, as AI becomes more powerful and widespread, it’s clear we’re only beginning to tap into the possibilities.
In this article, we've compiled the key developments to anticipate in each area of the stack.
The Essential AI Tech Stack and Development Trends to Watch
High-quality training data remains the fuel for increasingly powerful AI models. As models scale up into the trillion-parameter range, the data hunger only grows. However, not all data is created equal – variance, complexity, and alignment matter as much as scale.
Key data trends to track include:
- Synthetic data generation will continue to improve, producing training sets that better mimic the complexity of the real world – tools like MOSTLY AI and AI21 Labs' Jurassic-1 point the way.
- Multimodal data integration will allow models like Google’s Imagen to tackle tasks that require connecting images, audio, video, and text. Models pre-trained on aligned multimodal datasets will power further breakthroughs.
- Real-world data from users and companies will supplement synthetic data via federated learning and other techniques. This real-world grounding is key to avoiding AI hallucinations.
- Low-data techniques like prompt engineering will enable highly sample-efficient fine-tuning. Models will adapt to new domains with only hundreds of examples rather than millions.
- Data markets will emerge to value, trade, and combine diverse data sources. As AI models consume more data, proper valuation and incentives become critical. In November 2023, OpenAI announced the launch of Data Partnerships, where they will work together with organizations to produce public and private datasets for training AI models.
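The synthetic-data idea above can be illustrated with a minimal sketch: fit simple per-column statistics on a real table, then sample new rows from those statistics. The column names and the independent-Gaussian assumption here are purely illustrative; production tools like MOSTLY AI model far richer structure (correlations, categorical fields, privacy guarantees).

```python
import random
import statistics

def fit_column_stats(rows, column):
    """Estimate a Gaussian (mean, stdev) for one numeric column."""
    values = [row[column] for row in rows]
    return statistics.mean(values), statistics.stdev(values)

def synthesize(rows, columns, n, seed=0):
    """Sample n synthetic rows, treating each column as an independent Gaussian."""
    rng = random.Random(seed)
    stats = {c: fit_column_stats(rows, c) for c in columns}
    return [
        {c: rng.gauss(mu, sigma) for c, (mu, sigma) in stats.items()}
        for _ in range(n)
    ]

# Toy "real" data with hypothetical age/income columns
real = [{"age": 30 + i, "income": 50_000 + 1_000 * i} for i in range(20)]
synthetic = synthesize(real, ["age", "income"], n=100)
```

The synthetic rows preserve the marginal distributions of the original table without copying any real record verbatim.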
Training the largest AI models already requires Google-scale infrastructure. Optimizing the AI compute stack will help democratize access to the development of AI-powered solutions:
- Specialized hardware like tensor processing units (TPUs), Dojo, and Cerebras will offer order-of-magnitude speedups and power efficiencies vs GPUs.
- Model parallelism, as shown in Megatron-LM, will efficiently scale model training beyond what fits on any one chip.
- Inference optimization will reduce latency and costs. Approaches like mixture-of-experts, model quantization, and streaming inference will help.
- Cloud marketplace competition from Amazon, Microsoft, Google, and startups will continue driving down model serving costs.
- On-device inference will push AI compute to edge devices such as smartphones, enabling developers to avoid cloud costs and latency.
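Model quantization, mentioned above, can be sketched in a few lines: map float weights to 8-bit integers with a shared scale, trading a little precision for a roughly 4x smaller memory footprint. This is a simplified symmetric scheme; real toolchains add per-channel scales and calibration.

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: floats -> int8 values plus a scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.91, -0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step of the original
```

The worst-case rounding error is `scale / 2`, which is why quantizing well-conditioned weight tensors typically costs little accuracy.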
Researchers from MIT and the MIT-IBM Watson AI Lab have developed PockEngine, a technique that enables deep-learning models to adapt to new sensor data directly on an edge device.
According to Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the MIT-IBM Watson AI Lab: “On-device fine-tuning can enable better privacy, lower costs, customization ability, and also lifelong learning, but it is not easy. Everything has to happen with a limited number of resources. We want to be able to run not only inference but also training on an edge device. With PockEngine, now we can.”
Language, image, video, and multimodal models will continue to grow more powerful. However, scale is not all that matters: new architectures, training techniques, and evaluation metrics are just as critical.
- Multimodal architectures like Google’s Gemini fuse modalities into a single model, avoiding siloed AI. This enables richer applications like visual chatbots.
- Improved training with techniques like Anthropic’s Constitutional AI will reduce harmful biases and improve safety. Models like Midjourney’s v6 show steady progress.
- Better evaluation through benchmarks like HumanEval and AGIEval will surface real progress, avoiding vanity metrics. Robust out-of-distribution (OOD) generalization is the goal.
- Specialized models will tackle vertical domains like code, chemistry, and maths. Transfer learning from general models helps bootstrap these.
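Benchmarks like HumanEval report pass@k: the probability that at least one of k sampled completions passes the unit tests. The unbiased estimator from the HumanEval paper can be computed directly:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator from the HumanEval paper.

    n = total completions sampled per problem
    c = completions that passed the tests
    k = evaluation budget
    """
    if n - c < k:
        return 1.0  # any size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 200 samples with 10 passing gives pass@1 = 10/200 = 0.05
```

Averaging this quantity over all benchmark problems yields the headline pass@k score.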
Building real-world AI applications requires an AIOps stack with tooling for rapid experimentation, deployment, and monitoring.
- MLOps will become table-stakes, allowing seamless model development and deployment lifecycles.
- Experiment tracking through tools like Comet ML and Weights & Biases will accelerate research.
- Infrastructure automation via Terraform and Kubernetes will simplify scaling.
- Monitoring through WhyLabs, Robust Intelligence, and others will ensure reliable production AI.
- Distribution platforms like HuggingFace, Render, and Causal will simplify model access.
- Vertical solutions will hide complexity for non-experts. For example, Replicate and Runway ML focus on deploying generative models.
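At its core, the experiment tracking mentioned above means recording each run's parameters and metrics in an append-only log so results stay comparable. A minimal file-based sketch (tools like Comet ML and Weights & Biases add UIs, diffing, and collaboration on top; the filename and parameters here are hypothetical):

```python
import json
import time
from pathlib import Path

class RunLogger:
    """Append-only JSONL experiment log: one line per logged event."""

    def __init__(self, path, params):
        self.path = Path(path)
        self._write({"event": "start", "params": params})

    def log(self, step, **metrics):
        self._write({"event": "metric", "step": step, **metrics})

    def _write(self, record):
        record["ts"] = time.time()
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

# Hypothetical usage: track loss across a short training loop
run = RunLogger("run.jsonl", params={"lr": 3e-4, "batch_size": 32})
for step, loss in enumerate([0.9, 0.7, 0.55]):
    run.log(step, loss=loss)
```

Because every event carries its parameters or step, any past run can be reconstructed and compared line by line.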
The Critical Role of AI Infrastructure
As AI models grow more powerful, the infrastructure to support them becomes even more crucial. Here’s why it’s so essential:
With AI models requiring vast amounts of high-quality data, infrastructure must provide secure and efficient data pipelines. This includes capabilities like data versioning, lineage tracking, access controls, and compliance monitoring.
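Data versioning and lineage tracking, for instance, can be grounded in content addressing: hash each dataset snapshot so a pipeline run can record exactly which bytes it trained on. A minimal sketch (the record format is illustrative):

```python
import hashlib

def dataset_version(records):
    """Content-address a dataset: identical records -> identical version id."""
    h = hashlib.sha256()
    for record in records:
        h.update(record.encode("utf-8"))
        h.update(b"\x00")  # record separator so boundaries affect the hash
    return h.hexdigest()[:12]

v1 = dataset_version(["alice,34", "bob,29"])
v2 = dataset_version(["alice,34", "bob,29"])   # same bytes, same version
v3 = dataset_version(["alice,34", "bob,30"])   # one changed field, new version
```

Logging this id alongside each trained model gives a simple, auditable lineage link between model and data.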
AI workloads demand high-performance compute like GPUs and TPUs. Infrastructure must make these resources available on demand while optimizing cost and energy efficiency.
As the model size and request volumes grow, infrastructure must scale smoothly via distribution and load balancing. Auto-scaling on serverless platforms helps match supply to demand.
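The scaling behavior described above is often a simple control loop. Kubernetes' Horizontal Pod Autoscaler, for example, computes desired replicas as ceil(currentReplicas × currentMetric / targetMetric); a sketch of that rule with clamping:

```python
from math import ceil

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """HPA-style rule: scale proportionally to metric pressure, within bounds."""
    desired = ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 4 replicas at 90% average CPU against a 60% target -> scale out to 6
```

The same proportional rule scales back in when load drops, so capacity tracks demand in both directions.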
Once in production, AI systems require robust monitoring of accuracy, latency, costs, and other metrics. This helps prevent harmful errors or degradation.
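In practice, much of this monitoring reduces to tracking a rolling window of a metric and alerting when it drifts past a threshold. A minimal latency-degradation check (the window size and threshold are illustrative, not recommendations):

```python
from collections import deque

class LatencyMonitor:
    """Alert when the rolling mean latency exceeds a threshold."""

    def __init__(self, threshold_ms, window=100):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keeps only the last `window` values

    def record(self, latency_ms):
        """Record one request latency; return True if the alert fires."""
        self.samples.append(latency_ms)
        mean = sum(self.samples) / len(self.samples)
        return mean > self.threshold_ms

monitor = LatencyMonitor(threshold_ms=250, window=5)
healthy = [monitor.record(x) for x in [100, 120, 110]]  # all within budget
degraded = monitor.record(900)  # one slow request pushes the rolling mean over
```

The same windowed-mean pattern applies to accuracy, cost per request, or drift scores; only the metric and threshold change.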
The trends across the AI stack point to a future where AI capabilities become not only more powerful but also more robust, transparent, and accessible to all developers.
Significant work is still ahead in improving data quality and availability, specialized hardware, evaluation rigor, and productive tooling.
However, the progress of 2023 sets the stage for an exciting decade of AI innovation to come.
- MOSTLY AI’s synthetic data platform features (MOSTLY AI)
- Announcing AI21 Studio and Jurassic-1 language models (AI21)
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (arXiv)
- OpenAI Data Partnerships (OpenAI)
- Cerebras (Cerebras)
- MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism (NVIDIA ADLR)
- Technique enables AI on edge devices to keep learning over time (Massachusetts Institute of Technology News)
- Anthropic (Anthropic)
- Midjourney Showcase (Midjourney)
- GitHub – openai/human-eval: Code for the paper “Evaluating Large Language Models Trained on Code” (GitHub)
- Comet ML – Build better models faster (Comet)
- Weights & Biases: The AI Developer Platform (Weights & Biases)
- Terraform by HashiCorp (Terraform)
- Kubernetes (Kubernetes)
- WhyLabs (WhyLabs)
- Hugging Face – The AI community building the future (Hugging Face)
- Cloud Application Hosting for Developers (Render)
- Replicate (Replicate)
- Advancing creativity with artificial intelligence (Runway)