September 24, 2024

Generative AI and the Next Era of Software Engineering

Ravi Shankar Goli is a lead principal software engineer at Microsoft, developing a new generative AI product. He recently completed a second master’s degree focused on AI and is helping to nurture the next generation of innovators through volunteering.


You have been a software engineer for nearly 20 years. How would you describe the impact of generative AI on computing? How transformative is this change in comparison to previous advancements?

I see this as one of the most revolutionary technologies in human history. If you go back to the invention of paper about 2,000 years ago, it was one of the biggest innovations humans ever came up with, turning knowledge into a written record. Fast forward nearly 1,500 years, and the printing press created an unimaginable spread of knowledge. Fast forward another 500 years, and we have computers, whose true potential came with the rise of the internet and what happens when people are connected to each other, exchanging information.

I think AI is bigger than the computer and the internet together. In all these previous technologies, a human was the protagonist producing information. In this case, it’s the first time AI is trying to produce new knowledge, and I am very hopeful that it will be transformative.  

One of the key impacts of generative AI is that machines can adapt to how humans work, meaning humans don’t need to conform to machine logic as much as they did in the past. What does this mean for computing, and does it change UI?

Current computing is essentially rule-based and instruction-based, and I, as a human, feed these rules to the computer. So I need to understand the problem in the outside world and then translate it into machine instructions and rules and feed it to the computer, which will start working in a deterministic way. But there have always been limitations to this model because the real world is so complex, and the rules and the requirements are huge—infinitely huge—and computers are limited.

But now we are moving to a model where computing is learning from the world and the data that you feed it. It is not a fixed rule set or fixed instructions. It creates its own rules and its own patterns, provided you have enough data.  

For UI, AI is multimodal—it can use text, it can see, it can hear. And that creates a different kind of UI. An app could change its menu and options by picking up on what a user is doing—a real-time music playlist suggestion that is perfectly tweaked for that moment. A natural language interface can change the type of language it uses based on the user’s emotional state.  

At the same time, we still want to know when a parcel will be delivered or our heart rate during a workout. So traditional UI will stay relevant, in the same way that we didn’t replace paper even after 2,000 years. We’ll just need far fewer clicks to get there.

The computing-intensive nature of AI is bringing in a plethora of diverse computing devices—GPUs, DPUs, accelerators, etc. What does that mean for software engineers? How much do you have to think about the underlying infrastructure? 

Yes, hardware plays a crucial role for software application developers, especially in AI applications. These applications require top-tier GPUs, CPUs, memory, and storage—far beyond the capabilities of a typical local computer.  

When designing and architecting applications, particularly with generative AI, I must consider my user base and the availability of necessary hardware resources. For instance, I need to ensure I have enough GPUs to handle the workload. If my user base spans countries like India, Germany, or the UK, I need to have data centers in these regions to deploy my applications close to the users. This proximity is important for reducing latency and ensuring efficient performance.

When dealing with vast amounts of data, performant storage is essential because it influences the speed at which data can be read and processed for model training. Deploying an application on low-performing storage will result in suboptimal performance. During inference, the location of where the model is stored is critical to ensure quick and efficient access. 

As you’re developing a generative AI product, what is the most significant hardware bottleneck?  

Availability is the biggest challenge, particularly during the development phase. Due to the high demand for hardware, even within a single company, accessing GPUs can be difficult. There are simply not enough GPUs for everyone to use simultaneously, and the costs associated with owning and operating them are significant. As a result, we often have to work within these constraints, sometimes simulating with smaller data sets and performing the main testing much later, closer to production. 


The power demand for AI training is breathtaking. Could the answer lie in software? Can we build AI that isn’t so compute-intensive? 

That’s an interesting question. It’s true that the current power consumption of GPUs, especially for AI training, is enormous; you need thousands of megawatts for large language models (LLMs). A single LLM query consumes somewhere around 30 to 50 times more energy than the internet searches we’ve used until today. But everyone is talking about this, and there is a lot of research going on in this space to reduce energy consumption and optimize the hardware overall. There are dozens of startups coming up in this space, so I’m very optimistic we will solve this problem of energy consumption.

For the software, we need to design new algorithms and new architectures that reduce the computing demands. There is a lot of innovation required in this space too. Maybe an entirely new architecture is required for AI.

So the answer lies not just in software or hardware; it should come from both.

Are Small Language Models part of the solution?  

There’s a surge of LLMs, with some of the biggest models hitting a trillion parameters. You need a huge, powerful data center for that, and that’s why the industry started exploring Small Language Models (SLMs): small models that, given the right hardware, can run on your local machine without an internet connection.

Surprisingly, we have seen SLMs perform on par with LLMs in some scenarios, given proper data, training methods, and fine-tuning. This has created a lot of interest in the industry, and every other month we see 100 new language models being added to the mix. Once GPUs become cheaper and available in personal computers, we’ll see this space explode, because there are so many use cases where you’d want AI to be local and private.
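To make the idea of local, private inference concrete, here is a minimal sketch (not part of the interview) of running a small open model entirely on a local machine with the Hugging Face transformers library. The model name and prompt are placeholder examples, and a recent transformers release is assumed.

```python
# Minimal local-inference sketch: a small language model running on the
# local machine, with no internet connection needed once the weights are cached.
# The model name below is only a placeholder example.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # any small open model works
)

prompt = "Summarize why small language models are useful on personal devices:"
result = generator(prompt, max_new_tokens=80, do_sample=False)
print(result[0]["generated_text"])
```

Because the model and its weights live on the device, prompts and outputs never leave the machine, which is exactly the privacy argument made above.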

How do you ensure AI integrations, especially when multiple applications work in concert, are seamless? How do you prevent users from becoming middle managers of a roster of AI tools?  

Integration with multiple applications is a complex task. AI adds additional layers to that complexity because it also brings hardware problems. As an application designer, I need to create modularity, proper architecture design, and automate this so that the end user is not aware of any of this in the background. 

Copilots are new in the industry, and companies need some kind of skill set to do this type of configuration. There are already low-code and no-code solutions available, like Microsoft Copilot Studio, to help bridge the gap. 

How big is the skill gap for AI? And do you see AI writing its own future as far as developing code and doing the bulk of the work?

AI will undoubtedly be capable of writing and coding. However, at least for the next decade, it will not replace humans but rather augment their abilities. Consider the complexity of integrating multiple applications; it’s challenging to fully articulate everything this process entails. When faced with entirely new problems that AI has never encountered, it won’t be able to solve them on its own. Once a human writes the proper code, AI can learn from it and replicate it in the future. 

As a result, the expectations for coders will increase. While becoming a software engineer today is relatively accessible, true expertise requires additional knowledge. Ensuring the authenticity and correctness of code and content generated by AI, which can sometimes produce nonsensical results, means that software engineers are here to stay. Their skill set will, in many ways, become more critical and advanced. 

Yes, LLMs are often criticized for “hallucinations” and inaccuracies. How do you ensure the accuracy of your AI product? Can these strategies be generalized? 

Yes, definitely there is a hallucination problem, and these large language models sometimes say things confidently that are not true. When we develop these applications, addressing the hallucination problem must be a primary focus in the application development process. 

So how do we solve it? There are several strategies. You first need to make sure you have a better model, one that is trained on the relevant data. That’s one aspect. The second thing is that if that is not enough, we can do something called fine-tuning, a process where you take the base model and further train it on your specific problem set. For instance, in the financial sector, you would feed the model a lot of financial information along with data outlining your expected outputs.
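As a rough illustration of that fine-tuning step, here is a minimal sketch using the Hugging Face transformers Trainer to adapt a small base model to domain-specific text. The model name, the file financial_corpus.txt, and the hyperparameters are placeholder assumptions, not the product setup described in the interview.

```python
# Minimal fine-tuning sketch: continue training a small causal language model
# on a hypothetical domain corpus (e.g. financial documents).
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

model_name = "gpt2"  # placeholder base model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical domain data: plain-text financial documents, one example per line.
dataset = load_dataset("text", data_files={"train": "financial_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice the same idea applies whether you do full fine-tuning as above or a lighter-weight adapter method; the point is that the base model is specialized on your domain data and expected outputs.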

Additionally, there are numerous toolsets and extensions available. These might be referred to as skill sets, plugins, or functions—terms the industry uses interchangeably for the same concept. Essentially, these are extensions of LLMs that don’t rely on AI but instead use traditional computing functions to ensure the LLM’s output is accurate or to gather data from external systems. For example, weather predictions are not generated by the LLM itself but retrieved from an external service, and for calculations the AI can call a calculator API to ensure accuracy. These issues can be addressed through fine-tuning, external extension functions, and proper testing. In the future, we can expect even better models.
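The sketch below illustrates the general pattern behind these extension functions: the model only decides which tool to call and with what arguments, while the answer itself comes from ordinary deterministic code. The call_llm function is a hypothetical stand-in for whatever model API is actually used; it is not from the interview or any specific product.

```python
# Tool/function-calling sketch: the LLM requests a tool, traditional code answers.
import json

def calculator(expression: str) -> float:
    """Traditional, deterministic computation -- no AI involved."""
    # eval() keeps the sketch short; a real system would use a proper parser.
    return eval(expression, {"__builtins__": {}}, {})

TOOLS = {"calculator": calculator}

def call_llm(prompt: str) -> str:
    """Hypothetical model call. A real LLM would return a structured tool
    request like the JSON below when it cannot answer reliably on its own."""
    return json.dumps({"tool": "calculator",
                       "arguments": {"expression": "1234 * 5678"}})

def answer(user_question: str) -> str:
    response = json.loads(call_llm(user_question))
    tool = TOOLS[response["tool"]]        # dispatch to non-AI code
    result = tool(**response["arguments"])
    return f"The result is {result}"

print(answer("What is 1234 times 5678?"))  # -> The result is 7006652
```

The design point is that the language model is never trusted with the arithmetic itself; it only routes the request, which is how these extensions keep the final output accurate.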

What’s your advice to technology leaders looking to build a generative AI product?  

My key piece of advice is that, yes, AI is going to be extremely disruptive to the industry. That said, businesses don’t need to panic. They need to make sure they are aware and up to date. Everyone will try to apply AI to their product, but that doesn’t mean it will be successful. So instead of trying to apply AI to your product and wasting huge amounts of resources, my advice is to first examine if and where AI makes sense and to leverage experts to look at the problem. Those resources will pay off. 

Credits

Editor and co-author: Ronni Shendar
Research and co-author: Thomas Ebrahimi
Art direction and design: Natalia Pambid
Illustration: Cat Tervo
Design: Rachel Garcera
Content design: Kirby Stuart
Social media manager: Ef Rodriguez
Copywriting: Miriam Lapis
Copywriting: Michelle Luong
Copywriting: Marisa Miller
Editor-in-Chief: Owen Lystrup