Businesses of all sizes have felt the remarkable impact that artificial intelligence has had on the way we work in recent years.
From large corporations to SMEs, organizations are becoming faster, more agile, and more capable as they hand off administrative and repetitive tasks to their AI colleagues.
One of the latest trends in AI is the arrival of publicly available large language models (LLMs): machine learning models trained on massive amounts of data to recognize the structures and patterns of natural language. Because they understand natural language, they allow us to explore large data sets through everyday questions or commands.
As such, these deep learning models have become the most recognizable face of AI; to cite the most famous example, a deep learning model is what allows ChatGPT to answer your questions. But this kind of intelligence has one classic flaw: it is stuck in something of a time capsule.
Deep learning models are trained intensively, fed millions upon millions of data points in a continuous feedback loop that teaches the model to recognize particular patterns. But "running" a deep learning model (taking it out of the training loop and putting it on the internet as part of live infrastructure) effectively stops it from learning anything new. Ask an early version of ChatGPT about very recent events and it will politely explain that its knowledge stops at its training cutoff.
This means that anyone deploying an LLM needs to make sure the model can rely on the systems it draws from, and on the freshness of the data available to it. A large corporation may have the funding and technical resources to make that happen; for small and medium-sized companies, that is a bold assumption.
Move it or lose it
Historically, we have tended to think of data as static. When the average person downloads a file onto their computer, the file doesn't really "exist" for them until it shows up in their documents folder, even as millions of individual bytes of data quietly stitch themselves into something far more complex.
With this mindset, you can understand why companies often choose to collect as much data as possible first, and only later work out what they have actually collected. Typically, we dump data into a large data warehouse or lake, spend a lot of time cleaning and preparing it, and then extract different pieces for analysis: a method widely known as batch processing.
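To make that concrete, here is a minimal, illustrative sketch of the batch pattern: load everything collected so far, clean the whole lot, then carve out a slice for analysis. The file path and column names are hypothetical; the point is the load-clean-extract rhythm, not the specific tooling.

```python
# Illustrative batch-processing sketch (hypothetical file and column names).
import pandas as pd

# 1. Dump: pull everything collected so far out of the "lake".
orders = pd.read_csv("data_lake/orders_dump.csv")  # hypothetical export

# 2. Clean and prepare the whole dataset before anyone can use it.
orders = orders.dropna(subset=["order_id", "amount"])
orders["amount"] = orders["amount"].astype(float)
orders["created_at"] = pd.to_datetime(orders["created_at"])

# 3. Extract a slice for analysis, hours or days after the events happened.
daily_revenue = (
    orders.set_index("created_at")
          .resample("D")["amount"]
          .sum()
)
print(daily_revenue.tail())
```

Notice that nothing is available until every step has run over the entire dataset, which is exactly the delay the next paragraph describes.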
This approach is about as cumbersome as it sounds. Working on the entire dataset at once duplicates effort, obscures insights, and places heavy demands on hardware and energy consumption, all while delaying key business decisions. For small and medium-sized companies trying to compensate for limited funds and staff, it undermines the flexibility and speed that should be their natural advantages.
Until recently, data did not need to be consumed in real time, or even collected in real time, and that was not a problem. But given how many businesses now rely on real-time data to deliver value to end customers (imagine hailing a taxi on Uber or a similar app and not seeing a live map of the driver's location), real-time data has become a necessity rather than a nice-to-have.
Fortunately, LLMs are not limited to batch-processed data. They can interact with data in a variety of ways, and some of those ways don't require the data to stand still.
Ask and you shall receive
As small and medium-sized businesses seek to displace older, more established companies, data streaming is replacing batch processing.
Data streaming platforms use real-time data "pipelines" to collect, store, and use data continuously, as it is created. The processing, storage, and analysis you would once have waited for under batch processing can now happen the instant the data arrives.
Streaming does this by following event-driven principles, which essentially treat every change in a dataset as an "event" in its own right. Each event triggers further data to flow, creating a continuous stream of new information. Instead of having to go and fetch the data (typically stored in a table somewhere in a database), data sources "publish" their data in real time, at all times, and anyone who wants that data can consume it simply by "subscribing" to it.
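A minimal sketch of that publish/subscribe idea, using only Python's standard library, is below. In production this role is played by a streaming platform such as Apache Kafka, but the shape is the same: a producer publishes each change as an event, and any number of subscribers consume it as it arrives. All names here are illustrative.

```python
# Minimal in-process publish/subscribe sketch (illustrative only; a real
# deployment would use a streaming platform such as Apache Kafka).
import queue
import threading
import time

subscribers: list[queue.Queue] = []

def subscribe() -> queue.Queue:
    """Register a new consumer and give it its own event queue."""
    q = queue.Queue()
    subscribers.append(q)
    return q

def publish(event: dict) -> None:
    """Every change to the data is published as an event to all subscribers."""
    for q in subscribers:
        q.put(event)

def consumer(name: str, q: queue.Queue) -> None:
    while True:
        event = q.get()  # blocks until a new event arrives
        print(f"{name} received: {event}")

# One subscriber might feed a dashboard, another an LLM's context store.
threading.Thread(target=consumer, args=("analytics", subscribe()), daemon=True).start()
threading.Thread(target=consumer, args=("llm-context", subscribe()), daemon=True).start()

# The data source publishes changes as they happen, instead of waiting for a batch.
for i in range(3):
    publish({"order_id": i, "amount": 42.0 + i})
    time.sleep(0.1)

time.sleep(0.5)  # give the consumer threads a moment to drain their queues
```

The key design point is that the producer never knows or cares who is listening; new consumers can subscribe at any time without the source changing at all.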
All of this helps free LLMs from the hard distinction between training and operation. Moreover, if every interaction is captured as a data point, an LLM can effectively keep training itself, using feedback on the correctness of its outputs to continually improve the underlying model.
This means an LLM can draw on a dataset that is constantly being updated and curated, while the mechanisms that deliver and contextualize that data are constantly being improved. Data is no longer at risk of being duplicated or abandoned in forgotten silos: all you have to do is ask for it!
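Here is a hedged sketch of that "just ask for it" idea: the newest events from the stream are kept in a rolling window and folded into the prompt, so the model answers from current data rather than from a months-old training snapshot. The `ask_llm` function is a placeholder for whichever model API you use; everything else is illustrative.

```python
# Illustrative sketch: answering a natural-language question from the most
# recent events on a stream. `ask_llm` is a placeholder, not a real API.
from collections import deque

recent_events = deque(maxlen=100)  # rolling window of the latest events

def ask_llm(prompt: str) -> str:
    """Placeholder: plug in your preferred LLM provider here."""
    raise NotImplementedError

def on_event(event: dict) -> None:
    """Called by the streaming consumer for every new event."""
    recent_events.append(event)

def answer(question: str) -> str:
    # The freshest data is injected as context, so the model is never
    # stuck answering from a stale snapshot of the business.
    context = "\n".join(str(e) for e in recent_events)
    prompt = (
        "Using only the data below, answer the question.\n"
        f"{context}\n\nQ: {question}"
    )
    return ask_llm(prompt)
```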
Cut from SME cloth
So: what does this mean for SMEs?
For a start, this technology removes the constraints that once left the advantage with large companies. The sheer speed at which information can move through a streaming infrastructure lets decision-makers push the business forward at the pace they want, without batch processing keeping them stuck in second gear. The flexibility that lets SMEs outmaneuver larger companies is back in play.
Decisions are made with less uncertainty, and with better context, than before. And specific insights are far easier to reach, thanks to the natural-language interface LLMs provide, so streaming data can foster a genuine culture of commercial transparency at every level of the business.
Not only is the output faster and more accurate, but SMEs can also free themselves from legacy technology. Data can be streamed entirely on-premises, entirely in the cloud, or in a combination of the two. The heavy hardware that batch processing demands is often no longer necessary when you can request the same output in record time from an LLM, and many providers offer fully managed, out-of-the-box solutions that require zero capital investment from SMEs.
For SMEs to get the most out of their LLMs, they need to think about how they handle their company's data. A company willing to commit to treating data as a continuous stream of information will be in a far better position to maximize the potential of data in motion to help it grow.