When it comes to technology, confidence is great, but competence is paramount.

Large language models (LLMs) like ChatGPT, GPT-4, and Copilot generate text that looks like what humans write. Except the content is missing a few not-so-minor details: the machines have no actual consciousness and are simply trained to generate text that a person would find plausible. What's more, the output is often inaccurate and the quality is mediocre at best.

Since LLMs don't know what anything means, they generate lots of artifacts that look good but can be wrong. Term papers would be easier to write if the author could make up sources in the bibliography. Code would be faster to write if performance, accuracy, or context didn't matter. To be clear, LLMs write a lot of words that matter but aren't really about facts – phatic communication is real, and it matters.

So, when accuracy matters, we must check the facts diligently, especially since LLM outputs are not always tethered to reality.

Why Context AND Content Matter

In my personal life, my family and I are trying a fitness program with specific diet targets for carbohydrates, proteins, and fats. ChatGPT speedily provided us with delicious-looking recipes. How exciting! But when we checked the recipes with reputable nutrition calculators, everything was way off. In our case, it wasn't a big deal. We quickly checked the numbers and adapted the ingredients. Plus, we knew what good looked like.
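For what it's worth, the check itself is simple arithmetic once you have trustworthy per-ingredient data. Here is a minimal sketch of that kind of check in Python – the ingredient values and the LLM's claimed totals below are hypothetical placeholders, and in practice the per-ingredient numbers would come from a reputable nutrition database.

```python
# A minimal sketch of the sanity check we did by hand.
# Ingredient macro values are illustrative placeholders, not real nutrition data.

KCAL_PER_GRAM = {"carbs": 4, "protein": 4, "fat": 9}  # standard Atwater factors

ingredients = [
    # (name, carbs_g, protein_g, fat_g) - hypothetical numbers
    ("chicken breast, 200 g", 0, 62, 7),
    ("brown rice, 1 cup cooked", 45, 5, 2),
    ("olive oil, 1 tbsp", 0, 0, 14),
]

totals = {"carbs": 0, "protein": 0, "fat": 0}
for _, carbs, protein, fat in ingredients:
    totals["carbs"] += carbs
    totals["protein"] += protein
    totals["fat"] += fat

calories = sum(grams * KCAL_PER_GRAM[macro] for macro, grams in totals.items())

# Compare against what the LLM claimed for the recipe (hypothetical numbers).
llm_claimed = {"carbs": 30, "protein": 40, "fat": 10, "calories": 370}
print("computed:", totals, calories, "kcal")
print("claimed: ", llm_claimed)
```

In our case, the gap between the computed totals and the claimed totals was the tell.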

It had me thinking, though: what if we didn't know what good looked like, or had no way to check a recipe's nutrition? We would have been working against our health goals – not something we wanted to find out after the meal!

In the financial world, if you ask an LLM to synthesize a trading recommendation (many of the major models already carry warnings against exactly this), it can confidently recommend a trade with detailed reasoning. But there's a major caveat: the model's suggestions may not be connected to current market information.

As many of us are (hopefully) coming to learn, LLMs work best as an augmentation for knowledgeable humans, not as a replacement. A noted vulnerability with LLMs is that people can over-rely on them instead of verifying the content – like the lawyer currently facing sanctions for submitting LLM-generated briefs that cited fake cases.

Artificial intelligence (AI) models work well for helping smart people break writer's block or spark an idea, not as something to copy and paste. Trying to come up with the structure for an internal blog post? Great use! Figuring out an introduction or cover letter can be a wonderful use – as long as the human writer edits and takes ownership of the content. Autocompletes for starting or ending an email seem fine as well. This kind of augmentation is the same reason today's cars do so well with adaptive cruise control and lane-drift warnings.

Will we ever be ready for a time when AI models fully take the reins? For my part, I'm kind of glad that full self-driving cars always seem to be a few years and billions of dollars away.

The Ins and Outs of Large Language Models

It's also important to think about inputs and outputs. Say you ask GitHub Copilot to write code from a description in a document. There is no programmer inside Copilot; it's an LLM trained on many repositories hosted on GitHub. A user doesn't know the authors, intentions, context, or copyright of that code. If the copied code comes from a GPL-licensed repository, using it may create copyright liability if your product isn't also GPL-licensed.

It's equally important to think about how the prompt works. When someone sends data to an LLM to generate code, the AI developer may get to keep that input to further train the model. But is there a contractual relationship that limits what the AI developer can see or do with the code context and prompts? Users also have to consider the training data itself – and anyone who didn't build the model probably doesn't know what data it was trained on.
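To make the data flow concrete, here is a minimal sketch using one provider's Python SDK as an illustration – the model name, file path, and prompt are placeholders, and what the provider may retain depends on your contract and account settings:

```python
# A minimal, illustrative sketch of what actually leaves your environment
# when you ask a hosted LLM to write code. Uses the OpenAI Python SDK as one
# example provider; the model name, file, and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

# Hypothetical internal file included as context for the model.
proprietary_context = open("billing/engine.py").read()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a senior Python developer."},
        {
            "role": "user",
            # Everything in this message - including the internal source code -
            # is transmitted to the provider. Whether it can be retained or used
            # for training depends on your contract and account settings.
            "content": "Add a retry wrapper to this module:\n\n" + proprietary_context,
        },
    ],
)

print(response.choices[0].message.content)
```

The point isn't the specific API; it's that the prompt, including any proprietary context pasted into it, leaves your environment the moment you hit send.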

As the headlines show us, it's becoming increasingly clear that the way models are trained – and the data they are trained on – may not align with users' values.

When the Outputs Become the Inputs

Was the LLM trained on general content from the internet? There's some very dark stuff out there. And at some point, the outputs become the inputs. Most released LLMs are trained on older web crawls and don't know the same things we know about the world today. So, when an LLM spits content of dubious accuracy back onto the internet, it degrades the material the next generation of models will learn from.

Some people are trying to use LLM generation as a shortcut around the hard parts of content creation. Yet as that content gets out into the world, there's nothing stamped on the output saying it was generated by an LLM. That means the next generation of LLMs could end up consuming AI-generated data during training. Ultimately, this feedback loop could mean more expensive algorithms and worse data. Or it may simply become more difficult to train any LLM on text published after a certain point, as content written by other LLMs becomes more widespread.

The Future Is Augmentation, Not Replacement

I see LLMs helping people achieve their goals, speed up their work, or get started when they're stuck. Much like a GPS helps me find a destination – while I still pick where I'm going and adapt to the road in front of me – LLMs help with structure, finish phrases, and provide raw material to edit.

I’m excited about the possibilities that AI will help humans achieve. I’m also grounded in the reality of how this technology works. It isn’t magic. And, while AI can help scaffold out repetitive work, it doesn’t supply intention, direction, or understanding. We mere humans are likely to fill that role for a long, long time.

Author:
Matt Katz, Senior Vice President, Forward Deployed Software Engineering

DISCLAIMER:
This blog post is made available for personal informational purposes only. It does not constitute legal, tax, or investment advice and should not be treated as such. Nothing on our blog constitutes an offer to contract or acceptance of contract terms you may offer to us. We contract solely by definitive written agreement reviewed and approved by counsel. Any views or opinions represented in this blog belong solely to the author(s) and do not represent those of Arcesium LLC, its affiliates, or any other individuals, institutions, or organizations associated therewith. Arcesium LLC and its affiliates do not represent, warrant, or guarantee the availability, accuracy, or completeness of the information contained in this blog and shall not be liable for any losses, injuries, or damages resulting from the display or use of such information.
