Quick Cyber Thoughts: More Reasons for Local AI

This week, I was hoping to follow up on my previous cybersecurity project post, only to run into an unforeseen problem.
Since it's best to make lemonade out of lemons, let's talk about what happens when web AI interfaces stop working.
The Problem

Two days ago, my ChatGPT session went from normal to what you see above.
Typing does nothing: I can enter a prompt, but nothing is transmitted to ChatGPT.
Clearing the cache and temporarily disabling my Pi-hole DNS blackhole and ad/script blockers did nothing. I'll have to investigate further to determine whether the connectivity issue is on my end or ChatGPT's, but it does provide another great example of why local AI is a good idea.
(Bing Copilot is still available, so it's not something that's affecting all web AI, as far as I can tell.)
AI Needs High Availability
In IT/cybersecurity, the Five 9s are the gold standard for availability - 99.999% uptime.
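As a quick sanity check on what five nines actually permits, the downtime budget per year works out like this (a small worked example, not tied to any particular service):

```python
# Downtime budget for a given availability target.
# Five nines (99.999%) leaves only about 5.26 minutes of downtime per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (non-leap year)

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of allowed downtime per year at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for nines in ("99.9", "99.99", "99.999"):
    print(f"{nines}% uptime -> {downtime_minutes_per_year(float(nines)):.2f} min/year of downtime")
```

A multi-day web outage like the one above blows through the five-nines budget hundreds of times over.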
Let's set aside all the data confidentiality and integrity issues that might come with using AI (Large and Small Language Models - LLMs/SLMs). The main benefit of AI at the moment is generating content - informational or artistic - quickly and in accordance with the user's input.
In order to be useful, the AI needs to be available, and an AI hosted on someone else's servers is always going to be vulnerable to service disruption. The failure could be at their end, your end, or somewhere in between at the ISP level, but there are plenty of failure points along the chain.
So, how do we get around this?
Exploit the Training Bubble Pop
80% of AI projects fail, according to RAND Corporation research.
Most of that comes down to inflated expectations created by Hollywood and grifters, but also to a fundamental misunderstanding of how LLMs and SLMs function. You don't need to train the AI on your data to get useful outputs from it. You can use Retrieval Augmented Generation (RAG) to expand the AI's knowledge base instead.
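A minimal sketch of the RAG pattern, to show why no training is involved: retrieve relevant local documents, then prepend them to the prompt. The documents, the keyword-overlap scoring (a stand-in for a real embedding search), and the prompt wording below are all hypothetical; the final prompt would go to whatever local model you run.

```python
# Minimal RAG sketch: retrieve relevant local documents, then prepend them
# to the prompt so the model can answer from data it was never trained on.
# The keyword-overlap scoring stands in for a real vector/embedding search.

documents = {
    "backup_policy.txt": "Backups run nightly at 02:00 and are retained for 90 days.",
    "vpn_setup.txt": "The VPN uses WireGuard; client configs are issued by the IT desk.",
}

def retrieve(query: str, docs: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs.items(),
                    key=lambda kv: len(q_words & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:top_k]]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved context for the local model."""
    context = "\n".join(retrieve(query, documents))
    return f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long are backups retained?"))
```

The model's weights never change; only the prompt grows, which is why inference-class hardware is all you need.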
So we only need the hardware to do the inferencing (generating responses), which has lower complexity and performance requirements than training.
Graphics cards and Neural Processing Unit (NPU)-equipped chips provide most of the inferencing hardware on the market, but they're not exactly cost-efficient. Decent-performance GPUs cost hundreds of dollars, and NPU-equipped processors are only found in laptops, which also cost hundreds of dollars each. What are we to do?
Well, thanks to the billions of dollars spent on AI training hardware, which is also capable of inferencing, and the general trend of AI project failure, the answer is simple: wait for the AI training bubble to pop, and acquire the used hardware as it's liquidated to keep those companies afloat.
Things to Keep in Mind
So, on top of all the usual caveats about buying used hardware, especially in a corporate environment, there's one specific thing to keep in mind:
AMD's server chips have a fuse in them that trips when they are initialized, locking them to a specific vendor's motherboards.
This raises the cost of some used AI server hardware, as buying a matched processor-and-motherboard set is going to be safer than buying two separate units and hoping that AMD didn't put even more specific fuses in there to further restrict hardware choice.
Another thing to consider is what information will be provided to the AI through RAG, where that information will be hosted, and how to implement network segmentation.
For instance, for a local AI server in a home or single-building small business, it may make sense to host all the RAG data you want on that same system. (It may also be the only option, if you're using software that doesn't allow for remote access to data.) However, the more data you have, the more likely it is that a NAS or Storage Area Network (SAN) might be necessary.
What that data is also plays a huge part in figuring out your storage needs. If you're just pulling down text documentation for software you're using and making it accessible to the AI, you could probably get away with way less storage than an organization that's using RAG on multimedia content.
Naturally, you also have to be careful what you make available to the AI, because if prompted the right way, it will provide that information, whether you intended for that to happen or not. Even if you apply access controls to the AI or the data, it may be simpler and easier to redact or remove any Personally Identifiable Information (PII) or Protected Health Information (PHI). The AI can't leak what it doesn't have, and it reduces your compliance headaches.
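A redaction pass can happen before documents ever reach the RAG store. The sketch below is illustrative only: the two regex patterns (email and US SSN) are examples, and real redaction needs a far broader ruleset or a dedicated DLP tool.

```python
import re

# Illustrative PII scrub before ingesting documents into a RAG store.
# These two patterns (email, US SSN) are examples only -- real redaction
# needs a much broader ruleset or a dedicated DLP tool.

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace PII matches with a labeled placeholder before indexing."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

sample = "Contact jdoe@example.com, SSN 123-45-6789, about the outage."
print(redact(sample))
```

Running the scrub at ingestion time, rather than at query time, means even a jailbroken prompt has nothing sensitive to retrieve.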
Network segmentation is also going to be an interesting challenge. Obviously, the AI server should be on its own segment, so that access can be controlled through firewall access control lists (ACLs). But should the storage servers associated with the RAG data also be on the same network segment?
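One way to reason about it is to write the segments and allowed flows out explicitly. The toy model below uses hypothetical segment names, ports, and rules; the point is simply that a default-deny posture forces every cross-segment flow to be justified.

```python
# Toy model of segment-level firewall ACLs. Segment names, ports, and
# rules are hypothetical -- the takeaway is default-deny: the AI server's
# segment only accepts the traffic it explicitly needs.

ALLOWED = {
    ("user_lan", "ai_segment", 443),          # users query the AI over HTTPS
    ("ai_segment", "storage_segment", 2049),  # AI reads RAG data via NFS
}

def permitted(src: str, dst: str, port: int) -> bool:
    """Default-deny: traffic passes only if a rule explicitly allows it."""
    return (src, dst, port) in ALLOWED

print(permitted("user_lan", "ai_segment", 443))        # users may reach the AI
print(permitted("user_lan", "storage_segment", 2049))  # users never touch raw RAG data
```

Under this framing, putting the storage on its own segment costs one extra rule but keeps users from ever reaching the raw data directly.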
These questions and more are what we cybersecurity practitioners need to ponder as organizations adopt AI and integrate it into their operations.