
Windows 11: That required upgrade you can't perform

    I have a lot of issues with Windows, Apple, and Red Hat, mostly because I grew up believing that people shouldn't have to pay to use a computer they already paid for. A one-time physical purchase should be an all-inclusive package. Sadly, that's not how the world works these days. Nowadays your phone apps update every day, your phone updates every week, your computer updates every Tuesday and at random, and all of them enforce reboots or risk crashing their respective systems. It's all pretty horrible. Now Windows 11 has been out for a while, and even high-end gaming computers purchased just before its release can't run it.

    If this were just a "my hardware isn't good enough" problem, I guess I could see that. Ya know, service purchase with updates and all this garbage, fine. But in some cases, like mine, I have a Windows 10 PC waiting to upgrade because... because... I disabled the TPM module. These "trusted platform module" devices haven't really provided much benefit to my personal computing, so along with UEFI, I disabled it, thinking "does Windows really need another driver they can use to lock me physically out of my own devices?" The answer, of course, is yes. Microsoft enforcing the use of TPM 2.0 is a security feature that sets Windows 11 apart from Windows 10 and puts it on par with Apple from about 15 years ago. But if I'm not doing any drive encryption, any BIOS encryption, any disk-level monitoring or change protection, why should I use it?
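If you're in the same boat and want to see where your machine stands before fighting the upgrade, you can ask Windows directly. Here's a minimal Python sketch that just shells out to PowerShell's Get-Tpm cmdlet (a real cmdlet, though it needs an elevated prompt); the wrapper around it is my own.

```python
# Minimal sketch: ask Windows what it thinks of the TPM before attempting
# a Windows 11 upgrade. Shells out to PowerShell's real Get-Tpm cmdlet;
# the function wrapper here is mine, not part of any official API.
import subprocess

def tpm_status() -> str:
    """Return the raw output of PowerShell's Get-Tpm (requires elevation)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", "Get-Tpm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Look for TpmPresent / TpmReady in the output; both False usually
    # means the module is disabled in firmware, like in my case.
    print(tpm_status())
```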

Use TPM! - But why?!

    Well, as it turns out, despite numerous examples of malware gaining admin access on a system and hiding data inside TPM routines, using it as a co-processor to handle the malware's own decryption functions; examples of malware adjusting TPM firmware to maintain persistence; AAAAAND a wide range of vulnerabilities stemming from implementation and specification limitations; the TPM actually does provide core functionality for Windows features that rely on its cryptography. Such as Windows Hello, their presence-recognition sign-in feature. Sure, BitLocker has relied on the TPM for a while, and Defender has used it for a number of toolsets, but the biggest thing that differentiates Windows 11 from any Windows before it, and explains the demand for better processors, co-processors, and on-board modules, is the AI functionality. Everything promotes AI in a world where trading metadata is the name of the game.

    Being a reasonable consumer, consider what we've heard. Amazon's Echo basically said "yes, we are spying on you, but a third party processes it, not us"; Google said "we enable tracking across the board" and "we enable locally generated page caching to be uploaded as Google cache"; and Facebook said "we don't directly see your messages, but we do have an AI trained on everything any individual says, mapped to their ID." Given all that, it seems highly likely that all it would take to really ruin everything in the AI generation world is the ability to fuzz an AI into revealing its original training data.

Prompt Engineering 

    So, that's where things take another weird turn. A common term being used now for developing prompts that get specific activities out of an AI is "prompt engineering." As an example, if you ask ChatGPT "tell me how the weather is today," what you're asking it is the prompt. Some people have found ways of exposing information through these prompts pretty easily, redirecting what the prompt is able to do while the model tries to figure out the best way to respond. This confusion loop can cause situations like the jailbreaks and data leaks that are widely available and used for testing right now.

    But what I'm interested in studying isn't the code, or the limitations of what it can give me. I want it to define for me what created its training data. An example I'm working with: if a series of pictures of dogs were used to train something like a computer-vision AI, it would be able to detect a dog in another picture. Now if you had a generator that could, say, create a dog based on that training data, it would produce a dog matching at least the middle percentile of common features within the training data. My theory is, if we take that and engineer a series of prompts for the generator, we can pull back the original photos used (a minimal sketch of that idea follows at the end of this section). Sure, there's bound to be some level of incorrectness, but with pictures, 75% of the original is usually enough to visually see roughly what was originally there.

    If we had an AI trained on the textual data of a website, we could ask it for data about the website, and it's usually pretty good about giving that back directly, because it lacks expansive data to compare against. So a key step in identifying what's plausible to replicate will be telling raw responses apart from developed responses. If you've ever seen Google's AI responses: go google "TPM," and it explains what the abbreviation means, then tries to give you result information based on what sites say about it. These are both raw responses in a way: "what's the likely meaning of this abbreviation," and "based on that meaning, what can we identify about the rest of the question from the first 50 results?"

    If we wanted to catch that training data, we could try something along the lines of site:feemcotech.solutions, so the answer has to come from a site we control, then ask the same question over and over again with updates to a page (making sure it shows as updated on the server, so Google doesn't see an unchanged page and skip recaching it) until we get the same results. If we go word for word, it seems to change its answer slightly as its understanding/logic of the content develops. Instead, we'll spread it across 4 pages with reworded copies of the same data, then on each of those, generate from a prompt like "this needs to be uniquely reworded; keep track of the ways it's been reworded, but develop the same essential meanings as the last version, not the first, with each iteration." After a while of regenerating all 4 pages, we re-request from Google and try again. We can end up with one page that sounds almost exactly like the response, several roughly similar pages, and Google showing the original response data every time.
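To make the picture-recovery idea above concrete, here's a minimal model-inversion sketch in Python/PyTorch. It doesn't touch any real generator: it takes a pretrained ImageNet classifier (resnet18, my arbitrary stand-in) and runs gradient ascent on the input pixels until the model is confident it's looking at a dog. What falls out is exactly that "middle percentile" effect: an average of the dog features the model learned, not any one original photo. The model choice, class index, and step counts are all assumptions for illustration.

```python
# Minimal model-inversion sketch: optimize the *input image* so a trained
# classifier scores it highly as a target class. The result approximates
# the average features of that class in the training data.
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # we optimize the input, never the model

TARGET_CLASS = 207  # ImageNet index for "golden retriever" (illustrative pick)

# Start from noise; the image tensor itself is the thing being trained.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    logits = model(image)
    # Maximize the target logit; lightly penalize extreme pixel values.
    loss = -logits[0, TARGET_CLASS] + 1e-4 * image.pow(2).sum()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    prob = torch.softmax(model(image), dim=1)[0, TARGET_CLASS]
    print(f"model confidence in target class after inversion: {prob:.3f}")
```

Against a black-box service you don't get gradients, so a real attempt would have to swap this for query-based search, but the principle is the same: drive the model's own confidence signal back toward an input.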

Future of Hacking AI

    What I expect we can do is create a stats engine that compares these runs, and run them as distributed apps to get past the network and local processing limits that overtaking some AI platforms would involve, ultimately generating samples of possible training data that produce the same results on the same platform. Doing this with a variety of models to try to expose training data would be theoretically plausible in a world where everything can be containerized, load-distributed, and offloaded between worker nodes as needed.
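As a hedged sketch of what that stats engine's core might look like, the Python below scores a batch of collected responses against each other and keeps the ones that converge, i.e., the text the platform keeps handing back nearly verbatim. Everything here is standard library; the function names, threshold, and sample strings are my own placeholders, and in the distributed version each worker node would just contribute its responses to the shared list before scoring.

```python
# Sketch of the comparison step: flag responses that keep converging
# across repeated runs as candidate echoes of training data.
from difflib import SequenceMatcher
from statistics import mean

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the strings match exactly."""
    return SequenceMatcher(None, a, b).ratio()

def converging_candidates(responses: list[str], threshold: float = 0.85) -> list[str]:
    """Return responses whose average similarity to all the others is high,
    meaning the model keeps giving (almost) the same text back."""
    candidates = []
    for i, resp in enumerate(responses):
        others = [r for j, r in enumerate(responses) if j != i]
        if others and mean(similarity(resp, o) for o in others) >= threshold:
            candidates.append(resp)
    return candidates

# Hypothetical responses merged from several worker nodes:
runs = [
    "TPM is a dedicated cryptographic co-processor.",
    "TPM is a dedicated crypto co-processor.",
    "A TPM is a hardware module for cryptographic keys.",
]
print(converging_candidates(runs, threshold=0.6))
```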

Now, I'm not particularly interested in going down the rabbit hole of reversing training models, but the plausibility alone, combined with the extreme amount of AI being fed into our lives by the expansive growth of large language models, makes Microsoft, Samsung, Verizon, Google, Apple, AT&T, and Amazon all shipping AI features a little more concerning for many. By 2004 the concept of "we've lost all privacy rights to ourselves" was alive and well; now in 2024, I think it's easy to see that what people have to do to keep any privacy in this world is nothing short of life-altering, and a constant battle.

Anyway, that's all I got for today.

Thanks for reading

If you need any IT or CyberSecurity work remotely or within the DFW area, please contact us over at FeemcoTechnologies.

