Friday, December 30, 2022

MyGPT

I want to create my own AI chatbots based on text corpora I specify. That would let me converse with avatars of large bodies of content, like encyclopedias, user groups, and corporations. (Cf. Conversations with egregores.) I could also set them debating each other. And not least, it would let me create a personalized AI search engine. The “neeva AI answer” becomes the “meeva AI answer.”

I not only want to specify the corpora (yielding several personal oracles, each optimized for a different use), but I also want to be able to train them myself. OpenAI used human trainers to make ChatGPT, which gave their chatbot a characteristic voice. (It sounds like NPR to me.) I’d like to calibrate my AI searchbots to my own preferences, for example by doing Reinforcement Learning from Human Feedback (RLHF) myself, perhaps with add-ons that let me (or, more likely, an employer with a budget) commission human AI trainers working to my specs. At the simplest level, Neeva already collects some preferences by letting the user set Prefer More, Prefer Less, or No Preference for each domain in a search result listing. For users who provide a lot of preference data, this may be the 80/20 solution.

At the moment, I can see only a few at-scale AI chatbots (such as those from OpenAI, Anthropic, and DeepMind), but there must be many more currently in stealth mode (like Neeva). I don’t know what the binding constraint is: the processing power needed for models with large parameter counts, the budget for human trainers, the effort of building large corpora, or something else. But as expertise grows and computing gets cheaper, I hope this will soon become affordable for smaller outfits, and perhaps even for non-expert individuals. Open-source RLHF tools and language datasets are already available. The cost of DNA sequencing fell steeply between 2007 and 2012; hopefully the same will happen with AI language models.

Of course, just because one can doesn’t mean one should. In this case, that means helping users create web experiences in which both the search results and their tone conform to the user’s worldview. Echo chambers could become even more personal. But just as obviously, because one can, somebody will.
