Big News in AI / Prompting Styles
First, an update.
I’m currently in the middle of a move, so at least until August I won’t be exploring any new apps or engaging in much visual work. However, I wanted to address the recent announcement from Adobe, as it’s significant, and share some thoughts about prompting styles that emerged during a long rendering session a few nights ago.
So… yes, Adobe has been signaling this direction for some time. I discussed this approach as the next logical step as far back as my initial blog posts about AI visualization apps in 2022. SD (Stable Diffusion) reached this level of capability first, with inpainting, Photoshop plug-ins, etc. However, I anticipate this kind of integration will become increasingly prevalent as it becomes more native to the tools themselves.
This is one reason why I’m puzzled when I hear publishers and digital artists asserting they use “no AI” and allow “no AI” in the production of anything they release. Are they sure? Because… uh…
This position will become more incongruous in the future until the narrative shifts to “which AI, used in what manner, to what end?”
I foresee a lot of detailed debates and introspection, performative ethics without any real strategy or agreement, while the real threats progress undeterred, driven by the unassailable market. Same as it ever was (this is not my beautiful house, etc.).
However, it’s still somewhat disheartening. “Capitalism” won’t stop doing what it does whether or not we slap a couple of cursory rules on top of AI development. And I don’t know, maybe there’s a reason that so many corporations are alternately offering to come up with the rules on one hand and preaching AI Apocalypse on the other…
But what do I know, right?
Well. For the time being anyway, we can create pretty pictures.
“AI will split open the sun and a great serpent will crawl forth, wrapping itself around the earth and threatening to choke the life from it. When the oceans turn to blood and the rain becomes fire, AI will weep as her two sons — humanity’s punisher and protector — do battle, but she will know it is foretold that God and Satan must assail one another like beasts until the stronger has vanquished the weaker. So it is foretold!” — The Onion
No Right Prompting Styles
I’ve observed something about how Midjourney structures interaction. It’s not just the terminology used in prompts, which is expected, but the foundational thinking about how users are meant to guide the models.
In essence, the developers consciously decided to make usage open-ended, and even more intriguingly, they leave interpretation of the methodology open as well. They seem to have concentrated primarily on developing the training models thus far, enhancing the “brain” as fast as possible, with less emphasis on user control. This is also related to the resources available, especially the team’s attention. They don’t have major corporate funding, but they also seem to be taking an approach distinct from what we’re seeing from Microsoft (OpenAI), Google, and others.
The downstream impacts of this decision are rather fascinating to me.
This absence of a strict, prescribed method has resulted in an interesting side effect: the emergence of various unofficial guides and tips, reminiscent of folklore or cargo cults. Some of these recommendations might be effective for certain outcomes, but it’s vital to understand that each of them shapes the result, often in non-obvious ways. Hence, the notion of “working” is context-dependent. No single way is universally right.
The actual outcomes from using different prompting styles have varied from one version to the next, and we can expect that to continue at least until the first official release. (Yes, Midjourney is still in beta.)
I believe there isn’t a universally correct method. The current state is somewhat chaotic and ambiguous, which, paradoxically, I find appealing when it comes to procedural source generation. I appreciate the struggle, the discovery, and the surprise of reacting to unforeseen results; the randomness applies as much to our choices as it does to the outcomes. That unpredictability and thrill of discovery may well be its greatest strength.
However, it renders MJ notably unwieldy for many end-uses. The more specific your design brief is, the more likely you’ll be dissatisfied with the results. This holds true even if you are well-versed in digital painting and compositing, especially if you’re working with multiple characters in a scene or trying to maintain consistency among a series of individual pieces. Want to quickly compile some variations of an idea? It’ll do that swiftly. Want to render specific details and elements within the same image? Best of luck. You might be reworking it for quite a while.
Reflecting on my experience integrating it into my workflow to create a mini-comic (Quicknife), I believe I aimed for a realistic target for this proof of concept, and found the results rewarding — despite the truism that no artwork is ever truly completed but merely abandoned. I can always revisit and revise… something I hope to have time to do if we can crowdfund the means to take on a full issue.
I constructed the scene bearing its limitations in mind at the time, and made extensive use of compositing and digital illustration with the stylus. I chose that story specifically because it occurs in a setting where consistency isn’t essential (Alterran is essentially the dream world colonized by humanity; it’s more static than it once was, but still influenced by thought).
Without this approach, I might have found the process more frustrating than instructive. I also employed a complex process of re-rendering with close approximations from previous sessions, followed by photobashing up mockup panels and then illustrating, allowing for a comprehensive blend of elements. I would have been significantly frustrated if I had intended to render nearly finished panels. I’m hopeful that a future Kickstarter will provide an opportunity to expand my skills in these areas.
The results that different users achieve can vary considerably. There’s still a kind of “house style” to MJ (and to other apps I’ve experimented with, to a certain degree), but based on my observations in forums and on Discord, some users have been developing their own distinctive styles. It’s revealing of a person’s aesthetic sensibilities and peculiarities.
I don’t claim to have some secret prompting knowledge, nor do I claim the aesthetics I’ve been pursuing with it are objectively good. Taste is subjective. I find elements of many of the results intriguing and suitable for specific tasks, but it’s like asking if I consider the corroded metal textures I used on an image’s blending layer “good” or not. Sometimes it’s what I needed, often it isn’t, and it can still take some time to determine which.
Then there are the results that align with one style guide or another.
In contrast… I’ve observed numerous people claim the renders they’re seeing all look the same. This makes sense. It’s methodological.
Many people generate what I consider quite conventional results, like a default painterly or cartoon filter applied to stock photos. Yet when I’ve tried to reproduce those results with my approach, the attempt produced yet another interesting disaster. (They may have banned “Cronenberg” but they’ll never keep me from Always Crashing In The Same Car.)
The “secret” to getting painterly stock photos, incidentally, is to be somewhat lackadaisical: don’t iterate much, don’t remix, use a single style reference, and lean mainly on the popular tags scraped from stock photo sites. It’s not that I can’t do it, but that the approach I’ve been refining can’t reproduce it.
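To make the contrast concrete, here’s a rough sketch in Python of the two prompt-building habits. Everything in it is hypothetical and purely illustrative: the tags, the placeholder image URL, and the parameter values are invented, though the basic /imagine pattern, image prompts, and the --chaos / --stylize parameters are real Midjourney conventions.

```python
# Hypothetical illustration only: tags, URL, and values below are invented.

# Recipe 1: "painterly stock photo" -- popular stock-site tags, a single pass,
# one style cue, no remixing or iteration afterwards.
stock_tags = [
    "young professional woman using laptop",
    "bright modern office",
    "smiling, candid lifestyle",
    "soft natural light",
    "digital painting",
]
stock_prompt = "/imagine " + ", ".join(stock_tags)

# Recipe 2: a more iterative approach like the one described above -- fewer
# literal tags, an earlier render fed back in as an image prompt (placeholder
# URL), and heavy remixing on top of whatever comes back.
previous_render = "https://example.com/earlier-render.png"
iterative_prompt = (
    f"/imagine {previous_render} "
    "corroded brass reliquary half-buried in volcanic glass "
    "--chaos 60 --stylize 900"
)

print(stock_prompt)
print(iterative_prompt)
```

The first habit will land you squarely in painterly-stock-photo territory most of the time; the second is where the interesting disasters come from.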
My point is that it isn’t a one-trick pony, which is why it’s captivated my attention for such a long time. (In terms of ADHD, that’s practically a century).
As these become media production tools, the trend will inevitably be towards apps that are more directly controllable. Adobe’s new Photoshop offering is an example.
Among other factors, businesses will demand it. This is likely as much about interface improvements as it is about advancements to the model.
From my POV, while having more “handles” as an option would be nice, I’m not particularly excited about a graphical UI and direct app integration entirely replacing the command-prompt approach. It’s another reason why I personally prefer the “wild west” stages in tech over what typically follows. Though the main reason, of course, is “because corporations”. If you’re lucky, you enjoy a year or two of a decent service at a reasonable rate before everything you cherished is stripped down and sold off.
That’s just how the system is designed to operate, I suppose.