Hacker News | ianbicking's comments

I haven't really kept up with what Midjourney has been doing the past year or two. While I liked the stylistic aspects of Midjourney, being able to use image examples to maintain stylistic consistency and character consistency is SO useful for creating any meaningful output. Have they done anything in that respect?

That is, it's nice to make a pretty stand-alone image, but without tools to maintain consistency and place them in context you can't make a project that is more than just one image, or one video, or a scattered and disconnected sequence of pieces.


I've been working on an LLM fiction-writing workflow and associated tools. It's built on agentic coding tools with lots of structure, guidance, prompting, and critique. Almost all of the flow lives on the filesystem and uses a custom command-line tool, making it accessible to agentic programming tools. (No MCP though; it seems superfluous?)

I was fairly neutral about the tool for a while, but lately I've been going all-in on Claude Code, using things like rules and subagents.

It's also built to "rerender" the story: for instance, rewriting it (slightly) for voice, translating it, or targeting different reading levels or backgrounds. I'm interested in translating stories for language learners in addition to simply translating into other native languages.

I'm also hoping to create some stories that stretch the medium. Perhaps CYOA (though I'm struggling to understand what a CYOA is good at), but also other multi-perspective stories with reader autonomy in how to read through the story. LLMs make it easier to overproduce content, so you can give the reader flexibility without regretting that much of the content will be skipped, or rewrite passages for readers who jump into the story partway through.

Producing quality content is hard, and frankly kind of expensive, which is why I'm focused on finished products instead of interactive experiences. Though I do look forward to some future opportunity to take these rich characters that are grounded in full stories and find other things to do with them.


I've come around to feeling that if I'm going to make an experimental development tool, I need to make it in service of building something specific. Maybe something playful... if I'm building something "important" then it can put unwanted conservative pressure on the tool. But something; and if I do that, then at least I have something interesting regardless of the fate of the development tool. Because yeah, there's a good chance no one else is going to be excited about the tool, so I have to build for my own sense of excitement, be my own most enthusiastic user.


I share a similar sentiment.

I have a deep drive to build the "important" stuff so that my life has meaning, but there's something hard to motivate about any given thing being "important" when you look at it long enough. It seems like the "important" thing I'm building eventually looks ridiculous and I bounce off of it.


Maybe this is some kind of art that doesn't need to be useful.


It does however work just fine if you ask it for grammar help or whatever, then apply those edits. And for pretty much the rest of the content too: if you have the AI generate feedback, ideas, edits, etc., and then apply them yourself to the text, the result avoids these pitfalls and the author is doing the work that the reader expects and deserves.


I think there might be a pattern across education with a strong ideology (Montessori, Waldorf, Classical education, etc.): they aren't very good at recognizing when the ideology is failing a kid. The relatively weak and mushy educational philosophy of a normal public school is also a somewhat reasonable way to run a school that has to take kids wherever they are and wherever they came from.


>that they aren't very good at recognizing when the ideology is failing a kid.

I would amend this somewhat to say that most ideologies have aspects that are much less effective in practice than they are in theory. I think this matches your sentiment, but addresses the likely rebuttal of "But X method has this specific element designed to address the shortcoming."


I have wanted to do experiments with a receipt printer hooked up to a Raspberry Pi, with some simple controls... but every time I look up the cost of the printer I balk. It's probably not fair, but I guess in my head it feels like they should be cheaper. Or at least the cost then makes me question how much time I'm really ready to put into stuff like debugging the printer drivers and putting together a case, etc etc.

The thing I actually want to play with is probably some kind of board game that incorporates the printer... ideally with bar/QR codes so the computer can print out money, IOUs, instructions, etc., and have this computer mediation that still gives people physical items to manipulate.
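As a sketch of what that computer mediation could look like: a hypothetical signer that turns game artifacts (money, IOUs, instructions) into short, tamper-evident strings the computer could print as QR codes and verify on scan. Everything here is made up for illustration (the key, the payload shape), and it uses only the Python standard library:

```python
import base64
import hashlib
import hmac
import json

GAME_SECRET = b"replace-with-a-per-game-random-key"  # hypothetical key

def issue_iou(payload: dict) -> str:
    """Serialize a game artifact into a short string suitable for a QR
    code, with an HMAC so players can't forge codes on their own printer."""
    body = base64.urlsafe_b64encode(
        json.dumps(payload, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(GAME_SECRET, body.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{body}.{sig}"

def redeem_iou(token: str):
    """Verify the signature and return the payload, or None if tampered."""
    body, _, sig = token.rpartition(".")
    expect = hmac.new(GAME_SECRET, body.encode(), hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(sig, expect):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

The nice property is that the paper stays the source of physical play, while the computer only has to trust what it printed itself.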


While not an outright solution to the fact that they _are_ expensive, if you don't care about them being second hand or a little older you can score a pretty good deal on sites like eBay.

For instance, TM-T88V printers can do more but cost around 3x as much as the one I got, a TM-T88IV, which is the older version. Not perfect, but it beats the roughly $200 price tag brand new.


I recently bought a thermal ESC/POS printer for 25 Euros on AliExpress, but I saw the same ones for 30 Euros on Amazon.

I connected it to my Linux server, and without any drivers I can print with, e.g., `echo "Hello World!" >> /dev/usb/lp0`.

It also supports bar/QR codes.
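For the QR support, here is a minimal Python sketch of the raw byte sequence, assuming the printer implements the common Epson `GS ( k` ESC/POS function set (many cheap thermal printers do, but check your model's command reference):

```python
import io

GS = b"\x1d"

def escpos_qr(data: bytes, size: int = 6) -> bytes:
    """Build the ESC/POS byte sequence that prints `data` as a QR code
    using the Epson GS ( k function set."""
    buf = io.BytesIO()
    # Function 165: select QR model 2
    buf.write(GS + b"(k\x04\x001A2\x00")
    # Function 167: module size in dots (1-16)
    buf.write(GS + b"(k\x03\x001C" + bytes([size]))
    # Function 169: error correction level L
    buf.write(GS + b"(k\x03\x001E0")
    # Function 180: store data in the symbol storage area
    n = len(data) + 3  # payload length plus the cn/fn/m bytes
    buf.write(GS + b"(k" + bytes([n & 0xFF, n >> 8]) + b"1P0" + data)
    # Function 181: print the stored symbol
    buf.write(GS + b"(k\x03\x001Q0")
    return buf.getvalue()

# Usage, written straight to the device like the echo example:
# with open("/dev/usb/lp0", "wb") as lp:
#     lp.write(b"Hello!\n" + escpos_qr(b"IOU: 5 gold") + b"\n\n\n")
```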


You have a make/model/link?



You can pick up a Rongta for $75 brand new.


I think it can be a false comfort to think of LLMs as being trained to the center of the bell curve. I think it's closer to true that there's no real "average" (just like there isn't an "average" human) because there are just too many dimensions.

But what LLMs do, in the absence of better instructions, is expect that the user WANTS the most middling innocuous output. Which is very reasonable! It's not a lack of capability; it's a strong capability to fill in the gaps in its instructions.

The person who has a good intuition for design (visual, narrative, conversational) but can't articulate that as instructions will find themselves stuck. And unsurprisingly this is common, because having that vision and being able to communicate it to an LLM is not a well-practiced skill. Instructing an LLM is a little like instructing a person... but only a little. You have to learn it. And I don't think this will magically fix itself as LLMs get better, because it's not exactly an error; there's no "right" answer.

Which is to say: I think applying design to one's work with AI is possible, important, and seldom done.


I remember hearing a podcast about a startup robotics company doing the same thing; a little searching and they actually have a comparison page between their product and HP's:

https://www.dustyrobotics.com/compare/fieldprinter-vs-sitepr...


So Dusty is for SMBs and SitePrint is for clueless Corporations, like Oracle.


Thank you. I wondered immediately on seeing this if HP had just acquired those guys.


I think a big part of it is not so much that they aren't capable of being a dungeon master, but they are constitutionally unfit due to their agreeability.

The biggest improvement there is to treat the game engine as the "user", and the player (and their input) is merely one among many things the game engine is managing. But then you also need a game engine that manages lots of the state programmatically, with formal transitions of that state. The LLM can manage the state transitions, but the actual state needs to be more formal.
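A minimal sketch of that split, with made-up states and actions: the engine owns the formal state and the legal transitions, and an LLM-proposed action only takes effect if the engine validates it:

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current location, action) -> next location.
# The LLM narrates freely, but only these transitions can change state.
TRANSITIONS = {
    ("tavern", "leave"): "road",
    ("road", "enter tavern"): "tavern",
    ("road", "travel north"): "forest",
}

@dataclass
class GameState:
    location: str = "tavern"
    log: list = field(default_factory=list)

    def apply(self, action: str) -> bool:
        """Apply an LLM-proposed action only if it is a legal transition."""
        nxt = TRANSITIONS.get((self.location, action))
        if nxt is None:
            return False  # engine refuses; the LLM must re-narrate
        self.log.append((self.location, action, nxt))
        self.location = nxt
        return True
```

The point of the design is that the LLM's agreeability can't corrupt the world: it can propose anything, but the state only moves along edges the engine already knows about.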


I think the player freedom and simulation elements of a text adventure are mostly an illusion. I don't think a typical text adventure has more degrees of freedom than a point-and-click adventure.

Doing experiments with LLMs and text adventures was revealing for me in this sense. An obvious thing to consider is using the LLM to parse the text... but if you try this you'll quickly realize that a parser is mostly limited by what it _can parse into_. That is, the representation of a command is so limited that there's no rich set of alternate inputs that would map to any valid command.

Before LLMs this also struck me in the voice assistant / NLP space, especially "natural language understanding" (NLU). The parsing wasn't great, but the thing-you-parse-into was also incredibly limited. Like you could parse "set an alarm for 8:30" into some template structure. But "no, change that to 8" didn't have a template structure, didn't have any structured representation.
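A toy illustration of that limitation (the intent name and regex are invented here): the template parser works only when an utterance matches a slot structure it already has, so context-dependent follow-ups have nothing to parse into:

```python
import re

# Hypothetical template-style NLU: the "thing you parse into" is a
# rigid intent-plus-slots structure.
ALARM = re.compile(r"set an alarm for (\d{1,2}(?::\d{2})?)", re.I)

def parse(utterance: str):
    m = ALARM.search(utterance)
    if m:
        return {"intent": "set_alarm", "time": m.group(1)}
    # "no, change that to 8" falls through here: no template represents
    # an utterance whose meaning depends on the previous turn.
    return None
```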

What we've discovered is that the representation that actually fits these concepts is the chat log, or the somewhat magical discernment process of the LLM.

Unlike the point-and-click adventure, the text adventure has poor discoverability. This creates a fog where the player can imagine all kinds of possibilities. But the actual choice points are on the same order of magnitude as the hotspots, verbs, and inventory that define the choice points of a point-and-click adventure.

What I think the text adventure DOES accomplish (and the point-and-click adventure also accomplishes) is giving the player freedom of focus. You can look anywhere. You are usually in some open series of spaces where you can explore at leisure. The text adventure in particular offers a kind of tesseract opportunity, like in the flashback sequence shown in the article.

(Writing this, I am now thinking about a kind of LLM-driven game that discards all pretense of action or puzzles, but instead the player is a ghost free to view their environment, free even to view the internal thoughts of characters, but unable to change anything.)

