Besides the fact that you do have conflicts of interest (disclosing them doesn't negate them), you don't seem to understand that, given how all of the Big AI players have shown themselves to be ruthless and shamelessly dishonest (hoovering up people's creative work without concern for its licensing, aggressively scraping websites to the point of DDoSing them, disregarding robots.txt and using residential IPs to disguise their traffic, all while denying everything), when you assume the role of their most enthusiastic shill, people will naturally start to doubt your integrity.
EDIT: To put it another way, I remember a talk where you dismissed people's concerns about their data being used for training after AI was integrated into a product, citing the denial of the company (a big AI player) as if it should just be taken at face value because they say so, a perspective that many of us view as naive or disingenuous.
The purpose of disclosure is to allow people to make their own decisions about how trustworthy I am as a source of information.
If I started rejecting access to early models over a desire to avoid conflicts of interest, my coverage would be less useful to people. I think most of my regular readers understand that.
I was responsible for one of the first widely read reports on the ethics of model training back in 2022 when I collaborated with Andy Baio to cover Stable Diffusion's unlicensed training data: https://waxy.org/2022/08/exploring-12-million-of-the-images-...
Calling me "their most enthusiastic shill" is not justified. Have you seen what's out there on LinkedIn/Twitter etc.?
The reason I show up on Hacker News so often is that I'm clearly not their most enthusiastic shill.
This is a disappointing thread to find - HN is usually a little more thoughtful than throwing around insults like "insufferable AI cheerleader".
If I can provide a different perspective, I find your writing on LLMs to be useful. I've referenced your writing to coworkers in an effort to be a little more rigorous when it comes to how we use these new (often unintuitive) tools.
I think the level of disclosure you do is fine. Certainly a better effort at transparency than what most writers are willing to do.
It's called having standards.
If I'm reading propaganda, I'd at least like something in return.
This whole "I'm so positive, haha, I just wanna help humanity" act might fly on LinkedIn, but the whole point of this place is to have interesting information.
BTW why was this thread on the front page with 1 upvote? I'm sure there's no funny business going on here lol.
>inb4 flagged
I stand by my opinion that if a major AI company says they aren't training on something, it means they aren't training on that thing.
I continue to be annoyed that they won't confirm what they ARE training on, though. Saying "we don't train on data submitted to our API" isn't exactly transparent; I want to know what they ARE training on.
That lack of transparency is why they have a trust crisis in the first place!
I can't think of a more insufferable AI cheerleader. I wish I could hide all submissions of his blogposts, as well as his comments. (Note that I flag neither.)
(Odd to see a complaint about me being an "AI cheerleader" attached to a post about the negative impact of AI on copywriting and how I think that sucks.)
Ignore the haters, big dawg - your commentary and distillations are widely appreciated, but there's always someone having a bad day looking for a punching bag.
> the negative impact of AI on copywriting and how I think that sucks.)
The extent of your analysis is
> whelp that sucks
with a tone like the one you might take when describing the impact of flat-screen TVs on the once-flourishing TV repair business, without mentioning any of the legitimate ethical (and legal) objections people have to how AI companies train their models and how those models are used.
> Anything I could do to be less insufferable?
Sure, go do a series on how they use residential IPs to hide their scraping, or on how they're probably violating copyright in a multitude of ways, including FOSS software licenses, by disregarding attribution clauses and derivative-work licensing obligations, especially for copyleft licenses like the GPL. Write about people using these systems to effectively "rewrite" GPL'd code so they can (theoretically) get around the terms completely.
Has it been proven that the major labs are scraping via residential IPs? If so I will absolutely write about that.
I know there are a ton of fly-by-night startups abusing residential IP scraping, and I hate it, but if it's Anthropic or OpenAI or Google Gemini, that's a story worth telling.