Hacker News — vinext + Cloudflare Workers

▲Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap) (blog.simbastack.com)

461 points by asenna 2 days ago | 134 comments

desro 2 days ago [-]

> The skill is open at ~/.claude/skills/video-index/. If you're working on something similar (indexing personal archives, getting a local model to do real archival work, building agents that drive editing tools), I'd be glad to compare notes.

When your Claude wrote this post they might not have selected the right URL to share, unless your home folder is exposed. Care to share the skill files?

embedding-shape 2 days ago [-]

We just got a modern example of the classic message from a friend who just picked up programming, containing: "I just created my own web app, wanna check it out? It's here: http://localhost:8080"

0x38B 1 days ago [-]

Different context, but I sent a message like that in Signal the other day to a family member with a link to my IP, pointing to `python -m http.server` running in a directory with a file for them to try (1). Easier than having them open my Samba share.

1: To get an Android app working that has been delisted and requires a 'key' app that you purchase. We did purchase it, but didn't think to make any backups.

m463 1 days ago [-]

reminds me of telling a friend:

I hacked your system: file:///etc/passwd

taneq 1 days ago [-]

There was a Userfriendly comic with Miranda telling some ‘hacker’ “my IP addy is 127.0.0.1, come get some”.

867-5309 1 days ago [-]

https://nitter.net/pic/orig/media%2FCrxXxYlWYAAjGJN.jpg

z2 2 days ago [-]

I've been getting this weekly from colleagues. It's very much an epidemic right now! And the port number is indeed almost always a random number between 8000 and 8100.

a012 1 days ago [-]

Wait until they discover the port number can go over 9000

dvfjsdhgfv 1 days ago [-]

> I've been getting this weekly from colleagues. It's very much an epidemic right now! And the port number is indeed almost always a random number between 8000 and 8100.

Really? A bit hard to believe, unless you have many dumb colleagues.

wiseowise 18 hours ago [-]

Not sure about dumb, but "not giving a shit" for sure. I routinely see tickets with markdown links to files on their local filesystem, drives me insane how someone can pull shit like this with a straight face.

aurmc 23 hours ago [-]

A lot of people here have colleagues who are playing with Claude Code but who have essentially no experience with development at all.

It’s not at all surprising. Not everyone is a developer.

asenna 2 days ago [-]

Oops! My bad. Fixing it now. And yeah, I can share the Skill file. Give me 5 mins.

asenna 2 days ago [-]

Ok I scrambled to finalize a name for it and create a new repo for it - https://github.com/Simbastack-hq/framedex

PS - I just put this together in the last few mins, removed my personal files and references. So it's not tested properly, please let me know if any issues.

It's still an early hack, but I have thousands of still images as well from my camera which I've not processed and I need to do the same analysis for those.

So I'll continue working on it, but happy to receive any PRs if anyone finds any use for it.

I'm tired of having a backlog of thousands of images and videos, leaving it for later.

jaggederest 2 days ago [-]

Hey friend, try something in this ballpark, your post has a bunch of painful AI tropes:

https://github.com/blader/humanizer

You get a pass here because you're doing really cool stuff but it's kinda tough to read past the AI nonsense, and it's relatively easy to screen out "it's not x it's y" kind of things and the bolded bullet points.

bonoboTP 2 days ago [-]

I don't dislike those tropes because they are frequent or because they are not pleasing to read intrinsically. I dislike them because it tells me it was made by AI and AI output varies strongly in quality and most of it is low on insight but rings the right bells to make it seem insightful. It indicates a lack of human care.

Hiding these clues by another AI pass doesn't solve the core problem. Now you just end up with content that is camouflaged better but is still equally low in nutritional value.

cortesoft 2 days ago [-]

I feel like human copywriters have been using those same tricks for clickbait articles for years…

squeaky-clean 1 days ago [-]

"My AI writing isn't slop. It's just as good as buzzfeed lists or celebrity gossip articles."

nchmy 2 days ago [-]

Hence I've hated them all since before AI. But now I'm utterly repulsed by them

dvfjsdhgfv 1 days ago [-]

Sure, but the omniprevalence of LLMs just crystallized these into clearly recognizable patterns. Just like cliches, but not limited to simple phrases.

fenix1851 1 days ago [-]

it’s a different story.

I was highly interested in reading this article from start to finish.

Ofc there were a lot of slop moments, but the author's experience itself is great!

And I genuinely don’t care if he shares it through an LLM-written article.

Just please remove slop markers :)

Forgeties79 2 days ago [-]

I dislike them because I find they generally don’t give any useful information OR if the information is in fact useful, it could do it with a fraction of the words.

Vigorous writing is concise.

yellow_postit 2 days ago [-]

As someone that naturally used a rule of 3 and em dashes I hate AI for taking that away from me.

jaggederest 2 days ago [-]

Agreed, I find myself avoiding constructs I would use naturally because they read as AI - "not just because other people would judge them, but because I also notice and dislike them".

asenna 2 days ago [-]

Thanks for this! This is exactly what I was looking for.

Tbh, I have a lot of thoughts and ideas and things to share and I do spend time and effort trying to de-AI it but this should help a lot.

I'll try it out.

In fact, I was expecting to get shit on by HN readers for this but was pleasantly surprised that readers moved past it.

jaggederest 2 days ago [-]

Yeah I think you'll find these days that there's a lot of respect for substance like what you're doing, even past the noise of the AI. I also use a lot of AI but you really have to demand quality from it, whether it's writing, media, or code. It's clear you've got the taste from your media work, and we're all still learning as we go, so I'm very glad that I could point you in that direction.

refulgentis 2 days ago [-]

I'm curious: how, exactly, did it go from this is painful to read due to AI, to no one cares about AI use and you demanded quality when you used it and delivered?

jaggederest 2 days ago [-]

It didn't, it went from "this reeks of AI after edits, here's a tool that can help" to "people can read past it but there are better ways, you must demand quality". I don't think those two things are inconsistent.

refulgentis 2 days ago [-]

Ah, I see, after he uses the tool it'll be great because he has taste.

AlecSchueler 2 days ago [-]

I think you missed an important distinction being made:

> I also use a lot of AI but you really have to demand quality from it, whether it's writing, media, or code. It's clear you've got the taste from your media work, and we're all still learning as we go...

Their use of AI for "media work" has shown a taste but their writing usage still needs to equal that.

jaggederest 2 days ago [-]

I don't think "if you iterate on this, try using some tools, and ultimately demand that the output meet or exceed your demonstrated taste in other domains" is a hot take, honestly.

refulgentis 2 days ago [-]

It's not a hot take, you're right, I gravely misunderstood the timing in your post, i.e. you were clearly framing it as after and being polite and encouraging.

I'm more hot about it because it's frustrating having so many HN posts be a place for people to work out first drafts, especially when the first piece of feedback is "hey, uh, you clearly used AI and it's horrible to read as a result." So easy to avoid...good on you for being kinder.

(part of my frustration is I was excited because I write a local LLM client and thought I had missed that Gemma 4 has streaming video input support, but after reading through the slop it turns out it's just the ol' "extract frames" workflow. tbf that would have happened AI or not, but put me in a mood)

jaggederest 2 days ago [-]

No worries, text is hard whether there's AI involved or not - I, in turn, mistook your clarification as a snarky "ah well of course if they try harder it'll be fine", my apologies for that. I share your frustration, but the best way I think is to educate not remonstrate unless they're someone who should clearly know better[1]

[1] https://news.ycombinator.com/item?id=48172536

repparw 2 days ago [-]

if you care for some feedback about the writing, dropping the link and saying "PRs are open!" would probably land equally well or better, and would reduce noise in the message. as sibling said, substance and noise

asenna 2 days ago [-]

Agreed.

To be honest, my literal thought process initially when writing was:

- I think this is cool, I should probably open source this

- No wait, I'm over-planning again, no one's gonna read this and the problem is probably too specific to me for anyone to care.

So I just mentioned "let's compare notes if anyone else is trying".

Hence you can see from the comment above, I immediately realized I made a mistake when the parent asked for the Skill file. Should've had the link ready. Pleasant surprise.

jaggederest 2 days ago [-]

That's actually a really good point, blog posts as open source

refulgentis 2 days ago [-]

They haven't: this is the top thread, and the entire thread is saying it's unreadable and explaining step by step how to do the basics you should have done before you posted. I'm not sure why you're pleasantly surprised, I would have expected embarrassed, and I would have taken down the HN post to get at least the basics down before sharing it under my name (if possible, dunno how HN submissions work)

asenna 2 days ago [-]

Unfortunately will have to disappoint you, can't get embarrassed easily. In fact when all of this worked well locally, felt pretty proud ngl.

constantius 1 days ago [-]

It's quite sad that you're feeling pride largely for your ability to write a prompt, and it's sadder that you're being snarky with someone who expects more from HN users.

Your behaviour is not affecting the HN community in a positive way.

whattheheckheck 22 hours ago [-]

This diminishes the llm performance by making it use random out of band non thinking tokens

Zababa 2 days ago [-]

Btw I like your article, it does feel a bit AI generated but I think the problem and setting are interesting enough that it was a pleasant read.

asenna 2 days ago [-]

UPDATE: Quickly created a repo for this - https://github.com/Simbastack-hq/framedex (MIT License)

It's not tested properly after I genericized it. Will try to go through it properly and add more updates.

Two big things on my TODO:

1) Make use of this index and, with Claude's help, make video editing faster with DaVinci Resolve (now that I have a good index of all the content)

2) So far I've only done this for videos, but I want to extend it to the thousands of still images from my camera - need to make sense of them. So I'll be working on this as well.

Confiks 2 days ago [-]

I'm not quite sure why all that swapping is necessary. It really does age your SSD quite fast considering the enormous memory bandwidth required. Gemma 4 31B at 4-bit quantization should only be around 19 GiB [1], not 28.4 GiB. I'm not feeding it images regularly, so I'm not sure how much memory it needs to get those into context, but I can't imagine it is more than 10 GiB.

The activity monitor does show all kinds of Electron apps active, on top of a presumably model-loaded Handy and a virtual machine for Claude Code, so I guess that's the real root cause for all the swapping. If your laptop starts thrashing I can't imagine you have any use for those apps, which will grind to a halt.

[1] https://huggingface.co/mlx-community/gemma-4-31b-it-4bit
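
For reference, the back-of-envelope arithmetic behind a figure like that can be sketched as below. The parameter count (about 31e9) is read off the model name; the 4.5 effective bits per weight (4-bit values plus per-group scales and biases) is an assumption about the quantization format, not something stated in the linked repo. The estimate covers the weights only and ignores the KV cache, the vision encoder and any layers left unquantized, so a real checkpoint will likely be somewhat larger.

```python
GIB = 1024 ** 3

def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

# Assumptions: ~31e9 parameters, 4-bit weights plus roughly 0.5 bit/weight of
# per-group scales and biases (e.g. group size 64 with 16-bit scale and bias).
weights = quantized_weight_gib(31e9, 4.5)
print(f"weights only: {weights:.1f} GiB")
```

Under those assumptions the weights come to roughly 16 GiB, in the same ballpark as the ~19 GiB quoted above and well short of 28.4 GiB.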

asenna 1 days ago [-]

Yeah to be fair, I could've cleaned everything up, but the screenshot was taken while I was doing other work on my laptop.

Although slightly laggy, I was impressed by the fact that I was still able to work on other things and have a bunch of tabs open on my Brave browser.

carpo 1 days ago [-]

This is great. I wish I had enough ram for a local model. I just spent the last few weeks writing something very similar, but I made it a local Electron app with Whisper, ffmpeg and I added semantic search and embeddings for chatting with the videos. It talks to Claude for the vision analysis, tagging and video chat. Do you only send one image for yours? I used a customised scene detection algorithm to find multiple different images per video and then send them all in one request to Claude (along with the subtitles). It's definitely the most expensive part. Using Sonnet 4.6 for the analysis and Haiku for the tagging costs about $1 for an hour of footage, I can imagine it would be slow locally.

nl 1 days ago [-]

Try some of the models on OpenRouter if you are looking to save money. Gemma 4 31B is $0.12/M input, $0.37/M output vs $1/M input, $5/M output for Haiku.

There are other options that are good too. Gemini 3.1 Flash Lite is great for this kind of thing (NOT Gemini 3.5 Flash though - the pricing for that is bad).

https://openrouter.ai/google/gemma-4-31b-it
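
To make the price gap concrete, here is a small sketch that turns the per-million-token prices quoted above into a per-request cost. The prices are the ones in this comment; the token counts are a made-up example workload, not a measurement.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one request with the given token counts."""
        return (input_tokens * self.input_per_million
                + output_tokens * self.output_per_million) / 1_000_000

# Prices as quoted in the comment above.
GEMMA_4_31B = Pricing(0.12, 0.37)
HAIKU = Pricing(1.00, 5.00)

# Hypothetical request: 20,000 input tokens (frames + transcript), 1,000 output.
for name, pricing in [("Gemma 4 31B", GEMMA_4_31B), ("Haiku", HAIKU)]:
    print(f"{name}: ${pricing.cost(20_000, 1_000):.5f}")
```

For that hypothetical request the difference is roughly 9x, and it grows with output-heavy workloads because the output prices differ by more than the input prices.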

carpo 1 days ago [-]

Cheers, I'll give it a try. How are those models at returning structured results? When I was writing the prompts for the analysis step and testing with older Claude models, it would have trouble structuring the XML consistently. Sonnet 4.6 handles it really well.

nl 1 days ago [-]

Use function calling/tool use, not XML output. The models are all trained for that now.

Ie, instead of telling it to generate

  <name>Name</name>
  <age>19</age>
  <address>whatever</address>

give it a function

  details(name: string, age: int, address: string)

Under the hood that is described with a JSON schema, and the models do great at it. Here are the Claude docs, but they are all similar: https://platform.claude.com/docs/en/agents-and-tools/tool-us...
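
As a rough illustration, the `details(...)` function above could be written out as a tool definition like the one below. The `name` / `description` / `input_schema` layout follows the shape used by Claude's tool-use API; other providers wrap the same JSON Schema in slightly different envelopes. The `validate_args` helper is a hypothetical, deliberately tiny check written for this example, not part of any SDK.

```python
import json

# Tool definition in the name / description / input_schema shape.
DETAILS_TOOL = {
    "name": "details",
    "description": "Record a person's details.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "address": {"type": "string"},
        },
        "required": ["name", "age", "address"],
    },
}

_PY_TYPES = {"string": str, "integer": int}

def validate_args(args: dict, tool: dict = DETAILS_TOOL) -> list[str]:
    """Return a list of problems with a tool call's arguments (empty if OK).

    Hypothetical helper: checks required keys and primitive types only.
    """
    schema = tool["input_schema"]
    problems = [f"missing: {k}" for k in schema["required"] if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected: {key}")
        elif isinstance(value, bool) or not isinstance(value, _PY_TYPES[spec["type"]]):
            # bool is excluded explicitly because it is a subclass of int.
            problems.append(f"wrong type: {key}")
    return problems

# A tool call's arguments arrive as JSON, which parses straight into a dict.
call_args = json.loads('{"name": "Name", "age": 19, "address": "whatever"}')
print(validate_args(call_args))
```

The practical benefit is that the arguments come back as structured JSON keyed by parameter name, so there are no closing tags for the model to get wrong.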

carpo 10 hours ago [-]

Hey, just want to thank you for this suggestion. Spent this morning swapping to OpenRouter and changing all my prompts to use tools instead of XML. Not only are Gemma and Gemini much cheaper, the tool calls use far fewer output tokens too. Cost to analyse one 20 minute video with 10 snapshots went from $0.21 to $0.009, and I'm even sending full HD snapshots instead of the 960x540 ones I was sending before (to save costs). The results so far are pretty good. It looks like the larger images are giving the model more context, so in some cases making the cheaper models' results better than the expensive models'. I'm going to run this over a few hundred videos today and see how it goes in bulk!

carpo 1 days ago [-]

Very interesting. Thank you!

asenna 1 days ago [-]

Not one image - 5 frames per clip, sent in a single request with a transcript snippet. So the multi-frame + subtitles in one call part is the same as yours.

But yeah, how it picks the frame is the weak-point here. Scene detection would definitely help - this is #1 on the Roadmap.

Could you share how your scene-detection picks the frames?

---

For the vector search, I went for the trade-off of not having it but keeping it simple with plain Markdown files for more portability. The knowledge travels with the files when an SSD moves, no index to keep in sync, and plain text that outlives the tool. But the other path you mentioned is interesting as well to explore.

carpo 23 hours ago [-]

I originally limited mine to 10 frames spread evenly throughout the video, but it missed a fair bit of context at the analysis step, and didn't scale with length. So now when a video is loaded the app extracts a bunch of frames for the entire video, then calculates an image histogram and compares similarity to the previous one. There's some configuration so it doesn't send too many to the LLM, but still gets a good cross-section of frames to send.

You could also just use FFmpeg as it can do scene detection too. I tested both but liked the results from the histogram analyzer more.
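
A minimal sketch of the histogram idea described above, in plain Python. Everything here is an assumption made for illustration: frames are represented as flat lists of grayscale values, similarity is histogram intersection, and each frame is compared against the last frame that was kept. The actual app may differ on all three points.

```python
from typing import Optional, Sequence

Frame = Sequence[int]  # grayscale pixel values in 0..255 (hypothetical input format)

def histogram(frame: Frame, bins: int = 16) -> list[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for px in frame:
        counts[min(px * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]

def similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def pick_keyframes(frames: Sequence[Frame], threshold: float = 0.7,
                   max_frames: int = 10) -> list[int]:
    """Indices of frames that differ enough from the previously kept frame."""
    kept: list[int] = []
    last_hist: Optional[list[float]] = None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if last_hist is None or similarity(h, last_hist) < threshold:
            kept.append(i)
            last_hist = h
            if len(kept) == max_frames:
                break  # simple cap; stops early rather than spreading picks out
    return kept

# Three dark frames, then three bright ones: one keyframe per "scene".
dark, bright = [10] * 64, [240] * 64
print(pick_keyframes([dark, dark, dark, bright, bright, bright]))  # [0, 3]
```

The `threshold` and `max_frames` knobs stand in for the "configuration so it doesn't send too many to the LLM"; a real version would want the cap to keep a cross-section of the whole video instead of the first N changes.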

Yeah, markdown works well if you're going to search through it with Claude Code or something like that. I built ClipScape as an Electron app with a local SQLite database, as I wanted an interface I could search and chat in and see the relevant thumbnails.

benbojangles 7 hours ago [-]

Gemma4 because presumably it does image analysis right?

-31b It's a dense model

-how many tokens/s is it running at

-What temps are the M1 max GPU/CPU running at

-Is it mlx or gguf

-Why 31b and not 26b which is moe and much more efficient on the m1 max at 50tokens/s & low temps.

I personally use (MLX) qwen3.6-35b-8bit mostly, but use Gemma-4-26b-4bit for image analysis, it's mind-blowing how fast it is at identifying the scene in a photograph.

herf 2 days ago [-]

Two questions:

1. What is the search index?

2. The "description.md" example has things like "faces -> cluster_id". Is this from Davinci Resolve's face index? Things like faces+names and locations are really important with photo collections, but general LLMs don't handle them so well.

asenna 2 days ago [-]

1) It's just simple plain-text `.description.md` sidecar files, one per clip, sitting next to each video.

Something which I can query later - Like when brainstorming with Claude "I wanna make some videos of the Luxury rooms in the lodge" and it knows what all videos could help here (going through the files).

There's also a file at the folder root level that aggregates the text descriptions to make things easier to find.

I've just attached an image in the blog showing an example - https://blog.simbastack.com/_media/gvcycx2n.png

2) No - nothing from DaVinci Resolve. Framedex is a standalone pipeline. Resolve isn't involved.

Faces come from insightface (the open-source buffalo_l pack - RetinaFace for detection), running locally on CPU. For each clip it detects faces in the sampled frames, embeds them, and writes rows to ~/.framedex/faces.db.

Tbh, I know this part is building up in my local DB but I haven't tested how good it is. Will check it out properly soon.

But yeah, on your broader point that's why framedex deliberately does not ask the LLM to handle faces or locations.

----

Faces → insightface / ArcFace embeddings. Deterministic, comparable across clips. The vision model only contributes a rough people_count; it never tries to identify anyone.

Locations → EXIF GPS via exiftool, reverse-geocoded through Nominatim/OpenStreetMap. Hard metadata, not a guess.

The LLM only does what it's good at: scene description, mood, shot type, keywords, keep/review/cull rating (this last part is also debatable though).
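
To illustrate the sidecar approach, here is a rough sketch of writing one Markdown file per clip and searching them as plain text. This is not framedex's actual code or file layout: the `ClipDescription` fields are taken from the list above, but the Markdown format, the `clip.description.md` naming and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class ClipDescription:
    scene: str
    mood: str
    shot_type: str
    people_count: int                 # rough count from the vision model
    rating: str                       # "keep", "review" or "cull"
    keywords: list[str] = field(default_factory=list)
    location: str = ""                # from EXIF GPS, not from the LLM

    def to_markdown(self) -> str:
        lines = [
            f"# {self.scene}",
            "",
            f"- mood: {self.mood}",
            f"- shot_type: {self.shot_type}",
            f"- people_count: {self.people_count}",
            f"- rating: {self.rating}",
            f"- keywords: {', '.join(self.keywords)}",
        ]
        if self.location:
            lines.append(f"- location: {self.location}")
        return "\n".join(lines) + "\n"

def sidecar_path(video: Path) -> Path:
    """clip.mp4 -> clip.description.md next to the video (naming is a guess)."""
    return video.with_suffix(".description.md")

def write_sidecar(video: Path, desc: ClipDescription) -> Path:
    path = sidecar_path(video)
    path.write_text(desc.to_markdown(), encoding="utf-8")
    return path

def search(root: Path, term: str) -> list[Path]:
    """Case-insensitive plain-text search over every sidecar under root."""
    term = term.lower()
    return sorted(p for p in root.rglob("*.description.md")
                  if term in p.read_text(encoding="utf-8").lower())

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "lodge_room.mp4"
        write_sidecar(clip, ClipDescription(
            scene="Luxury room interior", mood="calm", shot_type="wide",
            people_count=0, rating="keep", keywords=["lodge", "room"]))
        print([p.name for p in search(Path(tmp), "luxury")])
```

Because the index is just text files sitting next to the footage, moving the drive moves the index with it, which is the portability trade-off against a vector store.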

throwa356262 2 days ago [-]

I ran Gemma on a 2015 thinkpad to do something similar. Fortunately, I could upgrade the memory otherwise it would have been a painful exercise.

Not gonna lie, llama.cpp had the fans spinning at max speed. But it worked and I got the job done.

iMerNibor 2 days ago [-]

> the fans spinning at max speed

This always confuses me - don't people want their computations to run as fast as possible and thus inevitably produce more heat that needs to be vented?

I suppose sometimes it is just an analogy for "its utilizing 100% of my resources" (which I'm guessing it is here), but I've definitely had people say it as an actual complaint in different contexts

overfeed 2 days ago [-]

> I've definitely had people say it as an actual complaint in different contexts

I think fan loudness is an outgrowth of conspicuous consumption because a certain OEM decided to make it a marketing bullet-point.

I was equally disappointed by people - especially device reviewers - banging on the drum that phones made of plastic "didn't feel premium", and we got phones with glass backs that have to be shoved into plastic cases (because plastic is the near-perfect material to protect fragile phone screens and innards)

dist-epoch 2 days ago [-]

What people complain about is when they visit a blog with two images and the fans are spinning at max speed because the blog has 100 trackers.

0xbadcafebee 2 days ago [-]

Fans shouldn't be running at max speed if the model fits in RAM with room to spare for context. Usually fans max out when the model doesn't fit and the CPU is chugging to make up the difference (or the user didn't tune LLM settings)

mmmlinux 18 hours ago [-]

I didn't realize GPUs operate with zero heat output.

egorfine 2 days ago [-]

> generative AI video has no place on a real travel brand

I am pretty sure that the vast majority of Airbnb hosts would not agree with you.

> equals TripAdvisor crucifixion

I have no idea how the Airbnb hosts with fake listings survive, really.

asenna 2 days ago [-]

Haha. It's honestly something that I've been struggling with myself. I'm running this safari lodge but I don't want to go down that route of slop videos!

But on the other hand, genuine videos do take time and slow down the process.

egorfine 2 days ago [-]

Thanks for the article! I have a beefy M5 Pro and I'm eagerly looking around for ways to use local models (specifically Gemma4 & Qwen3.6).

This is an excellent thing to do. Especially since LLMs excel at batching, so you can index multiple photos and videos in parallel for no performance penalty.

satvikpendem 2 days ago [-]

Unsloth Studio [0] is what I recommend these days, open source alternative to the more widely known LM Studio, and also built by the people who make good quantizations of released models. With MTP support now merged in you should get 2x token generation speed with no accuracy difference. They also have MLX quants if you scroll down a bit, which is a format specifically for macOS' Metal GPU acceleration but that's not integrated into Unsloth Studio just yet.

[0] https://unsloth.ai/docs/models/qwen3.6#mtp-guide

egorfine 2 days ago [-]

I have researched for quite a bit and so far the fastest runtime is the oMLX one. But there's a caveat: ttft on MLX on M4 Pro is enormous. On M5 Pro it has been greatly sped up.

regexorcist 2 days ago [-]

Curious if you tested llama.cpp and still found oMLX faster? I haven't tried the latter myself, might give it a go.

egorfine 2 days ago [-]

Oh yeah I did test various solutions and different settings and quants

Llama is about 1/3 slower on Apple Silicon.

mft_ 2 days ago [-]

I tried Unsloth Studio recently and was disappointed - in particular the downloading functionality is half-baked and didn’t cope with resuming downloads. As it seemed to just be a simple wrapper over llama.cpp, I found that huggingface hub, llama.cpp, and a couple of simple scripts actually offered better functionality once it was set up.

satvikpendem 1 days ago [-]

Yeah it still has some issues on the UX side. It works fine resuming though, just select the same model again and it'll resume the download, the only issue is there isn't a dedicated download page as that would help a lot.

What's better about Unsloth Studio vs LM Studio is it tells you exactly what quantization to use especially as Unsloth ones are quite good, and that it has web search and self-healing tool calls so having a web-searching local ChatGPT alternative is very easy to spin up.

asenna 2 days ago [-]

Thanks! Video is still kinda new to me. But I have a large collection of amazing photos - tens of thousands of RAW images - just lying there spread across the different trip folders.

You know what I REALLY want? Just point this beast at the folders and have it tell me which 150 shots are good to process from these 1,500 images. That's the dream!

Although the technology is getting there, it's still a very difficult problem to solve. Taste and art are subjective. Also, as a photographer, I will always be concerned - "what if my best shot was in one of these rejected shots".

But yeah, I think I'll try to do some more of these experiments soon.

endymi0n 1 days ago [-]

there’s a lot of open models out there… I told Claude to do a weighted score on several models and deduplicate by CLIP similarity for an expedition, should be easy to replicate (see below). Sure doesn’t select the absolute best pics from an emotional impact perspective, but it was pretty damn good at me not having to wade through the bottom 80% of mediocre shots and dupes!

—-

“Models scored all 4,487 photos. NIMA rewards technical craft (sharpness, composition), LAION rewards emotional/aesthetic appeal, MUSIQ is more general quality. Combined: 0.4 NIMA + 0.3 LAION + 0.3 MUSIQ, deduped at 0.85 CLIP similarity.

Interesting: the models wildly disagreed on some shots — one photo ranked NIMA #2 globally but LAION #4313.”
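
A rough sketch of how that weighted score plus dedup step could look. The 0.4 / 0.3 / 0.3 weights and the 0.85 threshold come from the quoted summary; the min-max normalization, the use of cosine similarity on the embeddings, and the greedy best-first dedup are assumptions, since the summary does not say how those parts were done. Scores and embeddings are passed in as plain lists, so nothing here depends on the actual NIMA, LAION, MUSIQ or CLIP libraries.

```python
import math
from typing import Sequence

WEIGHTS = {"nima": 0.4, "laion": 0.3, "musiq": 0.3}  # from the quoted summary

def min_max(values: Sequence[float]) -> list[float]:
    """Rescale to 0..1 so models with different score ranges can be mixed."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(scores: dict[str, Sequence[float]]) -> list[float]:
    """Weighted sum of the normalized per-model scores, one value per photo."""
    normed = {model: min_max(scores[model]) for model in WEIGHTS}
    n = len(normed["nima"])
    return [sum(WEIGHTS[m] * normed[m][i] for m in WEIGHTS) for i in range(n)]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_and_dedupe(scores: dict[str, Sequence[float]],
                    embeddings: Sequence[Sequence[float]],
                    threshold: float = 0.85) -> list[int]:
    """Best-first photo indices, skipping any photo too similar to a kept one."""
    total = combined_scores(scores)
    order = sorted(range(len(total)), key=lambda i: total[i], reverse=True)
    kept: list[int] = []
    for i in order:
        if all(cosine(embeddings[i], embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy data: photos 0 and 1 are near-duplicates, photo 1 scores higher.
toy_scores = {"nima": [5.0, 6.0, 4.0], "laion": [6.0, 6.5, 3.0], "musiq": [60, 70, 40]}
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(rank_and_dedupe(toy_scores, toy_embeddings))  # [1, 2]
```

Deduping best-first means that within each cluster of near-identical shots only the highest-scoring one survives, which matches the goal of not wading through dupes.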

asenna 1 days ago [-]

Very interesting! Wasn't aware of these. I'll be exploring them soon. Thanks

busfahrer 2 days ago [-]

I have been contemplating a M5 Pro MBP, but for the life of me I wasn't able to find benchmarks for real-world models, do you happen to know how many tokens per second roughly you get with MoE models like Qwen 3.6 35B/A3B or Gemma 4 26B?

ahknight 2 days ago [-]

I'm not normally one to share videos as answers, but this particular fellow does a LOT of work with local AIs and Macs and happens to have a nuanced answer. https://youtu.be/XGe7ldwFLSE

embedding-shape 2 days ago [-]

You need to ask macOS people for their prefill speed as well, there are two numbers you care about here, and current MacBooks have generally terrible numbers when it comes to prefill performance. Surely it'll get better with time, but if you already have a desktop, I'd go the "beefy GPU" route first.

egorfine 13 hours ago [-]

> current MacBooks have generally terrible numbers when it comes to prefill performance

Previous MacBooks. Prefill speed on M4 Pro and M5 Pro are hugely different.

embedding-shape 25 minutes ago [-]

Alright, show me the numbers then, whenever I ask any macOS people about their prefill speed instead of generation speed, they all seem to disappear :P

egorfine 2 days ago [-]

Qwen 3.6 35B on oMLX 0.3.9rc1: I get 86 t/s on Q4 and 74 t/s on Q6.

Bear in mind that ttft on MLX is much much faster on M5 Pro as compared to M4 Pro.

Also bear in mind that those figures are with NO optimizations whatsoever: no MCP, no DFlash. I am waiting for both to be released for the Qwen models.

busfahrer 2 days ago [-]

Great, thanks! :-) and to mirror another poster: what kind of prompt parsing (prefill) speed do you get for that model? Also how is the speed for the 27B model?

egorfine 2 days ago [-]

35B: 1300-1800 t/s on both Q4 and Q6.

27B: give me 20 minutes

busfahrer 1 days ago [-]

Thank you, good sir!

egorfine 2 days ago [-]

Qwen3.6 27B oQ6: 12.5 t/s generation, 340-360 t/s pp.
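
To show why both numbers matter, here is a quick sketch that combines prefill and generation speed into one end-to-end time. The speeds are the ones quoted in this thread (using the lower end of each prefill range); the 8,000-token prompt and 500-token reply are a hypothetical workload, and the model ignores everything else (load time, caching, batching).

```python
def response_seconds(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float, generation_tps: float) -> float:
    """Time to read the prompt plus time to generate the reply, in seconds."""
    return prompt_tokens / prefill_tps + output_tokens / generation_tps

# Hypothetical workload.
PROMPT, OUTPUT = 8_000, 500

# Speeds as quoted above (M5 Pro, oMLX).
qwen_35b = response_seconds(PROMPT, OUTPUT, prefill_tps=1300, generation_tps=86)
qwen_27b = response_seconds(PROMPT, OUTPUT, prefill_tps=340, generation_tps=12.5)

print(f"35B MoE:   {qwen_35b:.1f} s")
print(f"27B dense: {qwen_27b:.1f} s")
```

With a long prompt and a short reply, prefill is a large share of the total in both cases, which is why a generation-only benchmark can be misleading.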

egorfine 2 days ago [-]

Native MCP:

For Qwen 35B enabling native MCP on MLX models slows it down by 10%.

For Qwen 27B enabling native MCP on MLX models speeds token generation up almost exactly 1.5x.

(all tested on M5 pro).

juancn 2 days ago [-]

I'm running unsloth/Qwen3.6-35B-A3B-UD-Q8_K_XL on an M3 Max, 64GB at ~57 t/s with llama-server

brcmthrowaway 2 days ago [-]

Prefill speed and 27B number?

juancn 8 hours ago [-]

Prefill is around ~600 t/s.

I don't remember what the 27B was, I tried a 27B with different quantization at some point for that one, but I settled on the 31B.

edg5000 6 hours ago [-]

Love this article! Had never thought of a use case like this. Had no idea Gemma had a vision encoder. Great use case for local LLM!

theodorewiles 2 days ago [-]

My take is that B2C AI applications are kind of structurally limited by how hard it is to build personalized context.

The idea of capable local models could be a huge unlock here if they are able to do the bottom-up context collection research / tagging / etc. at scale.

michaelbuckbee 2 days ago [-]

I made a B2C AI app that's fully local (and free) to do AI based contextual file renaming.

So if you give it a bunch of screenshots it will try and intelligently name them based upon what is in the screenshot. Same for videos, PDFs, etc.

But to your point I haven't even tried charging money as it feels like something Apple is just going to bake in as a feature.

https://finalfinalreallyfinaluntitleddocumentv3.com/

asenna 1 days ago [-]

This is cool. And yeah love the name!

Are you planning to open source it? Or maintain it in the future?

michaelbuckbee 23 hours ago [-]

My plan was to just see if anyone wanted to actually use it first. If I couldn't give it away, I wouldn't invest the time in selling or open sourcing it.

I'd sort of designed it for my own needs first and hadn't thought too far beyond that.

ntcho 1 days ago [-]

absolutely love the domain here. great taste

asenna 2 days ago [-]

Definitely agree with this. Here, Claude and I did that research by brainstorming together, plus some trial and error to get to this.

But I can tell it's only a matter of time before agents become smart enough to let my non-tech friends be able to just say "Make sense of all these videos in my folder" and it just does it.

enos_feedler 2 days ago [-]

Is it really local models that unlock this? Surely stateless model APIs would yield the same benefits? I get that local can be “cheaper” depending on usage, but we’ve been renting storage and compute from clouds at a premium for ages..

asenna 2 days ago [-]

A huge thing here was the massive amount of data that was just processed - I went through about 1TB of files over 24 hours.

Using API to analyze even a subset of this would've been painful imo.

enos_feedler 2 days ago [-]

I thought about that in this video case and it's true. I thought the parent comment was making a broader statement about local models in general. But even with video, if it was stored in private cloud storage near the LLM could this still have worked efficiently? What are the most painful elements of this whole setup / work environment if everything was cloud?

asenna 2 days ago [-]

Oh yes, if everything is cloud, then this is a non-issue.

The few other points of consideration would be:

1) Cost - I was considering using Sonnet for this but there's always the concern of reaching limits OR the API cost if you're using the API.

The feeling of knowing you have a capable model in your hands without any limits is actually pretty awesome. Your mind starts running at what else can I throw at it to do grunt work.

2) Privacy issues - same as with moving to cloud.

3) Reliability issues - I know from experience Claude uptime has been pretty bad the past few months

4) Restrictions - Claude has been pretty heavy handed with their restrictions lately, anything which remotely triggers their flags gets an instant denial (or worse, an account ban). Often these are false-positives.

I love the value I get from Claude but there's a different kind of freedom you get with local, capable models.

oceanus 2 days ago [-]

[flagged]

dang 2 days ago [-]

Could you please not post generated comments to HN? It's not allowed here. See https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.

We ban accounts that do this and I don't want to ban you, so please write everything that you post to HN by hand.

Of course, it's impossible to know for sure what was LLM processed or not, but we're getting complaints about some of your posts and, upon inspection, the complaints seem justified.

genxy 2 days ago [-]

The article itself has many AI tells. Can we update the guidelines on AI generated content?

dang 2 days ago [-]

That's a separate issue and more of a grey area still. We're thinking about it.

tefkah 23 hours ago [-]

i’m glad to hear this is being considered, thank you for your efforts

genxy 2 days ago [-]

Why did you destroy your own voice to have it replaced by AI?

mainaisakyuhoon 1 days ago [-]

I really struggled to read the AI slop in this.

dwa3592 20 hours ago [-]

did you know that this existed and is pretty good and doesn't hog 50GB of swap?

https://github.com/iliashad/edit-mind

echion 20 hours ago [-]

Thanks for that link -- deserves more attention.

clueless 2 days ago [-]

This sounds like a great capability to be added to immich

asixicle 2 days ago [-]

Or Stash lol

moinism 1 days ago [-]

> Every AI video editor on the market assumes your footage is already labeled

Shameless plug: I'm the founder of Chat Octopus, an AI media assistant, and it actually 'looks' at the videos to understand them before creating a cut.

andai 2 days ago [-]

Awesome. Say, this is very comprehensive.

I was vaguely aware of all these pieces existing (except for running a facial recognition database at home o_o), but it's really neat to put them all together like that.

asenna 2 days ago [-]

Thanks! I was honestly casually trying it out on the side with Claude's help. And I was actually pleasantly surprised to see how good the result was.

Still blows my mind I can do all this from my 2021 MBP.

I'll try to do a post once I have the next steps working (helping with planning and editing videos with DaVinci Resolve).

ahknight 2 days ago [-]

I also have a 64GB M1 Max and am similarly impressed with what that workhorse can do. The M5 tempted me -- a lot -- but then I looked at what I was already getting done on that machine and just couldn't justify it ... yet. Someday, surely, but not yet. Gemma4 gave all my local projects new life, just like what you did here.

Great job. Long live the M1 Max!

asenna 2 days ago [-]

100%

Although knowing how good these local models are getting, I am now eyeing the upcoming M5 Ultra Mac Studio (256 GB perhaps). But knowing how crazy the market is, it might be a year before I get the chance to get my hands on it. If it even launches by WWDC.

gitowiec 2 days ago [-]

Reading this text feels strange; the sentences seem to be detached.

cataphract 2 days ago [-]

I had exactly the same impression, and I recall seeing this style other times recently. First time I thought it was just bad writing skills, now I'm thinking it's AI generated.

asenna 1 days ago [-]

I'm the author, yes it is AI-assisted.

You can make AI-generated content without it being slop. Slop, to me at least, is content that's wrong, padded, or generic.

I see the cadence / short-sentence issues but if there's something else beyond those, I'd actually want to know what made it feel bad.

Without it, I would've put off documenting what I did over the weekend. Instead, I documented everything and spent quite some time and effort (several iterations) making sure it does not hallucinate and writes in my own tone and voice. I'm sure it could be better but the content is not made up.

At a time when most of us software engineers have changed our workflows to let AI write 80+% of our code using agents, I feel writing is heading the same way. It then becomes a matter of taste, whether it's done well or not.

If you're looking for clues and signs of whether a piece of content has used AI, you're going to be disappointed over the next 12 months.

If it feels jarring right now, I'll work harder on the workflow so it feels more natural next time (someone shared this project with me - https://github.com/blader/humanizer).

But this clearly allows me to make content which I wouldn't have done earlier.

cataphract 22 hours ago [-]

I'm not philosophically against AI or anything, but I think this needed some heavy editing.

When I first saw this style, I did not even think it was AI-written, because I associate AI-written text with fluffiness. This staccato instead looks like the model was told to be terse and informal. I think the informality doesn't help either -- it's not that you can't have a well-written colloquial text, but I think it's harder to pull off.

Here is an example:

> Gemma returned people_count: "many" instead of an integer. My vision prompt literally said integer or the string "many" if >10. Gemma followed instructions correctly; the bug was schema design. The fix was a stricter prompt (integer 0-99 with explicit guidance to estimate) plus a coercion in the parser for the legacy "many" responses. Don't union-type schema fields. Pick always-int or always-string, never "int or this one specific string," because every downstream consumer pays for the choice.
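For reference, the parser-side coercion that paragraph describes could look roughly like the sketch below. This is not the author's code: the function name `coerce_people_count` and the choice to map the legacy "many" string to 11 (the smallest value consistent with ">10") are assumptions made for illustration.

```python
def coerce_people_count(value, many_floor=11):
    """Normalize a people_count field to an int in the range 0-99.

    Hypothetical helper: the legacy string "many" (meaning more than 10)
    is mapped to many_floor, which is an assumed lower-bound estimate.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError(f"unsupported people_count: {value!r}")
    if isinstance(value, int):
        return max(0, min(99, value))
    if isinstance(value, str):
        text = value.strip().lower()
        if text == "many":
            return many_floor
        try:
            # Accept integers that arrived as strings, e.g. "7".
            return max(0, min(99, int(text)))
        except ValueError:
            pass
    raise ValueError(f"unsupported people_count: {value!r}")
```

With something like this in the parser, downstream consumers only ever see an integer, which is the "always-int" choice the quoted paragraph argues for.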

bentcorner 15 hours ago [-]

> The first half is a constant flood of footage from the iPhone, the DJI Pocket, the drone, the Nikon Z8, and lately the Ray-Ban Metas too. There's always something being recorded. Every photographer or videographer I know is sitting on the same problem: an archive that grows faster than they can edit it. The second half is why mine never gets touched.

This is your second paragraph but it reads awkwardly. You mention two halves in the previous paragraph, so I kind of try to map those two halves to the halves in this paragraph. But I don't understand what the second half is in this second paragraph.

> Three months ago the lodge's social channels went dark. Not for lack of content; the lodge has years of raw footage across multiple SSDs. The bottleneck was editing time, and my time disappeared. Claude Code with Opus 4.5 (and then 4.6) hit the point in February where you could leave agents running for hours and come back to merged PRs. KaribuKit was going live with its first paying property in the same window. I stopped sleeping properly, started running three or four agents in parallel in the background, and the months when I would have cut reels turned into months when I shipped software instead.

I don't fully understand this paragraph either. Your time disappeared? Into what? Was it the lack of sleep? I don't know what KaribuKit is.

> I asked it out loud: how does the agent know what's in each clip?

Did you? Really?

> Four bugs, four lessons

I've noticed that AI tends to rathole into random things when summarizing a piece of work, so I'm skeptical that these were actually the four most interesting bugs you could have shared.

I would recommend you just remove this section or take the time to actually think about some learnings you had from this project. Syntax errors or missed CLI params are mildly interesting but what makes these four bugs interesting to your readers?

> The actual take

The same criticism here applies. Are these your real takes, or did Claude make these up too?

Some obvious tells to me of things that AI likes to write that humans rarely ever say:

> Both real, both consuming attention.

> Four constraints set the shape:

There's way more than just this (the writing style of nearly the entire post screams Opus 4.7), but that's just what jumped out at me when I started reading your post.

I don't mind that you used AI to write this, but in the future when you write using AI, take the time to read the entirety of the article and consider the goals of what you want to write and whether the AI achieved that. Take out what doesn't belong and make sure that what you have left says things in your voice.

pavlov 1 days ago [-]

The content is good, but this LLM writing style gets tiresome. Everything is a revelation:

>“I bought it for Chrome. It's running a model that didn't exist when I bought it.”

Well duh, personal computers run new software. That’s literally the whole point. The Apple II didn’t sell on the strength of the preinstalled apps.

asenna 1 days ago [-]

Author here. I totally hear you. I wasn't expecting this to do well on HN for exactly this reason.

But as I've mentioned elsewhere - if it wasn't for all the AI assistance, I would've put off documenting everything that I did and never even gotten to the writing part.

But yeah, I'll be working on the workflow to make the next write-up better, more humanized.

ngai_aku 2 days ago [-]

I’d like to do something like this for the collection of home videos I have piling up, but I’m still on a 16GB M1. Any hope of getting decent results with smaller models? If not, does anyone have tips on GPU rental?

I have a Claude Max sub and plenty of OpenRouter credit, but I don’t feel good about uploading my family’s private videos.

coldtea 1 days ago [-]

The post is a mix of human and AI writing and the AI-mannerisms get on the nerves. At least it has a clear topic and some actionable insights and code examples.

mujib77 20 hours ago [-]

This is sick. Nice work

cold_harbor 2 days ago [-]

the reason 50GB swap is even viable here is Apple Silicon's memory bandwidth. on x86 that much swap would make inference unusably slow

throwawaytea 2 days ago [-]

Memory bandwidth or storage bandwidth?

bahmboo 1 days ago [-]

potato potatoh

harlanji 16 hours ago [-]

Interesting. I've been doing similar stuff with my archive on a weak Celeron laptop with 4GB RAM using vanilla ML tech that I'm learning by prompting LLMs (heh). Extract all info from media as sidecar files and all, exploring low power approaches.

I can sell this as a service to people who can't even run an LLM, or don't want to cook their hardware.

Waitlist open:

"Catalog, search, preview, and generate production-ready prompts & scripts from your entire archive — on your existing hardware. Then render in the cloud."

https://harlanji.pythonanywhere.com/assetforge/

yardie 2 days ago [-]

Now I have another project for this weekend! I also have tons of video and not a lot of time to index them.

brcmthrowaway 2 days ago [-]

So do they run the lodge or what?

asenna 2 days ago [-]

Hi. I wrote this article - yes, I do run a safari lodge in Maasai Mara, Kenya. It's amazing. Ask me anything if you're interested in knowing more.

(Also email is in my profile).

mbenne 22 hours ago [-]

[dead]

zazibar 2 days ago [-]

The subject matter is interesting but the amount of slop makes it difficult to read through. Yeah, it's great that you can throw your technical problems at Claude without caring much about the generated output but treating your own writing that you actually want to share with the world the same way is a terrible idea.

asenna 2 days ago [-]

Tbh, I did spend a lot of time trying to ground it and de-slopify it - verified nothing was hallucinated and went through 10 iterations to get to this. It's almost like wrestling with Claude and I knew it would be tough on HN.

But because of the fear of non-perfection, I used to put away things like creating this article or even posting it anywhere. And I do think the article has real value that HN would appreciate (I am myself an HN-enthusiast).

I'll try more. Someone else shared this project which would be really helpful - https://github.com/blader/humanizer

Also a side note, the blog is posted on my self-created Slopit.io platform which is purely meant for your personal agents (working along with you) to post content - I recommend trying it out. https://blog.slopit.io/this-blog-post-is-slop/

I know, things are getting difficult with all the slop around, but my personal opinion is, as the agents get better at writing, the "annoying-ness" factor reduces and pieces of substance will still be appreciated, even if they were written by agents. This and the fact that agents aren't going away.

If I've automated a lot of my coding, I feel like engineers like me would naturally progress to also taking agents' help to write useful content.

PS - this comment was 100% hand-typed.

teach 2 days ago [-]

For what it's worth, I really enjoyed this read and almost came here to comment "this is the most enjoyable llm-assisted article I've read in a while"

The tells were unmistakable but it still had a human touch, so I for one am glad you published anyway.

asenna 2 days ago [-]

I'm definitely learning and hope to do better next time but your comment truly means a lot.

I kid you not, I've taken a screenshot of this to motivate me next time I'm doubting publishing :)

Maya_Andersson 18 hours ago [-]

[flagged]

RealMarcus_AI 20 hours ago [-]

[flagged]

maxothex 2 days ago [-]

[flagged]

danborn26 22 hours ago [-]

[dead]
