The end user can access the resulting tags; Apple cannot. However, iPhones do automatically report if they see something Apple does not like (in the USA).
Whatever the lack of incentives may be, this is what is happening. I just explained it a bit more simply than the article did.
It’s a cool idea: certain approaches to encryption still allow math to be performed on the data. Here’s one example: say you encrypt data X, producing ciphertext Y. You can then perform an operation on Y that, once the result is decrypted, comes out as X multiplied by four. So you can run computations on the encrypted data without ever decrypting it.
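To make the “multiply by four” idea concrete, here is a toy sketch using the Paillier cryptosystem, an additively homomorphic scheme (not the one Apple uses, and with primes far too small to be secure): raising a ciphertext to the fourth power multiplies the hidden plaintext by four, without the party doing that ever seeing the plaintext.

```python
# Toy Paillier cryptosystem: additively homomorphic, so raising a
# ciphertext to the k-th power multiplies the hidden plaintext by k.
# Demo-sized primes only; real deployments use thousands of bits.
import math
import random

p, q = 1789, 1861            # tiny demo primes, NOT secure
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse for decryption

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

x = 21
cx = encrypt(x)
# "Multiply by four" on the ciphertext: raise it to the 4th power.
cx4 = pow(cx, 4, n2)
print(decrypt(cx4))  # 84 == 4 * 21
```

Multiplying two ciphertexts together also adds their plaintexts, which is the building block for things like encrypted dot products.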
It would be quite complex, but I suppose you could run a machine learning model this way to tag images without ever seeing the image or knowing the resulting tag. Only the decryption key can be used to read the results (and that key is on the user’s iPhone, I suppose).
However… I don’t know how much compute cost this adds to an already expensive computation. The encryption used might not be the strongest out there. But the idea is pretty cool!
I don’t really understand the purpose of the feature — GPS tags are already embedded in the photo by the phone, so it knows the location of each picture. The phone also analyzes faces of people you’ve identified so you can search for people you know. What else does this new feature add?
It lets you type “eiffel tower” into search and get those pictures. Rather than all the other unspeakable things you did in Paris that night.
So I recently installed Immich and it does it for me using local AI
Yep, machine learning is nice
Current implementation seems like overkill. Why not just use the GPS location?
Because you took two selfies in a restaurant near there, made a huge stunning collage of a duck below the tower, and took a couple of photos from farther away to get the whole tower in view.
I’m running this tech at home, because we had the same use case. Except for me it’s running on a NAS, not Apple’s servers. The location-based solution doesn’t quite work as well when you’re an avid photographer.
If you read the article, you would know that the hard work is done locally on your iPhone, not on Apple’s servers.
If you read the article thoroughly, you’d know that a smaller model runs locally to guess that a landmark might be in a region of the image. The actual identification and tagging is done in the cloud, and the tag is then sent back.
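A rough sketch of that two-stage flow (every name and value here is invented for illustration; in the real system the embedding is homomorphically encrypted before it ever reaches the server):

```python
# Hypothetical sketch of the two-stage landmark-tagging flow.
# Stage 1 runs on the device; stage 2 would run server-side over an
# encrypted embedding in Apple's actual design.

def local_landmark_detector(photo_path):
    """On-device model: cheaply guess that a region looks like a landmark
    and produce an embedding vector for it (faked here)."""
    return [0.9, 0.1, 0.3]  # fake embedding for the candidate region

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Server-side database of known landmark embeddings (fabricated values).
LANDMARKS = {
    "Eiffel Tower": [0.88, 0.12, 0.31],
    "Big Ben": [0.10, 0.90, 0.20],
}

def server_identify(embedding):
    """Cloud stage: nearest landmark by cosine similarity on embeddings."""
    return max(LANDMARKS,
               key=lambda name: cosine_similarity(embedding, LANDMARKS[name]))

tag = server_identify(local_landmark_detector("photo.jpg"))
print(tag)  # Eiffel Tower
```

The point of the split is that the expensive, frequently updated landmark database lives in the cloud, while the phone only ships out a small vector, not the photo itself.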
Because then they don’t have an excuse to move all your data to Apple servers and scan it for later use.
At that scale, and because they pay for the servers, I bet they did the math and are constantly optimizing the process, since they own the entire stack. They might have somebody who worked on the M4 architecture give them hints on how to do so. Just speculating here, but arguably they are in a good position to make this quite efficient, even though whether it’s actually worth the ecological cost is, in the end, debatable.
Did they? Because it seems like everyone else is in a hype bubble and doesn’t give a shit about how much this costs or how much money it makes.
Looks like they used the “Brakerski-Fan-Vercauteren (BFV) HE scheme, which supports homomorphic operations that are well suited for computation (such as dot products or cosine similarity) on embedding vectors that are common to ML workflows”. In other words, they use a scheme that is both secure and efficient specifically for the kind of computation they do here. https://machinelearning.apple.com/research/homomorphic-encryption
At least it’s not going to be the overhyped LLM doing the analysis, it seems, considering the input is photo data.
Here, indeed, I don’t think so, but other vision models, e.g. https://github.com/vikhyat/moondream, rely on an LLM to generate the resulting description.
My gosh, what is with people’s reliance on a single thing?
Well, to be fair, and even though I did spend a bit of time writing about the broader AI hype BS cycle (https://fabien.benetou.fr/Analysis/AgainstPoorArtificialIntelligencePractices), LLMs in themselves are not “bad”. It’s an interesting idea to rely on our ability to produce and use language to describe a lot of useful things around us, so using statistics on it to try to match is actually pretty smart. Now… so many things have gone badly over the last few years that I won’t even start (cf. link), but the concept per se makes sense to rely on sometimes.
Their chips are pretty good at not drawing much power. But then you also get to the balance of power cost, computing power and physical space.
Google and Microsoft are already building their own power generation systems for even faster AI slop. That would make power a lot cheaper, and super efficient chips might not be the best answer.
I don’t know which way Apple will go, except further up their own behind. But either way, these are some really cool approaches to implementing this technology, and I hope they keep it up!
Yep, I’m reading their blog post to understand it a bit better. I don’t like that it’s enabled by default, especially with iCloud off (which should be a signal that the user does NOT want data leaving their device), but considering what others are doing, this seems like the best trade-off.