🇦🇺𝕄𝕦𝕟𝕥𝕖𝕕𝕔𝕣𝕠𝕔𝕕𝕚𝕝𝕖@lemm.ee to LocalLLaMA@sh.itjust.works · 10 days ago

How much GPU do I need to run a 90B model?
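As a rough back-of-the-envelope answer: weight memory is roughly parameter count times bytes per parameter, plus some overhead for the KV cache and activations. The sketch below uses a hypothetical 20% overhead factor as an assumption; real overhead varies with context length and runtime.

```python
def estimate_vram_gb(n_params: float, bits_per_param: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights (params * bits / 8 bytes)
    scaled by an assumed overhead factor for KV cache etc."""
    weight_bytes = n_params * bits_per_param / 8
    return weight_bytes / 1e9 * overhead

# A 90B model at 4-bit quantization needs roughly 54 GB by this estimate;
# at full fp16 it balloons to well over 200 GB.
print(estimate_vram_gb(90e9, 4))   # ~54 GB
print(estimate_vram_gb(90e9, 16))  # ~216 GB
```

So even heavily quantized, a 90B model won't fit on a single consumer 24 GB card without some form of offloading.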
Sylovik@lemmy.world · 9 days ago

In the case of LLMs, you should look at AirLLM. I don't think there are convenient integrations with local chat tools yet, but an issue has already been opened on Ollama.
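For context, AirLLM's trick is layer-by-layer offloading: only one transformer layer's weights are resident in VRAM at a time, loaded from disk, applied, then freed. A toy sketch of that idea (plain NumPy stand-in layers, not AirLLM's actual API):

```python
import numpy as np

def save_layers(layers, store):
    # Simulate persisting each layer's weights to disk.
    for i, w in enumerate(layers):
        store[i] = w

def run_offloaded(x, store, n_layers):
    # Load one layer at a time, apply it, then release it --
    # peak memory is a single layer's weights, not the whole model.
    for i in range(n_layers):
        w = store[i]          # "load from disk"
        x = np.tanh(x @ w)    # apply the layer
        del w                 # "free" before loading the next
    return x
```

The trade-off is speed: every token re-reads the whole model from storage, so it's far slower than keeping the model in VRAM.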
🇦🇺𝕄𝕦𝕟𝕥𝕖𝕕𝕔𝕣𝕠𝕔𝕕𝕚𝕝𝕖@lemm.ee (OP) · 9 days ago

That looks like exactly the sort of thing I want. Is there any existing solution to get it to behave like an Ollama instance? (I have a bunch of services pointed at an Ollama instance running in Docker.)
Sylovik@lemmy.world · 9 days ago

You may try Harbor. The description claims it provides an OpenAI-compatible API.
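If the backend speaks the OpenAI-compatible API, most tools pointed at Ollama can be repointed by swapping the base URL, since the request shape is the same everywhere. A minimal sketch of building such a request (the model name and port below are placeholders, not real values from Harbor):

```python
import json

def chat_request(prompt, model="some-local-model", base_url="http://localhost:8080/v1"):
    """Build an OpenAI-style chat completion request.
    Any OpenAI-compatible server should accept this payload shape
    at its /chat/completions endpoint."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(payload)
```

From there it's just an HTTP POST with `Content-Type: application/json`; whether the weights live in VRAM or are streamed layer-by-layer is invisible to the client.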