McKinsey launches a generative AI chatbot to bring its knowledge to clients


Consulting firms provide businesses with professional advice based on thorough research into a specific industry or area. As a result, these firms hold robust sources and research data — and now McKinsey & Company has launched an AI chatbot to help its clients access this information.

On Thursday, McKinsey unveiled Lilli, its AI-powered search tool that gives clients and consultants easy access to the firm's vast stores of knowledge.


When asked a question, Lilli scans the firm's databases and identifies five to seven relevant pieces of content, summarizes key points, includes links, and even identifies experts, according to a press release from McKinsey.

McKinsey has a robust knowledge base that consists of more than 40 curated knowledge sources, 100,000 documents and interview transcripts, and a network of experts that spans 70 countries. A tool like Lilli makes it easier to place those rich sources in the right hands.

"Lilli aggregates our knowledge and capabilities in one place for the first time and will allow us to spend more time with clients activating those insights and recommendations and maximizing the value we can create," says Erik Roth, a senior partner with McKinsey.


Since Lilli was designed with clients and consultants in mind, both groups were used to pilot the tool. Ultimately, both clients and partners can draw on the firm's body of knowledge for everyday business applications and to increase their productivity.

As part of the press release, several McKinsey partners shared accounts of incorporating Lilli into their workflows for different use cases, including preparing for client work and getting ready for meetings and presentations.

"I use Lilli to look for weaknesses in our argument and anticipate questions that may arise," said Adi Pradhan, an associate partner at McKinsey.

"I also use it to tutor myself on new topics and make connections between different areas on my projects."


McKinsey has more than 70 experts working on Lilli to ensure it is deployed cost-effectively and safely. The firm plans to scale Lilli to thousands of colleagues across the business by the end of the year.

Many enterprises are choosing to build their own generative AI models. This strategy allows companies to tailor a language model to their specific business requirements and can also help ensure that sensitive company data stays protected.

Now Every Llama Can Code

Meta is taking the competition head-on in every field. To compete with Elon Musk's X, it launched Threads. To compete with OpenAI's ChatGPT, it launched Llama, and then Llama 2. Threads has been a flop, but when it comes to OpenAI, Meta has it surrounded with its tiny llamas from every direction. It even partnered with Microsoft to host its open-source, commercially available LLM on the cloud to compete with OpenAI's GPT.

Now, Meta has another trick up its sleeve.

Meta is planning to launch a platform to help developers generate code automatically. According to a report by The Information, Meta's code-generating platform, dubbed Code Llama, will be based on Llama 2 and will be open source. It is expected to launch as soon as next week, according to a person familiar with the matter. Meanwhile, DeepMind's AlphaCode, its own code-generating platform, has been available on GitHub since last year, but has arguably made barely any impact.

Moreover, this is something OpenAI has been doing all along through Codex, as has Amazon through CodeWhisperer. Interestingly, both of these are proprietary and closed source, something "OpenAI" has been known for now for some time. Meta, by contrast, is building upon its biggest moat: the open source community.

What’s in this for Meta?

Though Microsoft has been trying to make everyone a developer by providing GitHub Copilot and even embedding some coding capabilities in ChatGPT, Meta is on a roll. Through Code Llama, companies and enterprises would be able to build their own AI apps without paying for OpenAI's or Google's software.

Just as Llama 2 allowed people to build their own chatbots, Code Llama will enable companies to build their own AI coding assistants. The best part for enterprises is that, since it is open source, developers at these companies would be able to feed in their own source code and generate code tailored to that proprietary codebase.
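If Code Llama does ship as an ordinary Hugging Face checkpoint, wiring it into a basic in-house assistant could look roughly like the sketch below. The checkpoint name is a placeholder, since the model had not been released at the time of writing; a generic Llama 2 checkpoint stands in.

# Hypothetical sketch of an in-house coding assistant built on an open model.
# The checkpoint name is a placeholder: Code Llama was unreleased at the time
# of writing, so a generic Llama 2 checkpoint stands in here.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder until Code Llama lands

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "# Python function that validates an email address\ndef is_valid_email(address):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Fine-tuning such a checkpoint on an internal repository, rather than merely prompting it, is what would make the assistant specific to a company's own codebase.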

This worry has continuously pushed enterprises away from using OpenAI's services, something even Microsoft has acknowledged, and it is why many companies have banned their employees from using ChatGPT.

For Meta, there may be no direct monetary benefit to open-sourcing its software. Meta revealed in May that it has been using CodeCompose internally all along to generate code with AI. There is no confirmation that Code Llama will be an extension of it, but the company has said the model was trained on legally available programming data, unlike competitors such as OpenAI's Codex.

Furthermore, as Meta continues to open source its models, it benefits from community contributions that it can build back into its products. For instance, the company has been developing chatbot personas for its social media platforms with the help of Llama 2, and developers all over the world have been able to spot the model's security flaws and help Meta improve it over time. The same could be the case for Code Llama.

Code Llama to bring a revolution

There has been talk that open source is actually Meta's moat, one that OpenAI and Google lack. This is being proven again and again as more companies adopt Llama 2 to build their own AI products, moving away from OpenAI's and Microsoft's proprietary models. Lately, things haven't been looking great for OpenAI amidst all of these open source models.

According to reports, OpenAI has also been working on an open source model, code-named G3PO, to compete with Meta's success. But there is no clear information on a release date, or on whether the company will go ahead and release it at all.

The one thing Meta is still struggling with is that Llama 2 requires a significant amount of compute power, and the same would be true of Code Llama. Several solutions are emerging for this. One example is Petals, which leverages a decentralised pipeline, much like the shared compute of crypto mining or torrenting, to increase the inference speed of neural networks.
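For a sense of how that works in practice, the sketch below follows Petals' documented client pattern: the model's layers are served by volunteers across the public swarm, while only tokenization and sampling run locally. Treat it as an illustrative sketch of the library's published usage, not a benchmarked setup.

# Rough sketch of decentralised inference following Petals' documented usage
# (pip install petals): model layers run on volunteer servers in the public
# swarm, while tokenization and sampling happen locally.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A decentralised network is", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))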

Nonetheless, Meta's Code Llama is going to add to the pressure OpenAI has already been under lately. It might even push the company to release an open source model of its own, so as not to lose GPT and Codex customers over privacy concerns.


Google Brain Researchers Launch AI Startup in Japan


A former Google Brain researcher, Llion Jones, renowned for co-authoring the groundbreaking paper "Attention Is All You Need", has joined forces with ex-colleague David Ha to establish Sakana AI, an AI startup headquartered in Tokyo.

Jones, who hails from Wales, recently concluded his tenure at the US tech behemoth and joins the venture as CTO. Ha, serving as Sakana AI's CEO, formerly headed Google's AI research division in Japan and most recently led research at Stability AI, a prominent player in image AI technology.

"Personal Announcement! I'm launching @SakanaAILabs together with my friend, Llion Jones (@YesThisIsLion). https://t.co/nMIl8fzE73 is a new R&D-focused company based in Tokyo, Japan. We're on a quest to create a new kind of foundation model based on nature-inspired intelligence!"

— hardmaru (@hardmaru), August 17, 2023

Sakana AI's pivotal undertaking is the development of its own generative AI model, capable of producing diverse forms of content such as text, images, code, and multimedia. This strategic move places Sakana AI in direct competition with titans of the AI realm, including industry giants like Google, Microsoft, and OpenAI, as well as burgeoning startups like Cohere, Character.ai, and Anthropic.

The name “Sakana,” derived from the Japanese term さかな (sa-ka-na) signifying “fish,” is emblematic of the co-founders’ vision. They envision their enterprise as a collective entity, akin to a school of fish coalescing and functioning in harmony through elementary rules, as explained by Ha on X.

Jones and Ha’s inspiration stems from natural concepts like evolution and collective intelligence, a motif that they seek to embody in their research.

Notably, Jones was one of the eight Google researchers who created the Transformer architecture in 2017, which underpins almost every major generative AI development: ChatGPT and Bard for text generation, and Stability AI's Stable Diffusion, Midjourney, and DALL-E for image synthesis. All eight authors of the paper have now left Google.

In Jones and Ha’s assessment, existing AI models exhibit constraints by virtue of their rigid, unyielding structures resembling buildings or bridges. This stands in stark contrast to natural systems, which showcase adaptability and sensitivity to external shifts. The co-founders aspire to harness the principles of evolutionary computing to develop AI models that embody these adaptive traits, concurrently addressing concerns pertaining to cost and security.
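Sakana AI has not published any methods yet, but the core idea of evolutionary computation is easy to illustrate: keep a population of candidate solutions, select the fittest, and mutate them, with no gradients involved. The toy loop below was written for this article and is not Sakana's code.

# Toy evolutionary loop (illustrative only, not Sakana AI's method): evolve
# a population of vectors toward a target using selection and mutation,
# with no gradient information at all.
import random

TARGET = [0.1, 0.5, -0.3, 0.9]

def fitness(candidate):
    # Negative squared error: larger is better.
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

population = [[random.uniform(-1, 1) for _ in TARGET] for _ in range(20)]

for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]  # selection: keep the five fittest
    population = [
        [gene + random.gauss(0, 0.05) for gene in random.choice(parents)]
        for _ in range(20)  # variation: mutated offspring
    ]

print("Best candidate:", max(population, key=fitness))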

Having lived in Japan for an extended period, the co-founders strategically chose Tokyo as the epicentre of their operations. Interestingly, Japan has been quite liberal in setting its copyright laws.

Furthermore, Jones and Ha anticipate Tokyo’s role as a pivotal hub for training data and tailored model infrastructure to cater to non-western cultures and societies. They perceive this confluence as a catalyst that will propel the next wave of technological breakthroughs.

Interestingly, OpenAI is also planning to expand to Japan soon.


After Reporting $380 Million Loss, Databricks Explores Fundraising Talks

Databricks is currently engaging in preliminary talks with potential investors for a fresh injection of capital, with the potential amount expected to reach several hundred million dollars, The Information reported on Wednesday, citing two people familiar with the matter.

Investment discussions are underway as Databricks progresses toward achieving a point of financial stability. The company has made significant strides, moving closer to a break-even position, the report said.

Over the past two fiscal years, Databricks incurred losses totaling approximately $900 million, excluding depreciation and amortization. This encompasses a notable operating loss of $380 million recorded in its latest fiscal year, which concluded in January, the report added.

Databricks is a closely watched tech company working hard to become a leader in the artificial intelligence (AI) boom. For a while now, it has been selling software tools that make it faster and easier to build programs that can learn and make decisions.

In June 2023, Databricks announced a definitive agreement to acquire MosaicML, a generative AI platform, in a transaction valued at approximately $1.3 billion. The acquisition aims to make generative AI accessible to organisations, allowing them to “build, own and secure generative AI models with their own data.”

Databricks has positioned itself strongly in the market through several strategic moves. The introduction of LakehouseIQ, the acquisition of MosaicML, and the development of Unity Catalog have placed Databricks in a favourable position to maintain its market position and compete for incremental market share.

The recent fundraising indicates that the company is not currently aiming to become publicly traded, which was the expectation earlier this year among financial experts and those monitoring initial public offerings. CEO Ali Ghodsi mentioned to Bloomberg in June that the company does have plans to eventually become a publicly traded company, but he emphasized that they don’t want to rush into it right away. This is because a quick move to the public market could potentially hinder their ongoing AI initiatives.


Intel-Tower Split: A Blow to Karnataka’s Chip Ambitions?

The Karnataka Government’s aspirations to establish a fabrication facility within the state might have encountered a substantial setback. Last year, the International Semiconductor Consortium (ISMC), a collaborative effort between UAE’s Next Orbit Ventures and Israel’s Tower Semiconductor, had ambitious plans to establish a fabrication unit in Karnataka, India. Cementing this vision, ISMC inked a Memorandum of Understanding (MoU) with the Karnataka government, outlining their intent to build a $3 billion fabrication plant. The consortium also intended to secure 150 acres of land within the Kochanahalli industrial area for this purpose.

However, ISMC's plan to set up a fab hinged on Intel's acquisition of Tower Semiconductor, and Intel recently terminated that deal. According to the Santa Clara-based tech giant, the deal was terminated because it failed to obtain the necessary regulatory approvals on time. The agreement needed endorsements from global regulators, including China, but despite Intel CEO Patrick Gelsinger's recent trip to China to secure approval, Chinese regulators did not grant clearance by the August 15 deadline.

Initially, ISMC, IGSS Ventures, and the Vedanta-Foxconn JV were the three applicants seeking government incentives to establish fabrication units in the nation. However, the Vedanta-Foxconn joint venture is no longer operational, and in the wake of the Intel-Tower deal’s collapse, it appears highly probable that the Israeli semiconductor firm will reevaluate its strategic direction concerning endeavors in India.

Setback for Karnataka?

The MoU the Karnataka government signed with ISMC, in which Tower was a technology partner, is most likely off, according to Arun Mampazhy, an independent semiconductor analyst. To get some concrete answers, AIM wrote an email to Tower Semiconductor, which is yet to respond.

The consortium’s initial blueprint outlined the construction of a 65-nanometer analog semiconductor fabrication facility. This endeavor aimed to generate 1,500 direct job opportunities and more than 10,000 indirect employment prospects within seven years.

(Signing of the Memorandum of Understanding between ISMC and the Karnataka Government)

The ISMC plant would have made Karnataka a state to watch in the semiconductor space, according to Basavaraj Bommai, chief minister of Karnataka at the time. But now that the deal with Intel is off, Tower Semiconductor is no longer bound to Intel, or even to ISMC, Mampazhy said. Moreover, even if Tower proceeds with its intention to establish a fabrication unit in India through an alternative joint venture, the likelihood of it selecting Karnataka again seems minimal.

A blessing in disguise for others?

While the Intel-Tower deal termination could be a setback for Karnataka, it could be a blessing in disguise for Indian players looking to enter the semiconductor space. Mampazhy believes Intel’s inability to acquire Tower Semiconductor presents an opportunity for the likes of Vedanta and Tata Electronics.

After the JV with Foxconn fell through, Vedanta has been searching for a new technology partner. At Semicon 2023, Vedanta Chairman Anil Agarwal hinted that the company is in discussions with a ‘world-class’ technology partner, and partnering with the likes of Tower Semiconductor could prove pivotal for Vedanta. “If the partnership with STMicroelectronics is turning out to be too tough, Vedanta could look to partner with Tower Semiconductor,” Mampazhy told AIM.

Similarly, last year, in an interview with Nikkei Asia, Tata Sons Chairman Natarajan Chandrasekaran revealed that the conglomerate had set up Tata Electronics to tap into India's growing semiconductor space. While Tata's initial plan is to set up an Outsourced Semiconductor Assembly and Test (OSAT) unit, it would eventually like to move into fabrication. For Tata too, Tower could emerge as a potential technology partner.

Moreover, another potential scenario involves an Indian business entity acquiring Tower Semiconductor outright. “With Intel's prior offer of $5.4 billion and the likelihood of share prices declining due to the terminated deal, the opportunity might arise at a slightly reduced price. This presents the prospect for an Indian business to contemplate a complete acquisition,” Mampazhy said. However, whether Chinese regulators would approve such a deal remains an entirely separate question.

Much of it depends on Tower Semiconductor

Nonetheless, much hinges on Tower Semiconductor's choice: whether to sustain its existing joint venture with Next Orbit Ventures, initiate a new collaboration with an Indian partner, or disengage completely. Another viable option for Tower is to act as a technology provider instead of a technology partner.

“The concept of technology providers involves a technology transfer agreement where ongoing support is provided until the new fabrication facility integrates the technology successfully. A licensing fee is levied for this service. After the facility becomes operational and the technology functions effectively, Tower Semiconductor disengages. However, the Indian government seems less inclined towards this approach,” Mampazhy said.

What the Indian government wants is for the technology provider to be a technology partner as well, meaning Tower would have to be an equity partner in a JV if it were to participate in establishing a fab in India. Becoming a technology partner won't require a significant investment from Tower, given that 50% of the cost will be borne by the central government and state governments would chip in another 20-25%. Hence, a scenario where Tower becomes a technology partner to one of the Indian business houses is the most welcome from an Indian perspective.


McKinsey Launches Lilli, a Gen AI Tool for Employees

Management consulting giant McKinsey has unveiled “Lilli,” a generative AI tool designed for internal use by its employees.

Lilli functions as a consolidated platform that efficiently searches and combines the company's extensive knowledge to provide valuable insights, allowing McKinsey's teams to dedicate more time to working directly with clients, applying those insights, and aiding in problem-solving and skill-building.


How Does Lilli Work?

Designed by the company's “ClienTech” team under chief technology officer Jacky Wright, Lilli lets users input questions, then draws on a vast database to identify the five to seven most relevant pieces of information, providing concise summaries with links and identifying experts in related fields. There are two modes within the platform: one for searching McKinsey's internal knowledge and another for external sources.
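McKinsey has not published Lilli's internals, but the behaviour described, retrieving a handful of relevant documents and returning summaries, links, and experts, matches the standard retrieve-then-summarize pattern. The toy sketch below uses an invented corpus and keyword scoring to make the shape of such a pipeline concrete; it is not McKinsey's code.

# Toy retrieve-then-summarize pipeline mirroring the behaviour described
# above; the corpus and keyword scoring are invented stand-ins, not
# McKinsey's implementation.
CORPUS = [
    {"title": "Retail pricing study", "author": "A. Expert", "url": "kb://123"},
    {"title": "Supply chain survey", "author": "B. Expert", "url": "kb://456"},
]

def score(question, doc):
    # Toy relevance: count words shared between the question and the title.
    return len(set(question.lower().split()) & set(doc["title"].lower().split()))

def answer(question, corpus=CORPUS, k=7):
    ranked = sorted(corpus, key=lambda d: score(question, d), reverse=True)[:k]
    return {
        "sources": [d["url"] for d in ranked],             # links to documents
        "experts": sorted({d["author"] for d in ranked}),  # people to consult
        # A production system would also summarize `ranked` with an LLM here.
    }

print(answer("What does our retail pricing research say?"))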

The platform is named after Lillian Dombrowski, the first woman employed by McKinsey in 1945. Lillian’s remarkable contributions, including shaping sectors like transportation and insurance, establishing financial plans, and initiating the firm’s archives, inspired the platform’s name. “Lilli” represents the platform’s agility, adaptability, and thoroughness, much like Lillian herself.

According to a VentureBeat report, around 7,000 employees have been using Lilli as a “minimum viable product” (MVP), which has significantly reduced research and planning time from weeks to hours, and in some instances, from hours to minutes.

Consulting Sector Embraces Gen AI

In the past year, the consulting and finance sector has transitioned from scepticism to excitement about generative AI. KPMG has developed an in-house system using a ChatGPT-like framework to assist its staff with proprietary data, and has committed $2 billion over five years to an expanded alliance with Microsoft for advanced technologies like AI.

PwC plans to invest $1 billion over three years to advance generative AI in its US operations. Working with Microsoft and OpenAI, PwC aims to automate aspects of tax, audit, and consulting tasks. Multiple teams are currently developing various AI and generative AI applications to boost efficiency, cut costs, save time, and gain new insights.

EY is using generative AI, incorporating tax laws into an AI system to provide instant answers through a ChatGPT-like interface for tasks like payroll queries. This experiment has resulted in notable improvements in efficiency and accuracy. Deloitte has also introduced a generative AI practice to serve its clients.



31% of Organizations Using Generative AI Ask It To Write Code


Forty percent of data analysis leaders currently use generative artificial intelligence in their work, including to write code, analytics platform company Alteryx found in a report released August 15. Alteryx surveyed 300 data leaders across four countries — Australia, the U.K., the U.S. and Canada — about their use of generative artificial intelligence, qualms around its use and more.

Jump to:

  • A little over half of businesses have experimented with AI
  • CEOs often drive AI adoption
  • Tech leaders have qualms about generative AI
  • Organizations are still discovering how generative AI may benefit them

A little over half of businesses have experimented with AI

Surveyed companies using generative AI employed it in content generation (46%), analytics insights summary (43%), analytics insights generation (32%), code development (31%) and process documentation (27%).

Most companies surveyed are curious about AI but don’t use it as part of their everyday process. The majority, 53%, said they are “exploring” or “experimenting” with the technology. Only 13% have AI models in place already and are working on optimizing them. In the middle sits the 34% who are “formalizing,” moving from pilot programs to production on a generative AI solution.


Of those who do use generative AI in any capacity, most found a positive impact: 55% reported modest benefits, and 34% reported substantial benefits. The benefits they found included increased market competitiveness (52%), improved security (49%) and enhanced performance or functionality of their products (45%). Another 10% found they didn’t benefit at all, and 1% found it too early to say.

CEOs often drive AI adoption

Often, it takes only one business leader to adopt generative AI as their pet project and encourage the rest of the company to adopt it. In 98% of cases, organizations report that a single person in a leadership position drove their generative AI strategy. In most cases, that leader was the CEO (30%), with slightly fewer organizations following the directives of a head of IT (25%) or chief data or analytics officer (22%). Conversely, among companies not using generative AI, 35% said they had “no one to take the lead with implementation.”

Interestingly, there is an element of hobbyist enthusiasm to the business adoption of generative AI. According to the survey, 81% of people who use generative AI at work also use it for personal or recreational purposes outside of work.

Tech leaders have qualms about generative AI

Many companies still have concerns about the security, copyright rules or efficacy of generative AI. Organizations that haven’t implemented generative AI said they didn’t do so because of concerns about data privacy (47%), lack of trust in the results produced by the system (43%), lack of sufficient expertise (39%) and not having anyone on staff to take the lead on implementing generative AI (34%).

Of the organizations already using generative AI in their work, the most pressing concerns were data ownership (29%), data privacy (28%) and IP ownership (28%).

One way to solve some of these concerns is human oversight — 64% said they believe generative AI can be used now as long as a human has veto power over the output. And there is a high degree of trust among workers who already use generative AI; 70% think it can “deliver initial, rapid results that I can review and modify to completion.”


Others — 71% — agreed with the idea that risks around generative AI can be managed by using the technology within frameworks set up by trusted software vendors.

Whether generative AI will replace human workers is a complicated question: 77% of surveyed people who already use generative AI believe it could replace entire roles.

Other risks include privacy concerns, novel security vulnerabilities and copyright infringement when AI models are trained on original work. One possible solution is working within fair use principles, Asa Whillock, vice president and general manager of machine learning at Alteryx, pointed out in an email to TechRepublic: “Leaders must understand, however, that the trust of AI and LLMs is reliant on the quality of data inputs. Insights that are generated by AI models are only as good as the data they have access to,” Whillock said.

Organizations are still discovering how generative AI may benefit them

“Though the pulse survey indicates that many companies are still in the nascent stages of adoption, there’s a growing awareness of the benefits, and early adopters are already reaping the rewards,” wrote Heather Ferguson, Alteryx editorial manager, in a blog post.

“If implemented strategically, generative AI provides a massive opportunity for data democratization that will positively impact business operations, decisions and outcomes due to the cases for integrating LLMs (large language models) responsibly with low-code/no-code,” said Whillock.



Learn how to create a ChatGPT AI bot with this course bundle


Improve your ChatGPT skills with this training bundle.

You may have experimented with ChatGPT, and found it to be a useful tool on its own for work, school, or personal projects. But if you take the opportunity to study even basic coding, you can expand what you can do with AI simply by customizing your own chatbot.

Start your coding education during this back-to-school sale and get the 2023 Ultimate AI ChatGPT and Python Programming Bundle for only $40.

Code your own AI chatbot with Python

This AI programming bundle primarily focuses on cultivating your skills with Python. Courses start at the very beginning and may be most useful to learners with little coding experience. If Python is a new language to you, start with Python 3: From ZERO to GUI Programming. This course gives you nine hours of programming tips to help you work through the other eight Python courses. Those introduce new skills like PDF handling and data analysis. One even gives you the chance to program an escape room.

Courses are taught by Dr. Chris Mall, who has a master's degree in IT and a Ph.D. in Computer Science. He also teaches a course that introduces you to the theory behind AI and shows you how to program a robot. From there, you can apply your skills in two courses that walk you through the process of crafting a ChatGPT AI bot. Even if you aren't a coding expert, these courses will show you how to use ChatGPT to generate new code for you.

86 hours of Python and AI training

Back-to-school sales don't just mean going back to a classroom. Take control of your own education and learn to code and build your own AI chatbot.

Get the 2023 Ultimate AI ChatGPT and Python Programming Bundle for $40 right now. No coupon needed.


Moemate’s AI avatar analyzes your whole screen, with spotty but intriguing results

As evidenced by the slow death of Cortana, it’s clear that the AI assistants of yesteryear aren’t meeting expectations. And so they’re being remade.

Amazon is building a new large language model akin to OpenAI’s GPT-4 to power its Alexa voice assistant. Meanwhile, Google is reportedly planning to “supercharge” Google Assistant with AI that’s more like Bard, its algorithm-powered chatbot.

The paradigm shift hasn’t been limited to the realm of Big Tech. Startups, too, are beginning to realize their own versions of more helpful, useful AI assistants.

One of the more intriguing ones I’ve stumbled upon is Moemate, an assistant that runs on most any macOS, Windows and Linux machine. Taking the form of an anime-style avatar, Moemate — powered by a combo of models including GPT-4 and Anthropic’s Claude — aims to supply and vocalize the best answer to any question a user asks of it. (“Moe” is a Japanese word relating to cuteness, often in anime.)

That’s not especially novel; ChatGPT does this already, as do Bard, Bing Chat and the countless other chatbots out there. But what sets Moemate apart is its ability to go beyond text prompts and look directly at what’s happening on a PC’s screen.

Sound like a privacy risk? You betcha. Webaverse, the company behind Moemate, claims it stores much of the assistant’s chat logs and preferences locally, on-device. But its privacy policy also reveals that it reserves the right to use the data it does collect, like PC specs and unique identifiers, in compliance with legal requests and investigating suspected illegal activities. Fundamentally, giving software like this access to everything you see and do is, even in the best-case scenario, a considerable risk.

Nevertheless, curiosity spurred me to forge ahead and install Moemate, which is currently in open beta, on my work-supplied Mac notebook.

For a free (for now), early access product, Moemate is impressively robust. Almost every aspect of the experience can be customized, from the avatars and their animations to Moemate’s synthetic voices and responses. There’s even a way to build custom character models and import them, plus export avatars in a format that other Moemate users can then import and use.

Moemate’s “personality,” for lack of a better word, is driven by one of several text-generating models — users select which (e.g. GPT-4 versus Claude). As for the synthetic voices, Moemate offers the choice of ElevenLabs, Microsoft Azure or Moemate’s own text-to-speech engine. I opted for ElevenLabs’, which sounded the least robotic to me.


To “ground” the chosen text-generating model and attempt to prevent it from going off the rails (as some AI models are wont to do), Moemate gives each avatar a bio, which it feeds to the model at the very start of the conversation. Here’s one:

You will be acting as Nebula, a serene voyager personality, always traversing the vast cosmos of knowledge. Their calm demeanor and explorer’s spirit captivate all who meet them. Nebula sidesteps intense political debates, preferring the serenity of stargazing and the mysteries of the universe. Their fascination captivates those around them, making every encounter tranquil and intriguing.
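Feeding a bio to the model at the start of the conversation is the familiar system-prompt pattern. A minimal sketch using the OpenAI chat API current at the time of writing could look like the following; the bio is the one quoted above (truncated), while the surrounding code is illustrative and not Webaverse's.

# Minimal sketch of persona grounding via a system prompt, using the
# openai<1.0 chat completions API; illustrative, not Webaverse's code.
# Assumes OPENAI_API_KEY is set in the environment.
import openai

PERSONA_BIO = (
    "You will be acting as Nebula, a serene voyager personality, always "
    "traversing the vast cosmos of knowledge..."  # truncated from the bio above
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": PERSONA_BIO},  # persona goes in first
        {"role": "user", "content": "What do you think about politics?"},
    ],
)
print(response["choices"][0]["message"]["content"])  # should stay in character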

Bios can be written from scratch and edited — a plus and a minus in my mind. I’m all for customizability, but I worry about the potential for prompt injection attacks, which try to bypass a model’s safety features, like filters for toxic replies, with cleverly worded text. One imagines someone writing a “malicious” bio, exporting it and sharing the ill-behaving avatar with unsuspecting Moemate users.

In a nod to one of the intended demographics, Moemate offers an array of Twitch-focused features — none of which I was able to test, unfortunately. It can bring your chat window into focus and show the number of subscribers to your channel. And Webaverse advertises Moemate as being able to “talk and keep users engaged” if there aren’t any chat messages or “tackle stream chat by replying to chat messages,” although I question just how well it can handle those tasks.

Stick to asking Moemate basic questions, and the experience won’t blow you away. In terms of its top-level capabilities, Moemate is beholden to whichever text-generating model you’ve selected. (Tellingly, Claude often identifies itself as Claude in addition to the name mentioned in the avatar bio.) It can generate images using the open source Stable Diffusion model, either when instructed or on its own, depending on the prompt. But with the abundance of image-generating services on the market, that feels like old hat.


Screen capture is a game-changer, however. Webaverse explains it thusly:

Moemate can see your screen. It analyzes it and gets the context. You can ask it about whatever you’re doing on your screen. It saves you the trouble of having to explain whatever you need help with.

No matter the text-generating model selected, Moemate can answer questions about whichever windows on the screen are in focus — whether a browser tab, settings window or video game. It’s unclear exactly how the app’s accomplishing this — not every model can accept images as input — but Moemate appears to be extracting the text from each screen capture and feeding that to the model.
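If that guess is right, the loop would look something like the sketch below: grab the screen, pull the visible text out with OCR, and prepend it to the user's question as context. This is my reconstruction under that assumption, not Webaverse's code.

# Speculative reconstruction of a screen-aware prompt (not Webaverse's code):
# capture the screen, OCR the visible text, and hand it to the model as
# context. Requires Pillow and pytesseract plus the tesseract binary.
from PIL import ImageGrab
import pytesseract

def screen_context():
    screenshot = ImageGrab.grab()  # capture the full screen
    return pytesseract.image_to_string(screenshot)  # extract visible text

def build_prompt(question):
    return (
        "The user's screen currently shows:\n"
        + screen_context()
        + "\n\nQuestion: " + question
    )

print(build_prompt("Summarize the recipe I have open."))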

It’s an imperfect system. But I’ve successfully used Moemate to summarize recipes and webpages without having to copy and paste the text, as well as get the gist — or at least a high-level summary — of a complicated topic.

Once, with Claude selected as the text-generating model, I asked Moemate a question about the macOS System Settings dashboard, which happened to be open on my laptop. It gave me a detailed rundown of each settings tab (e.g. Wi-Fi, Control Center) and their significance, plus additional context about the tab I had open at that moment (Privacy & Security).

New information? Not exactly. But to someone who, for example, doesn’t know their way around macOS or isn’t incredibly familiar with the ins and outs of newer config options, I’d argue it’s genuinely actionable background.

In another instance, with GPT-4 as the base model, I asked Moemate to tell me what it “saw” on my supremely messy desktop — a disorganized array of work and personal apps across two dozen Chrome tabs. The avatar fixated on the Google Messages web app, which I use to text — informing me that I seem to frequently text three specific people, all of whom it referred to by name.

And for gaming, Moemate seems like it could save a Google Search or two. In a demo video posted by Webaverse, the app’s shown giving suggestions for which Dota 2 character to choose — and then choosing which weapons to select for that character.

But as insightful as Moemate can be, it often breaks down.

Exactly where the app decides to focus its attention can be difficult to predict. Clicking a window into focus doesn’t always have the intended effect; Moemate will inexplicably refer to another window in the background sometimes, or fail to see a window’s contents altogether.

Moemate also tends to veer off topic in bizarre ways. After giving me the rundown of System Settings, the assistant strongly implied that privacy was too “stressful” of a topic and suggested that I get some fresh air, instead — accompanied by it. When I asked how it might join me without a physical body, Moemate promised to take me on a “mental nature walk,” and proceeded to describe in great detail a stroll by an imaginary forested pond.

Some of Moemate’s built-in commands are wonky also. The app can adjust the volume of voices, for example, but only its volume — not the system-wide volume. It can search the web for up-to-date answers to questions, too, but frustratingly not for every question. I only got web searching to work for the weather and trivia like “Who’s the current president of the U.S.?”; other times, Moemate performed a web search but failed to actually show the results.

To be fair, it’s an experimental product in beta. But Webaverse says it’s already working on adding automation capabilities via browser and terminal integrations, like the ability to organize spreadsheets and even send emails — a mildly terrifying prospect, frankly.

Despite its brokenness, there’s something compelling about Moemate. Multimodality, or combining text, image and other media analysis, is clearly powerful stuff, particularly in the context of an assistant running on a PC. I’m curious to see whether next-gen assistants, like the Windows Copilot, will follow in Moemate’s footsteps eventually, combining screen understanding with a text-generating model to supercharge productivity — or at least save a few steps in a workflow.

Time will tell. But Moemate feels like a glimpse — albeit a quite buggy one — into the future.

Text-2-Video Generation: Step-by-Step Guide

(Gif by Author)

Introduction

Diffusion-based image generation models represent a revolutionary breakthrough in the field of Computer Vision. Pioneered by models including Imagen, DALL-E, and Midjourney, these advancements demonstrate remarkable capabilities in text-conditioned image generation. For an introduction to the inner workings of these models, you can read this article.

However, the development of Text-2-Video models poses a more formidable challenge. The goal is to achieve coherence and consistency across each generated frame and maintain generation context from the video's inception to its conclusion.

Yet, recent advancements in Diffusion-based models offer promising prospects for Text-2-Video tasks as well. Most Text-2-Video models now employ fine-tuning techniques on pre-trained Text-2-Image models, integrating dynamic image motion modules, and leveraging diverse Text-2-Video datasets like WebVid or HowTo100M.

In this article, our approach involves utilizing a fine-tuned model provided by HuggingFace, which proves instrumental in generating the videos.

Implementation

Pre-requisites

We use the Diffusers library provided by HuggingFace, along with a utility library called Accelerate, which helps PyTorch code run efficiently across the available hardware. This speeds up our generation process.

First, we must install our dependencies and import relevant modules for our code.

pip install diffusers transformers accelerate torch

Then, import the relevant modules from each library.

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

Creating Pipelines

We load the Text-2-Video model provided by ModelScope on HuggingFace into the Diffusion Pipeline. The model has 1.7 billion parameters and is based on a UNet3D architecture that generates a video from pure noise through an iterative de-noising process. It works in a three-part process: the model first performs text-feature extraction from the plain English prompt; the text features are then encoded into the video latent space and de-noised; lastly, the video latent space is decoded back into the visual space, producing a short video.

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

Moreover, we use 16-bit floating-point precision to reduce GPU memory usage. In addition, CPU offloading is enabled, which moves model components that are not currently in use off the GPU at runtime.

Generating Video

prompt = "Spiderman is surfing"  video_frames = pipe(prompt, num_inference_steps=25).frames  video_path = export_to_video(video_frames)

We then pass a prompt to the Video Generation pipeline that provides a sequence of generated frames. We use 25 inference steps so that the model will perform 25 de-noising iterations. A higher number of inference steps can improve video quality but requires higher computational resources and time.

The separate image frames are then combined using a diffuser's utility function, and a video is saved on the disk.
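The same pipeline call exposes a couple of other knobs worth experimenting with: num_frames controls the clip length (the default is 16 frames), and a seeded generator makes runs reproducible. Both are standard Diffusers parameters, though longer clips cost proportionally more memory and time.

# Optional knobs on the same call: more frames for a longer clip, and a
# seeded generator for reproducible output.
generator = torch.Generator("cuda").manual_seed(42)
video_frames = pipe(
    prompt,
    num_inference_steps=25,
    num_frames=32,  # default is 16; more frames means a longer clip
    generator=generator,
).frames
video_path = export_to_video(video_frames)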


FinalVideo from Muhammad Arham on Vimeo.

Conclusion

Simple enough! We get a video of Spiderman surfing. Although it is a short, low-quality video, it signals the promising trajectory of this approach, which may soon attain results comparable to Text-2-Image models. Nonetheless, the model is plenty good enough for testing your creativity and playing around. You can use this Colab Notebook to try it out.
Muhammad Arham is a Deep Learning Engineer working in Computer Vision and Natural Language Processing. He has worked on the deployment and optimizations of several generative AI applications that reached the global top charts at Vyro.AI. He is interested in building and optimizing machine learning models for intelligent systems and believes in continual improvement.
