Meta unveils ‘Seamless’ speech-to-speech translator


Meta, owner of Facebook, Instagram, and WhatsApp, on Tuesday unveiled its latest effort in machine translation, this one geared toward speech translation.

The program, SeamlessM4T, surpasses existing models that are trained specifically for speech-to-speech translation between languages, as well as models that convert between speech and text in multiple language pairs. Hence, SeamlessM4T is an example not just of generality but of what is called multi-modality — the ability for one program to operate on multiple data types, in this case, both speech and text data.


Previously, Meta has focused on large language models that can translate text between 200 different languages. That focus on text is a problem, say lead author Loïc Barrault and colleagues at both Meta and the University of California at Berkeley.

"While single, unimodal models such as No Language Left Behind (NLLB) push text-to-text translation (T2TT) coverage to more than 200 languages, unified S2ST [speech-to-speech translation] models are far from achieving similar scope or performance," write Barrault and team.

The formal paper, "SeamlessM4T — Massively Multilingual & Multimodal Machine Translation," is posted on Meta's dedicated site for the overall project, Seamless Communication. There is also a companion GitHub site.

Speech has been left behind partly because less speech data is readily available in the public domain to train neural networks, write the authors. But there's a deeper point: Speech data is fundamentally richer as a signal for neural networks.

"The very challenge around why speech is harder to tackle from a machine translation standpoint — that it encodes more information and expressive components — is also why it is superior at conveying intent and forging stronger social bonds between interlocutors," they write.

The goal of SeamlessM4T is to create one program that is trained on both speech data and text data at the same time. The "M4T" stands for "Massively Multilingual & Multimodal Machine Translation." Multi-modality is an explicit part of the program.


Such a program is sometimes referred to as an "end-to-end" program because it doesn't break up the parts that are about text and the parts that are about speech into separate functions, as in the case of "cascaded models," where the program first is trained on one thing, such as speech to text, and then another thing, such as speech to speech.

As the program's authors put it, "most S2ST [speech-to-speech translation] systems today rely heavily on cascaded systems composed of multiple subsystems that perform translation progressively — e.g., from automatic speech recognition (ASR) to T2TT [text-to-text translation], and subsequently text-to-speech (TTS) synthesis in a 3-stage system."
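The 3-stage cascade the authors describe can be pictured as plain function composition. The following is a minimal sketch only: the functions `asr`, `t2tt`, `tts`, and `cascade` are hypothetical toy stand-ins invented for illustration, not Meta's code or any real model API.

```python
from typing import Callable

# Hypothetical toy stages. A real system would call trained models here.
def asr(audio: bytes) -> str:
    """Stage 1: automatic speech recognition (speech -> source text)."""
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript

def t2tt(text: str) -> str:
    """Stage 2: text-to-text translation (source text -> target text)."""
    toy_dictionary = {"hello": "bonjour", "world": "monde"}
    return " ".join(toy_dictionary.get(word, word) for word in text.split())

def tts(text: str) -> bytes:
    """Stage 3: text-to-speech synthesis (target text -> speech)."""
    return text.encode("utf-8")  # pretend these bytes are synthesized audio

def cascade(*stages: Callable) -> Callable:
    """Chain stages so that each one consumes the previous stage's output."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run

# Speech-to-speech translation as three separate subsystems run in sequence.
s2st = cascade(asr, t2tt, tts)
result = s2st(b"hello world")
```

Because each stage sees only the previous stage's output, a recognition error in the first stage is passed along to translation and synthesis, which is one commonly cited drawback of cascaded designs.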

Instead, the authors built a program that combines multiple existing parts trained together. They included "SeamlessM4T-NLLB a massively multilingual T2TT model," plus a program called w2v-BERT 2.0, "a speech representation learning model that leverages unlabeled speech audio data," plus T2U, "a text-to-unit sequence-to-sequence model," and multilingual HiFi-GAN, a "unit vocoder for synthesizing speech from units."


All four components are plugged together like a Lego set into a single program, also introduced this year by Meta, called UnitY, which can be described as "a two-pass modeling framework that first generates text and subsequently predicts discrete acoustic units."

The whole organization is visible in the diagram below.

The authors built a program that combines multiple existing parts trained together, all of which are plugged together like a Lego set in a single program.

The program manages to do better than multiple other kinds of programs on tests of speech recognition, speech translation, and speech-to-text, the authors report. That includes beating both programs that are also end-to-end and programs designed explicitly for speech:

We find that SeamlessM4T-Large, the larger model of the two we release, outperforms the previous state-of-the-art (SOTA) end-to-end S2TT model (AudioPaLM-2-8B-AST [Rubenstein et al., 2023]) by 4.2 BLEU points on Fleurs [Conneau et al., 2022] when translating into English (i.e., an improvement of 20%). Compared to cascaded models, SeamlessM4T-Large improves translation accuracy by over 2 BLEU points. When translating from English, SeamlessM4T-Large improves on the previous SOTA (XLS-R-2B-S2T [Babu et al., 2022]) by 2.8 BLEU points on CoVoST 2 [Wang et al., 2021c], and its performance is on par with cascaded systems on Fleurs. On the S2ST task, SeamlessM4T-Large outperforms strong 3-stage cascaded models (ASR, T2TT and TTS) by 2.6 ASR-BLEU points on Fleurs. On CVSS, SeamlessM4T-Large outperforms a 2-stage cascaded model (Whisper-Large-v2 + YourTTS [Casanova et al., 2022]) by a large margin of 8.5 ASR-BLEU points (a 50% improvement). Preliminary human evaluations of S2TT outputs evinced similarly impressive results. For translations from English, XSTS scores for 24 evaluated languages are consistently above 4 (out of 5); for into English directions, we see significant improvement over Whisper-Large-v2's baseline for 7 out of 24 languages.


The companion GitHub site offers not just the program code but also SONAR, a new technology for "embedding" multi-modal data, and BLASER 2.0, a new version of a metric by which to automatically evaluate multi-modal tasks.


Portkey.ai Secures $3M funding for Faster Deployment of GenAI Apps

Bengaluru-based startup Portkey.ai today announced a $3 million seed-funding round empowering engineering teams to build and launch generative AI apps faster. The funding round was led by Lightspeed with participation from angel investors, including prominent figures from AWS, OpenAI, Cloudflare, Postman, and Asana.

Portkey.ai has built tools that allow businesses to monitor their language model operations (LLMOps), connect to multiple large language models (LLMs) efficiently, experiment, improve, and manage prompts effectively. It also offers deep integrations with players like OpenAI, Anthropic, LangChain, LlamaIndex and more. They already serve millions of requests a day for innovative GenAI companies like Postman, Jio Haptik, Springworks and more through their full-stack LLMOps solution.

“Tech chiefs are facing a rush of demand from teams for AI apps that will save money without too much delay. But they cannot say yes to all their requests. There’s so much work to be done that there are often competing priorities and Portkey wants to help solve these dilemmas for tech teams.

“Our vision for Portkey has been to enable teams and companies to deploy GenAI apps and features with confidence,” Rohit Agarwal, co-founder, Portkey.ai, said.

An enterprise building GenAI features could use Azure & Llama2 with intelligent routing, save over 25% in budget spent with smart caching, and monitor all their requests for accuracy & latency. Building this platform internally would take months of work and various iterations, and would require significant experience with LLMs, while getting all this ready with Portkey takes 2 minutes.
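To make the routing, caching, and monitoring ideas concrete, here is a minimal sketch of a gateway that tries providers in order, caches answers, and records latency. It is an illustration under stated assumptions, not Portkey's API: the `Gateway` class, `flaky_provider`, and `backup_provider` are hypothetical names, and the "providers" are local functions rather than real LLM endpoints.

```python
import time
from typing import Callable, Dict, List, Tuple

Provider = Callable[[str], str]  # hypothetical: takes a prompt, returns text

class Gateway:
    """Toy LLM gateway: exact-match cache, ordered fallback, latency log."""

    def __init__(self, providers: List[Tuple[str, Provider]]):
        self.providers = providers       # tried in the order given
        self.cache: Dict[str, str] = {}  # prompt -> cached answer
        self.log: List[dict] = []        # one entry per served request

    def complete(self, prompt: str) -> str:
        # Caching: a repeated prompt is served without calling any provider.
        if prompt in self.cache:
            self.log.append({"provider": "cache", "latency_s": 0.0})
            return self.cache[prompt]
        # Routing: fall through to the next provider when one fails.
        for name, call in self.providers:
            start = time.perf_counter()
            try:
                answer = call(prompt)
            except Exception:
                continue
            # Monitoring: record which provider answered and how long it took.
            elapsed = time.perf_counter() - start
            self.log.append({"provider": name, "latency_s": elapsed})
            self.cache[prompt] = answer
            return answer
        raise RuntimeError("all providers failed")

def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")

def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"

gateway = Gateway([("primary", flaky_provider), ("backup", backup_provider)])
first = gateway.complete("hi")   # primary fails, backup answers
second = gateway.complete("hi")  # served from the cache
```

A production system would also need cache expiry, semantic rather than exact-match caching, and retry policies, which this sketch leaves out.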

The post Portkey.ai Secures $3M funding for Faster Deployment of GenAI Apps appeared first on Analytics India Magazine.

YouTube tests a search feature where users hum to identify songs

Lauren Forristal

YouTube announced a new experiment on Android devices that determines a song via humming—which seems like a major step up from Apple’s music recognition app Shazam.

As noted on YouTube’s support page, the video-sharing platform is testing a search-by-song capability on the Android version of the app that allows users to figure out a song on YouTube by humming, singing or recording a song.

Users who have access to the experiment can toggle from YouTube voice search to the new song search feature and hum, sing or record a song for three or more seconds. The platform then identifies the tune and directs the user to relevant YouTube videos featuring the searched song, whether that be the official music video, user-generated content or Shorts.

The search-by-song capability is only available to a small portion of Android users. If the feature rolls out more widely, we can see it being helpful for many, as YouTube is a popular destination for looking up songs.


YouTube’s latest experiment probably sounds familiar to some users. In 2020, YouTube’s parent company Google first launched the capability on the Google app, Google Search widget and Google Assistant, letting users figure out a song by humming, whistling or singing into the microphone icon. However, the main difference appears to be that Google’s feature requires users to hum for 10-15 seconds in order to identify the song.

As Google previously explained, its feature is built on machine learning models that can match a person’s hum to a song’s “fingerprint” or signature melody. The new YouTube test uses the same technology as the Google feature, the company confirmed to TechCrunch.
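The idea of matching a hum to a melody "fingerprint" can be illustrated with a much simpler technique than the machine learning models Google describes. The sketch below swaps in a classic approach: reduce each melody to its sequence of pitch intervals, so the key the user hums in does not matter, then pick the catalog entry with the smallest edit distance. The `catalog`, the MIDI note lists, and all function names are hypothetical and chosen for illustration.

```python
from typing import Dict, List

def intervals(pitches: List[int]) -> List[int]:
    """Turn absolute MIDI pitches into semitone steps between adjacent notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(a: List[int], b: List[int]) -> int:
    """Levenshtein distance between two interval sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,            # drop a note from the hum
                current[j - 1] + 1,         # insert a missing note
                previous[j - 1] + (x != y)  # match or substitute
            ))
        previous = current
    return previous[-1]

# Hypothetical two-song catalog, written as MIDI note numbers.
catalog: Dict[str, List[int]] = {
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64],
}

def best_match(hummed: List[int]) -> str:
    """Return the catalog title whose interval fingerprint is closest."""
    query = intervals(hummed)
    return min(catalog, key=lambda title: edit_distance(query, intervals(catalog[title])))

# The same opening phrase hummed two semitones higher than the catalog entry.
hum = [62, 62, 69, 69, 71, 71, 69]
match = best_match(hum)
```

Real hums are noisy in both pitch and timing, which is part of why production systems rely on learned representations rather than exact interval matching.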

Other music recognition apps like SoundHound and MusixMatch can also identify songs from a sung or hummed tune, but they aren’t as popular as YouTube and Google. (Still, we recommend checking them out as well.)


VMware Unveils New Generative AI Tools, Expands Nvidia Partnership

August 23, 2023 by Jaime Hampton

VMware kicked off its Explore event in Las Vegas with a series of announcements geared toward enabling enterprise generative AI development.

VMware and Nvidia extended their partnership to unveil VMware Private AI Foundation with Nvidia, an offering that promises to provide enterprises with the software and compute to fine-tune large language models and run AI-enabled applications using proprietary data in VMware’s cloud infrastructure.

Building applications using public AI models is a no-go for many enterprises due to the risks of data exposure and unknown training data. In fact, a new survey released by AI engineering platform Predibase found that more than 75% of enterprises do not plan to use commercial LLMs in production due to data privacy concerns.

The answer lies in custom models trained with company data using a secure architecture. Companies need flexibility when developing applications using their own training data, and VMware touts its multi-cloud approach as a secure and resilient option for building customized AI models.

VMware Private AI Foundation with Nvidia is a set of integrated AI tools that allow enterprises to deploy AI models trained on private data in datacenters, public clouds, or the edge. VMware’s Private AI architecture is built on VMware’s Cloud Foundation and is integrated with Nvidia’s AI Enterprise software and compute infrastructure.

VMware CEO Raghu Raghuram says the potential of generative AI cannot be unlocked unless enterprises are able to maintain the privacy of their data and minimize IP risk while training, customizing, and serving their AI models: “With VMware Private AI, we are empowering our customers to tap into their trusted data so they can build and run AI models quickly and more securely in their multi-cloud environment.”

Nvidia CEO Jensen Huang and VMware CEO Raghu Raghuram announced the expanded partnership at VMware Explore.

Enterprises can choose where to build and run their models using a data-secure architecture. VMware and Nvidia claim AI workloads can scale across up to 16 GPUs in a single virtual machine and across multiple nodes, leading to lower overall costs and more efficiency. Additionally, VMware says its vSAN Express Storage Architecture will provide performance-optimized NVMe storage and supports GPUDirect storage over RDMA, allowing for direct I/O transfer from storage to GPUs without CPU involvement.

The new platform with VMware will feature Nvidia NeMo, the company’s AI framework (included in Nvidia AI Enterprise, the operating system of its AI platform) that combines customization frameworks, guardrail toolkits, data curation tools, and pretrained models. NeMo uses TensorRT for Large Language Models, a service that optimizes inference performance on Nvidia GPUs. VMware and Nvidia say enterprises can use the new Nvidia AI Workbench to pull community models, like Llama 2, available on Hugging Face, customize them remotely and deploy production-grade generative AI in VMware environments.

“Enterprises everywhere are racing to integrate generative AI into their businesses,” said Jensen Huang, founder and CEO of Nvidia. “Our expanded collaboration with VMware will offer hundreds of thousands of customers – across financial services, healthcare, manufacturing and more – the full-stack software and computing they need to unlock the potential of generative AI using custom applications built with their own data.”

Nvidia is not the only AI development game in town, as many are turning to open source solutions because they require the ability to use multiple open source tools and frameworks. For these open source AI projects, VMware also unveiled VMware Private AI Reference Architecture for Open Source, which integrates OSS technologies from VMware partners to deliver an open reference architecture for building and serving OSS models on top of VMware Cloud Foundation.

One such technology partnership is with Anyscale, developers of the widely adopted, open source unified compute framework Ray. Data scientists and ML engineers can scale AI and Python workloads using Ray on VMware’s Cloud Foundation by utilizing their current compute footprints for ML workloads instead of defaulting to the public cloud, VMware says.

A crowd gathers in the expo hall at VMware Explore in Las Vegas.

Anyscale CEO Robert Nishihara commented in a release that companies are struggling to stay at the forefront of AI while scaling, productizing, and iterating quickly.

“Because Ray can run anywhere – on any cloud provider, on-premises, on your laptop – and VMware’s customers run everywhere, it’s a natural collaboration to make it easier for companies to accelerate their business using generative AI,” he said.

“AI has traditionally been built and designed by data scientists, for data scientists,” said Chris Wolf, vice president of VMware AI Labs. “With the introduction of these new VMware Private AI offerings, VMware is making the future of AI serve everyone in the enterprise by bringing the choice of compute and AI models closer to the data. Our Private AI approach benefits enterprise use cases ranging from software development and marketing content generation to customer service tasks and pulling insights from legal documents.”

In addition to the new Private AI offerings, VMware also announced Intelligent Assist, a family of generative AI-based solutions trained on VMware data that will automate aspects of enterprise IT in multi-cloud environments. Intelligent Assist will be integrated into several VMware products including VMware Tanzu, which will address the challenges of multi-cloud visibility and configuration by allowing users to conversationally request and refine changes to their enterprise’s cloud infrastructure, the company says. Workspace ONE will also include it and will allow users to create high-quality scripts using natural language prompts. NSX+ is another service to be enhanced with these new generative AI capabilities that will help security analysts to determine the relevance of security alerts to more effectively remediate threats.


Protean Partners With Google Cloud to Setup Centre of Excellence

Protean eGov Technologies (Protean) has announced a partnership with Google Cloud to set up a generative AI Centre of Excellence (CoE). The partnership aims to accelerate DPI (Digital Public Infrastructure) deployment and the adoption of Generative AI and cloud in both public and private sectors in multiple ways.

The centre of excellence will harness the potential of Google’s recently launched open commerce solution for ONDC, combined with Protean’s ONDC Buyer and Seller platforms, to supercharge the adoption of ONDC by participants.

This partnership will further aim to strengthen Protean’s identity authentication solutions (eAuthentication, eKYC, eSign) and Data Services with Google’s cloud computing and AI capability to scale up their adoption, aiming to solve for end-to-end digital onboarding journeys across sectors.

Further, this centre of excellence aims to spur innovation in delivering digital public goods across diverse sectors of e-commerce, healthcare, agriculture, mobility, education and financial services.

Protean intends to leverage the advanced GenAI and VertexAI offered by Google Cloud to solve a wide array of use cases and deliver eGovernance solutions across these sectors.

“This partnership will influence ONDC network adoption and the centre of excellence will power multi-sectoral innovations at population-scale. We eagerly look forward to unlocking the potential of this collaboration with Google’s products and people and staying committed to our mission of Building for Billions,” Suresh Sethi, MD & CEO, Protean eGov Technologies said.

The post Protean Partners With Google Cloud to Setup Centre of Excellence appeared first on Analytics India Magazine.

China’s Copycat Culture Now Dominates AI

Once synonymous with a copycat culture, China has transformed itself into a prominent contender within the AI domain. Despite geopolitical obstacles, China has actively challenged the US in the AI race. Interestingly, this very culture could prove instrumental in China’s pursuit of becoming an AI leader.

After Google introduced the BERT language model in 2018, Baidu in China responded the following year with Ernie. Interestingly, the character name Bert hails from the renowned children’s show ‘Sesame Street,’ and Baidu also named its model after a character from the same show, reflecting the ongoing AI competition between the two nations.

Currently, the US is once more leading an AI revolution with generative AI. San Francisco-based OpenAI has ushered in a groundbreaking transformation in the AI landscape with the introduction of ChatGPT. But China too has responded: Baidu has launched Ernie Bot, a competitor to ChatGPT, and Alibaba, the e-commerce giant, has released Tongyi Qianwen, its own generative AI product. Tencent, too, is working on a proprietary large language model, which reportedly could be the best to come out of China. In fact, Reuters reported that in the last three years, Chinese entities introduced 79 large language models (LLMs) within the nation.

AI finesse

China already holds a prominent position in the realm of computer vision, even though the initial research began in the US in the early 1960s. China’s heavy focus on this subset of AI is primarily due to the government’s extensive use of state surveillance systems. The Chinese government has identified computer vision as a strategic area and has allocated substantial funding to support advancements in this field. In 2016-2018, Gartner found VC investment in computer vision firms quadrupled to exceed $8 billion. Examining 1400+ deals, they revealed that Chinese investors led the field, contributing over 56% of total investment and dominating eight of the top ten deals.

China is also a leader when it comes to AI research. The country was responsible for approximately one-third of the global output in terms of both AI research papers published and AI citations in 2021. Moreover, over the years, we have seen many products coming out of China, gaining traction worldwide.

Take TikTok, for example. It was the most downloaded app last year, with around 672 million downloads, and became only the second app in the world to cross 3 billion downloads. TikTok was so successful that it forced other social media platforms, such as Meta-owned Facebook and Instagram and Google-owned YouTube, to launch similar short-video features.

However, TikTok’s secret sauce is its algorithm. TikTok’s global popularity is rooted in its AI-powered algorithm, considered a standout in social media. Continuously refined, it drives the platform’s meteoric rise, adapting through user feedback and behavior. Designed for content discovery, it crafts addictive, personalised experiences, enhancing engagement and user satisfaction.

Making strides in generative AI

Moreover, brands such as Baidu, Alibaba and Huawei, have become household names in China, and these are the very companies investing heavily in generative AI. These companies are already shipping their AI products to the world. For example, Baidu’s Apollo project focuses on autonomous driving, and its technologies have been used by companies outside of China, including in partnerships with BMW and Ford. Similarly, Huawei’s AI chips have been used in smartphones and other devices, while its cloud services, including AI capabilities, are utilized by businesses worldwide.

Although GPT-4 currently holds the title of the most advanced AI model, Baidu’s Ernie Bot stands out for its tailored understanding of the complexities of the Chinese language and culture. In fact, reports indicate that Ernie 3.5, the latest iteration of the Ernie AI model, has outperformed the widely acclaimed OpenAI chatbot, ChatGPT, across various pivotal metrics.

Until now, OpenAI, while primarily concentrating on Artificial General Intelligence (AGI), has confirmed their absence of plans for GPT-5 or any models surpassing GPT-4. On the other hand, Google could unveil its work on Gemini, a potential advancement beyond GPT-4’s capabilities. Yet, it remains entirely conceivable for China to unveil the next significant breakthrough in this realm.

A state-driven quest to become an AI superpower

About six years ago, China laid out a development plan to become the world leader in AI by 2030. So far, the Asian superpower has already laid down a solid foundation to support its AI economy. China’s quest for AI supremacy is intertwined with economic growth. In fact, China is preparing for an AI-powered future. In terms of investments, China is also second to the US, according to a 2022 report by Stanford University. China is also expected to more than double its investment in AI to nearly USD 27 billion by 2026, according to an IDC report.

Paul Scharre, the author of the book ‘Four Battlegrounds: Power in the Age of Artificial Intelligence’ believes China’s AI labs are only a year and a half behind the foremost research labs in the Western world. Additionally, the nation holds an advantage in terms of effectively implementing AI across various aspects of society.

China’s significant strides in AI are making many in the US jittery. For example, venture capitalist Vinod Khosla, whose firm has invested in OpenAI, said that the US cannot afford to moderate the rate of progress in AI in the country, as it could lead to China’s advantage. Sam Altman, the CEO of OpenAI, acknowledged the role China could play in solving alignment. He said that China has some of the best AI talent in the world and given the difficulties around solving alignment for advanced AI systems, it would require the best minds from around the world.

The data advantage

Another big advantage China has is data. China has more internet users than the US and Europe combined. With over 1.4 billion Chinese hooked to the internet, the data being generated in China is massive and foundational models thrive on such data. Nearly everyone in China uses WeChat for text messaging, video conferencing, video games, and mobile payments, among other things. Enormous volumes of data are being generated, and unlike the West, China’s data privacy regulations are comparatively less stringent.

This potentially offers Chinese LLM-building labs a notable edge. Additionally, with lenient data privacy laws, Chinese AI labs could also have easier access to valuable public datasets crucial for training LLMs. Data is gold and China has it in abundance, but most of it is in Chinese.

Models such as the GPT series caught the world’s imagination because of their ability to converse in English, and every country in the world has a section of people who can converse in or understand English. That is not the case for Chinese. Nonetheless, LLMs launched by Chinese labs, such as Ernie Bot, are trained on both English and Chinese datasets.

AI intertwined with US geopolitical strategy

Amidst the imposition of export controls on AI-enabling hardware by the US, China continues to make remarkable advancements in the AI domain. The Joe Biden-led administration aims to curtail sales of AI chips to China by NVIDIA, a pivotal supplier of the chips that fuel technologies like generative AI. A report by the Centre for the Governance of AI, a British think-tank, found that more than half of the AI labs in China rely on NVIDIA for processing prowess.

Even though China is working hard to become self-reliant in this space, it will take time. For now, China is finding workarounds. In March, The Financial Times revealed that SenseTime, a firm blacklisted by the US, was employing intermediaries to circumvent export controls. Reports also suggest that certain Chinese AI companies are leveraging NVIDIA’s processors via cloud servers located in different countries to navigate the restrictions.

Autocratic government and talent exodus

Another challenge for China is AI talent. Although the country has talent in abundance, and some of the top AI scientists in the world are Chinese, most of them are working outside of China. A report by McKinsey revealed that China is expected to face a significant shortage of AI talent by 2030.

Moreover, Time Magazine, in an article, stated that more than half of China’s top AI undergraduate students opt to pursue their graduate studies in the US, and a significant majority choose to remain there after completing their Ph.D. Surprisingly, the primary beneficiary of this influx of Chinese talent is not China itself, but rather the US.

To mitigate this, the Chinese government has launched programmes designed to bring the talent back into the country. Nonetheless, many Chinese prefer working outside of China because of the autocratic government. “Talent exodus is a major hindrance to China’s authoritarianism in that it drives people away. China’s top AI scientists leave and it’s not just that they go abroad to study and work, they prefer a more democratic way of life,” Scharre said.

China has, in fact, one of the strictest censorship regimes in the world. From political dissent and discussion of sensitive historical events in China to information about its leaders and high-level officials, the government censors almost everything through the Great Firewall of China programme. Given that an AI model is only as good as the data it is trained on, censoring data could prove to be a problem.

The post China’s Copycat Culture Now Dominates AI appeared first on Analytics India Magazine.

Automattic CEO Matt Mullenweg talks future of Tumblr, with algorithmic choice, AI enhancements and more

Sarah Perez

Matt Mullenweg, CEO of Automattic, the company behind WordPress.com and other online publishing tools, is offering a glimpse into the future of Tumblr, the blogging site Automattic acquired from Verizon in 2019. On the Evening Standard’s “How to be a CEO” podcast, the WordPress founder offered a vision of Tumblr’s future direction, including its embrace of open source, plans for algorithmic choice, and use of AI technologies, among other things.

The exec was enthusiastic about Tumblr’s ability to bring a younger user base into the broader Automattic community, noting that over half its users are under the age of 25 and more women than men use the service. The site also has a vibrant LGBTQ+ community — over a quarter of its network, larger than any other social network, Mullenweg also claimed. Driving Tumblr’s community towards Automattic’s other tools, over time, is one of the company’s ultimate goals.

Mullenweg says that users might start with just a Tumblr blog but then, over time, want to expand into something larger — an e-commerce store, a more customizable site, a newsletter, or a membership site — and Automattic could direct users to other products it offers that allow those possibilities, like WordPress.com or WooCommerce, and others.

“I’m excited about that on-ramp as well as to bring a younger demographic and young people into WordPress,” Mullenweg noted.

Beyond catering to a younger demographic, the company also wants to bring its open source ethos to parts of Tumblr, the exec noted.

For example, the company open sourced the algorithm that powers Tumblr, stream-builder, which is now available on GitHub. That means other developers can now submit a patch to it, change it, or even create their own customized version, explains Mullenweg.

But open sourcing the algorithm is only one step towards building a more open Tumblr.

Mullenweg also foresees a future where users get to pick their own algorithm to control their Tumblr experience. He says other social networks haven’t typically offered this option in the past because what users say they want to see and how they actually behave often differs — they tend to be more engaged with the algorithm the network provides, that is.

Still, Mullenweg believes there can be a balance between the main algorithm a network offers and user choice.

“We’ll always have our defaults that maybe are the default thing you see, but then you can switch into your own mode if you want,” he explained.

Other social networks have begun to do something similar, including Bluesky, a Twitter/X rival, which recently introduced the concept of custom feeds that let users filter the network’s posts in different ways.


Mullenweg also touched on the opportunities Tumblr has had in the face of changes at Twitter, which have benefitted other social networks that mimic Twitter, including Bluesky, Mastodon, and Threads. Shortly after Elon Musk took over Twitter, Mullenweg had reported a 58% increase in iOS app downloads and a 57% increase in Android users during the first week of November.

But in the new interview, the exec downplayed the impact of the Twitter exodus, saying that people would arrive in waves, but “if we’re being really honest, like less than you would think in the long-term.”

Instead, he said shifts are more successful when entire communities choose to move over to Tumblr after becoming frustrated with their current platform host, as a Lego community recently did when leaving Instagram.

“Get everyone coming over at the same time, teach each other how it works…follow each other…kind of like bootstrap the community on the new thing. And then also give us feedback,” said Mullenweg, adding that Tumblr wants to be responsive to communities’ feature requests as they make these moves.

As for the matter of AI, Mullenweg said not to worry — there’s not going to be some Tumblr AI chatbot coming in the future.

Instead, he believes AI will play a larger role behind the scenes at the company.

“I am one of the people who will say that it’s almost impossible to overstate how big AI will impact society,” he began, but added that for Tumblr, its impact will be less obvious than in other places.

“For Tumblr…I think it can make our developers a lot more productive…the code could be checked by AI or tested by AI or something like that. So that’ll allow us to do a lot more with the same or fewer developers, which is really exciting. So maybe our pace of development can increase,” he said.

Plus, AI can be a help in moderation, flagging things before they’re even reported by Tumblr users. In addition, AI and machine learning could make the Tumblr feed better and more personalized to end users.

“You can tweak it and it can really learn the things you want to see and the friends you want to follow,” he said.

The exec was also generally bullish on generative AI as a tool for artists, which may benefit the community that uses Tumblr, but he did not say that Tumblr itself would build generative AI tools.

Mullenweg wrapped the interview by positioning Tumblr in a slightly different space than traditional social media.

“You often hear people say they want to do less social media, but you almost never hear people say they want to blog less…What is it about blogging, that they feel like adds to their life or is a valuable task, or valuable use of time, that maybe they’re not getting from more traditional social media?” he asked. “Like I said, we’re making Tumblr for art and artists. I haven’t heard anyone say I’d love less art in my life.”

That said, Tumblr — which now hosts half a billion blogs, Mullenweg noted — does seem to be chasing the Twitter crowd with its recent web redesign that moved the navigation bar to the left, similar to Twitter/X.

Without directly addressing the Twitter-inspired revamp, Mullenweg did explain why so many social tools are starting to look alike.

“I think what happens is that — the reality is that people use multiple social networks at the same time. And there’s sort of a baseline functionality that they just come to expect,” he said. “There’s certain things that you just probably don’t even realize that you expect until you are going to visit one of these social networks that doesn’t support it. And you’re like, Oh, I really like that thing,” he concluded.

A Year of Stability AI’s Trials and Errors

Yesterday, Stability AI, the San Francisco-based AI firm, completed one year of its journey in the generative AI space. CEO Emad Mostaque is proud of his contribution to open-source AI and thinks he should be honoured with a Nobel Prize for truly building an ‘open AI’ with the company’s diffusion models.

Mostaque’s aspiration is not without merit: the company’s models are the most popular of the lot, according to a recent report by Everypixel Journal. Its open-source models dominate with an 81.32% market share, compared with high-profile models like Midjourney (6.23%), OpenAI’s DALL-E 2 (5.92%) and Adobe (6.5%).

According to Mostaque, Stable Diffusion has more than 10 million users across all channels. Its competitor Midjourney, however, has the largest user base, with a whopping 15 million users.

For the AI company, the adage ‘all that glitters is not gold’ holds true: its market supremacy and its ‘all for open source’ narrative have not been able to overshadow its missteps.

Not a happy anniversary

Alongside Stability AI’s recent $100 million funding and dominant market share, several allegations from employees point to the CEO’s leadership shortcomings, and reports of unpaid invoices amounting to $70,000 add to the woes. Earlier this summer, Forbes published a bombshell exposé revealing inconsistencies in Mostaque’s claim that he holds a master’s degree from Oxford.

While the AI startup deals with internal struggles, it has leaned hard on its open-source approach for the experimental community, which remains laudable. Within a year, the company has launched a number of tools, the latest being an open-source version of DreamStudio, a commercial interface for Stable Diffusion. Moreover, versions of the model have been freely available to download and tinker with since its release in August 2022. Earlier this year, the company also released a suite of open-source large language models collectively called StableLM.

A messy mixed reality

AI art generation tools are making prize-winning pictures, comic books, and glossy magazine covers, but they have also become an epicentre of legal battles. Alongside their popularity, concerns over privacy, misinformation and a problematic lack of context have been raised on several occasions.

Last Christmas, social media was flooded with ‘avatars’ generated by Lensa, a popular yet problematic application rooted in diffusion models. As Lensa went viral, people also raised concerns about how their photos and images were being used and stored. After the software generates the avatars, Prisma Labs, the company behind Lensa, deletes the uploaded photos within 24 hours, claimed Andrey Usoltsev, the company’s co-founder and chief executive.

Some users have said Lensa has created images that overemphasise certain parts of a woman’s body or alter the eye colours and shapes of their faces to remove racially or ethnically identifiable features.

“Tools like these tend to be flashy,” said Jennifer King, privacy and data policy fellow at the Stanford Institute for Human-Centred Artificial Intelligence. “Sometimes, it’s correct enough, but without the right guardrails in place, it opens you up to a lot of issues.”

The Everypixel report notes that AI-generated art has, within a brief span of 1.5 years, outnumbered every piece of visual art humanity has created over the last century and a half.

It is worth noting that Adobe has seen the fastest growth of the lot: the design software veteran’s AI services were used to create a billion images within three months of release. Despite this growth, the company’s approach of avoiding copyrighted material has resulted in inferior product quality.

The research also recalculated some of the estimates and found that over 11 billion images have been created using models from three repositories (GitHub, Hugging Face, and Civitai), where users have uploaded tens of thousands of Stable Diffusion-based models.

While copyright battles are nothing new in generative AI, companies like DeepMind are finding ways to tackle the issue with initiatives like Visualizing AI. The company is setting an example of ‘evolving with the landscape’ from which Mostaque and the rest can take notes.

The post A Year of Stability AI’s Trials and Errors appeared first on Analytics India Magazine.

Generative AI Takes the Spotlight in Gartner’s 2023 Hype Cycle

Gartner, Inc., in its latest Hype Cycle for Emerging Technologies 2023, positions Generative artificial intelligence (AI) at the Peak of Inflated Expectations, anticipating it to bestow transformational benefits within the next two to five years. This AI variant is part of the broader emergent AI trend, heralding new vistas for technological innovation.

Generative AI's Rise in the Tech Realm

“The popularity of many new AI techniques will have a profound impact on business and society,” says Arun Chandrasekaran, Distinguished VP Analyst at Gartner. Chandrasekaran emphasizes the significant scale of AI foundation models, the rising adoption rate of conversational tools, and the widespread applications of generative AI, stating they signify “a new wave of workforce productivity and machine creativity.”

The unique facet of the Hype Cycle for Emerging Technologies is its ability to condense insights from the plethora of technologies Gartner reviews annually, pinpointing only the most transformative ones predicted to redefine industries in the upcoming decade.

Beyond AI: Gartner's Emphasis on Other Emerging Technologies

Melissa Davis, VP Analyst at Gartner, points out, “While all eyes are on AI right now, CIOs and CTOs must also turn their attention to other emerging technologies with transformational potential.” Davis underscores the importance of technologies augmenting developer experiences, those catalyzing innovation through a ubiquitous cloud, and technologies emphasizing human-centric security and privacy.

Davis adds a note of caution, pointing out that these technologies are in their nascent stages and that their evolution is therefore uncertain. However, she posits that early adopters might gain significantly despite the inherent risks.

Emerging Technology Trends in Four Broad Themes

  1. Emergent AI: Beyond generative AI, other promising techniques like AI simulation, causal AI, federated machine learning, and more, are poised to refine digital customer interactions and lead to better-informed business decisions.
  2. Developer Experience (DevX): Focusing on the symbiotic relationship between developers and their working tools, enhancing DevX is pivotal for the success of digital initiatives. Technologies like AI-augmented software engineering and open-source program office are spearheading this domain.
  3. Pervasive Cloud: Forecasted to transition from just an innovation platform to an indispensable business innovation driver in the coming decade, the cloud's far-reaching effects are undeniable. Technologies like augmented FinOps and cloud-native are pioneering this evolution.
  4. Human-centric Security and Privacy: With humans at the core of most security breaches, the spotlight is on devising security that's woven into the very fabric of an organization’s digital blueprint. Emerging technologies like AI TRiSM and postquantum cryptography are front runners in this endeavor.

Gartner's 2023 Hype Cycle for Emerging Technologies not only underscores the monumental potential of generative AI but also brings to the fore other emerging technologies that are set to reshape the landscape of the digital world. As these technologies continue to evolve, businesses and society at large must gear up to harness their transformative power, ensuring a future that's not only technologically advanced but also sustainable and secure.

India Scripts History as Chandrayaan-3 Awakens Sleepy Lunar South Pole

In a momentous achievement, India’s Chandrayaan-3 has triumphantly touched down on the lunar surface, marking a significant milestone after its predecessor’s setback in 2019.

ISRO chief S Somanath proudly announced, “India is now on the Moon.”

The historic landing occurred at 6:04 pm IST today, more than a month after the spacecraft’s launch. This success propels India into the league of nations that have soft-landed on the Moon, alongside the former Soviet Union, the US, and China. Moreover, it gives India the distinction of being the first country to land near the Moon’s south pole, a region teeming with unexplored potential to unveil insights about the Moon’s atmosphere and lay a foundation for future space exploration ventures.

Earlier this month, Russia had vied for this accomplishment by launching Luna-25, intended for a soft landing on the lunar south pole before Chandrayaan-3. However, the Russian mission ended in failure as Luna-25 crashed into the Moon after losing communication with Roscosmos, Russia’s space agency.

The Indian Space Research Organisation (ISRO) launched Chandrayaan-3 through its ‘Launch Vehicle Mark-III’ on July 14 from the Satish Dhawan Space Centre on Sriharikota Island in South India.

Chandrayaan-3, the third iteration of India’s Chandrayaan mission, seeks to showcase safe landing and mobility on the Moon’s surface, along with conducting on-site scientific investigations. Developed on a budget below $75 million, the spacecraft comprises a propulsion module, a lander, and a rover equipped with seven scientific instruments.

To address previous challenges, the lander of Chandrayaan-3 incorporates enhanced sensors, software, and propulsion systems. ISRO conducted rigorous simulations and additional tests to verify the lander’s durability and ensure a successful landing.

The lander’s experiments encompass seismic vibrations, near-surface plasma, lunar temperature, thermal conductivity, elemental composition, and spectral signatures of Earth. The knowledge gained from Chandrayaan-3 will contribute to understanding the lunar surface ahead of human exploration.

The rover of Chandrayaan-3 is identical to that of its predecessor, Chandrayaan-2. Both the lander and rover are designed for a mission life of one lunar day, equivalent to 14 Earth days.

Chandrayaan-3 follows India’s initial lunar mission launched in 2008, which discovered water molecules in the lunar atmosphere. Although Chandrayaan-2’s lander-rover crash-landed, its orbiter continues to study the Moon. The Chandrayaan-2 orbiter played a pivotal role in pinpointing the Chandrayaan-3 landing site and maintains communication with Earth for relaying signals to the lander.

Additional Efforts

In recent years, India has displayed a robust interest in space exploration, marked by the contributions of over a hundred space-tech startups. The nation has made strides in launch vehicles, satellites, and Earth imaging technologies. With the introduction of a space policy, New Delhi aims to foster public-private collaborations in space endeavours.

ISRO’s agenda extends beyond Chandrayaan-3, encompassing the Gaganyaan human spaceflight mission and the Aditya L1 solar observatory project. In a partnership with NASA, India has embraced the Artemis Accords and is on track to train its astronauts at NASA’s Johnson Space Center, aiming to send them to the International Space Station next year.

Furthermore, ISRO and NASA are collaborating on a low-Earth orbit (LEO) observatory slated for 2024, designed to map the planet within 12 days. This endeavour will provide vital data for analyzing shifts in ecosystems, ice masses, vegetation biomass, sea levels, and natural disasters.

The post India Scripts History as Chandrayaan-3 Awakens Sleepy Lunar South Pole appeared first on Analytics India Magazine.