Audio journalism app Curio can now create personalized episodes using AI

Sarah Perez @sarahintampa / 10 hours

Curio, a startup building a platform that turns expert journalism into professionally narrated content, is embracing AI technology to create customized audio episodes based on your prompts. The company already has a large catalog of high-quality journalism licensed from partners like The Wall Street Journal, The Guardian, The Atlantic, The Washington Post, Bloomberg, New York Magazine, and others, which it leveraged to train its AI model, powered by OpenAI technologies. Curio users can now ask its new AI helper, “Rio,” a question about a topic they want to learn more about, then have it return a bespoke audio episode that includes only fact-checked content — not AI “hallucinations.”

The company is also today announcing an additional strategic investment from the head of TED, Chris Anderson, a prior investor in Curio’s Series A round. Ahead of this, Curio had raised over $15 million from investors including Earlybird, Draper Esprit, Cherry Ventures, Horizons Ventures, 500 Startups, and others.

Anderson’s new contribution amount is not being disclosed, but Curio says he’s a “significant investor.”

Here's an amazing new use of AI: to create a custom audio episode of the most interesting recent magazine and newspaper articles. It comes from Curio, a startup I'm proud to be an investor in. https://t.co/gsBMRL1yLL

— Chris Anderson (@TEDchris) May 17, 2023

Founded in 2016 by ex-BBC strategist Govind Balakrishnan and London lawyer Srikant Chakravarti, Curio’s original concept was to offer a subscription-based service that provides access to a curated library of journalism translated into audio. To do so, the company partnered with dozens of media organizations to license their content, which is then narrated by voice actors and added to the Curio app. The experience is an improvement over the news audio offerings provided by services like Pocket, where users save articles to listen to later, as Curio’s content is read by real people, not robotic-sounding AI voices.

With the addition of its AI feature, Curio is now able to curate custom audio as well, on top of its hand-picked selection of audio journalism. The company believes this could become a powerful use case for AI at a time when there are legitimate concerns about AI chatbots providing false information or making up facts when they don’t know how to generate the right answer — something that’s called a “hallucination.” Already, we’ve seen falsehoods provided by AI chatbots when both Google and Microsoft demonstrated their new AI search tools, for instance.

Curio’s AI, on the other hand, won’t return anything it “makes” up, as it’s combining audio clips from across its catalog in response to users’ queries, effectively creating mini podcast episodes that allow you to explore a topic through quality, fact-checked journalism.

The company suggests you could use the AI feature via prompts like “tell me about the possibility of peace in Ukraine,” “what is the future of food?” “tell me about the U.S. debt ceiling,” “tell me why Vermeer is so great,” or “I have 40 minutes, update me on AI.”

Image Credits: Curio screenshot on web

However, the AI can’t return information on breaking news, as it takes time to translate news articles into narrated audio. But it could be used to explore various topics in more detail.

“We are trying to create, from a technical perspective, an AI that doesn’t hallucinate,” explains Curio’s Chief Marketing Officer, Gastón Tourn. “And the second thing that is interesting is this idea of unlocking knowledge from journalism — from news — because when you ask questions, it actually also proposes articles from, maybe from a few years ago, but they’re still super relevant to what’s going on right now.”

In addition to the media brands mentioned above, Curio also has relationships with The Economist, FT, WIRED, Vox, Vulture, Scientific American, Fast Company, Salon, Aeon, Bloomberg Businessweek, Foreign Policy, The Cut, and others — in total, over 30 publications are supported. (The New York Times, we should note, is not one of them. And the company launched its own audio journalism app today, as it turns out.)

To get started with the new Curio AI, you’ll type your question or prompt into the box provided, as if you were interacting with an AI chatbot like ChatGPT. (Curio relies on OpenAI’s GPT-3.5 model, we understand.) This feature is available both on the web and in Curio’s mobile apps.

To create the personalized audio episode for you, Curio crunches through over 5,000 hours of audio, but this all takes just a few moments of processing from the user’s perspective. This results in a custom audio episode that includes an introduction along with two articles from Curio’s publications.
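The description above suggests a retrieval-style pipeline: rather than generating new text, the system ranks existing catalog items against the user's prompt and stitches the top matches into an episode, which is why it cannot hallucinate. Here's a minimal illustrative sketch of that idea in plain Python — hypothetical code and toy data, not Curio's actual implementation:

```python
import math
from collections import Counter

# Toy catalog standing in for Curio's licensed articles (hypothetical data).
CATALOG = [
    {"title": "The U.S. debt ceiling, explained", "text": "debt ceiling treasury default congress"},
    {"title": "Why Vermeer still matters", "text": "vermeer painting dutch art light"},
    {"title": "The future of food", "text": "food agriculture protein climate farming"},
]

def score(query: str, doc_text: str) -> float:
    """Cosine similarity between bag-of-words vectors of the query and a document."""
    q, d = Counter(query.lower().split()), Counter(doc_text.lower().split())
    dot = sum(q[w] * d[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def build_episode(prompt: str, n_articles: int = 2) -> list[str]:
    """Return titles of the top-scoring catalog items. Retrieval only:
    the 'episode' can only ever contain pre-existing, fact-checked content."""
    ranked = sorted(CATALOG, key=lambda a: score(prompt, a["text"]), reverse=True)
    return [a["title"] for a in ranked[:n_articles]]

print(build_episode("tell me about the U.S. debt ceiling"))
```

A production system would presumably use learned embeddings rather than word overlap, but the guarantee is the same: the output is a selection, not a generation.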

Curio itself is a premium subscription service priced at $24.99 per month (or $14.99/mo if paying for a year upfront). However, the AI feature is free to use, for the time being. The company says that’s because it wants to get “Rio” into the hands of as many people as possible, so it can learn. For instance, it’s looking to understand what length users prefer for these personalized episodes, though right now it’s leaning toward shorter articles.

Later, Curio may add more features — like the ability to share your episodes with others or get suggestions based on what other users are asking about.

“We don’t see AI as a curation tool,” notes Tourn. “We see it more as a discovery tool. We think what AI does is unearth content that is super interesting and finds ways to relate to it, but the curation is still human and the voices are still human.”

The company today has a customer base of thousands of subscribers, and a million-plus app downloads, but the AI addition may prompt the app to gain more traction as users explore this unique use case for AI. The company is forecasting a reach of 100K paid subscribers by year-end.


Updated, 5/17/23, 12:57 PM ET to include forecast

Google Goes Silent on Quantum Computing Project. What’s Cooking?

Google has made a recent breakthrough by successfully observing non-Abelian anyons, quantum particles with exotic statistical properties that defy the commutative property of multiplication. The application of non-Abelian anyons could revamp the field of quantum computing by improving its resistance to noise and enabling better topological quantum computation.

In February 2023, Google surprised the community with a significant breakthrough by successfully demonstrating the capability to lessen errors in calculations while simultaneously increasing the quantity of physical quantum bits in a “logical qubit”.

The legacy of quantum computing at Google goes back to 2014, when the company established a dedicated quantum AI lab. Fast forward to 2019, when Google made a groundbreaking revelation by attaining “quantum supremacy” with its Sycamore quantum processor. Since then, Google has persistently pushed the boundaries of quantum computing.

Earlier, in 2018, the tech giant had announced Bristlecone, a 72-qubit quantum processor that preceded Sycamore and was designed to undertake increasingly intricate calculations.

However, Google’s recent breakthrough of observing non-Abelian anyons came a few days after Honeywell-backed Quantinuum, which works with Microsoft, JPMorgan, and Nvidia, launched its H2 quantum processor. The processor creates non-Abelian anyons through intricate braiding techniques.

Competitors Run the Show

Besides partnering, Microsoft is also pursuing non-Abelian particles in its own quantum computing approach. IBM, meanwhile, has been investing early on an adjacent front, introducing watsonx, a powerful AI and data platform tailored for enterprises to amplify the influence of AI throughout their operations.

At the Cleveland Clinic, IBM has installed the first US-based, private-sector quantum computer, and plans to introduce the Heron processor later this year. On the other hand, Microsoft has collaborated with AMD, a chip company, to develop their own in-house AI processors known as ‘Athena’, in an attempt to stay ahead of the game.

Additionally, AWS has launched two specialised skill development programs in India, which are specifically focused on advancing quantum computing.

Tracing Back a Little

At the Google I/O 2021 conference, Google created quite a spectacle around quantum computing, showcasing a cutting-edge research facility, the Google Quantum AI campus in Santa Barbara, California. It also released the TPUv4 pod, a powerhouse composed of custom-made TPU chips that the company said delivered far faster training than its predecessor.

But at this year’s Google I/O, the chip news was limited to the Tensor G2, the second generation of its custom AI chip, inside the new Pixel devices. Built on a 5 nm process, it includes new ML accelerators designed to improve the performance of AI-powered features on Pixel devices. Google Cloud also announced its new A3 supercomputer virtual machine.

However, this year, it seemed that Google deliberately chose to withhold the announcement of non-Abelian anyons. But why so?

Read more: After Google, Microsoft Targets Nvidia

AI Takes the Centre Stage

In an interesting turn of events, Google chose to ignore quantum and redirect its focus toward reclaiming dominance in the AI race. It unleashed a slew of AI advancements, including PaLM 2, Med-PaLM 2, Workspace, Duet AI, Vertex AI, immersive map features, and more.

However, ever since Google sounded the internal alarm by issuing a ‘Code Red’, it was evident that the team, gripped by panic, had descended into chaos. In the aftermath of the disastrous Paris event, the I/O conference proceeded almost smoothly, yet it was palpable that Google was trying too hard to wage its battle. Chief executive Sundar Pichai has taken a particular liking to the two-letter acronym ‘AI’, which he uttered more than 27 times.

Maybe it is their ‘bold and responsible AI’ approach that is backfiring. “We believe our approach to AI must be both bold and responsible. To us that means developing AI in a way that maximises the positive benefits to society while addressing the challenges, guided by our AI Principles,” Google shared on their website.

Google has always been very cautious in its approach, but it remains sluggish in catching up with swift tech advancements. In a recent interview with The Verge, Pichai commented on the ‘bold and responsible AI’ approach that now looks like it is backfiring. “We took time to ensure we got it (the technology) right, especially considering the widespread use of their products in critical situations. It is a matter of importance and ensuring that we approached it correctly,” he said.

So, if Google really wants to build a useful quantum computer by 2029, it better get back to focusing on the right pointers, or else it is going to lose that race, too. The stakes are high, and time is unforgiving.

Read more: Google-backed Startups Are Using OpenAI’s GPT, Should it be Worried?

The post Google Goes Silent on Quantum Computing Project. What’s Cooking? appeared first on Analytics India Magazine.

Union AI raises $19.1M Series A to simplify AI and data workflows with Flyte

Frederic Lardinois @fredericl / 7 hours

Union AI, a Bellevue, Washington–based open source startup that helps businesses build and orchestrate their AI and data workflows with the help of a cloud-native automation platform, today announced that it has raised a $19.1 million Series A round from NEA and Nava Ventures. The company also announced the general availability of its fully managed Union Cloud service.

At the core of Union is Flyte, an open source tool for building production-grade workflow automation with a focus on data, machine learning, and analytics stacks. The idea was to offer a single platform that teams can use to create their ETL pipelines and analytics workflows, as well as their machine learning pipelines. And while other projects on the market offer similar orchestration capabilities, Flyte is designed specifically for the needs of machine learning teams.
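Flyte's programming model centers on decorating Python functions as tasks and composing them into workflows. The following is an illustrative stand-in for that pattern in plain Python — not the real flytekit API, which adds strong typing, caching, versioning, and distributed execution on top:

```python
from functools import wraps

# Registry standing in for the orchestrator's view of the pipeline (illustrative only).
TASK_REGISTRY = {}

def task(fn):
    """Register a unit of work, as Flyte-style orchestrators do with @task."""
    TASK_REGISTRY[fn.__name__] = fn
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

def workflow(fn):
    """Mark a function that composes tasks into a pipeline: the @workflow analogue."""
    return fn

@task
def extract(n: int) -> list:
    # ETL step: produce some raw data.
    return list(range(n))

@task
def train(data: list) -> float:
    # Stand-in "model": just an average of the extracted data.
    return sum(data) / len(data)

@workflow
def pipeline(n: int) -> float:
    # The ETL step feeds the ML step; a real orchestrator would build a DAG here.
    return train(data=extract(n=n))

print(pipeline(n=5))
```

The appeal of the real thing is that the same decorated code runs locally for development and on a cluster in production, with the orchestrator handling scheduling, retries, and data passing between tasks.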

Flyte was originally developed inside of Lyft, where Union AI CEO and co-founder Ketan Umare developed some of the company’s earliest machine learning–based ETA and traffic models in 2016. At the time, Lyft had to glue together various open source systems to put these models into production.

“We got something running, but behind the scenes, it was a man behind the curtain. It was happening, but it was a lot of work,” Umare said. “What we learned was that other teams in the company were also struggling — and these were massive teams. And what happens when teams struggle is that they cannot keep the talent on. That’s a big problem, but what was the root of that? They were not able to deliver their things and they were not able to articulate why they were not able to deliver. It turns out to be an infrastructure problem.”

Image Credits: Union.ai

So he set out with a small team to build out the infrastructure tooling to make it easier for these teams to build their models and put them into production. But there was always friction between the software engineers and machine learning specialists. “The reason was that — at least in the way I have distilled it — I think software and machine learning systems or AI products are inherently different beasts,” Umare argued. In his view, software typically matures over time while AI models tend to deteriorate. These models, he noted, also often change based on external factors that users have little control over. “So you cannot use the same infrastructure that you use for [software deployments],” he said.

At that point, the team decided to open source its work in the form of Flyte and work with others to build out a more machine-learning-native platform.

As is so often the case, Umare and four other members of the original Flyte team then decided to build a startup around these core ideas and the Flyte open source project, with Union AI launching in late 2020.

Currently, Flyte is being used by companies like blackshark.ai, HBO, Intel, LinkedIn, Spotify, Stripe, Wolt, and ZipRecruiter.

“The fun thing about working with these large companies — what we do in the open source — is that we are working on some of the biggest models on our platform. So we know it works and we didn’t have to build anything specifically because we’ve been doing this for years. We just had to extend a couple of things,” Umare said.

“Based on a single team, we see 10x more offline training jobs dispatched from Flyte, and that results in 5x more frequent model releases with sizable business gains,” said Mick Jermsurawong, a machine learning infrastructure engineer with Stripe. “I think the realization here is that ML productivity is not a nice to have but actually a business requirement.”

But the Union AI platform isn’t simply Flyte-as-a-service. The team also built Pandera (a framework for data testing) and UnionML (a framework that sits on top of Flyte and helps teams build and deploy their models using their existing set of tools). Union Cloud combines all these elements and layers a set of enterprise tools, such as single sign-on, on top of them.

“Machine learning, and especially large language models, raise big issues around privacy and information security. Companies are becoming increasingly wary of using services where they lose control over what precisely happens with their data,” said Greg Papadopoulos, venture partner, NEA. “Combining the power of big models with rich company data has to be handled with care — that’s one of the reasons why we’re so excited about the progress made by the Union.AI team, first with Flyte and now with Union Cloud. This is exactly what people are demanding and a real differentiator: Let me exploit the power of large language models while maintaining control and ownership of my data.”

How this Startup Utilises Generative AI to Transform Online Branding

The COVID-19 pandemic has revolutionised the way we shop, making it almost imperative for brands to establish an online presence. In today’s digital age, it has become a common practice for consumers to conduct online research before making a purchase. Therefore, being easily discoverable and accessible online is essential for brands to capture the attention and trust of potential customers.

A Gurgaon-based company is helping brands in India to do so. Called SingleInterface, it is India’s largest hyperlocal marketing-to-commerce software for storefronts. The company is leveraging Generative AI to help brands monitor, manage and improve visibility of their stores on search engines.

“We have developed our own AI/ML stack using TensorFlow, but the emergence of ChatGPT has changed the landscape. The process of creating custom models has become more accessible and standardised, thanks to this advanced technology. It’s an interesting development as ChatGPT offers superior capabilities compared to previous approaches,” Girish Laxminarayana, CTO, SingleInterface, told AIM.

The company is converting much of its technology stack to leverage the power of Large Language Models (LLMs) like GPT-3.5 and GPT-4.

“We have gone for a hybrid model. Our chatbot functions as nodes rather than actual answering bots. For example, if somebody is in our IVR system, and if it sounds like this virtual person has very high intent to purchase, he/she is automatically routed by the AI to a human agent and this has actually helped us a lot more than just trying to answer something,” he said.
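A hybrid setup like the one Laxminarayana describes reduces to a simple routing rule: score the caller's purchase intent, hand off to a human agent above a threshold, and let the bot handle the rest. An illustrative sketch with hypothetical keywords and thresholds — not SingleInterface's actual system, which would use a trained intent classifier rather than keyword matching:

```python
# Phrases that signal purchase intent (hypothetical; a real system would learn these).
HIGH_INTENT_SIGNALS = {"buy", "price", "test drive", "book", "finance"}

def intent_score(utterance: str) -> float:
    """Fraction of high-intent signals present in the caller's utterance."""
    text = utterance.lower()
    hits = sum(1 for kw in HIGH_INTENT_SIGNALS if kw in text)
    return hits / len(HIGH_INTENT_SIGNALS)

def route(utterance: str, threshold: float = 0.2) -> str:
    """Route high-intent callers to a human agent; let the chatbot answer the rest."""
    return "human_agent" if intent_score(utterance) >= threshold else "chatbot"

print(route("I want to book a test drive and ask about the price"))
```

The design point is the same one Laxminarayana makes: the AI's job is triage, not closing the sale, which is why routing to a human outperforms having the bot answer everything.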

Scaling with Generative AI

SingleInterface is currently using LLMs to try to replicate the personalities of a sales agent. “Our focus has been on replicating the salesmanship and personality of individuals using AI. We are developing a system with GPT that goes beyond simply answering questions.

“It considers factors like the specific salesperson’s expertise and past performance to determine the best person to handle a particular conversation. By leveraging AI, we aim to guide the conversation in a direction that aligns with the salesperson’s previous successes,” Laxminarayana said. However, he stressed that this remains an ongoing area of research for the company.

Further, it is also exploring the use of Generative AI to create hyper-personalised advertisements. “We already have a feature called Spotlight, which is used by approximately 25% of our customers already, generating significant revenue. As we expand our services, we are considering leveraging AI to intelligently generate ads.

“This includes the possibility of automatically generating ads for platforms like Facebook and Google, including images. While we are still evaluating the practicality of this approach, it is a major focus for us this year, and we plan to implement it soon,” he said.

About SingleInterface

Founded in 2014, the company initially started by helping brands build their digital presence. “So the use case is basically, if somebody’s searching on the internet for, for example, HDFC Bank, which has a presence in nearly 14,000 locations, we ensure that their locations and information are accurately represented across various platforms, including search engines like Google and Bing, as well as other channels like WhatsApp,” Laxminarayana said.

In doing so, the company also realised how important it is for brands to converse and stay connected with their customers. “So we built an engagement layer as well. For instance, when people search for a Nissan showroom near their location, our platform ensures that the relevant Nissan showrooms appear in the search results. Users can even initiate a chat with the showroom directly, and all the conversations are routed to us for effective management.”

SingleInterface offers various deployments, including an automated IVR system that guides users through a structured menu of Nissan’s offerings. Through this system, customers can book services, schedule test drives, and access brochure downloads, among other features.

“Further, we also realised that we can help brands with transactions. So for Nissan we processed close to about INR12 crores just from one chat channel of business for them recently, in the last eight months,” he said.

So far, SingleInterface is helping over 200 brands compete and grow in the digitally connected ecosystem. In the automobile industry, its clientele includes brands such as Audi, Nissan, Ford, Datsun, Honda, Skoda, MG, and Tata Motors, among others. In the BFSI sector, it helps brands such as HDFC Bank, Axis Bank, ICICI Bank, Tata Capital, and Muthoot.

The post How this Startup Utilises Generative AI to Transform Online Branding appeared first on Analytics India Magazine.

How to use ChatGPT in your browser with the right extensions


Beyond using OpenAI's ChatGPT at its website or through an app, you can access it directly from your browser via an extension. Such extensions as ChatGPT for Chrome, ChatGPT everywhere, Merlin, Monica, WebChatGPT, AI Anywhere for ChatGPT, and Talk-to-ChatGPT integrate with OpenAI's chatbot so that you can more quickly and easily submit your requests and prompts.

Also: ChatGPT fraud is on the rise: Here's what to watch out for

As these extensions communicate with ChatGPT, you'll typically need an account with OpenAI. If you don't yet have one, head to the ChatGPT sign in/sign up page and click the Sign up button. The options accessible with each extension may also vary based on whether you have a free or paid ChatGPT account. Now let's check out a few extensions.


Amazon refreshes its Echo lineup, adds a Wi-Fi extender and smart speaker combo, Echo Pop

Sarah Perez @sarahintampa / 7 hours

After last fall’s Amazon hardware event, which brought us a handful of new Echo devices, like the Dot with the clock, and other minor updates, Amazon today is rolling out a broader refresh of its Echo lineup, including a new form factor with the arrival of the Echo Pop. While the current Dot is a rounded, bubble-shaped Echo, the Pop is a semi-circle and comes in teen-friendly colors like lavender and teal, while also serving to extend your home’s Wi-Fi network thanks to eero.

Alongside the launch, Amazon also unveiled other updated Echo devices, including the Echo Show 5, Echo Buds, and Echo Auto, and offered more context about how it sees AI impacting Alexa’s future.

Until recently, Alexa’s future was seemingly uncertain, given reports of the billions of dollars the division has lost, its failures to inspire voice shopping, and, later, the larger cost-cutting measures at the retail giant, which included layoffs in Amazon’s devices group.

However, SVP of Alexa Rohit Prasad downplays the impact those cuts had on Alexa, telling TechCrunch that of the 2,000 people let go within Amazon’s Devices and Services organization, only “a fraction of those were in Alexa,” he says. “Contrary to some of the things written, it was very small in context,” Prasad argues. “In terms of our roadmap and our conviction, Alexa is one of the biggest investments at Amazon and our conviction has only grown — especially in this time of how exciting AI is and what can be a quantitatively different, better, and more useful Alexa for our customers.”

Or, reading between the lines, Alexa’s AI potential outpaces that of the devices where it currently lives, like the Echo speakers, given its ability to extend its learnings elsewhere inside Amazon.

“I’m very optimistic that…the AI advances will be massive, but we are actually contributing to the Amazon businesses,” Prasad adds. “And I believe that Alexa is well on its trajectory to be that personal AI — which will also be a successful business for us.”

As for the products themselves, the new $39.99 Echo Pop, which could seemingly eat into the Dot’s market share, has a front-facing directional speaker and is powered by the Amazon AZ2 Neural Edge processor. It also comes with eero Built-in, allowing the device to add up to 1,000 square feet of coverage to an existing eero Wi-Fi network and giving it a dual purpose.

In addition to “Lavender Bloom” and “Midnight Teal,” the Pop comes in black and white.

Amazon says the Pop is in addition to, and not a replacement for, its existing Dot lineup, and the Dot and Dot with Clock remain available.

Image Credits: Amazon

Meanwhile, the $89.99 Echo Show 5 and $99.99 Echo Show 5 Kids edition both got an upgrade that makes them now 20% faster than the prior generation. Both also include a re-engineered microphone array, a faster AZ2 Neural Edge processor, and an upgraded speaker system that promises a doubling of the bass and clearer sound.

And they now support the new smart home standard Matter, as does the new Echo Pop.

The kids’ device — available in the U.S., U.K., and Germany — is slightly more expensive as it includes a year of the subscription service Amazon Kids+, offering ad-free and age-appropriate apps, games, Alexa skills, and audiobooks, plus other kid-friendly features like the AI-powered story making tool, Create with Alexa. The latter has two new themes: Dinosaurs and Jazzy Jungle.

Image Credits: Amazon

Amazon’s $49.99 Alexa earbuds, Echo Buds, are also being updated with richer sound via a 12mm dynamic driver, improved clarity, and a longer-lasting battery.

The new Buds get up to 5 hours of continuous playback, and up to 20 total hours of listening from a full case charge — the latter is up from 15 hours, in the prior generation. The earbuds also feature two microphones and a voice detection accelerometer, customizable tap controls, a VIP filter, and multipoint pairing.

Image Credits: Amazon

The existing Echo Auto for vehicles is also now being made available to customers in Australia, Canada, the United Kingdom, Germany, France, Italy, Spain, and Japan.

Image Credits: Amazon

As for Alexa, Amazon previously shared some of its thoughts about its smart assistant’s future during last month’s first-quarter earnings call with investors. Here, CEO Andy Jassy spoke of the company’s work to build a more “generalized and capable” large language model (LLM) to power Alexa and said the new LLM would help Amazon work towards its goal of building the best personal assistant — not just a smart speaker.

Amazon is developing an improved LLM to power Alexa

Already, many of Alexa’s experiences have been powered by a large 20 billion-parameter language model with an encoder-decoder architecture, which Amazon says is the biggest encoder-decoder model ever built. And with the introduction of Transformer-based large-scale multilingual models, the Alexa Teacher Model (or AlexaTM, as it’s called) can transfer what it knows to another language without human supervision, helping Alexa get smarter, faster. The model is also used in conversational skills like Create with Alexa, where kids and Alexa come up with stories together.

Still, despite these AI advances, most customers are still using their Echo devices for basic tasks, like controlling their smart home or getting updates about their Amazon orders. Though sales of Alexa-enabled devices have now topped 500 million, they haven’t helped Amazon sell more merchandise, reports found. But Prasad says Alexa usage is still growing, with Alexa interactions up 35% year-over-year. People using the assistant to get information is up by more than 50% year-over-year, he notes.

As for what comes next, the exec only offers the broadest of hints.

The goal, he says, is for Alexa to provide better answers, but also those that are grounded in facts — a concern with modern AI. Plus, Alexa’s answers should be personalized to the end user. For example, if you told Alexa you felt hot, it should offer to turn your thermostat down, not suggest you go to the beach to cool off.

“We already have announced the Bedrock service which is for enterprise use cases,” Prasad says, referring to AWS’ new tools for building with generative AI. “And then, for Alexa — for the consumer use cases, like powering all the conversational experiences on Alexa — you’ll find qualitatively different elements of experiences that we’ll be launching along the year,” he teases.

Is AI Copyright Really Necessary? 


Copyright infringement is increasingly becoming the talk of the AI world, with governments imposing laws to rein in AI makers and their AI systems. Recently, during his appearance before Congress, OpenAI CEO Sam Altman agreed on the need for government regulation of responsible AI systems, including giving proper attribution and rights to the original content creators.

“We have been talking with the artists and content owners about what they think about this [copyright and attribution],” said Altman. “I think people totally deserve control over how their content and likenesses are used in this technology.”

Altman said that the creation of AI models like these was meant for helping and benefiting the creators and artists, and not stealing ownership. “That is exactly what the economic model is,” said Altman. But the same cannot be said about the government.

On the other hand, it seems as though the artists and creators that the company wants to assist with its products are actually against its adoption. Similar to how the Writers Guild of America protested against the use of ChatGPT in scriptwriting, major music labels are sending notices to streaming services to take down “AI soundalikes.”

While Altman and other AI makers are concerned about aligning their models with the economic and ethical objectives, the government might be in this to get more control over the developing technology, and shape its future.

During the same hearing, Gary Marcus, the AI sceptic, spoke about how there are no existing laws that talk about the copyright issue with these generative AI models. To this, Senator Josh Hawley replied, “We can just say that section 230 of copyright law can be applied to models like this.”

If the government gets the power to control AI development, with the reason of “protecting” copyright owners, the very high pace of progress of these AI models would just stop. Moreover, Hawley’s statement makes it clear that the government wants to impose an all-out copyright ban on generative AI models, which can be tricky as it would not address the concerns but merely make the government rule over AI. Do we really want that?

David Holz, the founder of Midjourney, doesn’t really care about copyright infringement. In an interview with Forbes, Holz said that he is using images without seeking permission from the owners. He explains that it is impossible to do so with the huge dataset.

Regulations with Scepticism

Under the European Parliament’s newly proposed draft of the AI Act, any content generated by AI models like Midjourney or Stable Diffusion will have to disclose that copyrighted material was used for training, and appropriately attribute the original creator. This sounds like fair practice until one looks closely.

Under Article 28b (4) of the draft, it is written that the foundational model used in AI systems to generate either images, text, audio, or video, should comply with the transparency obligations of Article 52 (1), and publicly disclose a detailed summary of the training data under the copyright law.

If these generative AI companies have to go back and account, under copyright law, for the entire corpus of data their models have been trained on, which is essentially scraped from the internet, they would not be able to create any more AI models. Furthermore, which specific generated output draws on which specific training data is untraceable even for the companies that created the models, and even where it can be traced, attribution remains extremely difficult.

For example, Shutterstock took a step toward properly licensing the stock images used to train image-generation models. To compensate artists, the company pays them a “fair share” through royalties each time an image is used to generate art. Though it offered no precise explanation of how this scheme would work, the company did launch a contributor fund to compensate artists. The move drew plenty of criticism for treating artists’ work as tokens.

Moreover, this also puts the makers of these models, like OpenAI and Stability AI, under considerable pressure, hindering further development of the technology and risking its potential benefits. When someone copies an image or reuses another person’s work manually or through Photoshop-like software, the copyright holder knows for certain that their work was used. But given the amount of data these generative models are trained on, their makers’ offices would fill up with lawsuits.

Under the new act, foundation model providers who breach the copyright provisions of Articles 28 and 52 are liable to a fine of “€‎10 million or 2% annual turnover, whichever is higher.”

On the other hand, it is true that these technologies raise serious ethical questions about training on unauthorised and private data. But there needs to be a fair balance between preventing copyright infringement and building generative AI models.

Is there a balance?

Amid the concerns around the European Parliament’s proposed AI act and the US Senate discussion, the Copyright Office of the US has specified some guidelines regarding the registration of works created solely by machines. For instance, if an AI technology generates intricate written, visual, or musical works based on a prompt from a human, the traditional elements of authorship in such works will not be registered.

The reason behind this is that AI technology determines the expressive elements, rather than the human user, making the generated content ineligible for copyright protection.

Nonetheless, a work that incorporates AI-generated material can still be eligible for copyright protection if it includes a sufficient amount of human authorship. For instance, if a human creatively selects or arranges AI-generated content or modifies it to the point where the modifications meet the standard for copyright protection, then copyright protection applies only to the aspects that were contributed by the human.

In a similar notable event, a macaque monkey named Naruto captured selfies using a photographer’s camera. Subsequently, the photographer faced a lawsuit from the People for the Ethical Treatment of Animals (PETA) who contended that Naruto, the monkey, was the rightful owner of the photographs, and thus the photographer was infringing on Naruto’s copyright.

However, the Court of Appeals for the 9th Circuit ruled that nonhuman entities are not eligible for copyright protection. This decision aligned with the US Copyright Office’s definition of an “original work”, which explicitly requires a “human author” to be involved.

This might be the way forward for generative AI as well. Instead of following the European path, the US should stick with its current copyright laws, while generative AI companies adopt techniques to incentivize artists and original authors. This would encourage AI innovation with sufficient regulation, not excessive government control.

The post Is AI Copyright Really Necessary? appeared first on Analytics India Magazine.

5 ChatGPT Features to Boost your Daily Work


ChatGPT has completely changed the way we develop code. However, most software developers and data professionals still do not use it to improve and ease their daily work.

This is why I am outlining here 5 different features to increase both velocity and quality in your daily work.

You can use them in your day-to-day work. Let’s discover together how 👇🏻

⚠️ Warning: You should never use critical code or information with ChatGPT.

#1. Generate your Coding Skeleton

ChatGPT is my secret weapon when it comes to building new projects from scratch. With just a few prompts, it can generate a skeleton of the code I need, complete with my chosen technologies, frameworks, and versions. Not only does it save me at least an hour of work each time, it also helps keep my documentation, and my team’s, organized and up-to-date. It’s like having an extra set of hands on the team!

For instance, imagine I want to use some external API. I can ask ChatGPT to draft an initial code structure for me.

Generate boilerplate code that integrates an external API using Python.

And ChatGPT answers right away:

Screenshot ChatGPT chat. ChatGPT gives me a code skeleton.
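To give an idea of what such boilerplate tends to look like, here is a minimal sketch of my own (not ChatGPT's actual output); the client class, the endpoint, and the API key are all hypothetical placeholders.

```python
# Sketch of API-integration boilerplate; base URL and key are placeholders.
import json
import urllib.parse
import urllib.request


class ApiClient:
    """Tiny wrapper around a hypothetical REST API."""

    def __init__(self, base_url, api_key=None):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def build_url(self, endpoint, **params):
        # Compose the full request URL with query parameters.
        query = urllib.parse.urlencode(params)
        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        return f"{url}?{query}" if query else url

    def get(self, endpoint, **params):
        # Perform a GET request and decode the JSON response.
        request = urllib.request.Request(self.build_url(endpoint, **params))
        if self.api_key:
            request.add_header("Authorization", f"Bearer {self.api_key}")
        with urllib.request.urlopen(request) as response:
            return json.load(response)


client = ApiClient("https://api.example.com", api_key="YOUR_KEY")
print(client.build_url("users", page=1))  # https://api.example.com/users?page=1
```

From here you would only have to fill in the endpoints and error handling for the actual API you chose.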

Or I can even ask for a skeleton to scrape LinkedIn using Selenium, and ChatGPT gives me the following output.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Initialize the web driver
driver = webdriver.Firefox()

# Navigate to LinkedIn
driver.get("https://www.linkedin.com/")

# Find the email and password input fields
email = driver.find_element(By.ID, "username")
password = driver.find_element(By.ID, "password")

# Enter your credentials
email.send_keys("YOUR_EMAIL")
password.send_keys("YOUR_PASSWORD")

# Click the login button
driver.find_element(By.XPATH, "//button[@type='submit']").click()

# Wait for the page to load
time.sleep(5)

# Navigate to the profile of the desired user
driver.get("https://www.linkedin.com/in/USER_PROFILE_NAME")

# Extract the information you want from the page
name = driver.find_element(By.XPATH, "//span[@class='inline t-24 t-black t-normal break-words']").text
location = driver.find_element(By.XPATH, "//span[@class='t-16 t-black t-normal inline-block']").text

print(name)
print(location)

# Close the browser
driver.quit()

Easy, right? Try to challenge ChatGPT with any project you can imagine.

#2. Research and Compare

Making decisions on how to implement something can be tough, especially when there are multiple options to choose from. My go-to method is to create a basic proof of concept for each approach and then compare them. But, with the help of ChatGPT, this process just got a lot easier.

I can now directly ask it for its expert opinion on which option or library is best for my code development. This saves me time and effort in the decision-making process and ensures that I am using the best tools for the job.

Let’s imagine I want to work with geospatial data but I am not sure whether I should use Geopandas or Plotly. I can ask ChatGPT to compare them for me, typo included 😉, and it answers right away with the main differences between the two libraries.

Screenshot ChatGPT chat. ChatGPT explains to me the differences between geopandas and plotly.

If I now want to scrape a website, I can ask which library is best for the job. ChatGPT answers with the most popular web-scraping libraries in Python.

Screenshot ChatGPT chat. ChatGPT lists the most popular web-scraping libraries in Python.

You can even ask what’s the best option for the specific website you want to scrape, even though ChatGPT will most likely warn you that scraping may be against that website’s content policy, so just be careful.

What’s the best option to scrape a social network?

Screenshot ChatGPT chat. ChatGPT explains the best option to scrape a social network.
#3. Understanding Code

We’ve all been there, struggling to understand a codebase that wasn’t created by us. Navigating through a complex and poorly-organized code — also known as spaghetti code — can be a frustrating and time-consuming task.

But, with ChatGPT, understanding a new codebase just got a lot easier. I can now simply ask it to explain the functionality of the code and understand it in no time. No more wasting valuable time and effort trying to decipher poorly-written code.

Let’s imagine I am trying to scrape LinkedIn, and I have found some random code on the internet that is supposed to scroll down the LinkedIn job offers page.

What does the following code do? [insert code here]

# We find how many jobs are offered.
jobs_num = driver.find_element(By.CSS_SELECTOR, "h1>span").get_attribute("innerText")
if len(jobs_num.split(',')) > 1:
    jobs_num = int(jobs_num.split(',')[0]) * 1000
else:
    jobs_num = int(jobs_num)

jobs_num = int(jobs_num)

# Here I manually choose a number of jobs, so it won't take that long:
jobs_num = 1000

# We create a while loop to browse all jobs.
i = 2
while i <= int(jobs_num / 2) + 1:
    # We keep scrolling down to the end of the view.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    i = i + 1
    print("Current at: ", i, "Percentage at: ", ((i + 1) / (int(jobs_num / 2) + 1)) * 100, "%", end="\r")
    try:
        # We try to click on the "Load more results" button in case it is already displayed.
        infinite_scroller_button = driver.find_element(By.XPATH, ".//button[@aria-label='Load more results']")
        infinite_scroller_button.click()
        time.sleep(0.1)
    except:
        # If there is no button, there will be an error, so we keep scrolling down.
        time.sleep(0.1)
        pass

And ChatGPT answers right away with a really detailed explanation of the code above. You can even ask for more details on a given part of the code. Just play a bit with ChatGPT and you’ll discover everything it can do.

Screenshot ChatGPT chat. ChatGPT explains the code I didn’t understand.
#4. Adding Comments to our Code

ChatGPT can also help us to improve the quality and maintainability of our own code. By asking it to add comments line by line, we can ensure that our code is properly documented before shipping. This eliminates the need for uncommented code and makes it easier for others to understand and work with our code.

ChatGPT can significantly improve readability, maintainability and collaboration with others.

Let’s imagine the previous code had no comments at all. We can just ask ChatGPT to add comments directly to it.

Add comments to the following code [insert code here]

# Find the number of jobs offered on the website
jobs_num = driver.find_element(By.CSS_SELECTOR, "h1>span").get_attribute("innerText")

# Check if the number of jobs contains a ',' and, if so, convert it to number * 1000
if len(jobs_num.split(',')) > 1:
    jobs_num = int(jobs_num.split(',')[0]) * 1000
else:
    jobs_num = int(jobs_num)

# Convert jobs_num to an integer
jobs_num = int(jobs_num)

# Set the number of jobs to scrape to 1000
jobs_num = 1000

# Initialize a counter variable to scroll through the jobs
i = 2

# Start a while loop to scroll through the jobs
while i <= int(jobs_num / 2) + 1:
    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Increment the counter variable
    i = i + 1
    # Print the current number of jobs scrolled through and the percentage of completion
    print("Current at: ", i, "Percentage at: ", ((i + 1) / (int(jobs_num / 2) + 1)) * 100, "%", end="\r")
    try:
        # Try to locate the "Load more results" button and click on it
        infinite_scroller_button = driver.find_element(By.XPATH, ".//button[@aria-label='Load more results']")
        infinite_scroller_button.click()
        # Sleep for 0.1 seconds
        time.sleep(0.1)
    except:
        # If the button is not present, sleep for 0.1 seconds and continue scrolling
        time.sleep(0.1)
        pass

Impressive right? No more code without comments! 😀

#5. Rewriting our Code using some Style

ChatGPT is not only a valuable tool for understanding unfamiliar code, but it can also help us to ensure our own code follows industry standards and conventions. By asking it to correct our code to conform with the Pep-8 convention — or even create a custom convention for our coding style — we can avoid the need for costly and time-consuming refactoring when merging code from different repos or teams.

This helps to streamline the collaboration process and make it more efficient. Overall, ChatGPT is a versatile tool that can improve the quality and maintainability of our codebase.

If we ask ChatGPT to rewrite the previous code using the Pep-8 standard, it directly gives us the refactored code.

Can you rewrite the following code using Pep8 standard [Insert code here]

Screenshot ChatGPT chat. ChatGPT returns our code following the Pep8 standard.
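As a toy illustration of the kind of cleanup such a rewrite applies (my own example, not ChatGPT's actual output), take the jobs-counter logic from the code above; the function name below is one I made up:

```python
# Before (cramped one-liner, camel-case name, no docstring):
# def JobsNum(x):return int(x.split(',')[0])*1000 if ',' in x else int(x)

def parse_jobs_count(raw_text):
    """Convert the jobs counter text (e.g. '1,000') to an integer."""
    if "," in raw_text:
        # '1,000' style counters: keep the leading figure and scale it.
        return int(raw_text.split(",")[0]) * 1000
    return int(raw_text)

print(parse_jobs_count("1,000"))  # 1000
print(parse_jobs_count("250"))   # 250
```

Descriptive naming, spacing, and a docstring are exactly the kind of changes a Pep-8 pass makes without touching behaviour.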
Main Conclusion

I hope that after this article you realize ChatGPT can help us be more productive and produce even higher-quality output. I know it can be easy to fall into the trap of thinking that AI may eventually take over our jobs, but the right kind of AI can be a powerful asset working on our behalf.

However, it’s important to remember that critical thinking is still key when working with AI, just like it is when working with our human colleagues.

So, before you rush to implement AI-generated responses, make sure to take the time to review and assess them first. Trust me, it’s worth it in the end!

Let me know if ChatGPT surprises you with some other good features. I’ll see you in the comments! 😀

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the Data Science field applied to human mobility. He is a part-time content creator focused on data science and technology. You can contact him on LinkedIn, Twitter or Medium.

Original. Reposted with permission.


An Intriguing Job Interview Question for AI/ML Professionals

In my last project, I had to come up with code and an algorithm to solve an interesting problem. I realized it could lead to an off-the-beaten-path job interview question. The problem is a fundamental one. Its level ranges from elementary school to one of the most difficult unsolved problems of all time, depending on how deep you dig into it. It is a question that ChatGPT could not invent, at least as of today. And definitely not one it could answer, even if you were allowed to use the app in a job interview.

Now I discuss my interview question in detail, with four possible difficulty levels.

Level 1: an elementary school problem

The question could be: what is the value of 327 multiplied by 1149? You need to explain why your computation works, and offer an alternative, hopefully faster, method. With these extra requirements, it quickly becomes a question for high school students, or a warm-up question in a real job interview. In 50 years, it could become a very challenging question, if nobody learns multiplication tables anymore.

An abacus: this tool can be used to solve the level 1 problem.
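For a sketch of one schoolbook answer (my own illustration), here is digit-by-digit long multiplication: multiply by one digit of the second factor at a time and accumulate the shifted partial products.

```python
# The schoolbook (grade-school) method for the level 1 question.
def long_multiply(a, b):
    """Multiply two non-negative integers digit by digit, as done by hand."""
    result = 0
    for position, digit in enumerate(reversed(str(b))):
        partial = a * int(digit)           # one row of the schoolbook layout
        result += partial * 10 ** position  # shift by the digit's place value
    return result

print(long_multiply(327, 1149))  # 375723
```

Explaining why this works (each digit contributes its place value, and multiplication distributes over the resulting sum) is the part that turns an elementary computation into an interview answer.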

Level 2: a job interview question

At this level, it is a tech question of average difficulty in a typical job interview. The question asks you to write code solving this problem: given a number X and two positive integers p and q, compute the binary digits of the products pX and qX. It must work even if X has billions of digits. In addition, compute the correlation between the first n digits of pX and qX, for large n, assuming the digits of X are random. Do you notice a pattern? I provide the solution later in this article, along with the reason I am interested in this problem.

Level 3: the topic of a PhD thesis

I break down the problem into a number of steps. By digits, I mean the digits starting after the decimal point.

  • Prove that the answer to the level 2 question, regarding the correlation, is 1/(pq). This holds assuming that p and q are co-prime odd integers.
  • Assuming that the binary digits of SQRT(2) and SQRT(3) behave exactly like random bits, use the above result to prove that the corresponding two digit sequences are not correlated. For SQRT(12) and SQRT(75), the correlation is 1/10. Generalize. Illustrate with numerical computations involving one trillion digits, showing convergence of the correlation to zero in the first example, and to 1/10 in the second.
  • Find a very fast algorithm to compute the binary digits of square roots of non-square integers. First, look at the integer square root concept, and Python libraries such as gmpy2. Then, see if you can do better. I actually have a solution to this problem.
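
To illustrate the integer square root route mentioned in the last step, here is a minimal sketch of my own, using only the standard library's math.isqrt rather than gmpy2; it is a starting point, not the fast algorithm the exercise asks for.

```python
# First k binary digits (after the point) of SQRT(n), without floating point.
import math

def sqrt_binary_digits(n, k):
    """Return the first k binary digits of sqrt(n) after the point."""
    # isqrt(n * 4**k) equals floor(2**k * sqrt(n)); its k lowest bits are
    # exactly the first k fractional binary digits of sqrt(n).
    scaled = math.isqrt(n * 4 ** k)
    return [(scaled >> (k - 1 - i)) & 1 for i in range(k)]

print(sqrt_binary_digits(2, 8))  # [0, 1, 1, 0, 1, 0, 1, 0]
```

Because math.isqrt works on arbitrary-size integers, the same sketch runs for thousands of digits; the challenge is making it fast for billions.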

Level 4: to win the Nobel prize in mathematics

According to legend, there is no Nobel prize for math because a mathematician was carrying on an affair with Alfred Nobel’s wife. You may double-check on Wikipedia or ChatGPT. That said, there is the equally prestigious Fields Medal. You will win it if you prove that the digits of SQRT(2) and SQRT(3) are indeed uncorrelated, or more specifically, that these digits cannot be distinguished from pure random noise. This is level 4 of my question. Countless people, including top geniuses, have worked on this problem for decades, to no avail. It is safe to say there will be human beings walking on Mars before this problem gets solved, if ever.

I have been working on it for well over a decade, on occasion sharing progress with top experts in the field such as David Bailey. I am still nowhere close to a solution.

Solution to the level 2 problem

The computation of the digits in question is a main component of my new money game. You can check it out in my article “Synthetic Stock Exchange Played with Real Money”, available here. The code below computes the binary digits of the products pX and qX, and their cross-correlation, for a random number X. In short, it answers the level 2 question. The algorithm in question is known as grade-school multiplication.

# Compute binary digits of X, p*X, q*X backwards (assuming X is random)
# Only digits after the decimal point (on the right) are computed
# Compute correlations between digits of p*X and q*X
# Include carry-over when performing grammar school multiplication

import numpy as np

# main parameters
seed = 105
np.random.seed(seed)
kmax = 1000000
p = 5
q = 3

# local variables
X, pX, qX = 0, 0, 0
d1, d2, e1, e2 = 0, 0, 0, 0
prod, count = 0, 0

# loop over digits in reverse order
for k in range(kmax):

    b = np.random.randint(0, 2)  # digit of X
    X = b + X/2

    c1 = p*b
    old_e1 = e1
    d1 = (c1 + old_e1//2) % 2  # digit of pX
    e1 = (old_e1//2) + c1 - d1
    pX = d1 + pX/2

    c2 = q*b
    old_e2 = e2
    d2 = (c2 + old_e2//2) % 2  # digit of qX
    e2 = (old_e2//2) + c2 - d2
    qX = d2 + qX/2

    prod += d1*d2
    count += 1
    correl = 4*prod/count - 1

    if k % 10000 == 0:
        print("k = %7d, correl = %7.4f" % (k, correl))

print("p = %3d, q = %3d" % (p, q))
print("X = %12.9f, pX  = %12.9f, qX  = %12.9f" % (X, pX, qX))
print("X = %12.9f, p*X = %12.9f, q*X = %12.9f" % (X, p*X, q*X))
print("Correl = %7.4f, 1/(p*q) = %7.4f" % (correl, 1/(p*q)))

About the Author


Vincent Granville is a pioneering data scientist and machine learning expert, founder of MLTechniques.com and co-founder of Data Science Central (acquired by TechTarget in 2020), former VC-funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, InfoSpace. Vincent is also a former post-doc at Cambridge University, and the National Institute of Statistical Sciences (NISS).

Vincent published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of “Intuitive Machine Learning and Explainable AI”, available here. He lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory.

A Beginner’s Guide to Anomaly Detection Techniques in Data Science


Anomaly detection is a very important task that you will eventually meet if you work with data. It is widely applied in many fields, such as manufacturing, finance, and cybersecurity.

Getting started with this topic on your own can be challenging without a guide that orients you step by step. In my first experience as a data scientist, I remember struggling a lot to master this discipline.

First of all, anomaly detection involves identifying rare observations whose values deviate drastically from the rest of the data points. These anomalies, often called outliers, are a minority, while most of the items belong to the normal class. This means we are dealing with an imbalanced dataset.

Another challenge is that, most of the time, there is no labelled data when working in industry, and it is hard to interpret predictions without any target. This means you cannot use the evaluation metrics typically used for classification models, and you need other methods to interpret and trust your model’s output. Let’s get started!

What is Anomaly Detection?

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These nonconforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants in different application domains. Credit Anomaly Detection: A Survey

This is a good definition of anomaly detection in a few words. Anomalies are often associated with errors introduced during data collection, and then end up being eliminated. But there are also cases where new items show completely different variability compared to the rest of the data, and appropriate approaches are needed to recognize this type of observation. Identifying these observations can be very useful for decision-making in companies operating in many sectors, such as finance and manufacturing.

What are the Types of Anomalies?

There are three main types of anomalies: point anomalies, contextual anomalies and collective anomalies.

Example of point anomaly. Illustration by Author.

As you may deduce, point anomalies constitute the simplest case. A point anomaly occurs when a single observation is anomalous compared to the rest of the data, so it is identified as an outlier/anomaly. For example, suppose we want to detect credit card fraud in the transactions of a bank’s clients. In that case, a point anomaly could be a client’s fraudulent transaction.
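
A minimal sketch of this idea (my own toy example, with an illustrative threshold and made-up transaction amounts) is the classic z-score rule: flag any value that lies too many standard deviations from the mean.

```python
# Toy point-anomaly detector: flag values far from the mean in z-score terms.
from statistics import mean, stdev

def point_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; the last one is the fraudulent spike.
transactions = [12.0, 9.5, 11.2, 10.8, 9.9, 10.4, 250.0]
print(point_anomalies(transactions))  # [250.0]
```

Real fraud detection uses far richer features, but the principle is the same: a single observation that deviates drastically from the rest.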

Example of contextual anomaly. Credit EPA. Modified by Author.

Another case is the contextual anomaly, which is anomalous only in a specific context. An example is the summer heat waves in the United States: you can notice a huge spike in the 1930s, representing an extreme event called the Dust Bowl, a period of dust storms that damaged the south-central United States.

Example of collective anomaly. Illustration by Author.

The third and last type is the collective anomaly. The most intuitive example is the months-long absence of precipitation Italy is experiencing this year. Comparing against the last 50 years of data, there has never been similar behaviour. The individual data instances in an anomalous collection may not be outliers by themselves, but all these data points together indicate a collective anomaly. In this context, a single day without precipitation is not anomalous on its own, while many consecutive days without precipitation can be considered anomalous compared to previous years’ data.

What Machine Learning Models can be used for Anomaly Detection?

There are several approaches that can be applied to anomaly detection:

  1. Isolation Forest is an unsupervised and non-parametric technique introduced by Fei Tony Liu and co-authors in 2008. Like the random forest, it is an ensemble method that trains decision trees in parallel. But unlike other ensemble methods, it specializes in isolating anomalies from the rest of the items. Two assumptions explain its effectiveness: (1) anomalies are a minority class compared to the more numerous normal data; (2) anomalies tend to be isolated quickly, with the shortest average path lengths in the trees.
  2. Local Outlier Factor is a density-based algorithm proposed by Markus M. Breunig and co-authors in 2000. It detects anomalies by computing the local density deviation of a given item with respect to its neighbours, assuming that the density around an anomaly differs significantly from the density around its neighbours. Moreover, outliers should have a lower local density.
  3. An autoencoder is an unsupervised model composed of two neural networks, an encoder and a decoder. During training, only normal data is passed to the model. In this way, it learns a compressed representation of normal data, which is supposed to differ from the representation of outliers. The assumption is that anomalous data cannot be reconstructed well by the model, since it is completely different from normal data, and should therefore have a higher reconstruction error.
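
The isolation idea behind the first approach can be made concrete with a from-scratch sketch (my own one-dimensional toy, not the actual Isolation Forest algorithm): anomalies get separated from the rest by random splits in fewer steps than normal points do.

```python
# Toy demonstration of the isolation principle in one dimension.
import random

def isolation_depth(point, data, trees=200, max_depth=50, seed=42):
    """Average number of random splits needed to isolate `point` in 1-D data."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trees):
        subset = list(data)
        depth = 0
        while len(subset) > 1 and depth < max_depth:
            lo, hi = min(subset), max(subset)
            if lo == hi:
                break
            split = rng.uniform(lo, hi)
            # Keep only the side of the split that contains `point`.
            subset = [x for x in subset if (x < split) == (point < split)]
            depth += 1
        total += depth
    return total / trees

data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]  # 25.0 is the anomaly
# The anomaly is isolated in noticeably fewer random splits, on average.
print(isolation_depth(25.0, data) < isolation_depth(10.0, data))  # True
```

The real algorithm builds randomized trees over many features and converts path lengths into an anomaly score, but the intuition is exactly this shorter average path for outliers.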

How can I Evaluate an Anomaly Detection Model in an Unsupervised Setting?

In an unsupervised setting, there are no evaluation metrics to tell you the rate of correct positive predictions (precision) or the rate of actual positives recovered (recall).

Without any possibility of evaluating the performance of the model, it’s more important than ever to provide an explanation of model predictions. This can be achieved by using interpretability approaches, like SHAP and LIME.

There are two possible interpretations: global and local. The aim of global interpretability is to provide explanations of the model as a whole, while the local interpretability aims at explaining the model prediction of a single instance.

Final Thoughts

I hope you found this quick overview of anomaly detection techniques useful. As you have noticed, it is a challenging problem, and the suitable technique changes depending on the context. I should also highlight that it is important to do some exploratory analysis before applying any anomaly detection model, such as PCA to visualize the data in a lower-dimensional space, and boxplots. If you want to go deeper, check the resources below. Thanks for reading! Have a nice day!

Resources

  • Anomaly Detection: A Survey by V. Chandola
  • Isolation Forest’s paper
  • Paper Review: Reconstruction by inpainting for visual anomaly detection
  • SHAP’s paper
  • LIME’s paper

Eugenia Anello is currently a research fellow at the Department of Information Engineering of the University of Padova, Italy. Her research project is focused on Continual Learning combined with Anomaly Detection.
