Meta’s Ray-Ban smart glasses can now record up to 3 times longer videos

Your Ray-Ban Meta Smart Glasses are no longer limited to brief one-minute video recordings. Rolling out to all owners of the $299 smart glasses, version 6.0 of the software will let you record a video for as long as three minutes, as first reported by Digital Trends.

Outfitted with a 12-megapixel ultra-wide camera and five microphones, the Ray-Ban smart glasses provide a hands-free, voice-activated way to shoot videos in 1080p at a resolution of 1,440 x 1,920. With the one-minute cap, your options for recording extended scenes were limited. Now, you'll be able to capture longer videos suitable for sharing on Facebook, Instagram, TikTok, and other social media platforms.

There are a couple of hurdles if you do decide to record a lot of three-minute videos, according to Digital Trends. First, the default recording time is still set to one minute. If you want more time, you'll have to adjust it via the camera section in settings. Second, three-minute videos are going to chew up more of the battery, so you may have to pop your glasses in the charging case more frequently.

There's more in the latest software version. As promised in May, Amazon Music has joined Apple Music and Spotify as one of the integrated streaming audio services that you can listen to on the Ray-Bans. To rev up some music, you can either tap and hold on the stem of the glasses or request a specific service through your voice, as in "Hey Meta, play Amazon Music."

When you need some rest and relaxation, you can now turn to the Calm app for guided meditation and mindfulness sessions. If you don't already subscribe to Calm, you can unlock a free three-month subscription to get started. To trigger Calm in hands-free mode, just say: "Hey Meta, play the daily Calm."

Meta's smart glasses will install version 6.0 automatically if you've enabled the auto-update option. Otherwise, you'll need to manually download and install the new version. To do that, open the Meta View app on your phone and tap the Settings icon. Tap "Your glasses" or select the device you want to manage and then tap "Your glasses." Tap Updates to grab the latest version.

Why is Databricks Betting on Data Engineering Over AI Magic?

In response to significant demand from its customers, Databricks is intensifying its efforts in data engineering. According to CEO and co-founder Ali Ghodsi, the company initially perceived AI as the primary area of interest. However, customer feedback guided it to prioritise data integration, leading to the acquisition of Arcion and its subsequent integration into Databricks.

“Two years ago, at the CIO Forum, we asked our customers what they wanted most from Databricks, and the majority expressed a need for easier data integration,” Ghodsi said in an exclusive conversation with AIM.

“Now, customers can seamlessly integrate data from sources like Salesforce, Workday, Google Analytics, SQL Server, MySQL, and Postgres into Databricks. This strategic move aligns with our customer’s needs and has the potential to significantly impact our financial performance,” he explained.

Databricks AI summit is showing me how fast data engineering is changing.
Platforms for monitoring, compute, storage, etc will speed the operational side up!
LLMs and generative AI will speed up the development side!
Exciting times to be alive! #dataengineering pic.twitter.com/PVnDNTjLRX

— Zach Wilson (@EcZachly) June 27, 2023

Speaking to AIM, Nick Eayrs, vice president – field engineering APJ, Databricks, explained that the emphasis on data engineering over AI is essential for building the solid data foundation necessary for effective AI implementation.

He highlighted a collaborative approach involving data-literate C-suite executives who work closely with data engineers to source and enrich data.

“We need more enriched data to kind of answer the problem space, there’ll be able to then give that on to analysts in the same sort of environment to then go and explore and visualise and, you know, double click into the data to see if they can find something interesting in the data, and then kind of resurface that back to the C level very, very seamlessly,” he pointed out.

Eayrs described a comprehensive process where analysts explore and visualise data to uncover valuable patterns. These insights are then communicated back to the executives, fostering a team-oriented approach.

Advanced data platforms facilitate real-time collaboration and sharing, ensuring that the entire organisation can leverage data effectively.

Databricks LakeFlow

Databricks LakeFlow is a new solution designed to unify and streamline data engineering from ingestion through transformation and orchestration. With LakeFlow, data teams can efficiently ingest data from various databases.

Introducing Databricks LakeFlow, a new solution that simplifies all aspects of data engineering through a simple, unified & intelligent experience. Easily build production-grade data pipelines to scale to the ever-growing demand for reliable #data & #AI. https://t.co/8B2P067gW0 pic.twitter.com/EinhCzhEE3

— Databricks (@databricks) June 13, 2024

LakeFlow automates the deployment, operation, and monitoring of pipelines at scale, featuring built-in CI/CD support and advanced workflows with capabilities such as triggering, branching, and conditional execution.

It incorporates data quality checks and health monitoring, integrated with alerting systems like PagerDuty. LakeFlow simplifies the construction and management of production-grade data pipelines, empowering data teams to meet the increasing demand for reliable data and AI solutions.

Convergence of AI and Data Engineering

Data engineering ensures that data is clean, complete, and reliable, as AI models rely heavily on high-quality and accurate data to function correctly.

A year ago, there was a discussion about when AI would be able to make sense of and take ownership over the mountains of SQL data that engineers and analysts have been accumulating for years.

Maxime Beauchemin, the CEO & founder of Preset and a pioneering data engineer who created Apache Airflow and Apache Superset, humorously commented on a concept from both AI and software development: “But what happens when AI can create spaghetti SQL faster than all of us!? That’s when we reach the spaghetti SQL singularity. Infinite happiness ensues.”

At the Data + AI Summit, Databricks announced several innovations for the Mosaic AI platform to help customers build production-quality generative AI applications. The focus is on supporting compound AI systems, improving model quality, and introducing new AI governance tools.

Further, it introduced Shutterstock ImageAI, an AI tool for advanced image analysis that integrates seamlessly into business workflows.

Additionally, Databricks unveiled Databricks AI/BI, an intelligent analytics platform featuring AI-powered dashboards and a conversational interface, Genie, for natural language queries. This platform aims to make data analytics accessible to all organisational levels without needing specialised knowledge.

The post Why is Databricks Betting on Data Engineering Over AI Magic? appeared first on AIM.

How Does NetApp Use Half of the World’s Data in AI?

As industry stalwarts like Matei Zaharia indicate that “the next promise in AI is in domain-specific enterprise use cases,” NetApp, a data management and storage provider, finds itself well-positioned to capitalise on this trend.

At the recent NVIDIA GTC event, CEO Jensen Huang highlighted NetApp’s role, stating “nearly half of the world’s files are stored on-premises on NetApp’s platform”.

With 76% of global tech companies and 70% of Indian enterprises having AI initiatives underway, outpacing the 49% global average, NetApp’s storage footprint could prove invaluable.

Notably, 91% of Indian companies plan to leverage over half their data to train AI models by 2024. NetApp is banking on this pace in the Indian market. However, its leadership isn’t surprised.

“India has always been strong on the tech side,” said Puneet Gupta, NetApp MD for India and SAARC, in an exclusive interaction with AIM.

He pointed out that India’s vast datasets, such as those generated by initiatives like UPI and Aadhaar, are crucial assets. He also said that India has the ingredients for AI success.

India, NetApp’s fourth-largest APAC market, has the potential to become the largest in the next few years, according to Gupta.

Globally, with a $26.48 billion market cap and Q4 FY2024 revenue of $1.668 billion, up 5.1% year-over-year, NetApp is poised for growth driven by all-flash arrays, cloud services, partnerships, and enterprise AI adoption, if executed well.

Partnering With NVIDIA to Be Enterprise’s Go-to Choice

NetApp has partnered with NVIDIA to advance RAG for generative AI applications in the enterprise. The partnership allows integration of NVIDIA’s NeMo Retriever with NetApp’s ONTAP storage, enabling LLMs to securely access the vast amounts of enterprise data stored on NetApp without compromising privacy and security.

Enterprises can leverage their existing data assets on NetApp to “talk to their data” through simple prompts and gain insights for generative AI, without the need to move data.

“The work that NVIDIA is doing in the AI space is remarkable,” Gupta said. “While they focus on building compute and server farms, we provide high-performance storage to make the whole solution work effectively,” he added.

Moreover, the partnership complements NetApp’s existing AI services, which have been used by over 500 joint customers for AI model training and inference.

NetApp has also worked with NVIDIA to update its FlexPod AI converged infrastructure to support NVIDIA AI Enterprise software and was one of the first partners to complete storage validation for NVIDIA OVX systems.

This collaboration is significant because it allows enterprises to safely use their proprietary data within LLMs without the risk of data leaks or privacy concerns. It reduces friction, cost, and time to value for RAG by enabling access to data wherever it is stored.

Gupta also elaborated on the partnership with Cisco, particularly in the Indian context.

“We have a strong partnership with Cisco globally and in India. They work with NVIDIA as well, and we fit into the ecosystem by providing the necessary high-performing storage to complete the AI story,” Gupta explained.

Unified Data Management

Another critical challenge in data management has been the proliferation of disparate storage systems for different data types, such as blocks, files, and objects. NetApp addresses this issue by unifying all data services on a single platform, simplifying storage infrastructure and enhancing efficiency.

Shuja Mirza, NetApp’s India/SAARC director of solutions engineering, explained, “NetApp is doing its bit by ensuring that you can run all these data services from a single platform. So unifying all of it—objects, files, blocks, structured, unstructured—you know, the ability to store data and data lakes, put it to good use through modern workloads like Spark, Hadoop, etc.”

This consolidation supports modern workloads like Spark and Hadoop, which are vital for handling large-scale unstructured data.

Mirza emphasised NetApp’s commitment to providing customers with choice and flexibility, stating, “Our job as a technology provider is to make sure that we provide you with that choice. And, whichever format you’re getting data in, we’ll help you store and manage it efficiently, and make sure it is available to you all the time.”

Features like NetApp FlexClones allow data scientists to instantly create writable copies of datasets for experimentation without consuming additional storage.

NetApp also enables data governance frameworks to manage privacy, compliance, and access controls, which is vital when dealing with sensitive data used for training LLMs.

More Efficient Data Centers

With India’s data centre industry booming to accommodate AI needs, including investments in GPUs and larger racks, Gupta sees significant opportunities for NetApp.

“The infrastructure needed to support the generative AI wave includes both large-scale cloud provider setups and enterprise-specific builds. NetApp aims to participate in both segments, leveraging our expertise to manage data efficiently across various platforms,” he stated.

NetApp could also provide storage efficiencies and a secure platform, including ransomware recovery guarantees.

“AI, especially generative AI, is all about having large datasets and creating copies of these data sets,” Mirza said, adding that “NetApp’s solutions help reduce these copies through snapshots, thus providing storage efficiencies and a secure platform.”

Mirza elaborated, “Typically, in these projects, the data scientists create environments. Each environment has got its own copy. So the volumes are large. With NetApp FlexClones it becomes easy and efficient by storing those copies through snapshots.”

This approach not only ensures efficiency but also secures data management, which is critical in AI projects. “Storing data is the tip of the iceberg. But if we were not to really do a good job of managing and securing it, then there is always a chance of breach and influence,” he asserted.

Even in the event of a breach, NetApp provides a 99.999999 guarantee with a recovery time of a few seconds.

The post How Does NetApp Use Half of the World’s Data in AI? appeared first on AIM.

Go to University from Home with These Online Degrees

Growing up I never thought there would be a time when people could get degrees from home. I heard about Open University and other similar platforms, but now I am seeing it become more and more available.

Getting a degree in this day and age is hard. People have to sacrifice a lot, whether that means finances or finding somebody to look after loved ones such as children. Online degrees are an amazing alternative where you can continue your learning journey in the comfort of your own home.

So for those of you who are holding back from getting into the tech industry because you cannot make those sacrifices — I’m here to help!

BSc Computer Science

Link: BSc Computer Science

The BSc Computer Science course offered by the University of London and Goldsmiths University is delivered 100% online, so you can get hands-on learning from anywhere around the world. It is an accredited Bachelor's degree, offering you flexibility with a study schedule that matches your commitments. The Bachelor's degree consists of 23 courses, which will take you 14-28 hours per week. This is 36 to 72 months altogether.

In this degree, you will:

  • Perfect your choice of programming languages such as Python, C++, C#, and JavaScript.
  • Build your knowledge and skills in a practical, project-based learning environment where you’ll get to develop your own software
  • Choose a specialism to match your career needs, from machine learning and AI, data science, web and mobile development, UX and more
  • Specialise in 1 of 7 cutting-edge topics: ML and AI, data science, web and mobile development, physical computing and IoT, game development, VR, or UX.
  • Create a portfolio of practical research and applications that can be used to demonstrate your expertise and communicate your worth to employers and investors

BSc Data Science & Artificial Intelligence

Link: BSc Data Science & Artificial Intelligence

The BSc Data Science & Artificial Intelligence course offered by the Indian Institute of Technology Guwahati is a recognised Bachelor's Honours degree and is 100% online allowing you to get hands-on learning from anywhere, including online classes & examinations, as well as optional campus immersions. If you commit 18-20 hours per week, the degree can be completed in 4 to 8 years.

8 years sounds a bit terrifying, right? However, this online degree offers flexible exit options at the end of each year, shaping your education in a way that aligns with your career aspirations. You can exit with a Foundational Certificate in Data Science & AI at the end of the first year, a Diploma in Data Science & AI at the end of the second year, receive your Bachelor of Science degree upon completing all the courses in the third year, or earn an honours degree at the end of the fourth year.

In this degree, you will:

  • Prepare yourself for the evolving tech landscape
  • Focus on application-based learning through capstone projects, internships and career-boosting certificates
  • Learn how to tackle real-world problems with the capstone project
  • Expose yourself to over 50 programming languages, tools, libraries and repositories.
  • Develop machine-learning systems and integrate them with large-scale AI models under the supervision of industry leaders.

Wrapping Up

This article aimed to provide hope for those who have been hesitant to take their data science career seriously because of the constraints or fear of having to go back to university.

It is never too late to chase your dreams!

Nisha Arya is a data scientist, freelance technical writer, and an editor and community manager for KDnuggets. She is particularly interested in providing data science career advice or tutorials and theory-based knowledge around data science. Nisha covers a wide range of topics and wishes to explore the different ways artificial intelligence can benefit the longevity of human life. A keen learner, Nisha seeks to broaden her tech knowledge and writing skills, while helping guide others.

Air India is Experimenting with GPT-4 Omni

Air India is probably undertaking the biggest corporate restructuring the world has ever seen. Since being acquired by the Tata Group, the troubled carrier has strategically utilised technology to enhance its services and customer experience.

Prior to the takeover, Air India lagged well behind its competitors in customer service and experience. To change that, the carrier is turning its attention to AI.

Last year, Air India became the first carrier in the world to deploy an LLM-powered chatbot, called Maharaja (now renamed to AI.G). Initially, the chatbot was powered by OpenAI’s GPT3.5, but Viju Chacko, VP – head of digital architecture at Air India, revealed that the company has shifted to GPT-4 to power its AI assistant.

“We were using GPT3.5, but have now transitioned to GPT-4,” Chacko revealed while speaking at GitHub Galaxy 2024, held in Bengaluru.

However, Satya Ramaswamy, chief digital and technology officer at Air India, recently told AIM that the carrier is now experimenting with GPT-4o.

“We are actively working on integrating GPT-4 Omni, and it will have multimodal capabilities,” he said on the sidelines of the Salesforce World Tour Essentials event in Mumbai.

Multimodal is the future

Air India envisions a scenario where AI agents could play an important role in customer service.

“Currently, customer service agents are often occupied with typing into their systems to pull out customer information. Our goal is to automate these tasks with AI,” said Ramaswamy.

He added that the AI agent will pull out all the information so that the human agents can focus completely on engaging with customers in a genuine and empathetic manner without distractions.

Can AI agents book tickets on your behalf?

Nonetheless, Ramaswamy said that the carrier is taking a measured approach when it comes to leveraging AI. Currently, there are thousands of use cases; however, booking a flight ticket is not one of them.

“In our mind, traditional booking is not a compelling use case for the chatbot because there are so many nuances to it. The customer may want to customise the booking and add different options. Hence, it’s better if a human handles it,” he said.

However, he did reveal that they have found an innovative way which will allow customers to book a ticket with just one click. This novel way of flight booking could be live within a year.

For instance, if you want to make a reservation, you can simply type in your request, such as “need to travel from Bombay to Delhi on a particular date”.

“Using your past history and preferences, it will present you with a single screen where you can confirm your choice with a click, and it will automatically proceed to book your ticket,” he added.

Bringing AI to Air India’s mobile app

Moreover, Air India is enhancing its mobile application with advanced AI capabilities. The airline intends to integrate computer vision into the app, allowing users to obtain real-time flight status and additional information about their destination, including weather conditions and other relevant details, with just a single scan of their boarding pass.

AI.G is also available through the mobile app. Earlier this year, Air India launched a WhatsApp version of the chatbot.

The chatbot currently understands four languages: Hindi, English, German, and French. According to Ramaswamy, AI.G will be able to understand more languages, especially Indic languages, in the future.

The post Air India is Experimenting with GPT-4 Omni appeared first on AIM.

Why Perplexity AI Wasn’t Built in India

Starting an AI company is not for the faint-hearted. Even though there is an AI boom and everyone is ready to invest in AI companies, VC ticket sizes everywhere else are minuscule compared to those in Silicon Valley.

The situation is far worse in India. Here, nobody is ready to invest in research and development startups.

Recently, AIM made a ballpark estimate of the cost of building an AI research startup in India. For the seed stage, funding of $5 million or $10 million is a decent amount, but comparatively little for building something like OpenAI, or even Perplexity AI.

To put this in perspective, Perplexity AI had raised $15 million in its seed funding round when it was just a six-month-old company. Besides, Perplexity did not spend time training its own AI model, but instead chose to build a fantastic product around the existing LLMs in the market.

Seeing @perplexity_ai ads in Berlin. When ads in India @AravSrinivas ? pic.twitter.com/ewhwu2oMOo

— Arnav Gupta (@championswimmer) June 20, 2024

For building an estimated 7 billion parameter foundation model out of India, it is said that the cost for the compute alone would be close to $2 million. Accounting for all of these, an ideal amount to do a lot of foundational work in AI at the seed stage is anywhere around $7-15 million.

Assuming a seed fund of $10 million, an AI startup in India can last for around two years without raising any more funds and not running inference. But for Perplexity, the whole point is to provide inference to customers. Plus, it has not trained any models of its own.

So, how would Perplexity fare in India had the founder not moved to the Bay Area to start it?

Building an AI Startup in India is No Joke

When speaking with AIM, Vishnu Ramesh, the founder & CEO of Subtl.ai, a company which is coincidentally also building a ‘private Perplexity for enterprise’, said that several times, he has also evaluated the thought of starting his company in Hyderabad, instead of the US.

Born in the Bay Area, Ramesh came back to India at the age of three.

“The deeper question is why does the Bay Area get so much money when startups come out,” said Ramesh. He explains that it is more to do with the fact that Google, Meta, or OpenAI have come out of the same place, which gives investors a sense of confidence when investing in startups born there. “Once India has its Google moment, I think the scene should change,” he added.

CEOs such as Aravind Srinivas of Perplexity AI are an inspiration for Indians building AI. His vision is to build a product which is in direct competition to Google. Graduating from IIT Madras and leaving OpenAI to start one’s own company surely takes a lot of courage.

One of the amazing things @AravSrinivas has done is convince Indian engineers that they too can create consumer technology for the world.
"I want to work at Google" to
"I want to run Google" to
"I want to build Google"
is an incredible vibe shift.

— Deedy (@deedydas) May 8, 2024

The company is one of the latest to become an AI unicorn, with its last funding round raising $63 million, led by Daniel Gross, NVIDIA, and Jeff Bezos. This points to the fact that even though the seed round was not that high, with the right product, the company was able to make a dent in the market.

This brings us back to what Abhishek Upperwal, the CEO and founder of Soket AI Labs, told AIM. He said that funding scenarios and sizes are just enough to manage for AI research within a startup.

“Yes, there are fewer funds that are available as compared to any foreign markets, but I also believe that we can maybe make-do with that particular fund and then ultimately grow in scale after the seed stage,” said Upperwal.

It is Probably the Right Time to Build in India

For comparison, TWO.AI has released a product called Geniya, which can browse data from the internet using Google. Pranav Mistry, the founder of TWO.AI, said that like Perplexity AI, Geniya is still in public beta and users can try it out, but it has not achieved as much traction as Perplexity yet.

Surprisingly, the company has introduced two products, another being SUTRA, with just $20 million in funding.

Similarly, PAiGPT is another product similar to Perplexity AI’s search. The app’s USP is its ability to fetch real-time information on various topics and current affairs, ideal for UPSC exam preparation. The company is bootstrapped and has invested around $1.2 million to date.

There are several new AI startups researching the field and focusing on building foundational models for AI. Perplexity, for instance, partnered with AI providers to integrate capabilities into its own AI services, which is probably what a lot of Indian AI companies should focus on.

The only thing missing for something like Perplexity AI in India is proof that something built here can reach the world, which is slowly but surely changing with Bengaluru becoming the Silicon Valley of the East.

Moreover, it is also tougher for companies in India to garner support from heavyweights like Jeff Bezos and NVIDIA, which Perplexity had. It is high time for investors to realise that these startups can be built within the country, so that the talent remains within India and builds for India.

The post Why Perplexity AI Wasn’t Built in India appeared first on AIM.

Building Your First ETL Pipeline with Bash

Introduction

ETL, or Extract, Transform, Load, is a necessary data engineering process, which involves extracting data from various sources, converting it into a workable form, and moving it to some destination, such as a database. ETL pipelines automate this process, making sure that data is processed in a consistent and efficient manner, which provides a framework for tasks like data analysis, reporting, and machine learning, and ensures data is clean, reliable, and ready to use.

Bash, short for Bourne-Again Shell, is a widely used Unix shell and a powerful tool for building ETL pipelines. Its simplicity, flexibility, and extremely wide applicability make it an excellent option for novices and seasoned pros alike. Bash scripts can automate tasks, move files around, and talk to other tools on the command line, which makes them a good fit for ETL work. Moreover, Bash is ubiquitous on Unix-like systems (Linux, BSD, macOS, etc.), so it is ready to use on most such systems with no extra work on your part.

This article is intended for beginner and practitioner data scientists and data engineers who are looking to build their first ETL pipeline. It assumes a basic understanding of the command line and aims to provide a practical guide to creating an ETL pipeline using Bash.

The goal of this article is to guide readers through the process of building a basic ETL pipeline using Bash. By the end of the article, readers will have a working understanding of implementing an ETL pipeline that extracts data from a source, transforms it, and loads it into a destination database.

Setting Up Your Environment

Before we begin, ensure you have the following:

  • A Unix-based system (Linux or macOS)
  • Bash shell (usually pre-installed on Unix systems)
  • Basic understanding of command-line operations

For our ETL pipeline, we will need these specific command line tools:

  • curl
  • jq
  • awk
  • sed
  • sqlite3

You can install them using your system's package manager. On a Debian-based system, you can use apt-get:

sudo apt-get install curl jq gawk sed sqlite3

On macOS, awk and sed already ship with the system, so you can install the remaining tools with brew:

brew install curl jq sqlite
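Before moving on, it can help to confirm that every tool is actually on your PATH. The following is a minimal sketch rather than part of the original tutorial; the helper name check_tools is made up for illustration.

```shell
# check_tools (hypothetical helper): print the name of every listed
# tool that cannot be found on PATH.
check_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

missing=$(check_tools curl jq awk sed sqlite3)
if [ -z "$missing" ]; then
  echo "All required tools are installed."
else
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
fi
```

If anything is reported as missing, install it with your package manager and run the check again.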

Let's set up a dedicated directory for our ETL project. Open your terminal and run:

mkdir ~/etl_project
cd ~/etl_project

This creates a new directory called etl_project and navigates into it.

Extracting Data

Data can come from various sources such as APIs, CSV files, or databases. For this tutorial, we'll demonstrate extracting data from a public API and a CSV file.

Let's use curl to fetch data from a public API. For example, we'll extract data from a mock API that provides sample data.

# Fetching data from a public API
curl -o data.json "https://api.example.com/data"

This command will download the data and save it as data.json.

We can also use curl to download a CSV file from a remote server.

# Downloading a CSV file
curl -o data.csv "https://example.com/data.csv"

This will save the CSV file as data.csv in our working directory.
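By default, curl reports success even when the server answers with an HTTP error page, which would then be saved as if it were data. One possible way to make the extract step fail loudly is sketched below. The function name fetch is hypothetical, and the demonstration reads a local file through a file:// URL so that it runs without network access; in a real pipeline you would pass the API or CSV URL instead.

```shell
# fetch URL OUTPUT_FILE (hypothetical helper): download URL to
# OUTPUT_FILE and return curl's exit code if the transfer fails.
fetch() {
  url=$1
  out=$2
  # --fail                 treat HTTP 4xx/5xx responses as errors
  # --silent --show-error  hide the progress meter but keep error messages
  # --location             follow redirects
  # --retry 3              retry transient failures up to three times
  if curl --fail --silent --show-error --location --retry 3 -o "$out" "$url"; then
    echo "Saved $url to $out"
  else
    rc=$?
    echo "Download failed (curl exit code $rc): $url" >&2
    return "$rc"
  fi
}

# Offline demonstration with made-up sample data.
workdir=$(mktemp -d)
printf 'id,name,value\n1,alpha,1.5\n' > "$workdir/source.csv"
fetch "file://$workdir/source.csv" "$workdir/data.csv"
```

Because fetch returns a non-zero status on failure, a calling script can stop before the transform step runs on an incomplete file.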

Transforming Data

Data transformation is necessary to convert raw data into a format suitable for analysis or storage. This may involve parsing JSON, filtering CSV files, or cleaning text data.

jq is a powerful tool for working with JSON data. Let's use it to extract specific fields from our JSON file.

# Parsing and extracting specific fields from JSON
jq '.data[] | {id, name, value}' data.json > transformed_data.json

This command extracts the id, name, and value fields from each entry in the JSON data and saves the result in transformed_data.json.
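If the JSON data is destined for a database table, it may be more convenient to have jq emit CSV rows directly. The sketch below assumes the same .data[] layout as the command above and uses made-up sample records; the output file name from_json.csv is arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample input shaped like the .data[] array assumed above.
cat > data.json <<'EOF'
{"data": [
  {"id": 1, "name": "alpha", "value": 1.5, "extra": "ignored"},
  {"id": 2, "name": "beta", "value": 2.25, "extra": "ignored"}
]}
EOF

# Build one array per record, then let @csv format it as a CSV row.
# -r prints the raw string instead of a JSON-quoted one.
jq -r '.data[] | [.id, .name, .value] | @csv' data.json > from_json.csv
cat from_json.csv
```

Note that @csv wraps string values in double quotes, which CSV-aware importers such as sqlite3's .import understand.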

awk is a versatile tool for processing CSV files. We'll use it to extract specific columns from our CSV file.

# Extracting specific columns from CSV (OFS keeps the output comma-separated)
awk -F, -v OFS=, '{print $1, $3}' data.csv > transformed_data.csv

This command extracts the first and third columns from data.csv and saves them in transformed_data.csv.
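awk can also filter rows while it selects columns. The sketch below uses made-up sample data to skip the header line and keep only rows whose third column is greater than 2. It relies on simple comma splitting, so it assumes no field contains a quoted comma.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample data with a header row.
cat > data.csv <<'EOF'
id,name,value
1,alpha,1.5
2,beta,2.25
EOF

# -F,        split input fields on commas
# -v OFS=,   join output fields with commas
# NR > 1     skip the header line
# $3 + 0 > 2 compare the third column as a number
awk -F, -v OFS=, 'NR > 1 && $3 + 0 > 2 {print $1, $3}' data.csv > filtered.csv
cat filtered.csv
```

Only the second data row passes the filter, so filtered.csv contains the single line 2,2.25.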

sed is a stream editor for filtering and transforming text. We can use it to perform text replacements and clean up our data.

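# Replacing text in a file
sed 's/old_text/new_text/g' transformed_data.csv

This command prints the contents of transformed_data.csv with every occurrence of old_text replaced by new_text. The file itself is left unchanged, so redirect the output to a new file if you want to keep the result.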
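Several clean-up rules can be chained in one sed call by repeating the -e option. The following sketch, which uses made-up sample data and file names, trims leading and trailing whitespace and blanks out an N/A placeholder, writing the result to a new file rather than editing in place.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up messy input: padded with spaces and using N/A for a missing value.
printf '  1,alpha,N/A  \n2,beta,2.25\n' > messy.csv

# Each -e adds one substitution, applied in order to every line:
#   1. remove leading whitespace
#   2. remove trailing whitespace
#   3. replace the N/A placeholder with an empty field
#      (| is used as the delimiter so the / needs no escaping)
sed -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    -e 's|N/A||g' \
    messy.csv > cleaned.csv
cat cleaned.csv
```

Writing to a separate file keeps the raw input available for re-runs and avoids the differences between GNU and BSD sed when editing in place.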

Loading Data

Common destinations for loading data include databases and files. For this tutorial, we'll use SQLite, a commonly used lightweight database.

First, let's create a new SQLite database and a table to hold our data.

# Creating a new SQLite database and table
sqlite3 etl_database.db "CREATE TABLE data (id INTEGER PRIMARY KEY, name TEXT, value REAL);"

This command creates a database file named etl_database.db and a table named data with three columns.

Next, we'll insert our transformed data into the SQLite database.

# Inserting data into SQLite database
sqlite3 etl_database.db <<EOF
.mode csv
.import transformed_data.csv data
EOF

This block of commands sets the mode to CSV and imports transformed_data.csv into the data table.
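When the target table already exists, .import treats every line of the file as data, including a header row if the file has one. A minimal sketch of one way around this is to strip the header before importing; load_csv is a hypothetical helper name, and the sketch assumes the file paths contain no spaces.

```shell
# load_csv is a hypothetical helper: import a CSV file that starts with a
# header row into an existing table, without storing the header as data.
# Usage: load_csv DATABASE TABLE CSV_FILE
load_csv() {
  local db="$1" table="$2" csv="$3"
  local body
  body="$(mktemp)"
  # tail -n +2 prints the file starting from its second line.
  tail -n +2 "$csv" > "$body"
  sqlite3 "$db" <<EOF
.mode csv
.import $body $table
EOF
  local status=$?
  rm -f "$body"
  return "$status"
}

# Example usage:
# load_csv etl_database.db data data.csv
```

The number of columns in the file should match the table definition; here that means id, name, and value in that order.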

We can verify that the data has been inserted correctly by querying the database.

# Querying the database
sqlite3 etl_database.db "SELECT * FROM data;"

This command retrieves all rows from the data table and displays them.
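For larger files, eyeballing the full table is impractical, and a row-count comparison is a quick sanity check. The sketch below assumes the CSV file has a header row; verify_row_count is a hypothetical helper name.

```shell
# verify_row_count is a hypothetical helper: compare the number of data rows
# in a CSV file (excluding its header) with the number of rows in a table.
# Usage: verify_row_count DATABASE TABLE CSV_FILE
verify_row_count() {
  local db="$1" table="$2" csv="$3"
  local expected actual
  # Count every record after the header; n + 0 prints 0 for an empty file.
  expected=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$csv")
  actual=$(sqlite3 "$db" "SELECT COUNT(*) FROM $table;")
  if [ "$expected" -eq "$actual" ]; then
    echo "ok: $actual rows loaded"
  else
    echo "mismatch: expected $expected rows, found $actual" >&2
    return 1
  fi
}

# Example usage:
# verify_row_count etl_database.db data data.csv
```

A matching count does not prove the values are right, but a mismatch is a cheap early signal that the import skipped or duplicated rows.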

Final Thoughts

In building our ETL pipeline with Bash, we covered the following steps:

  1. Environment setup and tool installation
  2. Data extraction from a public API and CSV file with curl
  3. Data transformation using jq, awk, and sed
  4. Data loading in an SQLite database with sqlite3

Bash is a good choice for ETL due to its simplicity, flexibility, automation capabilities, and interoperability with other CLI tools.

For further investigation, think about incorporating error handling, scheduling the pipeline via cron, or learning more advanced Bash concepts. You may also wish to investigate alternative transformation tools and methods to broaden your pipeline skills.
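As a starting point for the error handling and scheduling ideas, here is a minimal sketch. The helper name run_step, the script name etl.sh, and the paths in the crontab line are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Exit on errors, on unset variables, and on failures inside pipelines.
set -euo pipefail

# run_step is a hypothetical helper: log the step name, run the command,
# and report which step failed before returning the command's exit status.
# Usage: run_step NAME COMMAND [ARGS...]
run_step() {
  local name="$1"
  shift
  local status=0
  echo "[etl] starting: $name" >&2
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "[etl] failed: $name (exit $status)" >&2
  fi
  return "$status"
}

# Example usage, with the commands from this tutorial:
# run_step "extract" curl --fail -o data.csv "https://example.com/data.csv"
# run_step "load" sqlite3 etl_database.db ".mode csv" ".import transformed_data.csv data"

# Example crontab entry (placeholder paths): run the script daily at 02:00
# and append its output to a log file.
# 0 2 * * * /path/to/etl.sh >> /path/to/etl.log 2>&1
```

With set -e in effect, a failing step stops the script at that point, so later steps never run against missing or partial data.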

Try out your own ETL projects, putting what you have learned to the test in more elaborate scenarios. With some luck, the basic concepts here will be a good jumping-off point to more complex data engineering tasks.

Matthew Mayo (@mattmayo13) holds a Master's degree in computer science and a graduate diploma in data mining. As Managing Editor, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.

Synthesia 2.0 reinvents AI video creation for businesses

More and more businesses are turning to video content to support internal and external communications, simplify employee onboarding, and raise engagement with instructional content. Artificial intelligence (AI) company Synthesia may have the next tool for the boom.

After the company's April release of its new line of Expressive AI Avatars — which display expressions, body language, and tone of voice based on the script a user uploads — Synthesia is now expanding its offerings. On Monday, the company launched Synthesia 2.0, an end-to-end AI video creation platform for businesses.

The platform aims to "reinvent every aspect of the video production and distribution process" with tools to create AI-generated videos at scale, according to the press release.

"It's no longer just about providing a tool for people to make AI video presentations. It's thinking about how we can help these businesses overcome these challenges," Alexandru Voica, Synthesia's head of corporate affairs and policy, told ZDNET.

Here are a few key features of Synthesia 2.0.

Personal AI Avatars

In Synthesia 2.0, users can create their avatars in two ways: by visiting a studio and shooting footage of themselves with HD cameras or by shooting footage at home on their phone or webcam. The new avatars will feature improved lip synchronization and natural-sounding voices, and will let users replicate their voices in more than 30 languages.

AI Video Assistant

Synthesia's AI Video Assistant can already help with script writing. Users select a template, write their prompt, and upload any documents they have for context. They can also specify preferred tone of voice, length, and audience. AI Video Assistant will then create a draft of a video.

Starting next month in 2.0, AI Video Assistant will build brand elements like fonts, logos, and colors into each video for continuity across content. The assistant will also be able to bulk-create videos. Users will select their template and connect their knowledge base, such as a set of help center articles, and the assistant will generate a collection of videos.

AI Screen Recorder

This new feature creates seamless videos from screen recordings, automating the editing process with intuitive support. The feature aims to streamline the steps in making an instructional video with screen-recorded content, which normally means bouncing between several applications. In 2.0, "once the recording is done, the video is immediately available for editing, with the voiceover transcribed, perfectly matching the screen capture, and automatic zoom effects to emphasize key actions," the release explains. Users can also add their avatars to the video.

AI-enabled tools like this can save teams time. "Nine out of 10 people can create their first video in less than 10 minutes, without prior experience," according to the press release. AI Screen Recorder will be available in the next few months, though Synthesia did not specify exactly when.

Personalized Video Player

Synthesia is building personalized video-viewing software with more advanced translation capabilities. Currently, Synthesia lets users translate their videos automatically into more than 120 languages; 2.0 upgrades this feature and reduces file management by using the source file to keep other language versions current. If users need to update their original video, Synthesia will automatically update other language versions for consistency, without making multiple files.

Another feature is a video player with interactive and personalized real-time viewing experiences. Starting next month, users can share their videos via Synthesia's player and they will automatically play in the viewer's language.

The company also teased future releases, including the next generation of its AI avatars. Later this year, Avatars will have full bodies and hands they can use to express themselves. "They will be able to have personalities and tell captivating stories by using the full range of body language available to humans," according to the press release.

If you're wondering if using synthetic avatars in your content is effective, one study found them to be as engaging as human presenters — until participants started perceiving them as AI-generated.

Also coming later this year are interactive features in Synthesia's video player, including "clickable hotspots, embedded forms, quizzes, and personalized call-to-actions," the announcement states.

Safety

In keeping with Synthesia's safety and privacy guidelines, the company emphasized continued improvements with the launch of 2.0. According to the press release, the company is receiving its ISO/IEC 42001 certification, "the world's first standard for AI management, providing a structured way to manage risks and opportunities associated with AI, and balancing innovation with governance."

How to Reduce Hallucinations in LLMs for Reliable Enterprise Use 

Gen AI has been well-received by enterprise decision-makers. Yet, wary of technology pitfalls from past experience, they have expressed serious concern about hallucinations, which are demonstrably false model responses.

Just how challenging is the problem? A recent study found that LLMs may hallucinate between 3% and 27% of the time, depending on the model. In specific contexts, this may be much worse. Another study found that LLMs provide false legal information between 69% and 88% of the time, which is very worrying given the criticality of legal transactions.

When LLMs Lie

What are hallucinations? Large language models (LLMs), such as GPT-4, Llama 3, and Mixtral, can generate rapid, fluent responses to varied user prompts in many scenarios. But some of these are nonsensical, some are untruthful but hard to detect as incorrect, and a few are accurate but not derived from source data, all categories that make it seem that the model is hallucinating. The underlying reasons that drive the LLMs to behave this way include not giving them enough context while training, overfitting them, and data ingestion errors such as wrong encoding.

It is still early days, but undesirable scenarios caused by hallucinations may convince enterprise leaders to pull back on funding gen AI initiatives and make business heads reluctant to pilot or deploy solutions.

To begin with, business users may hesitate to use LLM tools in their daily work when they find that they cannot trust the output, setting up a significant barrier to adoption. Second, even a single failure to detect hallucinations in sensitive use cases, such as health care, can cause serious harm to external stakeholders like patients and significant reputational damage to the organization, negating any ROI.

Guiding Models to Tell the Truth: Why Retrieval Augmented Generation (RAG) helps

Technology leaders, acutely aware of the urgent need to increase the reliability of LLM outputs, are quickly creating effective governance, including automated and human-guided accuracy checks. The strategies being tried include using guardrails with prompts, providing examples of the desired output while querying, and regularly fine-tuning the data sets that train the LLMs.

While constant fine-tuning of massive data sets requires significant resources, the other two approaches are not structured enough to guarantee reliability. This brings us to a promising fourth route: retrieval augmented generation (RAG). RAG leverages the robust self-learning mechanism of an LLM while focusing it on a limited set of pre-approved and up-to-date information sources. For instance, if an internal user not in the finance team wants to know the company’s latest turnover, a model not restricted by RAG may pick up these numbers from external websites of low credibility. However, a RAG-restricted model can be instructed to get the numbers from the latest internal financial updates, ensuring accuracy.

Hallucination Detection to Improve the Reliability of RAG-restricted LLM Outputs

Restricting the LLM’s behavior with RAG can boost the reliability of its output and reduce hallucinations, but it does not completely eliminate them. Consider a marketing team using a tailored RAG-led LLM to scour the web for campaign ideas. The LLM may come up with something from a successful competitor campaign, not understanding that while it has to look at what competitors are doing, it cannot use the information for ideas.

Data scientists are fast building strategies to avert such disasters. To spot hallucinations in the black-box LLMs used mainly by enterprises today, SelfCheckGPT, a recent research paper on hallucination detection, offers three recommendations: BERTScore, which uses semantic similarity; the prompt method, which uses another LLM and its understanding of language to evaluate consistency; and evidence-based evaluation leveraging natural language inference (NLI).

To test which of these is the most effective within a RAG set-up, we built a RAG system based on the Llama 2-13B-chat model, using a corpus of financial reports and relevant questions, and applied the three approaches to evaluate the responses. The NLI-led method outperformed the other two, helping to generate hallucination-free output 88.63% of the time with optimal resource utilization.

But is this enough, given that gen AI will soon see adoption in high-stakes situations where people’s lives or millions of dollars are on the line? To spot hallucinations to a finer degree by identifying them within responses, we recommend the integrated gradient approach that uses a baseline to detect hallucinations with up to 99% confidence. In the marketing use case, for example, the integrated gradient method will unerringly pick out parts of the responses that do not tally with the company’s brand and style guidelines.

Ready for Enterprise-level Adoption

The combined RAG, NLI, and integrated gradient methodology gives enterprises a winning strategy for gen AI adoption. Users can confidently isolate problematic responses, increasing their trust in model output and making them more willing to use the technology frequently. While competitors struggle to tame pilot projects, IT teams that consistently generate high-quality output using this three-pronged method can rapidly scale LLMs enterprise-wide. Generative AI can be extended to more use cases and complex workflows, empowering employees with new insights, increasing ROI, and cementing competitive advantage.

The post How to Reduce Hallucinations in LLMs for Reliable Enterprise Use appeared first on AIM.

Genesys Launches India’s First AI-Powered Navigation Maps

Mumbai-based mapping company Genesys International has announced the launch of India’s first AI-powered navigation maps tailored specifically for the automotive and mobility sectors.

This innovative solution marks a pivotal moment in the evolution of the Indian automobile industry, introducing a new era of personalised driving experiences and sophisticated location intelligence.

The newly unveiled map by Genesys encompasses the largest navigable road network in India, spanning an impressive 8.3 million kilometers and including over 30 million points of interest (POIs).

This extensive coverage ensures that drivers across the country can access precise and dependable navigation, significantly enhancing their overall driving experience.

Alongside the AI-powered navigation map, Genesys has introduced five products aimed at revolutionising the automotive and mobility industries.

  • Navigation with Augmented Reality (AR): integrates real-time data from the vehicle’s cameras and provides intuitive overlays to guide drivers.
  • Navigation with GPT AI Solution: offers intelligent route planning, real-time adjustments, and voice assistance.
  • Advanced Driver Assistance Systems (ADAS): ensures compliance with Euro NCAP safety standards and includes Intelligent Speed Assistance (ISA).
  • Online Marketplace: integrated into the car’s app, it allows for convenient in-vehicle purchases.
  • Usage-Based Insurance (UBI): tracks driving behavior and offers lower premiums for safer drivers.

Sajid Malik, Chairman and Managing Director of Genesys International, expressed the company’s vision, stating, “With the launch of India’s first AI-powered navigation map by Genesys, we are set to completely revamp the way the Indian geospatial sector operates.

“Features like ISA and ADAS set new benchmarks for safety and convenience on Indian roads, alerting drivers to speed limits, recognising traffic signs, assisting with lane-keeping, and offering adaptive cruise control.”

Malik further emphasised the importance of Genesys’ 3D Digital Twin technology, which allows the production of high-definition maps detailing every aspect of the road environment. This level of detail is crucial for the safe operation of autonomous vehicles, addressing challenges such as poor visibility, urban canyons, and GPS-deprived zones.

Looking ahead, Genesys sees a future where its maps will play a critical role in the transition to autonomous and electric vehicles.

The Indian automotive market, valued at $108.10 billion in 2022, is projected to reach $217.90 billion by 2031, growing at a compound annual growth rate of 8.1%.

With its cutting-edge AI-powered navigation maps and innovative product offerings, Genesys International is well-positioned to capitalise on this growth and revolutionise the way people navigate and interact with their vehicles in India.

The post Genesys Launches India’s First AI-Powered Navigation Maps appeared first on AIM.