Hailo lands $120 million to keep battling Nvidia as most AI chip startups struggle

The funding climate for AI chip startups, once as sunny as a mid-July day, is beginning to cloud over as Nvidia asserts its dominance.

According to a recent report, U.S. chip firms raised just $881 million from January 2023 to September 2023 — down from $1.79 billion in the first three quarters of 2022. AI chip company Mythic ran out of cash in 2022 and was nearly forced to halt operations, while Graphcore, a once-well-capitalized rival, now faces mounting losses.

But one startup appears to have found success in the ultra-competitive — and increasingly crowded — AI chip space.

Hailo, co-founded in 2017 by Orr Danon and Avi Baum, previously CTO for wireless connectivity at the microprocessor outfit Texas Instruments, designs specialized chips to run AI workloads on edge devices. Hailo’s chips execute AI tasks with lower memory usage and power consumption than a typical processor, making them a strong candidate for compact, offline and battery-powered devices such as cars, smart cameras and robotics.

“I co-founded Hailo with the mission to make high-performance AI available at scale outside the realm of data centers,” Danon told TechCrunch. “Our processors are used for tasks such as object detection, semantic segmentation and so on, as well as for AI-powered image and video enhancement. More recently, they’ve been used to run large language models (LLMs) on edge devices including personal computers, infotainment electronic control units and more.”

Many AI chip startups have yet to land one major contract, let alone dozens or hundreds. But Hailo has over 300 customers today, Danon claims, in industries such as automotive, security, retail, industrial automation, medical devices and defense.

In a bet on Hailo’s future prospects, a cohort of financial backers including Israeli businessman Alfred Akirov, automotive importer Delek Motors and the VC platform OurCrowd invested $120 million in Hailo this week, an extension to the company’s Series C. Danon said that the new capital will “enable Hailo to leverage all opportunities in the pipeline” while “setting the stage for long-term growth.”

“We’re strategically positioned to bring AI to edge devices in ways that will significantly expand the reach and impact of this remarkable new technology,” Danon said.

Now, you might be wondering, does a startup like Hailo really stand a chance against chip giants like Nvidia, and to a lesser extent Arm, Intel and AMD? One expert, Christos Kozyrakis, Stanford professor of electrical engineering and computer science, thinks so — he believes accelerator chips like Hailo’s will become “absolutely necessary” as AI proliferates.

“The energy efficiency gap between CPUs and accelerators is too large to ignore,” Kozyrakis told TechCrunch. “You use the accelerators for efficiency with key tasks (e.g., AI) and have a processor or two on the side for programmability.”

Kozyrakis does see longevity presenting a challenge to Hailo’s leadership — for example, if the AI model architectures its chips are designed to run efficiently fall out of vogue. Software support, too, could be an issue, Kozyrakis says, if a critical mass of developers aren’t willing to learn to use the tooling built around Hailo’s chips.

“Most of the challenges where it concerns custom chips are in the software ecosystem,” Kozyrakis said. “This is where Nvidia, for instance, has a huge advantage over other companies in AI, as they’ve been investing in software for their architectures for 15-plus years.”

But, with $340 million in the bank and a workforce numbering around 250, Danon’s feeling confident about Hailo’s path forward — at least in the short term. He sees the startup’s technology addressing many of the challenges companies encounter with cloud-based AI inference, particularly latency, cost and scalability.

“Traditional AI models rely on cloud-based infrastructure, often suffering from latency issues and other challenges,” Danon said. “They’re incapable of real-time insights and alerts, and their dependency on networks jeopardizes reliability and integration with the cloud, which poses data privacy concerns. Hailo is addressing these challenges by offering solutions that operate independently of the cloud, thus making them able to handle much higher amounts of AI processing.”

Curious about Danon’s perspective, I asked him about generative AI and its heavy dependence on the cloud and remote data centers. Surely Hailo sees the current top-down, cloud-centric model (e.g., OpenAI’s modus operandi) as an existential threat?

Danon said that, on the contrary, generative AI is driving new demand for Hailo’s hardware.

“In recent years, we’ve seen a surge in demand for edge AI applications in most industries ranging from airport security to food packaging,” he said. “The new surge in generative AI is further boosting this demand, as we’re seeing requests to process LLMs locally by customers not only in the compute and automotive industries, but also in industrial automation, security and others.”

How about that.

Data Science Hiring Process at Epsilon

In September last year, global advertising and marketing tech firm Epsilon introduced Epsilon AI Audiences, a new offering that identifies potential consumers ready to purchase or donate. Unlike traditional marketing tools, it offers three smart solutions for personalised, people-centred marketing.

“At Epsilon, we focus on how consumers see the impact of their interactions with brands. We believe that combining predictive AI with generative AI can yield the best outcomes for both brands and consumers,” Lakshmana Gnanapragasam, senior vice president of strategy and analytics at Epsilon, told AIM in a recent conversation.

Currently, the Texas-based company is working on various generative AI use cases, including real-time summarisation of marketing and loyalty campaign performance with actionable recommendations, natural language-based ad-hoc queries on customer, product, store, and campaign data, and the generation of audience lists for media activation based on marketing objectives and budget considerations.

It’s also developing hyper-personalised creative assets for marketing and media campaigns.

Epsilon also operates an Analytics Center of Excellence (A-CoE) in India, focusing on associate learning, development, and career progression. Gnanapragasam further shared with us the company’s data science hiring process and more.

The company has multiple open positions across all levels, from senior data scientists to managerial roles, at its Bengaluru office.

Inside Epsilon’s Data Science Team

“The primary problem addressed using data science revolves around helping brands understand and effectively engage with their customers throughout their lifecycle,” Gnanapragasam added.

The engineering team consists of over 2,000 associates, and the data science team has over 400 associates globally. Engineering teams are organised according to the platforms they are working on, such as technology, media, and data platforms. On the other hand, data science teams are structured by business units and further divided into product analytics and client analytics teams to embed AI/ML features and demonstrate platform value to clients, respectively.

Leveraging extensive consumer data covering demographics, lifestyle, purchases, media consumption, and online behaviour, Epsilon helps brands in various aspects. These include customer acquisition, retention, reactivation, cross-selling, upselling, product recommendations, real-time engagement, media planning, execution, measurement, attribution, and loyalty.

Implementing AI and ML enhances Epsilon’s operations, offering solutions like people-based identity verification, behaviour and opportunity segmentation, next-best-action and product recommendation algorithms, and real-time personalisation in digital media solutions.

Tech Stack

Gnanapragasam detailed the company’s use of cloud-based SaaS applications for their customer data platforms, operating on cloud platforms such as AWS, Azure, and GCP, and incorporating Databricks and SageMaker applications.

The technology stack encompasses Python, Spark, PySpark, R, SQL, and SAS, tailored to specific use cases, platforms, and client technical requirements.

They leverage diverse technological capabilities, including Apache Spark for distributed data processing, Amazon S3 for scalable storage, and Python, PySpark, and SQL for data analysis. In addition, there are scikit-learn and MLlib for model development, TensorFlow, Keras, and PyTorch for deep learning, Databricks and SageMaker for comprehensive platform support, and visualisation tools like Tableau and Power BI for reporting.
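
To give a flavour of how such a stack fits together, here is a minimal, hypothetical PySpark sketch of the kind of workflow described: reading campaign events from S3, aggregating them with Spark, and handing a sample to pandas for downstream modelling. The bucket path and column names are invented for illustration and are not Epsilon’s.

```python
# Hypothetical sketch of a Spark-on-S3 analytics workflow; the bucket,
# table, and column names are invented for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("campaign-analytics").getOrCreate()

# Read raw campaign events from scalable S3 storage
events = spark.read.parquet("s3://example-bucket/campaign_events/")

# Distributed aggregation: impressions and response rate per campaign
summary = (
    events.groupBy("campaign_id")
          .agg(F.count("*").alias("impressions"),
               F.avg(F.col("converted").cast("double")).alias("response_rate"))
)

# Hand a manageable aggregate to pandas/scikit-learn for model development
sample_pdf = summary.toPandas()
```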

Additionally, transformer-based open-source models from Hugging Face are employed for generative AI applications alongside open-source frameworks such as LangChain and LlamaIndex.

Hiring Process

Prospective candidates undergo a series of assessments before being hired. Initially, they attend an HR screening round to qualify for the opening. Following this, all candidates take a proctored technical test and participate in one or two in-depth technical interviews to evaluate their technical knowledge, problem-solving abilities, and practical skills.

After successfully completing these stages, candidates face a hiring manager interview, followed by a final HR round. This last phase is crucial for assessing alignment with company values and cultural fit, providing a comprehensive evaluation beyond skills and qualifications to determine how well candidates would integrate into the team.

For mid- to senior-level hires, the process may include solving an analytical challenge through a coding exercise, extending the evaluation further.

“The primary need for entry-level positions is proficiency in Python. We require experience with PySpark, SQL, SAS, and machine learning for mid-level positions.

For senior-level candidates, we look for technical depth across multiple ML platforms and strong product or client orientation, depending on the team they join,” Gnanapragasam added.

He noted that candidates should thoroughly research the company and the job requirements when interviewing for a data science role at Epsilon.

To improve their chances of getting hired, candidates should ensure they have a solid understanding of Epsilon, its products, clients, and the nature of the work. They should also tailor their resumes to highlight experience in working with customer data and improving marketing outcomes.

Expectations

When candidates join the data science team, they can anticipate working on enhancing platform features using analytics and machine learning or optimising marketing and loyalty outcomes through data analysis.

“As I said earlier, our rich consumer data assets, combined with the client’s first-party data in a secured, privacy-safe environment, provide one of the best playgrounds for data scientists to continually learn and improve their analytics skills,” he added.

The company facilitates work by providing comprehensive data analytics toolkits and accelerators on its tech platforms. Additionally, the platforms support campaign measurement and attribution, enabling data scientists to evaluate the effectiveness of their recommendations.

“We expect associates to be curious and apply critical thinking skills when working on our platforms or client solutions. As an organisation, we believe good ideas and great execution can come from any place, and capability has nothing to do with age,” commented Gnanapragasam.

Work Culture

“We are a people-first organisation. We value employee wellbeing, which is the centre of all our policies,” Gnanapragasam added.

Through programs like EPIC, the company fosters empathy and collaboration, both internally and within the communities it serves. Employees enjoy inclusive insurance policies and access to wellness platforms like HealthifyMe and Headspace. Learning and development initiatives are personalised to address individual needs and cover a wide range of topics, including behavioural training and diversity.

Additionally, a hybrid work model promotes teamwork and collaboration, with most associates in the office two days a week and flexible remote work options on the remaining days.

Check out the careers page now.

Read more: Data Science Hiring Process at Confluent

A Beginner’s Guide to the Top 10 Machine Learning Algorithms

One of the fields that underpins data science is machine learning. So, if you want to get into data science, understanding machine learning is one of the first steps you need to take.

But where do you start? You start by understanding the difference between the two main types of machine learning algorithms. Only after that can we talk about the individual algorithms that should be on your priority list to learn as a beginner.

Supervised vs. Unsupervised Machine Learning

The main distinction between the algorithms is based on how they learn.

Supervised learning algorithms are trained on a labeled dataset. This dataset serves as supervision (hence the name) for learning, because the data it contains is already labeled with correct answers. Based on this input, the algorithm can learn and apply that learning to the rest of the data.

On the other hand, unsupervised learning algorithms learn on an unlabeled dataset, meaning they engage in finding patterns in data without humans giving directions.

You can read more in detail about machine learning algorithms and types of learning.

There are also some other types of machine learning, but not for beginners.

Machine Learning Tasks

Within each type of machine learning, algorithms are employed to solve two main tasks.

Again, there are some more tasks, but they are not for beginners.

Supervised Learning Tasks

Regression is the task of predicting a numerical value, called the continuous outcome variable or dependent variable. The prediction is based on the predictor variable(s), or independent variable(s).

Think about predicting oil prices or air temperature.

Classification is used to predict the category (class) of the input data. The outcome variable here is categorical or discrete.

Think about predicting whether an email is spam or not, or whether a patient will get a certain disease.

Unsupervised Learning Tasks

Clustering means dividing data into subsets or clusters. The goal is to group data as naturally as possible. This means that data points within the same cluster are more similar to each other than to data points from other clusters.

Dimensionality reduction refers to reducing the number of input variables in a dataset. It basically means reducing the dataset to very few variables while still capturing its essence.

Overview of the 10 Machine Learning Algorithms

Here’s an overview of the algorithms I’ll cover.

Supervised Learning Algorithms

When choosing the algorithm for your problem, it’s important to know what task the algorithm is used for.

As a data scientist, you’ll probably apply these algorithms in Python using the scikit-learn library. Although it does (almost) everything for you, it’s advisable that you know at least the general principles of each algorithm’s inner workings.

Finally, after the algorithm is trained, you should evaluate how well it performs. For that, each algorithm has some standard metrics.

1. Linear Regression

Used For: Regression

Description: Linear regression draws a straight line called a regression line between the variables. This line goes approximately through the middle of the data points, thus minimizing the estimation error. It shows the predicted value of the dependent variable based on the value of the independent variables.

Evaluation Metrics:

  • Mean Squared Error (MSE): Represents the average of the squared errors, the error being the difference between the actual and predicted values. The lower the value, the better the algorithm performs.
  • R-Squared: Represents the percentage of the dependent variable’s variance that can be explained by the independent variable(s). For this measure, you should strive to get as close to 1 as possible.
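
Here is a minimal scikit-learn sketch of this workflow, fitted on synthetic data generated purely for illustration:

```python
# Minimal linear regression sketch on synthetic data (illustrative only)
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)   # fit the regression line
pred = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, pred))    # lower is better
print("R-squared:", r2_score(y_test, pred))        # closer to 1 is better
```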

2. Logistic Regression

Used For: Classification

Description: It uses a logistic function to map data values to a binary category, i.e., 0 or 1, based on a threshold usually set at 0.5. The binary outcome makes this algorithm well suited to predicting binary results, such as YES/NO, TRUE/FALSE, or 0/1.

Evaluation Metrics:

  • Accuracy: The ratio between correct and total predictions. The closer to 1, the better.
  • Precision: The measure of the model’s accuracy in positive predictions, expressed as the ratio between correct positive predictions and total predicted positives. The closer to 1, the better.
  • Recall: It, too, measures the model’s accuracy in positive predictions, this time expressed as the ratio between correct positive predictions and the total number of actual positives in the class. The closer to 1, the better.
  • F1 Score: The harmonic mean of the model’s recall and precision. The closer to 1, the better.
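
A minimal sketch of the same idea, using scikit-learn’s built-in binary breast cancer dataset; the metric calls mirror the list above:

```python
# Logistic regression on a binary dataset, with the four metrics above
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
pred = clf.predict(X_test)   # applies the 0.5 probability threshold by default

for name, metric in [("accuracy", accuracy_score), ("precision", precision_score),
                     ("recall", recall_score), ("F1", f1_score)]:
    print(name, metric(y_test, pred))
```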

3. Decision Trees

Used For: Regression & Classification

Description: Decision trees are algorithms that use a hierarchical, tree-like structure to predict a value or a class. The root node represents the whole dataset, which then branches into decision nodes, branches, and leaves based on the variable values.

Evaluation Metrics:

  • Accuracy, precision, recall, and F1 score -> for classification
  • MSE, R-squared -> for regression
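
As an illustration, here is a short sketch on the iris dataset; capping max_depth is just an illustrative choice to keep the printed tree readable:

```python
# Decision tree classifier; max_depth is capped to keep the tree readable
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(tree.score(X_test, y_test))          # accuracy on held-out data
print(export_text(tree))                   # the learned decision rules
```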

4. Naive Bayes

Used For: Classification

Description: This is a family of classification algorithms that apply Bayes’ theorem with the ‘naive’ assumption of independence between features within a class.

Evaluation Metrics:

  • Accuracy
  • Precision
  • Recall
  • F1 score
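
A minimal sketch using scikit-learn’s Gaussian variant, which additionally assumes each feature is normally distributed within a class; other variants (multinomial, Bernoulli) suit count or binary features:

```python
# Gaussian naive Bayes: features are assumed independent within each class
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("accuracy:", nb.score(X_test, y_test))
```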

5. K-Nearest Neighbors (KNN)

Used For: Regression & Classification

Description: It calculates the distance between the test data point and the k nearest data points from the training data. The test data point is assigned to the class most common among those ‘neighbors’. For regression, the predicted value is the average of the k chosen training points.

Evaluation Metrics:

  • Accuracy, precision, recall, and F1 score -> for classification
  • MSE, R-squared -> for regression
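
A brief sketch of both uses: the classifier votes among the k nearest neighbors, while the regressor averages their target values (k=5 is an arbitrary illustrative choice):

```python
# KNN used for classification (majority vote) and regression (average)
from sklearn.datasets import load_iris, make_regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X_cls, y_cls = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_cls, y_cls)
print(clf.predict(X_cls[:1]))              # class voted by the 5 nearest neighbors

X_reg, y_reg = make_regression(n_samples=100, n_features=2, random_state=0)
reg = KNeighborsRegressor(n_neighbors=5).fit(X_reg, y_reg)
print(reg.predict(X_reg[:1]))              # average of the 5 nearest targets
```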

6. Support Vector Machines (SVM)

Used For: Regression & Classification

Description: This algorithm draws a hyperplane to separate the different classes of data, positioned at the largest distance from the nearest points of every class. The farther a data point lies from the hyperplane on its class’s side, the more confidently it can be assigned to that class. For regression, the idea is inverted: the hyperplane is fitted so that as many data points as possible fall within a margin of tolerance around it.

Evaluation Metrics:

  • Accuracy, precision, recall, and F1 score -> for classification
  • MSE, R-squared -> for regression
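
A minimal sketch; the RBF kernel shown here is simply scikit-learn’s default made explicit, and features are scaled first because the margin is distance-based:

```python
# Support vector classifier: fits a maximum-margin separating hyperplane
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters for SVMs, since margins are measured in feature space
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)
print("accuracy:", svm.score(X_test, y_test))
```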

7. Random Forest

Used For: Regression & Classification

Description: The random forest algorithm uses an ensemble of decision trees, which together form a decision forest. The algorithm’s prediction is based on the predictions of many decision trees: data is assigned to the class that receives the most votes, and for regression, the predicted value is the average of all the trees’ predicted values.

Evaluation Metrics:

  • Accuracy, precision, recall, and F1 score -> for classification
  • MSE, R-squared -> for regression
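
A short sketch; n_estimators sets the number of trees whose votes are aggregated, and 200 is an arbitrary illustrative choice:

```python
# Random forest: an ensemble of decision trees voting on each prediction
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)               # each tree sees a bootstrap sample
print("accuracy:", forest.score(X_test, y_test))
```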

8. Gradient Boosting

Used For: Regression & Classification

Description: These algorithms use an ensemble of weak models, with each subsequent model recognizing and correcting the previous model's errors. This process is repeated until the error (loss function) is minimized.

Evaluation Metrics:

  • Accuracy, precision, recall, and F1 score -> for classification
  • MSE, R-squared -> for regression
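
A minimal sketch on synthetic data; learning_rate, which scales each tree’s correction, and n_estimators are illustrative values, not recommendations:

```python
# Gradient boosting: trees are added sequentially, each one fitted to the
# errors of the ensemble built so far
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                random_state=0).fit(X_train, y_train)
print("accuracy:", gb.score(X_test, y_test))
```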

Unsupervised Learning Algorithms

9. K-Means Clustering

Used For: Clustering

Description: The algorithm divides the dataset into k clusters, each represented by its centroid or geometric center. Through an iterative process of assigning data to clusters, the goal is to minimize the distance between data points and their cluster’s centroid while maximizing their distance from other clusters’ centroids. Simply put, data belonging to the same cluster should be as similar as possible to each other and as different as possible from data in other clusters.

Evaluation Metrics:

  • Inertia: The sum of squared distances between each data point and its closest cluster centroid. The lower the inertia value, the more compact the cluster.
  • Silhouette Score: Measures the cohesion (data’s similarity within its own cluster) and separation (data’s difference from other clusters) of the clusters. The score ranges from -1 to +1; the higher the value, the better the data is matched to its own cluster and the worse it is matched to other clusters.
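
A minimal sketch computing both metrics on synthetic blob data (three clusters are generated, so k=3 is the natural choice here):

```python
# K-means on synthetic blobs, evaluated with inertia and silhouette score
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("inertia:", kmeans.inertia_)                         # lower = tighter clusters
print("silhouette:", silhouette_score(X, kmeans.labels_))  # closer to +1 is better
```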

10. Principal Component Analysis (PCA)

Used For: Dimensionality Reduction

Description: The algorithm reduces the number of variables by constructing new variables (principal components) while still attempting to maximize the captured variance of the data. In other words, it condenses the data into its most informative components without losing its essence.

Evaluation Metrics:

  • Explained Variance: The percentage of the variance covered by each principal component.
  • Total Explained Variance: The percentage of the variance covered by all principal components.
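
A minimal sketch that compresses the 13-feature wine dataset into two principal components and reads off both metrics; scaling first is standard practice because PCA is variance-based:

```python
# PCA: compress 13 features into 2 components and check retained variance
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # scale so no feature dominates

pca = PCA(n_components=2).fit(X_scaled)
print(pca.explained_variance_ratio_)           # explained variance per component
print(pca.explained_variance_ratio_.sum())     # total explained variance
```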

Conclusion

Machine learning is an essential part of data science. With these ten algorithms, you’ll cover the most common tasks in machine learning. Of course, this overview gives you only a general idea of how each algorithm works. So, this is just a start.

Now, you need to learn how to implement these algorithms in Python and solve real problems. For that, I recommend scikit-learn, not only because it’s a relatively easy-to-use ML library but also because of its extensive materials on ML algorithms.

Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.

This Indian Logistics Company Developed an LLM to Enhance Last-Mile Delivery 

Last-mile delivery is particularly tricky in India due to poor infrastructure, population density, vague or incomplete addresses and complex street layouts.

In India, a peculiar challenge arises with 80% of addresses depending on landmarks up to 1.5 kilometres away. This reliance on landmarks makes geolocation difficult for logistics companies, resulting in an average deviation of approximately 500 metres between the given address and the actual doorstep.

To solve this problem, Gurugram-based logistics company Ecom Express developed a solution powered by language models trained on data from nearly 2 billion parcels delivered by the company since its inception in 2012.

Bulls.ai improves delivery quality by up to 60%

Called Bulls.ai, the solution improves operational accuracy by correcting, standardising, and predicting geo-coordinates for addresses across the length and breadth of India, not just in metros and Tier-1 cities but also in the hinterlands of Tier-2 cities and beyond, where address quality deteriorates, according to Manjeet Dahiya, head of machine learning and data sciences at Ecom Express Limited.

“Bulls.ai helps in identifying the correct last-mile delivery centre to deliver the shipment based on the consignee address, reducing misroutes, flagging junk addresses and correcting incomplete addresses/PIN codes to route the shipment correctly.

“It geocodes the consignee’s address location on the map, assisting our field executives,” Amit Choudhary, chief product and technology officer at Ecom Express Limited, told AIM.

Bulls.ai can improve delivery quality by up to 60%, increase operational efficiency, and slash logistics costs by as much as 30%.

“It also helps in reducing misroutes from 7% to 2%, resulting in a 5% improvement in shipments reaching the correct last-mile centre at the first go,” Dahiya added.

So far, the company has opened the API to its customers to validate their user addresses. “We have first started with our existing customers and have showcased Bulls.ai to them in our customer panel.”

What makes Bulls.ai unique

The solution is powered by three models of 354 million, 773 million and 1.5 billion parameters, trained on 8.4 billion tokens representing 80 million address and geo-coordinate pairs.

Ecom Express has a nationwide presence, spanning all 28 states of the country. It extends its services to more than 2,700 towns across over 27,000 PIN codes, effectively reaching over 95% of India’s population. Over the years, the company has accumulated a substantial amount of data through its extensive operations.

“The architecture is built in a decoder-only transformer pattern, specifically GPT-2. It has been trained from scratch, and the dataset of the historic addresses that we have delivered in the past is the key data. The training approach is distributed data parallel,” Choudhary said.

Currently, there are no similar solutions in the market. What makes Bulls.ai unique is the training dataset, according to the company. Moreover, existing LLMs like GPT models or the LLaMA models are not tailored to address this particular challenge and do not have the capability to output the geo-coordinates of an address.

“This is a domain-specific LLM and no such LLM exists. For instance, the domain of GPT-4/LLaMA is very different from the domain of address and location data. These models cannot tell the geo-coordinates of addresses. Achieving good results with these models will require fine-tuning with significantly large data, which would effectively be a pre-training,” Dahiya explained.

Choudhary said that his team encountered a few challenges when training the model. “For example, a number of optimizations were needed to improve the training speed and reduce the GPU memory footprint, such as an 8-bit optimizer, mixed-precision training, and gradient checkpointing.

“This allowed us to train bigger models, faster. During inference, it is quite challenging to use bigger models for real-time predictions as they can be slow. We used pruning to make these bigger models faster at the time of inference,” he said.
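
For readers unfamiliar with these techniques, here is a rough, hypothetical PyTorch sketch of how an 8-bit optimizer (via the bitsandbytes library), mixed-precision training, and gradient checkpointing typically combine for a GPT-2-style model. It illustrates the named optimizations only; it is not Ecom Express’s actual training code, and the model sizes are placeholders.

```python
# Hypothetical sketch: 8-bit optimizer + mixed precision + gradient
# checkpointing for a GPT-2-style model (not Bulls.ai's real code).
import torch
from transformers import GPT2Config, GPT2LMHeadModel
import bitsandbytes as bnb

config = GPT2Config(vocab_size=32000, n_positions=512)  # placeholder sizes
model = GPT2LMHeadModel(config).cuda()
model.gradient_checkpointing_enable()        # trade recompute for GPU memory

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=3e-4)  # 8-bit optimizer states
scaler = torch.cuda.amp.GradScaler()         # loss scaling for mixed precision

def training_step(input_ids):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # run the forward pass in fp16
        out = model(input_ids=input_ids, labels=input_ids)
    scaler.scale(out.loss).backward()        # scaled backward pass
    scaler.step(optimizer)
    scaler.update()
    return out.loss.item()
```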

Can LLMs solve other last-mile delivery problems?

LLMs can also play a pivotal role in addressing many other last-mile delivery challenges inherent in this crucial phase of the supply chain. They can optimise route planning, enhance delivery scheduling, and streamline communication between drivers and customers.

“Apart from address and location intelligence, we see applications in understanding the descriptions of the goods to identify dangerous goods and not route them through the air,” Choudhary said.

Computer vision models are already being used by logistics and supply chain companies to identify dangerous or defective goods; multimodal LLMs, however, could potentially do a much better job.

“Understanding of goods is also necessary to figure out the category of the goods. Fraud detection at consignee and seller is another important aspect from the logistics point of view that can be solved through generative AI,” Choudhary added.

Researcher from Upwork Releases Nandi, Gemma-based Telugu Model

Another Telugu model is here: Nandi. Built on top of Zephyr-7b-Gemma, the model boasts 7 billion parameters and is trained on Telugu Q&A datasets curated by Telugu LLM Labs.

Click here to check out the model.

Bharadwaj Swarna, a freelance senior data scientist, was inspired by the pioneering work of Ramsri Goutham Golla and Ravi Theja Desetty of Telugu LLM Labs and created this model.

You can prompt the model in both English and Telugu.

Swarna’s future endeavours include expanding datasets for direct preference optimisation (DPO) in Telugu, a significant area that remains largely unexplored.

Additionally, efforts will be directed towards refining the Tokenizer based on insights gleaned from the video series released by the Telugu LLM Labs team.

There has been a rise in Indic language models coming from India’s developer community. For example, Telugu Llama is a passion project of both Golla and Theja. In February, the two introduced Telugu-LLM-Labs, a collaborative independent effort under which they released datasets translated and romanised in Telugu.

Telugu LLM Labs has also announced the release of Navarasa 2.0, a Gemma 7B/2B instruction-tuned model capable of processing content in 15 Indian languages along with English. This latest iteration marks a significant advancement over the recently introduced fine-tuned Gemma models for nine Indian languages.

The Only Interview Prep Course You Need for Deep Learning

Suppose you are preparing for a data science, machine learning engineer, AI engineer, or research scientist job. In that case, you should look for great resources to help you ace your interview.

Deep learning is becoming more and more popular, as it forms the foundation of topics such as large language models and generative AI, and ties together a lot of different concepts. This is why this interview prep course is probably one of the best things I have seen in a while.

Not only will you gain solid foundational and practical knowledge of deep learning, but you will also enhance your data science and machine learning skills. Even if you are not preparing for an interview and are simply on a learning journey, I would still recommend this prep course!

Deep Learning Interview Course

This course consists of two parts. In the first part, the video goes through the top 50 questions with corresponding answers; in the second part, it covers the remaining 50 questions.

That’s 100 questions, and 7.5 hours of material altogether!

Basic Interview Questions

You will start with basic deep learning questions covering the concepts of neural networks, neural network architecture, activation functions, and gradient descent. These are the first 10 questions, so you will go through them quite quickly.

Intermediate Interview Questions

In the next 20 questions, you will dive a bit deeper, learning to explain how backpropagation differs from gradient descent and to work with concepts such as cross-entropy. From there, you will go deeper still and test your skills in areas such as stochastic gradient descent and the Hessian, and how they can be used to speed up the training process.

Expert Interview Questions

The last 20 questions will test your knowledge of topics such as Adam and its use in neural networks, what layer normalization is, residual connections, and how to solve exploding gradients. You will also learn more about dropout, how it prevents overfitting, the curse of dimensionality, and more.

Happy Learning

We hope this course helps you become more confident for your upcoming interview, or simply in your learning process. Going over the top interview questions will help you understand what interviewers deem to be important skills and knowledge.

If you know of any other good resources, please share them in the comments for the community!

Nisha Arya is a data scientist, freelance technical writer, and an editor and community manager for KDnuggets. She is particularly interested in providing data science career advice or tutorials and theory-based knowledge around data science. Nisha covers a wide range of topics and wishes to explore the different ways artificial intelligence can benefit the longevity of human life. A keen learner, Nisha seeks to broaden her tech knowledge and writing skills, while helping guide others.

We are Entering an Era of ‘LLM Pollution’

Everyone is building LLMs. Be it closed or open, the number of language models out there significantly outpaces the number of extensions and applications based on them. Some are small and some are big, but only a few companies are actually able to build something tangible out of them.

Arguably, building on these models is also important work. But many extensions that come out of open language models are just language additions, or make the models faster at a smaller scale. Though a noble cause, this doesn’t really affect how the models get adopted. When it comes to India, there are models built on top of Llama 2 in various Indic languages, but beyond that, little of significance has emerged.

These models come in various sizes, from modest to monumental, yet despite their abundance, only a handful of companies have managed to translate them into tangible applications effectively.

Indeed, the proliferation of LLMs represents a significant milestone in AI advancement. Yet, the sheer volume of models being produced is outpacing the development of meaningful extensions and practical applications. While these efforts are commendable, they fall short of addressing the core issue of widespread LLM adoption.

A waste of time?

For example, there are thousands of language models on the Hugging Face Leaderboard. As soon as a new model drops, people start experimenting with it, testing its capabilities and benchmarking it for their use before moving on. The next day, the cycle repeats with the latest model in town.

Falcon, one of the largest open-source language models, was tested and applauded by a lot of developers when it launched. But after probing its capabilities, people found that Meta’s Llama 2 was a lot better despite its smaller size. The same happened with Mistral’s new models, and with OpenAI’s GPT-2 after so many years.

Speaking of Falcon, the model is still there, but people rarely use it, and no significant applications are being built on it. Yet TII, the institute behind it, may already be working on another AI model that it hopes to see at the top of the leaderboards.

Undoubtedly, this is how competition works. Databricks’ new AI model, DBRX, currently outperforms every other open model in the market at a much cheaper price, and enterprises are ready to adopt it given its capabilities. The same rush will arguably be witnessed again when Meta drops Llama 3. There will be choices for sure, but people will then forget about Llama 2 as well.

This abundance of foundation language models without added innovation is now encapsulated as ‘LLM pollution’. Rather than facilitating innovation or transformative applications, the surplus of LLMs risks inundating the field with redundant or underutilised models.

What next then?

Naveen Rao, the VP of generative AI at Databricks, told AIM that a vast majority of foundational model companies will fail. “You’ve got to do something better than they [OpenAI] do. And, if you don’t, and it’s cheap enough to move, then why would you use somebody else’s model? So it doesn’t make sense to me just to try to be ahead unless you can beat them,” he added.

Rao also said that everyone has to have their own take, but many just build models and call it a victory. “Woohoo! You built a model. Great,” he quipped. But, he said, it will not work without differentiation or problem-solving.

“Just building a piece of technology because you said you can do it doesn’t really prove that you can solve a problem,” said Rao.

Ankush Sabharwal of CoRover.ai told AIM that there is no need to build more foundational models when existing ones already work for the use cases at hand, and that it is time to go with this approach.

Throwing billions of dollars at the next GPT might create an excellent model for OpenAI, but the billions used to build GPT-4 would arguably go to dust: people might use it for a while, before it becomes the next GPT-2. There is no point in holding back AI, but measuring its impact on adoption, positive and negative, needs to accelerate alongside it.

There is a pressing need for greater emphasis on practical applications and real-world problem-solving with LLMs. Rather than fixating solely on the technical prowess of language models, attention should also be devoted to their practical utility and societal implications.

Companies will definitely not all use the same LLMs, and we definitely need more options. But it is also necessary to define the exact use cases of these models before building a bunch of them in different languages. The era of ‘LLM pollution’ is here, and models that once topped the charts will languish in a heap that no one uses.

AIM Loves Perplexity AI

A few months ago, Perplexity gave AIM access to Perplexity Pro, predominantly for research and analysis purposes. The results have been promising.

The biggest pain point in any kind of research is the source and authenticity of information. Unlike LLMs that simply generate results, Perplexity substantiates the information it provides with source links, addressing the veracity of the content, which AIM users found to be its biggest plus point.

“I actually liked it. The sources are quite reliable,” said a research associate at AIM.

With the option to select from a list of the latest AI models, including Claude 3 Opus, Mistral Large and GPT-4 Turbo, users can experiment with any of them. A range of features such as ‘discover’, ‘focus’ and ‘attach’ lets users explore and tailor searches to their specific needs.

“Whenever I don’t understand a feature or anything else while browsing the internet, I just take a screenshot and ask Perplexity about it,” said a video journalist at AIM, who also uses the application to improve his workflow and check the grammar of his scripts.

Interestingly, we observed that Perplexity generated answers faster than ChatGPT.

Lacks Depth, Hallucination Persists

Perplexity Pro provided various features that worked extremely well for us; however, it was not perfect. When generating in-depth responses to a particular context, ChatGPT fared better than Perplexity Pro. The latter was unable to handle complex instructions.

On a rare few occasions, the chatbot even generated incorrect information, though when prompted again about the error, it produced the right answer. Like every LLM, Perplexity Pro is not free from hallucinations.

We also noticed that the order of search results closely resembled Google Search results, which sparked conversation about whether Perplexity is merely a Google wrapper. However, we did not observe an exhaustive enough set of search results to prove this.

Nothing like Perplexity AI

Despite its flaws, Perplexity has become the talk of the town. The AI-powered answer engine is on a quest to establish itself as a Google alternative, and in the process, the company is actively partnering with device-makers, especially smartphone brands.

The company recently partnered with Nothing, whose latest smartphone buyers get a free subscription to Perplexity Pro.

Not just smartphone makers but even operators have become strategic partners. The company recently announced a partnership with Korea’s largest telecommunications company, SK Telecom, under which 32M+ subscribers will get access to Perplexity Pro.

Perplexity also announced its pplx-online LLM APIs to power Rabbit R1, an AI-powered gadget that uses a large action model (LAM).

Last month, Perplexity partnered with Yelp to improve local searches and help users find information on local restaurants and businesses, a probable step to combat Google reviews.

Perplexity recently incorporated Databricks’ latest open-source LLM, DBRX, which is said to outperform GPT-3.5 and powerful open models like LLaMA 2 and Mixtral.

Not just Databricks: Perplexity has been open to embracing and offering all kinds of models through its APIs and answer engine, be it the latest Claude 3 Opus, Mistral Large, or Google Gemma. Perplexity is quick at its game.

Further, Aravind Srinivas, Perplexity’s CEO and co-founder, recently announced that Copy AI, which is launching a GTM platform, is collaborating with Perplexity AI. “They chose to use our APIs for this, and we’re also providing six months of Perplexity Pro for free to current Copy AI subscribers,” he said.

Even NVIDIA chief Jensen Huang has mentioned that he uses Perplexity, a company NVIDIA has invested in, “almost every day”.

The Future of Search?

The testimonials of big tech leaders such as Huang and Jeff Bezos may sound inflated considering they have invested in the AI company, but going by the growing number of Perplexity users, the company is surely capturing a wide audience.

The company has over 10 million monthly users.

Further, they even offer the product in various languages, such as Korean, German, French and Spanish.

While they aim to compete with Google and Sundar Pichai, whom Srinivas admires, things are looking good for Perplexity AI and Aravind Bhai. AIM loves Perplexity and Google, equally.

We Finally Have A Hugging Chat for Indic LLMs

Bhabha AI has unveiled Indic Chat, a playground for open-source Indic LLMs. Built on top of Hugging Face’s Hugging Chat, the platform hosts Indic AI models for people to chat with and test out.

Click here to check it out.

Currently, the interface hosts the following models (a quick usage sketch follows the list):

  • Telugu-LLM-Labs/Indic-gemma-7b-finetuned-sft-Navarasa-2.0
  • GenVRadmin/AryaBhatta-GemmaOrca
  • BhabhaAI/Gajendra-v0.1
  • ai4bharat/Airavata
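
For the curious, here is a minimal sketch of trying one of the listed models locally with Hugging Face transformers; the prompt and generation settings are assumptions and may differ from what Indic Chat itself uses.

```python
# Minimal local test of one of the listed models; prompt formatting and
# generation parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai4bharat/Airavata"   # any model id from the list above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "भारत की राजधानी क्या है?"  # "What is the capital of India?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```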

Additionally, users can join the Discord channel, now accessible to all, to foster collaboration and expedite the development of Indic LLMs.

Bhabha AI has also published a version of the OpenHermes-2.5 instruction dataset comprising approximately 600,000 rows, which have been filtered and translated into Hindi.

Satpal Singh Rathore, the creator of Bhabha AI and Indic Chat, collaborated with Telugu LLM Labs, whose Ramsri Goutham Golla and Ravi Theja Desetty supported Bhabha AI with computing resources.

A few weeks back, Bhabha AI unveiled Gajendra, an early release of its 7B Hindi-Hinglish-English instruction fine-tuned language model built on top of Sarvam AI’s OpenHathi, which is itself built on top of Llama 2.

Looking ahead, Bhabha AI is exploring how to filter examples that can be translated well from English to Hindi. The initiative includes the release of initial versions of both the dataset and the corresponding model.

How NVIDIA is Turning Data Centers into AI Factories for Generative AI

NVIDIA has been surprising the world as if there’s no tomorrow. “It’s certainly a good time to be NVIDIA,” declared xAI chief Elon Musk in a recent interview with Peter Diamandis during the 2024 Abundance360 Summit.

Musk said that AI compute is growing exponentially, increasing by a factor of 10 every six months, and that most data centres are transitioning from conventional to AI compute. ‘AI factories’ are most certainly a step in that direction, and NVIDIA is obsessed with them.

At the recent GTC 2024, NVIDIA CEO Jensen Huang drew parallels between data centres and the factories of the industrial revolution. He explained how data centres now produce data tokens using data and electricity as raw materials, and compared this with the production of electricity from raw energy during the industrial revolution.

This marks a significant perspective shift from data centres being a cost-guzzling infrastructure to becoming revenue generators.

In October last year, the most valuable chip company joined hands with Foxconn to build ‘AI factories’ that would use NVIDIA chips to power a “wide range” of applications, from autonomous vehicles and robotics platforms to LLM training.

This time, Huang emphasised the necessity, stating, “Anyone developing chatbots and generative AI will require an AI factory.” He also drew attention to their collaboration with Dell, a company Huang described as “skilled in building end-to-end systems for enterprises at scale”.

Additionally, major cloud providers and data centre operators, including AWS, Microsoft Azure, Google Cloud, and Oracle Cloud, have announced plans to offer NVIDIA’s new Blackwell GPUs and systems in their data centres. Hardware partners like Cisco, Dell, HPE, Lenovo, and Supermicro will deliver Blackwell-optimised servers and systems.

AWS and Microsoft Azure plan to offer Blackwell-based instances, co-developing the Project Ceiba AI supercomputer with NVIDIA. Google Cloud will incorporate NVIDIA’s GB200 NVL72 systems, while OCI will adopt the GB200 Grace Blackwell Superchip and host a 72 Blackwell GPU NVL72 cluster.

Other cloud providers, such as Lambda, CoreWeave, IBM Cloud, and NexGen Cloud, also intend to offer Blackwell hardware.

Data centre operators, including YTL Power in Malaysia and Singtel in Singapore, are preparing to host Blackwell-powered systems.

NVIDIA has also garnered interest from AI leaders like Meta, Microsoft, OpenAI, Oracle, and Tesla in adopting the Blackwell platform, signalling widespread adoption across hyperscaler public clouds, specialised GPU cloud providers, data centre operators, and sovereign clouds.

Challenges Galore

However, the AI revolution, powered by Blackwell, is not without challenges. As computational demands soar, the data centre industry faces mounting pressure to enhance energy efficiency and meet renewable energy goals.

Pedro Domingos, a professor of computer science at the University of Washington, also highlighted in a post on X how data centres seem to have become uni-focused rather than multi-focused after the popularisation of generative AI models.

This trend points to dedicated data centre infrastructure being built for each use case or function, which could prove challenging. Hence, experts have indicated a need to move to low-cost data centres powered by renewable energy, which would not pose such a burden.

“If you’re not building data centres adjacent to low-cost renewable power, you’re not going to have a fruitful conversation with the hyperscalers and cloud players,” said Marc Ganzi, CEO of DigitalBridge.

Musk said that last year, the primary constraint was AI chips. “This year, however, one of the major constraints, if not the biggest, is voltage step-down transformers. The challenge lies in reducing the power from a utility’s 300 kilovolts to less than one volt for computers,” he added.

Musk said that this massive voltage reduction is critical. On a lighter note, he said it is almost as if we need transformers for the transformers — voltage transformers for our AI’s neural net transformers. “That’s the main issue we’re facing this year,” he added.

What’s Next?

Musk said that, looking ahead to next year and beyond, the constraint is likely to be the availability of electrical power. He also said that with AI’s substantial power demands and the shift to sustainable energy, especially electric vehicles, the need for electrical power is becoming increasingly significant.

OpenAI’s co-founder and chief AI scientist Ilya Sutskever has repeatedly envisioned the initial AGIs as “very large data centres packed with specialised neural network processors working in parallel”.

Given the recent developments in the field, Sutskever’s insights are particularly noteworthy.

A short while ago, Microsoft and OpenAI announced Stargate, a groundbreaking AI supercomputer data centre project reportedly costing over $115 billion and set to launch in 2028. Additionally, AWS plans to invest over $150 billion in data centres over the next 15 years to accommodate increasing demand.

While all of them are at it, NVIDIA is better positioned to develop AGI chips and eventually AGI factories, considering its recent strides in defying Moore’s Law with its Blackwell infrastructure.
