OpenAI Introducing Media Manager Tool in India Could Hurt Ola Krutrim’s Ego

OpenAI recently announced plans to develop a new tool called Media Manager, which will let creators and content owners specify how their work may be used in machine learning research and in training AI models. The tool is designed to respect these choices and is expected to be released by 2025.

The catch is that this new tool will greatly help OpenAI collect Indic data and build GPT models, and it could also hurt many Indian AI startups, including Ola Krutrim, SML Hanooman, and others, which have barely bloomed and are struggling to onboard users onto their platforms.

Recent statistics reveal that ChatGPT has amassed over 180 million users globally, with India emerging as its second-largest market. India accounts for 9.08% of the total user base, or approximately 14 million users. Neither Ola Krutrim nor Hanooman is anywhere close; both are busy playing the so-called Indian ‘culture’ card.

That also explains why OpenAI recently hired Pragya Misra, its first employee in India, as its government relations head to lobby the Indian government and create a safe space for OpenAI to eventually operate in the country without hindrance.

‘Tumse Na Ho Payega’… Really?

“Tumse na ho payega”, Hindi for “You won’t be able to do it”, is what Ola Krutrim chief Bhavish Aggarwal said of OpenAI in a recent interview. He boldly claimed that he aims to challenge OpenAI by proving that India can build its own foundational language models from scratch.

However, Aggarwal admitted that Krutrim needs to catch up with ChatGPT but added, “Unless the start is made, how can we move ahead?”

Most recently, he also claimed that he wants Krutrim to be India-centric and free from Western influence, to the extent that he coined a new term, ‘Pronoun Illness’. The sentiment has drawn criticism from the developer ecosystem, which is questioning Ola’s diversity and inclusion practices.

The irony is that the model, and the very idea behind Krutrim, appear to have been copied from OpenAI. The chatbot even replied to some users’ queries stating that it was built on top of ‘OpenAI models’, a slip that was later vaguely rectified and has not been spoken about since.

Many believe the company used OpenAI’s GPT-4 output to train Krutrim.

Interestingly, Ola Krutrim is currently using Databricks services to streamline data for its model, and as far as building models go, it is most likely using DBRX as well. “We have been working closely with the Databricks team to pre-train and fine-tune our foundational LLM,” said Ravi Jain, Krutrim VP.

Indic Data is All You Need

“The amount of high-quality data originally available in Indian languages is quite small,” said Vivek Raghavan, co-founder of Sarvam AI, highlighting the challenges around creating datasets for low-resource Indic languages.

Further, Raghavan pointed to Common Crawl, the most widely used web data repository: “Only 0.1% of the text is in Hindi, and other Indian languages are even lower than that,” he added.

Pratyush Kumar and Vivek Raghavan, the founders of Sarvam AI, have previously worked with another homegrown AI venture, AI4Bharat, which is building Indic language datasets like IndicVoices.

Similarly, Tech Mahindra, which is developing its own Hindi LLM, ‘Project Indus’, with 539 million parameters trained on 10 billion tokens of Hindi and its dialects, sent a team to North India to collect data.

“We went to Madhya Pradesh, Rajasthan, and parts of Bihar. The team’s task was to collect Hindi and dialect data by interacting with professors and leveraging the Bhasha-dan portal available on ProjectIndus.in,” said Nikhil Malhotra, global head at Makers Lab, Tech Mahindra and the brain behind Project Indus.

Coincidentally, similar to OpenAI’s Media Manager, Bhashini also introduced Bhasha Daan to create a large, open repository of language data in various Indian languages.

Customer-Centric, Not Ego-Centric

The only moat most Indian AI startups currently have is the plethora of Indic datasets they hoard or harness. Now, with OpenAI introducing the Media Manager tool, its presence in the country could expand manifold while hindering growth for the companies building ChatGPT alternatives.

To be honest, most Indian AI startups are two years behind OpenAI and other AI startups in the West. They have barely begun, and it is time they ran a reality check and focused on developing innovative, collaborative solutions for Indian consumers and enterprises instead of competing aimlessly.

Nandan Nilekani, often described as India’s unofficial CTO, echoed similar views recently. He said that India is not in the race to build LLMs but should focus on building AI use cases that will reach every citizen. “Winners in AI in India will be those who meet customers where they are,” he said.

In a recent interview with AIM, Sarvam AI’s Raghavan also said the same. “We’ve just started here; I don’t think we are trying to build the class of models that OpenAI is trying to build with GPT-5,” he said, sharing his company’s strategy of leveraging existing AI tools as well as in-house models to build meaningful products that impact millions of people in the country.

On the other hand, Mr Aggarwal is obsessed with competing with OpenAI and other tech giants, waging a ‘culture’ war against the West.

“Rich of you to call my post unsafe! This is exactly why we need to build our own tech and AI in India. Else we’ll just be pawns in other political objectives,” said Aggarwal after his controversial ‘pronoun illness’ post was flagged on LinkedIn, accusing the platform of imposing a political ideology on Indian users that he called unsafe and sinister.

5 Machine Learning Papers to Read in 2024

Image generated with DALL-E 3

Machine learning is a subset of artificial intelligence that can bring value to a business through efficiency gains and predictive insight, making it a valuable tool for almost any organization.

We know that last year was full of machine learning breakthroughs, and this year is no different. There is just so much to learn.

With that in mind, I have selected a few papers from 2024 that you should read to improve your knowledge.

What are these papers? Let’s get into it.

HyperFast: Instant Classification for Tabular Data

HyperFast is a meta-trained hypernetwork model developed by Bonet et al. (2024). It is designed to classify tabular data instantly, in a single forward pass.

The authors state that HyperFast can generate a task-specific neural network for an unseen dataset that can be used directly for classification, eliminating the need to train a model. This approach significantly reduces the computational demands and time required to deploy machine learning models.

In the HyperFast framework, the input data is transformed through standardization and dimensionality reduction, then passed through a sequence of hypernetworks that produce the weights of the target network's layers, including a nearest neighbor-based classification bias.

Overall, the results show that HyperFast performed excellently. It is faster than many classical methods without the need for fine-tuning. The paper concludes that HyperFast could become a new approach that can be applied in many real-life cases.
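To make the hypernetwork idea concrete, here is a minimal, hypothetical sketch I put together: a toy "weight generator" maps a summary of the labeled support set straight to classifier weights, which then predict on new data with no training loop. The real HyperFast is a meta-trained deep hypernetwork, so treat this nearest-centroid stand-in as illustration only.

```python
# Toy illustration of the hypernetwork idea behind HyperFast -- NOT the paper's
# actual model, which is a meta-trained deep hypernetwork.
import numpy as np

rng = np.random.default_rng(0)

# An "unseen" binary classification task: 100 labeled support points, 5 features.
X_support = rng.normal(size=(100, 5))
y_support = (X_support[:, 0] + X_support[:, 1] > 0).astype(int)

# Step 1: standardize, mirroring HyperFast's preprocessing stage.
mu, sigma = X_support.mean(axis=0), X_support.std(axis=0) + 1e-8
Xs = (X_support - mu) / sigma

# Step 2: a "weight generator" maps a summary of the support set straight to
# classifier weights in one shot (here, per-class means: a nearest-centroid
# rule standing in for the learned hypernetwork).
W = np.stack([Xs[y_support == c].mean(axis=0) for c in (0, 1)])  # shape (2, 5)

# Step 3: the generated weights classify new points directly -- no training loop.
X_test = (rng.normal(size=(10, 5)) - mu) / sigma
predictions = (X_test @ W.T).argmax(axis=1)
print(predictions)
```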

EasyRL4Rec: A User-Friendly Code Library for Reinforcement Learning Based Recommender Systems

The next paper, by Yu et al. (2024), proposes EasyRL4Rec, a user-friendly code library for developing and testing Reinforcement Learning (RL)-based Recommender Systems (RSs).

The library offers a modular structure with four core modules (Environment, Policy, StateTracker, and Collector), each addressing different stages of the Reinforcement Learning process.

The overall structure revolves around the core modules of the Reinforcement Learning workflow: Environments (Envs) for simulating user interactions, a Collector for gathering interaction data, a StateTracker for creating state representations, and a Policy module for decision-making. It also includes a data layer for managing datasets and an Executor layer, with a Trainer-Evaluator, for overseeing the learning and performance assessment of the RL agent.
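As a rough illustration of how those four modules divide the work, here is a minimal Python sketch; the class and method names are hypothetical placeholders of my own, not EasyRL4Rec's actual API.

```python
# Hypothetical sketch of the four-module split described above -- the names are
# illustrative placeholders, NOT EasyRL4Rec's actual API.
import random

class Env:                           # simulates user interactions
    def reset(self):
        return [0.0, 0.0, 0.0]
    def step(self, item):            # returns (next raw state, reward)
        return [random.random() for _ in range(3)], random.random()

class StateTracker:                  # builds state representations
    def encode(self, raw_state):
        return tuple(round(x, 1) for x in raw_state)

class Policy:                        # decision-making: recommend 1 of 10 items
    def act(self, state):
        return random.randrange(10)

class Collector:                     # gathers interaction data for training
    def __init__(self):
        self.buffer = []
    def collect(self, env, tracker, policy, steps=5):
        raw = env.reset()
        for _ in range(steps):
            state = tracker.encode(raw)
            action = policy.act(state)
            raw, reward = env.step(action)
            self.buffer.append((state, action, reward))
        return self.buffer

print(Collector().collect(Env(), StateTracker(), Policy()))
```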

The authors conclude that EasyRL4Rec provides a user-friendly framework that could address practical challenges in applying RL to recommender systems.

Label Propagation for Zero-shot Classification with Vision-Language Models

The paper by Stojnic et al. (2024) introduces ZLaP, which stands for Zero-shot classification with Label Propagation. It enhances zero-shot classification with vision-language models by using geodesic distances on a graph for classification.

As we know, vision-language models such as GPT-4V or LLaVA are capable of zero-shot learning, performing classification without labeled images. This capability can still be enhanced further, which is why the research group developed the ZLaP technique.

The ZLaP core idea is to utilize label propagation on a graph-structured dataset comprising both image and text nodes. ZLaP calculates geodesic distances within this graph to perform classification. The method is also designed to handle the dual modalities of text and images.
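For intuition, here is a minimal sketch of the classic label-propagation machinery that ZLaP builds on, assuming a simple Gaussian-kernel similarity graph; the paper's actual graph mixes image and text nodes and uses geodesic distances, which this toy omits.

```python
# Minimal label propagation on a similarity graph (the classic machinery ZLaP
# builds on; its actual graph mixes image and text nodes and uses geodesic
# distances, omitted here).
import numpy as np

rng = np.random.default_rng(0)
# Two point clouds standing in for image embeddings of two classes.
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
# One "labeled" node per class -- in ZLaP these would be text/class embeddings.
Y = np.zeros((40, 2))
Y[0, 0] = 1.0
Y[20, 1] = 1.0

# Row-normalized affinity matrix from a Gaussian kernel.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.exp(-D ** 2)
np.fill_diagonal(W, 0.0)
S = W / W.sum(axis=1, keepdims=True)

# Iterate F <- alpha * S @ F + (1 - alpha) * Y to diffuse labels over the graph.
F, alpha = Y.copy(), 0.9
for _ in range(50):
    F = alpha * S @ F + (1 - alpha) * Y

print(F.argmax(axis=1))   # propagated class assignment for every node
```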

Performance-wise, ZLaP consistently outperformed other state-of-the-art zero-shot methods, using both transductive and inductive inference, in experiments across 14 datasets.

Overall, the technique significantly improved classification accuracy across multiple datasets, showing promise for ZLaP in vision-language model applications.

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

The fourth paper we will discuss is by Munkhdalai et al. (2024). It introduces Infini-attention, a method for scaling Transformer-based Large Language Models (LLMs) to handle infinitely long inputs with bounded compute and memory.

The Infini-attention mechanism integrates a compressive memory system into the traditional attention framework. By combining standard causal attention with compressive memory, the model can store and update historical context and efficiently process extended sequences, aggregating long-term and local information within a single Transformer network.
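The following toy sketch shows the compressive-memory half of that combination, simplified from the paper's linear-attention-style update; the local causal attention and the learned gate that mixes the two are omitted, so read it as an assumption-laden illustration rather than the real mechanism.

```python
# Toy sketch of Infini-attention's compressive memory (simplified; the real
# mechanism also runs local causal attention and gates the two together).
import numpy as np

d = 8
M = np.zeros((d, d))     # compressive memory: a fixed-size associative matrix
z = np.zeros(d)          # normalization term

def sigma(x):            # ELU + 1 keeps the features positive
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
for segment in range(4):                 # process the input segment by segment
    K, V, Q = (rng.normal(size=(16, d)) for _ in range(3))
    # Retrieve long-term context accumulated from *previous* segments.
    A_mem = (sigma(Q) @ M) / ((sigma(Q) @ z)[:, None] + 1e-8)
    # Fold the current segment into memory, then discard the segment itself:
    # memory stays O(d^2) no matter how long the input grows.
    M += sigma(K).T @ V
    z += sigma(K).sum(axis=0)

print(A_mem.shape)   # (16, 8): memory readout for the last segment's queries
```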

Overall, the technique outperforms currently available models on tasks involving long-context language modeling, such as passkey retrieval from long sequences and book summarization.

The technique could enable many future applications, especially those that require processing extensive text data.

AutoCodeRover: Autonomous Program Improvement

The last paper we will discuss is by Zhang et al. (2024). Its main focus is AutoCodeRover, a tool that uses Large Language Models (LLMs) to perform sophisticated code searches and automate the resolution of GitHub issues, mainly bug reports and feature requests. By using LLMs to parse and understand GitHub issues, AutoCodeRover can navigate and manipulate the code structure more effectively than traditional file-based approaches.

AutoCodeRover works in two main stages: a context retrieval stage and a patch generation stage. It analyzes search results to check whether enough information has been gathered to identify the buggy parts of the code, and then attempts to generate a patch that fixes the issue.
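Schematically, the loop might look like the hypothetical Python sketch below. Every function name here is an invented placeholder for illustration; it is not AutoCodeRover's actual interface.

```python
# Hypothetical sketch of the two-stage loop described above. All names are
# invented placeholders, NOT AutoCodeRover's actual interface.

def search_code(issue_text, known):
    """Stage 1: context retrieval -- code search driven by the issue text."""
    keywords = [w for w in issue_text.split() if w not in known]
    return [f"snippet matching '{kw}'" for kw in keywords[:2]]

def enough_context(snippets):
    """Stand-in for the LLM's judgement that the bug has been localized."""
    return len(snippets) >= 4

def generate_patch(snippets):
    """Stage 2: patch generation against the localized code."""
    return "--- a/buggy.py\n+++ b/buggy.py\n(placeholder diff)"

issue = "crash when parsing empty config file"
context, seen = [], set()
while not enough_context(context):       # iterate retrieval until localized
    for snippet in search_code(issue, seen):
        context.append(snippet)
        seen.add(snippet.split("'")[1])
print(generate_patch(context))
```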

The paper shows that AutoCodeRover improves on previous methods. For example, it solved 22-23% of issues from the SWE-bench lite dataset, resolving 67 issues at an average of under 12 minutes each, against the roughly two days it can otherwise take to resolve an issue.

Overall, the paper shows promise as AutoCodeRover is capable of significantly reducing the manual effort required in program maintenance and improvement tasks.

Conclusion

There are many machine learning papers to read in 2024; here are my recommended papers:

  1. HyperFast: Instant Classification for Tabular Data
  2. EasyRL4Rec: A User-Friendly Code Library for Reinforcement Learning Based Recommender Systems
  3. Label Propagation for Zero-shot Classification with Vision-Language Models
  4. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  5. AutoCodeRover: Autonomous Program Improvement

I hope it helps!

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.

The World Needs Something Better Than the Transformer 

One can say that modern AI, or generative AI, runs on attention, or Transformers, which were created at Google. Seven years after the original paper was released, everyone is still trying to find better architectures for AI. But arguably, even after all the backlash, Transformers still reign supreme.

Noam Shazeer, one of the creators of the Transformer, revealed that the architecture was once called ‘CargoNet’, but nobody really paid much attention to the name.

Regardless, researchers challenging Transformers is nothing new. The latest such paper comes from Sepp Hochreiter, the inventor of LSTM, who has unveiled a new LLM architecture featuring a significant innovation: xLSTM, which stands for Extended Long Short-Term Memory. The new architecture addresses a major weakness of previous LSTM designs, which were sequential in nature and unable to process all information at once.

LSTMs, compared to Transformers, are limited by their storage capacities, inability to revise storage decisions, and lack of parallelisability due to memory mixing. Unlike LSTMs, Transformers parallelise operations across tokens, significantly improving efficiency.

The main components of the new architecture include a matrix memory for LSTM, eliminating memory mixing, and exponential gating. These modifications allow the LSTM to revise its memory more effectively when processing new data.

What Are the Problems With Transformers?

In December last year, researchers Albert Gu and Tri Dao from Carnegie Mellon and Together AI introduced Mamba, challenging the prevailing dominance of Transformers.

Their research unveiled Mamba as a state-space model (SSM) that demonstrates superior performance across various modalities, including language, audio, and genomics. For example, the researchers tried their language modelling with the Mamba-3B model, which outperformed Transformer-based models of the same size and matched Transformers twice its size, both in pretraining and downstream evaluation.

The researchers emphasised Mamba’s efficiency through its selective SSM layer, designed to address a major limitation of Transformers: computational inefficiency on long sequences, scaling up to a massive million-token sequence length.

Another paper, by the Allen Institute for AI, titled “Faith and Fate: Limits of Transformers on Compositionality”, discussed the fundamental limits of Transformer language models, focusing on compositional problems that require multi-step reasoning.

The study investigates three representative compositional tasks: long-form multiplication, logic grid puzzles (e.g. Einstein’s puzzle), and a classic dynamic programming problem.

The study found that the autoregressive nature of Transformers presents a fundamental challenge for solving such tasks comprehensively. These findings underscore the pressing need for advancements in Transformer architecture and training methods.

Maybe Attention is a Good Start

According to Meta’s AI chief Yann LeCun, “Auto-regressive LLMs are like processes that keep getting away from the correct answers exponentially”.

This is possibly why Meta also introduced MEGALODON, a neural architecture for efficient sequence modelling with unlimited context length. It is designed to address the limitations of Transformer architecture in handling long sequences, including quadratic computational complexity and limited inductive bias for length generalisation.

This is similar to Google introducing Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations, fostering the emergence of working memory within the Transformer and allowing it to process indefinitely long sequences.

In April, Google DeepMind also unveiled RecurrentGemma 2B, a new open-weight language model based on the novel Griffin architecture.

This architecture achieves fast inference when generating long sequences by replacing global attention with a mixture of local attention and linear recurrences.

Speaking of mixtures, Mixture of Experts (MoE) models are also on the rise. MoE is a type of neural network architecture that combines the strengths of multiple smaller models, known as ‘experts’, to make predictions or generate outputs. An MoE model is like a team of hospital specialists, each an expert in a specific medical field, such as cardiology, neurology, or orthopaedics.

With respect to Transformer models, MoE has two key elements – Sparse MoE Layers and a Gate Network. Sparse MoE layers represent different ‘experts’ within the model, each capable of handling specific tasks. The gate network functions like a manager, determining which words or tokens are assigned to each expert.
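A minimal sketch of that routing mechanism follows, with some assumptions: random weights, eight tiny linear ‘experts’, and top-2 gating. It illustrates the general idea described above, not any specific model’s implementation.

```python
# Minimal sparse MoE layer with top-2 gating -- illustrative of the mechanism
# described above, not any specific model's implementation.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2

# Each "expert" is a tiny linear map; the gate network is a single linear layer.
experts = rng.normal(size=(n_experts, d, d)) * 0.1
W_gate = rng.normal(size=(d, n_experts)) * 0.1

def moe_layer(x):                         # x: (tokens, d)
    logits = x @ W_gate                   # gate scores: one per token per expert
    top = np.argsort(logits, axis=1)[:, -top_k:]        # top-2 experts per token
    scores = np.take_along_axis(logits, top, axis=1)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over the top-2
    out = np.zeros_like(x)
    for t in range(x.shape[0]):           # each token visits only its 2 experts
        for j, e in enumerate(top[t]):
            out[t] += weights[t, j] * (x[t] @ experts[e])
    return out

tokens = rng.normal(size=(4, d))
print(moe_layer(tokens).shape)            # (4, 16)
```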

This line of thinking has led to Jamba. AI21 Labs introduced Jamba, a hybrid decoder architecture that combines Transformer layers with Mamba layers, along with an MoE module. The company refers to this combination of three elements as a Jamba block.

Jamba applies MoE at every other layer, with 16 experts, and uses the top 2 experts at each token. “The more the MoE layers, and the more the experts in each MoE layer, the larger the total number of model parameters,” wrote AI21 Labs in Jamba’s research paper.

The End of Transformers?

Before Transformers were all the hype, people were obsessed with recurrent neural networks (RNNs) for deep learning. But RNNs, by definition, process data sequentially, which made them seem an unfit choice for text-based models.

Yet Transformers themselves grew out of RNN research, essentially dropping recurrence in favour of an attention layer. The same could be true of whatever “replaces” Transformers.

At NVIDIA GTC 2024, Jensen Huang asked the panel about the most significant improvements to the base Transformer design. Aidan Gomez replied that extensive work has been done on the inference side to speed up these models. However, Gomez said he is quite unhappy that all the development happening today is built on top of Transformers.

“I still think it kind of disturbs me how similar to the original form we are. I think the world needs something better than the Transformer,” he said, adding that he hopes it will be succeeded by a ‘new plateau of performance’. “I think it is too similar to the thing that was there six or seven years ago.”

Why Does India Have Fewer Than 2,000 Senior AI Engineers?

According to a report by specialist staffing firm Xpheno, the active pool of senior AI engineers in India who can actually build core AI products and services is less than 2,000. And this is in a country that boasts over 2,20,000 software engineers.

The common consensus is that this handful of specialist engineers, who form less than 1% of the country’s total IT engineering strength, may not be enough to address our AI needs.

This also highlights a significant gap between the number of people receiving general AI training versus those with specialised skills to develop core AI technologies.

If we delve into the reasons behind this, the first that comes to mind is time: it takes approximately 15-20 years to become a seasoned AI engineer. This simply means it is far easier to produce fresh graduates than to nurture senior AI engineers.

Another reason is the pay scale. AI is booming like never before, which means big tech companies are looking for experienced engineers to improve their products.

In India, the average salary of a senior AI engineer is between INR 9-21 lakh ($11,000 – $25,000), whereas in the USA, they can easily bag an offer of at least $121,000.

Another important factor is the lifestyle and standard of living abroad, much desired by Indian engineers. Many AI professionals are lured away by overseas companies, and this brain drain leaves India with a minuscule number of senior AI folks.

How small is less than 2,000?

India’s demand for AI talent is projected to grow 15% annually, with an existing demand-supply gap of about 51% for niche skills required to build core AI. This suggests that the current pool of AI engineers is inadequate to meet the increasing demand.

But the question is: does India require its own LLM or a core AI product built from scratch? There are various FOSS projects that can be tuned to solve existing problems.

India has a different approach to solving problems through AI. One such example is Tamil-Llama, which is based on Llama 2 and intended to break the language barrier and stay relevant in the AI world.

The Indian path in AI seems to be different. Nandan Nilekani also said, “We are not in the arms race to build the next LLM, let people with capital, let people who want to pedal ships do all that stuff… We are here to make a difference, and our aim is to put this technology in the hands of people.”

Furthermore, even building something from scratch requires only a few senior engineers to provide the vision and chart a path to a given milestone.

“Core products and AI are driven by a few thousand people. You don’t need lakhs to build a core AI product,” said Atul Mehra, the founder of Vaayu.

What is the solution?

To solve this problem, India has started several focused AI and data science institutes to cater to the high demand for AI engineers. These institutes allow students and working professionals to pursue doctorates in AI and data science fields.

One example is the Wadhwani Institute of AI (WIAI), established by brothers Sunil and Romesh Wadhwani, which uses AI to serve underserved communities in developing countries and has impacted over 30 million lives across eight states in India.

The Indian Institute of Technology (IIT) Kharagpur has established the Centre of Excellence in Artificial Intelligence (AI) to foster cutting-edge research and develop skilled AI professionals. The centre aims to position IIT Kharagpur and India as global AI research and application leaders.

The core idea of AI is to help and educate people, and India is focused on democratizing AI technology and making it accessible to the masses. With this strategic vision and investment in AI education, India is well-positioned to overcome its talent gap and cement its position as the global AI hub in the years to come.

Is GenAI Spoiling Gen Z? 

In 2022, OpenAI CEO Sam Altman remarked, “Ideological zombification is going out of fashion fast, led by Gen Z.” However, others argue that in the future people will rely heavily on AI for creative work and may become too lazy to form original thoughts.

When ChatGPT was introduced, Gen Z adopted it to do assignments, apply for jobs, and enhance work quality. OpenAI recently unveiled Media Manager, a tool designed to empower creators and content owners by letting them specify ownership rights and preferences regarding the use of their works in machine learning research and training.

According to a Grammarly survey, 61% of Gen Z assert that they can’t imagine doing work tasks without GenAI, compared to 56% of millennials, 53% of Gen X-ers, and 41% of baby boomers.

The bond between Gen Z and GenAI has grown unbreakable, each shaping the other in profound ways. Yet, as this dependence deepens, a question arises: should Gen Z lean so heavily on GenAI?

Get Assistance but Don’t Copy

Platforms like EdrawMind, ToolBaz, Quillbot, and Scholarcy have become staples for Gen Z students, aiding research paper writing and beyond. According to a survey by BestColleges, 56% of college students had utilised AI in assignments or exams by 2023.

However, opinions remained divided. While many students embraced AI’s assistance, a significant portion deemed its use as cheating or plagiarism.

maybe an unpopular opinion but if you need to cheat to pass college you probably shouldn’t be there pic.twitter.com/bW1E2cRu78

— jj (@melodramateur) May 1, 2024

Meanwhile, Shoshana Davis, a Gen Z career expert and founder of the career consultancy Fairy Job Mother, shared with CNBC Make It during an interview that Gen Z is getting increasingly dependent on AI tools such as ChatGPT for composing cover letters and responses to job applications.

Gen Z Enters the Workplace with GenAI

The trend is clear: technology once stigmatised as enabling plagiarism has now found mainstream acceptance. Gen Z, equipped with GenAI, has embarked on its journey into the workforce. Projections by McCrindle indicate that by 2025, Gen Z will comprise up to 27% of the workforce.

LinkedIn‘s Future of Work Report highlighted that AI is expected to be vital in enhancing productivity, with 47% of US executives endorsing its potential. However, amidst this technological revolution, the demand for certain skill sets will remain crucial.

Across the globe, countries are witnessing a significant uptake in AI skills adoption, with Singapore leading the pack, followed by Finland, Ireland, India, and Canada.

AI for Fun

Further, Gen Z is now using AI not just for daily tasks but also to shape their love stories. The dating scene has had a makeover, with millennials and Gen Z leading the charge. According to a report by Cosmopolitan magazine and dating app Bumble, over three-quarters of millennials and 81% of Gen Z are open to using AI bots to spice up their flirting game on dating apps.

But it’s not just romance where AI is making waves. Recent insights from Sprout Social’s 2024 influencer marketing report reveal a trend among Gen Z. While 37% of consumers are curious about brands employing AI influencers, this curiosity spikes to 46% among Gen Z.

What’s more intriguing is that 27% of consumers, spanning different generations, remain undecided, struggling to differentiate between AI and human influencers. It seems Gen Z isn’t just embracing AI for practical purposes but also for a bit of fun and excitement.

Ultimately, the story of Gen Z and GenAI is one of adaptation and evolution, where the boundaries of creativity, responsibility, and authenticity are continually negotiated. As Gen Z continues to shape and be shaped by AI, the path forward calls for a delicate balance between innovation and the preservation of the human essence.

Data Science Hiring Process at Razorpay

In February of this year, Razorpay, a prominent fintech unicorn, unveiled Razorpay RAY, a generative AI-powered assistant, for integrated payment and payroll management solutions specifically tailored for e-commerce businesses.

Leveraging GPT models through Azure APIs, RAY facilitates interactions via voice and text commands on platforms such as WhatsApp and web bots. It serves dual purposes within Razorpay: externally, it assists merchants by enhancing their understanding of data, and internally, it supports the company’s knowledge bases as a QnA service.

Around the same time, it also launched Payment Gateway 3.0, claiming to be the only payment gateway in India that improves both the payment process and the entire buyer journey. Powered by its in-house framework, AI-Nucleus, this checkout system is set to improve business conversions by more than 30%, which is expected to lead to higher revenues.

The team behind making this possible is the company’s close-knit six-member AI/ML team, which is categorised into three roles: data scientist, machine learning engineer, and MLOps engineer.

Razorpay was founded in 2014 by Shashank Kumar and Harshil Mathur, IIT Roorkee graduates. Since then, the company has raised funding from investors like Y-Combinator, Sequoia India, and Tiger Global over several rounds.

The company is expanding its data science team and is looking for a senior machine learning engineer to join its Bengaluru team.

“At Razorpay, solving for our customers is the core of everything we do. And the one thing that enables us to do that is data,” Murali Brahmadesam, chief technology officer and head of engineering at Razorpay, told AIM in an exclusive interview last week.

Brahmadesam shared that as a technology-first company catering to a diverse range of businesses, it prioritises the development of scalable and automated products. However, the founding principle of its operational strategy involves a strong emphasis on security and compliance due to its status as a regulated entity.

Inside the Data Science Team of Razorpay

Razorpay is working towards AI and data democratisation, fundamentally changing how engineers and data scientists work.

“This redefines the roles of our engineers and data scientists, allowing every engineer at Razorpay to become a ‘citizen data scientist’, using data-driven insights and AI tools in their daily tasks,” said Brahmadesam, highlighting that democratisation is key to creating a collaborative environment.

“Moreover, our data science team is actively working on platforms for AI model preparation. This initiative is designed to streamline and standardise the process of building generative AI and predictive models, making it accessible for our engineering teams to develop new models independently,” he commented.

Razorpay leverages generative AI models primarily for fraud detection, risk assessment, and personalised marketing. Additionally, it is expanding its AI capabilities to improve document processing across various Indian languages, enhancing its service reach and operational efficiency.

“While our system excels at processing English documents, we recognise the need for improvement in handling other Indian languages. Fortunately, initiatives like Bhashini are underway to address this,” he added.

Tech Stack

Razorpay employs a mix of tech solutions across its operations, from prototyping to deployment. The company uses Databricks and EMR for these purposes, offering both managed and self-serve options. Its data infrastructure includes AWS-managed Kafka for the streaming layer, RDS/Aurora for the batch layer, and S3 for the lake layer. Kubernetes is used for distributed deployments through standard GitHub CI/CD, and Spinnaker manages the deployment process. On the managed side, DataRobot is used for both development and deployment tasks.

“We have also explored fine-tuning smaller Falcon and Phi models internally for specific use cases and we will continue to pursue them as well,” he added.

Interview Process

“To ensure that we are able to hire the right talent and that the candidate also fully knows what is expected of them at the job, we follow a five-step process when recruiting for data science and ML roles,” said Brahmadesam.

The process starts with a screening and exploratory call, during which candidates’ experience and understanding of the role are assessed, providing a mutual opportunity to explore fit. This is followed by a week-long data exercise, in which candidates solve a case study to demonstrate their problem-solving skills and their ability to translate a problem statement into a data solution; the solution is then independently evaluated by two engineers.

The third step involves a coding test focused on Python, SQL, and Pyspark skills through progressively challenging problems. Next, the system design and ML depth interview assesses candidates on their machine learning knowledge and their ability to design systems, such as a real-time ranking engine for payment gateways. Finally, the hiring manager round focuses on cultural fit and levelling considerations, if needed.

However, he noted that candidates often make common mistakes while interviewing.

“A lot of candidates are unable to properly chalk out their work experience and have their skills sufficiently reflected in their resume,” he added, stating that this lowers their chances of qualifying for the interview rounds.

Expectations

When joining the Razorpay data science team, new hires can anticipate an initial period filled with knowledge sharing, induction sessions, introductions to team members, and brainstorming activities. This phase is designed to integrate them smoothly into the team and familiarise them with the company’s culture and operational methods.

Gradually, new team members will be assigned specific tasks, where they’re expected to take full ownership and contribute to collaborative efforts to address customer pain points through innovation.

On the other hand, Razorpay expects more than just adherence to established processes from its new hires. The company values fresh perspectives and encourages its team members to share their ideas freely, without fear of judgement.

“The enthusiasm and passion to innovate and imagine beyond the ordinary is what we most definitely expect them to have when they become a part of the Razorpay family,” Brahmadesam noted.

Work Culture

“As an employee-first organisation, our policies, initiatives, and efforts are always chalked out with the intent of co-creating a space where employees feel valued, respected, and nurtured,” said Brahmadesam.

The company’s culture is founded on transparency, questioning the status quo, integrity with agility, customer obsession, and mutual growth with its employees (“Razors”). It maintains a hybrid working environment.

It offers several unique perks, such as health insurance for same-sex and live-in partners, a Family Assurance Benefits Policy, and offbeat initiatives like ‘Bring Your Children & Pets to Work’.

The company also supports women re-entering the workforce with its ‘Resume with Razorpay’ programme and offers open hours for mental health counselling. It has also conducted one of the largest ESOP buyback sales in India’s startup ecosystem, which includes both current and former employees. Recreational facilities like foosball, chess, and TV rooms are available at office locations to enhance employee well-being.

Razorpay’s work culture is distinct from its competitors’, especially in the way it integrates core values across all job functions, including the data science team. Data scientists on the team have full ownership of their projects.

“The environment at Razorpay is one where data scientists are not just contributors but are decision-makers,” he added.

This culture of ownership, coupled with a strong emphasis on empathy and employee empowerment, sets Razorpay apart, reflecting its commitment to both individual and company growth.

“Joining Razorpay wouldn’t be like just having a day job but having the massive opportunity to gain that immersive experience of being a part of India’s fintech revolution,” Brahmadesam concluded.

Check out Razorpay’s careers page now.

Tredence Appoints Munjay Singh as Chief Operating Officer

Tredence, the global data science and AI solutions company, has appointed Munjay Singh as its Chief Operating Officer, as part of its continued growth strategy.

Singh, who has previously held senior leadership roles at global technology consulting and product firms, brings with him extensive experience in driving operational efficiency and customer experience initiatives across diverse market segments.

In his new role, Singh will provide strategic vision, leadership, and deep operational expertise to accelerate Tredence’s organic and inorganic growth strategies.

This appointment comes at a time of significant growth for Tredence, which has raised more than $205 million from private equity firms. The company has expanded into new regions and verticals, launched the ATOM.AI ecosystem, and developed a GenAI-as-a-service platform. It has grown fourfold between 2020 and 2024, with a 40% growth rate in 2023.

“I am thrilled to join Tredence and amplify its ability to help enterprise clients modernise data ecosystems and solve last-mile AI challenges,” said Singh. “Our suite of 100+ AI/ML accelerators delivers unprecedented value to clients, providing a wide array of solutions to boost decision-making and unlock new opportunities. Additionally, our verticalization strategy, expert practices, and extensive partner network enable us to solve increasingly complex industry challenges and tailor our capabilities to client needs.”

Shub Bhowmick, CEO and co-founder of Tredence, expressed his excitement about Singh’s appointment. “Tredence collaborates with more than forty Fortune 500 companies to help them uncover opportunities in marketing, customer experience, supply chain, and other functions. We have developed verticalized collections of AI and data accelerators that our clients have implemented to achieve tangible business improvements within weeks. Under Munjay’s leadership and strategic guidance, we aim to propel this vision forward, driving innovation and operational excellence across our business functions and practices, and achieving new levels of success.”

GNANI.AI Unveils India’s First Voice-First SLM for Indian Languages

GNANI.AI has announced a series of voice-first Small Language Models (SLMs), meticulously trained on millions of hours of proprietary audio data and billions of Indic-language conversations, capturing the rich diversity of dialects, accents, and linguistic nuances prevalent across the country.

With a sharp focus on key industry verticals, GNANI.AI aims to usher in the era of GenAI, empowering businesses with sophisticated language-understanding capabilities.

The initial rollout of GNANI.AI‘s SLM targets pivotal domains such as banking, insurance, automotive, and retail.

Leveraging their unparalleled expertise and industry knowledge from serving over 200 customers, GNANI.AI’s generative AI platform already enjoys widespread adoption within India’s BFSI (Banking, Financial Services, and Insurance) sector.

Ganesh Gopalan, CEO and co-founder of GNANI.AI, emphasised the company’s strategic advantage, stating, “Gnani.ai’s 200+ top-tier customers in India, including major banks, insurance firms, BNPL (Buy Now Pay Later) providers, MFIs (Microfinance Institutions), and automotive giants, will start leveraging highly accurate, low-latency, efficient SLM deployments for impactful use cases.”

“While several companies have recently announced fine-tuned versions of open-source LLMs for Indian languages, they often grapple with effectiveness, lack of multimodal capabilities, high inferencing costs, and privacy concerns. GNANI.AI’s SLM models address these challenges head-on, prioritising performance, security, privacy, and deployability on edge computing environments and private infrastructure,” said Ananth Nagaraj, CTO and co-founder of GNANI.AI.

AGI is a Rorschach Test for People to Project Their Technology Anxieties

There’s a lot of murmuring around AGI. Some experts see it coming next year, a few believe it will arrive by 2029, and others say never. Then there are those who claim we’ve already attained AGI several times in 2024, with developments like Devin and Claude 3 Opus, sparking the argument that by the time we do achieve it, no one will really care.

When asked about AGI in a recent interview, Microsoft CTO Kevin Scott said, “AGI is kind of a Rorschach test for people, and a lot of times it’s like, what are you most anxious about in the development of the technology.”

The Rorschach test is a psychological assessment method that uses inkblots to analyse a person’s emotions, personality, and psychological traits.

“I don’t even know what it means, honestly in a literal sense. If you had to sit down and write a technical definition of AGI, I think you would have a very hard time doing that,” he added.

Microsoft CTO Kevin Scott: AGI is a Rorschach test for people to project their anxieties on to pic.twitter.com/cVSHx2tCEH

— Tsarathustra (@tsarnick) May 5, 2024

The AGI Obsession

We’re still not sure if OpenAI will be the first to build AGI, but when it comes to obsessing over it, the company surely tops the list. “I don’t care if we burn $50 billion a year, we’re building AGI, and it is going to be expensive and totally worth it,” said OpenAI CEO Sam Altman in a talk at Stanford eCorner.

Now you know why Sama wants seven trillion dollars because he is not going to stop until AGI is here. pic.twitter.com/zYZVmWwppm

— AshutoshShrivastava (@ai_for_success) May 2, 2024

Calling ChatGPT mildly embarrassing and GPT-4 the dumbest model users will ever have to use again, he emphasised the importance of iterative deployment rather than delivering perfect solutions.

“If we build AGI in a basement and then the world is blissfully walking blindfolded along, I don’t think that makes us very good neighbours,” he said, stressing the need to put the product in people’s hands and letting society co-evolve with the technology.

Many were quick to call out this obsession.

With such hype, one might think that OpenAI has a detailed plan, a timeline, or at least a definition of AGI in place; however, nothing concrete has been said.

“It’s too loose a definition, and there’s too much room for misinterpretation. I’ve given up on trying to give the AGI a timeline,” said Altman, when asked for a best guess of when he thinks AGI will happen.

According to Altman, when people ask about the AGI timeline, what they really want to know is when the world is going to be super different and when their lives will change due to it.

Although he doesn’t know how to give a precise timeline of when we get to the milestone people care about, he believes that every year for the next many years, there will be dramatically more capable systems.

In another recent interview, when asked when humanity will build AGI, Altman said he used to love speculating on the question but has since realised it is very poorly formed, because people use extremely different definitions of AGI.

“I think it makes more sense to talk about when we’ll build systems that can do capability X or Y or Z rather than when we kind of fuzzily cross this one-mile marker,” he said, adding that by the end of this decade, and possibly somewhat sooner, we will have quite remarkable systems.

Of course, many weren’t convinced by his big talk and hype words. There have been discussions around Altman’s willingness to burn that kind of money ($50 billion a year) to build AGI – more than some countries’ GDP.

All the major big-tech companies, including OpenAI, Meta, Google DeepMind, and Tesla, are in pursuit of AGI, but tech enthusiasts and the developer community seem disappointed with OpenAI for making tall claims and not releasing anything significant lately.

OpenAI COO Brad Lightcap says generative AI, as it is today, will be ‘laughably bad’ within a year.
ChatGPT could soon take on more ‘complex work’ and be a ‘great teammate.’
First Sama, now Brad, why not, instead of talking, just release GPT-5? pic.twitter.com/RMLvaC3ci3

— AshutoshShrivastava (@ai_for_success) May 7, 2024

At this point, the joke is just getting funnier.

Sama already said GPT-4 is the dumbest model. Brad said the current model is laughable; we already have all the updates.
They are simply playing with eveyone now or they just postponed it to May 14 so they can mess up with Google I/O events. pic.twitter.com/ZYWIQIIZW5

— AshutoshShrivastava (@ai_for_success) May 7, 2024

However, there are also those who like Altman’s enthusiasm. “He knows money [will] no longer have value when they have achieved AGI,” said a user. Another wrote, “It would be a minuscule price to pay for something that could transform humanity.”

In India, meanwhile, Soket AI Labs has become the first startup to commit to building solutions towards ethical AGI. The AI research lab plans to start with smaller language models and eventually build advanced AI systems capable of human-level intelligence.

The Inventor of LSTM Unveils New Architecture for LLMs to Replace Transformers

Sepp Hochreiter, the inventor of LSTM, has unveiled a new LLM architecture featuring a significant innovation: xLSTM, which stands for Extended Long Short-Term Memory. The new architecture addresses a major weakness of previous LSTM designs, which were sequential in nature and unable to process all information at once.

The weaknesses of LSTMs, compared to Transformers, include the inability to revise storage decisions, limited storage capacities, and the lack of parallelisability due to memory mixing. Unlike LSTMs, Transformers parallelise operations across tokens, significantly improving efficiency.

I am so excited that xLSTM is out. LSTM is close to my heart – for more than 30 years now. With xLSTM we close the gap to existing state-of-the-art LLMs. With NXAI we have started to build our own European LLMs. I am very proud of my team. https://t.co/IH7giCe3gd

— Sepp Hochreiter (@HochreiterSepp) May 8, 2024

The main components of the new architecture include a matrix memory for LSTM, eliminating memory mixing, and exponential gating. These modifications allow the LSTM to revise its memory more effectively when processing new data.

The xLSTM architecture boasts O(N) time complexity and O(1) memory complexity in sequence length, making it much more efficient than Transformers, whose time and memory complexity is quadratic, O(N^2).

In evaluations comparing a Transformer LLM, RWKV, and xLSTM, each trained on 15 billion tokens of text, the xLSTM[1:0] variant (a ratio of one mLSTM block to zero sLSTM blocks) performed best. Moreover, the xLSTM architecture follows scaling laws similar to those of traditional Transformer LLMs.

One of the most important aspects of the xLSTM architecture is its flexible ratio of mLSTM and sLSTM blocks. mLSTM, which stands for matrix-memory parallelisable LSTM, can operate over all tokens at once, similar to Transformers. sLSTM, on the other hand, is not parallelisable but enhances state-tracking ability, at the cost of slower training and inference.
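A stripped-down sketch of the mLSTM recurrence, as we read the paper’s equations: a matrix memory updated with exponentially gated key-value outer products, plus a normaliser for a stable readout. Projections, the gate networks (which in the real model are computed from the input), stabilisation tricks, and block structure are all omitted, so treat it as an assumption-laden illustration rather than the reference implementation.

```python
# Stripped-down sketch of the mLSTM recurrence (illustrative reading of the
# xLSTM paper; projections, learned gates, stabilization, and block structure
# are omitted).
import numpy as np

rng = np.random.default_rng(0)
d = 4
C = np.zeros((d, d))     # matrix memory: replaces the scalar LSTM cell state
n = np.zeros(d)          # normalizer state for the readout

for t in range(6):       # one step per token
    q, k, v = (rng.normal(size=d) for _ in range(3))
    # In the real model the gates are computed from the input; random here.
    i_gate = np.exp(rng.normal())                   # exponential input gate
    f_gate = 1.0 / (1.0 + np.exp(-rng.normal()))    # sigmoid forget gate
    C = f_gate * C + i_gate * np.outer(v, k)        # store the key-value pair
    n = f_gate * n + i_gate * k
    h = (C @ q) / max(abs(n @ q), 1.0)              # normalized memory readout
    print(t, h)
```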

The xLSTM architecture builds upon the traditional LSTM by introducing exponential gating with memory mixing and a new memory structure. It performs favourably in language modelling compared to state-of-the-art methods such as Transformers and state space models.

The scaling laws suggest that larger xLSTM models will be serious competitors to current Large Language Models built with Transformer technology. Additionally, xLSTM has the potential to impact various other deep learning fields, including reinforcement learning, time-series prediction, and the modelling of physical systems.
