
Tata Electronics and Synopsys Partner for Factory Automation and Establishing an AI-enabled Fab

Tata Electronics has signed a Memorandum of Understanding (MOU) with Synopsys, a leading provider of silicon-to-systems design solutions, to collaborate on process technology bring-up and a foundry design platform to accelerate the successful ramp of customer products in India’s first fab being built by Tata Electronics in Dholera, Gujarat.

The two companies have identified the following areas of potential collaboration:

Advanced factory automation and yield data analytics solutions to help establish an AI-enabled fab
TCAD (Technology Computer Aided Design) flow set-up to enable accurate technology transfer from the technology partner
PDKs (process design kits) and design enablement
IP development, including foundation and analog IP
DTCO (Design Technology Co-optimization) methodologies

“For nearly 30 years, Synopsys has been researching and developing silicon-to-systems design solutions for customers and investing in workforce development in India. We applaud and support Tata Electronics’ vision to develop semiconductor manufacturing capacity in India, advancing supply chain resiliency for the global semiconductor industry,” Sassine Ghazi, president & CEO of Synopsys, said.

As previously announced, Tata Electronics plans to build India’s first fab in Dholera, Gujarat, with a total investment of INR 91,000 crores. In addition, another INR 27,000 crores will be invested in a greenfield facility in Jagiroad, Assam for assembly and testing of semiconductor chips.

Together these facilities will produce semiconductor chips for applications across automotive, mobile devices, artificial intelligence (AI), and other key segments to serve customers globally. As construction of the facilities progresses, it is critical to grow partnerships across the entire semiconductor ecosystem spanning process and design technology, and equipment suppliers.

With this intended collaboration with Synopsys, Tata Electronics solidifies a critical pillar of its holistic approach to becoming the first company to bring semiconductor manufacturing to India.

The post Tata Electronics and Synopsys Partner for Factory Automation and Establishing an AI-enabled Fab appeared first on Analytics India Magazine.

Etched is building an AI chip that only runs one type of model


As generative AI touches a growing number of industries, the companies producing chips to run the models are benefitting enormously. Nvidia, in particular, wields massive influence, commanding an estimated 70% to 95% of the market for AI chips. Cloud providers from Meta to Microsoft are spending billions of dollars on Nvidia GPUs, wary of falling behind in the generative AI race.

It’s understandable then that generative AI vendors aren’t pleased with the status quo. A large portion of their success hinges on the whims of the dominant chipmakers. And so they, along with opportunist VCs, are on the hunt for promising upstarts to challenge the AI chip incumbents.

Etched is among the many, many alternative chip companies vying for a seat at the table — but it’s also among the most intriguing. Only two years old, Etched was founded by a pair of Harvard dropouts, Gavin Uberti (ex-OctoML and ex-Xnor.ai) and Chris Zhu, who along with Robert Wachen and former Cypress Semiconductor CTO Mark Ross, sought to create a chip that could do one thing: run AI models.

That’s not unusual. Plenty of startups and tech giants are developing chips that exclusively run AI models, also known as inferencing chips. Meta has MTIA, Amazon has Inferentia, and so on. But Etched’s chips are unique in that they only run a single type of model: transformers.

The transformer, proposed by a team of Google researchers back in 2017, has become the dominant generative AI model architecture by far.

Transformers underpin OpenAI’s video-generating model Sora. They’re at the heart of text-generating models like Anthropic’s Claude and Google’s Gemini. And they power art generators such as the newest version of Stable Diffusion.

“In 2022, we made a bet that transformers would take over the world,” Uberti, Etched’s CEO, told TechCrunch in an interview. “We’ve hit a point in the evolution of AI where specialized chips that can perform better than general-purpose GPUs are inevitable — and the technical decision-makers of the world know this.”

Etched’s chip, called Sohu, is an ASIC (application-specific integrated circuit) — a chip tailored for a particular application — made for running transformers. Manufactured using TSMC’s 4nm process, Sohu can deliver dramatically better inferencing performance than GPUs and other general-purpose AI chips while drawing less energy, claims Uberti.

“Sohu is an order of magnitude faster and cheaper than even Nvidia’s next generation of Blackwell GB200 GPUs when running text, image and video transformers,” Uberti said. “One Sohu server replaces 160 H100 GPUs […] Sohu will be a more affordable, efficient and environmentally-friendly option for business leaders that need specialized chips.”

How does Sohu achieve all this? In a few ways, but the most obvious (and intuitive) is a streamlined inferencing hardware-and-software pipeline. Because Sohu doesn’t run non-transformer models, the Etched team could do away with hardware components not relevant to transformers and trim the software overhead traditionally used for deploying and running non-transformers.

Etched is arriving on the scene at an inflection point in the race for generative AI infrastructure. Beyond cost concerns, the GPUs and other hardware components necessary to run models at scale today are dangerously power-hungry.

Goldman Sachs predicts that AI is poised to drive a 160% increase in data center electricity demand by 2030, contributing to a significant uptick in greenhouse gas emissions. Researchers at UC Riverside, meanwhile, estimate that global AI usage could cause data centers to suck up 1.1 trillion to 1.7 trillion gallons of fresh water by 2027, impacting local resources. (Many data centers use water to cool servers.)

Uberti optimistically — or bombastically, depending on how you interpret it — pitches Sohu as the solution to the industry’s consumption problem.

“In short, our future customers won’t be able to afford not to switch to Sohu,” Uberti said. “Companies are willing to take a bet on Etched because speed and cost are existential to the AI products they are trying to build.”

But can Etched, assuming it meets its goal of bringing Sohu to the mass market in the next few months, succeed when so many others are following close behind it?

The company lacks a direct competitor at present, but AI chip startup Perceive recently previewed a processor with hardware acceleration for transformers. Groq has also invested heavily in transformer-specific optimizations for its ASIC.

Competition aside, what if transformers one day fall out of favor? Uberti says, in that case, Etched will do the obvious: Design a new chip. Fair enough, but that’s a pretty drastic fallback option, considering how long it’s taken to bring Sohu to fruition.

None of these concerns have dissuaded investors from pouring an enormous amount of money into Etched, though.

Today, Etched said it has closed a $120 million Series A funding round, co-led by Primary Venture Partners and Positive Sum Ventures. Bringing Etched’s total raised to $125.36 million, the round saw participation from heavyweight angel backers including Peter Thiel (Uberti, Zhu and Wachen are Thiel Fellowship alums), GitHub CEO Thomas Dohmke, Cruise (and the Bot Company) co-founder Kyle Vogt, and Quora co-founder Charlie Cheever.

These investors presumably believe Etched has a reasonable chance of successfully scaling up its business of selling servers. Perhaps it does — Uberti claims unnamed customers have reserved “tens of millions of dollars” in hardware so far. The forthcoming launch of the Sohu Developer Cloud, which will let customers preview Sohu via an online interactive playground, should drive additional sales, Uberti suggested.

Still, it seems too early to tell whether this will be enough to propel Etched and its 35-person team into the future its co-founders are envisioning. The AI chip segment can be unforgiving in the best of times — see the high-profile near-failures of AI chip startups like Mythic and Graphcore, and the declining investment in AI chip ventures in 2023.

Uberti makes a strong sales pitch, though: “Video generation, audio-to-audio modalities, robotics, and other future AI use cases will only be possible with a faster chip like Sohu. The entire future of AI technology will be shaped by whether the infrastructure can scale.”

Grammarly adds 5 new security and control features for enterprise users


Working professionals compose a lot of text every day, such as messages, reports, emails, and more. While many could benefit from an AI writing assistant, like Grammarly, privacy remains a concern for some managers. Grammarly today released a slew of updates seeking to allay some of these concerns.

Grammarly's new security and control features allow Enterprise users to leverage the AI writing assistant at work and personalize their experience according to their organization's needs and preferences.


Grammarly's Bring Your Own Key (BYOK) is a new feature that gives enterprises full access to, and control over, data encryption. The startup has also introduced session timeouts, which are meant to help protect companies against unauthorized access.

Business leaders can now manage employee access more effectively in Grammarly, with new custom roles, group-level security controls, and enterprise cost-center visibility, according to Grammarly.

The startup has also unveiled a Figma plugin that teams can use to access Grammarly's communication assistance in the design tool. This feature could be handy for design projects, as Grammarly lets organizations upload their style guide and brand tone profile.


Grammarly Enterprise users can access most of these new features today, including the session timeouts, custom roles, and group-level security controls. The Figma plugin is also available, but you'll have to download it from the Figma Community website. Grammarly plans to roll out the BYOK and enterprise cost-center visibility features in the coming weeks.

If your organization is interested in Grammarly for Enterprise, start a free trial. You can also contact the Sales Team for additional information, including pricing and logistics.

Pixxel to Manufacture Miniaturised Satellites for Indian Air Force Under iDEX Grant

Bengaluru-based space technology startup Pixxel has signed the 350th contract under the iDEX (Innovations for Defence Excellence) program to manufacture miniaturised multi-payload satellites for the Indian Air Force.

The contract, awarded as part of the iDEX Prime Space grant, marks a significant milestone in Pixxel’s mission to revolutionise the space industry in India.

The contract was signed between Awais Ahmed, CEO of Pixxel, and Anurag Bajpai, Additional Secretary (Defence Production) and CEO of iDEX-DIO, in the presence of Defence Secretary Giridhar Aramane, the Vice Chiefs of the Armed Forces, and other officials of the Ministry of Defence.

“We are delighted to receive iDEX’s grant and utilise our expertise in building microsatellites in-house to manufacture satellites externally for the first time,” said Ahmed. “This recognition highlights Pixxel’s dedication to pushing the boundaries of space exploration and innovation.”

Under the multi-crore contract, Pixxel will develop small satellites weighing up to 150 kg for electro-optical, infrared, synthetic aperture radar, and hyperspectral applications. The company will leverage its indigenous hyperspectral satellite technology and manufacturing expertise to build these satellites, enabling ease of manufacture, low cost, and ease of launch.

As Pixxel sets out to launch six commercial-grade hyperspectral satellites, ‘Fireflies’, this year, the company remains committed to harnessing its indigenous expertise and the power of hyperspectral satellites for a sustainable future. Building on its expertise, Pixxel now offers high-performance, cost-effective satellite manufacturing solutions, empowering clients to drive meaningful change with space data.

The post Pixxel to Manufacture Miniaturised Satellites for Indian Air Force Under iDEX Grant appeared first on Analytics India Magazine.

How To Create Minimal Docker Images for Python Applications


Creating minimal Docker images for Python apps enhances security by reducing the attack surface, facilitates faster image builds, and improves overall application maintainability. Let’s learn how to create minimal Docker images for Python applications.

Prerequisites

Before you get started:

You should have Docker installed. Get Docker for your operating system if you haven’t already.
A sample Python application you need to build the minimal image for. You can also follow along with the example app we create.

Create a Sample Python Application

Let's create a simple Flask application for inventory management. This application will allow you to add, view, update, and delete inventory items. We'll then dockerize the application using the standard Python 3.11 image.

In your project directory, you should have app.py, requirements.txt, and Dockerfile:

inventory_app/
├── app.py
├── Dockerfile
└── requirements.txt

Here’s the code for the Flask app for inventory management:

# app.py
from flask import Flask, request, jsonify

app = Flask(__name__)

# In-memory database for simplicity
inventory = {}


@app.route('/inventory', methods=['POST'])
def add_item():
    item = request.get_json()
    item_id = item.get('id')
    if not item_id:
        return jsonify({"error": "Item ID is required"}), 400
    # URL path parameters arrive as strings, so store keys as strings too
    item_id = str(item_id)
    if item_id in inventory:
        return jsonify({"error": "Item already exists"}), 400
    inventory[item_id] = item
    return jsonify(item), 201


@app.route('/inventory/<item_id>', methods=['GET'])
def get_item(item_id):
    item = inventory.get(item_id)
    if not item:
        return jsonify({"error": "Item not found"}), 404
    return jsonify(item)


@app.route('/inventory/<item_id>', methods=['PUT'])
def update_item(item_id):
    if item_id not in inventory:
        return jsonify({"error": "Item not found"}), 404
    updated_item = request.get_json()
    inventory[item_id] = updated_item
    return jsonify(updated_item)


@app.route('/inventory/<item_id>', methods=['DELETE'])
def delete_item(item_id):
    if item_id not in inventory:
        return jsonify({"error": "Item not found"}), 404
    del inventory[item_id]
    return '', 204


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

This is a minimal Flask application that implements basic CRUD (Create, Read, Update, Delete) operations for an in-memory inventory database. It uses Flask to create a web server that listens for HTTP requests on port 5000. When a request is received:

For a POST request to /inventory, it adds a new item to the inventory.
For a GET request to /inventory/<item_id>, it retrieves the item with the specified ID from the inventory.
For a PUT request to /inventory/<item_id>, it updates the item with the specified ID in the inventory.
For a DELETE request to /inventory/<item_id>, it deletes the item with the specified ID from the inventory.
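After the container is running, you can exercise these four endpoints from the host. The sketch below uses only Python's standard library (`urllib.request`, `json`). It assumes the app's port is published on localhost:5000. The `BASE_URL` constant and the `build_request` and `send` helpers are hypothetical names for illustration; they are not part of the tutorial's code.

```python
import json
import urllib.error
import urllib.request

# Assumption: the container was started with its port published to the host,
# e.g. `docker run -p 5000:5000 inventory-app:slim`.
BASE_URL = "http://localhost:5000"


def build_request(method, item_id=None, payload=None):
    """Build (but do not send) a Request for one of the /inventory endpoints."""
    url = f"{BASE_URL}/inventory"
    if item_id is not None:
        url += f"/{item_id}"
    data = None
    headers = {}
    if payload is not None:
        # The POST and PUT handlers read the body with request.get_json().
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(req):
    """Send a request; return (status_code, decoded JSON body or None)."""
    try:
        with urllib.request.urlopen(req) as response:
            status, body = response.status, response.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. 404 "Item not found") arrive as HTTPError.
        status, body = err.code, err.read()
    return status, (json.loads(body) if body else None)
```

With the container up, `send(build_request("POST", payload={"id": "apple", "qty": 3}))` should create an item. A subsequent `send(build_request("DELETE", item_id="apple"))` should come back as a 204 with an empty body, which `send` reports as `None`.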

Now create the requirements.txt file:

Flask==3.0.3

Next create the Dockerfile:

# Use the official Python 3.11 image
FROM python:3.11

# Set the working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the current directory contents into the container at /app
COPY . .

# Expose the port the app runs on
EXPOSE 5000

# Run the application
CMD ["python3", "app.py"]
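This Dockerfile copies requirements.txt and runs `pip install` before `COPY . .`, and that ordering affects build speed. Docker caches each instruction as a layer, and once one instruction's cache is invalidated, every instruction after it is re-run.

The Python sketch below is a deliberately simplified model of that rule, for illustration only. It is not how Docker's builder works internally: real cache keys also depend on the instruction text, the parent layer, and checksums of the copied files. The `STEPS` list and the `steps_to_rebuild` helper are hypothetical.

```python
# Simplified model of Docker's layer cache (illustration only).
# Each step is (instruction, files it copies from the build context).
STEPS = [
    ("FROM python:3.11", set()),
    ("WORKDIR /app", set()),
    ("COPY requirements.txt requirements.txt", {"requirements.txt"}),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
    ("COPY . .", {"app.py", "requirements.txt", "Dockerfile"}),
    ("EXPOSE 5000", set()),
    ('CMD ["python3", "app.py"]', set()),
]


def steps_to_rebuild(steps, changed_files):
    """Return the instructions re-evaluated after `changed_files` change.

    A step is invalidated when any file it copies has changed; once one
    step is invalidated, every later step is rebuilt too.
    """
    rebuilt = []
    invalidated = False
    for instruction, inputs in steps:
        if not invalidated and inputs & changed_files:
            invalidated = True
        if invalidated:
            rebuilt.append(instruction)
    return rebuilt
```

In this model, editing only app.py re-runs `COPY . .` and the two metadata instructions after it, but the cached `pip install` layer is reused. Editing requirements.txt forces the dependencies to be reinstalled.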

Finally, build the image (we use the tag full to indicate that it uses the default Python image):

$ docker build -t inventory-app:full .

Once the build is complete, you can run the docker images command:

$ docker images
REPOSITORY      TAG                 IMAGE ID       CREATED             SIZE
inventory-app   full                4e623743f556   2 hours ago         1.02GB

You’ll see that this super simple app is about 1.02 GB in size. That’s because the base image we used, the default Python 3.11 image, includes a large number of Debian packages and is about 1.01 GB on its own. So we need a smaller base image.

Well, here are the options:

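python:version-alpine images are based on Alpine Linux and will give you the smallest final image. But you usually need to install Python packages as well, and that can be a challenge with Alpine images. Alpine uses musl libc instead of glibc, so packages with compiled extensions may not have compatible prebuilt wheels, and pip may have to build them from source.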
python:version-slim comes with the minimal number of Debian packages needed to run Python. And you’ll (almost always) be able to install most required Python packages with pip.

So your base image should be small, but not so small that you run into compatibility issues and have to wrestle with installing dependencies (quite common for Python applications). That’s why we’ll use the python:3.11-slim base image in the next step to build our image.

Choosing the Optimal Base Image | Image by Author

Use the Slim Python Base Image

Now rewrite the Dockerfile to use the python:3.11-slim base image like so:

# Use the official lightweight Python 3.11-slim image
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the current directory contents into the container at /app
COPY . .

# Expose the port the app runs on
EXPOSE 5000

# Run the application
CMD ["python3", "app.py"]

Let’s build the image (tagged slim):

$ docker build -t inventory-app:slim .

The python:3.11-slim base image is about 131 MB, and the inventory-app:slim image is around 146 MB, much smaller than the 1.02 GB image we had earlier:

$ docker images
REPOSITORY      TAG                 IMAGE ID       CREATED             SIZE
inventory-app   slim                32784c60a992   About an hour ago   146MB
inventory-app   full                4e623743f556   2 hours ago         1.02GB
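You may want to track the saving over time, for example in CI. This sketch turns the SIZE column of `docker images` output into byte counts and computes the reduction. It assumes the CLI prints sizes with decimal (SI) suffixes, so 1 MB = 1000² bytes. It takes the size from the last whitespace-separated field, because the CREATED column itself contains spaces. The `parse_size`, `sizes_by_tag` and `size_reduction` helpers are hypothetical. The CLI's `--format` flag can also print just the repository, tag and size columns if you prefer simpler input.

```python
import re

# Assumption: sizes use decimal (SI) suffixes, i.e. 1 MB = 1000**2 bytes.
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(B|kB|MB|GB|TB)\s*$")


def parse_size(text):
    """Convert a size string such as '146MB' or '1.02GB' to an approximate byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit])


def sizes_by_tag(docker_images_output):
    """Map 'repository:tag' to bytes, given `docker images` output (header row first)."""
    sizes = {}
    for line in docker_images_output.strip().splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        # CREATED contains spaces ("2 hours ago"), so SIZE is taken from the end.
        repository, tag, size = fields[0], fields[1], fields[-1]
        sizes[f"{repository}:{tag}"] = parse_size(size)
    return sizes


def size_reduction(before_bytes, after_bytes):
    """Fraction of the original size saved (0.5 means the image was halved)."""
    return 1 - after_bytes / before_bytes
```

For the output above, this works out to roughly an 86% reduction (146 MB versus 1.02 GB).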

You can also use multi-stage builds to make the final image smaller. But that's for another tutorial!

Additional Resources

Here are a few useful resources:

Containerize Python Apps with Docker in 5 Easy Steps
python — Official Image | Docker Hub
Differences Between Standard Docker Images and Alpine Slim Versions

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

NVIDIA Blackwell Solidify Leadership, AMD & Intel to Gain Ground With MI300X & Gaudi3

The global data center semiconductor and component market skyrocketed an unprecedented 152 percent in the first quarter of 2024, marking a new milestone, according to a report from Dell’Oro Group.

This explosive growth was fueled by insatiable demand for GPUs and custom accelerators, particularly in the hyperscale cloud sector.

The report revealed that in Q1 2024, NVIDIA led all vendors in component revenues, accounting for nearly half of the reported figures, as supplies of its H100 GPUs improved for both cloud and enterprise markets. Samsung and Intel followed NVIDIA in the rankings.

Looking ahead, strong growth for accelerators is expected to continue through the rest of 2024, with GPUs remaining the primary choice for AI training and inference workloads. NVIDIA’s upcoming Blackwell platform is poised to strengthen the firm’s leadership position.

However, the report anticipates that custom accelerators and offerings from other vendors, such as the AMD MI300X/MI325X Instinct and Intel Gaudi3, will gain some market share.

The report also noted that revenues for Smart NICs and DPUs surged more than 50 percent in Q1 2024, driven by strong hyperscale adoption for both AI and non-AI use cases. Storage drives and memory saw significant price increases as vendors aimed to align supply with demand. The three major memory suppliers shifted production capacity from DRAM to AI-focused High Bandwidth Memory (HBM) products.

Baron Fung, Senior Research Director at Dell’Oro Group, highlighted, “Accelerators such as GPUs continue to drive substantial growth, with shipments hitting record highs each quarter. Meanwhile, traditional server and storage component markets returned to positive year-over-year growth as vendors and cloud service providers ramped up purchases in anticipation of robust system demand later this year.”

General-purpose computing components also rebounded strongly following an inventory correction cycle in 2023, experiencing double-digit revenue growth. “Average selling price (ASP) of components has increased significantly from a year ago adding to topline growth,” Fung explained.

“For CPUs, an increasing mix toward fourth- and fifth-generation CPUs, which have more cores and feature sets compared to their predecessors, have commanded higher ASPs.”

As data centers continue to expand and evolve to support the explosive growth of AI and cloud computing, the demand for high-performance semiconductors and components shows no signs of slowing down.

The post NVIDIA Blackwell Solidify Leadership, AMD & Intel to Gain Ground With MI300X & Gaudi3 appeared first on Analytics India Magazine.

EvolutionaryScale, backed by Amazon and Nvidia, raises $142M for protein-generating AI


A relatively new startup called EvolutionaryScale has secured a massive tranche of cash to build AI models to generate novel proteins for scientific research.

EvolutionaryScale today announced that it raised $142 million in a seed round led by ex-GitHub CEO Nat Friedman, Daniel Gross and Lux Capital with participation from Amazon and NVentures, Nvidia’s corporate venture arm. The company also released ESM3, an AI model it describes as a “frontier model” for biology — one that can create proteins for use cases like drug discovery and materials science.

“ESM3 takes a step toward a future of biology where AI is a tool to engineer from first principles, the way we engineer structures, machines, and microchips and write computer programs,” EvolutionaryScale co-founder and chief scientist Alexander Rives said in a statement.

Rives, along with Tom Sercu and Sal Candido, began developing generative AI models to explore proteins while at Meta’s AI research lab, FAIR, in 2019. After their team was disbanded, Rives, Sercu and Candido left Meta to continue building on the work they’d started.

Characterizing proteins can reveal the mechanisms of a disease, including ways to slow it or reverse it, while creating proteins can lead to entirely new classes of drugs, tools and therapeutics. But the current process for designing proteins in the lab is costly, both from a computational and human resource standpoint.

Designing a protein entails coming up with a structure that could plausibly perform a task inside the body or a product, then finding a protein sequence — the sequence of amino acids that make up a protein — likely to “fold” into the structure. Proteins must correctly fold into three-dimensional shapes in order to carry out their intended function.

Trained on a data set of 2.78 billion proteins, ESM3 can “reason over” the sequence, structure and function of proteins, Rives says, enabling the model to generate new proteins a la Google DeepMind’s AlphaFold. EvolutionaryScale is making the full 98-billion-parameter model available for non-commercial use through its Forge cloud developer platform and releasing a smaller version of the model for offline use.

EvolutionaryScale claims that it used ESM3 to generate a new variant of green fluorescent protein (GFP), the protein responsible for the glowing of jellyfish and luminescent colors in coral. A preprint paper on the company’s website details its work.

The fluorescent protein ‘esmGFP,’ created with EvolutionaryScale’s ESM3.

“We’ve been working on this for a long time, and we’re excited to share it with the scientific community and see what they do with it,” Rives continued.

EvolutionaryScale isn’t a charity, of course — the roughly-20-employee company tells TechCrunch that it plans to make money through a combination of partnerships, usage fees and revenue sharing. EvolutionaryScale might work with pharmaceutical companies to integrate ESM3 into their workflows, for example, or revenue-share with researchers for breakthrough discoveries commercialized using ESM3.

To this end, EvolutionaryScale says that it’ll soon bring ESM3 and its derivatives to select AWS customers via AWS’ SageMaker AI dev platform, Bedrock AI platform and HealthOmics service. ESM3 will also be available to select customers using NVIDIA’s NIM microservices, supported with an Nvidia enterprise software license.

EvolutionaryScale says that both AWS and Nvidia customers will be able to fine-tune ESM3 using their own data if they wish.

It could be a while before EvolutionaryScale turns a profit. In the company’s pitch deck, a copy of which Forbes managed to obtain last August, EvolutionaryScale repeatedly emphasized that it could take a decade for generative AI models to help design therapies. The firm will also have to fend off competition like DeepMind’s spinoff Isomorphic Labs, which already has contracts with big pharma companies, as well as Insitro, publicly-traded Recursion and Inceptive.

EvolutionaryScale’s big bet is scaling up its model training to incorporate data beyond proteins and create a general-purpose AI model for biotech applications.

“The incredible pace of new AI advances is being driven by increasingly large models, increasingly large data sets and increasing computational power,” an EvolutionaryScale spokesperson said. “The same holds true in biology. In research over the last five years, the ESM team has explored scaling in biology. We find that as language models scale, they develop an understanding of the underlying principles of biology, and discover biological structure and function.”

Sounds wildly ambitious to this reporter — but having deep-pocketed investors surely helps.

Zoho’s ManageEngine to Invest Another $10 Mn in GPU

At ManageEngine’s recently concluded CIO Meet in Chennai, Shailesh Kumar Davey, co-founder and vice president of engineering at ManageEngine, told AIM that the company plans to invest another $10 million in GPUs and infrastructure over the next year.

The IT sister company of Zoho had already invested nearly $10 million in procuring GPUs from the influential trio of NVIDIA, AMD and Intel.

“This investment supports running various medium and large language models for both Zoho and ManageEngine, with a focus on Indic languages,” Davey told AIM.

The focus on Indic language models, along with heavy investments in data centres across the country, aligns with Zoho and ManageEngine’s overarching goal of making India one of their key markets.

“We are planning to open two more data centres in Delhi and Mumbai to cater to the needs of government and large enterprises,” Rajesh Ganesan, president of ManageEngine, told AIM. Besides the existing Chennai data centre, this would give the company a second facility in Mumbai and its first in Delhi, both focused on the demands of companies with complex regulatory needs.

The company is also going to set up new data centres in Dubai, Singapore, Latin America (LATAM) and South Africa to comply with data privacy regulations. It currently has around 18 data centres and nearly 100 global points of presence (PoPs).

Growth Story

“The cloud portion of managing the business is growing at a clip of 65% compared to 2022. This growth is further complemented by a 30% (year-over-year) increase in the customer base from 2022 to 2023,” said Ganesan. This growth is primarily driven by two key industries: IT and IT-enabled services, and BFSI (banking, financial services and insurance).

Traction in India has been particularly strong, which the company attributes to regulations enforced by the Reserve Bank of India (RBI). “We hope to continue this momentum, and soon, India will become the second largest revenue contributor for ManageEngine.”

In a previous interaction with AIM, Davey said that the company is experiencing dramatic growth in India, making it their second go-to market for growth after the US.

The company’s growth strategy aligns with Zoho’s principles, as Sridhar Vembu, co-founder and CEO told AIM, “Our goal is to create a world-class, top-five tech company, leveraging the talent pool we have. The common thread in all our investments is creating capabilities and R&D powering our products.”

ManageEngine focuses on serving the technology divisions within companies, particularly the IT departments, while Zoho focuses on other business lines like HR and finance. “The lines are blurring as every function inside an enterprise consumes its technology,” he added.

The post Zoho’s ManageEngine to Invest Another $10 Mn in GPU appeared first on Analytics India Magazine.

5 Tips to Step Up Your Data Science Game Right Away


Introduction

Data scientists constantly navigate a changing field, with its evolving technologies and techniques. The rapid growth and dynamic nature of this industry demand continuous learning and adaptation from the professionals working in it, and remaining an active, viable practitioner requires ongoing personal development. There are always more concepts, tools, and technologies to take up and master, for novice and established data scientists alike.

And this is why we are here today. This article intends to provide practical advice for becoming a better data scientist by focusing on five different areas of proficiency. Whether you are starting out, or looking to get grounded after years as a practitioner, jump in and elevate your game.

1. Master the Mathematical Fundamentals

Understanding the fundamentals of the required mathematics is an essential part of being able to work with data. Linear algebra, calculus, and probability underpin much of the modeling and algorithm work that data scientists do. The book Mathematics for Machine Learning is an excellent reference to start with, as are the courses in Coursera's Mathematics for Data Science specialization. 3Blue1Brown's YouTube videos are another fantastic resource for these topics. Putting these mathematical fundamentals into practice in real projects and exercises will ensure your knowledge stays solid.
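As a small illustration of how these subjects show up in everyday modeling, here is simple linear regression fitted two ways. This is our own example, not one taken from the resources above. The closed-form fit is the one-variable case of the least-squares normal equations from linear algebra. The gradient-descent fit minimizes mean squared error by following its partial derivatives, which is where calculus comes in. The function names, learning rate and step count are illustrative choices.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from means, covariance and variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def fit_gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Minimize L(w, b) = mean((w*x + b - y)**2) by following its partial derivatives."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # dL/dw = (2/n) * sum(r * x),  dL/db = (2/n) * sum(r)
        grad_w = 2 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On points that lie exactly on y = 2x + 1, both functions recover w ≈ 2 and b ≈ 1. Seeing the iterative method converge to the analytic answer is a useful sanity check when you study either topic.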

2. Stay Updated with Industry Trends

If you want to stay in the know and remain employable in the long term in a field of such enormous breadth and depth, you can't overlook staying up to date on the latest tools, technologies, and methodologies. From technological innovations such as automated machine learning and interpretability techniques, to large-scale data technologies and state-of-the-art machine learning algorithms, the line between "good to know" and "need to know" is in constant flux. This isn't a frivolous concern: people and organizations want to be able to incorporate the latest developments where appropriate. What better place to keep up with such topics than KDnuggets (you're already here), along with our sister sites Machine Learning Mastery and Statology?

But there are other great resources too: popular sites like Towards Data Science, DataCamp, MarkTechPost, and a whole host of others are worth your time. The myriad podcasts, webinars, and YouTube channels provide alternative avenues, with something to suit everyone's preferences. Communities and conferences, both online and in person, are great ways to network and stay on top of the latest trends.

3. Develop Strong Programming Skills

This can't be overstated: proficiency in one or more of Python, R, and SQL, the key programming languages in the field, is an absolute must for anyone wanting to be a useful data scientist. Proficiency with libraries such as pandas and Matplotlib (Python) and packages such as dplyr and ggplot2 (R) is an important skill to acquire for data work. Learning the most efficient ways to write SQL queries is equally important, as SQL remains one of the most widely used languages worldwide, especially in data science. There are, of course, many other languages that could come in handy for data work: Java, Rust, C++, Go, JavaScript, Ruby… the list goes on and on. You can pick and choose from these what makes sense for you, but don't learn them at the expense of the Big Three mentioned above; it just isn't worth the risk.
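One habit behind efficient SQL is letting the database do the aggregation instead of pulling every row into your analysis code. A minimal sketch, using Python's built-in `sqlite3` module and a hypothetical `sales` table with made-up rows, compares the two approaches. The table, its columns, and the index are illustrative assumptions; on five rows the difference is invisible, but on millions of rows returning only summary rows matters.

```python
import sqlite3

# In-memory database with a hypothetical sales table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0),
     ("east", 55.0), ("south", 20.0)],
)
# An index on the grouping column may help on large tables;
# the engine decides whether to use it.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")


def totals_in_python(conn):
    """Less efficient: fetch every row, then aggregate in Python."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_sql(conn):
    """Usually better: aggregate in the database, return only summary rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


print(totals_in_sql(conn))  # {'east': 55.0, 'north': 150.0, 'south': 100.0}
```

Both functions return the same totals; the `GROUP BY` version simply sends three summary rows back to Python instead of every record. Whether an index actually helps depends on the engine, the query, and the data, so inspecting the plan (in SQLite, with `EXPLAIN QUERY PLAN`) is a worthwhile habit.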

Platforms like HackerRank or LeetCode, along with GitHub contributions, can help you improve your coding skills. Working on group projects also requires an understanding of Git for version control. In short, don't buy into the hype that you don't need to code. If you can't, someone else will have to do it for you, and with so many data scientists who do code, how would you positively differentiate yourself from them? Be a strong coder as a baseline, and then add on additional skills to set you apart.

4. Work with Real Datasets

Working with fresh facts and figures is a must for anyone wanting to be more than an academic in this field. There is no substitute for taking the initiative to solve data problems yourself and learning by doing. Ways to do so include competing on Kaggle, taking on independent challenge projects, or seeking out internships or volunteer work. By solving problems accurately, applying algorithms appropriately, understanding the datasets involved, and documenting all of this work, you build up a robust portfolio.

The difference between a portfolio project that reworks the Iris dataset and an in-depth analysis of robust, contemporary real-world data is night and day. Use real and valuable data.

5. Cultivate Communication and Collaboration Skills

To put complex analysis results in the hands of a non-academic audience, strong communication is key. Conveying a message well takes a compelling story built from your data, eye-catching visualizations, a captivating and well-crafted accompanying talk, and supporting artifacts that anticipate questions and fill in the blanks for listeners. Several tools can help with your data science story time, including Tableau, Power BI, and even PowerPoint or Google Slides.

Alongside this persuasive presentation, an effective data scientist also practices active listening and anticipates questions, both of which are essential to conveying domain authority. These same skills can also improve team effectiveness and project output. Expressing your ideas and findings clearly, and working well with both the analytical team and your eventual audience, is another critical component of being an effective data scientist, and redoubling your efforts to master this aspect can help you step up your game.

Final Thoughts

This article set out to show how to improve various aspects of your data science work. Across five areas (a solid mathematical foundation, staying informed about developments in the industry, coding fluently and capably, working hands-on with real data, and working well with others), we have looked for ways to help the average data professional raise their game. Learning and growth in data science are continuous, and the field itself keeps changing, so make sure you are on board for the journey.

Matthew Mayo (@mattmayo13) holds a master's degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.

Do You Still ‘Google It’?

In Tom Shone’s book, The Nolan Variations, filmmaker Christopher Nolan sheds light on a thought-provoking insight: Google has far less information than one might assume.

Nolan cautions against overestimating Google’s prowess in information retrieval. “Google is not as powerful as people think in terms of information collation. They’re more powerful than people realise in all kinds of areas, such as collecting data on your movements. They’re very good at that. However, in a data search, the outcome is always limited,” he said.

“Try this experiment: visit a library, pick a random book, and jot down facts from ten random pages. Then, search for those facts online. While many think 90 percent of information is online, I suspect the real figure is closer to 0.9 percent,” Nolan suggests.

ChatGPT vs Google Search

With the introduction of OpenAI’s ChatGPT and Perplexity.AI, Google, which has long dominated as the primary source of online information, seems outdated. At first glance, comparing these tools to Google may seem like comparing apples to oranges, yet when it comes to information retrieval and accuracy, the distinction blurs.

During an exclusive conversation with AIM, Zerodha CTO Kailash Nadh expressed optimism about moving away from Google Search in the future. He highlighted the difficulty of finding technical solutions through traditional search, where users often scroll through multiple pages and countless comments to get useful information.

Nadh pointed out the efficiency of AI tools like ChatGPT and other chatbots, noting, “They’re so powerful that when you throw them a problem, they immediately suggest solutions. This saves me 45 minutes per issue—I hardly use Google for technical queries anymore.”

He believes these AI tools, by swiftly providing relevant insights, significantly streamline the search process, potentially diminishing the relevance of traditional Google searches over time.

SEO Optimised Doesn’t Mean Good Content

Not too long ago, consistently producing quality content offered a solid shot at outperforming competitors on search engine result pages (SERPs), securing rankings and attracting more organic traffic.

Today, good content remains crucial for a website, but on its own it isn’t enough to compete.

Recently, Google introduced AI Overviews in search results in the US, one of the most significant updates to its search engine in 25 years. These summaries, displayed at the top of the search page, give users a condensed overview before the usual list of blue links.

Many experts believe generative AI in search has reduced organic search traffic by more than 3%, and many businesses have seen their website traffic drop significantly since Google’s recent algorithm update.

A study by Search Engine Land has predicted an 18% to 64% decrease in organic clicks due to generative search.

Generative search is poised to affect a wide range of query types, from featured snippets and knowledge panels to search ads, navigational and transactional queries, and even long-tail queries.

It’s important to note that other factors, such as mobile disparities, website quality issues, accidental content changes, and technical SEO problems, can also hurt traffic.

What’s Next?

Google Search dominates with roughly 99,000 queries per second, about 8.5 billion per day, and more than 2 trillion per year. According to Statista, as of April 2023, it drew 83.9 billion visits worldwide per month.
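The query figures can be cross-checked with simple arithmetic. The short Python sketch below takes the per-second rate quoted above and multiplies it out: the daily total matches the quoted 8.5 billion, but a full year at that rate comes to roughly 3.1 trillion, so the annual figure is best read as a conservative lower bound rather than an exact count. The calculation assumes a constant rate and ignores leap years.

```python
# Cross-check of the search-volume figures quoted in the article.
QUERIES_PER_SECOND = 99_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
queries_per_year = queries_per_day * 365  # constant rate, no leap years

print(f"per day:  {queries_per_day / 1e9:.2f} billion")    # 8.55 billion
print(f"per year: {queries_per_year / 1e12:.2f} trillion")  # 3.12 trillion
```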

However, its competitors are growing, with Baidu leading the search market in China and Yandex serving as the main search engine in Russia.

In the age of Gemini, Google Search is in the midst of an evolution. It now empowers users to ask multiple questions at once, leveraging the formidable capabilities of its AI model for a seamless search experience.

The era of search engines is far from over; instead, we’re witnessing a shift towards AI-driven exploration.

The post Do You Still ‘Google It’? appeared first on AIM.

Category: AI