Google, Microsoft, Meta and More to Develop Open Standard for AI Chip Components in UALink Promoter Group

AMD, Broadcom, Cisco, Google, Hewlett Packard Enterprise (HPE), Intel, Meta and Microsoft are combining their expertise to create an open industry standard for an AI chip technology called Ultra Accelerator Link. The standard will enable high-speed, low-latency communication between AI accelerator chips in data centres.

An open standard will advance artificial intelligence/machine learning cluster performance across the industry, meaning that no singular firm will disproportionately capitalise on the demand for the latest and greatest AI/ML, high-performance computing and cloud applications.

Notably absent from the so-called UALink Promoter Group are NVIDIA and Amazon Web Services. Indeed, the Promoter Group likely intends for its new interconnect standard to topple the two companies’ dominance in AI hardware and the cloud market, respectively.

The UALink Promoter Group expects to establish a consortium in Q3 2024 to manage the ongoing development of the UALink standard; member companies will gain access to UALink 1.0 at around the same time. A higher-bandwidth version is slated for release in Q4 2024.

SEE: Gartner Predicts Worldwide Chip Revenue Will Gain 33% in 2024

What is the UALink and who will it benefit?

The Ultra Accelerator Link, or UALink, is a defined way of connecting AI accelerator chips in servers to enable faster and more efficient communication between them.

AI accelerator chips, like GPUs, TPUs and other specialised AI processors, are the core of all AI technologies. Each one can perform huge numbers of complex operations simultaneously; however, to handle the heavy workloads necessary for training, running and optimising AI models, they need to be connected. The faster the data transfer between accelerator chips, the faster they can access and process the necessary data and the more efficiently they can share workloads.

The first standard due to be released by the UALink Promoter Group, UALink 1.0, will see up to 1,024 GPU AI accelerators, distributed over one or multiple racks in a server, connected to a single Ultra Accelerator Switch. According to the UALink Promoter Group, this will “allow for direct loads and stores between the memory attached to AI accelerators, and generally boost speed while lowering data transfer latency compared to existing interconnect specs.” It will also make it simpler to scale up workloads as demands increase.

While specifics about the UALink have yet to be released, group members said in a briefing on Wednesday that UALink 1.0 would involve AMD’s Infinity Fabric architecture while the Ultra Ethernet Consortium will cover connecting multiple “pods,” or switches. Its publication will benefit system OEMs, IT professionals and system integrators looking to set up their data centres in a way that will support high speeds, low latency and scalability.

Which companies joined the UALink Promoter Group?

  • AMD.
  • Broadcom.
  • Cisco.
  • Google.
  • HPE.
  • Intel.
  • Meta.
  • Microsoft.

Microsoft, Meta and Google have all spent billions of dollars on NVIDIA GPUs for their respective AI and cloud technologies, including Meta’s Llama models, Google Cloud and Microsoft Azure. However, supporting NVIDIA’s continued hardware dominance does not bode well for their respective futures in the space, so it is wise to eye up an exit strategy.

A standardised UALink switch will allow providers other than NVIDIA to offer compatible accelerators, giving AI companies a range of alternative hardware options on which to build their systems without suffering vendor lock-in.

This benefits many of the companies in the group that have developed or are developing their own accelerators. Google has a custom TPU and the Axion processor; Intel has Gaudi; Microsoft has the Maia and Cobalt chips; and Meta has MTIA. These could all be connected using the UALink switch, which is likely to be provided by Broadcom.

SEE: Intel Vision 2024 Offers New Look at Gaudi 3 AI Chip

Which companies notably have not joined the UALink Promoter Group?

NVIDIA

NVIDIA likely hasn’t joined the group for two main reasons: its market dominance in AI-related hardware, and the enormous influence that comes with its high valuation.

The firm currently holds an estimated 80% of the GPU market share, but it is also a large player in interconnect technology with NVLink, InfiniBand and Ethernet. NVLink specifically is a GPU-to-GPU interconnect that can connect accelerators within one or multiple servers, just like UALink. It is, therefore, not surprising that NVIDIA does not wish to share that innovation with its closest rivals.

Furthermore, according to its latest financial results, NVIDIA is close to overtaking Apple and becoming the world’s second most valuable company, with its value doubling to more than $2 trillion in just nine months.

The company stands to gain little from the standardisation of AI technology, and its current position is favourable. Time will tell whether NVIDIA’s interconnects remain so integral to data centre operations that the first UALink products fail to topple its crown.

SEE: Supercomputing ‘23: NVIDIA High-Performance Chips Power AI Workloads

Amazon Web Services

AWS is the only one of the major public cloud providers not to join the UALink Promoter Group. As with NVIDIA, this could be related to its influence as the current cloud market leader and the fact that it is working on its own accelerator chip families, like Trainium and Inferentia. Plus, with a strong partnership of more than 12 years, AWS may also be content to stand behind NVIDIA in this arena.

Why are open standards necessary in AI?

Open standards help to prevent disproportionate industry dominance by one firm that happened to be in the right place at the right time. The UALink Promoter Group will allow multiple companies to collaborate on the hardware essential to AI data centres so that no single organisation can take over it all.

This is not the first instance of this kind of revolt in AI; in December, more than 50 organisations partnered to form the AI Alliance to promote responsible, open-source AI and help prevent closed model developers from gaining too much power.

The sharing of knowledge also works to accelerate advancements in AI performance at an industry-wide scale. The demand for AI compute is continuously growing, and for tech firms to keep up, they require the very best in scale-up capabilities. The UALink standard will provide a “robust, low-latency and efficient scale-up network that can easily add computing resources to a single instance,” according to the group.

Forrest Norrod, executive vice president and general manager of the Data Center Solutions Group at AMD, said in a press release: “The work being done by the companies in UALink to create an open, high performance and scalable accelerator fabric is critical for the future of AI.

“Together, we bring extensive experience in creating large scale AI and high-performance computing solutions that are based on open-standards, efficiency and robust ecosystem support. AMD is committed to contributing our expertise, technologies and capabilities to the group as well as other open industry efforts to advance all aspects of AI technology and solidify an open AI ecosystem.”

Glue pizza? Gasoline spaghetti? Google explains what happened with its wonky AI search results


If you were on social media over the past week, you probably saw them. Screenshots of Google's new AI-powered search summaries went viral, mainly because Google was allegedly making wild recommendations like adding glue to your pizza, cooking spaghetti with gasoline, or suggesting that you should eat rocks for optimal health.

That was just the beginning.

Also: How to avoid AI Overviews in Google Search: Three easy ways

Other particularly egregious examples also went viral, seemingly of the rogue AI feature suggesting mixing bleach and vinegar to clean a washing machine, which would produce potentially deadly chlorine gas, or jumping off the Golden Gate Bridge in response to a query of "I'm feeling depressed."

So what happened, and why did Google's AI Overview recommend those things?

First, Google says, the majority of what went viral wasn't real.

Many screenshots were simply fake: "Some of these faked results have been obvious and silly. Others have implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression." Those AI Overviews never appeared, Google says.

Second, numerous screenshots were from people intending to get silly search results — like ones about eating rocks. "Prior to these screenshots going viral," Google said, "practically no one asked Google that question." If nobody is googling a given topic, it probably means there's not a lot of information available about it, or a data void. In such cases, there was only satirical content the AI interpreted as accurate.

Also: 7 ways to supercharge your Google searches with AI

Google admits that a few odd or inaccurate results did appear. Even those were for unusual queries, but they did expose some areas that need improvement. The company was able to determine a pattern of things that didn't go right and made more than a dozen technical improvements, including:

  • Better detection for nonsensical queries that shouldn't show an AI Overview and limited inclusion of satire and humor content

  • Limited use of user-generated content in responses that could offer misleading advice

  • Triggering restrictions for queries where AI Overviews were not proving to be helpful

  • Not showing AI Overviews for hard news topics where freshness and factuality are important and for most health topics

With billions of queries coming in every day, Google says, things will get weird sometimes. The company says it's learning from the errors, and promises to keep working to strengthen AI Overviews.


OpenAI’s new ChatGPT Edu is for universities. Here’s how teachers and students can benefit


Broadly speaking, higher education institutions are no longer rejecting Generative AI tools such as ChatGPT. They are instead increasingly implementing AI chatbots and other tools to enhance the teaching and learning experience.

To take advantage, OpenAI on Thursday unveiled ChatGPT Edu, which, as the name implies, is a version of ChatGPT built specifically for universities to deploy within their institutions so that students, educators, and researchers can enjoy the technology with enterprise-level security, controls, and other perks.

Also: Microsoft and Khan Academy offer a free AI assistant to all US teachers

ChatGPT Edu offers access to GPT-4o, OpenAI's most advanced flagship model unveiled earlier this month. GPT-4o excels in text interpretation, coding, and mathematics, plus offers users other advanced capabilities such as data analytics, web browsing, document summarization, vision, and the ability to build and share GPTs within organizations.

The educational version of the chatbot also includes perks not available in the free version of ChatGPT, such as significantly higher message limits and robust security, data privacy, and administrative controls, which would be especially needed in a university for settings like group permissions. The biggest selling point, however, is that it is more affordable than ChatGPT Enterprise.

OpenAI shares that the impetus for developing this feature was seeing the success universities worldwide, such as the University of Oxford, Wharton School of the University of Pennsylvania, and Arizona State University, had when using ChatGPT Enterprise.

Also: OpenAI just gave free ChatGPT users browsing, data analysis, and more

Universities implemented ChatGPT to optimize their workflows in various ways, including using the chatbot to assist with tutoring students, writing grant applications, grading assignments, analyzing large datasets, and more, according to the blog post.

Like with ChatGPT Enterprise, OpenAI does not publicly share the price of ChatGPT Edu, likely because it isn't a one-size-fits-all offering. The company is inviting universities to contact its sales team to learn more and get started.


What is the EU’s AI Office? New Body Formed to Oversee the Rollout of General Purpose Models and AI Act

The European Commission has unveiled details about its new AI Office, which is being formed to oversee the deployment of general purpose AI models and the implementation of the AI Act in the E.U. The office will be composed of five units covering different areas, including regulation, innovation and AI for Societal Good.

General purpose models refer to foundational AI models that can be used for a wide range of purposes, some of which may be unknown to the developer, like OpenAI’s GPT-4.

The office, which comes into operation on June 16, will take charge of tasks such as drawing up codes of practice and advising on AI models developed before the AI Act comes into force in its entirety. It will also provide access to AI testing resources and ensure that state-of-the-art models are integrated into real-life applications.

The European Commission decided to establish the AI Office in January 2024 to support European startups and SMEs in their development of trustworthy AI. It sits within Directorate-General Connect, the department in charge of digital technologies.

The office will employ more than 140 staff, including technology specialists, administrative assistants, lawyers, policy specialists and economists. It will be led by the Head of the AI Office, who will act on guidance from a Lead Scientific Adviser and an Adviser for International Affairs.

Margrethe Vestager, Executive Vice-President for a Europe Fit for the Digital Age, said in a press release: “The AI-office unveiled today, will help us ensure a coherent implementation of the AI Act. Together with developers and a scientific community, the office will evaluate and test general purpose AI to ensure that AI serves us as humans and uphold our European values.”

Tasks the AI Office will be responsible for

  • Ensuring the coherent implementation of the AI Act across Member States.
  • Enforcing rules of the AI Act and applying sanctions.
  • Developing codes of practice and conducting testing and evaluation of AI models.
  • Utilising the expertise of the European Artificial Intelligence Board, an independent scientific panel, big tech, SMEs and startups, academia, think tanks and civil society in decision-making.
  • Providing advice on AI best practices and access to testing resources like AI Factories and European Digital Innovation Hubs.
  • Funding and supporting innovative research into AI and robotics.
  • Supporting initiatives that ensure AI models made and trained in Europe are integrated into novel applications that boost the economy.
  • Building a strategic, coherent and effective European approach towards AI that acts as a reference point for other nations.

The five units of the AI Office

1. Regulation and Compliance Unit

The Regulation and Compliance Unit will be responsible for ensuring the uniform application and enforcement of the AI Act across Union Member States. Personnel will perform investigations and administer sanctions in the case of infringements.

2. Unit on AI Safety

The Unit on AI Safety will develop testing frameworks that identify systemic risks present in general-purpose AI models and corresponding mitigations. A model presents systemic risk when the cumulative amount of compute used for its training is greater than a certain threshold, according to the EU AI Act.

This unit may be a response to the formation of AI Safety Institutes by the U.K., U.S. and other nations. At May’s AI Seoul Summit, the E.U. agreed with 10 nations to form a collaborative network of AI Safety Institutes.

SEE: U.K. and U.S. Agree to Collaborate on the Development of Safety Tests for AI Models

3. Excellence in AI and Robotics Unit

The Excellence in AI and Robotics team will support and fund the development of models and their integration into useful applications. It also coordinates the GenAI4EU initiative, which aims to support the integration of generative AI into 14 industries, including health, climate and manufacturing, and the public sector.

4. AI for Societal Good Unit

The AI for Societal Good Unit will collaborate with international bodies to work on AI applications that benefit society as a whole, such as weather modelling, cancer diagnoses and digital twins for artistic reconstructions. The unit follows on from the decision in April for the E.U. to collaborate with the U.S. on research that addresses “global challenges for the public good.”

SEE: UK, G7 Countries to Use AI to Boost Public Services

5. AI Innovation and Policy Coordination Unit

The AI Innovation and Policy Coordination Unit will be responsible for the overall execution of the E.U.’s AI strategy. It will monitor trends and investment, support real-world AI testing, establish AI Factories that provide AI supercomputing service infrastructure and collaborate with European Digital Innovation Hubs.

The E.U. AI Act in brief

One of the main responsibilities of the AI Office is enforcing the AI Act, the world’s first comprehensive law on AI, throughout Member States. The Act is a set of E.U.-wide legislation that seeks to place safeguards on the use of AI in Europe, while simultaneously ensuring that European businesses can benefit from the rapidly evolving technology.

SEE: How to Prepare Your Business for the E.U. AI Act With KPMG’s E.U. AI Hub

While the AI Act was approved in March, there are still a few steps to be taken before businesses must abide by its regulations. The E.U. AI Act must first be published in the E.U. Official Journal, which is expected to happen by July this year. It will enter into force 20 days after publication, but the requirements will apply in stages through the following 24 months.

The AI Office is due to publish guidelines on the definition of AI systems and the prohibitions within six months of the AI act entering into force, and codes of practice within nine months.

Companies that fail to comply with the E.U. AI Act face fines ranging from €7.5 million ($8.1 million USD) or 1.5% of global turnover up to €35 million ($38 million USD) or 7% of turnover, depending on the infringement and the size of the company.

The E.U.’s reputation for AI regulation

The fact that three of the office’s units — Excellence in AI and Robotics, AI for Societal Good and AI Innovation and Policy Coordination — focus on nurturing AI innovation and increasing use cases suggests the E.U. is not hellbent on stifling progress with its restrictions, as critics of the AI Act have suggested. Last year, OpenAI’s Sam Altman said he was specifically wary of over-regulation in the E.U.

On top of the AI Act, the E.U. is taking a number of steps to ensure AI models comply with the GDPR. On May 24, the European Data Protection Board’s ChatGPT Taskforce found that OpenAI has not done enough to ensure its chatbot provides accurate responses. Data accuracy and privacy are two significant pillars of the GDPR and, in March 2023, Italy temporarily blocked ChatGPT for unlawfully collecting personal data.

In a report summarising the taskforce’s findings, researchers wrote: “Although the measures taken in order to comply with the transparency principle are beneficial to avoid misinterpretation of the output of ChatGPT, they are not sufficient to comply with the data accuracy principle.”

ElevenLabs’ AI sound effect generator has finally launched. Listen for yourself


ElevenLabs is a leader in AI audio. Its tools, such as AI voice cloning, have achieved worldwide recognition. Today, the startup launched its AI Sound Effects tool to help creatives find the perfect sound effects for their projects.

Initially announced in February, the tool lets you generate sound effects, unique character voices, and music snippets from text prompts, according to ElevenLabs. You can hear sound effects created by the tool for OpenAI's Sora demo video below:

ElevenLabs says the tools are meant to help people, including content creators, film and television studio staff, and video game developers, generate the sounds they need to bring their projects to life "affordably and at scale."

"Over the last year, we've revolutionized AI Voices by producing the first truly emotive, human-like text-to-speech platform," ElevenLabs co-founder and CEO Mati Staniszewski said in a statement. "With the launch of text-to-sound effects, we're marking another major step forward, one that will equip creators with more audio tools to help them produce high-quality content."

To make AI effects possible, ElevenLabs partnered with Shutterstock to fine-tune its model using content from the Shutterstock audio library of licensed tracks, addressing ethical concerns about using a generative AI model.

The AI Sound Effects tool is live on the ElevenLabs site, with different tiered plans to accommodate user needs. You can try the tool for free, although it does count towards your monthly 10,000-character limit.

Also: OpenAI's Voice Engine can clone a voice from a 15-second clip

As someone who enjoys editing videos in my spare time and as part of my job, I was excited about the possibility of finding sound effects more easily. I gave the tool a try to see how it worked.

To start, visit the ElevenLabs website, click on sound effects on the right-hand panel, and type in what you want to hear. The first prompt I typed in was "small dog barking." The tool generated five different versions, as seen below:

As a proud Yorkie owner, I can attest that the generated sound effects were close to the real thing. The tool was intuitive, and the process was essentially the same as using most AI image or music generators.

When I used a more complex prompt, "women cheering," the generator took longer to output a result, and the quality was not as accurate or usable as in the first test. When I returned to simpler prompts, however, such as "kitchen alarm bell ringing," I had great results. The five outputs sounded like the prompt but varied slightly, offering different options.

Also: 5 reasons why I prefer Perplexity over every other AI chatbot

The AI Sound Effects tool can also generate music. When prompted to create a "lo-fi beat with a jazzy groove," the tool produced five high-quality options.

Ultimately, I was impressed with the tool and encourage you to test it. AI Sound Effects is a fun and free experience. That said, I would recommend not asking the tool to make human sounds. Instead, if you want to generate speech, look at ElevenLabs' text-to-speech tool.


Top SQL Queries for Data Scientists


I know the word ‘Python’ is probably the most overused word in the context of data science. To some degree, there’s a reason for that. But, in this article, I want to focus on SQL, which often gets overlooked when talking about data science. I emphasize talking because, in practice, SQL is not overlooked at all. On the contrary, it’s one of the holy trinity of programming languages in data science: SQL, Python, and R.

SQL is made for data querying and manipulation but also has respectable data analysis and reporting capabilities. I’ll show some of the main SQL concepts you need as a data scientist and some easy examples from StrataScratch and LeetCode.

Then, I’ll provide two common business scenarios in which all or most of those SQL concepts must be applied.

Main SQL Concepts for Data Scientists

Here’s the overview of the concepts I’ll discuss.


1. Querying and Filtering Data

This is where your practical work as a data scientist usually starts: querying a database and extracting only the data you need for your task.

This typically involves relatively simple SELECT statements with the FROM and WHERE clauses. To get the unique values, use DISTINCT. If you need to use several tables, you also add JOINs.

You’ll often need to use ORDER BY to make your dataset more organized.

Example of Combining Two Tables: You could be required to list the persons’ names and the city and state they live in by joining two tables and sorting the output by last name.

SELECT FirstName,
       LastName,
       City,
       State
FROM Person p
LEFT JOIN Address a ON p.PersonId = a.PersonId
ORDER BY LastName ASC;

2. Working with NULLs

Data scientists are rarely indifferent to NULLs – they either want to keep only the NULLs, remove them, or replace them with something else.

You can select data with or without NULLs using IS NULL or IS NOT NULL in WHERE.

Replacing NULLs with other values is typically done using conditional expressions:

  • COALESCE(), which returns the first non-NULL argument
  • CASE expressions
  • NULLIF(), which works in the opposite direction, returning NULL when its two arguments match – handy for turning placeholder values into NULLs

Example of IS NULL: With this query, you can find all the customers not referred by the customer with ID = 2.

SELECT name
FROM customer
WHERE referee_id IS NULL OR referee_id <> 2;
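As a runnable check of the NULL pitfall above – a NULL never satisfies referee_id <> 2, so IS NULL must be tested explicitly – here is a minimal sketch using Python's sqlite3 module with invented customer data:

```python
import sqlite3

# Toy reproduction of the "customers not referred by customer 2" query,
# run against an in-memory SQLite database (table and data are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, referee_id INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Will", None), (2, "Jane", None), (3, "Alex", 2),
     (4, "Bill", None), (5, "Zack", 1), (6, "Mark", 2)],
)

# NULL comparisons evaluate to NULL (not TRUE), so the IS NULL branch
# is what keeps the customers who were never referred at all.
rows = conn.execute(
    "SELECT name FROM customer WHERE referee_id IS NULL OR referee_id <> 2 ORDER BY id"
).fetchall()
names = [r[0] for r in rows]
print(names)  # ['Will', 'Jane', 'Bill', 'Zack']
```

Dropping the IS NULL check would silently exclude every customer with no referrer.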

Example of COALESCE(): I can rework this example by saying I want to query all the data but also add a column that will show 0% as a host response rate instead of NULL.

SELECT *,
       COALESCE(host_response_rate, '0%') AS edited_host_response_rate
FROM airbnb_search_details;
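The same COALESCE() idea can be sketched end to end with Python's sqlite3 on an invented listings table (the column name here mirrors the query above, but the data is made up):

```python
import sqlite3

# COALESCE() returns its first non-NULL argument, so NULL response
# rates come back as the fallback string '0%'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER, host_response_rate TEXT)")
conn.executemany("INSERT INTO listings VALUES (?, ?)",
                 [(1, "98%"), (2, None), (3, "75%")])

rows = conn.execute(
    "SELECT id, COALESCE(host_response_rate, '0%') FROM listings ORDER BY id"
).fetchall()
print(rows)  # [(1, '98%'), (2, '0%'), (3, '75%')]
```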

3. Data Type Conversion

As a data scientist, you will convert data frequently. Data often doesn’t come in the desired format, so you must adapt it to your needs. This is usually done using CAST(), but there are also some alternatives, depending on your SQL flavor.

Example of Casting Data: This query casts the stars data from VARCHAR to INTEGER and filters out rows whose stars value is not an integer.

SELECT business_name,
       review_id,
       user_id,
       CAST(stars AS INTEGER) AS cast_stars,
       review_date,
       review_text,
       funny,
       useful,
       cool
FROM yelp_reviews
WHERE stars <> '?';
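One thing worth knowing is that CAST() behaviour differs across databases: PostgreSQL raises an error when text can't be parsed as a number, while SQLite silently extracts a numeric prefix or returns 0. A quick sketch of the SQLite behaviour (the values here are invented):

```python
import sqlite3

# SQLite's CAST is lenient: '4' becomes 4, '3.7' truncates to the
# integer prefix 3, and non-numeric text like '?' becomes 0.
conn = sqlite3.connect(":memory:")
vals = conn.execute(
    "SELECT CAST('4' AS INTEGER), CAST('3.7' AS INTEGER), CAST('?' AS INTEGER)"
).fetchone()
print(vals)  # (4, 3, 0)
```

This is exactly why the query above filters out the '?' rows before trusting the cast result.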

4. Data Aggregation

To better understand the data they’re working with (or simply because they need to produce some reports), data scientists very often have to aggregate data.

In most cases, you must use aggregate functions and GROUP BY. Some of the common aggregate functions are:

  • COUNT()
  • SUM()
  • AVG()
  • MIN()
  • MAX()

If you want to filter aggregated data, use HAVING instead of WHERE.

Example of Sum: You can use this query to sum the transactions on each user's account and show only those with a balance above 10,000.

SELECT u.name,
       SUM(t.amount) AS balance
FROM Users u
JOIN Transactions t ON u.account = t.account
GROUP BY u.name
HAVING SUM(t.amount) > 10000;
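To see GROUP BY and HAVING work together, here is a minimal runnable sketch using Python's sqlite3 with invented users and transactions:

```python
import sqlite3

# Aggregate per user, then filter the aggregates: HAVING runs after
# GROUP BY, which is why WHERE cannot be used on SUM(t.amount).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (account INTEGER, name TEXT);
CREATE TABLE Transactions (account INTEGER, amount INTEGER);
INSERT INTO Users VALUES (900001, 'Alice'), (900002, 'Bob');
INSERT INTO Transactions VALUES (900001, 7000), (900001, 7000),
                                (900002, 4000), (900002, -500);
""")

rows = conn.execute("""
    SELECT u.name, SUM(t.amount) AS balance
    FROM Users u JOIN Transactions t ON u.account = t.account
    GROUP BY u.name
    HAVING SUM(t.amount) > 10000
""").fetchall()
print(rows)  # [('Alice', 14000)] -- Bob's 3,500 balance is filtered out
```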

5. Handling Dates

Working with dates is commonplace for data scientists. Again, dates are not always formatted according to your taste or needs. To get the most flexibility out of dates, you will sometimes need to extract parts of them or reformat them. To do that in PostgreSQL, you’ll most commonly use these date/time functions:

  • EXTRACT()
  • DATE_PART()
  • DATE_TRUNC()
  • TO_CHAR()

One of the common operations with dates is finding the difference between two dates or adding an interval to a date. You do that by simply subtracting or adding the two values, or by using functions dedicated to that, depending on the database you use.

Example of Extracting Year: The following query extracts the year from the DATETIME type column to show the number of violations per year for Roxanne Cafe.

SELECT EXTRACT(YEAR FROM inspection_date) AS year_of_violation,
       COUNT(*) AS n_violations
FROM sf_restaurant_health_violations
WHERE business_name = 'Roxanne Cafe'
  AND violation_id IS NOT NULL
GROUP BY year_of_violation
ORDER BY year_of_violation ASC;

Example of Date Formatting: With the query below, you format the start date as 'YYYY-MM' using TO_CHAR().

SELECT TO_CHAR(started_at, 'YYYY-MM'),
       COUNT(*) AS n_registrations
FROM noom_signups
GROUP BY 1;
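Date functions are one of the areas that vary most between SQL flavors: SQLite has no TO_CHAR(), so strftime() plays the same role there. A runnable sketch via Python's sqlite3 with an invented signups table:

```python
import sqlite3

# strftime('%Y-%m', ...) is SQLite's equivalent of PostgreSQL's
# TO_CHAR(started_at, 'YYYY-MM') for bucketing rows by month.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE noom_signups (started_at TEXT)")
conn.executemany("INSERT INTO noom_signups VALUES (?)",
                 [("2024-01-05",), ("2024-01-20",), ("2024-02-11",)])

rows = conn.execute("""
    SELECT strftime('%Y-%m', started_at) AS month, COUNT(*)
    FROM noom_signups
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(rows)  # [('2024-01', 2), ('2024-02', 1)]
```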

6. Handling Text

Apart from dates and numerical data, very often databases contain text values. Sometimes, these values have to be cleaned, reformatted, unified, split and merged. Due to these needs, every database has many text functions. In PostgreSQL, some of the more popular ones are:

  • CONCAT() or ||
  • SUBSTRING()
  • LENGTH()
  • REPLACE()
  • TRIM()
  • POSITION()
  • UPPER() & LOWER()
  • REGEXP_REPLACE() & REGEXP_MATCHES() & REGEXP_SPLIT_TO_ARRAY()
  • LEFT() & RIGHT()
  • LTRIM() & RTRIM()

There are usually some overlapping string functions in all databases, but each has some distinct functions.

Example of Finding the Length of the Text: This query uses the LENGTH() function to find invalid tweets based on their length.

SELECT tweet_id
FROM Tweets
WHERE LENGTH(content) > 15;

7. Ranking Data

Ranking data is one of the widespread tasks in data science. For instance, it can be used to find the best or worst-selling products, quarters with the highest revenue, songs ranked by number of streams, and the highest and lowest-paid employees.

Ranking is done using window functions (which we’ll discuss a bit more in the next section):

  • ROW_NUMBER()
  • RANK()
  • DENSE_RANK()

Example of Ranking: This query uses DENSE_RANK() to rank hosts based on the number of beds they have listed.

SELECT host_id,
       SUM(n_beds) AS number_of_beds,
       DENSE_RANK() OVER (ORDER BY SUM(n_beds) DESC) AS rank
FROM airbnb_apartments
GROUP BY host_id
ORDER BY number_of_beds DESC;
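A runnable sketch of DENSE_RANK() via Python's sqlite3 (SQLite supports window functions from version 3.25); the hosts and bed counts below are invented to show how ties behave:

```python
import sqlite3

# DENSE_RANK() gives tied rows the same rank and leaves no gap
# after a tie (unlike RANK(), which would skip to 4 here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airbnb_apartments (host_id INTEGER, n_beds INTEGER)")
conn.executemany("INSERT INTO airbnb_apartments VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 7), (3, 1), (3, 4)])

rows = conn.execute("""
    SELECT host_id,
           SUM(n_beds) AS number_of_beds,
           DENSE_RANK() OVER (ORDER BY SUM(n_beds) DESC) AS bed_rank
    FROM airbnb_apartments
    GROUP BY host_id
    ORDER BY number_of_beds DESC, host_id
""").fetchall()
print(rows)  # [(2, 7, 1), (1, 5, 2), (3, 5, 2)] -- hosts 1 and 3 tie at rank 2
```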

8. Window Functions

Window functions in SQL allow you to perform calculations across a set of rows related to the current row. They are not only used to rank data; depending on the window function category, they can have many different uses. You can read more about them in the window functions article. However, their main characteristic is that they can show analytical and aggregated data at the same time. In other words, they don’t collapse individual rows when performing calculations.

Example of FIRST_VALUE() Window Function: One window function example is to show the latest user login for a particular year. The FIRST_VALUE() window function makes this easier.

SELECT DISTINCT user_id,
       FIRST_VALUE(time_stamp) OVER (PARTITION BY user_id ORDER BY time_stamp DESC) AS last_stamp
FROM Logins
WHERE EXTRACT(YEAR FROM time_stamp) = 2020;

9. Subqueries & CTEs

Subqueries and CTEs (essentially tidier subqueries) allow you to reach a more advanced level of calculation. By knowing subqueries and CTEs, you can write complex SQL queries, with the subqueries or CTEs handling sub-calculations that are referenced in the main query.

Example of Subqueries and CTEs: The query below uses the subquery to find the first year of the product sale. This data is then used in WHERE for the main query to filter data.

SELECT product_id,
       year AS first_year,
       quantity,
       price
FROM Sales
WHERE (product_id, year) IN (
    SELECT product_id,
           MIN(year) AS year
    FROM Sales
    GROUP BY product_id
);

The same query can be written using a CTE instead of a subquery.

WITH first_year_sales AS (
    SELECT product_id,
           MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
)
SELECT s.product_id,
       s.year AS first_year,
       s.quantity,
       s.price
FROM Sales s
JOIN first_year_sales AS fys
  ON s.product_id = fys.product_id AND s.year = fys.first_year;
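The CTE pattern above can be exercised end to end with Python's sqlite3 on a toy Sales table (the product IDs and prices are invented):

```python
import sqlite3

# The CTE computes each product's first year of sale once, then the
# main query joins back to it to pull the full row for that year.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (product_id INTEGER, year INTEGER, quantity INTEGER, price INTEGER)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                 [(100, 2008, 10, 5000), (100, 2009, 12, 5000),
                  (200, 2011, 15, 9000)])

rows = conn.execute("""
    WITH first_year_sales AS (
        SELECT product_id, MIN(year) AS first_year
        FROM Sales GROUP BY product_id
    )
    SELECT s.product_id, s.year, s.quantity, s.price
    FROM Sales s
    JOIN first_year_sales AS fys
      ON s.product_id = fys.product_id AND s.year = fys.first_year
    ORDER BY s.product_id
""").fetchall()
print(rows)  # [(100, 2008, 10, 5000), (200, 2011, 15, 9000)]
```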

Business Examples of Using SQL

Let’s now look at a couple of business cases where data scientists can use SQL and apply all (or most) of the concepts we discussed earlier.

Finding Best Selling Product

In this example, you must know subqueries, data aggregation, handling dates, ranking data using window functions, and filtering the output.

The subquery calculates each product's sales for each month and ranks them by sales. The main query then simply selects the required columns and leaves only products with the first rank, i.e., best-selling products.

SELECT sale_month,
       description,
       total_paid
FROM (SELECT DATE_PART('MONTH', invoicedate) AS sale_month,
             description,
             SUM(unitprice * quantity) AS total_paid,
             RANK() OVER (PARTITION BY DATE_PART('MONTH', invoicedate)
                          ORDER BY SUM(unitprice * quantity) DESC) AS sale_rank
      FROM online_retail
      GROUP BY sale_month,
               description) AS ranking_sales
WHERE sale_rank = 1;
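DATE_PART() is PostgreSQL syntax, so here is a hedged sqlite3 sketch of the same idea, with strftime('%m', ...) standing in for DATE_PART('MONTH', ...); the online_retail rows are toy data invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE online_retail "
    "(invoicedate TEXT, description TEXT, unitprice REAL, quantity INT)"
)
conn.executemany(
    "INSERT INTO online_retail VALUES (?, ?, ?, ?)",
    [
        ("2020-01-05", "WIDGET", 100.0, 5),  # Jan: WIDGET total 500
        ("2020-01-09", "GADGET", 100.0, 3),  # Jan: GADGET total 300
        ("2020-02-11", "GADGET", 350.0, 2),  # Feb: GADGET total 700
        ("2020-02-20", "WIDGET", 100.0, 1),  # Feb: WIDGET total 100
    ],
)
# RANK() partitions by month and orders by revenue; the outer query keeps rank 1.
rows = conn.execute("""
    SELECT sale_month, description, total_paid
    FROM (SELECT strftime('%m', invoicedate) AS sale_month,
                 description,
                 SUM(unitprice * quantity) AS total_paid,
                 RANK() OVER (PARTITION BY strftime('%m', invoicedate)
                              ORDER BY SUM(unitprice * quantity) DESC) AS sale_rank
          FROM online_retail
          GROUP BY sale_month, description) AS ranking_sales
    WHERE sale_rank = 1
    ORDER BY sale_month
""").fetchall()
print(rows)  # [('01', 'WIDGET', 500.0), ('02', 'GADGET', 700.0)]
```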

Calculating Moving Average

The rolling (or moving) average is a common business calculation, and, as this example shows, data scientists can compute it entirely in SQL.

The subquery in the code below calculates revenue by month. The main query then uses the AVG() window function to calculate the 3-month rolling average revenue.

SELECT t.month,
       AVG(t.monthly_revenue) OVER (ORDER BY t.month
                                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_revenue
FROM (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
             SUM(purchase_amt) AS monthly_revenue
      FROM amazon_purchases
      WHERE purchase_amt > 0
      GROUP BY 1
      ORDER BY 1) AS t
ORDER BY t.month ASC;
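The same frame-based pattern runs in sqlite3 if the PostgreSQL expression TO_CHAR(created_at::DATE, 'YYYY-MM') is swapped for strftime('%Y-%m', ...); the amazon_purchases rows below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amazon_purchases (created_at TEXT, purchase_amt REAL)")
conn.executemany(
    "INSERT INTO amazon_purchases VALUES (?, ?)",
    [
        ("2020-01-10", 60), ("2020-01-25", 40),  # Jan revenue: 100
        ("2020-02-03", 200),                     # Feb revenue: 200
        ("2020-03-17", 300),                     # Mar revenue: 300
        ("2020-04-09", 400),                     # Apr revenue: 400
    ],
)
# The frame "2 PRECEDING AND CURRENT ROW" averages the current month and the
# two before it, giving a 3-month rolling average.
rows = conn.execute("""
    SELECT t.month,
           AVG(t.monthly_revenue) OVER (
               ORDER BY t.month
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS avg_revenue
    FROM (SELECT strftime('%Y-%m', created_at) AS month,
                 SUM(purchase_amt) AS monthly_revenue
          FROM amazon_purchases
          WHERE purchase_amt > 0
          GROUP BY 1) AS t
    ORDER BY t.month
""").fetchall()
print(rows)
# [('2020-01', 100.0), ('2020-02', 150.0), ('2020-03', 200.0), ('2020-04', 300.0)]
```

The first two months average over fewer than three rows because the frame is truncated at the start of the window.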

Conclusion

All these SQL queries show how you can use SQL in your data science tasks. While SQL is not built for complex statistical analysis or machine learning, it’s perfect for querying, manipulating, and aggregating data, and for performing calculations.

These example queries should help you in your job. And if you don’t yet have a data science job, many of them will come up among your SQL interview questions.

Nate Rosidi is a data scientist working in product strategy. He's also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.

More On This Topic

  • KDnuggets News, December 7: Top 10 Data Science Myths Busted • 4…
  • 4 Useful Intermediate SQL Queries for Data Science
  • How to Optimize SQL Queries for Faster Data Retrieval
  • 5 Tricky SQL Queries Solved
  • Solving 5 Complex SQL Problems: Tricky Queries Explained
  • How To Speed Up SQL Queries Using Indexes [Python Edition]

OpenAI’s GPT-5 and Apple’s ‘most advanced’ hardware of 2024 lead ZDNET’s Innovation Index


Welcome to ZDNET's Innovation Index, which identifies the most innovative developments in tech from the past week and ranks the top four, based on votes from our panel of editors and experts. Our mission is to help you identify the trends that will have the biggest impact on the future.

Unsurprisingly, three out of four of this week's top trends revolve around AI, straddling big-picture developments with smaller-scale — but just as crucial — product improvements.

Without even materializing, OpenAI's GPT-5 topped this week's leaderboard on buzz and anticipation alone. Just weeks after the arrival of GPT-4o, we're looking forward to improved accuracy and multimodality from OpenAI's next model, as well as potential advancements towards artificial general intelligence (AGI). With no release date set, any autonomous capability claims from OpenAI are speculative — for now.

Next up was Meta's promising new approach to large language models (LLMs). Scientists at the company are exploring a solution to hallucinations that penalizes models for producing wrong answers. While there's still plenty of testing to be done, it's resulted in some benchmark improvements so far. If the team can successfully evolve how models decide which words to string together, the result could mean greater sophistication and contextual accuracy for generated text.

Ranked third was Microsoft planning to turn Windows Copilot into an app, a move that would bring the AI assistant's UX up to speed with other tools. Of course, using Copilot in Windows will be even easier — as in, single-click easy — if you have one of Microsoft's slick new AI-powered PCs. Either way, we're looking forward to how the improved functionality will expand usage.

Closing out the week at number four was ZDNET's assertion that the new iPad Pro is Apple's "most advanced piece of hardware in 2024." The thin, light form factor and Tandem OLED display just might be as cutting-edge as they look, putting the iPad back on the map.

Featured

We Won’t Just Replace Our Security Engineering Team with a Coding Engine: AppDome CPO


The discourse continues regarding whether AI can effectively mitigate security threats stemming from AI advancements. Notably, we’ve witnessed instances where security companies integrate AI into their defence mechanisms against cyber threats.

Appdome chief product officer Chris Roeckl stated that his company won’t replace its security engineering team with AI just to follow the current AI trend.
“We are a security company. We’re not about to replace our highly valuable security engineers with a coding engine anytime soon. The value of our security research team lies in their creativity and deep understanding of our code base. The human element is certainly very important to us in terms of how we provide our solution,” Roeckl told AIM in an exclusive interaction.

Founded in 2012, and headquartered in Redwood City, California, Appdome is a software company that provides a mobile app security and integration platform.

Appdome’s cyber defence automation platform is a no-code platform that enables app developers to add over 300 security, anti-fraud, anti-malware, anti-cheat, and other protections to Android and iOS apps.

Currently, Appdome leverages a machine-learning engine to power its platform but has not integrated large language models (LLMs) into its platform yet.

“It’s an unanswered question. Our research team is currently exploring – do we leverage AI to combat AI or rely on good old traditional practices? This remains an open question as we evaluate whether an AI-driven response could prove advantageous,” he added.

But There Are Benefits

During our discussion, Roeckl touched upon the benefits AI could bring to Appdome’s platform. For instance, AI could play a great role in recommending security features to Appdome’s clients, choosing from the 300+ features they have on their platform.

“Depending on where your app is, and what kind of app it is, we might be able to provide very specific recommendations on how to secure your app. So the idea of using an AI-based recommendation engine is very interesting to us,” he said.

However, whether the recommendation engine will be powered by an LLM or a conventional recommendation engine is not determined yet. Roeckl notes that it could be a mixture of AI and their existing machine-learning engine.

AI is Making Bad Actors Good Coders

While Appdome is contemplating using LLMs, numerous threats are emerging due to AI’s increasing popularity. For instance, generative AI is making it easier for bad actors to write malware code.

ChatGPT, a free tool, can write code, while more advanced tools like GitHub Copilot are available for developers at a price of $10/month. However, GitHub has measures in place to prevent Copilot from being used to write malware and similar malicious content.

“We see that AI coding is lowering the barrier of entry for bad actors to actually code attacks. So now we see that mobile apps are under increasing threat of attack because bad actors can write good codes with the help of AI even though they don’t have good coding abilities,” Roeckl said.

Phone-Based AI Agents Will Open Doors to More Security Threats

AI is touching almost every field, and AI agents are expected to be the next big iteration. Experts envision that most consumer interactions online will involve agents performing tasks on their behalf.

Pretty soon, your smartphone’s built-in AI agent might communicate with your preferred food delivery app’s AI agent to place orders automatically on your behalf. This, however, according to Roeckl, opens the doors for more security vulnerabilities.

For these systems to communicate effectively, they rely on application programming interfaces (APIs) to facilitate interaction. However, APIs can be vulnerable to interception and misuse for malicious purposes.

“For instance, many APIs in mobile apps are exposed, leading to potential bot attacks, such as sneaker bots. Mobile channels are increasingly targeted for such attacks due to embedded coding logic within the apps.

“Failing to secure the app can expose the coding logic of the APIs, posing a significant threat. Therefore, as the API economy drives tighter interconnections between systems, brands must heighten vigilance in safeguarding these systems,” Roeckl said.

AI is Not Creating a New Category of Threats, Yet

While generative AI does open the door for more security threats in mobile security, according to Roeckl, no new kinds of threats are emerging. However, he does acknowledge that AI is resulting in a volumetric increase in threats overall.

“We’re not seeing unique attacks from AI. While we can track and identify them as originating from AI-powered systems, they largely resemble the standard types of attacks we encounter every day. Recently, we’ve particularly focused on social engineering attacks as a significant area of concern.”

Social engineering attacks are indeed becoming more sophisticated and dangerous with the integration of AI. “There are deepfakes of both images and voices. Additionally, there are phishing attacks, which can lead to mobile app vulnerabilities. AI elements have begun to infiltrate all these various types of attacks,” said Roeckl.

The post We Won’t Just Replace Our Security Engineering Team with a Coding Engine: AppDome CPO appeared first on AIM.

5 Free Python Courses for Data Science Beginners


If you’re reading this article, you probably want to learn data science and land your first data role soon. So how do you go about learning data science?

After brushing up your basic math skills, you can start learning SQL or a programming language such as Python or R. With R, you can do data and statistical analysis, but Python is more versatile and easier to learn.

So here is a list of beginner-friendly Python programming courses that’ll help you learn the fundamentals and start building projects. Let’s get started!

1. Python for Beginners – freeCodeCamp

The Python for Beginners course on freeCodeCamp’s YouTube channel is a full-length Python course for beginners. The course is over 4.5 hours long and will get you up and running with Python fundamentals by coding two simple games: rock-paper-scissors and blackjack.

The course starts by exploring the fundamentals like data types, variables, and operators. It then covers control flow, built-in functions, and data structures. The course also explores advanced concepts like decorators, object-oriented programming, and functional programming.

This course does not assume any prior programming experience with Python. But it covers enough ground to help you feel confident to start building your own projects.

Link: Python for Beginners – Full Course [Programming Tutorial]

2. Python – Kaggle

If you prefer working through bite-sized text-based lessons and running code snippets along the way, the Python course on Kaggle is for you.

Besides the basics of Python’s syntax and variables, the course covers the following topics:

  • Functions
  • Booleans and conditionals
  • Lists
  • Loops and list comprehensions
  • Strings and dictionaries
  • Working with external libraries

Link: Learn Python | Kaggle

3. Python Tutorial (with Mini-Projects) – freeCodeCamp

In the first course on this list, Python for Beginners, you’d have coded two simple game projects. The Python Tutorial for Beginners (with mini-projects) is a video course with 23 chapters, each focusing on a different topic.

Throughout the course, you’ll also get to work on several mini-projects. The course starts with the basics like data types and built-in data structures. But it also covers the following topics:

  • Functions
  • Recursion
  • Scope and closures
  • Command-line arguments
  • Lambdas and higher-order functions
  • Object-oriented programming
  • Errors and exceptions
  • File operations
  • Virtual environments

Link: Python Tutorial for Beginners (with mini-projects)

4. Python Tutorial – W3Schools

The Python Tutorial on W3Schools has bite-sized lessons along with quick practice exercises and examples that you can run in the browser.

The W3Schools Python tutorial covers the following topics:

  • Control flow
  • Built-in data structures
  • Classes and objects
  • Inheritance and polymorphism
  • Working with dates, JSON, and RegEx

Besides Python fundamentals, the Python tutorial also has lessons on Python data science libraries: NumPy, pandas, and matplotlib.

Link: Python Tutorial

5. Object-Oriented Programming with Python

By now, from one or more of the courses above, you should be familiar with the basics of object-oriented programming (OOP) in Python, and it’s time to learn more. Object Oriented Programming with Python, available for free on freeCodeCamp’s YouTube channel, is a comprehensive course on OOP fundamentals with Python.

This course covers the following:

  • Getting started with classes
  • Constructor
  • Class vs static methods
  • Inheritance
  • Getters and setters
  • OOP principles

Link: Object-Oriented Programming with Python — Full Course for Beginners

Wrapping Up

If you are a data science beginner looking to learn Python, I hope you found this list of courses helpful. Even as you’re learning Python, be sure to work on interesting projects on the side so that you get to apply what you learn and also build out your project portfolio.

So happy learning and coding!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

More On This Topic

  • 7 Free Kaggle Micro-Courses for Data Science Beginners
  • 5 Free SQL Courses for Data Science Beginners
  • 3 Free Machine Learning Courses for Beginners
  • KDnuggets News, December 14: 3 Free Machine Learning Courses for…
  • KDnuggets News, May 4: 9 Free Harvard Courses to Learn Data…
  • KDnuggets News, October 5: Top Free Git GUI Clients for Beginners •…

Is Microsoft Recall a ‘privacy nightmare’? 7 reasons you can stop worrying about it


There's an entire class of frankly creepy software designed to monitor every move someone makes on a smartphone or computer, often saving surreptitious screenshots of activity for review by the person who installed the app (typically a parent or a jealous spouse).

Those apps are usually filed under a category heading like "hidden screen recorders" or "parental monitoring tools," but we all know what they really are: spyware.

Also: How to find and remove spyware from your phone

All of which explains why Microsoft Recall, one of the signature features of the next-generation Microsoft Copilot+ PCs, is getting such a bad rap. Its primary job, after all, is to snap screenshots of your activity every few seconds, store them in an encrypted folder, and index them so that the person who set up that feature can review activity on that Windows PC after the fact.

That sounds an awful lot like spyware, doesn't it? But there's one big difference: You're the person setting up the screen recorder, and you're the person reviewing its results. No one else, including Microsoft, has access to that data. Most importantly, there's nothing hidden about it.

Those distinctions apparently don't matter to general-interest media sources like CNN or the BBC, where you might have read that Microsoft Recall is a "privacy nightmare." But that's a naïve and fundamentally inaccurate characterization.

Recall solves a very common problem. I can't remember the name of that website I visited last week, but I know enough about it to describe the page. Likewise, I don't know the name of the file containing an important contract I reviewed recently, but I remember a few details about it. Those scenarios are tailor-made for an AI-powered local search engine that can sift through your activity and not just files.

Also: How to screen record in Windows 10 or 11

Don't get me wrong. There are privacy issues associated with Microsoft Recall, just as there are with any feature designed to store and index personal data. But Microsoft appears to have addressed most of those issues in its design. And since the feature has yet to ship, it's not possible to judge how effective that design is. The only information anyone (including me) has to go by for now is what Microsoft has published in its brief descriptions and demos of Microsoft Recall.

So, what's the real story? Here's what we know so far.

1. You can turn this feature on or off during initial setup.

When you set up a new PC (or a new user account) that supports the Microsoft Recall feature, the initial setup experience includes a page for its settings. The default setting is on, but you can choose to turn it off, and you have the option to adjust settings. (It would be better if the default setting were off and required you to opt in.)

2. Those screen captures are stored and processed locally.

The AI that analyzes Recall snapshots runs locally. According to Microsoft, "No internet or cloud connections are required or used to save and analyze snapshots. Your snapshots aren't sent to Microsoft. Recall AI processing occurs locally, and your snapshots are securely stored on your local device only."

3. No one else can access the Recall data.

The folder where snapshots are stored is encrypted by default on Windows 11 PCs and is restricted to the signed-in user profile. Microsoft says it can't access or view the snapshots. That's consistent with its approach to the indexes it uses to search local user files, and it has no incentive to violate that commitment.

4. You can specify that certain apps are never recorded.

The Recall settings page allows you to specify how much storage is set aside for snapshots. It also allows you to filter out apps that you never want to see in your Recall snapshots. (Any content that is viewed in a private-mode browser or is protected by rights-management features is already protected.)

5. You can delete a snapshot.

If you see that a snapshot contains information that you'd rather not preserve, such as a password or a confidential document, you can delete it. You can also delete all activity for a specific time period, which might come in handy if you've been working on a sensitive project that you don't want preserved.

6. Your IT staff has ultimate control over Recall.

Unsurprisingly, Microsoft has enabled group policy and mobile device management settings that administrators can use to disable this feature completely. If that configuration is turned on for your managed device, all saved snapshots are deleted immediately, and you won't have the option to enable the feature going forward.

7. There are still risks, but they're limited.

The data that's saved by Microsoft Recall includes potentially sensitive information. If you change a password on a site that doesn't properly mask new password fields, your new password might be captured in a snapshot. Likewise, data you view as part of a website search can easily be saved and stored, and that data can potentially be incriminating or embarrassing.

For most people, the likelihood that that information will be exposed is small. The population that is most at risk includes journalists and activists who cross borders into hostile countries or who are targeted by police or security services. It also includes anyone in a marginalized population who might be exercising rights that are not supported by the jurisdiction they live in.

But is it spyware? Not under any reasonable definition of the word. You can't spy on yourself, and if someone else is able to access your local, encrypted Recall data … well, you have bigger problems.
